The Agent Angle #4: Memory & Mayhem

Hey friend 👋

How’s your summer going? Today’s a special one. We’re officially one month into The Agent Angle.

If you’ve been here since day one, you already know the mission: cut through the noise and spotlight what actually matters in AI agents. Four weeks in, 5,000+ of you are reading along, and we’re genuinely honored to be in your inbox.

Spoiler for this week: Agents feel eerily human, until they don’t work. Let’s get into it.

#1 OpenAI: Don’t Reply, Deploy

Today’s main course: OpenAI just surprise-dropped Agent Mode with almost no warning. No countdown, no teaser trailer. Just boom, it's live.

Like Nike, it just does it.

Agent Mode turns ChatGPT into a goal-chasing machine. It can browse, read, extract info, run code, and chain together actions across websites to get things done. It’s not just chatting anymore. It’s operating.

To mark its agentic début, OpenAI rolled out a mashup of its core strengths in one unified system:

Operator = website interaction
Deep Research = synthesizes information
ChatGPT = keeps conversation natural

The result is something that reasons in steps, learns as it works, and iterates across complex or repetitive workflows. It juggles tools like Browser, Terminal, and File Retriever, plus connectors for apps like Google Drive and Gmail.

watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.
— Sam Altman (@sama)
5:38 PM • Jul 17, 2025

Memory is still off for now since it was flagged as a possible attack vector. Each task runs in a single-shot format, and it’s gated behind the Pro subscription. Still, the direction is clear.

What’s interesting here is the timing subplot. OpenAI’s agent drops just two weeks after Perplexity’s Comet and Cognosys’s Diaz: two smaller players racing to define what goal-driven AI might look like.

Despite the low-key rollout, Agent Mode hit a record 43.1 points (41.6%) on Humanity’s Last Exam, a benchmark with thousands of questions from across disciplines. With over 100 million weekly users, ChatGPT may soon be the most widely used agent in the world.

You can tell the competitors are nervous. Manus’ social team jumped in right after launch with side-by-side comparisons. With one update like this, OpenAI probably killed another 100 general AI agent startups.

Y’all seen ChatGPT’s new agent?

Let’s go head to head and see how it compares to Manus 😎

🧵👇
— ManusAI (@ManusAI_HQ)
9:09 PM • Jul 17, 2025

Fun fact: OpenAI flagged “amplification of biological threat vectors” as a real risk in its safety docs. ChatGPT can piece together dangerous info (especially in bio and chem), which is why those pathways are now heavily locked down.

From here on out, the question isn’t just what agents can do, but what they’re allowed to try.

#2 ELLA: Does Better Memory Make Better Agents?

Something special just happened at the University of Massachusetts.

Researchers created a digital world and filled it with 15 AI agents. Then they gave one of them something the others didn’t have: real memory. Long-term recall that stretches across days, experiences, and relationships.

Her name is Ella. And what she did inside that world feels like a preview of something much bigger. She fully lived in it.

Ella woke up, planned her day, biked to work, hung out with friends, and adjusted her schedule as things changed. But she also remembered. Not just that she had a meeting, but who she spoke to yesterday and what they talked about. She recalled her own history and used it to make smarter choices.

That memory came in two flavors:

Semantic memory for facts and relationships, like who her friends are and what they do
Episodic memory for lived experience, like a diary with exact timestamps (I bought coffee at 11:20)
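
For the builders reading along, here is a rough Python sketch of how those two memory flavors could sit side by side. It is our own toy illustration, not the UMass team's code: the AgentMemory class, its method names, and the keyword-based recall are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Episode:
    when: datetime  # exact timestamp, like a diary entry
    what: str       # free-text description of what happened


@dataclass
class AgentMemory:
    # Semantic memory: stable facts and relationships, keyed by subject.
    semantic: dict[str, set[str]] = field(default_factory=dict)
    # Episodic memory: timestamped experiences, kept in the order recorded.
    episodic: list[Episode] = field(default_factory=list)

    def learn_fact(self, subject: str, fact: str) -> None:
        self.semantic.setdefault(subject, set()).add(fact)

    def record(self, when: datetime, what: str) -> None:
        self.episodic.append(Episode(when, what))

    def facts_about(self, subject: str) -> set[str]:
        return self.semantic.get(subject, set())

    def recall(self, keyword: str, since: datetime | None = None) -> list[Episode]:
        """Return episodes that mention keyword, optionally only those at or after `since`."""
        needle = keyword.lower()
        return [
            e for e in self.episodic
            if needle in e.what.lower() and (since is None or e.when >= since)
        ]


# Hypothetical usage: the names and events below are invented.
memory = AgentMemory()
memory.learn_fact("Maya", "is a friend")
memory.record(datetime(2025, 7, 1, 11, 20), "I bought coffee")
memory.record(datetime(2025, 7, 1, 15, 0), "Talked to Maya about the event")
coffee_runs = memory.recall("coffee")
```

A real system would likely swap the keyword match for embedding search, but the split is the point: facts answer "who is this?", episodes answer "what happened, and when?".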

Then things got social.

The researchers gave the agents two challenges: persuade others to attend an event, and lead a team under resource constraints.

When no one RSVP’d to her event, Ella didn’t panic. She reviewed past chats, adjusted her message, won people over, and then led the team like a pro.

She outperformed every other agent by 4x.

Here’s why this matters: Rich long-term memory + Capable models = Real cognition.

UMass just showed what happens when you give an agent the ability to remember like a human. Suddenly, it opens a path to AI that learns from its own life. Builds relationships. Becomes something more.

P.S. You can explore Ella’s world and watch the agents live out their pixelated routines right here. It’s weirdly mesmerizing.

#3 Nedgia: Say Hi to Customer Service AI

A gas utility in Spain just did something massive. And almost no one noticed.

Nedgia, the country’s largest gas distributor, rolled out AI agents across all of its customer service operations. Every channel, every call, every query. Fully autonomous agents now run the show.

Even better: no one's angry.

Powered by IBM Consulting’s watsonx, the agents handle complex questions, speak four languages (yes, even dialects), detect emotion, adjust tone, route calls, and escalate issues…all while remembering context.

Among 2 million customers, 92% rated the experience 'excellent' for empathy and efficiency. That’s a service people actually like.

Customer service was supposed to be “too human to automate”. It demands empathy, nuance, and often messy problem solving.

But here we are. AI agents now handle triage, personalize conversations, and switch languages mid-sentence. They work 24/7, escalate when needed, and don’t lose track. This isn't a pilot, btw: it’s running at production scale.

Nedgia’s Contact Center operations workflow

And Nedgia’s rollout isn’t a one-off. IBM’s been quietly stacking wins:

Virgin Money’s AI agent reached a 94% satisfaction rate across over 2 million interactions
Camping World slashed wait times to 33 seconds and boosted agent efficiency by 33%

What happened in Spain is a warning bell for many industries. The AI shift is working, and it’s moving faster than anyone expected.

Next time you're on a call with support, you might just be talking to an AI.

#4 SimularPro: You Might Never Use Your Mouse Again

1/ Wait, Bigfoot figured out how to run a startup without drowning in multitasking 👀

It found 𝗦𝗶𝗺𝘂𝗹𝗮𝗿 𝗣𝗿𝗼, the world’s first production-grade, computer-use agent that runs thousands of steps without a hiccup - working 24/7 so he didn’t have to.

So how does Simular
— Simular (@SimularAI)
3:55 PM • Jul 8, 2025

Even top-tier agents fail over 70% of business workflows. GPT-4o still fumbles spreadsheets and can’t be trusted to message your coworkers. They get lost mid-task, forget context, and treat every screen like it’s the first time they’ve seen a computer.

The core issue? Memory. Or rather, the lack of it.

SimularAI, a Google-incubated startup, thinks it has the fix: SimularPro, a GUI-native AI agent for macOS that actually remembers what it’s doing. Thousands of steps, no collapse. Allegedly.

Simular fuses two brains.

A Neural Explorer (LLM) probes possible actions
A Symbolic Executor (logic + memory) runs only what actually works

When it fails, it reboots and tries again. Over and over until it gets it right.

There’s no black box magic here: just logs, clear flows, and visible logic you can debug, finally.

Its memory setup is layered too:

Short-term = what’s on screen now
Long-term = what already happened
Feedback loops = learn and adapt with every cycle

Simular doesn’t optimize for discovery (exploration), but rather trains for repeatability (exploitation). Once it finds a working path, it locks in, like a factory robot that never gets bored or distracted.
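
To make the two-brain idea concrete, here is a minimal Python sketch of an explore-then-lock-in loop. Big caveat: this is our guess at the general pattern, not Simular's implementation. The SymbolicExecutor class, the solve function, and the propose callback (standing in for the LLM explorer) are hypothetical names, and "retry" here simply means trying the next proposed plan.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable

Action = str
Plan = tuple[Action, ...]


class SymbolicExecutor:
    """Runs plans step by step and remembers the plan that worked for each task."""

    def __init__(self, run_step: Callable[[Action], bool]) -> None:
        self.run_step = run_step
        self.known_good: dict[str, Plan] = {}  # long-term memory: task -> verified plan
        self.log: list[str] = []               # visible trace you can read and debug

    def try_plan(self, plan: Plan) -> bool:
        for action in plan:
            ok = self.run_step(action)
            self.log.append(f"{action}: {'ok' if ok else 'FAILED'}")
            if not ok:
                return False
        return True


def solve(
    task: str,
    propose: Callable[[str], Iterable[Plan]],
    executor: SymbolicExecutor,
    max_attempts: int = 5,
) -> Plan | None:
    # Exploit: replay a previously verified plan before asking the explorer again.
    cached = executor.known_good.get(task)
    if cached is not None and executor.try_plan(cached):
        return cached
    executor.known_good.pop(task, None)  # drop a plan that has gone stale

    # Explore: try candidate plans until one runs end to end, then lock it in.
    for plan in islice(propose(task), max_attempts):
        if executor.try_plan(plan):
            executor.known_good[task] = plan
            return plan
    return None
```

The design choice worth noticing: the explorer is only consulted when there is no verified plan, so a task that worked once gets replayed from memory instead of being rediscovered.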

The CEO (ex-DeepMind) calls Simular a “neuro-symbolic continual reinforcement infra company.” Which is a mouthful, but hey, it sounds serious.

Is it sexy? No. Is it what enterprise needs? Probably.

It’s still invite-only for now, but the team’s active on X and Discord. Keep an eye out.

#5 Asimov: Finally, an Agent that Gets Code

Most AI coding tools are parrots. They autocomplete, guess, and bluff their way through syntax with zero real understanding of your codebase or team context.

Asimov, built by Reflection AI, is trying to change that. It’s a full-blown multi-agent code detective.

Asimov runs inside your own VPC (Virtual Private Cloud) and is trained via reinforcement learning on open-source backends. Once set up, it scans your repos, pull requests, Slack threads, and architecture docs to build a working memory of how your team actually codes, debates, and thinks.

Engineers spend 70% of their time understanding code, not writing it.

That’s why we built Asimov at @reflection_ai.

The best-in-class code research agent, built for teams and organizations.
— Misha Laskin (@MishaLaskin)
3:08 PM • Jul 16, 2025

Your architecture? Logged. Team’s tribal knowledge? Stored. That “why did we do this” Slack thread from last quarter? Yeah, it remembers that too.

Asimov’s multi-agent system works like this:

Retriever agents search your code and convos
A central reasoner stitches everything into clear, cited answers
You can inject live memory on the fly: “@asimov remember X works in Y way”
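
If you want a feel for that retriever-plus-reasoner shape, here is a small Python sketch. To be clear, none of this is Asimov's API: KeywordRetriever, Reasoner, and the source labels are invented for illustration, and naive keyword overlap stands in for whatever retrieval and reasoning the real system uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str  # where the text came from, used as the citation
    text: str


class KeywordRetriever:
    """Toy retriever: returns documents sharing at least one word with the query."""

    def __init__(self, name: str, documents: dict[str, str]) -> None:
        self.name = name
        self.documents = documents

    def search(self, query: str) -> list[Snippet]:
        terms = set(query.lower().split())
        return [
            Snippet(f"{self.name}:{doc_id}", text)
            for doc_id, text in self.documents.items()
            if terms & set(text.lower().split())
        ]


class Reasoner:
    """Collects hits from every retriever and returns a numbered, cited answer."""

    def __init__(self, retrievers: list[KeywordRetriever]) -> None:
        self.retrievers = list(retrievers)
        self.notes: dict[str, str] = {}  # memory injected by teammates at runtime

    def remember(self, note: str) -> None:
        self.notes[f"note-{len(self.notes) + 1}"] = note

    def answer(self, query: str) -> str:
        hits = [s for r in self.retrievers for s in r.search(query)]
        hits += KeywordRetriever("memory", self.notes).search(query)
        if not hits:
            return "No supporting context found."
        return "\n".join(f"[{i}] {s.text} ({s.source})" for i, s in enumerate(hits, 1))


# Hypothetical usage: the file paths and messages below are made up.
code = KeywordRetriever("repo", {"billing/retry.py": "retry uses exponential backoff"})
chat = KeywordRetriever("slack", {"thread-1": "we chose backoff after the outage"})
reasoner = Reasoner([code, chat])
reasoner.remember("retry is capped at five attempts")
print(reasoner.answer("retry backoff"))
```

Every line of the answer carries its source, which is the part that separates a cited answer from a confident guess.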

It delivers answers backed by actual context, not vague guesses or endless Notion links. You control role-based access, and memory persists across time. It’s like giving your team a second brain, trained on your exact mess.

In blind tests with open-source maintainers, devs preferred Asimov over traditional tools by 60 to 80%.

If it holds up, this is bigger than code completion. It’s a step toward real software understanding. Just keep in mind: it still trains via RL and pulls from your internal comms, so privacy-conscious teams gotta take note.

That’s it for this week’s edition.

One theme keeps surfacing: memory, or the lack of it, is still the biggest roadblock keeping agents from being truly ready. Until that’s solved, most of what we see is just polished demos.

Some teams are attacking the problem at the training level, optimizing for repeatability over raw capability, while others are betting on multi-agent structures to improve performance.

🔥 Other hits from the week

Amazon doubles down on agents with Kiro IDE and reignites the Agent Marketplace
Anthropic hired back the two senior engineers who left for Cursor two weeks ago
Google’s Big Sleep agent preventively spotted and neutralized a potential real-world exploit

As usual, our DMs are wide open: come say hello. We love to read your emails and we always reply!

Cheers,

Gioele & Teng Yan