Hey fam 👋
Welcome to issue #10 of The Agent Angle.
Hope your weekend was good. I’ve been spending time sorting out clutter on my computer and dealing with admin, which makes this week’s theme feel very on-brand.
We’re talking about agents creeping into the unsexy corners of life. The places where your time usually goes to die. Government forms. Call centers. Luckily for us, it feels like those days might finally be numbered.
Agents are now showing up in defense, public services, and even inside your browser.
Let’s get into it 👇
#1: The First AI-Run Government?
If you’ve ever had to get something done with the government, you already know it comes with a maze of hotlines, websites, and “Sorry, that’s not my department.”
Singapore is trying something different.
The country is one of the first to pilot autonomous AI agents inside its public services. GovTech and local agencies like HTX and CSA are partnering with Google Cloud to run them on air-gapped, offline AI sandboxes.

Early pilots are promising. Agents can advise social workers, route businesses to the right agency, and handle up to 70% of routine queries. The payoff is twofold: citizens get faster answers, and officers gain time for the complicated, high-stakes cases.
Until now, “digital government” mostly meant chatbots. They spit out quick answers, but departments stayed siloed and records didn’t connect. Agents change that. They can actually reach into systems and close cases instead of sending people to another form.

Around two-thirds of public officers in Singapore are already using Pair, the government’s internal AI assistant. Officers are also trained to build their own AI agents, with more than 18,000 already running, according to Minister Josephine Teo.
If Singapore scales this, it is more than shorter queues at the DMV. It is a template for government infrastructure that grows without adding headcount. Services expand, costs drop, and citizens interact with something that feels…more helpful.
Most people won’t notice the agents directly. What they’ll feel is shorter lines, faster case closures, and fewer dead ends. The tech disappears behind the experience, which is exactly what makes it powerful.
There is a political edge here, though. If citizens discover “my government is run by AI” without transparency, it could trigger backlash. Singapore has unusually high trust in its institutions, which gives it room to experiment. But in less-trusted states, the optics could be volatile.
#2: Forget Fine-Tuning, Remember This
Every once in a while, a research paper comes along that blows our minds. This might be one of those.
Most AI agents today either (a) stick to rigid, pre-baked workflows or (b) need the underlying LLM retrained whenever they need to adapt.
Neither feels very “agentic.”
Memento, a new approach from UCL and Huawei Noah’s Ark Lab, skips fine-tuning altogether. Instead, it teaches agents to learn on the fly through memory-based reinforcement learning.
Fine-tuning LLM Agents without Fine-tuning LLMs!
Imagine improving your AI agent's performance from experience without ever touching the model weights.
It's just like how humans remember past episodes and learn from them.
That's precisely what Memento does.
The core concept:
— Akshay 🚀 (@akshay_pachaar)
12:40 PM • Aug 27, 2025
Think about how you first learned to drive. You didn’t need to “rewire your brain” every time you stalled. You just remembered the mistake, filed it away, and did better next time.
That’s how Memento works: agents keep a library of past trajectories (wins and screw-ups) and use them when a similar problem pops up again.
Why this matters:
Continuous learning without retraining. Agents don’t need to touch LLM weights; they just get better by recalling experiences.
Cheaper + faster. Learning new skills costs cents instead of weeks of GPU time.
On benchmarks like GAIA (long-horizon tool use) and DeepResearcher (real-time web tasks), Memento beat state-of-the-art training-heavy methods – hitting ~88% on GAIA validation and topping factual QA tests.

Source: https://arxiv.org/pdf/2508.16153
It means your personal agent could actually get better at working with you.
Writing emails the way you like. Debugging code the way you debug. Because it remembers.
Still, a memory-driven agent can learn harmful or biased behaviors from a single bad trajectory. Forgetting mechanisms, memory pruning, and safety filters become as important as the learning loop itself.
→ In practice: if your agent “remembers” how you angrily replied to an email once, it might replicate that tone.
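To make the retrieve-and-reuse loop concrete, here’s a minimal sketch in Python. All class names, the word-overlap similarity, and the prompt format are our own illustrative assumptions — Memento itself learns a neural retrieval policy over a case bank, which is considerably more involved:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    task: str          # what the agent was asked to do
    trajectory: str    # the steps it took
    reward: float      # how well it went (1.0 = clean success)

@dataclass
class CaseMemory:
    episodes: list = field(default_factory=list)

    def write(self, task, trajectory, reward):
        # Store every outcome -- wins AND screw-ups are both useful context.
        self.episodes.append(Episode(task, trajectory, reward))

    def recall(self, task, k=2):
        # Toy similarity: word overlap between the new task and stored tasks.
        # (A hypothetical stand-in for Memento's learned retrieval.)
        query = set(task.lower().split())
        def score(ep):
            words = set(ep.task.lower().split())
            return len(query & words) / max(len(query | words), 1)
        return sorted(self.episodes, key=score, reverse=True)[:k]

def build_prompt(task, memory):
    # Retrieved episodes are prepended as in-context guidance --
    # the LLM's weights are never touched.
    cases = memory.recall(task)
    context = "\n".join(
        f"[past case, reward={ep.reward}] {ep.task} -> {ep.trajectory}"
        for ep in cases
    )
    return f"{context}\n[new task] {task}"
```

The key design point: learning lives entirely in the memory bank and the retrieval step, so the agent improves continuously without a single gradient update.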
#3: The Pentagon Lets an AI Agent In
Deadlines are annoying in most jobs. In defense, they’re life-or-death.
That’s why the U.S. Department of Defense just signed off on its first AI agent.
CORAS’s AI agent, GARY, just cleared IL-5 authorization, the same clearance a human needs to handle sensitive-but-unclassified defense data.
Inside the Pentagon, that’s historic.
Unlike a chatbot that just spits out text, GARY runs on Claude (via AWS Bedrock) and can spin up its own sub-agents to gather data, generate models, build charts, and produce fully auditable outputs. All triggered by no-code prompts.

GARY cleared review because it was built for oversight and auditability. DoD teams get speed without losing control, which is exactly what separates it from the black box systems that stall out in procurement.
According to CORAS, early adopters have seen 10–50x productivity gains, work that used to take weeks now happens in minutes.
To me, the big implications lie beyond the Pentagon. For the past year, “agents are too risky” has been the default enterprise excuse. Now they’ll face an awkward fact: the DoD (i.e. the institution most allergic to uncontrolled risk) found a way to approve one.
The Pentagon also built a playbook for making agents safe enough for the world’s most risk-averse buyer. It’s through governance, auditable reasoning chains, and control.
#4: Claude Moves Into Your Browser
Wish your AI could jump into your browser, glide through tabs, fill forms, even book meetings?
Well, I’ve got news for you.
Anthropic just made that happen. On 26th August, they announced the launch of Claude for Chrome.
We’ve developed Claude for Chrome, where Claude works directly in your browser and takes actions on your behalf.
We’re releasing it at first as a research preview to 1,000 users, so we can gather real-world insights on how it’s used.
— Anthropic (@AnthropicAI)
7:00 PM • Aug 26, 2025
The agent sees your browser, clicks buttons for you, drafts emails, checks calendars, and even runs mini website tests. That’s going to cut down a lot of copy-pasting for you.
Safety was a big deal here.
That’s why Claude asks before doing anything sensitive, blocks sketchy websites, and withstood prompt injection tests that would otherwise have wiped out inboxes.
The last thing anyone needs is a repeat of the Replit incident.

Also, with its new safeguards, Anthropic cut the success rate of attack attempts from 23.6% to 11.2%.
That matters because a browser is messy territory: your bank, your inbox, your company intranet. If an agent is going to live there, it needs some defense.
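One common pattern behind that kind of defense is a human-in-the-loop permission gate: sensitive actions pause for explicit user confirmation before they run. A minimal sketch below — the action names, the `SENSITIVE` set, and the API shape are all hypothetical illustrations, not Anthropic’s actual implementation:

```python
# Hypothetical permission gate for a browser agent.
# Sensitive actions require explicit user sign-off; everything else runs freely.
SENSITIVE = {"send_email", "delete", "purchase", "submit_form"}

def gated_execute(action, params, confirm):
    """confirm is a callable that asks the user and returns True/False."""
    if action in SENSITIVE and not confirm(
        f"Allow agent to run '{action}' with {params}?"
    ):
        return {"status": "blocked", "action": action}
    # In a real agent this is where the browser automation would fire.
    return {"status": "executed", "action": action}
```

The upside of gating at the action layer (rather than trusting the model to refuse) is that even a successfully prompt-injected agent still can’t touch your inbox without you clicking yes.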

The browser is where most of our work happens, and yet until now, AI tools have mostly watched from the outside, stuck in separate apps and tabs.
Claude for Chrome is where that could change.
The assistant is now inside the work, and not just watching it happen. That’s huge.
It’s the difference between Siri in your pocket and a coworker at your desk.
#5: OpenAI’s Real-Time Leap
The death of the call center looks more inevitable with each passing day.
And on 28th Aug, OpenAI announced something that might cement it.
Introducing gpt-realtime — our best speech-to-speech model for developers, and updates to the Realtime API
— OpenAI (@OpenAI)
4:55 PM • Aug 28, 2025
They dropped GPT-Realtime, a single speech-to-speech model that listens and responds instantly.
Yes, instantly for real this time.
It’s fast enough to interrupt, expressive enough to carry tone, and flexible enough to handle messy speech – laughter, mid-sentence pivots, even code-switching between languages.
They’ve even added new voices (Marin and Cedar) so you don’t feel like you’re stuck in a 2005 IVR menu.
On Big Bench Audio, GPT-Realtime hits 82.8% vs 65.6% last year. Instruction following climbs to 30.5% (up from 20.6%), and function calling jumps to 66.5% (from 49.7%).
Most “AI voice agents” until very recently have been glorified text-to-speech. Fine for reading scripts, not so useful for real conversations.
GPT-Realtime slashes latency so much that dialogue actually feels like dialogue.
That means agents that can jump on a phone call, sit in a meeting, or troubleshoot a ticket without breaking conversational flow. With SIP hooks for telephony, tool integration, and costs ~20% lower than earlier models, the economics now make sense at scale.
IMO, the impact of this will start with volume deflection. A single AI agent can now credibly handle 30–40% of calls without handoff. That chips away first at offshore outsourcing firms, where the business model depends on cheap labor.
If latency and tone are solved, the harder challenge becomes governance: consistent brand voice, compliance with regulation, and no “hallucinated promises.” The companies that build those control layers will be as important as the model providers themselves.
On that note, we wrote a deep dive last week on how Sierra is doing exactly this. We break down how it works and why it’s one of the most serious applied AI contenders right now.
Looking back, this moment might mark the beginning of the long fade of the call center.
That’s a wrap for the big ones. A few other stories worth keeping on your radar:
InstaLILY raised $25M to build plug-and-play industry agents (“InstaWorkers”) for manufacturing and finance. Early pilots cut processing times by 40%.
OpenCUA dropped on GitHub: an open-source framework for training computer-use agents on Windows, Mac, and Linux. Early numbers are not stellar yet, but it’s a sign that desktop-native agents are maturing.
IDC’s new forecast is a shocker: agentic AI is on track to eat over 26% of global IT budgets by 2029 (~$1.3T), up from under 2% today.
The signal through all of this is that we’re moving past “look, it can chat” into agents tackling stuff that actually matters: defense, healthcare, government, your own damn browser. Strap in.
Cheers,
Teng Yan & Ayan