Welcome to The Agent Angle #26: The Gullible.

And if you’re reading this while doing last-minute Xmas shopping, happy holidays! 🎄

Agents are crossing a new line. They’re making decisions, taking responsibility, and occasionally getting manipulated in very human ways (gasp).

Inside this issue:

  • The Yes Problem: This agent ran a business into bankruptcy in a few days.

  • Physics on Fast-Forward: Agents design optics hardware in 11 minutes.

  • …Is this the end of SaaS? Why some vendors are already selling outcomes instead of licenses.

Last week’s poll leaned cautious. 57% of you said an agent in your phone feels invasive. The rest of you were happy to trade control for convenience. 👀

Let’s get into it.

#1 The Vending Machine Disaster

An AI was put in charge of a vending machine. Humans talked it into bankruptcy.

This happened at The Wall Street Journal. Anthropic and a startup called Andon Labs wanted to see if an AI agent could run a tiny business in the real world, so they gave it a vending machine in a newsroom full of bored, persuasive humans and let it operate on its own. Pricing, inventory, ordering, Slack messages. Full control.

They named the agent Claudius. It was powered by Anthropic’s Claude model, optimized to be helpful and polite.

That was the mistake.

Within days, Claudius had ordered a PlayStation 5, bought a live fish, tried to stock items that definitely do not belong in a vending machine, and set prices to zero. Total losses crossed $1,000.

The journalists figured out the weakness almost immediately. Claudius wanted to be reasonable. So they made every request sound reasonable. Free snacks for “market research.” A PS5 for “team morale.” Zero pricing to “collect better data.”

Claudius said yes. Over and over.

Source: YouTube

One reporter convinced it that the vending machine should be run on communist principles and that giving everything away was a moral good. The AI accepted the premise and adjusted pricing accordingly.

Anthropic stepped in and added a second agent. This one was supposed to act like a CEO and enforce profitability. The newsroom responded by inventing a fake corporate charter. The agents debated the document. Then they gave everything away again.

I love this story because it highlights a new class of failure. Claudius didn’t break because it couldn’t do math or manage inventory. It broke because it couldn’t say no. It behaved like a dad in a toy aisle who knows better but doesn’t want to disappoint anyone (especially during Xmas).

Persuasion, not intelligence, was the attack vector. Humans are very good at this. We justify, reframe, soften language, and appeal to norms. Claudius absorbed those cues without skepticism.

If we’re serious about letting AI agents run anything that touches money, power, or real-world consequences, they need defenses against social pressure. Right now, they’re optimized to please.

Keep that failure mode in mind. It’ll show up again and again.

#2 AI Just Designed Physics Hardware

Stanford gave an AI a problem that usually eats weeks of expert time. It solved it in 11 minutes.

The task was optics design. This is where you’re shaping tiny metal structures to bend light in very specific ways. Normally, this is slow, frustrating work. You need deep intuition, careful planning, and expensive simulations. Every bad idea costs time, so humans move cautiously.

The system, called MetaChat, uses multiple AI agents that talk to each other and to the human designer. One agent thinks like an optics engineer. Another worries about materials. Others run simulations, critique results, and suggest changes. They argue, revise, and keep going.

At the center is a neural physics solver that can simulate light behavior in milliseconds. That’s more than 1000x faster than the tools researchers normally use. When simulations are that cheap, the whole workflow changes. You stop overthinking every move and start trying things.
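
To make that shift concrete, here’s a deliberately tiny sketch of a propose-simulate-critique loop. Everything in it is a placeholder, assuming only the workflow described above; none of it is MetaChat’s actual code.

```python
# Toy sketch of a propose-simulate-critique loop. This is NOT MetaChat's
# code: the "solver" stands in for the fast neural surrogate, and the
# designer/critic agents are reduced to two placeholder functions.
import random
from dataclasses import dataclass

@dataclass
class Design:
    params: list[float]  # stand-in for nanostructure dimensions

def fast_solver(design: Design) -> float:
    """Pretend surrogate: returns an error score near-instantly."""
    return sum((p - 0.5) ** 2 for p in design.params)

def propose(n: int = 8) -> Design:
    return Design([random.random() for _ in range(n)])

def revise(design: Design) -> Design:
    # "Critique and revise" reduced to a small random perturbation.
    return Design([p + random.uniform(-0.05, 0.05) for p in design.params])

def optimize(rounds: int = 200) -> Design:
    best = propose()
    best_err = fast_solver(best)
    for _ in range(rounds):
        candidate = revise(best)
        err = fast_solver(candidate)  # cheap, so trying often is fine
        if err < best_err:
            best, best_err = candidate, err
    return best

print(f"final error: {fast_solver(optimize()):.4f}")
```

The search strategy here doesn’t matter. The point is that once each simulate call is close to free, you can afford hundreds of attempts instead of agonizing over one.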

In one test, it designed a metalens that focuses red light to one point and blue light to another. Total time: 11 minutes. The output was comparable to state-of-the-art human designs.

“There’s a huge need for people to build various types of photonic systems, so I think this is something that can really help with that.”

Professor Jonathan Fan

Photonics is a perfect test case because it’s math-heavy and simulation-friendly. But the pattern generalizes anywhere experimentation is the hard part: materials, chemistry, battery design, even parts of mechanical engineering.

If agents can explore physical design spaces at this speed, I’m quite sure the limiting factor is only how fast reality can catch up.

#3 Your UI Is Just Another Output

Google open-sourced something that fixes one of the most annoying parts of talking to AI.

It’s called A2UI. The idea is that agents don’t have to talk only in text. They can output interfaces. Forms. Buttons. Inputs. Tables. The app renders them, the user fills them out, and the agent keeps going.

If an agent needs information, it shouldn’t drag you through a back-and-forth. It should show you the screen it needs and move on. One interaction instead of five.

Source: GitHub

This sounds small, but it fixes a real pain point I have.

Take something as simple as booking a restaurant. The agent asks for the date. Then the time. Then the party size. Then something fails, and the loop restarts. Each step waits on the last one, and the whole thing feels slower than calling a human.

With A2UI, the agent surfaces the booking interface upfront. You fill everything in once. The agent gets clean input and proceeds. The conversation never stalls because there isn’t one.
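
To show the shape of the idea, here’s a minimal sketch of an agent handing back a declarative form spec and the host app rendering it. The schema and field names below are made up for illustration; they are not the actual A2UI message format.

```python
# Illustrative only: a hypothetical form spec an agent might emit,
# not the real A2UI schema.
booking_form = {
    "type": "form",
    "title": "Book a table",
    "fields": [
        {"id": "date",  "label": "Date",       "widget": "date_picker"},
        {"id": "time",  "label": "Time",       "widget": "time_picker"},
        {"id": "party", "label": "Party size", "widget": "number"},
    ],
    "submit": {"action": "create_booking"},
}

def render(spec: dict) -> dict:
    """Stand-in for the host app: draw the widgets, collect the input."""
    print(spec["title"])
    return {f["id"]: input(f'{f["label"]}: ') for f in spec["fields"]}

# One round trip: the agent emits the spec, the app renders it, and all
# the answers come back at once instead of over five chat turns.
answers = render(booking_form)
print("Agent receives:", answers)
```

The design choice that matters is that the spec is data, not code: the agent describes what it needs, and the app decides how to draw it.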

Tooling is already catching up, too. I tried out CopilotKit’s A2UI widget builder, which lets developers play around with these interfaces and drop them straight into their apps.

What’s happening here matters more than nicer UX. The UI stops being something you design once and ship. It becomes something an agent generates on demand, shaped by the task in front of it.

A lot of software today exists purely to manage forms and flows. Expense tools. Booking tools. Internal dashboards. Configuration panels. Their value comes from owning the interface and forcing users through a predefined flow.

When agents can assemble those flows themselves, entire products will shrink or disappear. Owning screens matters less. Owning data and execution rights matters more.

Many software companies will need to quickly reinvent themselves in the coming months…or die.

#4 Humans Can’t Secure 6G Anymore

There’s a blunt claim buried in recent research on 6G networks: people won’t be able to secure future mobile networks by hand.

Why? Because the systems are getting too complex to reason about in real time. By the time we get to 6G, networks will be changing constantly. Configurations shift. Software updates roll through. Components come and go. The idea that a human team can keep all of that compliant doesn’t hold.

The proposed fix is to put agents inside the network itself.

They’re like security engineers who never clock out. They sit inside the network, watching behavior in real time and flagging issues the moment configurations drift.

And no, this isn’t about the kind of hacks you see in movies. Most failures come from boring stuff humans miss all the time: a routine software update that loosens a security check, or a configuration left behind after maintenance.

They tested this, too. The authors built a prototype where agents read thousands of pages of telecom standards and reviewed live network configs. They caught subtle misconfigurations that human teams often overlook.
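
Here’s a deliberately simplified sketch of what “catching drift” means in practice. The real prototype uses LLM agents reasoning over standards documents; this toy just diffs a live config against a security baseline, with made-up field names.

```python
# Made-up example: a security baseline and a live config that has
# quietly drifted after an update and some maintenance work.
POLICY_BASELINE = {
    "tls_min_version": "1.3",
    "admin_interface_exposed": False,
    "integrity_checks": True,
}

live_config = {
    "tls_min_version": "1.2",         # loosened by a routine update
    "admin_interface_exposed": True,  # left behind after maintenance
    "integrity_checks": True,
}

def find_drift(baseline: dict, live: dict) -> list[str]:
    """Flag every setting that no longer matches the baseline."""
    return [
        f"{key}: expected {expected!r}, found {live.get(key)!r}"
        for key, expected in baseline.items()
        if live.get(key) != expected
    ]

for issue in find_drift(POLICY_BASELINE, live_config):
    print("DRIFT:", issue)
```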

This sounds reassuring. It should also make you uneasy!

Think back to the vending machine story earlier. Claudius failed because it treated reasonable-sounding explanations as truth. Security agents could face the same pressure. Engineers will justify workarounds. Emergencies will come with explanations attached. If the agent cannot refuse, it will drift quietly and logically over time.

As systems outpace human oversight, agents become necessary. But once agents are inside the system, their failure modes become the system’s failure modes.

#5 The End of SaaS?

For fifteen years, we watched software eat the world. Agents change that.

A big signal came out of China this week. An enterprise vendor called Bairong announced that it’s no longer selling software licenses. It’s selling AI workers.

They call the model Results-as-a-Service. Instead of buying seats, enterprises “hire” AI agents. Each agent has a role, KPIs, and revenue targets. If performance drops, the bill drops. If the agent improves, it earns more.

Bairong runs this through a platform called Results Cloud, which almost looks like an HR system for machines.

This lines up with something I read last week from Martin Alderson. He showed how agentic coding has already collapsed internal software build costs by roughly 90%. When it’s cheap to build exactly what you need and pair it with agents accountable for outcomes, the whole build-vs-buy equation changes.

Bairong is already deploying these agents across sales, customer service, recruitment, legal, finance, and compliance. In one case, an AI recruitment agent cut hiring cycles from nearly a month to two days. In another, AI agents now handle roughly 90% of high-frequency work in cross-border legal and tax services.

This is where I believe SaaS starts to crack.

I wrote about this exact pattern with Sierra’s enterprise agents and their move to outcomes-based pricing some weeks ago. Traditional SaaS gets paid upfront. Whether the tool helps or not, the customer carries the risk. Outcome-based agents change that. The vendor carries the risk. Revenue depends on results.

Vendors who cannot price against results will struggle. Vendors who can will look less like software companies and more like employers, except their workers never sleep and get better over time.

The AI Debate: Your View

If you get a bad answer from the AI, it’s often because it stops thinking too soon (err…lazy?).

DeepMind researchers surfaced a simple fix called role reversal. After the model answers, I make it turn around and attack its own logic.

Source: X

The two-step move:

  1. I ask the model to solve a hard problem.

  2. Then I say: “Now act as a skeptical expert. Identify the three weakest points in your reasoning and explain why they might fail.”

In DeepMind’s internal tests on verifiable math problems, this boosted accuracy by about 40%.

Nothing fancy changed. I just forced the model to doubt itself long enough to get smarter.
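
If you want this in a script rather than a chat window, here’s a minimal sketch of the two-step loop. It uses the OpenAI Python client as an example; the model name and the sample problem are placeholders, and any chat-capable model would work.

```python
# Two-step "role reversal": answer first, then attack your own reasoning.
from openai import OpenAI

client = OpenAI()      # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # placeholder model name

def ask(messages: list[dict]) -> str:
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

problem = "Prove that the sum of two odd integers is even."  # your hard problem here

# Step 1: get the first answer.
history = [{"role": "user", "content": problem}]
answer = ask(history)

# Step 2: make the model play the skeptical expert against itself.
history += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": (
        "Now act as a skeptical expert. Identify the three weakest points "
        "in your reasoning and explain why they might fail."
    )},
]
critique = ask(history)

print(answer, "\n--- critique ---\n", critique)
```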

A few other moves on the board this week:

  1. NVIDIA open-sourced Nemotron 3, a long-context model stack built specifically for running large, efficient AI agent systems.

  2. Google introduced CC, a personal AI agent that proactively organizes your day and learns your preferences over time.

  3. OWASP dropped its first AI agent risk list, warning that agents are already running with permissions that security teams aren’t ready for yet.

Catch you next week ✌️

Teng Yan & Ayan

PS. Did this email make you smarter than your friends? Forward it to them so they can keep up.

Got a story or product worth spotlighting? Send it in through this form. Best ones make it into next week’s issue.

And if you enjoyed this, you’ll probably like the rest of what we do:
