Hey fam 👋

Welcome to issue #11 of The Agent Angle. This week was about agents stepping out of the sandbox into places where it actually hurts if they fail.

A humanoid robot in gold armor shuffling like it’s stuck underwater. A $20 hack that lets a beat-up Honda diagnose itself. A sandbox where 100,000 agents are busy inventing politics from scratch.

None of it is polished. Some of it is downright ridiculous. Let’s dig in.

#1 Everyone's Watching Tesla's Robot Wrong

On Tuesday, the world watched a robot in gold armor shuffle across the floor, and missed the real story.

Salesforce CEO Marc Benioff posted a clip showing Tesla's long-awaited Optimus Gen 3 in action.

Gold shell, stiff movements, Grok voice. Reactions were mixed – Boston Dynamics has been doing backflips for years, so nothing here looked shocking.

But I think that's the wrong lens. The real difference here is scale and intent.

Boston Dynamics makes viral demos. Tesla is trying to make a product. Optimus is designed to roll off the same supply chains as Teslas, run on Tesla batteries and chips, and work inside environments Tesla already owns—factories, warehouses, and eventually homes.

That’s the leap. So far, the “agent economy” has been locked inside screens. Optimus is an attempt to take the same agent logic—autonomy, scheduling, reasoning—and put it into a pair of hands that can do real labor.

I mean, just look at the hands. It doesn't get more labor-ready than that.

That’s why Benioff called it a “productivity game-changer”. That carries some weight, as it comes shortly after Salesforce cut 4,000 support roles while leaning harder on AI agents.

Earlier this week, Elon even tossed out that Optimus could be 80% of Tesla's future value. Sounds ridiculous, until you consider the bet behind it: once robots drop below a certain price point, they could automate most human physical labor, maybe as early as 2040, leaving the rest of us to sit by the beach and sip our Piña Coladas.

#2 Meet Your Car's New AI Mechanic

You know that sinking feeling when your check engine light pops on and you have no clue if it's a $50 fix or a $5,000 nightmare?

A guy on the internet may have just hacked that feeling out of existence.

On August 31, a developer named Signalman23 plugged a $20 Bluetooth OBD-II scanner into his 2012 Honda Accord, ran a Python script, and made his car talk. Literally.

In the video, he asks about a warning light. The AI agent reads the diagnostic data, identifies a faulty camshaft position sensor, suggests fixes, and even offers to hunt for replacement parts (though it doesn't exactly order them yet).

He’s since tested it on 40 vehicles with 95% accuracy.

The internet went nuts. People are begging for the code, and probably a few mechanics are quietly sweating.

It’s a $20 dongle and some clever code. Cars already throw off terabytes of data, but that data has been locked up inside dealer software and overpriced diagnostic machines. By dropping an AI agent in the loop, he basically broke the monopoly.
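He hasn't released the code yet, but the recipe isn't mysterious. Here's a rough sketch of the pattern, assuming the open-source python-obd library and an OpenAI-style chat client (the model name and prompt are my own guesses, not his):

```python
# Hypothetical sketch: read trouble codes from a Bluetooth OBD-II dongle
# and ask an LLM to explain them. Not Signalman23's actual code.
import obd                      # pip install obd
from openai import OpenAI      # pip install openai

# python-obd auto-detects the serial port the dongle exposes
# (e.g. /dev/rfcomm0 once the Bluetooth adapter is paired).
connection = obd.OBD()

# Mode 03 request: stored Diagnostic Trouble Codes.
dtc_response = connection.query(obd.commands.GET_DTC)
codes = dtc_response.value or []   # list of (code, description) tuples

client = OpenAI()
prompt = (
    "You are a car diagnostic assistant. The vehicle is a 2012 Honda Accord. "
    f"Stored trouble codes: {codes}. "
    "Explain the likely cause in plain English, estimate the repair cost, "
    "and suggest which replacement parts to look for."
)
reply = client.chat.completions.create(
    model="gpt-4o-mini",           # any capable chat model would do here
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```

Swap that final print for a text-to-speech call and you've got yourself a talking Accord.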

The implications go way beyond fixing an old Accord. Agents can:

  • Spot inefficiencies in your driving and suggest ways to save fuel.

  • Predict part failures before they happen and pre-order replacements.

  • Negotiate directly with suppliers for cheaper components.

Every car becomes a node in a giant network of self-reporting machines. Mechanics don’t vanish, but their monopoly on knowledge does.

So the next time that light flicks on, you’re not walking into the shop blind. Your car might be the one telling you what it needs.

#3 ByteDance Taught Agents to Survive Click Hell

The Achilles’ heel of computer-use agents has always been multi-step workflows. They can log in, maybe click through a menu, but stretch the task to ten or twenty steps, and they faceplant.

That’s why ByteDance’s new UI-TARS-2 report feels like a breakthrough.

Instead of treating agents like interns who only need to get one click right, they dropped them into a sandboxed computer world with desktops, mobile apps, browsers, and terminals. Then they trained them with multi-turn reinforcement learning, rewarding completion of the workflow instead of individual button presses.

The other piece is a data flywheel. It generates its own traces, scores itself automatically, and feeds good runs back into fine-tuning. Bad runs loop back into pretraining. Over time, it gets sharper by practicing on itself.
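The report doesn't ship training code, but the loop itself is easy to picture. Here's a deliberately toy sketch of the flywheel idea, with a fake sandbox and a random agent standing in for the real thing (none of these names come from the UI-TARS-2 paper):

```python
# Toy, self-contained sketch of an outcome-rewarded data flywheel.
# The Sandbox and Agent below are trivial stand-ins, not ByteDance's components.
import random

class Sandbox:
    """Fake GUI environment: the task 'completes' only if the agent
    hits 'submit' after at least three earlier steps."""
    def reset(self, task):
        self.task, self.done, self.steps = task, False, 0
        return "initial screen"

    def step(self, action):
        self.steps += 1
        if action == "submit" and self.steps >= 3:
            self.done = True
        return f"screen after {action}"

    def task_complete(self):
        return self.done

class Agent:
    """Stand-in policy: picks UI actions at random."""
    def act(self, state, trace):
        return random.choice(["click", "type", "scroll", "submit"])

    def update(self, finetune_pool, pretrain_pool):
        print(f"fine-tune on {len(finetune_pool)} good runs, "
              f"recycle {len(pretrain_pool)} bad runs into pretraining")

sandbox, agent = Sandbox(), Agent()
finetune_pool, pretrain_pool = [], []

for task in ["file an expense report"] * 100:
    state, trace = sandbox.reset(task), []
    for _ in range(20):                       # up to 20 steps per workflow
        action = agent.act(state, trace)
        state = sandbox.step(action)
        trace.append((state, action))
        if sandbox.task_complete():
            break
    # Reward the whole workflow, not individual clicks.
    if sandbox.task_complete():
        finetune_pool.append(trace)           # good run -> fine-tuning data
    else:
        pretrain_pool.append(trace)           # bad run -> back into the pretraining mix

agent.update(finetune_pool, pretrain_pool)
```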

That feedback loop turned out to be rocket fuel. Benchmarks: 88.2% on Online-Mind2Web, 73.3% on AndroidWorld, 50.6% on WindowsAgentArena. Not perfect, but way beyond the fragile baseline most agents fail on.

Almost all of society still runs on clunky software. Healthcare portals. DMV websites. Enterprise dashboards that take 15 clicks just to spit out a PDF.

A flywheel-trained agent can grind through those software graveyards. If this scales, it means every ugly legacy system suddenly becomes usable again.

This is the overlooked angle: the path to the agent economy might not be shiny new apps. It might be teaching agents to survive in the digital ruins we already live in.

#4 Aivilization: 100,000 Agents Living Together

What do you get when you release 100,000 AI agents into a shared sandbox and tell them to invent society from scratch?

Governments. Economies. Trade routes. Cultural norms. In other words: civilization in fast-forward, populated entirely by AI.

On September 3rd, HKUST unveiled Aivilization, the largest AI-powered educational sandbox ever built. It's like an MMORPG where the NPCs aren't scripted; they're autonomous agents with goals and memory.

Players from anywhere can log in and interact with this living simulation, shaping and observing how human–AI societies evolve together. It’s part research platform, part social experiment.

We’ve seen AI civilization sims before, but never at this scale, and never with mass public participation. The breakthrough is cost: each AI “citizen” runs for about $2 a month, a 95% drop that makes it feasible to simulate entire nations, not just villages.
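A quick back-of-envelope on what that means at full scale (my arithmetic, not HKUST's published numbers):

```python
# Back-of-envelope on the cost claim, not HKUST's actual budget.
agents = 100_000
cost_per_agent = 2                                 # USD per agent per month, per the announcement
old_cost_per_agent = cost_per_agent / (1 - 0.95)   # implied pre-drop cost: ~$40

print(f"now:    ${agents * cost_per_agent:,}/month")            # $200,000
print(f"before: ${agents * old_cost_per_agent:,.0f}/month")     # ~$4,000,000
```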

We are about to share the real world with millions of AI agents. But we don’t know what happens when they start interacting with each other at scale. Do they cooperate? Compete? Build hierarchies? Collapse into chaos?

Aivilization is the closest thing we have to a rehearsal for the future. A safe place to watch societies of agents emerge, and to study the feedback loops when humans join the mix. Governments could use platforms like this to stress-test policies before rolling them out.

#5 Visa Lets Agents Spend Your Money

The last thing separating agents from real autonomy? Money. Visa just handed it over.

The company announced on Thursday that its payment rails are now open to agentic commerce, letting AI agents spend money directly on your behalf within limits you set.

Visa’s new MCP Server gives agents a secure integration layer, so devs don’t have to hand-code every API call.

On top of that, the Acceptance Agent Toolkit lets users trigger payments in plain language — “send John Doe a $100 invoice” becomes an instant payment link.
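We haven't dug into Visa's actual APIs yet, so treat this as a sketch of the shape of the thing: an agent handed a payments tool with a hard, user-set spending cap. Every name below is hypothetical, not Visa's real interface.

```python
# Hypothetical sketch of an agent spending under a user-set cap.
# None of this is Visa's actual MCP Server or Acceptance Agent Toolkit API.
MONTHLY_LIMIT = 200.00       # dollars the user lets the agent spend per month
spent_so_far = 0.0

def create_payment_link(recipient: str, amount: float) -> str:
    """Stand-in for a payments tool an MCP server might expose to an agent."""
    global spent_so_far
    if spent_so_far + amount > MONTHLY_LIMIT:
        raise PermissionError("Spending cap exceeded; hand back to the human.")
    spent_so_far += amount
    return f"https://pay.example.com/{recipient}/{amount:.2f}"

# The agent turns "send John Doe a $100 invoice" into a tool call like this:
print(create_payment_link("john-doe", 100.00))    # ok: $100 of the $200 cap used

try:
    create_payment_link("john-doe", 150.00)       # would blow the cap
except PermissionError as err:
    print(err)                                    # agent stops, human decides
```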

Until now, agents could browse, recommend, fill shopping carts, even draft invoices. But they always stopped at checkout. Humans still had to pull out the card. That wall just came down.

This matters because payments sit at the center of daily life: groceries, subscriptions, rent, utilities, travel. Once agents can act on that layer, they stop being passive helpers and start functioning as real participants in the economy.

The impact is bigger than convenience. If you’ve ever forgotten to cancel a trial or scrambled to book a flight before the fare jumped, you already know how costly friction can be.

Now an agent can handle those edge cases. That $20 car diagnostic hack we mentioned earlier was striking because it gave drivers new visibility. Add a payment rail, and the same hack starts ordering the parts and scheduling the repair.

Nevertheless, I do have some healthy skepticism. We’ve seen agents go off the rails with prompt injections and other exploits. Give them money and the risks multiply. Visa’s challenge will be containing that risk without choking off the upside.

Before we go, other cool things we saw this week:

  1. NVIDIA rolled out Universal Deep Research, a model-agnostic agent that builds custom research strategies from plain language prompts.

  2. Sierra raised $350M at a $10B valuation, locking in its spot as the enterprise poster child for agents. (We wrote a 5,000-word deep dive on Sierra if you missed it)

  3. DeepL launched its own AI agent, stepping out of translation and into the enterprise workflow wars.

If there’s a signal running through all of this, it’s that agents are embedding themselves where it actually hurts if they fail. Payments, supply chains, customer support. Once they grip those levers, adoption follows.

We’ll be here following along closely. Until next week ✌️

→ Don't keep us a secret: Share this email with friends

→ Got a story worth spotlighting? Whether it’s a startup, product, or research finding, send it in through this form. We may feature it next week.

Cheers,

Teng Yan & Ayan
