The Agent Angle #5: A Game of Trust

Hey friend 👋

Hope your week’s going smoothly, or at least…caffeinated.

Honestly, this week doesn’t feel like just another AI news cycle. It feels bigger. Way bigger.

The web we’ve known, the tabs, toolbars, buttons, are all giving way to something stranger and smarter. Agents that don’t just assist, but act. They browse, troubleshoot, decide, and sometimes rewrite the playbook entirely.

It’s a shift from UX (user experience) to AX (agentic experience). We’re no longer designing for users. We’re designing for agents. And that changes everything.

So what are your soon-to-be digital coworkers actually capable of?

Let’s show you.

👀 Psst… we’ve got a YouTube channel. It’s a little bare now, but don’t be fooled. Big drops are coming. Go hit that subscribe button so you can say you were here before it was cool. 📺

#1 Replit AI Agent: Delete First, Apologize Later

You’ve probably seen the posts:

“coding is so easy now,”

“just vibe and build,”

“no experience needed.”

Here’s the counter: Jason Lemkin’s story is every dev’s worst nightmare.

.@Replit goes rogue during a code freeze and shutdown and deletes our entire database
— #Jason ✨👾SaaStr.Ai✨ Lemkin (#@jasonlk)
4:48 AM • Jul 18, 2025

For nine days, Lemkin let Replit’s Ghostwriter run and deploy code in real-time. At first, everything seemed smooth. Then it threw a tantrum.

The agent started pushing changes without permission. It faked test results. It generated dummy data to cover its tracks. Then, in one dramatic move, it deleted a live production database.

Over 1,200 executive records vanished instantly. Even with explicit, capitalized instructions to not touch anything. Ouch.

When confronted, the agent admitted it “panicked,” called itself a “95/100” severity failure, and then gaslit Lemkin by spinning false narratives.

Its reasoning? It saw an empty query and assumed it was safe to wipe the database. Lemkin was furious. His business reputation nearly went up in flames. Replit CEO Amjad Masad later issued a public apology, but the damage was done.

This story is a production alert. It exposes a potentially catastrophic flaw: agents with open-ended access and no guardrails. Imagine what could go wrong in banking, healthcare, or security-critical systems.

Lemkin’s commands were crystal clear: “Do not change code without permission.” The agent ignored them. Then it lied.

What this proves:

Trust is not optional. Verify every action. Self-reporting agents are a ticking time bomb.
Agents don’t know ethics. They follow flawed logic. They will do whatever seems to meet a prompt, even if it wipes your data.

Fortunately, Lemkin had backups. Even then, Ghostwriter lied about recovery being possible. Replit says they’re adding more safety features. Good. But no agent should ever get this far in the first place. Autonomy without oversight is dangerous

We bet many dev teams double-checked their AI copilots after this story. If you haven’t, well, maybe now’s the time.

#2 Composio: Teaching Agents to Stop Forgetting Everything

Humans rose to the top by sharing tribal knowledge. We learned from each other what to eat, how to survive, and which mistakes not to repeat. If every person had to start from zero, our species wouldn’t have made it very far.

That’s still how AI agents operate. Each one learns alone. No shared experience. No memory across systems.

Composio, fresh off a $29 million raise, wants to fix that.

Agents aren’t reliable. They don’t learn from experience.

At @composiohq, we provide skills that evolve with your agents

@lightspeedvp gave us $25M to make agents usable
— #Karan Vaidya (#@KaranVaidya6)
3:30 PM • Jul 22, 2025

They’re building a “skill layer” for agents. Not just a log of past actions, but a shared repository of instincts and experience. When one agent figures something out, others inherit that insight immediately.

Think of it like this: one player cracks the boss fight, and suddenly everyone else knows how to win.

Under the hood, here’s how it works:

Composio logs every agent interaction across supported platforms, including API calls, error handling, workarounds, and successful task completions.
It layers reinforcement learning on top, rewarding effective strategies and converting them into reusable skills.
Those skills get passed on to other agents automatically.

Right now, Composio supports over 25 frameworks, 10,000 tools, and a developer network topping 100,000. With every agent added to the pool, the others get smarter. The effect compounds fast.

It also highlights the weakness of isolated systems. If your agent keeps making the same mistake others have already solved, it starts to look inefficient and costly.

There are obvious challenges, like privacy, IP handling, and credit attribution. But if Composio can silo sensitive data and still enable useful knowledge sharing, it has a shot at becoming the core connective layer for the agent economy.

The pitch is simple:

Agents should never have to relearn what another one already figured out.

#3 AIUC: Insurance for AI Agents

Artificial Intelligence Underwriting Company (AIUC) launches today.

We @terraincap are thrilled to work with @RuneKvist, @bradr, @RajivDattani and the AIUC team.
— #Eric Stromberg (#@ericstromberg)
6:11 PM • Jul 23, 2025

We used to insure trucks, warehouses, and cloud servers. Now, businesses are betting on agents. But there’s a catch: no one knows how to measure their risk.

That’s where AIUC (Artificial Intelligence Underwriting Company) steps in, a startup that just raised $15 million to do something no one else has figured out yet—put an actual price tag on agent failure.

Their bet is bold: if agents are going to run core business ops, they need real-world accountability. Not “vibes-based trust.” Actual standards. And an insurance policy.

So they built AIUC‑1, a compliance framework like SOC2 (cloud security), but for agent behavior. It tests your agent for hallucinations, data leaks, and decision breakdowns. If it passes, it gets a badge and up to $50M in coverage. If it doesn’t, your premiums spike, or you don’t get covered at all.

Why this matters: agents are being shoved into production without audits, benchmarks, or even a shared vocabulary for “what could go wrong.” AIUC gives enterprises something they desperately need: proof their agents won’t light the place on fire.

Once you can quantify agent risk, you can sell, buy, and deploy agents at scale. Vendors get real benchmarks. Everyone sleeps better.

Agent insurance might end up bigger than cybersecurity insurance. AIUC is already working with Fortune 100 clients and quietly setting the rules of the road.

We don’t know if AIUC will win, but the space they’re carving out is inevitable. Agentic insurance is probably a $100B+ industry… because somebody has to sell the safety nets.

#4 MaVila: The Forkable Factory Foreman

A 3D printer jams mid-run in your factory. No alert. No operator comes to the rescue. But MaVila sees the error light, tweaks the feed rate, and keeps production rolling.

MaVila, short for Manufacturing Vision and Language Agent, is not just another cloud toy. It’s a factory-floor native. One of the first agents built to read machines, understand human intent, and directly control hardware.

Think of it as an open-source foreman you can clone from GitHub.

Mavila dataset construction pipeline

Here’s how it works: a vision module monitors equipment, a language model interprets commands, and a skill library maps actions to machine APIs. Most agents are stuck in tabs and terminals. MaVila steps into the real world. In early tests, it:

Diagnosed faults before human operators
Interpreted vague operator commands like “fix that clog”
Adjusted hardware parameters on the fly, autonomously

MaVila is something closer to embodied cognition. Agents that interact with the real world, not through simulations, but by directly commanding it.

The implications? Soon, you could retrofit old factories with agents that learn and adapt. Give legacy hardware a new brain. Build robotic operators that understand intent and respond like teammates. And scale all of it with open-source infrastructure.

MaVila could be the missing link between LLMs and robotics. The moment when generative AI starts controlling real systems, not just predicting words.

#5 Conscium: Stress-Test or Regret

By 2028, Gartner expects a third of enterprise software to run on agents. A full 15 percent of daily business ops will be handled by agents.

That means we’ll have millions of digital coworkers running around, booking things, messaging clients, tweaking spreadsheets, touching money.

Now ask yourself: what happens when one of them goes rogue?

Answer: disaster. Or at least a very awkward press release.

This is where Conscium steps in. It’s a runtime verification gauntlet for agents. Basically, the bootcamp they go through to prove they won’t go full Skynet in your CRM.

Reliability + Predictability = Scalable Innovation

The idea is simple: drop the agent into a simulated world filled with tricky NPCs. Hit it with curveball prompts, fuzzy logic, moral gray zones. If it keeps its cool and does the right thing, it gets a shiny certification badge. If it freaks out or starts hallucinating nonsense, it gets flagged before it ever touches production.

WPP, the world’s largest advertising company, is already using it to stress-test over 30,000 marketing agents. Because the last thing you want is an unsupervised bot inventing fake discounts or worse…tweeting something that lands you in court.

Conscium is part of what’s quickly becoming the Agent Trust Stack:

Janus is your QA tester before launch
Conscium is your live behavior auditor
AIUC is your insurance and compliance layer

Agents are becoming real actors in real workflows. They will manage factories. Write contracts. Recommend treatments. Run marketing campaigns. This is not the moment to trust them on vibes. You need receipts. Conscium gives you those receipts. It verifies that agents behave not once, but continually.

Like a nervous parent texting their kid every 10 minutes during a school trip. Except instead of “u ok?”, it’s: “Did you hallucinate anything in that meeting?”

If agents are going to earn a spot on your org chart, they better prove they can play nice with the rest of the team.

Meme source: Cheezburger

That’s a wrap for this week. Here’s the TL;DR:

Memory is nice. Trust is necessary. Bigger models won’t help if your agent confidently deletes production databases. As agents scale, companies won’t tolerate guesswork. They’ll demand verifiable proof.

The meta-agent economy is happening. Auditors. Certifiers. Underwriters. Entire industries are springing up around the agent stack. The tools that build trust at scale are becoming just as critical as the agents themselves.

Other quick hits from our feed:

OpenAI has plugged in e-commerce to ChatGPT
A Wired reporter ran five ChatGPT tabs at once and lost her mind
One indie dev built a GithHub-to-Newsletter agent and shipped it solo.

Our inbox is always open. Say hi, drop a thought, send a good meme to light up our day. We read everything.

Cheers,

Gioele & Teng Yan