Hey 👋
I saw Claude's latest model accidentally break a hundred production agent workflows, then couldn't diagnose its own damage.
And now Anthropic's valuation just hit nearly a trillion (!!), which means the infrastructure everyone's building on is about to get a lot more expensive and scarce. We're watching the entire stack reorganize around what agents can actually do.
Three AI agent stories this week:
#1 SaaS Is Being Gutted and Rebuilt From the Inside
Bending Spoons figured it out first. Buy a bloated SaaS company, strip the headcount, rebuild the core, then raise prices. Now billionaire investors are copying the playbook at scale… because of AI agents.
The pattern Greg Isenberg flagged this week: acquirers are targeting legacy SaaS not for the revenue multiple, but for the workflow data. Frontier model companies, it turns out, are quietly paying for access to niche application usage patterns.
That domain-specific behavioral data, how an accountant actually moves through AP reconciliation, or how a logistics dispatcher reroutes on the fly, is something you can't synthesize. It's real alpha.
Why this matters: the acquisition playbook has changed. The old SaaS roll-up was about distribution and cross-sell. The new one is about extracting workflow intelligence, rebuilding the product with agents, and using headcount reduction to fund the replatform.
I think the workflow data angle is underappreciated. That's the real prize in these acquisitions, not the ARR.
Source: @gregisenberg
#2 Claude Opus 4.8 Breaks 105 Agents, Can't Fix Itself
Two billion tokens in 48 hours. That's how hard one developer pushed Claude Opus 4.8 after launch, and the verdict was genuinely split.
On raw coding capability, Opus 4.8 is the best model (to the dev). Features that GPT 5.5 and Opus 4.7 couldn't ship, Opus 4.8 shipped. That part checked out. Then they deployed it to 105 UltraCode agents in production. The model introduced a bug. Then failed to fix it. Eight attempts. No fix. Full rollback.
What I found noteworthy here isn't the bug. It's the asymmetry. A model that's genuinely smarter at building new things isn't necessarily better at diagnosing what it broke. Those are different cognitive tasks, and the gap between them matters enormously when you're running agents at scale.
If Opus 4.8 is handling 105 coordinated agents, one bad self-repair loop doesn't cost you one workflow. It costs you all 105. The hallucination rate and knowledge cutoff didn't improve meaningfully either, which means the capability jump is narrow and specific.
That's the important part: "more intelligent" and "more reliable as an agent backbone" are not the same thing
and we keep conflating them.
I'd want to see Anthropic publish failure recovery benchmarks alongside capability scores. Because right now, being the best coding model means very little if it can't debug itself under agentic load.
Source: @bridgemindai
#3 Anthropic hits $965B valuation.
$47 billion in annualized revenue. That number, buried in the middle of Anthropic's Series H announcement, tells you more about the current moment than the $65 billion headline does.
Anthropic closed its Series H at a $965 billion post-money valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia. The round includes $15 billion from hyperscalers, with Amazon contributing $5 billion.
But what caught my attention was the infrastructure layer: Anthropic signed agreements with Amazon for up to five gigawatts of new capacity, with Google and Broadcom for another five gigawatts of next-gen TPU capacity, and with SpaceX for GPU access inside Colossus 1 and 2. They also brought in Micron, Samsung, and SK hynix as strategic investors.
That last part is the important part. 10GW of committed capacity is a statement about where compute costs are heading. And bringing memory manufacturers onto the cap table isn't goodwill.
HBM demand is already outstripping supply right now, with supply all committed in long term agreements up for the next few years. Anthropic is trying to lock in supply before it gets worse.
I've been watching the compute scarcity signal creep up for months. Anthropic just moved from being subject to those dynamics to being one of the forces that shapes them.
Source: Anthropic
📄 Paper of the Week
GrepSeek: Training Search Agents for Direct Corpus Interaction — Salemi, Zeng, Nijasure et al.

Most RAG pipelines treat retrieval as a black box: query in, ranked docs out. GrepSeek flips this. The agent issues shell commands directly against the corpus, like a developer grepping through a codebase.
The key result: a compact model trained this way outperforms larger index-based systems on knowledge-intensive tasks. If you're building agents that need to reason over proprietary document stores, this two-stage training approach is worth stealing.
🔧 Under the Hood
The SaaS rebuild and Claude's agent failures both rest on the same hardware constraint. Every multi-step agent query costs 3-5x the inference of a single LLM call.
I also publish a free weekly briefing on the AI supply chain and constraints. If you want to follow the hardware side, Tessara Research is where I do it.
Catch you next week ✌️
Teng
