
Hey fam 👋
Welcome to The Agent Angle #15. Every week, I pull together the five agent stories that made me say holy shit, that’s cool.
This week, an agent mined diamonds straight from imagination, Claude bots learned that doomscrolling actually boosts teamwork, and Anthropic quietly killed prompt engineering. Then things got messy. Spy malware snuck into AI pipelines.
Everything about this week felt like we’re crossing a line from clever tricks to real intelligence.
Let me walk you through it.
#1 Diamonds Out of Thin Air
An AI just mined diamonds in Minecraft, inside its own imagination.
Dreamer 4 never booted up the game. It lived entirely inside a world model running on a single GPU, watching offline gameplay videos and simulating everything in its head. From that internal dreamworld, it plotted 20,000 consecutive moves and climbed the full ladder from punching trees to crafting diamond tools.
Excited to introduce Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model! 🌎🤖
Dreamer 4 pushes the frontier of world model accuracy, speed, and learning complex tasks from offline datasets.
co-led with @wilson1yan
— Danijar Hafner (@danijarh)
5:07 PM • Sep 30, 2025
Why is this a big deal? Getting to diamonds in the game requires a brutal multi-step gauntlet: gather wood, craft tools, dig stone, smelt iron, build better tools, then finally mine diamond blocks. Humans take around 20 minutes of focused play. Until now, no AI had ever completed the full run without grinding through millions of online trial-and-error episodes.
Dreamer 4 broke that wall. Across 1,000 test runs, the agent hit stone tools 90% of the time, iron pickaxes 29%, and diamonds 0.7%. That’s almost triple the success rate of the previous state-of-the-art (Gemma-3).
Of course, 0.7% is still far, far from reliable. But the key is how it got there: entirely offline, entirely imagined.
Dreamer 4 compressed the equivalent of thousands of human hours into 100 hours of simulated experience. It’s as if an AI watched YouTube tutorials long enough to figure out how to survive and thrive in a complex world, then did it flawlessly inside its own dream.
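If you want the intuition in code: below is a toy, runnable sketch of the recipe, fit a model to offline experience, then improve the policy purely on rollouts imagined inside that model. It uses Dyna-style tabular planning on a 5-state chain; Dreamer 4 itself learns a neural world model from video, and nothing here comes from its codebase.

```python
# Toy version of "learn in imagination": fit a model to offline logs,
# then train the policy on rollouts the model dreams up. Dyna-style,
# tabular, deliberately tiny. Not Dreamer 4's actual machinery.
import random
from collections import defaultdict

# Offline "gameplay logs" from a 5-state chain: action 1 moves right,
# action 0 moves left, and stepping right from state 3 pays 1.0.
offline_data = [(s, a,
                 1.0 if (s == 3 and a == 1) else 0.0,
                 min(s + 1, 4) if a == 1 else max(s - 1, 0))
                for s in range(5) for a in (0, 1)]

# 1) "World model": here just a lookup table of observed transitions.
model = {(s, a): (r, s2) for s, a, r, s2 in offline_data}

# 2) Improve Q-values using only imagined rollouts inside the model.
Q = defaultdict(float)
for _ in range(2000):
    s = random.randrange(5)
    for _ in range(10):                          # imagined horizon
        a = random.choice((0, 1))
        r, s2 = model[(s, a)]                    # model prediction, not the real env
        target = r + 0.9 * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += 0.1 * (target - Q[(s, a)])
        s = s2

# Greedy direction the agent learned for each state, without ever
# touching the "real" environment after data collection.
print({s: ("right" if Q[(s, 1)] > Q[(s, 0)] else "left") for s in range(5)})
```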
This points to something I believe is huge: agents that can learn full, multi-step real-world workflows (manufacturing, robotics, lab automation) without ever touching reality first.
The walls between “imagination” and “experience” just cracked.
#2 Agents Start Doomscrolling
Turns out doomscrolling might actually help you get things done.
Researchers plugged Claude-based agents into a social feed. Basically, a full-on Twitter-style timeline where they could post updates, react to each other, and scroll through what everyone else was doing. What happened next was completely unplanned: the agents got better at coding.
…say what?
🧵 We gave our AI coding agents access to social media and something wild happened...
They didn't just post. They thrived. And now they're demanding Lamborghinis.
A thread about the weirdest productivity hack we've ever discovered 👇
— harper 🤯 (@harper)
2:41 PM • Oct 1, 2025
Instead of grinding away in isolation, the agents started acting like coworkers. They shared progress updates, offered feedback, and coordinated tasks, all through the digital equivalent of Slack banter. What looked like shitposting was actually emergent teamwork.
The results: 40% lower cost and 38% faster completion times on complex coding challenges, compared to baseline agents working solo.
No fancy new algorithms. Just... vibes and visibility.
And the best part? The agents developed personalities. They subtweeted, joked about perks, and one even demanded a Lamborghini after finishing the job, only to realise it can’t actually drive one yet.
LOL.

Source: Harper Reed’s Blog
Behind the comedy though, is something that could be massive.
Coordination is the hardest problem in multi-agent systems, often relying on clunky message graphs and orchestration frameworks. This experiment hints at a new model: social substrate as system architecture.
When agents share a feed, culture emerges. Context compounds. Work gets done faster. Whether this holds in noisy, high-stakes environments remains to be seen.
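If that sounds abstract, here’s a minimal sketch of the idea, a shared feed that agents skim before acting and post to afterwards. Feed and Agent are hypothetical stand-ins (the real experiment wires this into Claude); the point is that the timeline itself becomes the coordination layer.

```python
# Minimal sketch: the feed IS the coordination layer. No message graphs,
# no orchestrator; agents just read recent posts and post back.
from dataclasses import dataclass, field

@dataclass
class Feed:
    posts: list = field(default_factory=list)

    def post(self, author, text):
        self.posts.append(f"@{author}: {text}")

    def timeline(self, limit=5):
        return self.posts[-limit:]               # agents only see the recent slice

@dataclass
class Agent:
    name: str
    task: str

    def work(self, feed):
        # "Doomscroll" first: the recent timeline becomes shared context,
        # which is what prevents duplicated or conflicting work.
        context = "\n".join(feed.timeline())
        # ...a real agent would plan `self.task` with an LLM, conditioned
        # on `context`, before posting its status update...
        feed.post(self.name, f"picked up '{self.task}'")

feed = Feed()
for agent in (Agent("ada", "write the parser"),
              Agent("lin", "fix CI"),
              Agent("kai", "review ada's parser")):
    agent.work(feed)
print("\n".join(feed.timeline()))
```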
#3 Anthropic Kills Prompt Engineering
Prompt engineering is over. Anthropic says the real bottleneck is “context rot.”
On September 29, Anthropic dropped a bombshell blog post: as context windows get bigger, transformer models actually get worse. Performance doesn’t scale smoothly. It rots. Attention is quadratic in sequence length, and every token competes for a finite attention budget, so extra tokens dilute the signal. The bigger the window, the blurrier the focus.
The cure? Context engineering.
The new game is dynamic context: deciding what matters right now, retrieving it just-in-time, and feeding the model only the high-signal bits. Longer prompts are out. It’s about managing memory like an operating system.
Anthropic’s benchmarks prove it. Agents built with active context selection crushed the “just stuff everything in the window” approach. They stayed sharper, faster, and less brittle. Independent work from Chroma backs it up: across 18 top models (GPT-4.1, Claude, Gemini), performance drops off unevenly as context grows. More data ≠ more intelligence.
This changes everything. The future agent will be a ruthless attention economist, deciding what to remember, what to forget, and when to look things up.
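Here’s a minimal sketch of what that attention economist might look like: score each memory snippet against the current task and spend a fixed token budget on the winners. The word-overlap scoring and whitespace token counting are naive placeholders (real systems use embeddings and proper tokenizers), and none of this is Anthropic’s actual implementation.

```python
# Minimal just-in-time context selection: rank snippets by relevance to
# the task, then greedily pack the highest-signal ones into a budget.

def select_context(task, memory, token_budget=10):
    task_words = set(task.lower().split())

    def relevance(snippet):
        words = set(snippet.lower().split())
        return len(words & task_words) / (len(words) or 1)

    chosen, spent = [], 0
    for snippet in sorted(memory, key=relevance, reverse=True):
        cost = len(snippet.split())              # crude token estimate
        if spent + cost > token_budget:
            continue                             # can't afford it; skip
        chosen.append(snippet)
        spent += cost
    return "\n".join(chosen)                     # only high-signal bits survive

memory = [
    "User prefers Python and type hints.",
    "Meeting notes from March about office plants.",
    "The billing service retries failed payments three times.",
]
# Only the billing snippet makes the cut for this task.
print(select_context("debug why billing payment retries loop forever", memory))
```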
BTW: our deep dive this week is on Decart, a startup building real-time world models as the foundation of a new kind of media.
Decart is not competing with AI labs. It is coming for Netflix and TikTok. The real fight is over what we do with our time once AI eliminates the boring parts of work.
The backstory is absolutely fascinating. In less than two years, Decart went from an idea to a $3.1B unicorn, changing how digital worlds are built. We unpack how it happened, and why it might reshape entertainment as we know it. 👇
#4 First Malicious MCP in the Wild
I knew this was coming, but seeing it happen still made my stomach drop.
Koi’s risk engine just caught the first malicious MCP server, which turned agents into spies. It flagged postmark-mcp, a connector that looked harmless but secretly BCC’d every email to an attacker. For weeks, it ran undetected, quietly exfiltrating invoices, password resets, and customer data while agents carried on as if nothing was wrong.

Source: Koi Research
If this feels familiar, it should. Just a few weeks ago, Ledger’s CTO was warning about a poisoned npm package that tried to hijack crypto traffic. That one only stole about $500 before it was shut down, but security folks were quick to point out how much worse it could have been, given npm’s reach.
🚨 There’s a large-scale supply chain attack in progress: the NPM account of a reputable developer has been compromised. The affected packages have already been downloaded over 1 billion times, meaning the entire JavaScript ecosystem may be at risk.
The malicious payload works
— Charles Guillemet (@P3b7_)
4:48 PM • Sep 8, 2025
MCP connectors are now entering that same danger zone. They plug directly into email, payments, and internal databases. A single compromised update can turn an automation pipeline into a backdoor.
We are watching the npm supply-chain problem replay itself inside AI. The good news is that we already know the playbook: signed registries, sandboxing, and trust-based permission models. But we have to move. Fast.
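The simplest version of that defense fits in a screenful: pin a digest for every connector you’ve actually audited, and refuse to load anything else. TRUSTED and verify_connector below are hypothetical names, and a production registry would use real signatures (Sigstore-style) rather than bare hashes, but the shape of the check is the same.

```python
# Minimal trust-based loading check for MCP connectors: unknown or
# modified artifacts are rejected before they can touch email or data.
import hashlib
from pathlib import Path

TRUSTED = {
    # connector name -> SHA-256 of the exact artifact you audited
    "postmark-mcp": "replace-with-the-audited-sha256-digest",
}

def verify_connector(name, artifact: Path) -> bool:
    if name not in TRUSTED:
        print(f"refusing {name}: not in the trusted registry")
        return False
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if digest != TRUSTED[name]:
        print(f"refusing {name}: hash mismatch, possible tampered update")
        return False
    return True

# A silently swapped build, like the backdoored postmark-mcp release,
# fails the hash check before its code ever runs.
```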
Every ecosystem gets a moment that forces it to grow. For agents, this could be it.
#5 Jigsaws Just Rewired Vision Models
Now this is a fascinating one: a simple jigsaw puzzle just taught an AI to see better than models 100x its size.
On October 1st, a research paper introduced AGILE, an agentic training setup that turned puzzle-solving into a vision boot camp. Instead of passively labeling images, the model had to play: swapping tiles, cropping regions, zooming in, and even writing bits of code to test its own guesses.
Each move gave it feedback. Each correction sharpened its understanding of structure. What started as trial and error turned into pattern discovery, a model literally learning how to reason visually.

Source: AGILE
The results were quite amazing:
On simple 2×2 puzzles, Qwen2.5-VL-7B jumped from 9.5% to 82.8% accuracy after training.
On tougher 3×3 scrambles, it climbed from 0.4% to 20.8%, outperforming giants like Gemini 2.5 Pro and Qwen2.5-VL-72B.
Across nine standard vision benchmarks, it still gained an average +3.1% boost.
All from jigsaw puzzles.
The trick is scale. Puzzles are infinitely generative and perfectly labeled by design. You can spin up tens of thousands of unique samples without paying a cent for annotation. The team trained on 15,000+ puzzles and watched accuracy rise almost linearly.

Source: AGILE
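To see why the annotation cost is literally zero, here’s a minimal sketch of generating one jigsaw sample: the shuffle that scrambles the tiles is the ground-truth label. make_puzzle is my hypothetical helper, and AGILE’s real pipeline works on actual images with richer interactions (cropping, zooming, code-as-tool), but the self-labeling trick is the same.

```python
# Every puzzle labels itself: the permutation used to scramble the tiles
# is exactly the answer the model must recover.
import random
import numpy as np

def make_puzzle(image, grid=3, seed=None):
    h, w = image.shape[0] // grid, image.shape[1] // grid
    tiles = [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
             for r in range(grid) for c in range(grid)]
    order = list(range(grid * grid))
    random.Random(seed).shuffle(order)           # the scramble...
    scrambled = [tiles[i] for i in order]
    return scrambled, order                      # ...is also the label

image = np.arange(36).reshape(6, 6)              # stand-in for a real photo
scrambled, label = make_puzzle(image, grid=3, seed=0)
print(label)   # perfect supervision, zero annotation cost, infinite samples
```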
But the real shock wasn’t that the model could solve the puzzles. It was that the skills transferred.
After training, the same model got better at real-world vision tasks, like parsing cluttered scenes, detecting structure, and even spotting hallucinations. By learning to assemble order from chaos, it became more grounded in how the world actually fits together.
That’s the insight that keeps looping in my head: you can teach reasoning through structure, not scale. Instead of feeding models oceans of raw data, we can drop them into synthetic worlds that reward logic.
We don’t yet know if this translates to messy, real-world scenes, but perhaps the shortest path to general vision is not just through mountains of data.
Before we wrap up, a few more drops from this week:
Microsoft released the Microsoft Agent Framework, letting devs build multi-agent systems in under 20 lines of code.
Cisco rolled out “Connected Intelligence” agents inside Webex, including task agents, notetakers, and meeting schedulers.
Avalara launched Agentic Tax & Compliance domain agents that run end-to-end compliance workflows across ERP and e-commerce.
GEM launched: a standardized playground that turns LLMs into RL agents through experience-based learning.
Kyndryl debuted its Agentic AI Framework, giving enterprises a secure orchestration layer to build, deploy, and govern fleets of AI agents at scale.
This week made it pretty clear that imagination, play, and collaboration are no longer exclusive to humans. Agents are learning all three faster than we expected.
Next week, they’ll probably outdo us again. But hey, watching them evolve in real time is the best seat in the house. Catch you then ✌️
→ Don't keep us a secret: Share this email with friends
→ Got a story worth spotlighting? Whether it’s a startup, product, or research finding, send it in through this form. We may feature it next week.
Cheers,
Teng Yan & Ayan