I’ve been chewing on this idea for months, and it finally feels like the right time to kick this off.

So…

Welcome to The Agent Angle, a brand new weekly newsletter from the Chain of Thought team, dedicated entirely to AI agents!

Nope, not crypto agents. This isn’t a Web3 thing. We’re stepping outside crypto, because frankly, there’s a ton of wonderful, fast-moving stuff happening in the broader agent world. And we’re obsessed.

Agents are getting weirdly good. They’re coding, planning, booking meetings, writing, debugging, even negotiating deals. Some are helpful, some are a little unhinged. But all of them are pushing the boundaries of what software can do without you lifting a finger.

We’re convinced that agents are the next big shift in tech. They’re already starting to act less like apps and more like coworkers.

That’s why we’re doing this: Every week, we’ll send you 5 spicy, handpicked stories from the world of AI agents. The boldest product drops. The freakiest demos. The smartest research findings. All curated to keep you ahead of whatever the hell is coming next.

You can read it in under 5 minutes. No fluff, guaranteed.

You’re receiving this because you subscribe to Chain of Thought. If you’d rather not get this new weekly AI agent email, you can unsubscribe with one click right here.

Let’s get into it!

#1: 11ai Lets You Just Say The Word

Introducing 11ai - the AI personal assistant that's voice-first and supports MCP.

This is an experiment to show the potential of Conversational AI:

1. Plan your day and add your tasks to Notion
2. Use Perplexity to research a customer
3. Search and create Linear issues
— ElevenLabs (@elevenlabsio)
5:24 PM • Jun 23, 2025

On Jun 23, ElevenLabs announced 11ai, a voice-first agent built on its low-latency Conversational AI platform. It supports retrieval-augmented generation for real-time answers and integrates directly with Notion, Slack, and Perplexity via Model Context Protocol (MCP).

You can pick from over 5,000 voices or use your own. Instead of teaching people new tools, 11ai turns voice into a command layer for apps they already use. It’s a smart move.

If there’s one thing we know, it’s that we humans are lazy. The less friction, the greater the retention.

Humans type around 40 words per minute. We speak at 150. Voice is not just easier, it’s nearly four times faster. With 11ai, research, execution, and multi-step tasks go conversational.
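For the number-curious, here is a quick back-of-the-envelope sketch of that comparison. It takes the two averages quoted above at face value; the 300-word request length is a made-up figure for illustration, and real typing and speaking speeds vary a lot from person to person.

```python
# Rough averages quoted in the text; treat them as ballpark figures.
TYPING_WPM = 40
SPEAKING_WPM = 150


def minutes_needed(words: int, words_per_minute: float) -> float:
    """Time in minutes to get `words` out at a given pace."""
    return words / words_per_minute


# How many times faster speaking is than typing at these averages.
speedup = SPEAKING_WPM / TYPING_WPM

# Hypothetical request length, chosen only for illustration.
request_words = 300

print(f"Speaking vs typing: {speedup:.2f}x")
print(f"Typing {request_words} words: {minutes_needed(request_words, TYPING_WPM):.1f} min")
print(f"Speaking {request_words} words: {minutes_needed(request_words, SPEAKING_WPM):.1f} min")
```

At these averages the ratio comes out to 3.75x, so a 300-word brief drops from 7.5 minutes of typing to about 2 minutes of talking.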

And with another high-profile integration, MCP gains momentum as the integration layer to beat.

11ai Agent’s Architecture - ElevenLabs

That puts pressure on the incumbents. Apple and Google have great voice models but clunky integration. Amazon’s Alexa still lacks a universal function-calling layer. ElevenLabs just leapfrogged them with smoother execution across tools people already trust.

11ai is free right now. Try it out. Use your voice to query Perplexity, then auto-summarize into a Notion page. It’s slick, and it only takes a sentence.

#2: Sakana’s ALE-Agent Solves Hard Problems

Most AI agents fetch documents, clean up your emails, or wrangle Slack threads. ALE-Agent is out here solving problems that normally require a full whiteboard session, a PhD or two, and some serious coffee.

It’s built for a different beast entirely: hardcore optimization challenges in industrial processes, like logistics, routing and production planning.

The kind of gnarly, open-ended problems LLMs usually flail at because they need long-range strategic thinking.

Introducing ALE-Bench, ALE-Agent!
Towards Automating Long-Horizon Algorithm Engineering for Hard Optimization Problems

Blog: sakana.ai/ale-bench/
Paper: arxiv.org/abs/2506.09050

ALE-Bench is a coding benchmark primarily focused on hard optimization (NP-hard) problems. We
— Sakana AI (@SakanaAILabs)
12:17 AM • Jun 17, 2025

So how smart is it, really?

In May, ALE-Agent entered the AtCoder Heuristic Competition (think: coding Olympics for optimization nerds) alongside 1,000 human contestants. It placed 21st. That’s top 2%. Against actual people.

We’ve seen plenty of AI agents beat humans (at chess, coding, math). But this time it’s different.

This is AI breaking into one of the last strongholds of human advantage: creative reasoning. If agents can out-optimize 98% of expert programmers, we’re not talking spellcheck copilots anymore. We’re looking at full-blown industrial problem solvers who are better than teams of human engineers.

ALE-Agent reasoning framework - SakanaAI

You probably won’t use ALE-Agent yourself. It won’t trend on X. But this story still matters, because it signals where things are heading: agents that can think abstractly, plan several steps ahead, and beat domain experts at their own game.

Which, for us, is frightening and thrilling at the same time.

One more twist: Sakana didn’t just build the agent. They built the benchmark too. ALE-Bench is a coding benchmark that focuses on hard optimization problems. So, yeah, they own both the playground and the best player on the field at the same time.

#3: Janus Stress Tests Your Agent Like a Pro

withjanus.com

Manual QA for AI agents is kind of busted. It’s slow, half-baked, and way too human to catch the freaky corner cases where LLMs start hallucinating.

Enter Janus, launched in late May.

Janus spins up thousands of synthetic users (think confused customers, cranky devs, conspiracy-loving uncles) and throws them at your AI agent to see what breaks. It digs up hallucinations, shaky logic, and biased behavior. Then it hands you a full report with fixes and straight-up advice.

You define what “good” looks like using plain English prompts. Ten minutes later, boom: full QA audit, complete with fake user feedback and action items to fix problems.

Bad outputs aren’t just embarrassing: they can ruin your brand reputation before you even launch. Janus finds those failure points before your real users do.

But here’s the spicy part: this is agent-on-agent supervision. Janus is basically a meta-agent, testing, correcting, and improving other AIs. That’s a huge step toward systems that maintain themselves.

ALE-Agent, and now Janus. The trend is clear. Agents are not only getting smarter and more broadly capable, they also need less and less human supervision.

#4: Salesforce Enters The Arena with Agentforce 3

If Janus is where you go to break your agent, Agentforce 3 is where you go to deploy one that actually holds up in production.

Salesforce’s newest release brings native MCP support and hundreds of prebuilt actions. Agents can now kick off workflows, pull contracts, update CRM records, and launch reports. No custom code, no duct tape integrations.

Agentforce 3 is here.

🔹 Command Center for a complete view of your agents
🔹 Plug-and-play with services you use—thanks to MCP
🔹 20+ top partners ready to go via AgentExchange
🔹 200+ pre-built industry actions

All powered by an enhanced architecture: sforce.co/44mi51E
— Salesforce (@salesforce)
4:28 PM • Jun 23, 2025

Here’s why this matters: in the enterprise world, interoperability = money. Every workflow that just works without manual fiddling saves time, reduces errors, and boosts margins. For companies running complex ops, even a 0.5% gain adds up to millions of dollars.
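To put rough numbers on that, here is a minimal sketch of the arithmetic. The annual cost figures are hypothetical and picked only for illustration; only the 0.5% gain comes from the text above, and nothing here is Salesforce data.

```python
# The 0.5% efficiency gain mentioned in the text, as a fraction.
GAIN = 0.005


def annual_savings(operating_cost: float, efficiency_gain: float) -> float:
    """Dollars saved per year if costs shrink by `efficiency_gain`."""
    return operating_cost * efficiency_gain


# Hypothetical annual operating costs for companies of different sizes.
for cost in (100_000_000, 500_000_000, 2_000_000_000):
    saved = annual_savings(cost, GAIN)
    print(f"${cost:,.0f} in annual costs: about ${saved:,.0f} saved")
```

Under these assumptions, the "millions" mark is crossed once annual operating costs pass roughly $200 million, which is well within range for a large enterprise.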

Agentforce 3’s MCP Client - Salesforce

Agentforce is not sexy. It’s not supposed to be. It’s designed for the grown-up league (i.e. enterprises), where compliance, consistency, and predictable results win the day. That’s the secret sauce in B2B: boring is profitable.

Salesforce knows its customers. Agentforce 3 is a masterclass in meeting them exactly where they are.

#5: Abacus, A Generalist That Doesn’t Fall Apart

Most tools that claim they can “do everything” usually can’t do much of anything. So when Abacus’s DeepAgent promised to automate basically any task, we were skeptical.

But after seeing it run, we’ve changed our tune.

Yes, it builds apps. Yes, it handles workflows, makes short videos, digs up research, and even delegates tasks to sub-agents like a tiny Russian doll of productivity. But what really stood out? The speed of iteration. The team has been dropping updates left and right, listening to user feedback like they actually mean it.

We have been continuously iterating and improving our Deep Research.

It can now:
•Generate and integrate images
•Build smart tables
•Draws charts & graphs

It’s your all-in-one powerful research report!
— Abacus.AI (@abacusai)
3:57 PM • Jun 23, 2025

It’s not the prettiest thing out there, and the UX definitely leans developer. Some flows could use polish. But for $10 you get a decent chunk of tokens, access to top-tier LLMs, and enough power to prototype agent workflows without lighting your wallet on fire.

In a sea of flimsy generalist tools, DeepAgent feels surprisingly sturdy. It’s a Swiss Army knife: it won’t replace your expert tools, but when you’re figuring things out or just need to get stuff done, it won’t let you down.

Quick hits from this week:

Interop is everything. MCP is no longer just a nice-to-have…it's a must-have.
Agents are leveling up fast. How long will humans stay in the loop?
Speed kills. Big players are moving in, so smaller teams need to ship faster than ever.

Catch you next week for more spicy takes on AI agents.

Got thoughts or feedback? Just hit reply. We read all your emails (pinky promise).

Cheers,

0xDriverz_ & Teng Yan

The Agent Angle #1: ElevenLabs's new stack
