Hey 👋

Last Tuesday I watched a founder demo their agent in a sterile conference room. Worked like magic. Then I asked them: "What happens when your customer's data looks nothing like your training set?" The room got quiet.

That question, in different forms, shows up in all our stories this week.

Let's get into it.

#1 Agents Are Driving the Data Center Crunch

The same people who worried about overbuilding in AI are worried about not building fast enough.

Source: The Atlantic

Ethan Mollick flagged an Atlantic piece this week making the case that the market has done a full 180 on data center sentiment. Six months ago the story was "too much capacity, not enough demand." Now it's the opposite, and the piece pins the shift squarely on AI agents. Not foundation models. Not inference on chatbots. Agents, specifically, because they run longer, chain more calls, and don't stop when you close the browser tab. (Source)

Why it matters: agents consume compute differently than chat interfaces. A single agentic workflow can generate dozens of model calls where a user query generates one. If enterprise agent adoption is even 20% of what the optimists are projecting, the infrastructure math breaks in a hurry. That's why hyperscalers are still spending, even as everyone else tightens. Microsoft, Google, and Amazon didn't get religion on capex for fun.
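The demand-curve point is easy to see with toy numbers. A rough sketch (every figure below is an illustrative assumption, not a number from the article):

```python
# Back-of-envelope: why agents multiply inference demand.
# All inputs are illustrative assumptions, not sourced figures.

def daily_model_calls(users: int, sessions_per_user: float,
                      calls_per_session: float) -> float:
    """Total model calls per day for a given usage pattern."""
    return users * sessions_per_user * calls_per_session

# Chat interface: one user query maps to roughly one model call.
chat = daily_model_calls(users=1_000_000, sessions_per_user=3,
                         calls_per_session=1)

# Agentic workflow: each task chains tool calls, retries, and sub-steps,
# and keeps running after the user closes the tab.
agent = daily_model_calls(users=1_000_000, sessions_per_user=3,
                          calls_per_session=25)

print(f"chat:  {chat:,.0f} calls/day")
print(f"agent: {agent:,.0f} calls/day ({agent / chat:.0f}x)")
```

Same user base, same session count, 25x the inference load. That multiplier, not user growth, is what's stressing the capacity math.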

I've been saying for a while that agents change the demand curve in ways that are hard to model from the outside. The Atlantic noticing it means this has officially left the AI-twitter bubble and entered the mainstream investment thesis. Watch the data center REITs.

#2 Harvard Researchers Quit Academia to Build Memory That Learns You

Most AI assistants forget you the moment you close the tab. These researchers are betting that's the fundamental flaw worth fixing.

A team with 160+ publications across Nature and ICLR has closed their Harvard lab to commercialize what they're calling Large Memory Models, or LMMs. The pitch: an architecture that captures personal context over time and surfaces relevant information without you having to ask for it. Think less "retrieval on demand" and more "a system that already knows what you need." (Source)

The distinction from RAG matters more than it sounds. RAG is fundamentally reactive: you query, it fetches. LMMs as described are proactive, building a persistent model of you that evolves. If that actually works at scale, it changes the entire stack for personal AI agents, which today still rely heavily on explicit memory injection or clunky context windows.
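To make the reactive/proactive distinction concrete, here's a minimal sketch of the two interface shapes. Every class and method name here is hypothetical; this illustrates the architectural difference as described in the pitch, not any actual LMM implementation:

```python
# Illustrative contrast: reactive retrieval vs. proactive memory.
# All names are hypothetical; this is an interface sketch, not real code
# from the LMM project.

class ReactiveStore:
    """RAG-style: nothing happens until the caller issues a query."""
    def __init__(self):
        self.docs: list[str] = []

    def add(self, doc: str) -> None:
        self.docs.append(doc)

    def query(self, term: str) -> list[str]:
        return [d for d in self.docs if term.lower() in d.lower()]


class ProactiveMemory:
    """LMM-style (as described): accumulates context over time and
    volunteers relevant items when an event occurs, unprompted."""
    def __init__(self):
        self.memories: list[tuple[str, str]] = []  # (topic, note)

    def observe(self, topic: str, note: str) -> None:
        self.memories.append((topic, note))

    def on_event(self, topic: str) -> list[str]:
        # Context is surfaced without an explicit user query.
        return [note for t, note in self.memories if t == topic]


mem = ProactiveMemory()
mem.observe("meetings", "User prefers 30-minute slots")
mem.observe("travel", "User avoids red-eye flights")
surfaced = mem.on_event("meetings")  # arrives before the user asks
```

The hard part, of course, is everything this sketch elides: deciding *which* event should trigger *which* memory, at scale, without being wrong or creepy. That's the research bet.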

Worth noting: memory tightness in the infrastructure layer is already sitting at 89.0 on our tracking index, up 3 points in a week. HBM demand is outstripping supply. A genuinely new, compute-hungry memory architecture arriving into that environment is not ideal timing.

I'm cautiously interested. "Mimicking human memory" is one of those phrases that has burned a lot of researchers before. But shutting down a Harvard lab to go build it is a real signal that these founders believe the research is past the interesting-paper stage.

#3 HBM Stays Bound (From Tessara Research)

The agents you read about run on accelerators. Every accelerator runs on HBM (high bandwidth memory). And HBM just hit our highest scarcity band of the cycle.

I just published the first in a series of Tessara AI supply-chain theses!

Our calls:

  • HBM is structurally short through Q4 2026. Three scaled suppliers are not collectively adding enough capacity to clear demand.

  • The 2026 book is already largely spoken for. SK Hynix, Micron, and Samsung have each described 2026 HBM capacity as committed, booked, or sold out. ASML’s upstream commentary points in the same direction.

  • The demand stack is broader than NVIDIA. NVIDIA remains the largest buyer, but AMD, AWS, Google, and Microsoft now account for a meaningful share of HBM-critical accelerator programs in our tracker.

What this means for agent builders: the binding constraint on mass agent rollouts in 2026-2027 isn't model quality. It isn't even compute. It's whether your hyperscaler can get enough memory inside the chips you depend on.

I also share the 3 companies I’m watching most closely, why they matter to the thesis, and what evidence would prove us wrong.

Subscribe to Tessara Research (free, 2 memos a week)

#4 Figure's Brett Adcock calls Figure 04 his "iPhone 1 moment"

Brett Adcock doesn't do subtle. The Figure CEO just gave a camera crew a full walk-through of their San Jose campus, BotQ manufacturing floor included, and capped it with a tease that Figure 04 will be a generational product.

The tour showed Figure 03 autonomously tidying a living room, no hand-coded instructions, running entirely on Helix, their vision-language-action model trained in-house. Adcock also walked through the "Vulcan project," an internal stability testing program, and the system integration lab where the hardware and neural network stack get married before any unit ships. (Source)

What I found noteworthy wasn't the robot footage. It was the manufacturing infrastructure behind it. BotQ is Figure's bet that vertical integration, owning the factory, the AI training pipeline, and the hardware, is what separates a demo company from a real one. If they can close that loop at scale, the comparison to Apple's supply chain control isn't crazy. The pivot from hand-coded controls to end-to-end neural networks is also the real story: that's the same architectural shift that made software agents actually useful.

I've been skeptical of humanoid timelines for a while. But a CEO willing to show the factory floor, not just the highlight reel, is at least pointing in the right direction.

📄 Paper of the Week

NVIDIA quietly shipped a 30B sparse model (only 3B active parameters) that handles text, images, video, and audio natively, with strong results on agentic computer use benchmarks. The token-reduction techniques mean meaningfully lower inference latency at roughly the cost of a much smaller model.
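The "30B total, 3B active" framing is worth a quick sanity check. Per-token forward-pass cost scales with *active* parameters (the common approximation is about 2 FLOPs per active parameter per token), which is where the "cost of a much smaller model" claim comes from. Illustrative arithmetic, not benchmarked numbers:

```python
# Why sparse activation matters for inference cost.
# Rule of thumb: forward pass ~= 2 FLOPs per ACTIVE parameter per token.
# Illustrative arithmetic only, not measured throughput.

def fwd_flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense_30b = fwd_flops_per_token(30e9)  # a hypothetical dense 30B model
sparse_3b = fwd_flops_per_token(3e9)   # 3B active out of 30B total

print(f"dense 30B: {dense_30b:.1e} FLOPs/token")
print(f"sparse:    {sparse_3b:.1e} FLOPs/token "
      f"({dense_30b / sparse_3b:.0f}x cheaper per token)")
```

You still pay for 30B parameters in memory, which is why the FP8 and FP4 weight releases matter as much as the sparsity.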

For agent builders, the interesting part isn't the multimodal breadth. It's that document understanding and screen control in one open, efficient package removes a real integration headache. Weights are out in BF16, FP8, and FP4.

📊 Free for Secret Agent readers: the AI Infrastructure Map

Your 5-minute map of the AI infrastructure buildout and what we’re watching closely. Tessara Research covers this layer in full: 2-3 memos a week, regime scoring across compute, memory, and power, and named exposures at every chokepoint.

Catch you next week ✌️

Teng

Keep Reading