
This is the first piece in our new robotics series, and the goal is simple: go one layer deeper than the hype without forcing you to get a PhD. I’m going to walk through the infrastructure that actually powers robots, both hardware and software, and surface the parts of the stack where progress is genuinely compounding.
I’m starting with dexterity and hands, because a robot would be a useless (and expensive) pile of metal if it couldn’t use its hands properly. It’s not just a hardware problem; it’s about a unique form of intelligence. Working on this research made me appreciate what a gift our hands truly are.
Let’s dive in!
-Teng Yan
Man is the wisest of all animals because he has hands.
I learned this watching my grandmother fold laundry.
She sat on the edge of the bed with a warm pile fresh from the dryer. Towel, shirt, socks, one after another. Her eyes were half closed, drifting in and out of a television show she had already seen a dozen times. Her hands never paused. Corners met cleanly. Sleeves tucked themselves in. Each piece landed in a neat stack, the same size as the last.
She was barely paying attention. The movements ran on their own. The skill had sunk below awareness, into something older and quieter. A kind of ancient intelligence, running on its own.
That is what human hands are. Not accessories, but a form of intelligence we barely notice. 400 million years of evolution compressed into a tool you can’t fully explain, even while you use it perfectly.
That is why robot demos often leave me with a faint sense of pity.
They can backflip flawlessly in demos, then struggle to load a dishwasher without teleoperation. They can run. They can jump. They still lose to dirty plates.
This piece is about why dexterity is a real bottleneck in robotics, why it lives below intelligence, and how to tell real progress from impressive demos.
We are looking for intelligence in the wrong place. For a long time, we inherited a bias that treated the body as a servant of the brain. Think first. Move second. In practice, the order is reversed.
As neuroscientist Daniel Wolpert argues, the real reason humans have such a powerful brain is to generate complex movement. The brain exists to control the body, not the other way around.
Try this. Reach out to touch the cheek of someone you love. Before contact, your hand reshapes itself to the curve of their face. At contact, force spreads across bones and tendons, skin floods your nervous system with signal, your grip adjusts without effort. The touch lands soft. You didn’t plan it. And you couldn’t describe it afterward.
To your mind, this feels like nothing. To a robot, it is everything.
Robotic manipulation is a fight with how much of reality lives at the point of contact. We are trying to force metal and plastic to reproduce a capability forged over hundreds of millions of years of evolution, and we often cannot even describe the skill we are trying to copy.
So before we talk about progress, we need a sharper definition of the problem.
What’s Dexterity?

Many confuse movement with dexterity. A robotic arm welding a car door is not dexterous. It is precise. It can repeat a pre-calculated path because it operates in a highly structured environment where every variable is controlled.
Dexterity is the ability to manage contact forces when the world does not cooperate.
It is rotating a pen in your fingers without dropping it.
It is the way your skin feels the edge of a coin in a dark pocket.
It is feeling the texture of a strawberry and knowing, without looking, exactly how much pressure will hold it and how much will turn it into jam.
In other words, dexterity = contact intelligence. It’s the last-inch problem in robotics.
You can hand a person an oddly shaped tool, and they’ll figure out how to grasp and use it. We don’t need prior training to hold a new object. We just do it.
Robots cannot do this. They are specialists, excellent at one task and brittle everywhere else. That gap is captured by Moravec’s Paradox, the observation that tasks humans find effortless, like picking up a coin or walking, are brutally hard for machines, even though they are superhuman at other things like mathematics.
You see, logic is computationally cheap. Running a chess algorithm requires very little energy. But the sensorimotor computation required to tie a shoelace is much more expensive.
The hard problems are easy, and the easy problems are hard.
Evolution spent hundreds of millions of years optimizing the motor control of our body before it added the thin layer of cortex we celebrate as intelligence. The neocortex is young. Hands are ancient. Reverse-engineering the hand into a metal claw is fiendishly hard.
The mistake is thinking this is one problem. Dexterity is actually a stack of problems, and most of them live below the level of thought.
Man, why is it so hard to build a “hand”?
Robotic dexterity is where several hard problems converge. The first one starts with anatomy.
Bottleneck #1: Actuators & Packaging
Hold your forearm and wiggle your fingers. You will feel movement, but not where you expect. The muscles doing the work are not in your hand at all. They sit up in the forearm. Force travels through long tendons, essentially biological cables, down into the fingers.
Evolution made the hand light, fast, and low-inertia. You can accelerate, decelerate, and reverse direction without fighting your own mass. The human hand is a masterpiece of remote actuation.
When engineers try to build a robot hand, they have three broad options:
Motors in the joints: Putting electric motors in the knuckles gives precise control, but it makes the fingers heavy. Added mass raises inertia, which limits speed and makes high-force motions dangerous.
Cable-driven: Motors live in the forearm, and tendons or cables route force to the fingers. This looks closer to biology, and companies like Tesla and Shadow Robot have pursued it seriously. The tradeoff is control complexity. Cables stretch, wear, slip, and introduce friction that changes over time. Compensating for that variability requires constant estimation, and small errors compound quickly at the fingertips.
Hydraulics: Pressurized fluid can deliver enormous force from compact actuators. The cost is system overhead. Pumps, valves, seals, and reservoirs add weight, failure modes, and latency. Controlling fluid flow with millisecond precision, the kind needed to catch a falling glass, pushes the limits of real-time control.

Robots need both Power & Control to manipulate objects well. Data sources: Science Robotics, Frontiers, Science Robotics (Comparison), Actuators MDPI, Maxon Group, Advanced Materials
Bottleneck #2: Latency
Pick up a paper cup, and you know immediately whether it is empty or full. If it starts to slip, you feel it before it falls.
Human dexterity depends on fast, local control loops that sit below awareness. The hand adjusts grip force continuously, correcting for slip, vibration, and micro-misalignment in tens of milliseconds. By the time awareness catches up, the problem is already handled.
Compare how much data the nervous system takes in versus how little consciousness can handle:
Sensory systems deliver approximately 1 billion bits per second across vision, touch, and proprioception.
Conscious awareness: just 10–50 bits per second.
In other words, the nervous system filters and discards most sensory input, letting only a thin summary reach awareness. When a mug slips, consciousness may form one simple instruction, “grab it,” while the real work happens elsewhere through rapid corrective loops outside awareness. That is the “ancient intelligence” my grandma had.
Historically, engineers tried routing all raw sensory data to the robot's central brain. That design choice fights biology head-on. Human dexterity depends on peripheral intelligence. If a robot hand has to ask its head what to do every time friction changes, it will always move too slowly.
Vision-Based Tactile Sensor Latency: These tactile sensors use cameras to read surface deformation. Great detail, but with frame rates of 30–60 Hz, they are too slow to catch a micro-slip. Startups like Daimon Robotics (backed by Lenovo) and GelSight are pushing for higher frame rates and thinner form factors.
Inference Latency: Vision-Language-Action (VLA) models excel at generalization because they learn concepts from massive datasets. But they are slow, thinking in hundreds of milliseconds. VLAs should decide goals at 2–10 Hz. Hands need reflex control at 200–1000 Hz.

Comparison of signal processing and reaction latencies. Human spinal reflexes operate on a ~30ms loop. In contrast, modern VLA models introduce latencies of 200-500ms. Data sources: NCBI, PubNub, Reddit (AskScience), Robotics SE, Arxiv, Rohit Bandaru
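To make that timing split concrete, here is a minimal sketch of the two-rate architecture it implies, with all rates, gains, and interfaces invented for illustration: a slow planner refreshes the grip target a few times per second, while a local loop corrects for slip at roughly 1 kHz without waiting for it.

```python
import time

PLANNER_HZ = 5      # "what to do" loop, VLA-speed: 2-10 Hz
REFLEX_HZ = 1000    # "how to do it" loop, reflex-speed: 200-1000 Hz

def plan_grip_target(scene):
    """Stand-in for a slow vision/VLA planner that takes hundreds of milliseconds."""
    return 2.0                      # desired grip force in newtons (placeholder)

def read_slip_signal():
    """Stand-in for a fast tactile slip estimate."""
    return 0.0                      # placeholder

def send_motor_command(force):
    """Stand-in for the hand's actuator interface."""
    pass

def run(duration_s=1.0):
    target, grip, last_plan = 0.0, 0.0, 0.0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        now = time.monotonic()
        if now - last_plan > 1.0 / PLANNER_HZ:      # slow loop: refresh the goal
            target = plan_grip_target(scene=None)
            last_plan = now
        slip = read_slip_signal()                   # fast loop: react to slip every millisecond
        grip += 0.01 * (target - grip) + 0.5 * slip
        send_motor_command(grip)
        time.sleep(1.0 / REFLEX_HZ)

run()
```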
Bottleneck #3: Dimensionality
Evolution, in its brilliance, solved the balance between strength, speed, and weight with the biological muscle. An actuator that is soft, compliant, power-dense, and self-repairing. It tolerates error. It absorbs shocks. It deforms gracefully instead of breaking.
Researchers have tried to recreate this with robotic actuators, the components that serve as the muscles moving each joint. That attempt runs straight into what I sometimes think of as the original sin of robotics: adding degrees of freedom faster than we improve mechanical intelligence.
Degrees of freedom (DoF) describe how many independent ways a system can move. A simple industrial gripper has one. Open or close. A human hand has roughly 25, depending on how you count coupled joints, with multiple muscles interacting across fingers, palm, and wrist.

Robot Hand Degrees of Freedom Evolution: Path Toward Human-Level Dexterity (1980-2025)
To rotate a pen, a robot must coordinate 27 joints simultaneously while predicting how the pen will slide (friction) and how the fingers will deform (compliance).
The difficulty of controlling a system does not grow linearly with DoF; it grows exponentially. This is the curse of dimensionality. As dimensions increase, the space of possible states grows so fast that exhaustive planning becomes impossible. In principle, a robot would need to evaluate an astronomical number of possible futures each second to ensure the pen does not drop.
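A back-of-the-envelope calculation shows how quickly this blows up. Even with a crude discretization of each joint (ten bins, chosen purely for illustration), the configuration count outruns any exhaustive search:

```python
# Curse of dimensionality, back of the envelope.
# Assume each joint's angle is crudely discretized into 10 bins (an illustrative choice).
bins_per_joint = 10

for dof in (1, 3, 7, 27):     # gripper, wrist, arm, and a hand-level joint count
    states = bins_per_joint ** dof
    print(f"{dof:2d} DoF -> {states:.0e} discrete configurations")

# Output:
#  1 DoF -> 1e+01
#  3 DoF -> 1e+03
#  7 DoF -> 1e+07
# 27 DoF -> 1e+27   (no controller can enumerate futures in a space like this)
```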
Rigid objects like pens are already hard but still somewhat tractable. Deformable objects are worse. A string, a cloth, or a shoelace changes shape continuously. You cannot reliably compute every fold and twist in real time.
This forces robots to rely on intuition (learned neural networks) rather than physics equations. Neural networks approximate good behavior without fully modeling the world.
And this is why the ByteDance robot’s 83% success in threading shoelaces is so impressive. It handles a deformable object, a tiny target, frequent occlusion, and constant uncertainty, all at once, with high success.
Bottleneck #4: Perception Breakdown
Most modern robots see before they act. Vision drives everything. The system detects a cup, estimates its pose, plans a reach, and executes the grasp.
Then the fingers close.
At that moment, the robot blinds itself. The hand blocks the cameras exactly when contact begins, which is when the interaction becomes hardest and most information-rich. Vision works at a distance. Dexterity happens at contact. The hand steps in front of the eyes and the model loses the scene.
That failure mode is the occlusion problem.
To fix this, researchers are developing "Tactile Hallucination" techniques. The AI infers tactile properties, such as hardness and friction, from visual data before contact is made. Basically, it guesses what the touch should feel like.
Others, like Figure 02, are integrating cameras directly into the palm or fingertips. If you can’t trust the brain to hallucinate, you might as well put eyes in the hands.
The Robot Dexterity Stack
So now that the constraints are clear, we can shift our frame to a different question: how fast are we actually moving? This matters because it gives us an idea of the timeline for when generally useful humanoid robots will be ready for our homes.
My view is that human-level dexterity will be won by a systems stack, not a single breakthrough. Some layers will move faster with better learning. Others are dominated by physics and mechanics.
I find it helpful to think in terms of a Dexterity Stack, from the bottom up:
Materials and compliance
Actuation and power density
Tactile sensing
Reflex control loops (local, fast)
Skill learning (policies)
Task planning (reasoning)

Progress is fastest at the top of the stack. We’re moving quickly on learning and planning, because these software layers scale with data and compute. Progress is much slower at the bottom, where the battle is fought in metal, friction, heat, and wear.
General-purpose manipulation is decided at contact, and contact is dominated by the lower layers. So when you see an impressive demo, the right question to ask is: “Which layers did this actually advance?”
1) Hardware unlocks: actuators, packaging, and compliance
Mechanically replicating human hand muscles is hard.
Any robotic actuator faces trade-offs between strength, speed, precision, size, weight, and energy use. A hand design that’s strong enough might be too bulky; a design that’s compact and precise might not be durable or might require too much power. Adding more degrees of freedom increases the mechanism's fragility and complexity.
Metal and plastic are unforgiving. When contact forces misalign, something has to give, and too often it is the mechanism itself. This is why so much current work explores softer materials and new actuator designs. Artificial muscles, elastomeric joints, variable stiffness mechanisms, and hybrid soft-rigid structures all aim to reintroduce mechanical forgiveness at the material level.
Smart Springs
Human hands spend energy mainly when they move. The natural springiness helps hold a grip without much extra effort. Many robotic hands are the opposite. If a motor is holding something tightly, it often requires constant power, which generates heat, necessitating liquid-cooling systems in robots like Optimus. Research in 2025 is pivoting toward springs and adjustable stiffness so the hand can maintain a grip with much less power.
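A toy calculation shows why springs matter. If resistive heating scales with the square of motor current, letting a parallel spring carry most of the holding force cuts the power needed just to keep a grip closed. Every number below is an assumption for illustration:

```python
# Toy comparison: electrical power needed to hold a 10 N grip, with and without a parallel spring.
# All constants are illustrative assumptions, not measurements from any real hand.
hold_force = 10.0        # newtons required to keep the object gripped
force_per_amp = 5.0      # newtons of fingertip force per amp of motor current (assumed)
winding_resistance = 2.0 # ohms (assumed)

def holding_power(motor_share_newtons):
    current = motor_share_newtons / force_per_amp
    return current ** 2 * winding_resistance     # I^2 R heating while holding perfectly still

print(f"motor alone:        {holding_power(hold_force):.2f} W")       # motor carries all 10 N
print(f"spring carries 8 N: {holding_power(hold_force - 8.0):.2f} W") # motor only tops up 2 N
```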
Routing the cabling for hundreds of sensors and dozens of motors through a rotating wrist joint introduces more mechanical points of failure.
Instead of giving every touch sensor its own cable, researchers are moving to a single shared data line so you only need a few wires for power + data and can chain modules together.
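In software, the chaining idea looks roughly like this: every module answers to an address on one shared line instead of owning its own cable. The addresses, framing, and module class below are invented for illustration.

```python
# Sketch: many tactile modules sharing one data line instead of one cable per sensor.
# Addresses, framing, and the module interface are invented for illustration.

class FakeTactileModule:
    def __init__(self, taxels):
        self.taxels = taxels
    def read_frame(self):
        return [0.0] * self.taxels               # placeholder pressure readings

class SharedBus:
    def __init__(self, modules):
        self.modules = modules                   # address -> module
    def poll(self):
        # One request per chained module, all over the same shared data wires.
        return {addr: mod.read_frame() for addr, mod in self.modules.items()}

# Four fingertip modules daisy-chained on a single bus.
bus = SharedBus({addr: FakeTactileModule(taxels=16) for addr in (0x10, 0x11, 0x12, 0x13)})
frame = bus.poll()
```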
2) Systems unlocks: latency, reflexes, and local control
Edge Computing in the Palm
The hand must declare independence from the body. Currently, the brain in the robot's chest controls the fingers. This is too slow. The signal takes too long to travel.
We are seeing processors embedded directly in the palm or fingers, running high-frequency control loops locally and sending only high-level state upward. This dramatically reduces reaction time and mirrors the spinal architecture that makes human manipulation possible.
There is also a more radical approach: eliminate skin sensors entirely. Kyber Labs uses fingers built to move very freely, so when a finger bumps into something (even something as light as a feather), the motor immediately senses extra resistance. That resistance shows up as a small change in the motor’s drive current, so the controller can estimate contact force and stop right away, using the motor itself as the sensor.
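Sketched in code, the approach is a threshold on drive current; the baseline, margin, and amps-to-newtons gain below are invented, not Kyber’s actual numbers.

```python
# Sensorless contact detection, sketched: the motor's own drive current is the touch signal.
# The baseline current, margin, and force gain are illustrative assumptions.

FREE_MOTION_CURRENT = 0.12   # amps drawn when the finger swings with nothing in the way (assumed)
CONTACT_MARGIN = 0.05        # extra amps we treat as "we touched something" (assumed)

def on_current_sample(current_amps, stop_motor):
    """Runs at the motor controller's rate (kHz), far faster than any vision loop."""
    excess = current_amps - FREE_MOTION_CURRENT
    if excess > CONTACT_MARGIN:
        stop_motor()                      # reflex: halt before the planner even hears about it
        return excess * 4.0               # rough contact-force estimate in newtons (assumed gain)
    return None
```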
3) Learning unlocks: dimensionality
This is arguably the toughest problem in dexterity, and the one most likely to be eased by greater intelligence.
What learning can do is make the space navigable.
Classical method
In the early days, engineers tried to program dexterity using exact coordinates and equations. You measure the mass of the cup, the friction of the finger, and calculate exactly how much force to apply.
However, if your estimate is off by a small amount (maybe the cup is damp, maybe the surface is worn), the object drops.
We cannot define dexterity in equations because you don’t know how you do it yourself. We know more than we can tell. All these dexterous actions are pre-linguistic. That is why hand-coded manipulation systems fail.
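You can see the fragility in the textbook friction model itself. For a two-fingered pinch, the required grip force scales with one over the friction coefficient, so a small misjudgment about a damp cup is enough to drop it. A worked sketch:

```python
# Classical grasp math and why it is brittle.
# Two-fingered pinch: friction at both contacts must support the cup's weight,
#   mu * F_grip * 2 >= m * g   =>   F_grip >= m * g / (2 * mu)
g = 9.81
mass = 0.3                               # kg, a full paper cup (assumed)

def required_grip(mu):
    return mass * g / (2.0 * mu)

planned = required_grip(mu=0.5)          # force computed with the friction you assumed
needed = required_grip(mu=0.3)           # the friction you actually got: the cup was damp

print(f"planned {planned:.2f} N, needed {needed:.2f} N")
# planned 2.94 N < needed 4.91 N -> the cup slips even though the equation was "correct"
```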
Imitation & Reinforcement Learning
So, researchers moved to learning methods.
In imitation learning (IL), the robot observes a human or teleoperator performing a task many times and learns to reproduce the behavior. This dramatically reduces trial counts compared to blind exploration. The tradeoff is coverage. The robot can only learn what it sees, and human demonstrations do not map cleanly onto robotic joints, sensors, and constraints.
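At its simplest, imitation learning is behavior cloning: regress the demonstrated action from the observation. The sketch below uses synthetic data in place of real teleoperation logs, purely to show the mechanic and its coverage problem.

```python
import numpy as np

# Minimal behavior cloning sketch: fit a linear policy to (observation, action) demonstrations.
# Synthetic data stands in for real teleoperation logs.
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 12))                                  # joint angles + a tactile summary
expert = rng.normal(size=(12, 4))                                 # the demonstrator's (unknown) mapping
actions = obs @ expert + 0.01 * rng.normal(size=(500, 4))         # demonstrated motor targets

# "Training" is just regression: copy whatever mapping the demonstrations contain.
policy, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# The coverage problem: observations far outside the demo distribution are extrapolated blindly.
novel_obs = 5.0 * rng.normal(size=(1, 12))
print(policy.shape, (novel_obs @ policy).shape)
```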
Another learning method is Reinforcement Learning, where the robot explores on its own, often millions of times, discovering strategies through reward and failure rather than demonstration. This can uncover behaviors humans would never teach, but the sample cost is enormous. As a result, most reinforcement learning for manipulation happens in simulation, where failure is cheap and fast.
That leads to the next bottleneck. Skills learned in simulation often break in the real world. Physics engines are approximations. Friction is wrong. Contacts behave differently. Sensors are noisier. This is the sim-to-real gap, and it is one of the central unsolved problems in dexterous robotics.
One promising response is the rise of differentiable physics engines. Instead of treating physics as a fixed black box, these systems allow learning algorithms to differentiate through the simulator itself, adjusting physical parameters during training. NVIDIA's Newton is an example of this shift.
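A toy version of what “differentiating through the simulator” buys you: instead of guessing friction, fit it to a measured trajectory by gradient descent. A one-dimensional sliding block stands in for a full engine like Newton, and the gradient here is finite-difference rather than truly differentiable, just to keep the sketch self-contained.

```python
# Toy simulator-parameter fitting: tune a friction coefficient until simulation matches "reality".
g, v0, dt, steps = 9.81, 2.0, 0.01, 30

def final_velocity(mu):
    v = v0
    for _ in range(steps):
        v = max(0.0, v - mu * g * dt)        # Coulomb friction decelerates the sliding block
    return v

real_v = final_velocity(0.35)                # pretend this came from a real-world measurement

mu, lr, eps = 0.10, 0.05, 1e-4               # poor initial guess inside the simulator
error = lambda m: (final_velocity(m) - real_v) ** 2
for _ in range(100):
    grad = (error(mu + eps) - error(mu - eps)) / (2 * eps)   # numerical stand-in for autodiff
    mu -= lr * grad

print(f"fitted friction coefficient: {mu:.3f}")   # lands near the true value of 0.35
```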
Alongside this, a new class of models is reshaping how robots reason in high-dimensional spaces.
Vision-Language-Action models take the reasoning power of large language models and ground it in the physical world. Give a robot a natural language command, “pick up the apple,” combine it with visual input, and the model outputs a sequence of motor actions. This is where common sense finally shows up in robotics.
The tradeoff is speed. These models reason well, but they reason slowly. Typical inference takes 200 to 500 milliseconds. That is fine for deciding what to do. It is far too slow to decide how to do it in the moment.
New research like Tactile-VLA frameworks pushes this idea further by adding touch into the model’s vocabulary. Fine-tuning on tactile tokens lets the system reason about physical interaction itself, not just objects and goals. “The surface feels slippery, increase grip force” becomes a conceptual step that the model can represent.
Better VLAs do not remove dimensionality. But they make it navigable.
Where Do We Go from Here?
The mistake is to expect a single “AGI for robots” moment. Dexterity will arrive the way most real engineering arrives, through a long series of small wins that compound.
To measure progress without being fooled by demos, I’m keeping my eye out for these signs:
Stable grasps under occlusion without slowing to a crawl
Recovery from near-failure states (incipient slip, mis-grasp, collisions)
Competence with deformables (cloth, cables, food)
Hands that survive months of contact-rich work without constant recalibration
Lower power and heat for the same manipulation performance
That being said, there are a couple of areas I’m particularly excited about.
Innovation in Robotic Skins
One area where progress actually aligns with biology is robotic skin. Touch is not just another sensor. In humans, it is a fast, local, and opinionated system.
A promising direction here is Neuromorphic Electronic (NRE) Skin. Instead of streaming continuous measurements, these skins only send signals when something changes. A slip, a spike in pressure, a sudden contact. The result is bursty, event-driven feedback that looks much closer to how biological skin works. In practical terms, it means robots can receive rich touch information in microseconds, without flooding the system with data the way cameras and frame-based pressure sensors do today.
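The contrast with frame-based sensing is easy to sketch: rather than shipping every frame, an event-driven skin emits a message only when a taxel changes by more than a threshold. The threshold and array size below are illustrative.

```python
import numpy as np

# Event-driven tactile encoding, sketched: send changes, not frames.
CHANGE_THRESHOLD = 0.05                 # pressure delta (arbitrary units) that counts as an event

def frame_to_events(previous, current):
    delta = current - previous
    changed = np.flatnonzero(np.abs(delta) > CHANGE_THRESHOLD)
    # Each event is (taxel index, signed change); silent taxels send nothing at all.
    return [(int(i), float(delta[i])) for i in changed]

prev = np.zeros(256)                    # a 16x16 fingertip array, flattened
curr = prev.copy()
curr[40] = 0.4                          # a sudden local contact
print(frame_to_events(prev, curr))      # [(40, 0.4)] -> one event instead of 256 values
```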
The even bigger step is active pain perception, or “artificial pain”.
Robots today are economically blind at the point of contact. They will happily destroy a $50,000 manipulator to crush a $0.10 soda can because there is no local sense of the cost of damage. Everything is optimized for task success.
This is why artificial pain matters. It’s a missing control layer.
When local stress or pressure crosses a safety threshold, the skin can trigger an immediate withdrawal reflex without waiting for a central controller.
This is still a faraway idea, but such fingertip sensors would not just detect pressure; they could assign a real-time depreciation value to the hardware. The controller learns that some actions are expensive even if they succeed. If the mechanical stress exceeds a localized budget, the finger retracts. This is the financialization of robot reflexes.
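As a control layer, the idea reduces to a local check that never waits for the central planner: compare fingertip stress to a budget, retract if it’s exceeded, and report the estimated cost of the damage. The thresholds and cost model below are invented.

```python
# "Artificial pain" as a local reflex, sketched. Thresholds and costs are invented.
STRESS_LIMIT_MPA = 40.0      # above this, the fingertip starts accumulating damage (assumed)
WEAR_COST_PER_MPA = 0.02     # estimated depreciation, in dollars, per MPa over the limit (assumed)

def pain_reflex(fingertip_stress_mpa, retract_finger):
    overload = fingertip_stress_mpa - STRESS_LIMIT_MPA
    if overload <= 0:
        return 0.0                           # no pain signal, carry on
    retract_finger()                         # the reflex fires locally, no round trip to the planner
    return overload * WEAR_COST_PER_MPA      # tell the learner what that contact just cost
```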
On a parallel track, companies like Prophesee are embedding event-based cameras (Metavision) into tactile sensors (like GelSight). These can detect micro-vibrations and adjust grip force microseconds before an object falls.
GPT for Touch
I believe that eventually, manipulation will shift from just being able to see more, to predicting contact.
The natural endpoint is a Large Tactile Model, a foundation model trained not on words or images, but on force, vibration, slip, and temperature. Instead of predicting the next token, it predicts the next moment of contact. Given an action and a context, it estimates how forces will evolve over the next few milliseconds.
With enough physical experience, such a model does not need to have seen every object before. A sponge looks squishy. A paper cup looks fragile. The system learns the priors that humans carry implicitly.
In practice, this will look like a hierarchy:
Vision and language decide what to do next (slow, semantic, 2–10 Hz)
Touch decides how to do it safely at contact (fast, reactive, 200–1000 Hz)
The tactile model becomes a subsystem inside a larger robotic foundation model. Not the brain, but the part that prevents the brain from ruining the task at the last inch.
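In code, that division of labor might look like the sketch below. Everything is speculative: the priors, rates, and interfaces are invented to show the shape of the hierarchy, a slow layer that proposes and a fast tactile model that predicts the next instant of contact and adjusts.

```python
# Speculative sketch of the hierarchy: slow semantics proposes, fast tactile prediction corrects.

def semantic_layer(image, instruction):
    """Runs at 2-10 Hz and decides *what* to do."""
    return {"skill": "grasp", "target": "paper_cup", "grip_force": 3.0}

def tactile_model(proposed_force):
    """Runs at 200-1000 Hz and predicts *how* the next milliseconds of contact will evolve."""
    predicted_slip = max(0.0, 0.5 - 0.2 * proposed_force)    # invented prior: light, slippery object
    predicted_crush = max(0.0, proposed_force - 5.0)         # invented prior: cups buckle past ~5 N
    return predicted_slip, predicted_crush

plan = semantic_layer(image=None, instruction="pick up the cup")
force = plan["grip_force"]
for _ in range(1000):                         # one second of contact-rate corrections
    slip, crush = tactile_model(force)
    force += 0.05 * slip - 0.05 * crush       # tighten on predicted slip, ease off on predicted crush
print(f"settled grip force: {force:.2f} N")
```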
Companies like Mimic are explicitly pursuing this idea, aiming to encode manipulation intuition for industrial environments, even if true human-level dexterity remains aspirational.
Dexterity is also about comfort (warmth)
Dexterity isn’t only control. It’s also how contact feels to humans.
A robot can apply the correct pressure, match the right texture, even mirror the timing of a human handshake, and still fail the interaction if the hand is cold. A surface at room temperature makes human skin recoil immediately. It feels wrong. Not mechanical, but dead.
That response is biological. Warmth signals life.
If humanoids ever move into homes, I expect comfort engineering to become a real design axis. Waste heat from batteries, motors, or processors will be routed through the palm and fingers to keep surface temperature near the human range. Not for performance, but for trust. When touch enters social space, thermodynamics becomes part of the UI.
Field Guide: The Current Ecosystem
Despite the bottlenecks, dexterity is progressing fast. But it’s not progressing in one single direction. It’s splitting into a few distinct design philosophies.
The easiest way to stay oriented is to stop asking “who has the best hand,” and start asking:
What trade-off are they choosing?
Power vs bandwidth, sensing vs simplicity, compliance vs precision, product reliability vs research flexibility.

1) Industrial pragmatists: “Enough dexterity to ship”
In factories, partial dexterity is good enough. Companies focus on simpler grippers combined with smart AI. They’re not fully dexterous by human standards, but they push the envelope in automation by handling a range of objects with minimal reprogramming.
So the winning pattern in industrial automation is not ultra-dexterous hands. It’s simple end-effectors + strong perception + fast integration.
You can see the ceiling of this philosophy in extremely compact industrial arms that stay stable even through kinematic singularities, where many systems become jittery or unstable. Just like this one:
MIRO U: This is a weird, superhumanoid industrial robot that looks less like a person and more like Doctor Octopus from Spider-Man. That’s because it has six arms. It works by rolling quickly through factories on a stable wheel-leg base to multitask.
2) Humanoid generalists: “We need hands that survive the real world”
Humanoids inherit the worst constraint of all: they are expected to touch everything.
Bins, shelves, cables, packaging, soft goods, tools, doors, handles, carts. And they need to do it while their hands block their own cameras, while the objects vary day to day, and while humans stand nearby.
That forces different design choices. You start to see teams converge on three requirements:
more degrees of freedom (so the hand can adapt)
more sensing (so contact isn’t blind)
more mechanical forgiveness (so small errors don’t become failures)

Heatmap of 8 robots across various metrics. No single robot dominates all metrics.
Here are four representative bets:
Tesla Optimus: Tesla moved actuators to the forearm in Gen 2 and increased complexity to 22 DoF in Gen 3, with 25 actuators per hand. That’s a nearly 200% increase in complexity from previous prototypes. The fingertips also have a multi-zone feature, which lets the hand detect not just pressure but also texture, slippage, and friction.
Sanctuary AI Phoenix: Sanctuary is one of the strongest “hydraulics can win” arguments in the market. Hydraulics can deliver serious force in small volumes, which matters for compact hands. Their bet is that if you can control it, you can unlock fast, high-strength in-hand manipulation that electric systems struggle with.
Figure: Figure is taking a brutally practical route. It added palm cameras so the hand can still see when head cameras are occluded. It also uses soft goods (textiles/rubbers). Soft fingers are forgiving; if the grasp is 1mm off, the finger squishes rather than fails.
Clone Robotics: Takes biomimicry to the extreme with Hydraulic Artificial Muscles, which offer compliance and impact tolerance that rigid geartrains can’t match. However, artificial muscles are difficult to model and less efficient than direct-drive electric systems due to thermodynamic losses. The company released its full-body "Protoclone V1" in early 2025, signaling a move toward full musculoskeletal androids.
3) Touch-first specialists: “Dexterity is sensing.”
A different group is betting that the true bottleneck is not degrees of freedom, but contact observability.
If you can’t detect slip early, measure micro-vibrations, or read force distribution in the fingertip, then the control policy is always acting late. And late is how objects drop.
Sharpa: The SharpaWave hand is a clean example of dense tactile ambition: arrays in the fingertips that can pick up subtle pressure shifts, paired with a high 22 DoF. The aim is not just grasping, but controlled interaction and faster correction when contact changes.
SpikeATac (Columbia Engineering ROAM and CLUE labs): They put two complementary touch signals into one fingertip: one produces fast spikes when contact starts or breaks, and the other measures steady pressure. This pairing lets the robot react immediately upon contact and stop within 2mm, enabling fast yet gentle handling of fragile objects.
Psyonic: Psyonic’s Ability Hand is a bionic hand for amputees, but it’s also useful as a model for robotic touch. It has pressure sensing and sends vibration feedback to the arm. Such high-fidelity pressure data can drive quick reflexes, like tightening when it detects a slip.
4) Minimalist reflex hands: “Don’t instrument everything, infer contact.”
The opposite philosophy is equally interesting: avoid complex tactile skins altogether.
Instead, use the actuator itself as the sensor.
If a finger is free-moving, any contact results in a change in resistance and motor current. That can be enough to trigger a fast reaction without high-resolution touch arrays.
Kyber Labs: Their demos are a strong case study in reflex design: high-speed motion, immediate stop on feather-light contact. This video demo of it rotating a nut is quite fascinating. Instead of fingertip sensors, it detects contact by monitoring motor current (a change in resistance shows up as a current change). It is meant to be low-cost and factory-ready on standard robot arms.
5) Research workhorses: “Make it survive training.”
Finally, there’s the category that quietly powers the whole ecosystem: hands built for researchers.
The constraint here is brutal. Reinforcement learning and long-horizon manipulation training produce endless crashes, slips, collisions, and resets. Fragile hands die.
Shadow Robot: Best known for its five-fingered Shadow Dexterous Hand, Shadow also builds the three-fingered DEX-EE, designed to survive the repeated collisions and wrenching of fingers common during reinforcement learning experiments. In practice, it has become famous for “stress-test” style durability stories. Downtime is expensive in large-scale training.
Conclusion
Dexterity won’t be solved by a single breakthrough. It will be solved by a compounding stack.
Better sensors make learning easier. Better actuators make control safer. Better models turn messy hardware into something usable. Each layer lifts the others. Progress compounds.
Matthew Crawford wrote that working with machines subjects you to a judgment outside your own mind. You can lie with words. You can fool yourself with vision. But you cannot negotiate with a stripped bolt. Touch is that kind of truth. Immune to any kind of marketing.
We’ve built models that thrive in the world of bits, where everything is cheap, reversible, and infinite. But we live in the world of atoms, where friction is real. And mistakes are expensive.
So yes, we’re getting good at building machines that can reason like us.
We’re still learning to build machines that can grasp like children.
What comes next will look unglamorous in real time. Better skins. Faster reflex loops. More compliant hands. Controllers that know when to back off. A thousand small fixes that turn “impressive demo” into “reliable worker.”
Today: folding towels.
Tomorrow: threading cables in a crowded bin.
Eventually: fixing a leaky pipe under your sink, then telling you exactly why it failed.
Cheers,
Teng Yan & Ravi
This is the first piece in our ongoing robotics research series. More deep dives are coming.
PS. If you’re building or investing in AI/robotics, this is the stuff people usually learn the hard way. Forward this to the one person on your team who should be thinking about this.
I also write a newsletter on decentralized AI and robotics at Chainofthought.xyz.
