Robotics

FOBI Editorial

The robotics sector is betting on physical AI while underestimating the hardware resilience it demands.
07.06.26

Decade away

Robot action trajectory data scarcity

Foundation models for robotics face a data bottleneck: robot-relevant training data falls short of LLM-scale datasets by roughly 120,000×. Existing datasets (Open X-Embodiment, RH20T) aggregate from about a hundred thousand to just over a million trajectories; LLMs train on trillions of text tokens. This scarcity limits generalization, forces task-specific retraining, and slows embodied AI progress. Closing the gap through automated data collection, human-in-the-loop systems, and sim-to-real bridging is essential for scaling foundation models and enabling zero-shot transfer across tasks and robots.

Approaches in flight

▸ Large-scale distributed data collection and pooling

Open X-Embodiment (a collaboration between 21 institutions led by Google DeepMind) aggregated 527 skills and 1 million+ trajectories across 22 robot platforms by pooling existing datasets. Scale AI launched its Physical AI Data Engine (collecting 100,000+ hours of real-world robotics data in 2025) with a focus on semantic enrichment: every trajectory is annotated with task goals, failure modes, and success criteria. RH20T (Shanghai Jiao Tong University; 110,000+ contact-rich sequences with force/torque sensing) was open-sourced to establish public benchmarks. The bottleneck: no single company can generate enough data alone. Pooling requires standardized formats (HDF5 files, the RLDS episode format), privacy agreements, and incentive alignment. Cost: teleoperation at $100–$300 per hour of data limits scale. Recent innovation: autonomous data collection robots deployed in homes and warehouses (Scale AI, Covariant, 1X Technologies) reduce the per-hour cost to $30–$50. Challenges: diversity remains limited, since most datasets skew toward table-top manipulation rather than outdoor or contact-heavy tasks, and heterogeneous sensor modalities (RGB-D, stereo, and lidar vary per robot) complicate learning.
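To make the format-standardization point concrete, here is a minimal sketch of what pooling into a shared episode layout might look like. The `Step` and `Episode` classes, the `to_episode` and `shared_modalities` helpers, and the sample robot and sensor names are all hypothetical; the layout is only loosely inspired by step-based episode formats and is not the schema of any dataset named above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Step:
    """One timestep in a pooled trajectory (field names are illustrative)."""
    observation: Dict[str, Any]  # sensor name -> reading; keys vary per robot
    action: List[float]          # robot-specific action vector
    is_first: bool = False
    is_last: bool = False


@dataclass
class Episode:
    robot: str
    task: str
    steps: List[Step] = field(default_factory=list)


def to_episode(robot: str, task: str, raw_frames: List[Dict[str, Any]]) -> Episode:
    """Convert source-specific frames into the shared layout.

    Every key other than "action" is treated as a sensor reading.
    """
    last = len(raw_frames) - 1
    steps = []
    for i, frame in enumerate(raw_frames):
        obs = {k: v for k, v in frame.items() if k != "action"}
        steps.append(Step(observation=obs, action=list(frame["action"]),
                          is_first=(i == 0), is_last=(i == last)))
    return Episode(robot=robot, task=task, steps=steps)


def shared_modalities(episodes: List[Episode]) -> Set[str]:
    """Sensor keys present in every step of every episode."""
    keysets = [set(s.observation) for ep in episodes for s in ep.steps]
    return set.intersection(*keysets) if keysets else set()


# Two hypothetical robots with different sensor suites.
arm = to_episode("arm_a", "pick", [
    {"rgb": "img0", "depth": "d0", "action": [0.1, 0.0]},
    {"rgb": "img1", "depth": "d1", "action": [0.0, 0.2]},
])
mobile = to_episode("base_b", "navigate", [
    {"rgb": "img0", "lidar": "scan0", "action": [0.5]},
])
print(shared_modalities([arm, mobile]))  # {'rgb'}
```

In this toy case only the RGB stream survives the intersection, which illustrates why sensor heterogeneity shrinks the usable overlap between pooled datasets.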

▸ Generative learning and synthetic data augmentation

Research (2025–2026) on synthetic trajectory generation aims to create diverse robot experiences without teleoperation. World models (diffusion, latent dynamics) can roll forward predictions of novel situations; imitation learning methods such as behavioral cloning then learn from these synthetic trajectories. Results show a 30–50% reduction in required real-world data for simple tasks (grasping, pushing) when real data is augmented with realistic synthetic data. Limitations: synthetic data is only as good as the underlying world model. Prediction errors compound with every rollout step, and policies trained on the resulting corrupted trajectories diverge from real robot behavior. Generative models work best for high-level planning (where does the object go?) rather than low-level control (how much torque?). Scaling requires massive compute for diffusion models, and inference cost remains high (10–50× that of deterministic simulators). Adoption timeline: 2–3 years before synthetic augmentation becomes standard in production pipelines.
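The way world-model errors accumulate over a rollout can be illustrated with a toy example. This is a minimal sketch, assuming a scalar linear system and a learned model whose one-step gain is off by 2%; both values and the `rollout` helper are illustrative assumptions, not figures from the research described above.

```python
def rollout(x0: float, gain: float, steps: int) -> list:
    """Roll a scalar linear system x[t+1] = gain * x[t] forward."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gain * xs[-1])
    return xs


TRUE_GAIN = 1.0    # assumed ground-truth dynamics (toy value)
MODEL_GAIN = 1.02  # learned model with a 2% one-step error (toy value)

real = rollout(1.0, TRUE_GAIN, 50)
synthetic = rollout(1.0, MODEL_GAIN, 50)

# Absolute gap between the imagined and the real state at each step.
errors = [abs(s - r) for s, r in zip(synthetic, real)]

for t in (1, 10, 50):
    print(f"step {t:2d}: absolute error {errors[t]:.3f}")
```

A one-step error that looks negligible grows geometrically with horizon length in this setup, which is one reason short synthetic rollouts tend to be more trustworthy than long ones.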

▸ Teleoperation and human-in-the-loop data annotation

Human teleoperators collect high-quality demonstrations by controlling robots directly through VR headsets, haptic gloves, or AR overlays. Covariant, Physical Intelligence, and Sanctuary AI employ distributed teleoperation teams (e.g., operators in the Philippines, with data labeled in India) to collect diverse trajectories. Cost per hour is dropping, from $500 in 2020 to $50–$100 in 2025, thanks to tooling improvements. Annotation overhead (per-frame labels for task progress, error detection, and context) adds $20–$30/hour. Recent innovations make data collection more accessible to researchers: Dobb-E records household tasks via smartphone camera, and UMI captures human hand demonstrations with commodity grippers. Scale AI's semantic enrichment layer (annotating not just what happened, but why) improves downstream model quality by 20–30%. Bottleneck: human expertise is scarce (experienced roboticists earn $100k+/year). Scaling to billions of trajectories requires either cheaper labor, which raises error rates, or full automation, which today means simulation-only data with low diversity. Current pace: industry labs collect 1–10 million human-demonstrated sequences per year; matching LLM training scale would take 10–100 billion.
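The size of that shortfall is easier to see as arithmetic. The sketch below uses the collection rates and targets quoted above; the helper names and the annual doubling scenario are assumptions added purely for illustration, not a forecast from the text.

```python
CURRENT_PER_YEAR = (1e6, 10e6)  # 1-10 million sequences/year (from the text)
TARGET_TOTAL = (10e9, 100e9)    # 10-100 billion sequences (from the text)


def years_at_flat_rate(target: float, rate_per_year: float) -> float:
    """Years needed if the collection rate never changes."""
    return target / rate_per_year


def years_with_growth(target: float, first_year: float, growth: float) -> int:
    """Whole years until cumulative output reaches the target,
    assuming output is multiplied by `growth` every year (hypothetical)."""
    total, rate, years = 0.0, first_year, 0
    while total < target:
        total += rate
        rate *= growth
        years += 1
    return years


# Smallest target at the fastest rate, and largest target at the slowest rate.
best = years_at_flat_rate(TARGET_TOTAL[0], CURRENT_PER_YEAR[1])
worst = years_at_flat_rate(TARGET_TOTAL[1], CURRENT_PER_YEAR[0])
print(f"{best:,.0f} to {worst:,.0f} years at a flat collection rate")

# Hypothetical scenario: collection output doubles every year.
print(years_with_growth(TARGET_TOTAL[0], CURRENT_PER_YEAR[1], 2.0), "years (optimistic)")
print(years_with_growth(TARGET_TOTAL[1], CURRENT_PER_YEAR[0], 2.0), "years (pessimistic)")
```

At a flat rate the target is out of reach on any practical horizon, so the gap can only close if collection throughput itself keeps compounding.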

▸ Transfer learning and cross-embodiment meta-learning

Foundation models pretrained on diverse embodiments (arms, wheels, legs, grippers) aim to extract task-agnostic features (e.g., 'moving toward' applies across morphologies). GR00T N1.6 and RT-2 show modest cross-embodiment transfer (~10–30% task success when deployed on unseen robots without retraining). Significant gains come from fine-tuning on target robot data (5–50 real-world rollouts). Meta-learning approaches (MAML, Prototypical Networks) are being explored but don't yet match task-specific supervised learning. Fundamental challenge: embodiment differences (leg length, actuator speed, sensor placement) create large distribution shifts that generic pretraining doesn't fully bridge. Near-term approach: pretrain on internet-scale vision plus diverse trajectory data, then adapt quickly to specific robots and tasks (few-shot learning). This could shrink the data gap from 120,000× to perhaps 10–100× over the next 2–3 years, which would still be a major blocker for true generalist systems.
