LEARNING · CURRENT · 2025-09-30

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

ARCHITECTURE
RL policy with interaction-preserving retargeting
ROBOT
Unitree G1 humanoid
DATASET
8+ hours of generated trajectories
KEY METRIC
30-second long-horizon skill execution
TASK
loco-manipulation, locomotion, parkour, scene interaction

OmniRetarget tackles one of robotics' hardest problems: teaching humanoid robots complex, acrobatic whole-body skills from human motion capture. The headline result is striking. A Unitree G1 humanoid executes 30-second sequences that combine carrying a chair, climbing onto a platform, and performing a parkour roll, all trained with just 5 reward terms, simple domain randomization, and no curriculum learning. What makes this remarkable is that the deployed policy is proprioceptive-only: the robot uses no vision at execution time. The key innovation is preserving interactions. Instead of treating human-to-robot motion retargeting as a pure kinematic problem, OmniRetarget explicitly models and maintains the contact relationships between the robot, the objects it manipulates, and the terrain. This turns a single human demonstration into a data goldmine: you can automatically generate training data with different object sizes, positions, terrains, and even different robot embodiments, all while keeping the interaction semantics intact.

THE PROBLEM

Before OmniRetarget, motion retargeting (converting human movements into robot commands) was plagued by the embodiment gap. Humans and humanoid robots have fundamentally different body proportions, joint ranges, and physical capabilities. When you naively retarget human motion to a robot, you get physical disasters: feet sliding along or sinking through the floor (foot-skating and penetration), hands passing through objects, and contact artifacts that make the motion physically implausible. Existing methods such as General Motion Retargeting (GMR) and the retargeting used with Perpetual Humanoid Control (PHC) address kinematic infeasibility, but they ignore the semantic content: the actual interactions between the human, objects, and environment. A human demonstration of 'carry a box up stairs' contains rich relational information about hand-object contact and foot-ground contact that these methods simply discard. As a result, training data was used wastefully: one human recording could train only one skill, on one robot, with one object configuration. Developers had to manually create massive motion datasets or hand-craft reward functions, which made humanoid learning expensive and brittle to scale.
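A toy sketch of why pure kinematic scaling breaks interactions. The body heights, keypoints, and box position below are made-up illustrative values, and uniform scaling is only a stand-in for "naive" retargeting, not what GMR or PHC actually do. Shrinking the body to robot size keeps the feet on the floor, but the hand no longer reaches the unchanged box.

```python
import numpy as np

# Hypothetical numbers for illustration: a ~1.75 m human and a ~1.3 m humanoid.
HUMAN_HEIGHT = 1.75
ROBOT_HEIGHT = 1.30

def naive_scale_retarget(keypoints, scale):
    """Uniformly scale body keypoints about the ground point under the root."""
    root = keypoints[0].copy()
    root[2] = 0.0  # scale about z=0 so grounded points stay on the floor
    return root + scale * (keypoints - root)

# Human keypoints (x, y, z) in meters: pelvis, right hand, right foot.
human = np.array([
    [0.0, 0.0, 1.00],   # pelvis
    [0.45, 0.0, 1.00],  # right hand, touching the box face at x = 0.45
    [0.0, -0.1, 0.0],   # right foot on the ground
])
box_face_x = 0.45  # the object is not rescaled, so its face stays where it was

robot = naive_scale_retarget(human, ROBOT_HEIGHT / HUMAN_HEIGHT)
hand_gap = box_face_x - robot[1, 0]
print(f"hand-to-box gap after naive scaling: {hand_gap:.3f} m")
```

The foot contact survives only because the scaling pivot was placed on the floor; the hand-object contact is lost because nothing in the scaling step knows the box exists.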

HOW IT WORKS

1

Interaction Mesh Construction

OmniRetarget represents the scene as an interaction mesh: a unified geometric structure built over points on the agent's body, the manipulated objects, and the terrain, which explicitly tracks the spatial relationships among them. Instead of converting joint angles independently, the system models contact relationships: which hand touches the object, which foot contacts the ground, and what the relative geometry should be. Preserving this mesh drives the retargeting, so if a human's hand grasped a box at a specific location and angle, the robot's retargeted motion preserves that same grasp relationship, while contacts and other kinematic requirements are enforced as hard constraints. This is fundamentally different from prior work that either ignored interactions or handled them only as soft objectives that could be violated.
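A minimal sketch of the interaction-mesh idea, under simplifying assumptions. The original interaction-mesh formulation connects body, object, and terrain points with a Delaunay tetrahedralization; to stay dependency-free, this sketch substitutes a symmetric k-nearest-neighbor graph. The keypoints are hypothetical. Each point's Laplacian coordinate (its offset from the mean of its neighbors) is what encodes relationships such as "this hand sits just in front of this box face".

```python
import numpy as np

def build_interaction_graph(points, k=3):
    """Symmetric k-nearest-neighbour graph over body, object, and terrain points.

    Simplified stand-in for the Delaunay tetrahedralization used by
    interaction-mesh methods.
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)  # symmetrize: edges are undirected
    return [sorted(nb) for nb in neighbors]

def laplacian_coordinates(points, neighbors):
    """delta_i = p_i - mean(neighbours of p_i): each point relative to its neighbourhood."""
    return np.array([points[i] - points[nb].mean(axis=0) for i, nb in enumerate(neighbors)])

# Hypothetical keypoints (meters): two hands, the pelvis, and two points on a box.
pts = np.array([
    [0.45, 0.15, 1.0],   # left hand
    [0.45, -0.15, 1.0],  # right hand
    [0.00, 0.00, 1.0],   # pelvis
    [0.50, 0.15, 1.0],   # box, left face
    [0.50, -0.15, 1.0],  # box, right face
])
nbrs = build_interaction_graph(pts, k=2)
deltas = laplacian_coordinates(pts, nbrs)
```

Because each hand's nearest neighbor is a box point, the hand's Laplacian coordinate directly encodes the hand-box offset. These coordinates are also unchanged when the whole scene is translated, which is why they describe relationships rather than absolute positions.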

2

Laplacian Deformation with Kinematic Constraints

Given the interaction mesh, OmniRetarget solves a constrained optimization problem: deform the human's mesh into the robot's by minimizing Laplacian deformation (preserving local geometric structure) while enforcing hard kinematic constraints. Minimizing Laplacian deformation keeps the motion smooth and natural, because local neighborhoods keep their shape even though the global skeleton changes. At the same time, the system enforces robot joint limits, prevents feet from penetrating the terrain, and keeps hands in their contact relationships with objects. This produces kinematically feasible trajectories that a real robot can actually execute, eliminating the foot-skating and penetration artifacts that plague naive retargeting. The optimization is non-convex and solved per frame, yet the authors made it efficient enough to process over 8 hours of motion data.
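A simplified sketch of Laplacian deformation with hard constraints. OmniRetarget optimizes robot joint configurations through the robot's kinematics, with inequality constraints such as joint limits and non-penetration. This sketch instead assumes free 3D points and equality-only constraints (pinned contact points). That assumption reduces the problem to linear least squares: the pinned points are substituted exactly, and the remaining points are chosen to best reproduce the source mesh's Laplacian coordinates. The limb positions below are hypothetical.

```python
import numpy as np

def laplacian_matrix(neighbors):
    """Uniform-weight Laplacian: (L @ P)[i] = P[i] - mean(P[neighbors[i]])."""
    n = len(neighbors)
    L = np.eye(n)
    for i, nb in enumerate(neighbors):
        L[i, nb] -= 1.0 / len(nb)
    return L

def deform_with_hard_constraints(source, neighbors, fixed_idx, fixed_pos):
    """Minimize ||L X - L source||^2 subject to X[fixed_idx] = fixed_pos exactly."""
    n = len(source)
    L = laplacian_matrix(neighbors)
    delta = L @ source  # Laplacian coordinates of the source (human) mesh
    free_idx = np.setdiff1d(np.arange(n), fixed_idx)
    # Move the known (constrained) columns to the right-hand side, solve for the rest.
    rhs = delta - L[:, fixed_idx] @ fixed_pos
    x_free, *_ = np.linalg.lstsq(L[:, free_idx], rhs, rcond=None)
    result = np.empty_like(source, dtype=float)
    result[fixed_idx] = fixed_pos
    result[free_idx] = x_free
    return result

# A 4-point "limb": shoulder, elbow, wrist, hand (hypothetical positions in meters).
src = np.array([[0.0, 0, 1.4], [0.25, 0, 1.2], [0.45, 0, 1.1], [0.55, 0, 1.1]])
nbrs = [[1], [0, 2], [1, 3], [2]]
# Pin the shoulder lower (a shorter robot) while keeping the hand on the object.
out = deform_with_hard_constraints(
    src, nbrs, np.array([0, 3]), np.array([[0.0, 0, 1.1], [0.55, 0, 1.1]]))
```

Eliminating the pinned variables guarantees the contact constraints hold exactly, which mirrors the "hard constraint" role described above; the elbow and wrist absorb the change in shoulder height while keeping their local shape as closely as possible.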

3

Systematic Data Augmentation from Interaction Semantics

Because OmniRetarget preserves the underlying interaction structure (not just joint angles), it can automatically generate diverse training data from a single demonstration. Given one recording of a human carrying a box, OmniRetarget captures the interaction (hand-object and foot-ground contact patterns). It then automatically generates new training data by varying the object's initial pose (rotated 45°, translated left or right), the object's size (smaller or larger), the terrain height (0.8× to 1.2× scale), and even the robot embodiment (for example, Unitree H1 or Booster T1). Each augmented trajectory preserves the core interaction semantics while adapting to the new configuration. This is genuinely powerful: one human demonstration becomes dozens of robot training trajectories automatically.
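A minimal sketch of the scene side of this augmentation, assuming the object and terrain are represented as point sets. The grid values (including the ±0.2 m sideways shift) are hypothetical choices loosely following the ranges quoted above. As we understand it, each variant would then be retargeted by re-solving the interaction-mesh optimization with the original human motion as the source and the modified scene as the target; that solve is omitted here.

```python
import numpy as np
from itertools import product

def yaw_rotation(deg):
    """Rotation matrix about the vertical (z) axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def augment_object(obj_pts, yaw_deg=0.0, shift_y=0.0, size_scale=1.0):
    """Scale the object about its centroid, yaw it about the world z-axis, then shift sideways."""
    center = obj_pts.mean(axis=0)
    scaled = center + size_scale * (obj_pts - center)
    rotated = scaled @ yaw_rotation(yaw_deg).T
    return rotated + np.array([0.0, shift_y, 0.0])

def scale_terrain(terrain_pts, height_scale):
    """Scale terrain heights (z) only, e.g. a platform 0.8x to 1.2x as tall."""
    out = terrain_pts.astype(float).copy()
    out[:, 2] *= height_scale
    return out

# Hypothetical augmentation grid.
YAWS = [-45.0, 0.0, 45.0]
SHIFTS = [-0.2, 0.0, 0.2]
SIZES = [0.8, 1.0, 1.2]
HEIGHTS = [0.8, 1.0, 1.2]

def augmentation_variants(obj_pts, terrain_pts):
    """Yield (params, new_object, new_terrain) for every grid combination."""
    for yaw, shift, size, height in product(YAWS, SHIFTS, SIZES, HEIGHTS):
        params = dict(yaw=yaw, shift=shift, size=size, height=height)
        yield params, augment_object(obj_pts, yaw, shift, size), scale_terrain(terrain_pts, height)

# Hypothetical box corners and platform edge points (meters).
box = np.array([[0.5, 0.15, 1.0], [0.5, -0.15, 1.0], [0.7, 0.15, 1.0], [0.7, -0.15, 1.0]])
platform = np.array([[1.0, 0.0, 0.4], [1.5, 0.0, 0.4]])
variants = list(augmentation_variants(box, platform))
```

Even this small grid yields 3 × 3 × 3 × 3 = 81 scene variants from one recording, which is where the "one demonstration becomes dozens of trajectories" multiplier comes from.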

4

Proprioceptive RL Training with Minimal Rewards

The high-quality retargeted motion data serves as kinematic reference for reinforcement learning (RL). Instead of learning from scratch with hand-engineered rewards, the RL policy (trained with standard methods) simply learns to track these reference trajectories under physics, using only proprioceptive feedback (joint angles, velocities, and IMU readings). The authors use only 5 reward terms and 4 domain randomization parameters, shared across all tasks (parkour, carrying, climbing). This is remarkably minimal compared to typical humanoid control papers, which require task-specific reward tuning and curriculum learning. The agent learns that following the retargeted references leads to successful skill execution, and the quality of the retargeted data determines whether such tracking is physically feasible at all.
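A sketch of what a compact, task-agnostic tracking reward can look like. This summary does not list the paper's 5 reward terms, so the terms below are an assumption: a common DeepMimic-style choice of four exponential tracking kernels (body position, orientation, linear and angular velocity) plus one action-rate penalty. All scale and weight constants are hypothetical. The reward can use the reference motion even though the policy itself observes only proprioception.

```python
import numpy as np

# Hypothetical kernel widths and penalty weight; the paper's terms and constants may differ.
SIGMA_POS, SIGMA_ROT, SIGMA_LIN, SIGMA_ANG = 0.3, 0.4, 1.0, 3.0
W_ACTION_RATE = 0.1

def mean_body_error(ref, actual):
    """Mean Euclidean error across tracked bodies; arrays are (num_bodies, 3)."""
    return float(np.linalg.norm(ref - actual, axis=-1).mean())

def mean_quat_angle(ref_q, q):
    """Mean rotation angle (rad) between unit quaternions; arrays are (num_bodies, 4)."""
    dots = np.abs(np.sum(ref_q * q, axis=-1)).clip(0.0, 1.0)
    return float((2.0 * np.arccos(dots)).mean())

def tracking_reward(ref, state, action, prev_action):
    """Four exp-kernel tracking terms in (0, 1] plus one action-rate penalty."""
    r_pos = np.exp(-(mean_body_error(ref["pos"], state["pos"]) / SIGMA_POS) ** 2)
    r_rot = np.exp(-(mean_quat_angle(ref["quat"], state["quat"]) / SIGMA_ROT) ** 2)
    r_lin = np.exp(-(mean_body_error(ref["lin_vel"], state["lin_vel"]) / SIGMA_LIN) ** 2)
    r_ang = np.exp(-(mean_body_error(ref["ang_vel"], state["ang_vel"]) / SIGMA_ANG) ** 2)
    p_rate = -W_ACTION_RATE * float(np.sum((action - prev_action) ** 2))
    return float(r_pos + r_rot + r_lin + r_ang + p_rate)
```

Perfect tracking with a steady action scores exactly 4.0, and every deviation lowers the score. Nothing in these terms mentions parkour, boxes, or climbing, which is what lets one reward definition be shared across tasks.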

KEY RESULTS

Long-horizon task execution: 30 seconds

vs. typical humanoid skills at 5-10 seconds

The robot executes multi-phase tasks (carry chair → climb platform → parkour roll) continuously for 30 seconds. This demonstrates coherent long-horizon execution: the robot maintains balance, keeps control of the object, and moves dynamically across multiple phases without falling or losing track of the task.

Training simplicity: 5 reward terms, 4 domain randomization parameters, no curriculum

vs. typical humanoid papers requiring 15-20+ rewards and multi-stage curricula

The entire training pipeline needs minimal hyperparameter tuning: one shared reward structure and simple domain randomization work across all tasks (parkour, loco-manipulation, climbing, crawling). This suggests the retargeted motion data is high-quality enough that RL doesn't need extensive task-specific engineering. This is practically important: it means you can scale to new tasks without rebuilding reward functions from scratch.

Data generation scale: 8+ hours of motion trajectories

from three human mocap datasets (OMOMO, LAFAN1, and in-house capture)

OmniRetarget retargeted human motion capture data from three different datasets, producing over 8 hours of physically feasible robot trajectories. This demonstrates robustness across different human movement styles and data sources, not just curated in-house motion capture.

Contact preservation vs. baselines: better kinematic constraint satisfaction and zero foot-skating

vs. GMR and PHC baselines showing visible foot-skating and penetration

Visual comparisons on the project page show that GMR produces obvious foot-sliding artifacts and object penetration, while OmniRetarget trajectories obey non-penetration constraints and maintain contact integrity. This is the core technical contribution: making retargeting interaction-aware eliminates the physical artifacts that break RL training.
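To make "foot-skating" and "penetration" concrete, here is a sketch of two common ways to quantify them. The summary does not say how the paper measures these artifacts, so these definitions, the flat floor at z = 0, and the 2 cm contact threshold are assumptions. Foot-skating is taken as the mean horizontal foot speed over frames where the foot is near the ground; penetration is the deepest point below the floor.

```python
import numpy as np

def foot_skating(foot_pos, dt, contact_height=0.02):
    """Mean horizontal foot speed (m/s) over consecutive frames with the foot near the floor.

    foot_pos: (T, 3) world positions of one foot; z is height above a flat floor at z=0.
    """
    horiz_vel = np.linalg.norm(np.diff(foot_pos[:, :2], axis=0), axis=-1) / dt
    in_contact = (foot_pos[1:, 2] < contact_height) & (foot_pos[:-1, 2] < contact_height)
    if not in_contact.any():
        return 0.0
    return float(horiz_vel[in_contact].mean())

def max_floor_penetration(points):
    """Deepest penetration (m) below the z=0 floor over all frames and points."""
    return float(max(0.0, -points[..., 2].min()))

# Hypothetical 5-frame clips at 50 Hz: a planted foot and a sliding foot.
planted = np.zeros((5, 3))
sliding = np.array([[0.01 * t, 0.0, 0.0] for t in range(5)])
```

A correctly planted foot scores zero skating, while the sliding foot moves 1 cm per 20 ms frame, i.e. 0.5 m/s of skating that a tracking policy would have to "unlearn".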

WHY DEVELOPERS SHOULD CARE

If you're building humanoid robotics applications, OmniRetarget changes the game in two ways. First, it eases the data bottleneck. Creating robot training data has been expensive: you either hire motion capture studios, craft trajectories by hand, or run massive RL simulations. OmniRetarget lets you harvest human motion from public datasets such as LAFAN1 and OMOMO and automatically convert it into training data for multiple robot embodiments and configurations. One human recording becomes dozens of robot training scenarios. Second, it shows that motion retargeting, done correctly, is a foundational building block for humanoid learning. Prior work treated retargeting as a preprocessing step that was 'good enough', but this paper demonstrates that preserving interaction semantics during retargeting is critical. The RL policy can then focus on the control problem (tracking references under physics) rather than learning from scratch.

This matters philosophically as well: it suggests that human demonstrations contain rich structure about what skillful movement should look like, and respecting that structure (especially interaction structure) makes learning much more efficient. For developers, this means: (1) leverage human mocap data systematically instead of collecting robot-only data, (2) treat interaction constraints as first-class citizens in motion processing, and (3) don't over-engineer reward functions if your reference data is high-quality.

LIMITATIONS

OmniRetarget relies on accurate motion capture input and requires that interactions can be modeled geometrically (hand-object contact, foot-ground contact). It doesn't handle situations where the scene geometry is unknown or where complex interaction logic is needed (e.g., 'grasp the handle, not the blade'). The method also assumes that the human motion is retargetable to the target robot at all; some human movements (such as those requiring extreme flexibility) simply aren't feasible for robots, and the paper doesn't discuss how gracefully it handles such cases. Additionally, the hardware experiments use a Unitree G1 in relatively controlled environments; generalization of trained policies to other robot morphologies or to unstructured real-world scenes is untested. The proprioceptive-only policy also means the robot has no visual feedback, which limits robustness to unexpected scene variations or dynamic obstacles that the demonstration didn't cover.

WHAT COMES NEXT

The next frontier is likely making sim-to-real transfer more reliable and adding perception. Currently, OmniRetarget's policies are trained in simulation, and some performance is always lost when deploying to real robots. Combining interaction-preserving retargeting with vision-based policies (so the robot can adapt when objects or terrain don't exactly match the training distribution) would bring this approach closer to production readiness. Another direction is learning from in-the-wild human video (YouTube, TikTok) without mocap: estimating 3D human pose from video, preserving interactions, and retargeting for robots. Finally, extending interaction meshes to more complex scenarios (multi-object manipulation, human-robot collaboration, contact-rich manipulation like piano playing) could unlock even richer skill learning from human demonstrations.
