Humanoid whole-body controllers: HumanPlus, OmniH2O, HOVER, ASAP, BeyondMimic

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (15 min)

WBC (Whole-Body Control) — A policy that controls all joints of a humanoid simultaneously: legs (locomotion) + torso (balance) + arms (manipulation) + head (gaze). Typically 25–35 DoFs.
HumanPlus — Stanford 2024. Teleoperation-driven humanoid policies. Pose-tracking from human video.
OmniH2O — CMU 2024. Universal humanoid teleoperation: maps human motion to robot motion via a shared latent.
HOVER — NVIDIA 2024. Distillation framework: train multiple specialist policies (for example, dancing), then distill them into one generalist.
ASAP — CMU / NVIDIA 2025. Aligns the simulated and physical worlds via online residual learning: a learned residual corrects, on the fly, for what the simulator gets wrong.
BeyondMimic — Berkeley / Stanford 2025. Latent diffusion over motion repertoires; classifier guidance for combining skills. Generalizes to unseen motion combinations.
Motion retargeting — Map human MoCap motion to robot joint trajectories (the robot's geometry differs from a human's).
AMP (Adversarial Motion Priors) — A discriminator learns to tell the policy's motion from reference human motion; the policy is rewarded for fooling it, which regularizes it toward human-like movement.
Reference motion — A target trajectory (from MoCap or animation) that the policy is rewarded for matching.
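To make the "reference motion" entry concrete, here is a minimal sketch of a tracking reward. It assumes an exponential-of-squared-error form, which is common in motion-imitation work but is not taken from any of the five papers; the name `tracking_reward` and the tolerance `sigma` are illustrative.

```python
import math

def tracking_reward(q, q_ref, sigma=0.5):
    """Reward in (0, 1]; equals 1.0 when the pose matches the reference exactly.

    q, q_ref: sequences of joint angles in radians (same length).
    sigma: hypothetical tolerance; larger values forgive larger errors.
    """
    if len(q) != len(q_ref):
        raise ValueError("q and q_ref must have the same number of joints")
    sq_err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-sq_err / sigma ** 2)

q_ref = [0.0, 0.3, -0.6]                         # reference pose, three joints
perfect = tracking_reward(q_ref, q_ref)          # exact match
off = tracking_reward([0.1, 0.3, -0.6], q_ref)   # one joint off by 0.1 rad
```

The exponential keeps the reward bounded, so a single badly tracked frame cannot dominate the return. Real controllers usually sum several such terms (joint angles, end-effector positions, root velocity) with separate tolerances.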

Real-world analogy

If quadruped policies are "motorcycles" (4 contacts, low CoM, simpler), humanoid WBC is "unicycles" (2 contacts, high CoM, vastly harder to balance). Two years of progress (2024–2025): HumanPlus = first prototype; OmniH2O = better teleoperation; HOVER = generalist via distillation; ASAP = actively closes the sim-to-real gap; BeyondMimic = composes new motions.

Hour 1 — Reading (pick 3 of 5)

Read abstracts + figures of all 5; deep-dive any 3 (~50 min total):

HumanPlus: https://humanoid-ai.github.io/ (~10 min)
OmniH2O: https://omni.human2humanoid.com/ (~15 min)
HOVER: https://hover-versatile-humanoid.github.io/ (~15 min)
ASAP: https://agile.human2humanoid.com/ (~15 min)
BeyondMimic: https://beyondmimic.github.io/ (~20 min)

Hour 2 — Comparison table

Create docs/day39_humanoid_wbc.md:

# Humanoid WBC comparison

| Method | Year | Org | Robot | Data source | Training approach | Key idea |
|---|---|---|---|---|---|---|
| HumanPlus | Jun 2024 | Stanford | H1 | Human video | PPO + AMP | Imitate human motion via 6D pose |
| OmniH2O | 2024 | CMU | H1 | Teleop + sim | RL | Universal "human → humanoid" map |
| HOVER | Late 2024 | NVIDIA | H1 / G1 | Many specialists | Distillation | Multi-task generalist via distillation |
| ASAP | Mar 2025 | CMU / NVIDIA | H1 / G1 | Sim + real residual | RL + residual MLP | Learn the sim-to-real *delta* online |
| BeyondMimic | Aug 2025 | UCB / Stanford | H1 / G1 | MoCap library | Latent diffusion | Compose unseen motions via classifier guidance |

## What each adds over the previous

- HumanPlus → OmniH2O: better teleop topology, robot-agnostic
- OmniH2O → HOVER: multi-task generalization via distillation
- HOVER → ASAP: fix sim-to-real gap explicitly with residual model
- ASAP → BeyondMimic: compose multiple motion skills, not just retarget one
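
The "residual model" step can be illustrated with a deliberately tiny sketch. It swaps ASAP's learned residual network for a one-parameter least-squares fit on a made-up 1-D system; `sim_step`, `real_step`, and the 80% actuator strength are all assumptions chosen for the toy, not details from the paper.

```python
DT = 0.1  # fixed control step used throughout the toy

def sim_step(x, a):
    """Idealized simulator: the action is applied at full strength."""
    return x + a * DT

def real_step(x, a):
    """Stand-in for hardware: the actuator delivers only 80% of the command."""
    return x + 0.8 * a * DT

# Collect (action, delta) pairs, where delta = real next state - sim next state.
actions = [-1.0, -0.5, 0.5, 1.0, 2.0]
deltas = [real_step(0.0, a) - sim_step(0.0, a) for a in actions]

# Fit delta ~= k * a by one-parameter least squares (closed form).
k = sum(a * d for a, d in zip(actions, deltas)) / sum(a * a for a in actions)

def corrected_sim_step(x, a):
    """Simulator plus the fitted residual."""
    return sim_step(x, a) + k * a

# Evaluate on an action that was not in the fitting set.
gap_before = abs(real_step(0.0, 1.5) - sim_step(0.0, 1.5))
gap_after = abs(real_step(0.0, 1.5) - corrected_sim_step(0.0, 1.5))
```

The point is only the shape of the idea: measure where the simulator and hardware disagree, fit a model of that disagreement, and add it back into the simulator before further training.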

## Common ingredients (the recipe)

1. Reference motion (MoCap or human video, retargeted)
2. Reward = motion tracking + alive bonus + smoothness
3. Domain randomization (Day 25) — universal
4. Teacher-student (Day 27) — universal except BeyondMimic
5. Action filtering (low-pass on actions) for hardware
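
Ingredient 5 can be sketched as a first-order low-pass (exponential moving average) filter on the policy's actions. This is one common choice, assumed here for illustration; the class name `ActionLowPass` and the value of `alpha` are hypothetical, and the individual papers may filter differently.

```python
class ActionLowPass:
    """First-order (exponential moving average) filter applied per joint."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # 1.0 = no filtering; smaller = smoother
        self.prev = None     # last filtered action

    def __call__(self, action):
        if self.prev is None:
            self.prev = list(action)   # pass the first action through unchanged
        else:
            self.prev = [self.alpha * a + (1.0 - self.alpha) * p
                         for a, p in zip(action, self.prev)]
        return list(self.prev)

filt = ActionLowPass(alpha=0.5)
raw = [[0.0], [1.0], [1.0], [1.0]]     # a step command on one joint
smoothed = [filt(a)[0] for a in raw]   # [0.0, 0.5, 0.75, 0.875]
```

A smaller `alpha` suppresses high-frequency jitter that can excite hardware, at the cost of added lag, so the filter is normally also applied during training so the policy learns with it in the loop.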

## When to use what

- I want a humanoid to dance: BeyondMimic (multi-skill latent diffusion)
- I want a humanoid that walks like a particular person: HumanPlus
- I want a generalist humanoid for many tasks: HOVER
- I deploy on hardware and have a sim2real gap: ASAP
- I do teleop demos: OmniH2O

LAB

Hour 3 — Lab: replicate a BeyondMimic-style latent-diffusion classifier guidance toy (60 min)

What you're building. A minimal version of BeyondMimic's core trick: train a small VAE on a few motion clips, then combine two classifier-guidance terms at sample time to produce hybrid motions (e.g. "walk + raise arm").

This is conceptual — not full BeyondMimic. The full method needs MoCap data and an Isaac Lab humanoid env.
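
Before touching any motion data, the guidance idea can be seen in a 2-D toy. The sketch below makes several loud assumptions: the trained VAE and diffusion model are replaced by an analytic standard-normal prior over a 2-D latent, sampling uses Langevin dynamics rather than a learned reverse-diffusion process, and the two "classifiers" (`WALK`, `RAISE_ARM`) are hand-written logistic functions. None of these names or numbers come from BeyondMimic.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def grad_log_classifier(z, w, b):
    """Gradient of log sigmoid(w.z + b) with respect to z."""
    s = sum(wi * zi for wi, zi in zip(w, z)) + b
    return [(1.0 - sigmoid(s)) * wi for wi in w]

def guided_sample(classifiers, scales, steps=300, eps=0.01, rng=None):
    """Langevin sampling from prior(z) * prod_i classifier_i(z) ** scale_i."""
    if rng is None:
        rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    for _ in range(steps):
        score = [-zi for zi in z]                  # score of the N(0, I) prior
        for (w, b), g in zip(classifiers, scales):
            grad = grad_log_classifier(z, w, b)
            score = [s + g * gi for s, gi in zip(score, grad)]  # add guidance
        z = [zi + eps * si + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
             for zi, si in zip(z, score)]
    return z

def mean_latent(classifiers, scales, n=200, seed=0):
    """Average of n guided samples, to show where guidance moves the latent."""
    rng = random.Random(seed)
    samples = [guided_sample(classifiers, scales, rng=rng) for _ in range(n)]
    return [sum(s[i] for s in samples) / n for i in range(2)]

WALK = ([4.0, 0.0], 0.0)        # hypothetical classifier: prefers z[0] > 0
RAISE_ARM = ([0.0, 4.0], 0.0)   # hypothetical classifier: prefers z[1] > 0

walk_only = mean_latent([WALK], [3.0])
hybrid = mean_latent([WALK, RAISE_ARM], [3.0, 3.0])
```

With only `WALK` active, samples shift along the first latent axis and stay centered on the second; with both terms summed, they shift along both. In the full lab the prior score would come from the trained model and the latent would be decoded back into a motion clip.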

Step 1 — Get a few MoCap clips (15 min)

Use the AMASS or LAFAN1 dataset (free). Or, for a quick toy, use synthetic clips:
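
One hypothetical way to generate such clips is sketched here; it is not the curriculum's committed source. It assumes a made-up four-joint layout (two "leg" joints, two "arm" joints), and the function name `make_clip` and all amplitudes are illustrative.

```python
import math

def make_clip(n_frames=60, fps=30, freq_hz=1.0, leg_amp=0.5, arm_offset=0.0):
    """Return a clip as a list of frames; each frame is four joint angles (rad).

    Joints 0-1 stand in for legs (anti-phase sinusoids); joints 2-3 stand in
    for arms (held at a constant offset). The layout is invented for this toy.
    """
    clip = []
    for t in range(n_frames):
        phase = 2.0 * math.pi * freq_hz * t / fps
        legs = [leg_amp * math.sin(phase), leg_amp * math.sin(phase + math.pi)]
        arms = [arm_offset, arm_offset]
        clip.append(legs + arms)
    return clip

clips = {
    "walk": make_clip(leg_amp=0.5),
    "stand_raise_arm": make_clip(leg_amp=0.0, arm_offset=1.2),
}
```

Each clip is 60 frames of 4 values, which is small enough to flatten into vectors and feed to a tiny VAE on a laptop CPU.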

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.

Completion controls unlock when this day graduates from placeholder to full lab.

Papers you will re-read after this

Mobile ALOHA — bimanual mobile manipulation