Day 41

EgoScale and the data-collection paradigm shift

This is a v1.0 placeholder page for the later curriculum arc; the full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

  • EgoScale: Feb 2026 paper / dataset (Meta + collaborators). A massive egocentric video dataset (10,000+ hours) with synchronized hand pose, gaze, and language. Targeted at training generalist VLAs.
  • Egocentric video: first-person video, e.g. from a head-mounted GoPro or smart glasses. Captures task execution from the actor's point of view.
  • Project Aria: Meta's research smart-glasses platform. EgoScale uses Aria-style devices.
  • Action labels from video: use VLMs (Day 33) to auto-label actions from egocentric video. Cheap, but less accurate than teleoperation, which records the commanded actions directly.
  • Hand-tracking model: reconstructs 3D hand pose from RGB video. The standard approach is HaMeR, which predicts the parameters of the MANO parametric hand model.
  • Why this matters: until 2025, robot data was bottlenecked by teleoperation hours (~$50/hour, and slow). Egocentric video can be collected at scale (~$0.50/hour from existing footage), roughly a 100× cost reduction.
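The cost claim is simple arithmetic. Here it is as a minimal bash sketch. It works in integer cents so that shell arithmetic stays exact. The two hourly rates are the approximate figures quoted in the glossary, and `collection_cost_cents` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Approximate hourly rates from the glossary, in cents (bash arithmetic is integer-only).
TELEOP_CENTS_PER_HOUR=5000   # ~$50/hour of teleoperation
EGO_CENTS_PER_HOUR=50        # ~$0.50/hour of egocentric video from existing footage

# Hypothetical helper: total collection cost in cents for N hours at a given hourly rate.
collection_cost_cents() {
  local hours=$1 rate_cents=$2
  echo $(( hours * rate_cents ))
}

hours=10000   # the 10,000+ hour scale quoted for EgoScale
teleop=$(collection_cost_cents "$hours" "$TELEOP_CENTS_PER_HOUR")
ego=$(collection_cost_cents "$hours" "$EGO_CENTS_PER_HOUR")

echo "teleop:     \$$(( teleop / 100 ))"
echo "egocentric: \$$(( ego / 100 ))"
echo "ratio:      $(( teleop / ego ))x"
```

Both totals scale linearly with hours, so the 100x ratio holds at any volume. What grows is the absolute gap: $500000 versus $5000 at 10,000 hours.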

Real-world analogy

Pre-EgoScale: train a chef by having them re-cook each recipe 100 times while a lab tech holds their hands and records every motion. Post-EgoScale: film professional chefs doing their job in their own kitchens, then automatically extract what their hands did. Similar data, roughly 100× cheaper.

Hour 1 — Reading

Hour 2 — Inspect EgoScale samples

If the dataset is downloadable:

huggingface-cli download facebook/egoscale-v1 --repo-type dataset --local-dir data/egoscale --include "samples/*"
ls data/egoscale/samples/
For each sample, look at:
  • The egocentric MP4 (~30 s)
  • The hand pose JSON (per-frame 3D joint positions)
  • The auto-generated action labels
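Before opening files one by one, check which clips actually have pose and label files next to them. This is a minimal bash sketch. It assumes, hypothetically, that each sample's JSON files share the MP4's stem, named either `<stem>.json` or `<stem>_<suffix>.json`; the real EgoScale layout may differ. `summarize_samples` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assumes each sample's files share a stem, e.g.
#   clip_0001.mp4, clip_0001.json, clip_0001_labels.json
# (the actual EgoScale layout may differ). For every MP4 it prints the stem
# followed by the JSON companions it found.
summarize_samples() {
  local dir=$1 mp4 stem json found
  for mp4 in "$dir"/*.mp4; do
    [ -e "$mp4" ] || continue              # glob matched nothing: no MP4s here
    stem=$(basename "$mp4" .mp4)
    found=""
    # Unmatched globs stay literal, so the -e test filters them out.
    for json in "$dir/$stem.json" "$dir/$stem"_*.json; do
      [ -e "$json" ] && found="$found $(basename "$json")"
    done
    echo "$stem:${found:- (no JSON found)}"
  done
}

summarize_samples "${1:-data/egoscale/samples}"
```

Before the hand-pose inspection, skip any clip that is reported with no JSON companion, because there is nothing to compare its video against.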

LAB

Hour 3 — Lab: extract hand pose from a 30-second clip (60 min)

What you're building. Record your own egocentric clip of about 30 s (for example, point your phone at your hands while making a sandwich). Run HaMeR, an open-source 3D hand pose reconstruction model, on it. The output is a 30-second timeline of 3D hand keypoints.
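To size the output before running anything, this bash sketch estimates how many numbers the keypoint timeline holds. The inputs are assumptions, not part of the lab spec: 30 fps phone capture, both hands visible in every frame, and the common 21-keypoint hand skeleton (wrist plus four joints per finger). `timeline_floats` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough size of the keypoint timeline for one clip.
# Assumptions (not from the lab spec): 30 fps capture, both hands visible in
# every frame, and a 21-keypoint hand skeleton (wrist + 4 joints per finger).
timeline_floats() {
  local seconds=$1 fps=$2 hands=$3 keypoints=$4
  local frames=$(( seconds * fps ))
  echo $(( frames * hands * keypoints * 3 ))   # x, y, z per keypoint
}

seconds=30 fps=30 hands=2 keypoints=21
floats=$(timeline_floats "$seconds" "$fps" "$hands" "$keypoints")
echo "frames: $(( seconds * fps ))"
echo "floats: $floats"
echo "float32 bytes: $(( floats * 4 ))"
```

Under these assumptions a 30-second clip gives 900 frames and 113,400 floats (about 450 KB as float32). That is small enough to plot or diff in full, and at 60 fps every figure doubles.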

Step 1 — Install HaMeR (15 min)

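# --recursive also pulls HaMeR's git submodules (e.g. ViTPose)
git clone --recursive https://github.com/geopavlakos/hamer
cd hamer
uv pip install -e .
huggingface-cli download geopavlakos/hamer --local-dir checkpoints/hamer
# HaMeR also needs the MANO hand model (MANO_RIGHT.pkl), which requires registering on the MANO website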

Step 2 — Record + run (30 min)

The full lab source for this step continues in the committed curriculum files.


Papers you will re-read after this