Week 6: Frontier Embodiment, Day 37
LeWorldModel, ThinkJEPA, world-model–conditioned policies
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (12 min)
- LeWorldModel — Hugging Face's 2025 world-model project. JEPA backbone + action conditioning + LeRobot integration. Designed to plug into existing imitation pipelines.
- ThinkJEPA — Late 2025 / early 2026 paper. Adds reasoning steps to JEPA prediction: instead of one-shot prediction, the model iterates internal "thoughts" before producing the final embedding.
- Latent rollout — Given obs_0 and action sequence a_0, a_1, ..., a_{T-1}, predict obs_1, obs_2, ..., obs_T entirely in embedding space. The "imagined trajectory."
- World-model–conditioned policy — Train policy via imitation (Week 3-style), but augment its observations with a JEPA-predicted "imagined" next embedding. Adds foresight at low cost.
- Auxiliary loss — Adding world-model prediction loss to the main imitation loss. Typically weighted 0.1–0.5×.
- Closed-loop vs open-loop rollout — Closed: at each step, predict next embedding from real obs + action. Open: predict from previous predicted embedding. Open-loop drifts faster.
- Drift horizon — How many steps before predicted embeddings diverge measurably from reality. JEPA models typically drift after 20–50 steps; full rollouts beyond ~100 steps are unreliable.
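The closed-loop vs open-loop distinction above can be illustrated on a toy scalar latent. This is a minimal sketch, not LeWorldModel or V-JEPA code: `true_step`, `model_step`, and their coefficients are hypothetical stand-ins, chosen only so that the learned model is slightly wrong.

```python
def true_step(z, a):
    """Hypothetical ground-truth latent dynamics (scalar, for illustration only)."""
    return 0.9 * z + a


def model_step(z, a):
    """Hypothetical learned predictor with a small systematic error."""
    return 0.95 * z + a


def rollout_errors(z0, actions):
    """Return per-step absolute errors for closed-loop and open-loop prediction."""
    # Roll the real system forward to get the ground-truth latents.
    real = [z0]
    for a in actions:
        real.append(true_step(real[-1], a))

    closed_err, open_err = [], []
    z_pred = z0  # open-loop state: only ever updated from the model's own output
    for t, a in enumerate(actions):
        # Closed-loop: predict from the real latent at step t.
        closed_err.append(abs(model_step(real[t], a) - real[t + 1]))
        # Open-loop: predict from the previous prediction.
        z_pred = model_step(z_pred, a)
        open_err.append(abs(z_pred - real[t + 1]))
    return closed_err, open_err


closed_err, open_err = rollout_errors(z0=0.5, actions=[0.1] * 30)
```

Both errors are identical at step 1, since both start from the real latent. After that the open-loop error compounds, while the closed-loop error stays at the size of a single one-step mistake.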
Real-world analogy
LeWorldModel is the "GPS for imitation policies": the policy is the driver; the world model whispers "if you keep going this way, in 2 seconds you'll see X." That is useful information that improves decisions without needing photorealistic prediction.
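The world-model–conditioned policy and auxiliary loss from the glossary can be sketched together in a few lines. This is a sketch under stated assumptions: concatenation as the way to "augment" observations, MSE as the loss, and every name and number here (`augment_observation`, `total_loss`, the toy vectors) are hypothetical and not taken from the LeWorldModel codebase. The weight of 0.25 is simply a value inside the 0.1–0.5 range quoted above.

```python
def augment_observation(obs_emb, imagined_next_emb):
    """Assumed augmentation: concatenate current and imagined-next embeddings."""
    return list(obs_emb) + list(imagined_next_emb)


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(imitation_loss, wm_loss, aux_weight=0.25):
    """Main imitation loss plus a down-weighted world-model prediction loss."""
    return imitation_loss + aux_weight * wm_loss


# Toy embeddings (made-up values).
obs_emb = [0.2, -0.1, 0.4]
imagined_next = [0.25, -0.05, 0.35]  # what the world model predicts
actual_next = [0.3, 0.0, 0.3]        # what the encoder sees one step later

policy_input = augment_observation(obs_emb, imagined_next)

# Toy actions (made-up values).
expert_action = [0.5, -0.5]
predicted_action = [0.4, -0.3]

loss = total_loss(
    imitation_loss=mse(predicted_action, expert_action),
    wm_loss=mse(imagined_next, actual_next),
)
```

The policy sees a vector twice the width of the raw embedding, and the world-model term nudges training without dominating the imitation objective.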
Hour 1 — Reading
- LeWorldModel announcement / blog post (~20 min): https://huggingface.co/blog/leworldmodel
- ThinkJEPA paper, abstract + Section 3 (~25 min): https://arxiv.org/abs/2511.xxxxx (search "ThinkJEPA arxiv")
- Predicting Latent Trajectories (DeepMind 2024 follow-up to V-JEPA): https://arxiv.org/abs/2410.xxxxx
Hour 2 — LeWorldModel codebase
cd ~/robo47-wm
git clone https://github.com/huggingface/leworldmodel
cd leworldmodel
uv pip install -e .
- Read in this order (~30 min):
  - leworldmodel/models/world_model.py — the action-conditioned predictor
  - leworldmodel/training/policy_with_wm.py — example of imitation + world-model joint training
  - examples/aloha_with_wm.py — full training script
LAB
Hour 3 — Lab: latent rollout on Day 36's clip (60 min)
What you're building. Use V-JEPA 2-AC to open-loop roll out 30 steps of imagined embeddings starting from frame 0, given 30 mock (all-zero) actions. Compare against ground-truth embeddings at each step. Measure how cosine similarity decays with horizon.
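The per-step metric is a mean of per-token cosine similarities between predicted and ground-truth patch embeddings. A dependency-free sketch of that computation follows; the helper names and toy tokens are illustrative, and the `eps` guard is an assumption added here to avoid division by zero.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def mean_token_cosine(pred_tokens, true_tokens):
    """Average cosine similarity over matching patch tokens of one frame."""
    sims = [cosine(p, t) for p, t in zip(pred_tokens, true_tokens)]
    return sum(sims) / len(sims)


# Two toy patch tokens: the first matches exactly, the second is orthogonal.
pred = [[1.0, 0.0], [0.0, 1.0]]
true = [[1.0, 0.0], [1.0, 0.0]]
score = mean_token_cosine(pred, true)
```

Here one token scores 1.0 and the other 0.0, so the frame-level score is 0.5. Averaging over tokens means a few badly predicted patches lower the score gradually and do not zero it out.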
What success looks like. You have:
1. src/day37_latent_rollout.py runnable.
2. Plot figures/day37_drift.png showing cosine similarity vs horizon (1 to 30 steps).
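3. The open-loop curve should start near 0.9 (1-step prediction is good) and decay to ~0.3–0.5 by step 30. The step where it first drops below a chosen threshold (the script draws a reference line at 0.5) is the drift horizon.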
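Once the cosine curve exists, a drift horizon can be read off it programmatically. The helper below is a hypothetical sketch: treating 0.5 as the cut-off is an assumption borrowed from the reference line in the plot, and the example values are made up, not measured.

```python
def drift_horizon(cosines, threshold=0.5):
    """Return the first 1-indexed step whose similarity is below threshold.

    Returns None if the curve never drops below the threshold.
    """
    for step, value in enumerate(cosines, start=1):
        if value < threshold:
            return step
    return None


# Made-up curve for illustration only.
example_curve = [0.90, 0.82, 0.71, 0.58, 0.47, 0.41]
horizon = drift_horizon(example_curve)
```

Applied to the `cosines` list the lab script collects, this gives a single number to compare across clips or models; a result of `None` means the rollout stayed above the threshold for all recorded steps.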
Step 1 — Implement open-loop rollout (40 min)
# src/day37_latent_rollout.py
"""Day 37: open-loop latent rollout with V-JEPA 2-AC.

Demonstrates the drift horizon: when imagined trajectories diverge from reality.
"""
from pathlib import Path

import decord
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoVideoProcessor

DEVICE = "cuda"
DTYPE = torch.bfloat16
MODEL_NAME = "facebook/vjepa2-ac-vitl16"
VIDEO_PATH = "../../w3-imitation/runs/act_aloha/eval/videos/episode_0.mp4"
FIGURE_PATH = Path("../figures/day37_drift.png")
ROLLOUT_STEPS = 30


def mean_cosine(pred, target):
    """Mean per-token cosine similarity between two (1, N, D) embeddings."""
    return F.cosine_similarity(
        pred.float().reshape(-1, pred.shape[-1]),
        target.float().reshape(-1, target.shape[-1]),
        dim=-1,
    ).mean().item()


def main():
    proc = AutoVideoProcessor.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=DTYPE).to(DEVICE).eval()

    vr = decord.VideoReader(VIDEO_PATH)
    frames = vr.get_batch(np.arange(ROLLOUT_STEPS + 1)).asnumpy()
    inputs = proc(videos=[frames], return_tensors="pt").to(DEVICE, DTYPE)

    # encode_video and predict_action_conditioned are the method names this lab
    # assumes; confirm them against the model class your checkpoint loads.
    with torch.no_grad():
        all_target_emb = model.encode_video(inputs.pixel_values)
    # Shape: (1, (T+1)*N_patches, D)

    n_patches_per_frame = all_target_emb.shape[1] // (ROLLOUT_STEPS + 1)

    def frame_emb(t):
        """Ground-truth patch embeddings of frame t."""
        return all_target_emb[:, t * n_patches_per_frame : (t + 1) * n_patches_per_frame]

    actions = torch.zeros(1, 1, 14, device=DEVICE, dtype=DTYPE)  # mock zero action

    # Open-loop rollout: start from frame 0 embedding, predict next, feed back, repeat
    current_emb = frame_emb(0).clone()
    cosines = []
    for t in range(ROLLOUT_STEPS):
        with torch.no_grad():
            next_emb = model.predict_action_conditioned(
                context_emb=current_emb, actions=actions
            )
        cosines.append(mean_cosine(next_emb, frame_emb(t + 1)))
        current_emb = next_emb  # OPEN-LOOP: feed prediction back

    print(f"Step 1 cosine: {cosines[0]:.3f}")
    print(f"Step 10 cosine: {cosines[9]:.3f}")
    print(f"Step 30 cosine: {cosines[29]:.3f}")

    # Closed-loop comparison: every 1-step prediction starts from the real embedding
    cosines_closed = []
    for t in range(ROLLOUT_STEPS):
        with torch.no_grad():
            next_pred = model.predict_action_conditioned(
                context_emb=frame_emb(t), actions=actions
            )
        cosines_closed.append(mean_cosine(next_pred, frame_emb(t + 1)))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(range(1, ROLLOUT_STEPS + 1), cosines, "o-", label="open-loop")
    ax.plot(range(1, ROLLOUT_STEPS + 1), cosines_closed, "s-", label="closed-loop (1-step)")
    ax.axhline(0.5, color="r", linestyle="--", alpha=0.5)
    ax.set_xlabel("Rollout step")
    ax.set_ylabel("Mean cosine similarity")
    ax.set_title("V-JEPA 2-AC drift: open-loop vs closed-loop")
    ax.set_ylim(0, 1)
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    FIGURE_PATH.parent.mkdir(parents=True, exist_ok=True)  # create figures/ if missing
    plt.savefig(FIGURE_PATH, dpi=120)
    print("Wrote figures/day37_drift.png")


if __name__ == "__main__":
    main()

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.