LeWorldModel, ThinkJEPA, world-model–conditioned policies

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

LeWorldModel — Hugging Face's 2025 world-model project. JEPA backbone + conditioning + LeRobot integration. Designed to plug into existing imitation pipelines.
ThinkJEPA — Late 2025 / early 2026 paper. Adds reasoning steps to JEPA prediction: instead of one-shot prediction, the model iterates internal "thoughts" before producing the final embedding.
Latent rollout — Given obs_0 and an action sequence a_0, a_1, ..., a_{T-1}, predict obs_1, obs_2, ..., obs_T entirely in embedding space. The "imagined trajectory."
World-model–conditioned policy — Train a policy via imitation (Week 3-style), but augment its observations with a JEPA-predicted "imagined" next embedding. Adds foresight at low cost.
Auxiliary loss — Adding world-model prediction loss to the main imitation loss. Typically weighted 0.1–0.5×.
Closed-loop vs open-loop — Closed: at each step, predict the next embedding from the real obs + action. Open: predict from the previous predicted embedding. Open-loop drifts faster because each prediction inherits the errors of the one before it.
Drift horizon — How many steps before predicted embeddings diverge measurably from reality. JEPA models typically drift after 20–50 steps; full rollouts beyond ~100 steps are unreliable.
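
The closed-loop vs open-loop distinction is easier to see on a toy system. The sketch below is an illustration only: a scalar "embedding", made-up true dynamics, and a deliberately imperfect predictor. The functions true_step and model_step are hypothetical and have nothing to do with LeWorldModel or V-JEPA internals.

```python
# Toy illustration: scalar latent state, known true dynamics, slightly wrong model.
# All names and coefficients here are hypothetical.

def true_step(z, a):
    """Ground-truth latent dynamics (made up for this example)."""
    return 0.9 * z + a


def model_step(z, a):
    """Imperfect learned predictor: the decay coefficient is slightly off."""
    return 0.95 * z + a


def rollout_errors(z0, actions):
    """Return per-step absolute errors for closed-loop and open-loop prediction."""
    z_true = z0
    z_open = z0  # open-loop state, fed only by its own predictions
    closed_errs, open_errs = [], []
    for a in actions:
        closed_pred = model_step(z_true, a)  # closed-loop: start from the real state
        z_open = model_step(z_open, a)       # open-loop: start from the last prediction
        z_true = true_step(z_true, a)        # advance reality by one step
        closed_errs.append(abs(closed_pred - z_true))
        open_errs.append(abs(z_open - z_true))
    return closed_errs, open_errs


closed, opened = rollout_errors(z0=1.0, actions=[0.1] * 30)
print(f"step  1: closed={closed[0]:.3f} open={opened[0]:.3f}")
print(f"step 30: closed={closed[-1]:.3f} open={opened[-1]:.3f}")
```

At step 1 the two errors coincide, since both predictions start from the same real state. After that the closed-loop error stays at the one-step model error, while the open-loop error compounds. The same pattern is what the drift horizon measures in embedding space.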

Real-world analogy

LeWorldModel is the "GPS for imitation policies": the policy is the driver; the world model whispers "if you keep going this way, in 2 seconds you'll see X". That is useful information that improves decisions without needing photorealistic prediction.

Hour 1 — Reading

LeWorldModel announcement / blog post (~20 min): https://huggingface.co/blog/leworldmodel
ThinkJEPA paper, abstract + Section 3 (~25 min): https://arxiv.org/abs/2511.xxxxx (search "ThinkJEPA arxiv")
Predicting Latent Trajectories (DeepMind 2024 follow-up to V-JEPA): https://arxiv.org/abs/2410.xxxxx

Hour 2 — LeWorldModel codebase

cd ~/robo47-wm
git clone https://github.com/huggingface/leworldmodel
cd leworldmodel
uv pip install -e .

Read in this order (~30 min):
leworldmodel/models/world_model.py — the action-conditioned predictor
leworldmodel/training/policy_with_wm.py — example of imitation + world-model
examples/aloha_with_wm.py — full script
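
Before opening these files, it may help to have a rough picture of what "imitation + world-model" means as a training objective. The sketch below is not the repository's code. It assumes mean squared error for both terms and simple concatenation for the observation augmentation, and every name in it (mse, augment_observation, total_loss, wm_weight) is hypothetical.

```python
# Hypothetical sketch of an imitation objective with an auxiliary world-model term.
# Loss choice (MSE) and augmentation (concatenation) are assumptions for illustration.

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def augment_observation(obs_emb, imagined_next_emb):
    """Policy input: current observation embedding plus the imagined next embedding."""
    return list(obs_emb) + list(imagined_next_emb)


def total_loss(action_pred, action_target, emb_pred, emb_target, wm_weight=0.25):
    """Imitation loss plus a down-weighted world-model prediction loss."""
    imitation = mse(action_pred, action_target)
    world_model = mse(emb_pred, emb_target)
    return imitation + wm_weight * world_model


policy_input = augment_observation([0.2, 0.4], [0.3, 0.5])
loss = total_loss(
    action_pred=[0.0, 1.0], action_target=[0.0, 0.0],
    emb_pred=[1.0, 1.0], emb_target=[0.0, 1.0],
    wm_weight=0.5,
)
print(policy_input, loss)
```

The default wm_weight of 0.25 sits inside the 0.1–0.5× range given in the glossary. When reading policy_with_wm.py, look for where the real code makes these two choices: how the imagined embedding enters the policy input, and how the auxiliary term is weighted.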

LAB

Hour 3 — Lab: latent rollout on Day 36's clip (60 min)

What you're building. Use V-JEPA 2-AC to open-loop roll out 30 steps of imagined embeddings starting from frame 0, given a sequence of 30 mock actions. Compare against ground-truth embeddings at each step. Measure how cosine similarity decays with horizon.
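
The comparison metric is a mean over per-patch cosine similarities: each predicted patch vector is compared with the ground-truth patch at the same position, and the results are averaged over the frame. The lab script does this with torch; the dependency-free sketch below shows the same arithmetic on plain lists. The helper names cosine and mean_patch_cosine are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def mean_patch_cosine(pred_patches, target_patches):
    """Average cosine similarity over matching patch positions of one frame."""
    if len(pred_patches) != len(target_patches):
        raise ValueError("frames must have the same number of patches")
    sims = [cosine(p, t) for p, t in zip(pred_patches, target_patches)]
    return sum(sims) / len(sims)


# Two patches: the first matches exactly, the second is orthogonal.
pred = [[1.0, 0.0], [1.0, 0.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
print(mean_patch_cosine(pred, target))  # 0.5
```

Because the mean is taken over patches, a few badly predicted regions lower the score gradually rather than collapsing it, which is one reason the drift curve tends to decay smoothly.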

What success looks like. You have:
1. src/day37_latent_rollout.py runnable.
2. A plot, figures/day37_drift.png, showing cosine similarity vs horizon (1 to 30 steps).
3. A curve that starts near 0.9 (1-step prediction is good) and decays to roughly 0.3–0.5 by step 30, which marks the drift horizon.

Step 1 — Implement open-loop rollout (40 min)

# src/day37_latent_rollout.py
"""Day 37: open-loop latent rollout with V-JEPA 2-AC.
Demonstrates the drift horizon: the point where imagined trajectories diverge from reality.
"""
from pathlib import Path

import decord
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoVideoProcessor

DEVICE = "cuda"
DTYPE = torch.bfloat16
MODEL_NAME = "facebook/vjepa2-ac-vitl16"
VIDEO_PATH = "../../w3-imitation/runs/act_aloha/eval/videos/episode_0.mp4"
FIGURE_PATH = Path("../figures/day37_drift.png")
ROLLOUT_STEPS = 30


def mean_cosine(pred, target):
    """Mean per-patch cosine similarity between two (1, N_patches, D) embeddings."""
    pred = pred.float().reshape(-1, pred.shape[-1])
    target = target.float().reshape(-1, target.shape[-1])
    return F.cosine_similarity(pred, target, dim=-1).mean().item()


def main():
    proc = AutoVideoProcessor.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=DTYPE).to(DEVICE).eval()

    # Load frames 0..ROLLOUT_STEPS: frame 0 is the context, the rest are targets.
    vr = decord.VideoReader(VIDEO_PATH)
    frames = vr.get_batch(np.arange(ROLLOUT_STEPS + 1)).asnumpy()
    inputs = proc(videos=[frames], return_tensors="pt").to(DEVICE, DTYPE)

    # NOTE: encode_video and predict_action_conditioned are the method names this
    # lab assumes for the AC checkpoint. Confirm them against the model card; the
    # stock transformers V-JEPA 2 classes may expose different entry points.
    with torch.no_grad():
        # The transformers V-JEPA 2 video processor returns its tensor under
        # "pixel_values_videos", not "pixel_values".
        all_target_emb = model.encode_video(inputs["pixel_values_videos"])
        # Expected shape: (1, (T+1)*N_patches, D)

    # This split assumes one group of patch tokens per input frame. V-JEPA 2
    # encoders patchify in 2-frame tubelets, so verify the token count first.
    n_frames = ROLLOUT_STEPS + 1
    assert all_target_emb.shape[1] % n_frames == 0, "token count is not a multiple of frame count"
    n_patches_per_frame = all_target_emb.shape[1] // n_frames

    def frame_emb(t):
        """Ground-truth embedding slice for frame t."""
        return all_target_emb[:, t * n_patches_per_frame : (t + 1) * n_patches_per_frame]

    actions = torch.zeros(1, 1, 14, device=DEVICE, dtype=DTYPE)  # mock zero action

    # Open-loop rollout: start from the frame 0 embedding, predict next, feed back, repeat.
    current_emb = frame_emb(0).clone()
    cosines = []
    with torch.no_grad():
        for t in range(ROLLOUT_STEPS):
            next_emb = model.predict_action_conditioned(
                context_emb=current_emb, actions=actions
            )
            cosines.append(mean_cosine(next_emb, frame_emb(t + 1)))
            current_emb = next_emb  # OPEN-LOOP: feed prediction back

    print(f"Step  1 cosine: {cosines[0]:.3f}")
    print(f"Step 10 cosine: {cosines[9]:.3f}")
    print(f"Step 30 cosine: {cosines[29]:.3f}")

    # Closed-loop comparison: every 1-step prediction starts from the real embedding.
    cosines_closed = []
    with torch.no_grad():
        for t in range(ROLLOUT_STEPS):
            next_pred = model.predict_action_conditioned(
                context_emb=frame_emb(t), actions=actions
            )
            cosines_closed.append(mean_cosine(next_pred, frame_emb(t + 1)))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(range(1, ROLLOUT_STEPS + 1), cosines, "o-", label="open-loop")
    ax.plot(range(1, ROLLOUT_STEPS + 1), cosines_closed, "s-", label="closed-loop (1-step)")
    ax.axhline(0.5, color="r", linestyle="--", alpha=0.5)
    ax.set_xlabel("Rollout step")
    ax.set_ylabel("Mean cosine similarity")
    ax.set_title("V-JEPA 2-AC drift: open-loop vs closed-loop")
    ax.set_ylim(0, 1)
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    FIGURE_PATH.parent.mkdir(parents=True, exist_ok=True)  # create figures/ if missing
    plt.savefig(FIGURE_PATH, dpi=120)
    print(f"Wrote {FIGURE_PATH}")


if __name__ == "__main__":
    main()
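
Once the script has produced its list of per-step cosines, the drift horizon can be read off as the first step that falls below a chosen threshold. The sketch below assumes a threshold of 0.5, matching the dashed line in the plot; the helper name drift_horizon and the example curve are hypothetical, and the numbers are made up rather than measured.

```python
def drift_horizon(cosines, threshold=0.5):
    """Return the first 1-indexed step whose cosine is below threshold, else None."""
    for step, cos in enumerate(cosines, start=1):
        if cos < threshold:
            return step
    return None


# Made-up curve for illustration only, not output from V-JEPA 2-AC.
example_curve = [0.9, 0.8, 0.65, 0.55, 0.48, 0.4]
print(drift_horizon(example_curve))        # 5
print(drift_horizon(example_curve, 0.3))   # None: the curve never drops below 0.3
```

A return value of None means the rollout stayed above the threshold for its whole length, so the horizon is at least as long as the rollout. Running the helper on both the open-loop and the closed-loop curves gives a single number per curve to compare.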

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
