Day 44

Train baseline

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

The point of today: get your "non-extension" version running end-to-end with at least one seed before you sleep. A working baseline at 5pm is worth more than a perfect baseline next week.

Hour 1 — Implement train.py for your track (45 min)

Track  What this is
A      LoRA fine-tune of π0.7 on your 30-episode dataset
B      Train 100M-param JEPA predictor on your 50h dataset
C      DR-only Go1 PPO (Day 25 reproduce)
D      ACT on your 50-episode bimanual dataset

Each of these you've effectively done before. Today you wrap it in a proper script with seeds, wandb logging, and a saved checkpoint.
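As a rough starting point, the wrapper described above (seeds, logging, saved checkpoint) might look like the sketch below. Everything in it is an assumption rather than the curriculum's actual train.py: the function names, the "capstone" project name, the metrics.jsonl and checkpoint.json file names, and the placeholder loss update are hypothetical. wandb is imported only when requested, so the skeleton also runs without it.

```python
import json
import random
from pathlib import Path


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus numpy and torch if they happen to be installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def train(variant: str, seed: int, steps: int,
          out_root: str = "runs", use_wandb: bool = False) -> Path:
    """Run a placeholder training loop and return the checkpoint path."""
    seed_everything(seed)
    run_dir = Path(out_root) / f"{variant}_s{seed}"
    run_dir.mkdir(parents=True, exist_ok=True)

    run = None
    if use_wandb:
        import wandb  # lazy import: only needed when logging to W&B
        run = wandb.init(project="capstone", name=run_dir.name,
                         config={"variant": variant, "seed": seed})

    loss = 1.0
    with open(run_dir / "metrics.jsonl", "w") as f:
        for step in range(steps):
            # Placeholder update: swap in your track's real training step.
            loss = loss * 0.99 + random.uniform(-0.001, 0.001)
            record = {"step": step, "loss": loss}
            f.write(json.dumps(record) + "\n")
            if run is not None:
                run.log(record, step=step)

    # Hypothetical checkpoint format; a real run would save model weights.
    ckpt = run_dir / "checkpoint.json"
    ckpt.write_text(json.dumps({"steps": steps, "seed": seed, "loss": loss}))
    if run is not None:
        run.finish()
    return ckpt
```

Calling train("baseline", seed=1, steps=1000) writes into runs/baseline_s1, which is the directory layout eval.py expects. A command-line entry point (for example argparse with --seed and --variant flags) can be layered on top.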

Hour 2 — Launch first seed (training in background; ~3-6 hours)

While it runs in tmux, work on Hour 3.

LAB

Hour 3 — Implement eval.py (60 min)

# src/eval.py
"""Capstone eval. Multi-seed. Outputs metrics.csv rows + plot."""
import argparse

import pandas as pd


def evaluate_policy(policy_path, n_episodes=20, seed=1):
    # ... load, run, return mean & std of success rate
    ...


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--seeds", type=str, default="1,2,3",
                        help="comma-separated seeds, e.g. 1,2,3")
    args = parser.parse_args()
    seeds = [int(s) for s in args.seeds.split(",")]

    # One row per (seed, variant) pair.
    results = []
    for seed in seeds:
        for variant in ["baseline", "extension"]:
            ckpt = f"runs/{variant}_s{seed}"
            sr = evaluate_policy(ckpt, seed=seed)
            results.append({
                "seed": seed, "variant": variant, "success_rate": sr
            })
    df = pd.DataFrame(results)
    df.to_csv("runs/eval_summary.csv", index=False)
    # Make plot, log to metrics.csv


if __name__ == "__main__":
    main()

Make sure eval is fully scriptable (running make eval works); this is rubric category 3 (Reproducibility).
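Once eval.py has written its per-seed rows, a per-variant summary is a natural next step before plotting. The sketch below is one possible way to do it with pandas, assuming the seed, variant, and success_rate columns used in eval.py. The summarize helper is hypothetical, and the numbers in the example frame are made up purely to show the shape of the data.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, sample std, and seed count of success_rate for each variant."""
    return (
        df.groupby("variant")["success_rate"]
        .agg(mean="mean", std="std", n_seeds="count")
        .reset_index()
    )


# Made-up numbers to illustrate the layout; these are not real results.
example = pd.DataFrame({
    "seed": [1, 2, 3, 1, 2, 3],
    "variant": ["baseline"] * 3 + ["extension"] * 3,
    "success_rate": [0.5, 0.6, 0.7, 0.6, 0.8, 1.0],
})
summary = summarize(example)
print(summary.to_string(index=False))
```

On real data the same helper could be applied as summarize(pd.read_csv("runs/eval_summary.csv")). Reporting the seed count next to the mean and std makes it obvious when a comparison rests on a single seed.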

Hour 4 — Late: status check

After ~4 hours, check on your training run. If it's progressing (loss decreasing, eval improving), good. If not, stop, debug, restart. Don't go to bed with a broken run.
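"Loss decreasing" can be checked with a crude numeric test rather than by eyeballing a curve. The sketch below compares the mean loss over the first and last windows of a run. The function name, the window size, and the 1% threshold are all assumptions chosen for illustration, and how you obtain the list of losses (a wandb export, a log file) is up to you.

```python
from typing import Optional, Sequence


def loss_is_decreasing(losses: Sequence[float], window: int = 50,
                       min_rel_drop: float = 0.01) -> Optional[bool]:
    """Compare mean loss in the first and last `window` steps.

    Returns None when there are too few points to compare, True when the
    late mean is lower than the early mean by at least `min_rel_drop`
    (relative to the early mean's magnitude), and False otherwise.
    """
    if len(losses) < 2 * window:
        return None
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    return (first - last) > min_rel_drop * abs(first)


# Synthetic curves, only to exercise the three outcomes.
falling = [1.0 - 0.005 * i for i in range(100)]
flat = [1.0] * 100
print(loss_is_decreasing(falling, window=10))
print(loss_is_decreasing(flat, window=10))
print(loss_is_decreasing(flat[:5], window=10))
```

This is a coarse heuristic: a noisy or plateauing run can still pass or fail it for the wrong reasons, so treat a False as a prompt to look at the curve and the eval numbers, not as a verdict.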

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
