Week 7: Capstone, Day 44
Train baseline
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
The point of today: get your "non-extension" version running end-to-end with at least one seed before you sleep. A working baseline at 5pm is worth more than a perfect baseline next week.
Hour 1 — Implement train.py for your track (45 min)
| Track | What this is |
| A | LoRA fine-tune of π0.7 on your 30-episode dataset |
| B | Train 100M-param JEPA predictor on your 50h dataset |
| C | DR-only Go1 PPO (Day 25 reproduce) |
| D | ACT on your 50-episode bimanual dataset |
Each of these you've effectively done before. Today you wrap it in a proper script with seeds, wandb logging, and a saved checkpoint.
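A minimal sketch of what "a proper script with seeds, wandb logging, and a saved checkpoint" might look like. Everything track-specific is a placeholder: the one-parameter "model", the toy loss, the `capstone` project name, and the `checkpoint.json` filename are assumptions for illustration, not the course's reference implementation. The only wandb calls used are `wandb.init`, `run.log`, and `run.finish`, and logging is opt-in so the script also runs without wandb installed.

```python
# train.py sketch: seeds, optional wandb logging, saved checkpoint.
# The toy "model" and loss are placeholders, not any track's real training code.
import argparse
import json
import random
from pathlib import Path


def set_seed(seed: int) -> None:
    """Seed the RNGs this sketch touches so a seed number pins down the run."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    # With PyTorch you would also call torch.manual_seed(seed) here.


def train(seed: int, variant: str = "baseline", steps: int = 100,
          out_root: str = "runs", use_wandb: bool = False) -> Path:
    set_seed(seed)
    run = None
    if use_wandb:
        import wandb  # imported lazily so the script runs without it
        run = wandb.init(project="capstone",  # hypothetical project name
                         name=f"{variant}_s{seed}",
                         config={"seed": seed, "variant": variant, "steps": steps})

    weight = random.uniform(-1.0, 1.0)  # placeholder single-parameter "model"
    loss = weight ** 2
    for step in range(steps):
        weight -= 0.1 * 2 * weight  # gradient step on loss = weight**2
        loss = weight ** 2
        if run is not None:
            run.log({"train/loss": loss}, step=step)

    # Directory name matches the runs/{variant}_s{seed} pattern eval.py reads.
    out_dir = Path(out_root) / f"{variant}_s{seed}"
    out_dir.mkdir(parents=True, exist_ok=True)
    ckpt_path = out_dir / "checkpoint.json"  # hypothetical filename
    ckpt_path.write_text(json.dumps(
        {"seed": seed, "variant": variant, "step": steps,
         "weight": weight, "loss": loss}))
    if run is not None:
        run.finish()
    return ckpt_path


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--variant", type=str, default="baseline")
    parser.add_argument("--steps", type=int, default=100)
    parser.add_argument("--wandb", action="store_true", help="log to wandb")
    args = parser.parse_args(argv)
    path = train(args.seed, args.variant, args.steps, use_wandb=args.wandb)
    print(f"saved {path}")


if __name__ == "__main__":
    main()
```

Writing the seed into both the run directory name and the checkpoint itself means a stray file can always be traced back to the run that produced it.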
Hour 2 — Launch first seed (training in background; ~3-6 hours)
While it runs in tmux, work on Hour 3.
LAB
Hour 3 — Implement eval.py (60 min)
# src/eval.py
"""Capstone eval. Multi-seed. Outputs metrics.csv rows + plot."""
import argparse

import pandas as pd


def evaluate_policy(policy_path, n_episodes=20, seed=1):
    # ... load, run, return mean & std of success rate
    ...


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--seeds", type=str, default="1,2,3",
                        help="comma-separated seeds to evaluate")
    args = parser.parse_args()
    seeds = [int(s) for s in args.seeds.split(",")]
    results = []
    for seed in seeds:
        for variant in ["baseline", "extension"]:
            ckpt = f"runs/{variant}_s{seed}"
            # evaluate_policy returns (mean, std) of the success rate
            sr_mean, sr_std = evaluate_policy(ckpt, seed=seed)
            results.append({
                "seed": seed, "variant": variant,
                "success_rate": sr_mean, "success_rate_std": sr_std,
            })
    df = pd.DataFrame(results)
    df.to_csv("runs/eval_summary.csv", index=False)
    # Make plot, log to metrics.csv


if __name__ == "__main__":
    main()

Make sure eval is fully scriptable (make eval works); this is rubric category 3 (reproducibility).
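Once runs/eval_summary.csv exists, the multi-seed comparison comes down to a mean and a spread per variant. The sketch below is one way to compute that with only the standard library. The helper names `summarize` and `summarize_csv` are hypothetical, and it assumes the `success_rate` column holds one number per row. If you prefer to stay in pandas, `df.groupby("variant")["success_rate"].agg(["mean", "std"])` gives the same two statistics.

```python
import csv
import statistics
from collections import defaultdict


def summarize(rows):
    """Group per-seed success rates by variant.

    Returns {variant: {"mean", "std", "n_seeds"}}. "std" is the sample
    standard deviation across seeds, or NaN when only one seed is present,
    because a spread cannot be estimated from a single run.
    """
    by_variant = defaultdict(list)
    for row in rows:
        by_variant[row["variant"]].append(float(row["success_rate"]))

    summary = {}
    for variant, rates in by_variant.items():
        std = statistics.stdev(rates) if len(rates) > 1 else float("nan")
        summary[variant] = {
            "mean": statistics.mean(rates),
            "std": std,
            "n_seeds": len(rates),
        }
    return summary


def summarize_csv(path="runs/eval_summary.csv"):
    """Read the CSV written by eval.py and summarize it per variant."""
    with open(path, newline="") as f:
        return summarize(list(csv.DictReader(f)))
```

Reporting `n_seeds` next to the mean keeps a one-seed result from being read as a multi-seed one.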
Hour 4 — Late: status check
After ~4 hours, check on your training run. If it's progressing (loss decreasing, eval improving), good. If not, stop, debug, restart. Don't go to bed with a broken run.
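If you would rather not judge "loss decreasing" by eye, a coarse check like the one below can help. It is a hypothetical helper, not part of the course code: it assumes you can export the logged loss values as a plain list, and the window size and 1% threshold are arbitrary defaults you should tune to your track.

```python
import math
from statistics import mean


def is_progressing(losses, window=50, min_rel_drop=0.01):
    """Coarse health check for a training run.

    Returns True when the mean of the last `window` losses is at least
    `min_rel_drop` (relative) below the mean of the first `window` losses.
    Returns False if any recent loss is NaN or infinite.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window logged values to compare")
    recent = losses[-window:]
    if not all(math.isfinite(x) for x in recent):
        return False  # diverged run: stop and debug
    early = mean(losses[:window])
    late = mean(recent)
    return (early - late) >= min_rel_drop * abs(early)
```

This only compares the start of the run with its most recent stretch, so a run that improved early and then stalled still passes. Treat a True result as "not obviously broken" and still look at the eval curve.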
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.