VQ-BeT and architecture-comparison day

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

VQ-BeT (Vector-Quantized Behavior Transformer) — NYU, 2024. Tokenizes continuous actions with a VQ-VAE, then trains a transformer over the resulting token sequences. Because it predicts discrete tokens, it can represent multi-modal behavior.
VQ-VAE — Vector-Quantized Variational Autoencoder. An encoder, a discrete codebook, and a decoder. Used here to convert continuous actions to discrete tokens.
Codebook — The set of learnable embedding vectors in a VQ-VAE. Each encoder output snaps to its nearest entry.
Behavior Transformer (BeT) — Predecessor of VQ-BeT. Uses k-means clustering instead of learned VQ.
Token — A discrete codebook index representing one continuous action chunk. The index range depends on the configured codebook size.
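
The "snap to nearest entry" step from the glossary can be sketched in a few lines of NumPy. This is a minimal illustration only: the codebook values, the 2-D action vectors, and the helper names quantize and dequantize are hypothetical, and it skips the encoder, the decoder, and the codebook training that a real VQ-VAE needs.

```python
import numpy as np

def quantize(actions, codebook):
    """Return the index of the nearest codebook entry for each action vector."""
    # Pairwise squared Euclidean distances, shape (num_actions, num_codes).
    diffs = actions[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

def dequantize(tokens, codebook):
    """Map token indices back to their codebook vectors."""
    return codebook[tokens]

# Hypothetical toy codebook with four 2-D entries (a trained one would be learned).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Hypothetical continuous actions, each close to one codebook entry.
actions = np.array([[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.7, 0.9]])

tokens = quantize(actions, codebook)
reconstructed = dequantize(tokens, codebook)

print(tokens)         # [0 1 2 3]
print(reconstructed)  # each row is the codebook entry nearest to the input
```

The gap between actions and reconstructed is the quantization error. A larger or better-trained codebook shrinks it.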

Real-world analogy

VQ-BeT is like tokenizing all expert actions into a vocabulary of 256 "moves" (as in chess move notation), then predicting the next move the way a language model predicts the next word.
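
To make the "predict the next move" analogy concrete, the sketch below samples a token from a categorical distribution and decodes it through a codebook. It is a toy under stated assumptions: the two-entry 1-D codebook and the logits are hypothetical stand-ins for what a trained transformer would produce, and nothing here is taken from the paper or the LeRobot code. The aim is only to show why a discrete head can keep two valid behaviors separate, where averaging would blend them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "moves": go left (-1.0) or go right (+1.0).
codebook = np.array([-1.0, 1.0])

# Hypothetical logits for a state where both moves are equally valid.
logits = np.array([0.0, 0.0])

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

probs = softmax(logits)

# Sampling a token keeps the two modes separate: every action is -1.0 or +1.0.
tokens = rng.choice(len(codebook), size=1000, p=probs)
sampled_actions = codebook[tokens]

# Averaging over the modes instead gives an action that matches neither of them.
averaged_action = float(probs @ codebook)

print(np.unique(sampled_actions))  # [-1.  1.]
print(averaged_action)             # 0.0
```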

Hour 1 — Reading

VQ-BeT paper, sections 1–3 (~30 min): https://arxiv.org/abs/2403.03181
BeT paper for context: https://arxiv.org/abs/2206.11251

Hour 2 — Read the LeRobot VQ-BeT impl

~/robo47-il/.venv/.../lerobot/policies/vqbet/modeling_vqbet.py — read ~30 min.
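
If the exact location of the file inside the virtual environment is unclear, the standard library can report it. The sketch below assumes the module's import path is lerobot.policies.vqbet.modeling_vqbet, inferred from the file path above; it may differ between LeRobot versions. The helper name locate_module is hypothetical. Run it with the project's virtual environment active.

```python
import importlib.util

def locate_module(dotted_name):
    """Return the source file path for a module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return None
    return spec.origin if spec is not None else None

# Assumed import path, inferred from the file path above; adjust if it prints None.
print(locate_module("lerobot.policies.vqbet.modeling_vqbet"))
```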

LAB

Hour 3 — Lab: train VQ-BeT, do a 4-way architecture comparison (75 min)

What you're building. Train VQ-BeT on PushT, then combine its results with those from Day 15, Day 16 (ACT), and Day 17 (DP) for a 4-way comparison.

Step 1 — Train VQ-BeT on PushT (45 min)

lerobot-train \
  --policy.type=vqbet \
  --dataset.repo_id=lerobot/pusht \
  --env.type=pusht --env.task=PushT-v0 \
  --batch_size=16 \
  --steps=100000 \
  --eval_freq=10000 \
  --output_dir=runs/vqbet_pusht \
  --seed=1

Expected final success_rate: ~0.85, which sits between DP (0.92) and ACT (0.65).
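
One way to assemble the 4-way comparison is a small script that ranks the policies by success rate. The sketch below is illustrative only. The numbers are the expected figures quoted above, not measured results, and should be replaced with the values from your own eval logs. The Day 15 entry is left empty because its result is not listed here. The function name format_comparison is hypothetical.

```python
def format_comparison(results):
    """Render a plain-text table, best success rate first; missing runs go last."""
    ranked = sorted(
        results.items(),
        key=lambda item: (item[1] is None, -(item[1] or 0.0)),
    )
    lines = [f"{'policy':<12}{'success_rate':>14}"]
    for name, rate in ranked:
        cell = "n/a" if rate is None else f"{rate:.2f}"
        lines.append(f"{name:<12}{cell:>14}")
    return "\n".join(lines)

# Expected figures from this page (placeholders, not measurements).
results = {
    "VQ-BeT": 0.85,
    "DP": 0.92,
    "ACT": 0.65,
    "Day 15": None,  # fill in from your Day 15 run
}

print(format_comparison(results))
```

Sorting with missing entries last keeps the table usable before all four runs have finished.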

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.

Completion controls unlock when this day graduates from placeholder to full lab.