Week 3: Imitation Learning, Day 20
OpenVLA-OFT inference + LoRA on a custom small dataset
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (12 min)
- OpenVLA — Stanford / TRI 2024. 7B-parameter Vision-Language-Action model (VLA): takes images and language as input and outputs robot actions. Built on Llama-2-7B + DINOv2 + SigLIP. Open-weights.
- OpenVLA-OFT — Optimized Fine-Tuning variant (Mar 2025). Replaces autoregressive action decoding with an L1 regression head + parallel decoding. ~3× faster inference, similar accuracy.
- L1 regression head — Predicts continuous actions directly and is trained with an L1 loss. More robust than MSE for action distributions with heavy tails.
- Parallel decoding — Output all action dimensions simultaneously (one forward pass), instead of token-by-token.
- OFT weight tying — Action head shares weights with last LM block. Saves memory.
- 8-bit quantization (bnb) — Load model weights in 8-bit precision via bitsandbytes. Roughly halves weight memory relative to 16-bit.
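To make the L1-versus-MSE entry concrete, here is a minimal sketch in plain Python. It involves no robot code, and the target values are made up for illustration. For a single constant prediction, the value that minimizes mean squared error is the mean of the targets, while a value that minimizes mean absolute (L1) error is the median, so one outlier action pulls the MSE-optimal prediction much further than the L1-optimal one.

```python
from statistics import mean, median


def l1_loss(pred, targets):
    """Mean absolute error of a constant prediction."""
    return sum(abs(pred - t) for t in targets) / len(targets)


def mse_loss(pred, targets):
    """Mean squared error of a constant prediction."""
    return sum((pred - t) ** 2 for t in targets) / len(targets)


# Hypothetical 1-D action targets: mostly small motions plus one heavy-tail outlier.
targets = [0.10, 0.12, 0.09, 0.11, 0.10, 2.00]

best_mse = mean(targets)    # the constant that minimizes MSE
best_l1 = median(targets)   # a constant that minimizes L1

print(f"MSE-optimal prediction: {best_mse:.3f}")
print(f"L1-optimal prediction:  {best_l1:.3f}")
```

The L1-optimal value stays close to the cluster of typical actions, which is the sense in which L1 is the more robust choice under heavy tails.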
Real-world analogy
OpenVLA-OFT is OpenVLA with the autoregressive decoder replaced by a fast feed-forward head — like swapping a dictation typist for a stenographer.
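The difference between the two decoding styles can be sketched with a toy that only counts forward passes. Everything here is hypothetical: `ToyPolicy` and its methods are stand-ins for illustration, not the OpenVLA-OFT API, and the "network" returns dummy zeros. The point is only that token-by-token decoding costs one pass per action dimension, while parallel decoding costs one pass in total.

```python
class ToyPolicy:
    """Stand-in policy that counts forward passes instead of running a network."""

    def __init__(self, action_dim=7):
        self.action_dim = action_dim
        self.forward_passes = 0

    def _forward(self, n_outputs):
        # One simulated network call that returns n_outputs dummy action values.
        self.forward_passes += 1
        return [0.0] * n_outputs

    def decode_autoregressive(self):
        # One pass per action dimension. A real decoder would also condition
        # each step on the previously generated tokens; the toy skips that.
        action = []
        for _ in range(self.action_dim):
            action.extend(self._forward(1))
        return action

    def decode_parallel(self):
        # All action dimensions come out of a single pass.
        return self._forward(self.action_dim)


ar = ToyPolicy()
ar.decode_autoregressive()
par = ToyPolicy()
par.decode_parallel()
print(ar.forward_passes, par.forward_passes)  # prints: 7 1
```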
Hour 1 — Reading
- OpenVLA paper, sections 1–4 (~30 min): https://openvla.github.io/
- OpenVLA-OFT blog/paper (~25 min): https://openvla-oft.github.io/
Hour 2 — Setup OpenVLA-OFT
cd ~/robo47-il
uv pip install bitsandbytes==0.43.0 peft==0.10.0
git clone https://github.com/openvla/openvla-oft
cd openvla-oft
uv pip install -e .
Inference test on a stock image:
python -c "
from PIL import Image
from openvla_oft import OpenVLAOFT

# Load the 7B model with 8-bit weights (bitsandbytes) to reduce GPU memory.
model = OpenVLAOFT.from_pretrained('openvla/openvla-7b-oft', load_in_8bit=True)

# Open the test image as RGB and resize it to 224x224.
img = Image.open('demo.jpg').convert('RGB').resize((224, 224))
action = model.predict_action(image=img, instruction='pick up the cup')
print(f'Predicted action (7-d): {action}')
"
Expected: ~12 GB GPU memory used; one 7-d action vector printed; latency ~50 ms.
LAB
Hour 3 — Lab: LoRA on a custom dataset of 10 episodes (75 min)
What you're building. Collect (or download) 10 episodes of a simple task on a consumer SO-101 arm, or use the LeRobot SO-101 example dataset. LoRA-fine-tune OpenVLA-OFT on this data, then evaluate how much the fine-tuned model improves over zero-shot.
(If you don't have an SO-101, use a bundled 10-episode subset of LIBERO-Spatial.)
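To see why LoRA is practical on only 10 episodes, it helps to count trainable parameters. LoRA freezes a weight matrix of shape (d_out, d_in) and trains two low-rank factors of shapes (d_out, r) and (r, d_in) instead. The sketch below uses 4096, the hidden size of Llama-2-7B, for a single square projection. The rank of 32 is an assumption chosen for illustration, not a value taken from this page.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (frozen full-matrix params, trainable LoRA params) for one layer."""
    full = d_out * d_in            # frozen pretrained weight W
    lora = rank * (d_in + d_out)   # trainable factors B (d_out x r) and A (r x d_in)
    return full, lora


# Assumptions: one 4096 x 4096 projection (Llama-2-7B hidden size), rank 32.
full, lora = lora_param_counts(4096, 4096, 32)
print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {lora / full:.2%}")
```

For this layer the trainable fraction is under 2% of the full matrix, which keeps optimizer state small and limits overfitting on a tiny dataset.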
Step 1 — Get a tiny dataset (15 min)
python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
ds = LeRobotDataset('lerobot/libero_spatial', episodes=list(range(10)))
print(f'Loaded 10 episodes, {len(ds)} frames')
ds.save_to_disk('./tiny_libero')
"
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.