Day 20

OpenVLA-OFT inference + LoRA on a custom small dataset

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

  • OpenVLA — Stanford / TRI 2024. 7B-parameter vision-language-action model (VLA: takes images and language as input, outputs robot actions) built on Llama-2-7B + DINOv2 + SigLIP. Open weights.
  • OpenVLA-OFT — Optimized fine-tuning variant (Mar 2025). Replaces autoregressive action decoding with an L1 regression head plus parallel decoding. ~3× faster inference, similar accuracy.
  • L1 regression head — Predicts continuous actions directly and is trained with an L1 loss, which is more robust than MSE when action distributions have heavy tails.
  • Parallel decoding — Outputs all action dimensions simultaneously in one forward pass, instead of token by token.
  • OFT weight tying — Action head shares weights with the last LM block. Saves memory.
  • 8-bit quantization (bnb) — Loads the model weights in 8-bit precision with bitsandbytes, roughly halving weight memory relative to 16-bit.
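For intuition on the L1-vs-MSE point: if a model could output only one constant, MSE would pick the mean of the targets and L1 the median, so a single extreme action drags the MSE answer much further. A minimal NumPy sketch with made-up action values (an illustration of the loss property, not OFT training code):

```python
import numpy as np

# Made-up 1-D action targets: small deltas plus one heavy-tail outlier.
targets = np.array([0.01, 0.02, 0.00, -0.01, 0.02, 0.01, 0.90])

# Evaluate both losses for every constant prediction on a fine grid.
grid = np.linspace(-0.2, 1.0, 12001)
err = grid[:, None] - targets[None, :]   # shape (len(grid), len(targets))
mse = np.mean(err ** 2, axis=1)
l1 = np.mean(np.abs(err), axis=1)

best_mse = grid[np.argmin(mse)]   # lands near the mean, pulled by the outlier
best_l1 = grid[np.argmin(l1)]     # lands at the median, ignoring the outlier

print(f"MSE-optimal: {best_mse:.3f}  (mean   = {targets.mean():.3f})")
print(f"L1-optimal:  {best_l1:.3f}  (median = {np.median(targets):.3f})")
```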

Real-world analogy

OpenVLA-OFT is OpenVLA with token-by-token autoregressive decoding replaced by a single forward pass into a fast feed-forward regression head, like swapping a dictation typist for a stenographer.
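The pass-count difference behind parallel decoding can be sketched with a toy backbone that only counts forward calls. ToyBackbone and both decode functions are hypothetical stand-ins, not the OpenVLA-OFT code:

```python
from dataclasses import dataclass

ACTION_DIM = 7  # OpenVLA's 7-d action: 3 translation, 3 rotation, 1 gripper


@dataclass
class ToyBackbone:
    """Hypothetical stand-in for the LLM backbone; it only counts forward passes."""
    calls: int = 0

    def forward(self, num_outputs: int) -> list[float]:
        self.calls += 1
        # A real backbone returns hidden states; zeros are enough to count cost.
        return [0.0] * num_outputs


def autoregressive_decode(model: ToyBackbone) -> list[float]:
    action = []
    for _ in range(ACTION_DIM):
        # One pass per action token; a real model would also condition on
        # the tokens emitted so far, which this toy ignores.
        action.append(model.forward(num_outputs=1)[0])
    return action


def parallel_decode(model: ToyBackbone) -> list[float]:
    # A single pass emits every action dimension at once.
    return model.forward(num_outputs=ACTION_DIM)


ar, par = ToyBackbone(), ToyBackbone()
assert len(autoregressive_decode(ar)) == len(parallel_decode(par)) == ACTION_DIM
print(f"autoregressive: {ar.calls} passes, parallel: {par.calls} pass")
```

The wall-clock speedup is smaller than the 7:1 pass ratio, because with KV caching each later autoregressive step is much cheaper than the first pass over the image and prompt.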

Hour 1 — Reading

Hour 2 — Set up OpenVLA-OFT

cd ~/robo47-il
uv pip install bitsandbytes==0.43.0 peft==0.10.0
git clone https://github.com/openvla/openvla-oft
cd openvla-oft
uv pip install -e .
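Before loading a 7B model, it can save time to confirm that the pinned versions actually ended up in the environment, since the editable install of the repo may pull in its own dependency versions. A minimal standard-library sketch; check_pins is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the install command above.
PINS = {"bitsandbytes": "0.43.0", "peft": "0.10.0"}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for pkg, wanted in pins.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if found != wanted:
            problems[pkg] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    print(issues or "all pins satisfied")
```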

Inference test on a stock image saved as demo.jpg:

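python -c "
import PIL.Image
from openvla_oft import OpenVLAOFT

# Load the OFT checkpoint with 8-bit weights (requires bitsandbytes).
model = OpenVLAOFT.from_pretrained('openvla/openvla-7b-oft', load_in_8bit=True)

# OpenVLA works on 224x224 RGB frames.
img = PIL.Image.open('demo.jpg').convert('RGB').resize((224, 224))

# Returns a 7-d action: 3 translation, 3 rotation, 1 gripper.
action = model.predict_action(image=img, instruction='pick up the cup')
print(f'Predicted action (7-d): {action}')
"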

Expected: ~12 GB GPU memory used; one 7-d action vector printed; latency ~50 ms.
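The expected numbers can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes exactly 7e9 parameters (the true count is somewhat higher once the vision encoders are included) and counts weights only; the gap up to ~12 GB is presumably activations, CUDA context, and layers that bitsandbytes keeps in 16-bit:

```python
PARAMS = 7e9  # "7B" from the glossary; an approximation


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return params * bytes_per_param / 1e9


bf16 = weight_gb(PARAMS, 2)  # 16-bit weights
int8 = weight_gb(PARAMS, 1)  # 8-bit weights via bitsandbytes
print(f"bf16 weights: {bf16:.1f} GB, int8 weights: {int8:.1f} GB")

latency_s = 0.050  # ~50 ms per forward pass, from the expected output above
print(f"control rate at one action per pass: {1 / latency_s:.0f} Hz")
```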

LAB

Hour 3 — Lab: LoRA on a custom dataset of 10 episodes (75 min)

What you're building. Collect (or download) 10 episodes of a simple task on a consumer SO-101 arm, or use the LeRobot SO-101 example dataset. LoRA-fine-tune OpenVLA-OFT on this data, then measure how much it improves over the zero-shot model.

(If you don't have an SO-101, use a bundled 10-episode subset of LIBERO-Spatial.)
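LoRA freezes each pretrained weight matrix W and trains only a low-rank update, so an adapted layer computes W x + (alpha / r) B A x, with B initialized to zero. A minimal NumPy sketch of one adapted layer; the sizes, rank, and alpha are illustrative assumptions, not the lab's settings (4096 matches Llama-2-7B's hidden width):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 4096, 4096, 32, 16  # hypothetical layer size and LoRA hyperparameters

W = rng.standard_normal((d_out, d_in)).astype(np.float32) * 0.02  # frozen pretrained weight
A = rng.standard_normal((r, d_in)).astype(np.float32) * 0.01      # trainable, random init
B = np.zeros((d_out, r), dtype=np.float32)                        # trainable, zero init


def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus the low-rank update scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.standard_normal(d_in).astype(np.float32)
# With B = 0 the adapter is a no-op, so training starts from the pretrained model.
assert np.allclose(lora_forward(x), W @ x)

full, lora = d_out * d_in, r * (d_in + d_out)
print(f"trainable: {lora:,} vs {full:,} ({100 * lora / full:.2f}%)")
```

Because B starts at zero, step 0 of fine-tuning reproduces the zero-shot model exactly, which makes the zero-shot baseline a clean reference point for the comparison.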

Step 1 — Get a tiny dataset (15 min)

python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

episodes = list(range(10))
# root= sets the local directory the selected episodes are downloaded to.
ds = LeRobotDataset('lerobot/libero_spatial', root='./tiny_libero', episodes=episodes)
print(f'Loaded {len(episodes)} episodes, {len(ds)} frames')
"
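A new dataset also needs its own action-normalization statistics. This sketch assumes OpenVLA-style normalization: each action dimension is rescaled to [-1, 1] using its 1st and 99th percentiles and clipped, and the same statistics unnormalize predictions at inference time. The function names are hypothetical, random numbers stand in for the real action column, and the real pipeline may treat some dimensions (such as the gripper) differently:

```python
import numpy as np


def compute_q_stats(actions: np.ndarray) -> dict:
    """Per-dimension 1st/99th percentiles over all frames (actions: [N, action_dim])."""
    return {"q01": np.quantile(actions, 0.01, axis=0),
            "q99": np.quantile(actions, 0.99, axis=0)}


def normalize(a: np.ndarray, stats: dict, eps: float = 1e-8) -> np.ndarray:
    # Map [q01, q99] -> [-1, 1] and clip values beyond the percentile bounds.
    lo, hi = stats["q01"], stats["q99"]
    return np.clip(2 * (a - lo) / (hi - lo + eps) - 1, -1.0, 1.0)


def unnormalize(a_norm: np.ndarray, stats: dict, eps: float = 1e-8) -> np.ndarray:
    # Inverse map, applied to model outputs before they go to the robot.
    lo, hi = stats["q01"], stats["q99"]
    return 0.5 * (a_norm + 1) * (hi - lo + eps) + lo


# Stand-in for the 10-episode action column: 500 frames of 7-d actions.
rng = np.random.default_rng(0)
actions = rng.standard_normal((500, 7))
stats = compute_q_stats(actions)
norm = normalize(actions, stats)
print(f"normalized range: [{norm.min():.2f}, {norm.max():.2f}]")
```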

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.


Papers you will re-read after this