OpenVLA-OFT inference + LoRA on a custom small dataset

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

OpenVLA — Stanford / TRI 2024. 7B-parameter vision-language-action (VLA) model built on Llama-2-7B + DINOv2 + SigLIP. Open weights.
OpenVLA-OFT — Optimized Fine-Tuning variant (Mar 2025). Replaces autoregressive action-token decoding with an L1 regression head + parallel decoding. Reported ~26× higher action-generation throughput than OpenVLA, with higher task success on LIBERO.
L1 regression head — Predicts continuous actions directly and trains them with an L1 (absolute-error) loss. More robust than MSE when the targets have heavy tails or outliers.
Parallel decoding — Outputs all action dimensions simultaneously in one forward pass, instead of token by token.
OFT weight tying — head shares weights with last LM block. Saves memory.
8-bit quantization (bnb) — Loads the model weights in 8-bit precision via bitsandbytes. Roughly halves weight memory compared with bf16/fp16.
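To make the L1-vs-MSE entry concrete: for a single constant prediction, MSE is minimized by the mean of the targets and L1 by the median, so one extreme target drags the MSE fit much further. A toy sketch in plain Python; the numbers are made up for illustration and are not OpenVLA data.

```python
from statistics import mean, median

def l1_loss(pred, targets):
    # Mean absolute error of a constant prediction
    return sum(abs(t - pred) for t in targets) / len(targets)

def mse_loss(pred, targets):
    # Mean squared error of a constant prediction
    return sum((t - pred) ** 2 for t in targets) / len(targets)

# Toy 1-D targets: mostly near 0.1, plus one heavy-tail outlier
targets = [0.09, 0.10, 0.10, 0.11, 0.12, 2.0]

best_l2 = mean(targets)    # minimizer of MSE over constant predictions
best_l1 = median(targets)  # minimizer of L1 over constant predictions

print(f"MSE-optimal constant: {best_l2:.3f}")
print(f"L1-optimal constant:  {best_l1:.3f}")
print(f"L1 loss at each: {l1_loss(best_l1, targets):.3f} vs {l1_loss(best_l2, targets):.3f}")
```

The MSE fit lands far from every typical target, while the L1 fit stays with the bulk of the data; the same pull shows up, per dimension, when a regression head is trained on noisy demonstrations.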

Real-world analogy

OpenVLA-OFT is OpenVLA with the autoregressive decoder replaced by a fast feed-forward head — like swapping a dictation typist for a stenographer.
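The speed difference behind the analogy comes down to how many sequential model calls one action costs. A toy sketch that only counts calls: `toy_step` and `toy_head` are hypothetical stand-ins for the autoregressive decoder step and the parallel action head, not real OpenVLA functions.

```python
from typing import Callable, Dict, List

ACTION_DIM = 7  # 7-d action, matching the vector printed in the setup test

def autoregressive_decode(step: Callable[[List[float]], float], dim: int = ACTION_DIM) -> List[float]:
    # One sequential call per action dimension; each call sees the values decoded so far
    out: List[float] = []
    for _ in range(dim):
        out.append(step(out))
    return out

def parallel_decode(head: Callable[[], List[float]]) -> List[float]:
    # A single call returns every action dimension at once
    return head()

calls: Dict[str, int] = {"ar": 0, "parallel": 0}

def toy_step(prefix: List[float]) -> float:
    calls["ar"] += 1
    return 0.1 * len(prefix)  # placeholder for "predict the next dimension"

def toy_head() -> List[float]:
    calls["parallel"] += 1
    return [0.1 * i for i in range(ACTION_DIM)]

a = autoregressive_decode(toy_step)
b = parallel_decode(toy_head)
print(calls)  # {'ar': 7, 'parallel': 1}
```

Both paths produce the same toy vector here; what changes is that the autoregressive path needs seven dependent steps, each waiting on the previous one, while the parallel head needs one.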

Hour 1 — Reading

OpenVLA paper, sections 1–4 (~30 min): https://openvla.github.io/
OpenVLA-OFT blog/paper (~25 min): https://openvla-oft.github.io/

Hour 2 — Setup OpenVLA-OFT

cd ~/robo47-il
uv pip install bitsandbytes==0.43.0 peft==0.10.0
git clone https://github.com/openvla/openvla-oft
cd openvla-oft
uv pip install -e .
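Before loading a 7B model it can save time to confirm that the pinned versions actually landed in the active environment. A small stdlib sketch; the `check_pins` helper is hypothetical and not part of openvla-oft.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the install command above
PINS = {"bitsandbytes": "0.43.0", "peft": "0.10.0"}

def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[pkg] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    bad = check_pins(PINS)
    print("all pins OK" if not bad else f"mismatched pins: {bad}")
```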

Test on a stock image (any photo saved as demo.jpg in the current directory):

python -c "
import PIL.Image
from openvla_oft import OpenVLAOFT

# Load the OFT checkpoint with 8-bit weights (requires bitsandbytes)
model = OpenVLAOFT.from_pretrained('openvla/openvla-7b-oft', load_in_8bit=True)

# Force RGB and resize to 224x224, the vision backbone's input resolution
img = PIL.Image.open('demo.jpg').convert('RGB').resize((224,224))
action = model.predict_action(image=img, instruction='pick up the cup')
print(f'Predicted action (7-d): {action}')
"

Expected: ~12 GB GPU memory used; one 7-d vector printed; ~50 ms.
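The 8-bit saving from the glossary is plain arithmetic on weight storage. A back-of-envelope sketch for a nominal 7B parameters, weights only: about 14 GB in bf16 versus about 7 GB in 8-bit. The ~12 GB quoted above also covers things this estimate ignores (activations, CUDA context, and any modules that stay in higher precision), so treat these numbers as a lower bound.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    # Weights only: ignores activations, CUDA context, optimizer state, and caches
    return n_params * bytes_per_param / 1e9

N_PARAMS = 7e9  # nominal "7B" from the glossary

for name, nbytes in [("bf16", 2), ("int8 (bnb)", 1)]:
    print(f"{name:>10}: ~{weight_memory_gb(N_PARAMS, nbytes):.1f} GB")
```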

LAB

Hour 3 — Lab: LoRA on a custom dataset of 10 episodes (75 min)

What you're building. Collect (or download) 10 episodes of a simple task on a consumer SO-101 arm, or use the LeRobot SO-101 example. LoRA-fine-tune OpenVLA-OFT on this data. Evaluate the gap from .

(If you don't have an SO-101, use a bundled 10-episode subset of LIBERO-Spatial.)
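LoRA freezes a pretrained weight W and learns a low-rank update, y = W x + (alpha / r) B A x, with B initialized to zero so training starts exactly at the base model. A dependency-free sketch of that forward pass and its parameter count; the rank r = 32 and the single 4096 x 4096 projection (Llama-2-7B's hidden size) are illustrative choices, not values taken from the OFT recipe. In practice the `peft` package installed above wraps the model's linear layers for you.

```python
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float], alpha: float, r: int) -> List[float]:
    # y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def lora_trainable_params(d: int, k: int, r: int) -> int:
    # A is r x k and B is d x r
    return r * (d + k)

# Parameter savings for one 4096 x 4096 projection
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, r=32)
print(f"full: {full:,}  LoRA r=32: {lora:,}  ({100 * lora / full:.2f}%)")

# With B zero-initialized, the adapted layer initially matches the frozen layer
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, 0.25]]   # r = 1, k = 2
B = [[0.0], [0.0]]  # d = 2, r = 1
x = [1.0, 1.0]
print("unchanged at init:", lora_forward(W, A, B, x, alpha=16, r=1) == matvec(W, x))
```

The same r(d + k) count applies to every wrapped linear layer, which is why a LoRA run on a 7B model fits on hardware that full fine-tuning would not.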

Step 1 — Get a tiny dataset (15 min)

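python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Fetch only the first 10 episodes; root= sets the local directory the files are written to
ds = LeRobotDataset('lerobot/libero_spatial', root='./tiny_libero', episodes=list(range(10)))

# len(ds) counts frames, not episodes
print(f'Loaded 10 episodes, {len(ds)} frames')
"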
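Fine-tuning on new data also means recomputing action statistics: OpenVLA-style policies predict normalized actions, and the OpenVLA codebase maps each action dimension to [-1, 1] using the dataset's 1st and 99th percentiles, then inverts that map at inference. A stdlib sketch assuming that scheme; the helper names are hypothetical, and `actions` is a toy stand-in for the per-frame action vectors you would pull out of `ds`.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]

def quantile(sorted_vals: Sequence[float], q: float) -> float:
    # Linear interpolation between neighboring ranks (numpy's default method)
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def action_bounds(actions: List[List[float]], q_lo: float = 0.01, q_hi: float = 0.99) -> Bounds:
    # Per-dimension (q01, q99) bounds over every frame in the dataset
    bounds = []
    for d in range(len(actions[0])):
        col = sorted(a[d] for a in actions)
        bounds.append((quantile(col, q_lo), quantile(col, q_hi)))
    return bounds

def normalize(action: List[float], bounds: Bounds) -> List[float]:
    # Map [low, high] to [-1, 1] per dimension, clipping values outside the bounds
    out = []
    for x, (lo, hi) in zip(action, bounds):
        if hi <= lo:  # constant dimension: nothing to scale
            out.append(0.0)
            continue
        y = 2.0 * (x - lo) / (hi - lo) - 1.0
        out.append(max(-1.0, min(1.0, y)))
    return out

def unnormalize(norm_action: List[float], bounds: Bounds) -> List[float]:
    # Inverse map, applied to model outputs before they are sent to the robot
    return [0.5 * (y + 1.0) * (hi - lo) + lo for y, (lo, hi) in zip(norm_action, bounds)]

# Toy 2-D actions standing in for the tiny dataset's per-frame actions
actions = [[float(i), -float(i)] for i in range(101)]
bounds = action_bounds(actions)
print(bounds)
print(normalize([50.0, -50.0], bounds))
```

Percentile bounds rather than min/max keep a few extreme frames from squeezing the rest of a 10-episode dataset into a narrow slice of [-1, 1].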

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.


Papers you will re-read after this

Octo — open-source generalist policy