Day 14
FoundationPose + Week 2 integration + fresh-clone
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (10 min)
- 6-DoF pose estimation: Find an object's full SE(3) pose (position + orientation) from sensor data. The standard input to grasping and manipulation policies.
- FoundationPose: NVIDIA's 2024 unified 6-DoF pose estimator and tracker for novel objects. Two modes: model-based (needs a CAD model) and model-free (needs a few reference images). Published at CVPR 2024 (Highlight).
- Estimation vs. tracking: Estimation computes the pose from scratch given an image + depth; tracking refines the previous frame's estimate. Tracking is faster (roughly a 10× speedup) but requires a good initial pose.
- OnePose / OnePose++: Earlier model-free pose estimators. Used for comparison.
- Mesh model: A .obj or .glb file describing the object's 3D geometry. CAD-based pose estimators need this.
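To make the glossary concrete, here is a minimal NumPy sketch of an SE(3) pose stored as a 4x4 matrix, plus a pose-error measure of the kind used to judge whether a previous-frame estimate is close enough to seed tracking. The helper names (`make_pose`, `rot_z`, `pose_error`) and the numeric values are illustrative assumptions, not part of FoundationPose.

```python
import numpy as np


def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle_rad):
    """Rotation about the z axis by angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_error(pose_a, pose_b):
    """Return (rotation error in degrees, translation error in metres)."""
    # Relative rotation between the two poses; its angle is the geodesic error.
    r_rel = pose_a[:3, :3].T @ pose_b[:3, :3]
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = float(np.degrees(np.arccos(cos_angle)))
    trans_err = float(np.linalg.norm(pose_a[:3, 3] - pose_b[:3, 3]))
    return rot_err_deg, trans_err


# Made-up example: the object rotated 5 degrees and moved 5 cm between frames.
previous = make_pose(rot_z(np.radians(30.0)), [0.50, 0.00, 0.80])
current = make_pose(rot_z(np.radians(35.0)), [0.50, 0.03, 0.84])
rot_err, trans_err = pose_error(previous, current)
print(f"rotation error: {rot_err:.2f} deg, translation error: {trans_err:.3f} m")
```

A small frame-to-frame error like this is the regime where tracking works well. A large jump suggests falling back to a from-scratch estimate.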
Real-world analogy
FoundationPose is "you've never seen this exact teapot, but you've seen 10 teapots before; given one CAD file or 5 photos, where exactly is it sitting in the scene?"
Hour 1 — Reading
- FoundationPose paper, abstract + Section 3 (~25 min): https://arxiv.org/abs/2312.08344
- NVIDIA's release blog: https://research.nvidia.com/labs/lpr/foundationpose/
- Awesome 6D Object Pose Estimation paper list (skim): https://github.com/ZhongqunZHANG/awesome-6d-object
Hour 2 — Setup + run reference example (45 min)
NVIDIA's repo includes a reference example built on the YCB-Video dataset. Follow:
git clone https://github.com/NVlabs/FoundationPose
cd FoundationPose
docker pull shingarey/foundationpose:latest # or build per README
docker run --gpus all -v "$(pwd)":/foundationpose -it shingarey/foundationpose:latest bash
Inside the container:
cd /foundationpose
bash scripts/run_demo.sh
This runs the bundled demo: it estimates the pose of a mustard bottle in 5 reference images. Expected output: a sequence of overlay images in debug/ showing the predicted pose drawn as a 3D bounding box around the bottle. Latency: ~200 ms per refine step on an H100.
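The overlay described above comes down to projecting the eight corners of the object's bounding box into the image using the estimated pose and the camera intrinsics. The sketch below shows that projection with a plain pinhole model. The intrinsics, box size, and pose are made-up values, and the function names are hypothetical rather than FoundationPose's own drawing utilities.

```python
import numpy as np


def box_corners(extents):
    """Eight corners of an axis-aligned box centred on the object origin."""
    half = np.asarray(extents, dtype=float) / 2.0
    signs = np.array(
        [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
        dtype=float,
    )
    return signs * half


def project_points(points_obj, pose_cam_obj, K):
    """Project object-frame points to pixel coordinates (pinhole, no distortion)."""
    points_h = np.hstack([points_obj, np.ones((len(points_obj), 1))])
    points_cam = (pose_cam_obj @ points_h.T).T[:, :3]
    pixels_h = (K @ points_cam.T).T
    return pixels_h[:, :2] / pixels_h[:, 2:3]


# Assumed intrinsics for a 640x480 camera (illustrative values only).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

# Assumed pose: object 1 m straight ahead of the camera, no rotation.
pose_cam_obj = np.eye(4)
pose_cam_obj[:3, 3] = [0.0, 0.0, 1.0]

corners = box_corners([0.2, 0.1, 0.2])  # box size in metres, made up
pixels = project_points(corners, pose_cam_obj, K)
print(np.round(pixels, 1))
```

Connecting the projected corners with line segments (for example with an image library's line-drawing call) gives the wireframe box seen in the overlay images.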
LAB
Hour 3 — Lab: integrate Week 2 + fresh-clone test (75 min)
What you're building. A combined integration script that ties Week 2 together: it loads a TurtleBot4 sim, captures one image from its onboard camera, runs the Day 13 perception stack, runs FoundationPose on a target object (a YCB sugar box dropped into the warehouse), and reports the object's 6-DoF pose in the robot frame. A fresh-clone test then verifies that Day 8 (chatter), Day 9 (URDF), and Day 13 (perception) all reproduce.
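A pose estimator reports the object pose in the camera frame, so reporting it in the robot frame means chaining it with the camera's mounting transform. The sketch below shows that chaining under stated assumptions: the camera offset and the object pose are made-up numbers, the camera uses the common optical convention (z forward, x right, y down), and the robot base uses x forward, y left, z up. In the real lab the mounting transform would come from the robot's URDF or TF tree.

```python
import numpy as np


def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def invert_pose(pose):
    """Invert a rigid transform using the rotation transpose."""
    r, t = pose[:3, :3], pose[:3, 3]
    return make_pose(r.T, -r.T @ t)


# Columns are the camera optical axes expressed in the base frame:
# optical x (right) = -y_base, optical y (down) = -z_base, optical z (forward) = +x_base.
R_BASE_CAM = np.array([[0.0, 0.0, 1.0],
                       [-1.0, 0.0, 0.0],
                       [0.0, -1.0, 0.0]])

# Assumed mounting: camera 0.2 m ahead of and 0.3 m above the base origin.
T_base_cam = make_pose(R_BASE_CAM, [0.2, 0.0, 0.3])

# Assumed estimator output: object 1 m ahead, 0.1 m right, 0.05 m below the optical axis.
T_cam_obj = make_pose(np.eye(3), [0.1, 0.05, 1.0])

# Chain the transforms: base <- camera <- object.
T_base_obj = T_base_cam @ T_cam_obj
position_in_base = T_base_obj[:3, 3]
print("object position in robot frame:", np.round(position_in_base, 3))
```

Reading the subscripts right to left (`T_base_cam @ T_cam_obj`) is a cheap sanity check: the inner frame names must match for the product to make sense.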
What success looks like at the end. You have:
1. src/day14_w2_integration.py runs and reports a 6-DoF pose for the target.
2. figures/day14_pose_overlay.png shows the predicted pose as a 3D bounding box on the camera image.
3. RETRO_w2.md documents fresh-clone reproduction of three earlier days.
4. Repo w2-systems/ pushed to GitHub with all artifacts.
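One way to approach the fresh-clone test is a small script that clones the repo into a temporary directory, runs one command per earlier day, and formats the outcome for RETRO_w2.md. The sketch below is an assumption about how that could look. The function names are hypothetical, and the per-day check commands are left for you to supply.

```python
import subprocess
import tempfile
from pathlib import Path


def run_checks(repo_url, checks):
    """Clone repo_url into a temp dir and run each (name, command) check inside it.

    Returns a list of (name, passed) tuples. Each command is an argument list.
    """
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        clone_dir = Path(tmp) / "fresh"
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(clone_dir)],
            check=True,
        )
        for name, command in checks:
            done = subprocess.run(command, cwd=clone_dir)
            results.append((name, done.returncode == 0))
    return results


def format_retro(results):
    """Render check results as a short Markdown report."""
    lines = ["# Week 2 fresh-clone retro", ""]
    for name, passed in results:
        status = "reproduced" if passed else "FAILED"
        lines.append(f"- {name}: {status}")
    return "\n".join(lines) + "\n"


# Formatting demo with made-up results. run_checks is not called here because
# it needs git, network access, and your own repo URL and commands.
sample = [
    ("Day 8 chatter", True),
    ("Day 9 URDF", True),
    ("Day 13 perception", False),
]
report = format_retro(sample)
print(report)
```

Cloning into a temporary directory is the point of the exercise: anything that only works because of files left in your working copy will fail here.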
Step 1 — Spawn an object in TB4 sim (15 min)
In the running TB4 sim, drop a YCB-style box at a known pose:
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.