Optional project shelf

More projects

Extra robotics builds for after Day 47, or for builders who want a hardware-heavy side quest before the next quarterly project.

Resources: HUG paper, HUG code, HUG weights.

These are optional builds for after Day 47, or for learners who finish a week early and want a harder project. They do not count toward the 47-day streak. Treat them like quarter projects: scope them tightly, ship a demo, and write down what broke.

HUG + YouTube Task Robot

When to do it: after Day 47, or after Week 6 if you already have an arm, a camera, and a safe workspace.

One-line summary: watch a short YouTube video, turn it into an object-action plan, then use Human Universal (HUG) as the dexterous grasp generator for a real or simulated robot.

Why this belongs here: it ties together the whole curriculum, including dexterous grasp generation and retargeting. It also has the right kind of robotics difficulty. The cool part is not "the model saw YouTube and magically controlled a robot"; the cool part is building the system boundary between video understanding and real-world execution.

Reality check: start as an integration project, not a full reproduction. As of 2026-06-17, the HUG repository has inference and visualization code plus model weights, while other pieces, including the HUG-Bench assets, are listed for a later upstream release. That means your first win is using the released HUG checkpoint as a grasp generator and building the rest of the stack around it.

MVP task

Build a desk-reset demo:

Input: one short YouTube clip of someone clearing or organizing a desk.
Extract: objects, target zones, and coarse sequence.
Scene: a real desk with 3-6 safe objects such as a mug, pen, notebook, soft snack, tape roll, and small box.
Loop: detect the object, select a target query point, generate a HUG grasp, retarget it to your hand or gripper, pick, place, and verify.
Output: a 2-minute demo video plus a reproducible repo.

Avoid heat, knives, glass, heavy liquid, pets, people in the workspace, and anything that can shatter or spill during the first version.
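The extracted plan and the avoid-list above can live in code from day one. Here is a minimal sketch, assuming a hypothetical `PlanStep` record and a crude word-level blocklist (`UNSAFE_WORDS`); neither name comes from HUG or the curriculum, and a keyword match is only a first guard, not a safety system.

```python
from dataclasses import dataclass

# Hypothetical blocklist based on the hazards listed above; extend it for your desk.
UNSAFE_WORDS = {"knife", "knives", "glass", "hot", "liquid", "pet", "person"}


@dataclass(frozen=True)
class PlanStep:
    action: str       # e.g. "pick_place" or "wipe"
    obj: str          # object name taken from the video summary
    target_zone: str  # named zone on the desk


def is_safe(step: PlanStep) -> bool:
    """Allow only pick/place steps whose object name contains no blocked word."""
    words = set(step.obj.lower().split())
    return step.action == "pick_place" and not (words & UNSAFE_WORDS)


def safe_subset(plan: list[PlanStep]) -> list[PlanStep]:
    """Keep the original order and drop every step that fails the guard."""
    return [step for step in plan if is_safe(step)]


plan = [
    PlanStep("pick_place", "mug", "shelf"),
    PlanStep("pick_place", "glass bottle", "shelf"),
    PlanStep("wipe", "desk", "desk"),
    PlanStep("pick_place", "pen", "pen_cup"),
]
print([step.obj for step in safe_subset(plan)])
```

Matching whole words rather than substrings keeps "photo frame" from being rejected because it contains "hot". A human should still review the filtered plan before anything moves.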

System sketch

Layer	Build
Video understanding	Summarize the YouTube clip into objects, actions, target states, and safety constraints.
Scene	Detect objects and use object matching to find the same objects on your desk.
Grasp	Run HUG on an RGB-D capture plus a query point to get a MANO grasp, then retarget it to your hand.
Execution	Run pre-grasp, close, lift, place, and release with collision checks and limits.
Verification	Use the camera to decide whether the object moved to the expected zone.
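The verification row can start as a plain geometric check. The sketch below assumes you already have the object's position on the desk plane before and after the attempt, in metres; the `Zone` class, `verify_move` function, and 2 cm movement threshold are illustrative assumptions, not part of HUG.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned rectangle on the desk plane, in metres (assumed convention)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def verify_move(before, after, zone, min_shift_m=0.02):
    """Return (ok, reason) for one pick/place attempt.

    before/after are (x, y) object positions on the desk plane.
    """
    shift = math.hypot(after[0] - before[0], after[1] - before[1])
    if shift < min_shift_m:
        return False, "object_did_not_move"
    if not zone.contains(after[0], after[1]):
        return False, "outside_target_zone"
    return True, "ok"


shelf = Zone(0.4, 0.6, 0.0, 0.2)
print(verify_move((0.1, 0.1), (0.5, 0.1), shelf))
```

Returning a reason string alongside the boolean makes it easy to log the failure stage later instead of a bare pass/fail.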

Build path

Phase 0 - paper and repo pass: read the HUG paper breakdown, run the official app on sample inputs, and save at least three predicted grasps.
Phase 1 - custom capture: capture rgb.png, depth.png, and intrinsics.txt from your camera, run HUG on a single object, and visualize the predicted MANO hand.
Phase 2 - target-point automation: replace manual clicking with automatic point selection. Pick the object centroid, a handle point, or a predicted high-affordance point.
Phase 3 - retargeting: map MANO wrist and fingers to your hand or gripper. If you only have a parallel gripper, use HUG to choose the approach pose and grasp region, then close the gripper normally.
Phase 4 - one object, one reliable loop: pick and place one object for 20 trials. Record the success rate and failure modes.
Phase 5 - YouTube layer: convert a video into a 5-10 step plan, then execute only the safe pick/place subset.
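Phases 1 and 2 can be prototyped before the robot is involved. Below is a rough sketch of the centroid option: take a binary object mask, pick a query pixel, and back-project it with the pinhole intrinsics. It assumes depth in metres and a mask given as rows of 0/1. Whether HUG expects the query point as a pixel or as a 3D point is something to check against the released code; the function names here are hypothetical.

```python
def mask_query_pixel(mask):
    """Return the (u, v) mask pixel closest to the mask's mean position.

    Snapping to a real mask pixel matters for ring-shaped objects (tape roll,
    mug seen from above), where the plain centroid can land in the hole.
    """
    pixels = [(u, v) for v, row in enumerate(mask) for u, val in enumerate(row) if val]
    if not pixels:
        raise ValueError("empty mask")
    mean_u = sum(u for u, _ in pixels) / len(pixels)
    mean_v = sum(v for _, v in pixels) / len(pixels)
    return min(pixels, key=lambda p: (p[0] - mean_u) ** 2 + (p[1] - mean_v) ** 2)


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth_m into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
u, v = mask_query_pixel(mask)
# Toy intrinsics for illustration; read the real values from intrinsics.txt.
print((u, v), backproject(u, v, 0.5, fx=100.0, fy=100.0, cx=2.0, cy=2.0))
```

In a real capture you would read the depth at `(u, v)` from the registered depth image and reject zero or missing depth before back-projecting.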

Suggested hardware lanes

No hardware lane: run HUG inference, visualize grasps, and simulate the pick/place loop with a fake executor.
Low-cost lane: SO-101 or xArm Lite style arm, RealSense/ZED camera, and parallel gripper. Use HUG for grasp pose priors.
Dexterous lane: xArm/Franka/UR5 plus LEAP, Allegro, Ability, WUJI, or another anthropomorphic hand. Retarget MANO to hand joints and keep the first version slow and open-loop.
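For the no-hardware lane, a fake executor can be as small as the sketch below. The stage names follow the pre-grasp, close, lift, place, release sequence from the system sketch; the `Executor` protocol and `FakeExecutor` class are hypothetical names, chosen so a real arm driver can later implement the same single-method interface.

```python
from typing import Optional, Protocol

STAGES = ("pre_grasp", "close", "lift", "place", "release")


class Executor(Protocol):
    def run_stage(self, stage: str) -> bool:
        """Run one stage and report whether it succeeded."""
        ...


class FakeExecutor:
    """Stand-in for a robot driver: every stage succeeds except `fail_at`."""

    def __init__(self, fail_at: Optional[str] = None):
        self.fail_at = fail_at
        self.log: list[str] = []

    def run_stage(self, stage: str) -> bool:
        self.log.append(stage)
        return stage != self.fail_at


def pick_place(executor: Executor) -> Optional[str]:
    """Run the stages in order; return the first failed stage, or None on success."""
    for stage in STAGES:
        if not executor.run_stage(stage):
            return stage
    return None


print(pick_place(FakeExecutor()))
print(pick_place(FakeExecutor(fail_at="lift")))
```

Stopping at the first failed stage gives you the "failure stage" value to log for each trial, and it mirrors what a guarded loop on real hardware should do.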

Deliverable checklist

README.md explains the task, hardware lane, and safety constraints.
scripts/capture_rgbd.py saves registered RGB, depth, and intrinsics.
scripts/hug_predict.py runs HUG on your own capture.
scripts/plan_from_video.py converts a YouTube clip or transcript into object-action steps.
scripts/execute_pick_place.py runs one guarded pick/place loop.
eval/results.csv logs at least 20 trials with success, failure stage, and notes.
docs/failure_analysis.md lists the top 5 failure modes and the next fix.
videos/demo.mp4 shows the YouTube clip, extracted plan, execution attempt, and result.
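The trial log is easy to get right early. Here is a minimal sketch of writing and summarizing a results file using only the standard library. The column names are an assumption based on the checklist (success, failure stage, notes), and the example writes to an in-memory buffer instead of eval/results.csv so it runs anywhere.

```python
import csv
import io
from collections import Counter

# Assumed column layout; adjust to whatever your eval script actually logs.
FIELDS = ["trial", "object", "success", "failure_stage", "notes"]


def write_trials(fh, trials):
    """Write trial dicts as CSV with a header row."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(trials)


def summarize(fh):
    """Return (success_rate, [(failure_stage, count), ...]) from a results CSV."""
    rows = list(csv.DictReader(fh))
    successes = sum(row["success"] == "1" for row in rows)
    stages = Counter(row["failure_stage"] for row in rows if row["success"] != "1")
    rate = successes / len(rows) if rows else 0.0
    return rate, stages.most_common()


trials = [
    {"trial": 1, "object": "mug", "success": 1, "failure_stage": "", "notes": ""},
    {"trial": 2, "object": "mug", "success": 0, "failure_stage": "lift", "notes": "slipped"},
    {"trial": 3, "object": "mug", "success": 1, "failure_stage": "", "notes": ""},
    {"trial": 4, "object": "mug", "success": 1, "failure_stage": "", "notes": ""},
]
buffer = io.StringIO()
write_trials(buffer, trials)
buffer.seek(0)
print(summarize(buffer))
```

The failure-stage counts feed directly into docs/failure_analysis.md: the most common stages are the first candidates for the top-5 list.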

Stretch goals

Generate 8 HUG grasp candidates and rank them by criteria such as collision clearance and hand limit margin.
Add closed-loop visual servoing before closing the hand.
Train a small success classifier from before/after images.
Add a second YouTube task such as snack sorting, tool sorting, or coffee-bar dry run.
Compare HUG-based grasp priors against a simple top-down grasp baseline.
Publish a short writeup: what YouTube helped with, what it did not help with, and where the system still needed real data.
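For the candidate-ranking stretch goal, one simple option is to filter out infeasible grasps and then sort by the worst of the remaining margins. This is a sketch under assumptions: the `Candidate` fields, the thresholds, and the reading of "hand limit margin" as a normalised joint-limit margin are all illustrative, and nothing here is HUG's own scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    clearance_m: float   # smallest hand-to-obstacle distance, in metres
    limit_margin: float  # smallest normalised distance to a joint limit, 0..1


def score(c: Candidate, clearance_scale_m: float = 0.05) -> float:
    """Worst-case margin: clearance is normalised and capped at 1 before comparing."""
    return min(min(c.clearance_m / clearance_scale_m, 1.0), c.limit_margin)


def rank(candidates, min_clearance_m=0.01, min_margin=0.05):
    """Drop candidates below either threshold, then sort best-first by score."""
    feasible = [
        c for c in candidates
        if c.clearance_m >= min_clearance_m and c.limit_margin >= min_margin
    ]
    return sorted(feasible, key=score, reverse=True)


candidates = [
    Candidate("a", clearance_m=0.03, limit_margin=0.5),
    Candidate("b", clearance_m=0.005, limit_margin=0.9),  # too close to an obstacle
    Candidate("c", clearance_m=0.05, limit_margin=0.2),
    Candidate("d", clearance_m=0.04, limit_margin=0.7),
]
print([c.name for c in rank(candidates)])
```

Taking the minimum rather than a weighted sum means a grasp cannot buy its way past a near-collision with a comfortable joint configuration, which suits a cautious first version.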

What counts as a good version

A good version does not need a humanoid or perfect results. It needs a clear boundary:

YouTube gives structure and hints.
HUG gives a dexterous grasp prior from an RGB-D capture and a query point.
Your stack handles calibration, retargeting, execution, and safety.
Your failure analysis says exactly where the system fails.

If you can show that loop on 3-6 desk objects with honest failure analysis, this is a strong post-curriculum robotics project.