Optional project shelf

More projects

Extra robotics builds for after Day 47, or for builders who want a hardware-heavy side quest before the next quarterly project.

These are optional builds for after Day 47, or for learners who finish a week early and want a harder project. They do not count toward the 47-day streak. Treat them like quarter projects: scope them tightly, ship a demo, and write down what broke.

HUG + YouTube Task Robot

When to do it: after Day 47, or after Week 6 if you already have a robot, an RGB-D camera, and a safe workspace.

One-line goal: watch a short YouTube task video, turn it into an object-action plan, then use Human Universal Grasping (HUG) as the dexterous grasp primitive for a real or simulated robot.

Why this belongs here: it ties together the whole curriculum: RGB-D perception, segmentation, dexterous grasp generation, retargeting, robot execution, imitation learning, task planning, and evaluation. It also has the right kind of robotics difficulty. The cool part is not "the model saw YouTube and magically controlled a robot"; the cool part is building the system boundary between video understanding and real manipulation.

Reality check: start as an integration project, not a full training reproduction. As of 2026-06-17, the HUG repository has inference and visualization code plus model weights, while the dataset, HUG-Bench assets, simulation evaluation, and training code are listed for a later upstream release. That means your first win is using the released HUG checkpoint as a grasp generator and building the rest of the robot stack around it.

MVP task

Build a desk-reset robot:

  • Input: one short YouTube clip of someone clearing or organizing a desk.
  • Extract: objects, target zones, and a coarse action sequence.
  • Robot scene: a real desk with 3-6 safe objects such as a mug, pen, notebook, soft snack, tape roll, and small box.
  • Robot skill: detect object, select a target query point, generate a HUG grasp, retarget to your hand or gripper, pick, place, and verify.
  • Output: a 2-minute demo video plus a reproducible repo.

Avoid heat, knives, glass, heavy liquid, pets, people in the workspace, and anything that can shatter or spill during the first version.
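The extracted plan and the avoid list can be kept as plain data before any robot code exists. The sketch below is one possible shape, not a prescribed schema: `PlanStep`, `ALLOWED_ACTIONS`, `BLOCKED_OBJECTS`, and `safe_subset` are hypothetical names, and the labels in the block list are illustrative.

```python
from dataclasses import dataclass

# Hypothetical vocabulary for illustration; not taken from HUG or any library.
ALLOWED_ACTIONS = {"pick", "place"}
BLOCKED_OBJECTS = {"knife", "glass", "kettle"}


@dataclass(frozen=True)
class PlanStep:
    action: str            # e.g. "pick", "place", "wipe"
    obj: str               # object label extracted from the video
    target_zone: str = ""  # where the object should end up, if any


def safe_subset(plan):
    """Keep only pick/place steps whose object is not on the block list."""
    return [
        step for step in plan
        if step.action in ALLOWED_ACTIONS and step.obj not in BLOCKED_OBJECTS
    ]


# A made-up plan such as a video summary might produce.
plan = [
    PlanStep("pick", "mug"),
    PlanStep("place", "mug", "tray"),
    PlanStep("wipe", "desk"),
    PlanStep("pick", "knife"),
]

for step in safe_subset(plan):
    print(step.action, step.obj, step.target_zone)
```

Filtering on an explicit allow list of actions means an unexpected verb from the video summary is dropped rather than executed.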

System sketch

| Layer | Build |
| --- | --- |
| Video understanding | Use a vision-language model (VLM) to summarize the YouTube clip into objects, actions, target states, and safety constraints. |
| Scene perception | Use RGB-D, segmentation, and object matching to find the same objects in the robot workspace. |
| Grasp primitive | Use HUG inference: RGB-D plus query point to MANO grasp, then retarget to your robot hand. |
| Execution | Run pre-grasp, close, lift, place, and release with collision checks and workspace limits. |
| Verification | Use camera feedback to decide whether the object moved to the expected zone. |
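The verification layer can start as a simple geometric check. The sketch below assumes the perception layer already reports the object centroid in a desk-plane frame in meters. `Zone` and `verify_place` are hypothetical names, and the 1 cm margin is an arbitrary starting value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned rectangle on the desk plane, in meters (assumed frame)."""
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y, margin=0.0):
        # Shrink the rectangle by `margin` so borderline placements fail.
        return (self.x_min + margin <= x <= self.x_max - margin
                and self.y_min + margin <= y <= self.y_max - margin)


def verify_place(zone, observed_xy, margin=0.01):
    """Return True only if the object was seen inside the zone.

    `observed_xy` is None when the object was not detected at all,
    which is treated as a failed placement.
    """
    if observed_xy is None:
        return False
    x, y = observed_xy
    return zone.contains(x, y, margin)


tray = Zone("tray", x_min=0.30, x_max=0.50, y_min=-0.10, y_max=0.10)
print(verify_place(tray, (0.40, 0.00)))   # centroid well inside the tray
print(verify_place(tray, None))           # object not detected
```

Treating a missing detection as a failure keeps the success rate honest when the object is occluded or knocked off the desk.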

Build path

  • Phase 0 - paper and repo pass: read the HUG paper breakdown, run the official inference app on sample inputs, and save at least three predicted grasps.
  • Phase 1 - custom RGB-D capture: capture rgb.png, depth.png, and intrinsics.txt from your camera, run HUG on a single object, and visualize the predicted MANO hand.
  • Phase 2 - target-point automation: replace manual clicking with segmentation. Pick the object centroid, a handle point, or a high-affordance point predicted by a vision-language model (VLM).
  • Phase 3 - robot retargeting: map MANO wrist and fingers to your robot. If you only have a parallel gripper, use HUG to choose the approach pose and contact region, then close the gripper normally.
  • Phase 4 - one object, one reliable loop: pick and place one object for 20 trials. Record success rate and failure modes.
  • Phase 5 - YouTube task layer: convert a video into a 5-10 step plan, then execute only the safe pick/place subset.
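For Phase 2, the object-centroid option reduces to two small steps: average the mask pixels, then back-project that pixel with the camera intrinsics. The sketch below uses the standard pinhole model and plain Python lists so it runs without dependencies. It assumes depth is already in meters and that `fx`, `fy`, `cx`, `cy` have been read from your intrinsics file, whose format is not specified here. `mask_centroid` and `deproject` are hypothetical helper names, and how HUG expects its query point is not shown.

```python
def mask_centroid(mask):
    """Return the (u, v) pixel centroid of a binary mask given as rows of 0/1.

    Note: for non-convex objects (a mug with a handle, a tape roll) the
    centroid can land off the object, so check it against the mask.
    """
    pixels = [(u, v)
              for v, row in enumerate(mask)
              for u, on in enumerate(row) if on]
    if not pixels:
        raise ValueError("empty mask")
    n = len(pixels)
    return sum(u for u, _ in pixels) / n, sum(v for _, v in pixels) / n


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at metric depth to a camera-frame 3D point."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


# Toy 4x4 mask with a 2x2 object in the middle.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
u, v = mask_centroid(mask)

# Made-up intrinsics and depth, for illustration only.
point = deproject(u, v, depth_m=0.5, fx=600.0, fy=600.0, cx=1.5, cy=1.5)
print((u, v), point)
```

Many depth cameras store depth.png as 16-bit millimeters, so a unit conversion is usually needed before calling a function like `deproject`.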

Suggested hardware lanes

  • No hardware lane: run HUG inference, visualize grasps, and simulate the pick/place loop with a fake executor.
  • Low-cost lane: SO-101 or xArm Lite style arm, RealSense/ZED camera, and parallel gripper. Use HUG for grasp pose priors.
  • Dexterous lane: xArm/Franka/UR5 plus LEAP, Allegro, Ability, WUJI, or another anthropomorphic hand. Retarget MANO to hand joints and keep the first task slow and open-loop.
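For the no-hardware lane, a fake executor can be as small as a class that tracks object locations and fails some grasps at random. The sketch below is one way to do that. `FakeExecutor` and its methods are hypothetical names, and the failure probability is a made-up knob rather than a measured value.

```python
import random


class FakeExecutor:
    """Stand-in for a robot: tracks object zones and fails some grasps."""

    def __init__(self, object_zones, grasp_success_prob=0.8, seed=0):
        self.object_zones = dict(object_zones)  # object label -> zone name
        self.held = None                        # label of the object in hand
        self.grasp_success_prob = grasp_success_prob
        self.rng = random.Random(seed)          # seeded for repeatable runs

    def pick(self, obj):
        """Try to grasp `obj`; return True on success."""
        if self.held is not None or obj not in self.object_zones:
            return False
        if self.rng.random() >= self.grasp_success_prob:
            return False  # simulated grasp failure
        self.held = obj
        return True

    def place(self, zone):
        """Release the held object into `zone`; return True on success."""
        if self.held is None:
            return False
        self.object_zones[self.held] = zone
        self.held = None
        return True


executor = FakeExecutor({"mug": "desk", "pen": "desk"}, grasp_success_prob=1.0)
if executor.pick("mug"):
    executor.place("tray")
print(executor.object_zones)
```

Keeping the same `pick` and `place` interface for a later real-robot executor lets the planning and verification code stay unchanged when hardware arrives.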

Deliverable checklist

Stretch goals

  • Generate 8 HUG grasp candidates, rank by reachability, collision clearance, and hand limit margin.
  • Add closed-loop visual servoing before closing the hand.
  • Train a small success classifier from before/after images.
  • Add a second YouTube task such as snack sorting, tool sorting, or coffee-bar dry run.
  • Compare HUG-based grasp priors against a simple top-down gripper baseline.
  • Publish a short writeup: what YouTube helped with, what it did not help with, and where the robot still needed real data.
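The candidate-ranking stretch goal can be sketched as a filter followed by a sort. Everything below is an assumption for illustration: `GraspCandidate`, its fields, the 1 cm clearance threshold, the 5 cm clearance cap, and the equal weighting are hypothetical, and the values would come from your own IK solver, collision checker, and hand model rather than from HUG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspCandidate:
    name: str
    reachable: bool      # did IK find a solution for the wrist pose?
    clearance_m: float   # smallest hand-to-obstacle distance, in meters
    joint_margin: float  # smallest normalized distance to a joint limit, 0..1


def score(c, clearance_cap_m=0.05):
    """Higher is better. Clearance beyond the cap earns no extra credit."""
    return min(c.clearance_m, clearance_cap_m) / clearance_cap_m + c.joint_margin


def rank_grasps(candidates, min_clearance_m=0.01):
    """Drop unreachable or near-colliding candidates, then sort best-first."""
    feasible = [c for c in candidates
                if c.reachable and c.clearance_m >= min_clearance_m]
    return sorted(feasible, key=score, reverse=True)


# Made-up numbers for four candidates.
candidates = [
    GraspCandidate("a", True, 0.03, 0.2),
    GraspCandidate("b", True, 0.02, 0.9),
    GraspCandidate("c", False, 0.04, 0.8),   # unreachable
    GraspCandidate("d", True, 0.005, 0.9),   # too close to an obstacle
]
for c in rank_grasps(candidates):
    print(c.name, round(score(c), 2))
```

Hard constraints are applied as filters and only soft preferences are scored, so a high joint margin can never rescue an unreachable or colliding grasp.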

What counts as a good version

A good version does not need a humanoid or a perfect policy. It needs a clear boundary:

  • YouTube gives task structure and affordance hints.
  • HUG gives a dexterous grasp prior from RGB-D.
  • Your robot stack handles calibration, retargeting, execution, and safety.
  • Your evaluation says exactly where the system fails.

If you can show that loop on 3-6 desk objects with honest failure analysis, this is a strong post-curriculum robotics project.
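Honest failure analysis is easier if every trial is logged with a failure label. The sketch below tallies a 20-trial run and adds a Wilson score interval, which is a suggestion added here rather than part of the project spec; with only 20 trials the interval is wide, and that is worth reporting. The failure labels and the 14/20 outcome are made-up example data, and `summarize` is a hypothetical helper name.

```python
import math
from collections import Counter


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a success proportion."""
    if trials == 0:
        raise ValueError("no trials recorded")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def summarize(trials):
    """`trials` is a list of (success, failure_mode) pairs.

    Returns (success_rate, (low, high), Counter of failure modes).
    """
    successes = sum(1 for ok, _ in trials if ok)
    failures = Counter(mode for ok, mode in trials if not ok)
    rate = successes / len(trials)
    return rate, wilson_interval(successes, len(trials)), failures


# Example data only: 14 successes and 6 labeled failures.
trials = (
    [(True, None)] * 14
    + [(False, "slipped during lift")] * 3
    + [(False, "missed grasp")] * 2
    + [(False, "placed outside zone")]
)
rate, (low, high), failures = summarize(trials)
print(f"success rate {rate:.2f}, 95% interval {low:.2f}-{high:.2f}")
for mode, count in failures.most_common():
    print(count, mode)
```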