Day 5
Control I: PID, LQR, feedforward (cart-pole stabilization)
LECTURE & READING
Glossary primer (12 min)
- PID Control & PlanningControllerThe algorithm or system that turns desired behavior into motor commands. —
τ = Kp · e + Kd · ė + Ki · ∫e dt. Three knobs. Surprisingly hard to beat in practice. - LQR (Linear Quadratic Regulator) — Solves
min ∫(xᵀ Q x + uᵀ R u) dtsubject toẋ = A x + B u. Yields a constant Control & PlanningFeedbackInformation returned from sensors during action to help correct behavior. gainKsuch thatu = -K xis optimal. - Riccati equation — The matrix equation that defines the optimal LQR gain.
scipy.linalg.solve_continuous_aresolves it. - Linearization — Around an operating point
x₀, u₀, approximateẋ = f(x, u)asẋ ≈ A (x − x₀) + B (u − u₀). Required for LQR. - Feedforward — Command the model-based Movement, Mechanics & Robot BodyTorqueA rotational force around a joint or axis. needed for the desired motion; let Control & PlanningFeedbackInformation returned from sensors during action to help correct behavior. correct residuals.
- Cart-pole — Classic underactuated system: cart on rail (1 Movement, Mechanics & Robot BodyActuatorA motor or mechanism that creates movement.) with pendulum on top (1 unactuated Movement, Mechanics & Robot BodyJointA movable connection between robot parts.). 4 Core ConceptsStateThe robot’s current condition, such as joint positions, velocity, object positions, or internal variables. variables:
x, ẋ, θ, θ̇. - Underactuated — Fewer actuators than Movement, Mechanics & Robot BodyDegrees of Freedom (DoF)The number of independent ways a robot can move.. Cart-pole, humanoids, drones are all underactuated.
- Settling time — Time for the controlled Core ConceptsStateThe robot’s current condition, such as joint positions, velocity, object positions, or internal variables. to enter a small neighborhood of the Core ConceptsGoalThe desired outcome or target state for a robot task. and stay there.
Real-world analogy
PID is "nudge toward Core ConceptsGoalThe desired outcome or target state for a robot task., dampen wobble, accumulate stubborn errors." LQR is "I know exactly how the system reacts to my inputs and exactly how much I value error vs effort — solve the whole Control & PlanningFeedbackInformation returned from sensors during action to help correct behavior. law analytically." Feedforward is "I know what muscle I'd need; pre-flex it, then let Control & PlanningFeedbackInformation returned from sensors during action to help correct behavior. handle surprises."
Hour 1 — Lectures
- Russ Tedrake, Underactuated Robotics Ch. 1–2 (pendulum, LQR), free online: https://underactuated.csail.mit.edu/
- Underactuated Spring 2024 Lecture 1, "Core ConceptsRobotA physical system with sensors and actuators that can observe the world and take actions. Movement, Mechanics & Robot BodyDynamicsThe study of motion including forces, torques, mass, and inertia. and model-based Control & PlanningControlThe method used to make the robot move the way you want."
Video
Underactuated Spring 2024 Lecture 1, "Core ConceptsRobotA physical system with sensors and actuators that can observe the world and take actions. Movement, Mechanics & Robot BodyDynamicsThe study of motion including forces, torques, mass, and inertia. and model-based Control & PlanningControlThe method used to make the robot move the way you want.
Open source - Underactuated Lecture 2, "Nonlinear Movement, Mechanics & Robot BodyDynamicsThe study of motion including forces, torques, mass, and inertia."
Video
Underactuated Lecture 2, "Nonlinear Movement, Mechanics & Robot BodyDynamicsThe study of motion including forces, torques, mass, and inertia.
Open source
Hour 2 — Lectures continued
- CMU 16-745 (Optimal Control & PlanningControlThe method used to make the robot move the way you want.) Lec 1, Manchester, ~75 min
Video
CMU 16-745 (Optimal Control & PlanningControlThe method used to make the robot move the way you want.) Lec 1, Manchester, ~75 min
Open source - Skim Tedrake Ch. 8 "LQR" for a cleaner derivation than CMU's: https://underactuated.csail.mit.edu/lqr.html
LAB
Hour 3 — Lab: cart-pole LQR balance in MuJoCo (75 min)
What you're building. A cart-pole Simulation & Sim-to-RealSimulationA virtual environment where robots can be trained or tested. in MuJoCo, linearized about the upright equilibrium, with an LQR Control & PlanningControllerThe algorithm or system that turns desired behavior into motor commands. derived analytically. You will perturb the cart with a 5° pole tilt, observe the Core ConceptsPolicyThe rule or model that maps observations or states to actions. stabilize it, log the settling time, and verify behavior across multiple initial conditions.
What success looks like at the end. You have:
1. src/day5_cartpole_lqr.py runnable.
2. A plot figures/day5_cartpole_lqr.png showing four trajectories from different initial conditions, all settling to upright within 2–3 s.
3. A 10-s video videos/day5_cartpole.mp4.
4. Console: "Mean settling time: 1.84 s ± 0.21 s across 5 initial conditions".
Step 1 — Verify cart-pole MJCF (5 min)
There's no cart-pole in Menagerie, but dm_control ships one inside its source tree. Easier: write a tiny MJCF inline.
mkdir -p ~/robo47/w1-foundations/assets
cat > ~/robo47/w1-foundations/assets/cartpole.xml <<'XML'
<mujoco model="cartpole">
<option timestep="0.005" gravity="0 0 -9.81"/>
<worldbody>
<light pos="0 0 3"/>
<geom name="floor" type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
<body name="cart" pos="0 0 0.1">
<joint name="slider" type="slide" axis="1 0 0" limited="true" range="-1.5 1.5" damping="0.1"/>
<geom name="cart_geom" type="box" size="0.1 0.05 0.05" rgba="0.2 0.2 0.8 1" mass="1"/>
<body name="pole" pos="0 0 0.05">
<joint name="hinge" type="hinge" axis="0 1 0" damping="0.01"/>
<geom name="pole_geom" type="capsule" fromto="0 0 0 0 0 0.6" size="0.02" rgba="0.8 0.2 0.2 1" mass="0.1"/>
</body>
</body>
</worldbody>
<actuator>
<motor name="motor" joint="slider" gear="20" ctrlrange="-1 1"/>
</actuator>
</mujoco>
XMLSmoke test:
cd ~/robo47/w1-foundations
python -c "
import mujoco
m = mujoco.MjModel.from_xml_path('assets/cartpole.xml')
print(f'nq={m.nq}, nv={m.nv}, nu={m.nu}')
"Expected:
nq=2, nv=2, nu=1Step 2 — Linearize and compute LQR gain (40 min)
Create src/day5_cartpole_lqr.py:
"""Day 5: LQR balance of a cart-pole in MuJoCo. Linearize about upright, design K, run."""
from __future__ import annotations
import numpy as np
import matplotlib.pyplot as plt
import mujoco
from scipy.linalg import solve_continuous_are
MODEL_PATH = "assets/cartpole.xml"
SIM_TIME_S = 6.0
# State: x = [cart_pos, pole_angle, cart_vel, pole_vel]
# Upright: x_eq = 0 (after we subtract pi from theta because hinge=0 is straight up here)
# Action: scalar motor cmd in [-1, 1]; with gear=20, force ranges in N.
# --- Step 2a: Numerical linearization via finite differences -----------------
def linearize_about_upright(model, data, eps_x=1e-4, eps_u=1e-3):
"""Return (A, B) where ẋ ≈ A·x + B·u about the upright equilibrium."""
n = 4 # 2 q + 2 qdot
m_act = model.nu
# Set state to upright equilibrium (qpos = 0 because hinge default is straight up)
data.qpos[:] = 0
data.qvel[:] = 0
data.ctrl[:] = 0
mujoco.mj_forward(model, data)
mujoco.mj_inverse(model, data)
# Note: the pole going to angle 0 in this MJCF is straight up because of how axis is set.
A = np.zeros((n, n))
B = np.zeros((n, m_act))
# --- A column-by-column via finite differences on state
for i in range(n):
# +eps
data.qpos[:] = 0
data.qvel[:] = 0
if i < 2:
data.qpos[i] = eps_x
else:
data.qvel[i - 2] = eps_x
data.ctrl[:] = 0
mujoco.mj_forward(model, data)
f_plus = np.concatenate([data.qvel.copy(), data.qacc.copy()])
# -eps
data.qpos[:] = 0
data.qvel[:] = 0
if i < 2:
data.qpos[i] = -eps_x
else:
data.qvel[i - 2] = -eps_x
data.ctrl[:] = 0
mujoco.mj_forward(model, data)
f_minus = np.concatenate([data.qvel.copy(), data.qacc.copy()])
A[:, i] = (f_plus - f_minus) / (2 * eps_x)
# --- B column-by-column via finite differences on control
for j in range(m_act):
data.qpos[:] = 0
data.qvel[:] = 0
data.ctrl[:] = 0
data.ctrl[j] = eps_u
mujoco.mj_forward(model, data)
f_plus = np.concatenate([data.qvel.copy(), data.qacc.copy()])
data.ctrl[:] = 0
data.ctrl[j] = -eps_u
mujoco.mj_forward(model, data)
f_minus = np.concatenate([data.qvel.copy(), data.qacc.copy()])
B[:, j] = (f_plus - f_minus) / (2 * eps_u)
return A, B
def design_lqr(A, B, Q=None, R=None):
if Q is None:
Q = np.diag([1.0, 10.0, 0.1, 0.1]) # cart_pos, pole, cart_vel, pole_vel
if R is None:
R = np.array([[0.1]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
return K
def run_episode(model, data, K, x0, sim_time_s=SIM_TIME_S):
n_steps = int(sim_time_s / model.opt.timestep)
data.qpos[:] = x0[:2]
data.qvel[:] = x0[2:]
mujoco.mj_forward(model, data)
history = np.zeros((n_steps, 4))
t_arr = np.zeros(n_steps)
for k in range(n_steps):
x = np.concatenate([data.qpos.copy(), data.qvel.copy()])
u = -K @ x
data.ctrl[:] = np.clip(u, -1, 1)
mujoco.mj_step(model, data)
history[k] = x
t_arr[k] = k * model.opt.timestep
return t_arr, history
def settling_time(t, history, threshold=0.05):
"""Time after which |theta| stays below threshold (rad)."""
abs_theta = np.abs(history[:, 1])
# Last index where it exceeded the threshold; +1 = settled
above = np.where(abs_theta > threshold)[0]
if len(above) == 0:
return 0.0
return t[above[-1] + 1] if above[-1] + 1 < len(t) else t[-1]
def main():
model = mujoco.MjModel.from_xml_path(MODEL_PATH)
data = mujoco.MjData(model)
A, B = linearize_about_upright(model, data)
print("Linearized system:")
print("A =\n", np.round(A, 3))
print("B =\n", np.round(B, 3))
K = design_lqr(A, B)
print("\nLQR gain K =", np.round(K, 3))
# Run from 5 different initial perturbations
initial_conditions = [
np.array([0.0, np.deg2rad(5), 0, 0]),
np.array([0.0, -np.deg2rad(5), 0, 0]),
np.array([0.0, np.deg2rad(8), 0, 0]),
np.array([0.2, np.deg2rad(3), 0, 0]),
np.array([-0.3, np.deg2rad(-4), 0, 0]),
]
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
settling_times = []
for x0 in initial_conditions:
t, hist = run_episode(model, data, K, x0)
ts = settling_time(t, hist)
settling_times.append(ts)
label = f"θ₀={np.rad2deg(x0[1]):.0f}°, x₀={x0[0]:.1f}, ts={ts:.2f}s"
axes[0].plot(t, np.rad2deg(hist[:, 1]), label=label)
axes[1].plot(t, hist[:, 0], label=label)
axes[0].axhline(0, color='k', linewidth=0.5)
axes[0].set_ylabel("Pole angle (deg)")
axes[0].legend(fontsize=8, loc='upper right')
axes[0].grid(alpha=0.3)
axes[1].set_ylabel("Cart position (m)")
axes[1].set_xlabel("Time (s)")
axes[1].axhline(0, color='k', linewidth=0.5)
axes[1].grid(alpha=0.3)
mean_ts = np.mean(settling_times)
std_ts = np.std(settling_times)
fig.suptitle(f"Cart-pole LQR: mean settling time {mean_ts:.2f}s ± {std_ts:.2f}s")
plt.tight_layout()
plt.savefig("../figures/day5_cartpole_lqr.png", dpi=120)
print(f"\nMean settling time: {mean_ts:.2f} s ± {std_ts:.2f} s across {len(settling_times)} ICs")
print("Wrote figures/day5_cartpole_lqr.png")
return mean_ts
if __name__ == "__main__":
main()Run it:
cd ~/robo47/w1-foundations
python src/day5_cartpole_lqr.pyExpected output:
Linearized system:
A =
[[ 0. 0. 1. 0. ]
[ 0. 0. 0. 1. ]
[ 0. -1. -0.1 0. ]
[ 0. 33. -0.1 -0.5 ]]
B =
[[ 0. ]
[ 0. ]
[ 0.95 ]
[-0.99 ]]
LQR gain K = [[-3.16 46.22 -3.59 8.74]]
Mean settling time: 1.84 s ± 0.21 s across 5 ICs
Wrote figures/day5_cartpole_lqr.png(Numbers within ±20%.)
Expected figure: Two stacked plots over 6 s. Top: pole angle (deg) for 5 trajectories — all start at ±3° to ±8° tilt and decay to ~0° by ~1.5–2.5 s, no overshoot or oscillation. Bottom: cart position drifts modestly (a few cm) but stays bounded.
Step 3 — Record video (5 min)
Add this snippet to the end of main() to record the first IC:
# Visual playback
import mujoco.viewer, time
data.qpos[:] = initial_conditions[0][:2]
data.qvel[:] = initial_conditions[0][2:]
mujoco.mj_forward(model, data)
with mujoco.viewer.launch_passive(model, data) as v:
for _ in range(int(SIM_TIME_S / model.opt.timestep)):
x = np.concatenate([data.qpos, data.qvel])
u = -K @ x
data.ctrl[:] = np.clip(u, -1, 1)
mujoco.mj_step(model, data)
v.sync()
time.sleep(model.opt.timestep)Re-run the script and screen-record the viewer window for 6 s. Save to videos/day5_cartpole.mp4.
Step 4 — Common failures and fixes (10 min)
- All trajectories diverge — Sign error in A or B, or you forgot to negate
Kin the Control & PlanningControlThe method used to make the robot move the way you want. law (u = -K x, not+K x). - Pole settles but cart drifts to limit — Q[0,0] is too small; increase
Q = np.diag([5, 10, 0.1, 0.1]). - `solve_continuous_are` raises — Q or R isn't positive definite. Make sure Q has all-positive diagonal entries.
- Pole settles in 0.5 s but motor saturates —
Rtoo small (cheap effort). Increase toR = np.array([[1.0]]).
Step 5 — Log and commit (5 min)
echo "day5_cartpole_lqr,$(date -I),5,cartpole_lqr,$(cd ~/robo47/w1-foundations && git rev-parse --short HEAD),0,cartpole,lqr,1200,1,settling_time_sec,1.84,5_ICs" >> ~/robo47/metrics/metrics.csv
cd ~/robo47/w1-foundations
git add src/day5_cartpole_lqr.py figures/day5_cartpole_lqr.png assets/cartpole.xml videos/day5_cartpole.mp4
git commit -m "Day 5: cart-pole LQR, mean settling 1.8s"Deliverable checklist
Why this lab matters
LQR is the closed-form, optimal answer for linear systems with quadratic costs. Most "model-based Imitation & Reinforcement LearningReinforcement Learning (RL)Teaching a robot through trial and error using rewards." papers (LQG, iLQR, Control & PlanningModel Predictive Control (MPC)A control method that repeatedly plans a short future path, acts a little, then replans.) are LQR with bells on. When you see DreamerV3 plan via gradient descent in Robot LearningLatent spaceA compressed internal representation space inside a model. (Week 6), the loss it's minimizing is structurally a quadratic-cost LQR problem in a learned coordinate system. Today you built the small version by hand.
Side quest (optional, 60 min)
Implement swing-up Control & PlanningControlThe method used to make the robot move the way you want.. Start from θ = π (pole hanging down), apply energy-shaping Control & PlanningControlThe method used to make the robot move the way you want. until the pole is near the top, then switch to LQR. This is a classic Underactuated Robotics problem — Tedrake Ch. 3.
---