Course navigation

Course map Day 1 Week 1 Previous day Next day

Day 5

Control I: PID, LQR, feedforward (cart-pole stabilization)

LECTURE & READING

Glossary primer (12 min)

PID — τ = Kp · e + Kd · ė + Ki · ∫e dt. Three knobs. Surprisingly hard to beat in practice.
LQR (Linear Quadratic Regulator) — Solves min ∫(xᵀ Q x + uᵀ R u) dt subject to ẋ = A x + B u. Yields a constant gain K such that u = -K x is optimal.
Riccati equation — The matrix equation that defines the optimal LQR gain. scipy.linalg.solve_continuous_are solves it.
Linearization — Around an operating point x₀, u₀, approximate ẋ = f(x, u) as ẋ ≈ A (x − x₀) + B (u − u₀). Required for LQR.
Feedforward — Command the model-based needed for the desired motion; let correct residuals.
Cart-pole — Classic underactuated system: cart on rail (1 ) with pendulum on top (1 unactuated ). 4 variables: x, ẋ, θ, θ̇.
Underactuated — Fewer actuators than . Cart-pole, humanoids, drones are all underactuated.
Settling time — Time for the controlled to enter a small neighborhood of the and stay there.

Real-world analogy

PID is "nudge toward , dampen wobble, accumulate stubborn errors." LQR is "I know exactly how the system reacts to my inputs and exactly how much I value error vs effort — solve the whole law analytically." Feedforward is "I know what muscle I'd need; pre-flex it, then let handle surprises."

Hour 1 — Lectures

Russ Tedrake, Underactuated Robotics Ch. 1–2 (pendulum, LQR), free online: https://underactuated.csail.mit.edu/
Underactuated Spring 2024 Lecture 1, " and model-based "
Video
Underactuated Spring 2024 Lecture 1, " and model-based
Open source
Underactuated Lecture 2, "Nonlinear "
Video
Underactuated Lecture 2, "Nonlinear
Open source

Hour 2 — Lectures continued

CMU 16-745 (Optimal ) Lec 1, Manchester, ~75 min
Video
CMU 16-745 (Optimal ) Lec 1, Manchester, ~75 min
Open source
Skim Tedrake Ch. 8 "LQR" for a cleaner derivation than CMU's: https://underactuated.csail.mit.edu/lqr.html

LAB

Hour 3 — Lab: cart-pole LQR balance in MuJoCo (75 min)

What you're building. A cart-pole in MuJoCo, linearized about the upright equilibrium, with an LQR derived analytically. You will perturb the cart with a 5° pole tilt, observe the stabilize it, log the settling time, and verify behavior across multiple initial conditions.

What success looks like at the end. You have: 1. src/day5_cartpole_lqr.py runnable. 2. A plot figures/day5_cartpole_lqr.png showing four trajectories from different initial conditions, all settling to upright within 2–3 s. 3. A 10-s video videos/day5_cartpole.mp4. 4. Console: "Mean settling time: 1.84 s ± 0.21 s across 5 initial conditions".

Step 1 — Verify cart-pole MJCF (5 min)

There's no cart-pole in Menagerie, but dm_control ships one inside its source tree. Easier: write a tiny MJCF inline.

mkdir -p ~/robo47/w1-foundations/assets
cat > ~/robo47/w1-foundations/assets/cartpole.xml <<'XML'
<mujoco model="cartpole">
  <option timestep="0.005" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 3"/>
    <geom name="floor" type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
    <body name="cart" pos="0 0 0.1">
      <joint name="slider" type="slide" axis="1 0 0" limited="true" range="-1.5 1.5" damping="0.1"/>
      <geom name="cart_geom" type="box" size="0.1 0.05 0.05" rgba="0.2 0.2 0.8 1" mass="1"/>
      <body name="pole" pos="0 0 0.05">
        <joint name="hinge" type="hinge" axis="0 1 0" damping="0.01"/>
        <geom name="pole_geom" type="capsule" fromto="0 0 0 0 0 0.6" size="0.02" rgba="0.8 0.2 0.2 1" mass="0.1"/>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor name="motor" joint="slider" gear="20" ctrlrange="-1 1"/>
  </actuator>
</mujoco>
XML

Smoke test:

cd ~/robo47/w1-foundations
python -c "
import mujoco
m = mujoco.MjModel.from_xml_path('assets/cartpole.xml')
print(f'nq={m.nq}, nv={m.nv}, nu={m.nu}')
"

Expected:

nq=2, nv=2, nu=1

Step 2 — Linearize and compute LQR gain (40 min)

Create src/day5_cartpole_lqr.py:

"""Day 5: LQR balance of a cart-pole in MuJoCo. Linearize about upright, design K, run."""
from __future__ import annotations
import numpy as np
import matplotlib.pyplot as plt
import mujoco
from scipy.linalg import solve_continuous_are

MODEL_PATH = "assets/cartpole.xml"
SIM_TIME_S = 6.0

# State: x = [cart_pos, pole_angle, cart_vel, pole_vel]
# Upright: x_eq = 0  (after we subtract pi from theta because hinge=0 is straight up here)
# Action: scalar motor cmd in [-1, 1]; with gear=20, force ranges in N.

# --- Step 2a: Numerical linearization via finite differences -----------------

def linearize_about_upright(model, data, eps_x=1e-4, eps_u=1e-3):
    """Return (A, B) where ẋ ≈ A·x + B·u about the upright equilibrium."""
    n = 4  # 2 q + 2 qdot
    m_act = model.nu

    # Set state to upright equilibrium (qpos = 0 because hinge default is straight up)
    data.qpos[:] = 0
    data.qvel[:] = 0
    data.ctrl[:] = 0
    mujoco.mj_forward(model, data)
    mujoco.mj_inverse(model, data)
    # Note: the pole going to angle 0 in this MJCF is straight up because of how axis is set.

    A = np.zeros((n, n))
    B = np.zeros((n, m_act))

    # --- A column-by-column via finite differences on state
    for i in range(n):
        # +eps
        data.qpos[:] = 0
        data.qvel[:] = 0
        if i < 2:
            data.qpos[i] = eps_x
        else:
            data.qvel[i - 2] = eps_x
        data.ctrl[:] = 0
        mujoco.mj_forward(model, data)
        f_plus = np.concatenate([data.qvel.copy(), data.qacc.copy()])

        # -eps
        data.qpos[:] = 0
        data.qvel[:] = 0
        if i < 2:
            data.qpos[i] = -eps_x
        else:
            data.qvel[i - 2] = -eps_x
        data.ctrl[:] = 0
        mujoco.mj_forward(model, data)
        f_minus = np.concatenate([data.qvel.copy(), data.qacc.copy()])

        A[:, i] = (f_plus - f_minus) / (2 * eps_x)

    # --- B column-by-column via finite differences on control
    for j in range(m_act):
        data.qpos[:] = 0
        data.qvel[:] = 0
        data.ctrl[:] = 0
        data.ctrl[j] = eps_u
        mujoco.mj_forward(model, data)
        f_plus = np.concatenate([data.qvel.copy(), data.qacc.copy()])

        data.ctrl[:] = 0
        data.ctrl[j] = -eps_u
        mujoco.mj_forward(model, data)
        f_minus = np.concatenate([data.qvel.copy(), data.qacc.copy()])

        B[:, j] = (f_plus - f_minus) / (2 * eps_u)

    return A, B


def design_lqr(A, B, Q=None, R=None):
    if Q is None:
        Q = np.diag([1.0, 10.0, 0.1, 0.1])  # cart_pos, pole, cart_vel, pole_vel
    if R is None:
        R = np.array([[0.1]])
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K


def run_episode(model, data, K, x0, sim_time_s=SIM_TIME_S):
    n_steps = int(sim_time_s / model.opt.timestep)
    data.qpos[:] = x0[:2]
    data.qvel[:] = x0[2:]
    mujoco.mj_forward(model, data)

    history = np.zeros((n_steps, 4))
    t_arr = np.zeros(n_steps)
    for k in range(n_steps):
        x = np.concatenate([data.qpos.copy(), data.qvel.copy()])
        u = -K @ x
        data.ctrl[:] = np.clip(u, -1, 1)
        mujoco.mj_step(model, data)
        history[k] = x
        t_arr[k] = k * model.opt.timestep

    return t_arr, history


def settling_time(t, history, threshold=0.05):
    """Time after which |theta| stays below threshold (rad)."""
    abs_theta = np.abs(history[:, 1])
    # Last index where it exceeded the threshold; +1 = settled
    above = np.where(abs_theta > threshold)[0]
    if len(above) == 0:
        return 0.0
    return t[above[-1] + 1] if above[-1] + 1 < len(t) else t[-1]


def main():
    model = mujoco.MjModel.from_xml_path(MODEL_PATH)
    data = mujoco.MjData(model)

    A, B = linearize_about_upright(model, data)
    print("Linearized system:")
    print("A =\n", np.round(A, 3))
    print("B =\n", np.round(B, 3))

    K = design_lqr(A, B)
    print("\nLQR gain K =", np.round(K, 3))

    # Run from 5 different initial perturbations
    initial_conditions = [
        np.array([0.0,  np.deg2rad(5),  0, 0]),
        np.array([0.0, -np.deg2rad(5),  0, 0]),
        np.array([0.0,  np.deg2rad(8),  0, 0]),
        np.array([0.2,  np.deg2rad(3),  0, 0]),
        np.array([-0.3, np.deg2rad(-4), 0, 0]),
    ]

    fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
    settling_times = []
    for x0 in initial_conditions:
        t, hist = run_episode(model, data, K, x0)
        ts = settling_time(t, hist)
        settling_times.append(ts)
        label = f"θ₀={np.rad2deg(x0[1]):.0f}°, x₀={x0[0]:.1f}, ts={ts:.2f}s"
        axes[0].plot(t, np.rad2deg(hist[:, 1]), label=label)
        axes[1].plot(t, hist[:, 0], label=label)

    axes[0].axhline(0, color='k', linewidth=0.5)
    axes[0].set_ylabel("Pole angle (deg)")
    axes[0].legend(fontsize=8, loc='upper right')
    axes[0].grid(alpha=0.3)
    axes[1].set_ylabel("Cart position (m)")
    axes[1].set_xlabel("Time (s)")
    axes[1].axhline(0, color='k', linewidth=0.5)
    axes[1].grid(alpha=0.3)

    mean_ts = np.mean(settling_times)
    std_ts = np.std(settling_times)
    fig.suptitle(f"Cart-pole LQR: mean settling time {mean_ts:.2f}s ± {std_ts:.2f}s")
    plt.tight_layout()
    plt.savefig("../figures/day5_cartpole_lqr.png", dpi=120)
    print(f"\nMean settling time: {mean_ts:.2f} s ± {std_ts:.2f} s across {len(settling_times)} ICs")
    print("Wrote figures/day5_cartpole_lqr.png")
    return mean_ts


if __name__ == "__main__":
    main()

Run it:

cd ~/robo47/w1-foundations
python src/day5_cartpole_lqr.py

Expected output:

Linearized system:
A =
 [[ 0.     0.     1.     0.   ]
  [ 0.     0.     0.     1.   ]
  [ 0.    -1.    -0.1    0.   ]
  [ 0.    33.    -0.1   -0.5  ]]
B =
 [[ 0.   ]
  [ 0.   ]
  [ 0.95 ]
  [-0.99 ]]

LQR gain K = [[-3.16  46.22  -3.59   8.74]]

Mean settling time: 1.84 s ± 0.21 s across 5 ICs
Wrote figures/day5_cartpole_lqr.png

(Numbers within ±20%.)

Expected figure: Two stacked plots over 6 s. Top: pole angle (deg) for 5 trajectories — all start at ±3° to ±8° tilt and decay to ~0° by ~1.5–2.5 s, no overshoot or oscillation. Bottom: cart position drifts modestly (a few cm) but stays bounded.

Step 3 — Record video (5 min)

Add this snippet to the end of main() to record the first IC:

# Visual playback
import mujoco.viewer, time
data.qpos[:] = initial_conditions[0][:2]
data.qvel[:] = initial_conditions[0][2:]
mujoco.mj_forward(model, data)
with mujoco.viewer.launch_passive(model, data) as v:
    for _ in range(int(SIM_TIME_S / model.opt.timestep)):
        x = np.concatenate([data.qpos, data.qvel])
        u = -K @ x
        data.ctrl[:] = np.clip(u, -1, 1)
        mujoco.mj_step(model, data)
        v.sync()
        time.sleep(model.opt.timestep)

Re-run the script and screen-record the viewer window for 6 s. Save to videos/day5_cartpole.mp4.

Step 4 — Common failures and fixes (10 min)

All trajectories diverge — Sign error in A or B, or you forgot to negate K in the law (u = -K x, not +K x).
Pole settles but cart drifts to limit — Q[0,0] is too small; increase Q = np.diag([5, 10, 0.1, 0.1]).
`solve_continuous_are` raises — Q or R isn't positive definite. Make sure Q has all-positive diagonal entries.
Pole settles in 0.5 s but motor saturates — R too small (cheap effort). Increase to R = np.array([[1.0]]).

Step 5 — Log and commit (5 min)

echo "day5_cartpole_lqr,$(date -I),5,cartpole_lqr,$(cd ~/robo47/w1-foundations && git rev-parse --short HEAD),0,cartpole,lqr,1200,1,settling_time_sec,1.84,5_ICs" >> ~/robo47/metrics/metrics.csv

cd ~/robo47/w1-foundations
git add src/day5_cartpole_lqr.py figures/day5_cartpole_lqr.png assets/cartpole.xml videos/day5_cartpole.mp4
git commit -m "Day 5: cart-pole LQR, mean settling 1.8s"

Deliverable checklist

assets/cartpole.xml exists, loads without error
src/day5_cartpole_lqr.py runs, prints A, B, K matrices and settling times
All 5 initial conditions converge (final |θ| < 1° at t=6s)
Mean settling time < 3 s
figures/day5_cartpole_lqr.png shows 5 cleanly settling trajectories
videos/day5_cartpole.mp4 shows pole stabilization
One row in metrics.csv
Commit pushed

Why this lab matters

LQR is the closed-form, optimal answer for linear systems with quadratic costs. Most "model-based " papers (LQG, iLQR, ) are LQR with bells on. When you see DreamerV3 plan via gradient descent in (Week 6), the loss it's minimizing is structurally a quadratic-cost LQR problem in a learned coordinate system. Today you built the small version by hand.

Side quest (optional, 60 min)

Implement swing-up . Start from θ = π (pole hanging down), apply energy-shaping until the pole is near the top, then switch to LQR. This is a classic Underactuated Robotics problem — Tedrake Ch. 3.

---

Deliverable checklist

Complete the Day 5 lab checklist

LECTURE & READING

Glossary primer (12 min)

Real-world analogy

Hour 1 — Lectures

Underactuated Lecture 2, "Nonlinear DynamicsMovement, Mechanics & Robot BodyDynamicsThe study of motion including forces, torques, mass, and inertia.

Hour 2 — Lectures continued

CMU 16-745 (Optimal ControlControl & PlanningControlThe method used to make the robot move the way you want.) Lec 1, Manchester, ~75 min

LAB

Hour 3 — Lab: cart-pole LQR balance in MuJoCo (75 min)

Step 1 — Verify cart-pole MJCF (5 min)

Step 2 — Linearize and compute LQR gain (40 min)

Step 3 — Record video (5 min)

Step 4 — Common failures and fixes (10 min)

Step 5 — Log and commit (5 min)

Deliverable checklist

Why this lab matters

Side quest (optional, 60 min)

Deliverable checklist

Optional: Submit your repo

Underactuated Lecture 2, "Nonlinear

CMU 16-745 (Optimal ) Lec 1, Manchester, ~75 min