Skip to main content

Optimal control

Gradgen provides the class SingleShootingProblem, which facilitates the computation of the gradient of the total cost function of a finite horizon optimal control problem using the single shooting formulation.

note

Although you can use Function to construct the total cost function of your optimal control problem, and then you can use gradgen's automatic differentiation to compute the gradient, SingleShootingProblem produces staged Rust code: the generated code exploits the structure of the optimal control problem. Instead, if you simply use Function you can end up with thousands of lines of generated code.

Single shooting optimal control

We can the discrete-time dynamical system

xk+1=f(xk,uk,p),x_{k+1} = f(x_k, u_k, p),

where xkRnxx_k \in \mathbb{R}^{n_x} is the state, ukRnuu_k\in \mathbb{R}^{n_u} is the input, and pRnpp \in \mathbb{R}^{n_p} is a parameter vector. We have a stage cost function :Rnx×RnuR\ell: \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R} and a terminal cost function, Vf:Rnx×RnpRV_f: \mathbb{R}^{n_x} \times \mathbb{R}^{n_p} \to \mathbb{R}.

For a horizon NN we denote the sequence of control actions as useq=(u0,,uN1)u_{\mathrm{seq}} = (u_0, \ldots, u_{N-1}).

The total cost function with horizon NN is

VN(x0,useq,p)=t=0N1(xt,ut,p)+Vf(xN,p),V_N(x_0, u_{\mathrm{seq}}, p) = \sum_{t=0}^{N-1} \ell(x_t, u_t, p) + V_f(x_N, p),

where note that xt=xt(x0,u0,,ut1)x_t = x_t(x_0, u_0, \ldots, u_{t-1}), so xtx_t is a function of x0x_0 and usequ_{\mathrm{seq}}.

The objective is to generated code for VNV_N, its gradient with respect to usequ_{\mathrm{seq}}, as well as Hessian-vector products.

The computation of the gradient and Hessian-vector products is particularly useful for solving optimal control problems numerically.

Example

System dynamics

As an example, consider a dynamical system with nx=2n_x = 2, nu=1n_u=1, and np=2n_p = 2. The system dynamics is

f(x,u,p)=[x1+p1x2+ux2+p2u0.5x2].f(x, u, p) = \begin{bmatrix} x_1 + p_1 x_2 + u \\ x_2 + p_2 u - 0.5 x_2 \end{bmatrix}.

Let us create the necessary symbols and let us define the system dynamics

Try it In Colab

nx, nu, np, N = 2, 1, 2, 5

x = SXVector.sym("x", nx)
u = SXVector.sym("u", nu)
p = SXVector.sym("p", np)

dynamics = Function(
"dynamics",
[x, u, p],
[SXVector((x[0] + p[0] * x[1] + u[0], x[1] + p[1] * u[0] - 0.5 * x[0]))],
input_names=["x", "u", "p"],
output_names=["x_next"],
)

Stage and terminal costs

Next, the stage cost function is

(x,u,p)=x12+2x22+0.3u2.\ell(x, u, p) = x_1^2 + 2 x_2^2 + 0.3 u^2.

Try it In Colab

stage_cost = Function(
"stage_cost",
[x, u, p],
[x[0] * x[0] + 2.0 * x[1] * x[1] + 0.3 * u[0] * u[0]],
input_names=["x", "u", "p"],
output_names=["ell"],
)

Lastly, the terminal cost function is

Vf(x,p)=3x12+0.5x22.V_f(x, p) = 3 x_1^2 + 0.5 x_2^2.

Try it In Colab

terminal_cost = Function(
"terminal_cost",
[x, p],
[3.0 * x[0] * x[0] + 0.5 * x[1] * x[1]],
input_names=["x", "p"],
output_names=["vf"],
)

Single shooting total cost function

We can now construct an instance of SingleShootingProblem

Try it In Colab

problem = (
SingleShootingProblem("mpc_cost")
.with_horizon(N)
.with_dynamics(dynamics)
.with_costs(stage_cost, terminal_cost)
.with_input_names(
initial_state_name="x0",
control_sequence_name="u_seq",
parameter_name="p",
)
)

Code generation

We can now generate code for our problem. We will generate a Rust crate with a function that computes VNV_N, VN\nabla V_N (with respect to the sequence of inputs), and Hessian-vector products, 2VNd\nabla^2 V_N^\intercal d. Alongside, we want this function to return the corresponding sequence of states (this is what add_rollout_states does). Here is an example:

Try it In Colab

builder = (
CodeGenerationBuilder()
.with_backend_config(
RustBackendConfig()
.with_crate_name("single_shooting_kernel")
.with_backend_mode("no_std")
)
.for_function(problem)
.add_joint(
SingleShootingBundle()
.add_cost()
.add_gradient()
.add_hvp()
.add_rollout_states()
)
.done()
).build()

Instead of a single functions that computes the total cost, its gradient, and Hessian-vector products at the same time, we can generate separate functions (see docs for details) using

Try it In Colab

.for_function(problem)
.add_primal(include_states=True)
.add_gradient(include_states=True)
.add_hvp(include_states=True)

Generated code

An excerpt of the generated code is shown below

// Computation of total cost and state trajectory
for stage_index in 0..5 {
let u_t = &u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_stage_cost(current_state, u_t, p, scalar_buffer, stage_work);
total_cost += scalar_buffer[0];
single_shooting_kernel_mpc_cost_dynamics(current_state, u_t, p, next_state, stage_work);
x_traj[((stage_index + 1) * 2)..((stage_index + 2) * 2)].copy_from_slice(next_state);
core::mem::swap(&mut current_state, &mut next_state);
}

Note the use of a for loop in the generated code.

The automatically generated Rust code for the computation of the cost gradient is also structured code.

Generated gradient

This is how the generated code for VN(useq,p)\nabla V_N(u_{{\rm seq}}, p) looks like (excerpt from auto-generated code):

// Forward pass
for stage_index in 0..5 {
let u_t = &u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_dynamics(current_state, u_t, p, next_state, stage_work);
state_history[(stage_index * 2)..((stage_index + 1) * 2)].copy_from_slice(next_state);
core::mem::swap(&mut current_state, &mut next_state);
}
// Backward pass
single_shooting_kernel_mpc_cost_terminal_cost_grad_x(current_state, p, lambda_current, stage_work);
for stage_index in (1..5).rev() {
let x_t = &state_history[((stage_index - 1) * 2)..(stage_index * 2)];
let u_t = &u_seq[stage_index..(stage_index + 1)];
let grad_u_t = &mut gradient_u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_stage_cost_grad_u(x_t, u_t, p, grad_u_t, stage_work);
single_shooting_kernel_mpc_cost_dynamics_vjp_u(x_t, u_t, p, &lambda_current[..], temp_control, stage_work);
grad_u_t[0] += temp_control[0];
single_shooting_kernel_mpc_cost_stage_cost_grad_x(x_t, u_t, p, lambda_next, stage_work);
single_shooting_kernel_mpc_cost_dynamics_vjp_x(x_t, u_t, p, &lambda_current[..], temp_state, stage_work);
lambda_next[0] += temp_state[0];
lambda_next[1] += temp_state[1];
lambda_current.copy_from_slice(lambda_next);
}

As a function

The single-shooting optimal control problem (problem) can be cast as a Function using .to_function():

problem_function = problem.to_function()

We can then call this function as follows (the arguments of the function are x0, u_seq, and p)1

val = problem_function(x0=[1., 2.],
u_seq=[0.1, 0.2, 0.3, 0.4, 0.5],
p=[-0.1, 0.5])

Quadratic penalty functions

Consider an optimal control problem where the total cost function with horizon NN involves quadratic penalty terms, that is, it has the form

VN(x0,useq,p,c)=t=0N1(xt,ut,p)+c2q(xt,ut,p)22+Vf(xN,p)+c2qN(xN,p)22,V_N(x_0, u_{\mathrm{seq}}, p, c) = \sum_{t=0}^{N-1} \ell(x_t, u_t, p) + \tfrac{c}{2}\Vert q(x_t, u_t, p) \Vert_2^2 + V_f(x_N, p) + \tfrac{c}{2}\Vert q_N(x_N, p)\Vert_2^2,

where c>0c > 0 is the penalty parameter.

If cc is large enough, then q(xt,ut,p)0q(x_t, u_t, p) \approx 0 for all tt, and qN(xN,p)0q_N(x_N, p) \approx 0. This way we can impose soft constraints at every stage. The formulation is also used within the quadratic penalty method.

Details

Internally gradgen compiles the stage cost function ~(x,u,p,c)=(x,u,p)+c2q(x,u,p)22\tilde{\ell}(x, u, p, c) = \ell(x, u, p) + \tfrac{c}{2}\Vert q(x, u, p) \Vert_2^2 and the terminal cost function V~f(x,p,c)=Vf(x,p)+c2qN(x,p)22\tilde{V}_f(x, p, c) = V_f(x, p) + \tfrac{c}{2}\Vert q_N(x, p)\Vert_2^2.

Example

As an example, suppose we have the stage penalty function q:R2×R×R2R2q:\mathbb{R}^2\times \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R}^2 given by

q(x,u,p)=[x1+up1x2p2],q(x, u, p) = \begin{bmatrix} x_1 + u - p_1 \\ x_2 - p_2 \end{bmatrix},

and the terminal penalty function qN:R2×R2Rq_N:\mathbb{R}^2\times \mathbb{R}^2 \to \mathbb{R} given by

qN(x,p)=max{0,x221p1}q_N(x, p) = \max\{0, \Vert x \Vert_2^2 - 1 - p_1\}

The terminal penalty, qNq_N, can be used to enforce the constraint x221p10\Vert x \Vert_2^2 - 1 - p_1 \leq 0.

Try it In Colab

# Stage penalty
stage_penalty = Function(
"stage_penalty",
[x, u, p],
[SXVector((x[0] + u[0] - p[0], x[1] - p[1]))],
input_names=["x", "u", "p"],
output_names=["q"],
)

# Terminal penalty
terminal_penalty = Function(
"terminal_penalty",
[x, p],
[SXVector((maximum(0.0, x.norm2sq() - 1.0 - p[0]),))],
input_names=["x", "p"],
output_names=["q_n"],
)

We can then construct an instance of SingleShootingProblem and specify the above penalty functions using with_penalties as follows:

Try it In Colab

problem = (
SingleShootingProblem("penalized_mpc_cost")
.with_horizon(N)
.with_dynamics(dynamics)
.with_costs(stage_cost, terminal_cost)
.with_penalties(stage_penalty, terminal_penalty, c)
.with_input_names(
initial_state_name="x0",
control_sequence_name="u_seq",
parameter_name="p",
)
)

Footnotes

  1. these have been defined using with_input_names (see here). Use problem_function.input_names to get the input names.