Optimal control
Gradgen provides the class SingleShootingProblem, which facilitates the
computation of the gradient of the total cost function of a finite horizon
optimal control problem using the single shooting formulation.
Although you can use Function to construct
the total cost function of your optimal control problem, and then you can use
gradgen's automatic differentiation to compute the
gradient, SingleShootingProblem produces staged Rust code: the generated
code exploits the structure of the optimal control problem. Instead, if you simply
use Function you can end up with thousands of lines of generated code.
Single shooting optimal control
We can the discrete-time dynamical system
where is the state, is the input, and is a parameter vector. We have a stage cost function and a terminal cost function, .
For a horizon we denote the sequence of control actions as .
The total cost function with horizon is
where note that , so is a function of and .
The objective is to generated code for , its gradient with respect to , as well as Hessian-vector products.
The computation of the gradient and Hessian-vector products is particularly useful for solving optimal control problems numerically.
Example
System dynamics
As an example, consider a dynamical system with , , and . The system dynamics is
Let us create the necessary symbols and let us define the system dynamics
nx, nu, np, N = 2, 1, 2, 5
x = SXVector.sym("x", nx)
u = SXVector.sym("u", nu)
p = SXVector.sym("p", np)
dynamics = Function(
"dynamics",
[x, u, p],
[SXVector((x[0] + p[0] * x[1] + u[0], x[1] + p[1] * u[0] - 0.5 * x[0]))],
input_names=["x", "u", "p"],
output_names=["x_next"],
)
Stage and terminal costs
Next, the stage cost function is
stage_cost = Function(
"stage_cost",
[x, u, p],
[x[0] * x[0] + 2.0 * x[1] * x[1] + 0.3 * u[0] * u[0]],
input_names=["x", "u", "p"],
output_names=["ell"],
)
Lastly, the terminal cost function is
terminal_cost = Function(
"terminal_cost",
[x, p],
[3.0 * x[0] * x[0] + 0.5 * x[1] * x[1]],
input_names=["x", "p"],
output_names=["vf"],
)
Single shooting total cost function
We can now construct an instance of SingleShootingProblem
problem = (
SingleShootingProblem("mpc_cost")
.with_horizon(N)
.with_dynamics(dynamics)
.with_costs(stage_cost, terminal_cost)
.with_input_names(
initial_state_name="x0",
control_sequence_name="u_seq",
parameter_name="p",
)
)
Code generation
We can now generate code for our problem. We will generate a Rust
crate with a function that computes , (with respect to the
sequence of inputs), and Hessian-vector products, .
Alongside, we want this function to return the corresponding sequence of
states (this is what add_rollout_states does). Here is an example:
builder = (
CodeGenerationBuilder()
.with_backend_config(
RustBackendConfig()
.with_crate_name("single_shooting_kernel")
.with_backend_mode("no_std")
)
.for_function(problem)
.add_joint(
SingleShootingBundle()
.add_cost()
.add_gradient()
.add_hvp()
.add_rollout_states()
)
.done()
).build()
Instead of a single functions that computes the total cost, its gradient, and Hessian-vector products at the same time, we can generate separate functions (see docs for details) using
.for_function(problem)
.add_primal(include_states=True)
.add_gradient(include_states=True)
.add_hvp(include_states=True)
Generated code
An excerpt of the generated code is shown below
// Computation of total cost and state trajectory
for stage_index in 0..5 {
let u_t = &u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_stage_cost(current_state, u_t, p, scalar_buffer, stage_work);
total_cost += scalar_buffer[0];
single_shooting_kernel_mpc_cost_dynamics(current_state, u_t, p, next_state, stage_work);
x_traj[((stage_index + 1) * 2)..((stage_index + 2) * 2)].copy_from_slice(next_state);
core::mem::swap(&mut current_state, &mut next_state);
}
Note the use of a for loop in the generated code.
The automatically generated Rust code for the computation of the cost gradient is also structured code.
Generated gradient
This is how the generated code for looks like (excerpt from auto-generated code):
// Forward pass
for stage_index in 0..5 {
let u_t = &u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_dynamics(current_state, u_t, p, next_state, stage_work);
state_history[(stage_index * 2)..((stage_index + 1) * 2)].copy_from_slice(next_state);
core::mem::swap(&mut current_state, &mut next_state);
}
// Backward pass
single_shooting_kernel_mpc_cost_terminal_cost_grad_x(current_state, p, lambda_current, stage_work);
for stage_index in (1..5).rev() {
let x_t = &state_history[((stage_index - 1) * 2)..(stage_index * 2)];
let u_t = &u_seq[stage_index..(stage_index + 1)];
let grad_u_t = &mut gradient_u_seq[stage_index..(stage_index + 1)];
single_shooting_kernel_mpc_cost_stage_cost_grad_u(x_t, u_t, p, grad_u_t, stage_work);
single_shooting_kernel_mpc_cost_dynamics_vjp_u(x_t, u_t, p, &lambda_current[..], temp_control, stage_work);
grad_u_t[0] += temp_control[0];
single_shooting_kernel_mpc_cost_stage_cost_grad_x(x_t, u_t, p, lambda_next, stage_work);
single_shooting_kernel_mpc_cost_dynamics_vjp_x(x_t, u_t, p, &lambda_current[..], temp_state, stage_work);
lambda_next[0] += temp_state[0];
lambda_next[1] += temp_state[1];
lambda_current.copy_from_slice(lambda_next);
}
As a function
The single-shooting optimal control problem (problem) can be
cast as a Function using .to_function():
problem_function = problem.to_function()
We can then call this function as follows (the arguments of the function
are x0, u_seq, and p)1
val = problem_function(x0=[1., 2.],
u_seq=[0.1, 0.2, 0.3, 0.4, 0.5],
p=[-0.1, 0.5])
Quadratic penalty functions
Consider an optimal control problem where the total cost function with horizon involves quadratic penalty terms, that is, it has the form
where is the penalty parameter.
If is large enough, then for all , and . This way we can impose soft constraints at every stage. The formulation is also used within the quadratic penalty method.
Details
Internally gradgen compiles the stage cost function and the terminal cost function .
Example
As an example, suppose we have the stage penalty function given by
and the terminal penalty function given by
The terminal penalty, , can be used to enforce the constraint .
# Stage penalty
stage_penalty = Function(
"stage_penalty",
[x, u, p],
[SXVector((x[0] + u[0] - p[0], x[1] - p[1]))],
input_names=["x", "u", "p"],
output_names=["q"],
)
# Terminal penalty
terminal_penalty = Function(
"terminal_penalty",
[x, p],
[SXVector((maximum(0.0, x.norm2sq() - 1.0 - p[0]),))],
input_names=["x", "p"],
output_names=["q_n"],
)
We can then construct an instance of SingleShootingProblem and
specify the above penalty functions using
with_penalties as follows:
problem = (
SingleShootingProblem("penalized_mpc_cost")
.with_horizon(N)
.with_dynamics(dynamics)
.with_costs(stage_cost, terminal_cost)
.with_penalties(stage_penalty, terminal_penalty, c)
.with_input_names(
initial_state_name="x0",
control_sequence_name="u_seq",
parameter_name="p",
)
)