Training and Convergence

train() runs the SDDP forward/backward iterations and returns an SDDPResult holding the lower bound and the convergence history.

result = sddp.train(n_iter=20, rel_tol=1e-3, patience=3)

Iterations and early stopping

n_iter is the hard cap on the number of iterations. With no stopping rule set, the run performs exactly n_iter iterations and reports stop_reason == "max_iter".

The lower bound usually plateaus well before the cap, so rel_tol and patience add a stopping rule: training stops once the relative improvement of the bound stays below rel_tol for patience consecutive iterations, reporting stop_reason == "converged". On ClearLake, rel_tol=1e-3, patience=3 stops after 8 of the 20 allowed iterations. Leaving rel_tol=None (the default) disables early stopping, so all n_iter iterations run.
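The stopping rule can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation: the helper name run_with_early_stopping, the exact formula for relative improvement (change divided by the magnitude of the previous bound) and the example bounds are all assumptions made for this sketch.

```python
def run_with_early_stopping(bounds, n_iter, rel_tol=None, patience=1):
    """Replay a sequence of lower bounds and apply the stopping rule.

    `bounds` stands in for the lower bound produced by each iteration.
    Returns (iterations_run, stop_reason).
    """
    stalled = 0        # consecutive iterations with a small improvement
    previous = None    # lower bound of the previous iteration
    iterations_run = 0
    for k, bound in enumerate(bounds[:n_iter], start=1):
        iterations_run = k
        if rel_tol is not None and previous is not None:
            # Assumed definition: change relative to the previous bound.
            rel_improvement = abs(bound - previous) / max(abs(previous), 1e-12)
            stalled = stalled + 1 if rel_improvement < rel_tol else 0
            if stalled >= patience:
                return iterations_run, "converged"
        previous = bound
    return iterations_run, "max_iter"


# Made-up bounds that plateau after the fifth iteration.
example_bounds = [50.0, 80.0, 95.0, 99.0, 99.95, 99.96, 99.97, 99.98, 99.98]
print(run_with_early_stopping(example_bounds, n_iter=20, rel_tol=1e-3, patience=3))
```

A single small improvement does not stop the run under this rule. The counter resets whenever the bound moves by rel_tol or more, so only patience small steps in a row count as a plateau.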

To train a risk-averse policy, pass risk=; see Risk Aversion (CVaR).

The result

SDDPResult carries the outcome of the run:

lower_bound: the rigorous lower bound at the final iteration.
iterations_run: how many iterations actually ran.
stop_reason: "converged", "max_iter" or "interrupted".
convergence_table: the per-iteration bounds.

print(result) prints the summary box shown in the tutorial. When the instance is verbose (the default), training also prints one row per iteration as it goes.

Measuring the optimality gap

The lower bound tells you how good a policy could be, not how good the trained policy actually is. Passing gap_paths simulates the trained policy on that many out-of-sample Monte Carlo paths after training and reports the resulting optimality gap:

result = sddp.train(n_iter=20, rel_tol=1e-3, patience=3, gap_paths=500)

Policy cost          :    1.210000E+2 ±  3.626809E+1   (500 MC paths, 95% CI)
Optimality gap       :        7.1862 %

The policy’s mean realised cost estimates an upper bound on the true optimum, and the lower bound bounds it from below, so the difference between the two is the optimality gap (result.optimality_gap_pct). It is reported alongside the Monte Carlo confidence interval on the policy cost (policy_cost_mean ± policy_cost_stderr). gap_paths=0 (the default) skips the simulation entirely, so it adds no overhead.
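The arithmetic behind these figures can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name optimality_gap is hypothetical, the gap is assumed to be normalised by the mean policy cost (the source does not say which denominator is used), and the 95% interval is assumed to use the normal approximation with z = 1.96.

```python
import math
import statistics


def optimality_gap(costs, lower_bound, z=1.96):
    """Estimate policy cost and gap from simulated path costs.

    Returns (mean, stderr, half_width, gap_pct).
    """
    n = len(costs)
    mean = statistics.fmean(costs)                    # Monte Carlo upper-bound estimate
    stderr = statistics.stdev(costs) / math.sqrt(n)   # standard error of the mean
    half_width = z * stderr                           # assumed normal-approximation CI
    # Assumed normalisation: gap relative to the mean policy cost.
    gap_pct = 100.0 * (mean - lower_bound) / abs(mean)
    return mean, stderr, half_width, gap_pct


# Made-up path costs and lower bound, for illustration only.
mean, stderr, half_width, gap_pct = optimality_gap(
    [100.0, 110.0, 120.0, 130.0, 140.0], lower_bound=108.0
)
print(f"Policy cost: {mean:.2f} ± {half_width:.2f}, gap: {gap_pct:.2f} %")
```

Because the upper bound is a sample mean, the gap is itself an estimate: a wide confidence interval relative to the gap suggests increasing gap_paths before drawing conclusions.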

Interrupting training

Long runs can be stopped gracefully. Pressing Ctrl+C once lets the current iteration finish, then returns the policy trained so far with stop_reason == "interrupted"; the cuts learned up to that point remain intact and usable. Pressing Ctrl+C a second time aborts immediately.
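The two-stage Ctrl+C behaviour is a common pattern, and a generic version can be sketched with the standard signal module. This shows the pattern only and makes no claim about how the library implements it: GracefulInterrupt, train_loop and run_iteration are hypothetical names, and the sketch assumes it runs in the main thread, where Python allows signal handlers to be installed.

```python
import signal


class GracefulInterrupt:
    """SIGINT handler: first signal sets a flag, second aborts hard."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        if self.requested:
            raise KeyboardInterrupt  # second Ctrl+C: abort immediately
        self.requested = True        # first Ctrl+C: stop after this iteration


def train_loop(n_iter, run_iteration):
    """Run up to n_iter iterations, stopping cleanly on Ctrl+C.

    `run_iteration(k)` is a stand-in for one forward/backward pass.
    Returns (iterations_run, stop_reason).
    """
    interrupt = GracefulInterrupt()
    previous_handler = signal.signal(signal.SIGINT, interrupt)
    iterations_run = 0
    try:
        for k in range(1, n_iter + 1):
            run_iteration(k)
            iterations_run = k
            # Check only between iterations, so work in progress is never cut off.
            if interrupt.requested:
                return iterations_run, "interrupted"
        return iterations_run, "max_iter"
    finally:
        # Always restore whatever handler was installed before training.
        signal.signal(signal.SIGINT, previous_handler)
```

Checking the flag only at iteration boundaries is what keeps the accumulated state consistent: an iteration either completes fully or, on a second Ctrl+C, the run is abandoned.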