Training and Convergence
train() runs the SDDP forward/backward iterations and returns an
SDDPResult holding the lower bound and the convergence history.
result = sddp.train(n_iter=20, rel_tol=1e-3, patience=3)
Iterations and early stopping
n_iter is the hard cap on the number of iterations. With only n_iter set, the
run performs exactly that many iterations and reports stop_reason == "max_iter".
The lower bound usually plateaus well before the cap, so rel_tol and
patience add a stopping rule: training stops once the bound's relative
improvement stays below rel_tol for patience consecutive iterations, and
reports stop_reason == "converged". On ClearLake, rel_tol=1e-3, patience=3 stops
after 8 of the 20 allowed iterations. Leaving rel_tol=None (the default)
disables early stopping, so all n_iter iterations run.
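To make the stopping rule concrete, here is a minimal sketch that replays it over a list of per-iteration lower bounds. The function name early_stop_iteration is hypothetical and not part of the library, and measuring "relative improvement" against the previous bound's magnitude is an assumption; the library may normalise differently.

```python
def early_stop_iteration(lower_bounds, rel_tol, patience):
    """Replay the documented stopping rule over per-iteration lower bounds.

    Hypothetical helper for illustration. Returns (iterations_run, stop_reason).
    len(lower_bounds) plays the role of n_iter.
    """
    if rel_tol is None:
        # Early stopping disabled: every iteration runs.
        return len(lower_bounds), "max_iter"
    stalled = 0  # consecutive iterations with too little improvement
    for k in range(1, len(lower_bounds)):
        prev, curr = lower_bounds[k - 1], lower_bounds[k]
        # Assumption: improvement is relative to the previous bound's magnitude.
        rel_improvement = (curr - prev) / max(abs(prev), 1e-12)
        stalled = stalled + 1 if rel_improvement < rel_tol else 0
        if stalled >= patience:
            return k + 1, "converged"  # k is 0-based, so k + 1 iterations ran
    return len(lower_bounds), "max_iter"
```

A bound that climbs quickly and then creeps forward by about 0.01% per iteration trips the rule after three stalled iterations, while a bound that keeps improving strongly runs to the cap.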
To train a risk-averse policy, pass risk=; see Risk Aversion (CVaR).
The result
SDDPResult carries the outcome of the run:
- lower_bound: the rigorous lower bound at the final iteration.
- iterations_run: how many iterations actually ran.
- stop_reason: "converged", "max_iter" or "interrupted".
- convergence_table: the per-iteration bounds.
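Downstream code typically branches on stop_reason. The sketch below uses a hypothetical stand-in class, ResultStandIn, that only mirrors the four documented fields so the example runs without the library; storing convergence_table as a plain list is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring the documented SDDPResult fields."""
    lower_bound: float
    iterations_run: int
    stop_reason: str  # "converged", "max_iter" or "interrupted"
    convergence_table: list = field(default_factory=list)  # assumed container type


def describe(result):
    """Summarise why a run stopped and what to do next."""
    if result.stop_reason == "converged":
        return f"converged after {result.iterations_run} iterations, bound {result.lower_bound:g}"
    if result.stop_reason == "max_iter":
        return f"hit the iteration cap ({result.iterations_run}); consider raising n_iter"
    # "interrupted": the cuts learned so far are still usable.
    return f"interrupted after {result.iterations_run} iterations; cuts so far are usable"
```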
print(result) prints the summary box shown in the
tutorial. When the instance is verbose (the default),
training also prints one row per iteration as it goes.
Measuring the optimality gap
The lower bound tells you how good the policy could be, not how good it actually is.
Passing gap_paths runs an out-of-sample Monte Carlo simulation of the trained
policy after training and reports the resulting optimality gap:
result = sddp.train(n_iter=20, rel_tol=1e-3, patience=3, gap_paths=500)
Policy cost : 1.210000E+2 ± 3.626809E+1 (500 MC paths, 95% CI)
Optimality gap : 7.1862 %
The policy's expected cost is an upper bound on the true optimum, and the lower
bound bounds that optimum from below, so the relative difference between the two
(result.optimality_gap_pct, in percent) limits how far the policy can be from
optimal. Because the expected cost is estimated from the sampled paths, it is
reported with its Monte Carlo confidence interval (policy_cost_mean ±
policy_cost_stderr). gap_paths=0 (the default) skips the simulation entirely and
adds no runtime cost.
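The arithmetic behind these numbers can be sketched from a list of simulated path costs. The helper optimality_gap is hypothetical. Two details are assumptions rather than documented behaviour: the gap is normalised by the policy's mean cost, and the 95% interval uses the normal approximation (1.96 standard errors). Because it is not stated whether the printed ± is the standard error or the interval half-width, the sketch returns both.

```python
import math
import statistics


def optimality_gap(lower_bound, path_costs, z=1.96):
    """Estimate policy cost, its uncertainty and the gap to a lower bound.

    Hypothetical helper. Returns (mean, stderr, ci_half_width, gap_pct).
    """
    mean = statistics.fmean(path_costs)  # Monte Carlo estimate of the upper bound
    stderr = statistics.stdev(path_costs) / math.sqrt(len(path_costs))
    half_width = z * stderr  # normal-approximation CI half-width (95% for z=1.96)
    # Assumption: gap expressed relative to the policy's mean cost.
    gap_pct = 100.0 * (mean - lower_bound) / abs(mean)
    return mean, stderr, half_width, gap_pct
```

The standard error shrinks like 1/sqrt(gap_paths), so quadrupling the number of paths roughly halves the interval around the policy cost.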
Interrupting training
Long runs can be stopped gracefully. Pressing Ctrl+C once lets the current
iteration finish and then returns the policy trained so far with
stop_reason == "interrupted"; the cuts learned up to that point remain intact
and usable. Pressing Ctrl+C a second time aborts immediately.
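This two-stage behaviour follows a common Python pattern. The sketch below is a generic illustration, not the library's implementation: GracefulInterrupt, train_loop and run_iteration are hypothetical names. The first SIGINT only sets a flag that the loop checks between iterations. The second SIGINT raises KeyboardInterrupt, which aborts wherever execution happens to be.

```python
import signal


class GracefulInterrupt:
    """First Ctrl+C requests a stop; a second one aborts immediately."""

    def __init__(self):
        self.requested = False
        self._previous = None

    def _handler(self, signum, frame):
        if self.requested:
            raise KeyboardInterrupt  # second press: hard abort
        self.requested = True  # first press: stop after the current iteration

    def __enter__(self):
        # signal.signal must be called from the main thread.
        self._previous = signal.signal(signal.SIGINT, self._handler)
        return self

    def __exit__(self, *exc):
        if self._previous is not None:
            signal.signal(signal.SIGINT, self._previous)  # restore old handler
        return False


def train_loop(n_iter, run_iteration, guard):
    """Run up to n_iter iterations, checking the stop flag between them."""
    done, stop_reason = 0, "max_iter"
    for k in range(n_iter):
        run_iteration(k)  # a started iteration always completes
        done += 1
        if guard.requested:
            stop_reason = "interrupted"
            break
    return done, stop_reason
```

In a script, wrap the loop as `with GracefulInterrupt() as guard: train_loop(20, my_iteration, guard)`. Because the flag is only checked between iterations, the state built up so far (here, the cuts) is never left half-updated by the first press.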
See also
The ClearLake tutorial shows a full training run and its summary.