Using a Trained Policy
Training produces a policy: the collection of cuts. Two methods put it to work:
policy() answers a single “what should I do here?” question, and
simulate() evaluates the policy across many random paths.
Point queries with policy()
policy() solves a single stage problem for a given situation: the stage, the
incoming state, and the realised noise.
decision = sddp.policy(stage="mar", state=150, noise=350, report=[R, L, Z, F])
decision.decisions # {'R': 200.0, 'L': 250.0, 'Z': 0.0, 'F': 50.0}
decision.approx_cost_to_go # 625.0
The returned PolicyResult carries the queried stage, the
incoming_state and noise, the optimal decisions for the reported
variables, and approx_cost_to_go, the immediate stage cost plus the
cut-approximated future cost from here on.
state is a scalar for a single state variable, or a dict keyed by
state-variable name when there are several.
report lists the variables whose optimal level to return; it defaults to the
state variables. A time-only variable reports a float; a variable with extra
dimensions reports a dict keyed by the non-time labels.
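Because a reported value can be either a float or a dict keyed by non-time labels, downstream code often has to handle both shapes. The following is a minimal sketch of one way to flatten such a mapping into uniform rows. The helper flatten_decisions, the variable names release and flow, and the labels north and south are hypothetical and not part of the library.

```python
def flatten_decisions(decisions):
    """Flatten reported values into {(name, label): value} rows.

    A float-valued entry gets the label None; a dict-valued entry
    contributes one row per non-time label.
    """
    flat = {}
    for name, value in decisions.items():
        if isinstance(value, dict):
            for label, level in value.items():
                flat[(name, label)] = float(level)
        else:
            flat[(name, None)] = float(value)
    return flat


# Hypothetical data shaped like the description above: one time-only
# variable and one variable with an extra (made-up) dimension.
example = {"release": 200.0, "flow": {"north": 30.0, "south": 20.0}}
flat = flatten_decisions(example)
```

The flattened keys are convenient for building a table or writing rows to a file, since every entry then has the same structure regardless of the variable's dimensions.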
Note
policy() needs a trained policy. Calling it before train() warns that
no cuts exist yet, and the decision it returns does not reflect a trained policy.
Monte Carlo with simulate()
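In a script or pipeline it can be safer to stop than to continue with an untrained decision. The sketch below assumes the warning is issued through Python's standard warnings module, which the text above does not confirm. The function fake_untrained_policy is a stand-in written for this example, not the library's code.

```python
import warnings


def strict_call(fn, *args, **kwargs):
    """Call fn, turning any warning it emits into a raised exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return fn(*args, **kwargs)


def fake_untrained_policy(**query):
    # Stand-in that mimics the described behaviour: warn, then return
    # a decision that does not come from a trained policy.
    warnings.warn("no cuts exist yet", UserWarning)
    return {"R": 0.0}


try:
    strict_call(fake_untrained_policy, stage="mar", state=150, noise=350)
    outcome = "returned"
except UserWarning as exc:
    outcome = f"blocked: {exc}"
```

If the library signals the condition some other way (for example through logging), this wrapper would not intercept it, so the mechanism should be checked first.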
simulate() runs the policy forward on fresh sampled paths and returns the
realised cost distribution.
sim = sddp.simulate(n_paths=1000, seed=0)
print(sim.summary)
n_paths 1000.000000
mean 140.000000
std 438.283119
p5 0.000000
p50 0.000000
p95 1500.000000
max 2500.000000
Name: total_cost, dtype: float64
The SimulationResult holds the per-path total_cost together with the
per-(path, stage) stage_costs, noise and reported variables; its
summary reduces total_cost to the mean, standard deviation and
percentiles above. report defaults to the state variables, and seed
defaults to the training seed plus one.
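To make the summary fields concrete, here is a rough sketch of how such statistics could be computed from per-path costs using only the standard library. The sample standard deviation (ddof=1) and the linear-interpolation percentile are assumptions; the library's exact conventions are not stated here. The costs list is synthetic, and summarize, percentile and mean_standard_error are hypothetical helpers.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) on pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(total_cost):
    """Reduce per-path costs to the fields shown in the summary above."""
    values = sorted(total_cost)
    return {
        "n_paths": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample std (ddof=1): an assumption
        "p5": percentile(values, 5),
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "max": values[-1],
    }


def mean_standard_error(std, n_paths):
    """Standard error of the Monte Carlo mean, assuming independent paths."""
    return std / math.sqrt(n_paths)


# Synthetic costs for illustration only: mostly zero with a heavy right tail.
costs = [0.0] * 8 + [500.0, 1500.0]
summary = summarize(costs)

# Standard error implied by the printed summary (std 438.283119, 1000 paths).
se = mean_standard_error(438.283119, 1000)
```

Under the independence assumption, the printed summary implies a standard error of roughly 13.9 on the mean of 140, which gives a sense of how much the estimate could move with a different seed.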
Both policy() and simulate() work on an instance reloaded from a
.sddp file, so a saved policy can be queried without retraining.
See also
The ClearLake tutorial runs both methods on a complete
model; Risk Aversion (CVaR) interprets the skewed cost distribution that
simulate() reveals here.