Classic ML Formulations#

This module provides formulations for embedding regression trees, random forests, and gradient boosted trees directly into GAMSPy optimization models. Models trained with scikit-learn can be represented as constraints within an optimization problem, enabling seamless integration of machine learning predictions with mathematical programming.
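As a rough illustration of how a trained tree can become constraints, one generic big-M encoding is sketched below. It is not taken from the GAMSPy source and may differ from what these formulations actually generate. All symbols are assumptions introduced here: $\mathcal{L}$ is the set of leaves, $z_\ell$ is a binary variable that selects leaf $\ell$, $v_\ell$ is the value stored at that leaf, $x_j$ is input feature $j$, $M$ is a sufficiently large constant, and $\epsilon$ is a small positive tolerance.

```latex
\begin{aligned}
& \sum_{\ell \in \mathcal{L}} z_\ell = 1, \qquad z_\ell \in \{0, 1\}, \\
& y = \sum_{\ell \in \mathcal{L}} v_\ell \, z_\ell, \\
& x_j \le t + M (1 - z_\ell)
  && \text{for each split } (j, t) \text{ on the path to } \ell \text{ that branches left}, \\
& x_j \ge t + \epsilon - M (1 - z_\ell)
  && \text{for each split } (j, t) \text{ on the path to } \ell \text{ that branches right}.
\end{aligned}
```

Exactly one leaf is selected, the output equals that leaf's value, and the big-M constraints are only active for the selected leaf, which forces the input to satisfy every split on the path to it.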

Supported formulations#

RegressionTree#

MIP: this formulation uses binary and/or integer variables.

When a decision tree is trained to predict numerical values rather than class labels, it is called a regression tree. The example below trains a regression tree and uses the formulation to embed it in an optimization model.

This example uses sklearn.tree.DecisionTreeRegressor for convenience. You can also provide the information from the trained decision tree as a DecisionTreeStruct instance.

[Figure: regression tree (regressionTree.png)]
import numpy as np
from sklearn.tree import DecisionTreeRegressor

import gamspy as gp
from gamspy.math import dim

X = np.array(
   [
      [2, 3],
      [3, 1],
      [1, 2],
      [5, 6],
      [6, 4],
   ]
)
y = np.array([10, 10, 10, 15, 33])

regressor = DecisionTreeRegressor(random_state=42)
# This is the regressor that you want to include in
# your optimization model
regressor.fit(X, y)

m = gp.Container()
# Formulation requires the regressor
dt_formulation = gp.formulations.RegressionTree(m, regressor)
# Let's create a sample input
m_input = gp.Parameter(m, "input", domain=dim((5, 2)), records=X)

# y_pred = regressor(m_input) and eqns are the equations that
# create this relation
y_pred, eqns = dt_formulation(m_input)

predict_values = gp.Model(
   m,
   "regressionTree",
   equations=eqns,
   problem="MIP",
)
predict_values.solve()
print(y_pred.toDense().flatten())
# [10. 10. 10. 15. 33.]
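To make concrete what "the information from the trained decision tree" amounts to, here is a minimal pure-Python sketch of tree prediction. It uses the parallel-array layout that scikit-learn exposes on a fitted tree's `tree_` attribute (`children_left`, `children_right`, `feature`, `threshold`, `value`), simplified to one number per node. The split thresholds are hypothetical, chosen by hand to reproduce the targets above, and the sketch makes no claim about which fields DecisionTreeStruct expects.

```python
# Hypothetical hand-written tree in a parallel-array layout.
# As in scikit-learn, -1 marks a missing child (the node is a leaf)
# and -2 marks an unused feature/threshold entry on a leaf.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [0, -2, 0, -2, -2]              # feature index tested at each node
threshold = [4.0, -2.0, 5.5, -2.0, -2.0]  # assumed split points
value = [15.6, 10.0, 24.0, 15.0, 33.0]    # mean target stored at each node


def predict_one(x):
    """Walk from the root to a leaf and return that leaf's value."""
    node = 0
    while children_left[node] != -1:
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return value[node]


X = [[2, 3], [3, 1], [1, 2], [5, 6], [6, 4]]
predictions = [predict_one(row) for row in X]
print(predictions)
# [10.0, 10.0, 10.0, 15.0, 33.0]
```

Each input follows exactly one root-to-leaf path. Selecting that path inside an optimization model is a discrete choice, which is why the formulation is a MIP.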

RandomForest#

MIP: this formulation uses binary and/or integer variables.

Random forests are an ensemble technique in which multiple decision trees are trained in parallel, each on a random subset of the same data. The final prediction is the average of the individual regression trees' predictions. The example below trains a random forest and uses the formulation to embed it in an optimization model.

This example uses sklearn.ensemble.RandomForestRegressor for convenience. You can also provide the information from the trained random forest as a list of DecisionTreeStruct instances.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

import gamspy as gp
from gamspy.math import dim

X = np.array(
   [
      [2, 3],
      [3, 1],
      [1, 2],
      [5, 6],
      [6, 4],
   ]
)
y = np.array([10, 10, 10, 15, 33])

ensemble = RandomForestRegressor(random_state=42)
# This is the ensemble that you want to include in
# your optimization model
ensemble.fit(X, y)

m = gp.Container()
# Formulation requires the trained ensemble
rf_formulation = gp.formulations.RandomForest(m, ensemble)
# Let's create a sample input
m_input = gp.Parameter(m, "input", domain=dim((5, 2)), records=X)

# y_pred = ensemble(m_input) and eqns are the equations that
# create this relation
y_pred, eqns = rf_formulation(m_input)

predict_values = gp.Model(
   m,
   "randomForest",
   equations=eqns,
   problem="MIP",
)
predict_values.solve()
print(y_pred.toDense().flatten())
# [10.46 10.23 10.23 19.41 25.83]

Note

Formulating a Random Forest with a large number of trees in GAMSPy can be time-intensive, as the formulation must traverse each tree individually.
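The averaging step described for random forests can be sketched in a few lines of plain Python. The three "trees" below are hypothetical hand-written functions, not learned models; they only show that the forest output is the mean of the individual tree outputs, so every tree has to be evaluated (and, in a formulation, encoded) separately.

```python
# Hypothetical stand-ins for trained regression trees: each maps a
# feature vector to a number. Real trees would be fit on random
# subsets of the training data.
def tree_a(x):
    return 10.0 if x[0] <= 4.0 else 24.0


def tree_b(x):
    return 10.0 if x[1] <= 3.5 else 15.0


def tree_c(x):
    return 11.0 if x[0] <= 5.5 else 33.0


def forest_predict(trees, x):
    """Average the predictions of all trees for one input."""
    return sum(tree(x) for tree in trees) / len(trees)


trees = [tree_a, tree_b, tree_c]
print(forest_predict(trees, [6, 4]))
# 24.0  (the mean of 24.0, 15.0 and 33.0)
```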

GradientBoosting#

MIP: this formulation uses binary and/or integer variables.

Gradient boosted trees are also an ensemble technique, but here the decision trees are trained sequentially, with each new tree learning to correct the errors of the previous ones. The contribution of each tree is scaled by a learning rate, and the final prediction is the weighted sum of the outputs of all individual trees. The example below trains a gradient boosted tree ensemble and uses the formulation to embed it in an optimization model.

This example uses sklearn.ensemble.GradientBoostingRegressor for convenience. You can also provide the information from the trained gradient boosted trees as a list of DecisionTreeStruct instances.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

import gamspy as gp
from gamspy.math import dim

X = np.array(
   [
      [2, 3],
      [3, 1],
      [1, 2],
      [5, 6],
      [6, 4],
   ]
)
y = np.array([10, 10, 10, 15, 33])

ensemble = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1)
# This is the ensemble that you want to include in
# your optimization model
ensemble.fit(X, y)

m = gp.Container()
# Formulation requires the trained ensemble
gbt_formulation = gp.formulations.GradientBoosting(m, ensemble)
# Let's create a sample input
m_input = gp.Parameter(m, "input", domain=dim((5, 2)), records=X)

# y_pred = ensemble(m_input) and eqns are the equations that
# create this relation
y_pred, eqns = gbt_formulation(m_input)

predict_values = gp.Model(
   m,
   "gradientBoostedTrees",
   equations=eqns,
   problem="MIP",
)
predict_values.solve()
print(y_pred.toDense().flatten())
# [10.00014874 10.00014874 10.00014874 15.00001594 32.99953783]

Note

Formulating Gradient Boosted Trees with a large number of trees in GAMSPy can be time-intensive, as the formulation must traverse each tree individually.
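The learning-rate-scaled sum described for gradient boosted trees can be sketched in plain Python as follows. Everything here is an assumption for illustration: a constant initial prediction equal to the mean of the targets used above (15.6), and two hypothetical depth-1 stages whose leaf values are the mean residuals left by the previous stage. It is not the code path used by scikit-learn or GAMSPy.

```python
learning_rate = 0.1
initial_prediction = 15.6  # assumed: mean of the targets [10, 10, 10, 15, 33]


# Hypothetical stage 1: mean residuals (target - 15.6) on each side
# of an assumed split at x[0] <= 4.0.
def stage_1(x):
    return -5.6 if x[0] <= 4.0 else 8.4


# Hypothetical stage 2: mean residuals that remain after stage 1
# has been added with the learning rate applied.
def stage_2(x):
    return -5.04 if x[0] <= 4.0 else 7.56


def boosted_predict(stages, x):
    """Initial prediction plus the learning-rate-scaled sum of all stages."""
    return initial_prediction + learning_rate * sum(stage(x) for stage in stages)


stages = [stage_1, stage_2]
print(boosted_predict(stages, [2, 3]))  # about 14.536
print(boosted_predict(stages, [6, 4]))  # about 17.196
```

With only two stages the predictions are still far from the targets. Each further stage moves them a small step closer, which is why boosted ensembles typically contain many trees and why the formulation grows with the number of trees.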