Binary Classification Example

Learning to classify between two (or more) classes is a common problem that arises in many real-life applications. For example, you might have traffic data and want to determine, based on certain parameters, whether a road is congested. Or, based on the number of hours studied, you might want to predict whether a student will pass or fail.

One of the most common techniques for handling binary classification problems is logistic regression.

You can learn more about logistic regression in its Wikipedia article.

We start by generating a simple dataset: a single feature, with the class 0 and class 1 samples drawn from two different Gaussian distributions.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

num_samples = 100

# Parameters for the first Gaussian distribution
mean1 = 2.0
std_dev1 = 1.0

# Parameters for the second Gaussian distribution
mean2 = 5.0
std_dev2 = 1.5

# Generate samples
samples0 = np.random.normal(mean1, std_dev1, num_samples)
samples1 = np.random.normal(mean2, std_dev2, num_samples)

# Plot the samples
plt.figure(figsize=(10, 6))
plt.scatter(samples0, np.zeros_like(samples0), color='blue', label='Class 0')
plt.scatter(samples1, np.zeros_like(samples1) + 1, color='red', label='Class 1')
plt.xlabel('Sample Value')
plt.ylabel('Class')
plt.title('Samples')
plt.legend()
plt.show()
[Figure: "Samples" — scatter plot of the class 0 (blue) and class 1 (red) sample values]

Our goal is to fit the parameters of a function (specifically, the logistic function) where, given the feature of a sample, we calculate the probability that this sample belongs to class 1. Since we have only two classes, 1 minus that probability gives us the probability that the sample belongs to class 0.

Just by looking at the samples, you can guess that this probability should be very close to 0 for samples around the class 0 mean (about 2) and very close to 1 for samples around the class 1 mean (about 5).

The function that we are trying to fit is:

\[p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}\]

We need to come up with \(\beta_0\) and \(\beta_1\) values such that samples belonging to class 1 get a \(p(x)\) close to 1 and samples belonging to class 0 get a \(p(x)\) close to 0.
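For intuition, here is a quick numerical check with hand-picked coefficients \(\beta_0 = -7\) and \(\beta_1 = 2\). These values are purely illustrative (they are not the fitted ones), and the snippet reuses the NumPy import from above.

# Logistic function with hand-picked, illustrative coefficients
def p(x, beta0=-7.0, beta1=2.0):
    return 1 / (1 + np.exp(-(beta0 + beta1 * x)))

print(p(2.0))  # ~0.047: near the class 0 mean, p(x) is close to 0
print(p(5.0))  # ~0.953: near the class 1 mean, p(x) is close to 1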

Let's start by passing the data to GAMSPy:

import gamspy as gp
import sys
from gamspy.math import dim, exp, log

m = gp.Container()

# dim([100]) creates a 100-element set that serves as the domain of the samples
class_0_samples = gp.Parameter(m, name="samples_0",
                               domain=dim([100]), records=samples0)

class_1_samples = gp.Parameter(m, name="samples_1",
                               domain=dim([100]), records=samples1)

# Keep a reference to the generated set so that we can sum over the samples later
dim_100 = class_0_samples.domain[0]

Next, we declare the variables: the coefficients \(\beta_0\) and \(\beta_1\) whose values we want to optimize, and the loss that we will minimize.

b0 = gp.Variable(m, name="bias")
b1 = gp.Variable(m, name="coefficient")
loss = gp.Variable(m, name="loss")

We use the log loss to guide the optimization; you can read more about it in the Wikipedia article referenced above.
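Written out for our two sample sets, the loss implemented below is the negative log-likelihood:

\[\text{loss} = -\sum_{i=1}^{100} \log p\left(x_i^{(1)}\right) - \sum_{i=1}^{100} \log\left(1 - p\left(x_i^{(0)}\right)\right)\]

where \(x^{(1)}\) denotes the class 1 samples and \(x^{(0)}\) the class 0 samples.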

def logistic(c0, c1, x):
    # p(x) = 1 / (1 + exp(-(c0 + c1 * x)))
    return 1 / (1 + exp(-c0 - x * c1))


def_loss = gp.Equation(m, name="calc_loss")

# Define the loss: -log(p) for class 1 samples, -log(1 - p) for class 0 samples
def_loss[...] = loss == gp.Sum(dim_100, - log(logistic(b0, b1, class_1_samples[...]))) + \
            gp.Sum(dim_100, - log(1 - logistic(b0, b1, class_0_samples[...])))

This is basically all we need. We collect everything into the logistic model and solve it with your favourite NLP solver.

model_logistic = gp.Model(
        m,
        name="logistic",
        equations=m.getEquations(),
        problem="NLP",
        sense="min",
        objective=loss,
)

model_logistic.solve()  # pass output=sys.stdout to see the solver log
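Once the model is solved, we can check how well the fitted function separates the two classes. In a GAMSPy (and GAMS) assignment, a relational expression such as \(p(x) \geq 0.5\) evaluates to 1 when it holds and 0 otherwise, so the sums below count the correctly classified samples of each class. Since each class has 100 samples, adding the two counts and dividing by 2 gives the accuracy as a percentage.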

class_1_accuracy = gp.Parameter(m, name="accuracy1")
class_1_accuracy[...] = gp.Sum(dim_100, logistic(b0.l,  b1.l, class_1_samples) >= 0.5)


class_0_accuracy = gp.Parameter(m, name="accuracy2")
class_0_accuracy[...] = gp.Sum(dim_100, logistic(b0.l,  b1.l, class_0_samples) < 0.5)

learned_b0 = b0.toDense()
learned_b1 = b1.toDense()

avg_accuracy = (class_1_accuracy.toDense() + class_0_accuracy.toDense()) / 2
print(avg_accuracy, "% Accuracy")
# 91.0 % Accuracy

If we plot the logistic function on top of the samples:

def predict_class(b0, b1, x):
    # Returns the probability of class 1 for the given feature value(s)
    prob = 1 / (1 + np.exp(-b0 - x * b1))
    return prob


# Create labels for the samples
labels1 = np.ones(100)
labels0 = np.zeros(100)

# Combine samples and labels
X = np.concatenate((samples1, samples0)).reshape(-1, 1)
y = np.concatenate((labels1, labels0))


# Generate a range of values for plotting the logistic function
x_values = np.linspace(X.min(), X.max(), 500).reshape(-1, 1)
y_values = predict_class(learned_b0, learned_b1, x_values)


plt.figure(figsize=(10, 6))
plt.scatter(samples1, np.zeros_like(samples1) + 1, color='red', label='Class 1')
plt.scatter(samples0, np.zeros_like(samples0), color='blue', label='Class 0')

plt.plot(x_values, y_values, color='green', linewidth=2, label='Logistic Function')
plt.legend()
plt.show()
[Figure: the fitted logistic function plotted over the class 0 and class 1 samples]

You can see how nicely the function fits the samples. In this example, we only trained a logistic regression model, but it is also possible to use this trained model in your optimization models.
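As a hint of what that could look like, here is a minimal, hypothetical sketch (not part of the example above): the fitted coefficients are converted to plain numbers and the logistic expression is embedded as a constraint in a follow-up model. The names x_new and classify_as_1, as well as the 0.9 threshold, are made up for illustration.

# Hypothetical sketch: embed the fitted logistic curve as a constraint
beta0_fit = float(learned_b0)  # fitted intercept as a plain Python float
beta1_fit = float(learned_b1)  # fitted slope as a plain Python float

x_new = gp.Variable(m, name="x_new")

classify_as_1 = gp.Equation(m, name="classify_as_1")
# Require x_new to be classified as class 1 with at least 90% probability
classify_as_1[...] = 1 / (1 + exp(-beta0_fit - beta1_fit * x_new)) >= 0.9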