Pickstock#

Importing Modules#

The gamspy Python package is loaded in order to access GAMSPy syntax.

[1]:
from gamspy import Container, Set, Alias, Parameter, Variable, Equation, Model, Sum, Sense, Card, Options

Loading the price data#

The price data is provided from a CSV file located on the web. The Python package pandas is used to load the data.

[2]:
import pandas as pd

url = (
    "https://github.com/daveh19/pydataberlin2017/raw/master/notebooks/dowjones2016.csv"
)
price_data = pd.read_csv(url)
[3]:
m = Container()
date = Set(m, "date", description="trading date")
symbol = Set(m, "symbol", description="stock symbol")
price = Parameter(
    m,
    "price",
    [date, symbol],
    domain_forwarding=True,
    records=price_data,
    description="price of stock on date",
)
d = Alias(m, "d", date)
s = Alias(m, "s", symbol)
[4]:
price.records.head()
[4]:
date symbol value
0 2016-01-04 AAPL 105.349998
1 2016-01-04 AXP 67.589996
2 2016-01-04 BA 140.500000
3 2016-01-04 CAT 67.989998
4 2016-01-04 CSCO 26.410000

The mean price per stock is calculated in GAMS.

[5]:
avgprice = Parameter(m, "avgprice", symbol, description="average price of stock")
avgprice[s] = Sum(d, price[d, s]) / Card(d)

avgprice.records.head()
[5]:
symbol value
0 AAPL 104.604008
1 AXP 63.793333
2 BA 133.111508
3 CAT 78.698016
4 CSCO 28.789683

The averages can be used in order to calculate weights.

[6]:
weight = Parameter(m, "weight", symbol, description="weight of stock")
weight[symbol] = avgprice[symbol] / Sum(s, avgprice[s])
weight.records.head()
[6]:
symbol value
0 AAPL 0.039960
1 AXP 0.024370
2 BA 0.050850
3 CAT 0.030063
4 CSCO 0.010998

Compute the contributions using weight and price

[7]:
contribution = Parameter(m, "contribution", [date, symbol])
contribution[d, s] = weight[s] * price[d, s]
contribution.records.head()
[7]:
date symbol value
0 2016-01-04 AAPL 4.209752
1 2016-01-04 AXP 1.647143
2 2016-01-04 BA 7.144396
3 2016-01-04 CAT 2.044007
4 2016-01-04 CSCO 0.290455

Compute index values

[8]:
index = Parameter(m, "index", date, description="Dow Jones index")
index[d] = Sum(s, contribution[d, s])
index.records.head()
[8]:
date value
0 2016-01-04 100.572929
1 2016-01-05 100.511422
2 2016-01-06 99.014207
3 2016-01-07 96.606033
4 2016-01-08 95.685461

Create a plot showing the symbol and index values over time

[9]:
import matplotlib
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
price.records.groupby("symbol", observed=True).plot(
    x="date",
    y="value",
    ax=ax,
    alpha=0.4,
    color="grey",
    legend=False,
    ylim=(0, 300),
    xticks=[],
)
index.records.plot(x="date", y="value", ax=ax, linewidth=2, legend=False)
[9]:
<Axes: xlabel='date'>
../../_images/user_notebooks_pickstock_15_1.png
[10]:
sMap = matplotlib.cm.ScalarMappable(
    norm=matplotlib.colors.Normalize(0, weight.records["value"].max()), cmap="inferno"
)
color = [sMap.to_rgba(x) for x in weight.records["value"]]
weight.records.plot.bar(x="symbol", y="value", color=color)
[10]:
<Axes: xlabel='symbol'>
../../_images/user_notebooks_pickstock_16_1.png

Define dynamic set ds and scalar maxstock.

[11]:
trainingdays = Parameter(m, "trainingdays", records=100)
maxstock = Parameter(m, "maxstock", records=3)
ds = Set(m, "ds", date, description="selected dates")
ds.setRecords(date.records["uni"][:100])

Declaration of the variables and equations used to formulate the optimization model

[12]:
p = Variable(m, "p", "binary", symbol, description="is stock included?")
w = Variable(m, "w", "positive", symbol, description="what part of the portfolio")
slpos = Variable(m, "slpos", "positive", date, description="positive slack")
slneg = Variable(m, "slneg", "positive", date, description="negative slack")

Defining the actual model#

We know come to the decision problem, where we want to pick a small subset of the stocks together with some weights, such that this portfolio has a similar behavior to our overall Dow Jones index.

The model is based on a linear regression over the time series, but we minimize the loss using the L1-norm (absolute value), and allow only a fixed number of weights to take nonzero variable.

\begin{align} \text{minimize} \qquad & \text{obj}:= \sum_{ds} \text{slpos}_{ds} + \text{slneg}_{ds} \\ \text{subject to} \qquad & \sum_{s} \text{price}_{ds, s} \cdot w_{s} = \text{index}_{ds} + \text{slpos}_{ds} - \text{slneg}_{ds} & (\forall{ds}) \\ & w_{s} \leq p_{s} & (\forall{s}) \\ & \sum_{s}{p_{s}} \leq \text{maxstock} \\ & w_{s}\geq 0, \qquad p_{s}\in \{0,1\} & (\forall s) \\ & \text{slpos}_{d}\geq 0, \qquad \text{slneg}_{d}\geq 0 & (\forall d) \end{align}

[13]:
fit = Equation(m, name="deffit", domain=[ds], description="fit to Dow Jones indeex")
fit[ds] = Sum(s, price[ds, s] * w[s]) == index[ds] + slpos[ds] - slneg[ds]

pick = Equation(
    m, name="defpick", domain=[s], description="can only use stok if picked"
)
pick[s] = w[s] <= p[s]

numstock = Equation(m, name="defnumstock", description="few stocks allowed")
numstock[...] = Sum(s, p[s]) <= maxstock

Objective

[14]:
obj = Sum(ds, slpos[ds] + slneg[ds])

Model

[15]:
pickstock = Model(
    container=m,
    name="pickstock",
    equations=m.getEquations(),
    problem="MIP",
    sense=Sense.MIN,
    objective=obj,
)

Specify ‘maxstock’ and ‘trainingdays’ and solve the model.

[16]:
ds.setRecords(date.records["uni"][:100])
maxstock.setRecords(3)

Solve

[17]:
pickstock.solve(options=Options(relative_optimality_gap=0.01, time_limit=6))
[17]:
Solver Status Model Status Objective Num of Equations Num of Variables Model Type Solver Solver Time
0 Normal Integer 45.5725938967286 132 261 MIP CPLEX 0.142

Generate reporting parameters

[18]:
fund = Parameter(m, "fund", [date], description="Index fund report parameter")
fund[d] = Sum(s, price[d, s] * w.l[s])
error = Parameter(m, "error", [date], description="Absolute error")
error[d] = index[d] - fund[d]

Plotting of the results

[19]:
fig, ax = plt.subplots()
trainingDays = int(trainingdays.records.iloc[0].to_list()[0])
index.records.plot(
    y="value",
    ax=ax,
    xticks=[0, trainingDays, len(date.records)],
    yticks=[],
    label="Dow Jones",
)
fund.records.plot(
    y="value",
    ax=ax,
    xticks=[0, trainingDays, len(date.records)],
    yticks=[],
    label="Index Fund",
)
[19]:
<Axes: >
../../_images/user_notebooks_pickstock_34_1.png
[20]:
maxError = error.records["value"].max()
fig, ax = plt.subplots(edgecolor="black")
error.records["value"][trainingDays:].plot.hist(
    x="date",
    y="value",
    ax=ax,
    bins=20,
    range=(0, maxError),
    label="later",
    color="red",
    edgecolor="black",
)
error.records["value"][:trainingDays].plot.hist(
    x="date",
    y="value",
    ax=ax,
    bins=20,
    range=(0, maxError),
    label="training",
    color="green",
    alpha=0.8,
    edgecolor="black",
)
[20]:
<Axes: ylabel='Frequency'>
../../_images/user_notebooks_pickstock_35_1.png