Overview
In this article, I'll build a local level model with an explanatory variable for time series data in Stan. Previously, I built a simple local level model in Stan. In practice, we frequently need models that include explanatory variables, so here I extend the simple local level model with one.
As a reference, I'm using the following book. This article deals with chapter 5 of the book.
Data
For this small experiment, I generated the following artificial data.
import numpy as np
# make data
np.random.seed(59)
x = np.random.normal(100, 5, 100)
beta = 10.0
data = []
u = []
u_start = 500.0
data.append(u_start + beta * x[0] + np.random.normal(0, 2))
u.append(u_start)
for i in range(100):
    if i == 0:
        u_temp = u_start
    else:
        next_u = u_temp + np.random.normal(0, 2)
        data.append(next_u + beta * x[i] + np.random.normal(0, 2))
        u.append(next_u)
        u_temp = next_u
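As an aside, the same kind of random-walk-plus-regression data can be generated in a vectorized way with numpy's cumsum. This is a sketch, not the code used above: the variable names mirror the loop version, but a different seed is used here, so the exact draws differ.

```python
import numpy as np

np.random.seed(0)  # any seed; the draws differ from the loop version above
n = 100
x = np.random.normal(100, 5, n)
beta = 10.0
# random walk state: start at 500 and accumulate N(0, 2) disturbances
u = 500.0 + np.concatenate([[0.0], np.random.normal(0, 2, n - 1)]).cumsum()
# observations: state + regression term + observation noise
data = u + beta * x + np.random.normal(0, 2, n)
```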
The variable ‘data’ is time series data driven by the explanatory variable x. Let's visualize it to see what it looks like.
import matplotlib.pyplot as plt
plt.plot(list(range(len(data))), data)
plt.show()
This time, the data has one explanatory variable. Plotting the explained variable against it shows their relation.
plt.scatter(x, data)
plt.show()
In a nutshell, this data is time series data with one explanatory variable.
Stan Modeling
Mathematically, the model can be expressed as follows:

y_t = u_t + beta * x_t + e_t,  e_t ~ Normal(0, s_y)
u_t = u_{t-1} + v_t,           v_t ~ Normal(0, s_u)

The observation y_t is the sum of the state u_t, the regression term beta * x_t, and observation noise; the state follows a random walk whose disturbances have mean zero.
data {
  int N;
  vector[N] X;
  vector[N] Y;
}
parameters {
  vector[N] u;
  real beta;
  real<lower=0> s_y;
  real<lower=0> s_u;
}
transformed parameters {
  vector[N] y_hat;
  y_hat = u + beta * X;
}
model {
  u[2:N] ~ normal(u[1:(N-1)], s_u);
  Y ~ normal(y_hat, s_y);
}
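To make the model block concrete, here is a numpy sketch of the (unnormalized) log density that Stan samples from. The helper `norm_logpdf` and the tiny input arrays are hypothetical, written only for illustration; they are not part of Stan's API.

```python
import numpy as np

def norm_logpdf(y, mu, sigma):
    # log density of Normal(mu, sigma), elementwise
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_posterior(u, beta, s_y, s_u, x, y):
    # state equation: u[t] ~ Normal(u[t-1], s_u)
    lp = norm_logpdf(u[1:], u[:-1], s_u).sum()
    # observation equation: Y[t] ~ Normal(u[t] + beta * x[t], s_y)
    lp += norm_logpdf(y, u + beta * x, s_y).sum()
    return lp

# hypothetical tiny example
x = np.array([1.0, 2.0, 3.0])
y = np.array([11.0, 12.0, 13.0])
lp = log_posterior(u=np.array([1.0, 2.0, 3.0]), beta=10.0,
                   s_y=1.0, s_u=1.0, x=x, y=y)
```

Stan explores the parameter space so that regions where this quantity is high are visited more often.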
Execute
In Python, we can run the sampling.
import pystan
data_feed = {'N': len(data), 'X': x, 'Y': data}
fit = pystan.stan(file='local_level_experiment.stan', data=data_feed, iter=1000)
Check the outcome
With the code below, we can extract the sampled points.
samples = fit.extract(permuted=True)
beta_mean = samples['beta'].mean()
u_mean = samples['u'].mean(axis=0)
data_pred = u_mean + beta_mean * x
By visualizing, we can check whether the fitted values trace the real data.
plt.plot(list(range(len(data))), data)
plt.plot(list(range(len(data))), data_pred)
plt.show()
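The posterior mean is only a point summary; a credible band is often more informative. Here is a sketch of computing an 80% band with np.percentile, demonstrated on a hypothetical array shaped like samples['u'] (draws × time points), since the real fit object is not reproduced here.

```python
import numpy as np

# stand-in for samples['u']: 2000 posterior draws of a length-100 state vector
rng = np.random.default_rng(1)
u_draws = rng.normal(500.0, 3.0, size=(2000, 100))

# 10th and 90th percentiles across draws, per time point
lower, upper = np.percentile(u_draws, [10, 90], axis=0)
u_mean = u_draws.mean(axis=0)
```

With the real output, one would pass samples['u'] instead of u_draws and plot the band with plt.fill_between.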
Prediction
In the article below, I showed how to predict future points by using the generated quantities block in Stan:
Time series analysis to predict future points on Stan
In the same manner, I'll predict the future points here. As a premise, we already have the explanatory variable's values for the future periods.
The model is as follows. In the data block, I added two items: the number of future points and their explanatory values. In the generated quantities block, I generate the future points using the sampled parameters.
data {
  int N;
  int pred_N;
  vector[N] X;
  vector[pred_N] X_future;
  vector[N] Y;
}
parameters {
  vector[N] u;
  real beta;
  real<lower=0> s_y;
  real<lower=0> s_u;
}
transformed parameters {
  vector[N] y_hat;
  y_hat = u + beta * X;
}
model {
  u[2:N] ~ normal(u[1:(N-1)], s_u);
  Y ~ normal(y_hat, s_y);
}
generated quantities {
  vector[N + pred_N] u_pred;
  vector[pred_N] y_pred;
  u_pred[1:N] = u;
  for (i in 1:pred_N) {
    u_pred[N+i] = normal_rng(u_pred[N+i-1], s_u);
    y_pred[i] = normal_rng(u_pred[N+i] + beta * X_future[i], s_y);
  }
}
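The generated quantities block is just a forward simulation of the random walk plus regression term. A numpy sketch of the same recursion for a single posterior draw follows; the values of u_last, beta, s_u, s_y, and x_future are hypothetical stand-ins, not sampled parameters.

```python
import numpy as np

def simulate_future(u_last, beta, s_u, s_y, x_future, rng):
    # walk the state forward and draw an observation at each step,
    # mirroring the loop in the generated quantities block
    y_pred = []
    u = u_last
    for xf in x_future:
        u = rng.normal(u, s_u)                        # u_pred[N+i] ~ Normal(u_pred[N+i-1], s_u)
        y_pred.append(rng.normal(u + beta * xf, s_y)) # observation draw
    return np.array(y_pred)

# hypothetical parameter values for one draw
rng = np.random.default_rng(7)
x_future = rng.normal(100, 5, 50)
y_pred = simulate_future(u_last=500.0, beta=10.0, s_u=2.0, s_y=2.0,
                         x_future=x_future, rng=rng)
```

Running this once per posterior draw and averaging reproduces what Stan returns in y_pred.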
In Python, we can now run the sampling. Before that, I made the future explanatory data.
# future explanatory variable
np.random.seed(32)
x_future = np.random.normal(100, 5, 50)
data_feed = {'N': len(data), 'pred_N': len(x_future), 'X': x, 'X_future': x_future, 'Y': data}
fit_generated = pystan.stan(file='local_level_experiment_generated.stan', data=data_feed, iter=1000)
By visualizing, we can check how it works.
samples_generated = fit_generated.extract(permuted=True)
y_pred = samples_generated['y_pred']
pred_time = list(range(len(data) + len(x_future)))
all_data = list(data_pred) + list(y_pred.mean(axis=0))
plt.plot(list(range(len(data))), data)
plt.plot(pred_time, all_data)
plt.show()