Thursday, October 12, 2017

Interpreting the Parameters of a Hierarchical Bayesian Model in Stan

A hierarchical Bayesian model usually has many parameters, so at first glance the statistical summary of the sampled points looks complex to interpret.

In the article below, I built a hierarchical Bayesian model for artificial data. Here, using almost the same but simpler data, I'll build a model and try to interpret the result.

Hierarchical Bayesian model by Stan: Struggling




Data


I used almost the same data as in the article above, except that there the number of data points per group was unbalanced. In this article, I made the groups balanced.

import numpy as np
np.random.seed(49)

# x values 1..99 for each of the three groups (balanced group sizes)
x_1 = list(range(1, 100))
x_2 = list(range(1, 100))
x_3 = list(range(1, 100))

# y = slope * x + intercept + Gaussian noise, with group-specific parameters
y_1 = [np.random.normal(3.0 * i + 4, 4.0) for i in x_1]
y_2 = [np.random.normal(4.0 * i + 5, 4.0) for i in x_2]
y_3 = [np.random.normal(2.0 * i + 2, 20.0) for i in x_3]

# group ID of each observation
x_1_sign = [1 for i in x_1]
x_2_sign = [2 for i in x_2]
x_3_sign = [3 for i in x_3]

X = x_1 + x_2 + x_3
Y = y_1 + y_2 + y_3
GID = x_1_sign + x_2_sign + x_3_sign

We can check the behavior of the data with a plot.

import matplotlib.pyplot as plt

plt.plot(x_1, y_1)
plt.plot(x_2, y_2)
plt.plot(x_3, y_3)
plt.show()

[Figure: plot of the three generated groups]

As you can see, the data consists of three groups of points.

Hierarchical Bayesian model


The model itself is the same as the one in the article above, Hierarchical Bayesian model by Stan.

data {
    int N;                          // number of observations
    int G;                          // number of groups
    int X[N];                       // explanatory variable
    real Y[N];                      // target variable
    int<lower=1, upper=G> GID[N];   // group ID of each observation
}

parameters {
    real a0;                  // common intercept
    real b0;                  // common slope
    real ag[G];               // group-specific intercept deviations
    real bg[G];               // group-specific slope deviations
    real<lower=0> sigma_a;
    real<lower=0> sigma_b;
    real<lower=0> sigma_Y;
}

transformed parameters {
    real a[G];                // intercept of each group
    real b[G];                // slope of each group

    for (g in 1:G){
        a[g] = a0 + ag[g];
        b[g] = b0 + bg[g];
    }
}

model {
    for (g in 1:G) {
        ag[g] ~ normal(0, sigma_a);
        bg[g] ~ normal(0, sigma_b);
    }
    for (i in 1:N){
        Y[i] ~ normal(a[GID[i]] + b[GID[i]] * X[i], sigma_Y);
    }
}

In Python, the code below executes the sampling.

import pystan
data_hier = {'X': X, 'Y': Y, 'N':len(X), 'GID':GID, 'G':3}
fit = pystan.stan(file='hierarchical.stan', data=data_hier, iter=10000)

Before checking the summary, let's look at the traceplot.
The three groups can be recognized on it.

fit.plot()

[Figure: traceplot of the sampled parameters]
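If the combined plot is too crowded, we can narrow it down. As a small aside (assuming PyStan 2's pars argument), the plot method also accepts a list of parameter names:

# plot only the group-level intercepts and slopes
fit.plot(pars=['a', 'b'])
plt.show()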

Finally, we can tackle the interpretation of the summary.

print(fit)
Inference for Stan model: anon_model_3f0dce44b23e650381c177b45c857744.
4 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=20000.

          mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
a0        3.44    0.83  14.02  -8.44   2.45   4.07   5.64  15.44    285   1.01
b0        3.28    0.43   6.47  -6.91   2.14   3.01   3.85  16.07    229   1.01
ag[0]     1.01    0.83  14.04 -11.01  -0.84   0.15   1.62  13.43    287   1.01
ag[1]     0.66    0.83  14.01 -11.56  -1.15   0.02   1.31  12.73    286   1.01
ag[2]     0.38    0.83  14.02 -11.74  -1.42  -0.05   1.05  12.35    284   1.01
bg[0]    -0.29    0.43   6.47  -13.1  -0.86  -0.02   0.85   9.91    229   1.01
bg[1]     0.73    0.43   6.47 -12.05   0.16    1.0   1.87  10.94    229   1.01
bg[2]    -1.27    0.43   6.47 -14.08  -1.84   -1.0  -0.13   8.95    229   1.01
sigma_a   8.49    1.56  44.41   0.12   1.09   2.57   5.98  41.87    810   1.01
sigma_b    6.6    1.02   22.2   0.64   1.25   2.17   4.75  40.14    477   1.01
sigma_Y   12.1  7.5e-3   0.51  11.15  11.75  12.09  12.43  13.15   4554    1.0
a[0]      4.45    0.01   2.02   0.57   3.13   4.41   5.72   8.56  20000    1.0
a[1]       4.1    0.01   2.03   0.02   2.78   4.08   5.41   8.12  20000    1.0
a[2]      3.82    0.01   2.01   -0.3   2.56   3.87   5.13   7.75  20000    1.0
b[0]       3.0  2.6e-4   0.04   2.92   2.97    3.0   3.02   3.07  20000    1.0
b[1]      4.01  2.6e-4   0.04   3.94   3.99   4.01   4.04   4.09  20000    1.0
b[2]      2.01  2.6e-4   0.04   1.94   1.99   2.01   2.04   2.09  20000    1.0
lp__    -891.4    0.15   3.97 -899.9 -893.8 -891.2 -888.8 -884.0    729   1.01

Samples were drawn using NUTS at Thu Oct 12 16:26:01 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

In the model, I separated the intercept and the slope into a common part and a group part. In this context, "common" means the part shared between the groups, and "group" means the group-specific part.



The model can be expressed as follows.

$$Y_i \sim \mathrm{Normal}\left(a_{GID[i]} + b_{GID[i]} X_i,\ \sigma_Y\right)$$

$$a_g = a_0 + a^{\mathrm{group}}_g, \qquad a^{\mathrm{group}}_g \sim \mathrm{Normal}(0, \sigma_a)$$

$$b_g = b_0 + b^{\mathrm{group}}_g, \qquad b^{\mathrm{group}}_g \sim \mathrm{Normal}(0, \sigma_b)$$

Here, $a_0$ and $b_0$ are the common parts, and $a^{\mathrm{group}}_g$ and $b^{\mathrm{group}}_g$ (ag and bg in the Stan code) are the group parts.
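We can verify this decomposition directly from the sampled points. The sketch below assumes the fit object from the sampling step above; in PyStan 2, fit.extract() returns a dictionary that maps each parameter name to an array of post-warmup draws.

samples = fit.extract()

# b[g] = b0 + bg[g] holds draw by draw, so the posterior means must match too
b0_mean = samples['b0'].mean()         # scalar: common slope
bg_mean = samples['bg'].mean(axis=0)   # shape (3,): group deviations
b_mean = samples['b'].mean(axis=0)     # shape (3,): group slopes

print(b0_mean + bg_mean)  # reconstructed group slopes
print(b_mean)             # sampled group slopes: identical up to rounding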
Let's focus on the parameter b. About b, we should check $b_0$, $b^{\mathrm{group}}_g$ (bg in the summary), and $b_g$. In the summary, the mean of $b_0$ is 3.28 and the means of (bg[0], bg[1], bg[2]) are (-0.29, 0.73, -1.27). When we want to know the slope of each group, we should simply check $b_g$, which already combines the common and group parts.

For example, in the generated data the slopes of the groups are (3.0, 4.0, 2.0), and if we focus on the means of the sampled points, (b[0], b[1], b[2]) is (3.0, 4.01, 2.01).
The same can be said about the parameter a.
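As a final check, we can overlay the regression lines implied by the posterior means of a and b on the generated data. This is a minimal sketch that reuses x_1, y_1, and the other lists from the data section, plus the samples dictionary extracted above.

import numpy as np
import matplotlib.pyplot as plt

a_mean = samples['a'].mean(axis=0)  # posterior-mean intercept of each group
b_mean = samples['b'].mean(axis=0)  # posterior-mean slope of each group

for g, (x, y) in enumerate([(x_1, y_1), (x_2, y_2), (x_3, y_3)]):
    plt.plot(x, y)                                    # observed data
    plt.plot(x, a_mean[g] + b_mean[g] * np.array(x))  # fitted line
plt.show()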