Monday, December 25, 2017

Time series analysis on TensorFlow and Edward: local level model

Overview

To review time series analysis from the basics, I tried state space modeling with TensorFlow and Edward. And I’m at a loss.
The main purposes are the following two:

  • to review time series analysis from the basic points
  • to check how to do it with Edward and TensorFlow




With other PPLs like Stan, I sometimes do time series analysis. To deepen my knowledge of time series analysis and to make use of Edward efficiently, I started studying time series analysis with Edward and TensorFlow. But actually, even at the first step, the local level model, I’m at a loss.
To review time series analysis, I’m using this book.


Edward

I’m using Edward here. For details about Edward, please check the official pages and the following papers.


About a simple trial of Edward, I wrote the article below.

Simple regression by Edward: variational inference

Edward is one of the PPLs (probabilistic programming languages). It enables us to use variational inference, Gibbs sampling, and Monte Carlo methods relatively easily. But it doesn't look so easy, so I'll try it step by step. In that article, the simple regression tried in the article Simple Bayesian modeling by Stan serves as a nice example.
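As a reminder of Edward’s basic flow, here is a minimal sketch on made-up toy data (the data and all the names here are hypothetical, not taken from the article above): define priors as random variables, define one variational distribution per latent variable, and bind them with ed.KLqp.

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

# toy data: 50 points scattered around an unknown mean
y_train = np.random.normal(3.0, 1.0, 50).astype(np.float32)

# model: prior on the mean, normal likelihood
theta = Normal(loc=tf.zeros([1]), scale=tf.ones([1]))
y = Normal(loc=tf.ones(50) * theta, scale=tf.ones(50))

# variational approximation for theta
q_theta = Normal(loc=tf.Variable(tf.zeros([1])),
                 scale=tf.nn.softplus(tf.Variable(tf.zeros([1]))))

# bind the latent variable to its approximation and the observed node to data
inference = ed.KLqp({theta: q_theta}, data={y: y_train})
inference.run(n_iter=500)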

Data

For time series analysis, I prepared the following data: a random walk, meaning the level component alone, without observation noise.

import numpy as np

# make data: a random walk of 100 points starting from 10.0
np.random.seed(59)
data = [10.0]
for _ in range(99):
    data.append(data[-1] + np.random.normal(0, 1))

To check the data’s behavior, let’s plot it.
import matplotlib.pyplot as plt
plt.plot(list(range(len(data))), data)
plt.show()

(Plot of the generated random walk data.)

Modeling

The main purposes are the following two points:

  • to review time series analysis from the basic points
  • to check how to do it with Edward and TensorFlow

I'll adopt the local level model here. This is a very simple model for time series analysis and can be expressed as follows.
  • pattern 1

x_t = u_t + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{Normal}(0, \sigma_x^2)
u_t = u_{t-1} + \eta_t, \quad \eta_t \sim \mathrm{Normal}(0, \sigma_u^2)
We can also express the above as follows.

  • pattern 2

x_t \sim \mathrm{Normal}(u_t, \sigma_x^2)
u_t \sim \mathrm{Normal}(u_{t-1}, \sigma_u^2)
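Because every u_t after the first depends on u_{t-1}, the joint distribution under pattern 2 factorizes as follows (a worked-out form I added for clarity; it matches the code below, where \sigma_x and \sigma_u get Gamma priors and u_0 a standard normal prior):

p(x_{0:T}, u_{0:T}, \sigma_x, \sigma_u) = p(\sigma_x)\, p(\sigma_u)\, p(u_0)\, p(x_0 \mid u_0) \prod_{t=1}^{T} p(u_t \mid u_{t-1})\, p(x_t \mid u_t)

u_0 has no predecessor, which is why the start point has to be written separately in the code.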
I’ll write the model following pattern 2.
From the model part to the inference part, I wrote the code below. As I wrote in the Overview, this doesn’t work as I expected. If anyone knows the reason, please let me know in the comments.

import tensorflow as tf
import edward as ed
from edward.models import Normal, Gamma

x = [0] * len(data)
u = [0] * len(data)

# priors on the observation and level noise scales
s_x = Gamma(tf.ones([1]), tf.ones([1]))
s_u = Gamma(tf.ones([1]), tf.ones([1]))

# start point: u[0] gets its own prior because there is no u[-1]
u[0] = Normal(loc=tf.zeros([1]), scale=tf.ones([1]))
x[0] = Normal(loc=u[0], scale=s_x)

for i in range(1, len(data)):
    u[i] = Normal(loc=u[i-1], scale=s_u)
    x[i] = Normal(loc=u[i], scale=s_x)

# inference: one variational distribution per latent variable
# (all parameters are wrapped in tf.Variable so they are actually trained)
q_x = [Normal(loc=tf.Variable(tf.random_normal([1])),
              scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
       for _ in range(len(data))]
q_u = [Normal(loc=tf.Variable(tf.random_normal([1])),
              scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
       for _ in range(len(data))]
q_s_x = Gamma(tf.nn.softplus(tf.Variable(tf.random_normal([1]))),
              tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
q_s_u = Gamma(tf.nn.softplus(tf.Variable(tf.random_normal([1]))),
              tf.nn.softplus(tf.Variable(tf.random_normal([1]))))

# variational feed: map each prior random variable to its approximation
variational_feed = {}
variational_feed.update({x_elem: q_x_elem for x_elem, q_x_elem in zip(x, q_x)})
variational_feed.update({u_elem: q_u_elem for u_elem, q_u_elem in zip(u, q_u)})
variational_feed.update({s_x: q_s_x, s_u: q_s_u})

# data feed: bind each observed node x[t] to the corresponding data point
data_feed = {x_t: np.array([x_obs], dtype=np.float32) for x_t, x_obs in zip(x, data)}

# execute
inference = ed.KLqp(variational_feed, data_feed)
inference.run(n_iter=1000)

The specific point of time series modeling is here. Although it looks confusing, the point is just that u_t is the unobserved level at time t, generated from u_{t-1}. The observed point x_t follows a normal distribution whose mean is u_t.
We need to write the start point separately, because u_0 has no previous level to depend on.

x = [0] * len(data)
u = [0] * len(data)

s_x = Gamma(tf.ones([1]), tf.ones([1]))
s_u = Gamma(tf.ones([1]), tf.ones([1]))

# start point
u[0] = Normal(loc=tf.zeros([1]), scale=tf.ones([1]))
x[0] = Normal(loc=u[0], scale=s_x)

for i in range(1, len(data)):
    u[i] = Normal(loc=u[i-1], scale=s_u)
    x[i] = Normal(loc=u[i], scale=s_x)

Anyway, executing the code gives the following.

1000/1000 [100%] ██████████████████████████████ Elapsed: 89s | Loss: nan
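Since the loss is nan, the first thing I tried for the investigation is a manual training loop instead of inference.run(), so I can see at which iteration the loss breaks, together with a smaller learning rate (this is just a sketch of the investigation, with an arbitrarily chosen learning rate, not a confirmed fix):

inference = ed.KLqp(variational_feed, data_feed)
inference.initialize(n_iter=1000, optimizer=tf.train.AdamOptimizer(1e-3))

sess = ed.get_session()
tf.global_variables_initializer().run()
for _ in range(inference.n_iter):
    info_dict = inference.update()
    # stop at the first iteration whose loss is nan
    if np.isnan(info_dict['loss']):
        print('loss became nan at iteration', info_dict['t'])
        break
inference.finalize()

If the loss is nan from the very first iteration, the cause is probably the initial values of the variational parameters rather than the optimization steps themselves.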

Sampled points


To check how it worked, I just tried to get sampled points of u.

# draw 10 sample paths from the fitted variational posterior of u
samples = []
for _ in range(10):
    u_sample = []
    for u_elem in q_u:
        u_sample.append(u_elem.sample().eval())
    samples.append(u_sample)

But when I check the sampled points:

print(samples[0])
[array([ nan], dtype=float32), array([ nan], dtype=float32), array([ nan], dtype=float32), ..., array([ nan], dtype=float32)]

All the sampled points are just nan. Honestly, I don’t know the reason yet, and I’m still investigating this point.
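As a concrete next step, I’m checking whether the variational distributions themselves have become degenerate, by evaluating their means (a sketch for inspection only; mean() is forwarded from the underlying TensorFlow distributions):

# inspect the fitted variational distributions
# (if even these means are nan, the optimization itself diverged)
print(q_u[0].mean().eval())
print(q_x[0].mean().eval())
print(q_s_x.mean().eval())
print(q_s_u.mean().eval())

Once the nan problem is solved, I’d plot [q_u_elem.mean().eval() for q_u_elem in q_u] against the data to see the estimated level.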