Sunday, January 28, 2018

How to check autocorrelation in Python

When working with time series data, we usually check the autocorrelation. As a memo, I’ll write down how to compute the autocorrelation and plot it in Python.



Data

I'll use the air passengers data. This is a typical time series dataset, and we can get it from the link below.
Let's import the necessary libraries and load the data. As you can see, the data is very simple: the first column is the time information and the second column is the number of passengers.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

data_loc = './data/AirPassengers.csv'
data = pd.read_csv(data_loc)

print(data.head())
    Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121

My target is just the number of passengers, so from now on I'll focus on the column “#Passengers”. To see how the data behaves, we can plot it.

ts_data = data['#Passengers']
plt.plot(ts_data)
plt.show()
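
The plot above uses the row number as the x-axis. If you want the dates on the x-axis instead, one option is to parse the “Month” column while loading. The following is only a sketch: to keep it runnable without the CSV file, I inline the five rows shown by data.head() above, and the names csv_text, indexed and ts_indexed are mine, not part of the original code.

```python
import io
import pandas as pd

# The five rows printed by data.head() above, inlined so the sketch
# runs without the CSV file. With the real file, pass data_loc instead.
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
"""

# Parse "Month" as dates and use it as the index.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Month'], index_col='Month')
ts_indexed = indexed['#Passengers']

print(ts_indexed.index)
```

Plotting ts_indexed with plt.plot should then label the x-axis with dates. The rest of this memo keeps using ts_data as defined above.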

Check autocorrelation

With the statsmodels library, we can compute the autocorrelation and plot it.
To compute the autocorrelation and partial autocorrelation, we can use the following functions.

# autocorrelation
print(sm.tsa.acf(ts_data, nlags=40))
# partial autocorrelation
print(sm.tsa.pacf(ts_data, nlags=40))
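
To see what the autocorrelation numbers mean, here is a minimal sketch that computes the sample autocorrelation by hand in plain Python. The function name autocorrelation is my own, and I'm assuming the usual estimator that divides every lag by the lag-0 sum of squares; as far as I know this matches the statsmodels default, but please treat that as an assumption and compare the outputs yourself.

```python
def autocorrelation(values, nlags):
    """Sample autocorrelation for lags 0..nlags (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Sum of squared deviations; every lag is normalised by this value,
    # so the lag-0 autocorrelation is always 1.
    denom = sum(d * d for d in deviations)
    result = []
    for lag in range(nlags + 1):
        # Sum of products of deviations that are `lag` steps apart.
        numer = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        result.append(numer / denom)
    return result


# First five passenger counts from the data.head() output above.
sample = [112, 118, 132, 129, 121]
print(autocorrelation(sample, nlags=2))
```

With the full data, autocorrelation(list(ts_data), 40) can be compared against the statsmodels output.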

The outputs are long, so I won't show them here. For visualization, the statsmodels library provides plotting functions.
First, the autocorrelation.

sm.graphics.tsa.plot_acf(ts_data, lags=40)
plt.show()


Second, the partial autocorrelation.

sm.graphics.tsa.plot_pacf(ts_data, lags=40)
plt.show()
