Data
I'll use the air passengers data. This is a typical time series dataset, and we can get it from the link below. Let's import the necessary libraries and load the data. As you can see, the data is very simple: the first column is the month and the second column is the number of passengers.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
data_loc = './data/AirPassengers.csv'
data = pd.read_csv(data_loc)
print(data.head())
     Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121
My target is just the number of passengers, so from now on I'll focus on the "#Passengers" column. To see how the data behaves, we can plot it.
ts_data = data['#Passengers']
plt.plot(ts_data)
plt.show()
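The plot above uses the default integer row index on the x-axis. As a minimal sketch, assuming the Month column always has the YYYY-MM form shown in the printed output, the strings can be parsed into dates and used as the index. The small `sample` frame and the name `ts_sample` are only illustrative: they hold the five rows printed earlier, standing in for the full file.

```python
import pandas as pd

# The five rows shown by data.head(), standing in for the full CSV.
sample = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03", "1949-04", "1949-05"],
    "#Passengers": [112, 118, 132, 129, 121],
})

# Parse the YYYY-MM strings into timestamps (assumed format) and
# use them as the index of the passenger series.
sample["Month"] = pd.to_datetime(sample["Month"], format="%Y-%m")
ts_sample = sample.set_index("Month")["#Passengers"]

print(ts_sample)
```

Applied to the full data, passing a series indexed this way to plt.plot should label the x-axis with calendar dates instead of row numbers.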
Check autocorrelation
With the statsmodels library, we can compute the autocorrelation and the partial autocorrelation and plot them. To compute the values, we can use the following functions.
# autocorrelation
print(sm.tsa.acf(ts_data, nlags=40))
# partial autocorrelation
print(sm.tsa.pacf(ts_data, nlags=40))
The outputs are long, so I don't show them here. For visualization, the statsmodels library has plotting functions.
First, autocorrelation.
sm.graphics.tsa.plot_acf(ts_data, lags=40)
plt.show()
Second, partial autocorrelation.
sm.graphics.tsa.plot_pacf(ts_data, lags=40)
plt.show()
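For intuition about what each bar in the autocorrelation plot represents, here is a minimal sketch that computes the sample autocorrelation by hand with NumPy. The helper name `manual_acf` is hypothetical, and the code uses the common biased estimator (lagged products of the mean-centered series divided by the total sum of squares). I believe this corresponds to the default behavior of the statsmodels function, but treat it as an illustration rather than the library's implementation. The input is just the five values printed earlier, not the full series.

```python
import numpy as np

def manual_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    # For lag k, multiply the series with itself shifted by k steps.
    return np.array([
        np.sum(centered[k:] * centered[:len(x) - k]) / denom
        for k in range(nlags + 1)
    ])

# First five passenger counts from the printed output.
values = [112, 118, 132, 129, 121]
r = manual_acf(values, nlags=2)
print(r)
```

The value at lag 0 is always 1, because the series is compared with itself. Values near 1 at larger lags mean that observations that far apart tend to move together.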