Time Series Modeling - Part I (Theoretical Background)

Apart from classification and regression problems, time series models are a separate entity in themselves, not easily tackled by standard methods and algorithms (well, they can be after some smart tweaks). The main aim of time series analysis is to forecast future values of a variable using its past values. Time series models are also very business friendly, and directly solve business problems like “What will my store’s sales be in the next two months?” or “How many customers are going to come to my pizza store tomorrow, so that I can optimize my ingredients?”

Trends and Seasonality are two important components of a time series.

Stationary Time Series

All the modelling techniques discussed here are based on the assumption that our time series is weakly stationary. If a non-stationary series is encountered, it is first converted to a weakly stationary series, and modeling is then done on it.

A series $x_t$ is said to be a weakly stationary series if it satisfies the following properties –

1. The mean, $E(x_t)$, is the same for all t.
2. The variance of $x_t$ is the same for all t.
3. The covariance (and correlation) between $x_t$ and $x_{t-h}$ depends only on the lag, h. Hence, the covariance and correlation between $x_t$ and $x_{t-h}$ are the same for all t.

To put it simply, practitioners say that a stationary time series is one with no trend: it fluctuates around a constant mean and has constant variance. In stationary processes, shocks are temporary and dissipate (lose energy) over time. After a while, they no longer contribute to new time series values. For example, something which happened long enough ago, such as World War II, had an impact, but if the time series today behaves the same as if World War II had never happened, we would say that the shock lost its energy, or dissipated. Stationarity is especially important because many classical econometric results are derived under the assumption of stationarity.

Some examples of non-stationary series –

1. a series with a continual upward trend.
2. a series with a distinct seasonal pattern.
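To make the distinction concrete, here is a minimal numpy sketch (not from the original post; the sizes and seed are arbitrary illustrative choices). White noise keeps a constant mean and variance, while a random walk, built by accumulating the same noise, has a variance that grows with time and is therefore non-stationary.

```python
# A minimal sketch contrasting a stationary series (white noise) with a
# non-stationary one (a random walk). All sizes and seeds are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
steps = rng.normal(0, 1, (500, 1000))  # 500 independent white-noise series
walks = np.cumsum(steps, axis=1)       # 500 independent random walks

# Cross-sectional spread at a fixed time t: roughly constant for white noise,
# but growing like sqrt(t) for the random walk.
print(steps[:, 9].std(), steps[:, 999].std())   # both close to 1
print(walks[:, 9].std(), walks[:, 999].std())   # roughly sqrt(10) vs sqrt(1000)
```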

AutoRegressive (AR) Models

An autoregressive (AR) model assumes that the present value of a time series variable can be explained as a function of its past values.

$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t$

where, $w_t$ is the error term and $x_t$ is stationary.

An AR model is said to be of order p if it is dependent on p past values and is denoted as AR(p).

An important thing to note about AR models is that they are not the same as the standard linear regression models because the data in this case is not necessarily independent and not necessarily identically distributed.

INTUITION

Let’s discuss a practical example to get a feel for the kind of information an AR model incorporates.

Consider the number of blankets sold in a city. On a particular day, the temperature dropped below normal and there was an increase in the sale of blankets ($x_{t-1}$). The next day, the temperature went back to normal, but there was still a significant demand for blankets ($x_t$). This could be because the number of blankets sold depends on the current temperature but is also affected by past blanket sales. This situation can be expressed as –

$x_t = \phi_1 x_{t-1} + w_t$
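A hedged numpy sketch of this AR(1) process; the coefficient phi = 0.7 is an arbitrary illustrative choice (any |phi| < 1 gives a stationary series):

```python
# Simulate the AR(1) model x_t = phi * x_{t-1} + w_t with an illustrative
# coefficient phi = 0.7.
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# For a stationary AR(1), the lag-1 sample autocorrelation should be near phi.
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(r1, 2))
```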

Moving Averages (MA) Models

In moving average (MA) models, the present value of a time series is explained as a linear representation of the past error terms.

$x_t = \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} + w_t$

where $w_k$ is the error at time k, $x_t$ is stationary, and the mean of the series is 0.

An MA model is said to be of order q if it is dependent on q past error terms and is denoted as MA(q).

INTUITION

Consider a car manufacturer who manufactured 10,000 special edition cars. This edition was a success and he managed to sell all of them (let’s call this $x_{t-1}$). But there were some 1,500 customers who could not purchase the car as it went out of stock (let’s call this $w_{t-1}$). Some of these 1,500 customers settled for some other car, but some returned the next month when the special edition was back in stock. Mathematically, the above scenario can be depicted as,

$x_t =\theta_1w_{t-1} + w_t$
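A similar numpy sketch for this MA(1) process; theta = 0.8 is an arbitrary illustrative coefficient. A known property of MA(1) is that its lag-1 autocorrelation equals theta / (1 + theta²) while higher-lag autocorrelations are zero, which the simulation reflects:

```python
# Simulate the MA(1) model x_t = theta * w_{t-1} + w_t with an illustrative
# coefficient theta = 0.8.
import numpy as np

rng = np.random.default_rng(2)
theta, n = 0.8, 5000
w = rng.normal(0, 1, n)
x = w.copy()
x[1:] += theta * w[:-1]

# For an MA(1): lag-1 autocorrelation = theta / (1 + theta^2);
# autocorrelations at lags > 1 are zero.
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
r2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(round(r1, 2), round(r2, 2))
```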

It is usually difficult to guess a suitable model by just looking at the data. We need certain techniques to come up with a suitable forecasting model. Understanding the autocorrelation function and the partial autocorrelation function is an important step in time series modelling.

Autocorrelation Function (ACF)

As discussed earlier, for a stationary series the autocorrelation between $x_t$ and $x_{t-h}$ depends only on the difference (lag) of the two measurements.

Therefore, for a time series the autocorrelation function is a function of lag, h. It gives the correlation between two time dependent random variables with a separation of h time frames.

ACF plots are used to infer the type and the order of the models that can be suitable for a particular forecasting problem.
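To demystify what those plots contain, here is a hand-rolled sample ACF in numpy (packages such as statsmodels provide this ready-made; this sketch just shows the idea):

```python
# Compute a sample autocorrelation function by hand: autocovariance at each
# lag, normalized by the lag-0 autocovariance.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)  # lag-0 autocovariance
    return [1.0 if h == 0 else np.dot(x[:-h], x[h:]) / len(x) / c0
            for h in range(max_lag + 1)]

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1000)      # white noise: ACF near 0 beyond lag 0
acf3 = sample_acf(x, 3)
print([round(r, 2) for r in acf3])
```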

Autocorrelation Function of an AR(p) model

The autocorrelation function of an AR model dampens exponentially as h increases. At times, the exponential decay can also be sinusoidal in nature.

In practice, a sample won’t usually provide such a clear pattern, but an exponential decay of this kind is usually indicative of an AR model. However, the ACF plot cannot tell you the order of the AR model. To determine the order, we use the partial autocorrelation function plot, but more on that later.

Autocorrelation of an MA(q) model

The autocorrelation function of an MA(q) model cuts off at lag q. That is, it has a nonzero value for lags h ≤ q and is zero for lags h > q.

If the ACF plot has such a characteristic, we can decipher the order of the MA model as well. Again, in practice, a sample won’t usually provide such a clear pattern. But a resemblance to such a plot would suggest an MA model.

PARTIAL AUTOCORRELATION FUNCTION (PACF)

We can understand the order of an MA(q) model by looking at its ACF plot. But this is not feasible with an AR(p) model. Hence, we use the partial autocorrelation function for this. It is the correlation between $x_t$ and $x_s$ with the linear effect of everything in the middle removed.

Let us consider an AR(1) model, $x_t = \phi_1 x_{t-1} + w_t$. We know that the correlation between $x_t$ and $x_{t-2}$ is not zero, because $x_t$ depends on $x_{t-2}$ through $x_{t-1}$. But what if we break this chain of dependence by removing the effect of $x_{t-1}$? That is, we consider the correlation between $x_t - \phi x_{t-1}$ and $x_{t-2} - \phi x_{t-1}$, because this is the correlation between $x_t$ and $x_{t-2}$ with the linear dependence of each on $x_{t-1}$ removed. In this way, we have broken the dependence chain between $x_t$ and $x_{t-2}$. Hence,

$cov(x_t −\phi x_{t-1}, x_{t-2} − \phi x_{t-1}) = cov(w_t, x_{t-2} − \phi x_{t−1}) = 0$

PACF of an AR(p) model

The PACF for an AR(p) model cuts off after p lags.

PACF of an MA(q) model

Similar to the ACF of an AR(p) model, the PACF of an MA(q) model tails off as the lag increases.

What are these horizontal blue lines in the ACF and PACF plots?

For those who are not comfortable with inferential statistics – We always work with sample data, so we cannot find the true population parameters. Therefore, we compute sample statistics (in this case, the sample autocorrelation) to estimate the true population parameter (the true autocorrelation). The blue lines denote the confidence interval of our estimate. If an autocorrelation value in the plot lies inside the blue lines, we assume it to be zero (statistically insignificant). You do not have to worry about calculating the confidence intervals, as most software packages do it for you.

For those who are comfortable with inferential statistics – We take,

null hypothesis : autocorrelation for a particular lag, ρ(h) = 0

alternate hypothesis : ρ(h) ≠ 0

The blue lines represent the ±2 standard errors region. We reject the null hypothesis if our sample estimate is outside this boundary.
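A quick numpy sanity check of this rule of thumb (the ±2 standard error band is approximately ±2/√n for a white-noise series; the seed and lag count are arbitrary):

```python
# For white noise, roughly 95% of sample autocorrelations should fall inside
# the +/- 2/sqrt(n) band that the "blue lines" represent.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(0, 1, n)
x = x - x.mean()
c0 = np.dot(x, x) / n
bound = 2 / np.sqrt(n)

inside = sum(abs(np.dot(x[:-h], x[h:]) / n / c0) < bound
             for h in range(1, 41))
print(inside, "of 40 lags inside the +/- 2/sqrt(n) band")
```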

AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODELS

Most often using only AR or MA models does not give the best results. Hence, we use ARMA models. These models incorporate the autoregressive as well as the moving average terms. An ARMA model can be represented as,

$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} + w_t$

An ARMA model dependent on p past values and q past error terms is denoted as ARMA(p,q) .
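A numpy sketch of an ARMA(1,1) process; phi = 0.6 and theta = 0.3 are arbitrary illustrative coefficients. The simulated lag-1 autocorrelation can be checked against the known closed-form value for ARMA(1,1):

```python
# Simulate the ARMA(1,1) model x_t = phi*x_{t-1} + theta*w_{t-1} + w_t with
# illustrative coefficients phi = 0.6, theta = 0.3.
import numpy as np

rng = np.random.default_rng(5)
phi, theta, n = 0.6, 0.3, 5000
w = rng.normal(0, 1, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + theta * w[t - 1] + w[t]

# Theoretical lag-1 autocorrelation of an ARMA(1,1):
rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(rho1, 2), round(r1, 2))
```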

Behavior of the ACF and PACF for ARMA Models –

For an ARMA(p,q) model, both the ACF and the PACF tail off as the lag increases. Summarizing the patterns discussed so far:

AR(p): ACF tails off; PACF cuts off after lag p.
MA(q): ACF cuts off after lag q; PACF tails off.
ARMA(p,q): both the ACF and the PACF tail off.

DIFFERENCING

Till now we have only talked about stationary series. But what if we encounter a non-stationary series? Well, as mentioned earlier, we will have to come up with strategies to stationarize our time series.

Differencing is one such and perhaps the most common strategy to stationarize non-stationary series. Consider,

$x_t = \mu + \phi x_{t-1} + w_t$ ………… (1)

$x_{t-1} = \mu + \phi x_{t-2} + w_{t-1}$ ………… (2)

Subtracting (2) from (1) we get,

$x_t - x_{t-1} = \phi(x_{t-1} - x_{t-2}) + w_t - w_{t-1}$

Here, we removed a linear trend in the data by doing a first order differencing. After fitting a model on the differenced terms, we can always retrieve the actual terms to get their forecasted values.

Example of a second order differencing,

$(x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$
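Both differencing orders, and the inverse "integration" step that recovers the original series from its differences, can be sketched with numpy (the toy numbers are arbitrary):

```python
# First- and second-order differencing with numpy, plus the inverse
# ("integration") step that recovers the original series.
import numpy as np

x = np.array([10.0, 12.0, 15.0, 19.0, 24.0])  # toy series with a trend

d1 = np.diff(x)       # first differences: x_t - x_{t-1}
d2 = np.diff(x, n=2)  # second differences: (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
print(d1)             # [2. 3. 4. 5.]
print(d2)             # [1. 1. 1.]

# Undo the differencing: cumulative sum of d1, shifted by the first observation.
restored = np.concatenate(([x[0]], x[0] + np.cumsum(d1)))
print(np.allclose(restored, x))  # True
```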

AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) MODELS

They are nothing but ARMA models applied after differencing a time series. In most of the software packages, the elements in the model are specified in the order –

(AR order, differencing order, MA order)

For example,

MA(2) => ARIMA(0, 0, 2)
ARMA(1,3) => ARIMA(1, 0, 3)
AR(1), differencing(1), MA(2) => ARIMA(1, 1, 2)

Source:

Disclaimer - a lot of material in this post has been shamelessly copied from the awesome post from Yashuseth’s Blog. Since it was already well written, I did not want to write it again.

Other Sources:
https://towardsdatascience.com/time-series-analysis-in-python-an-introduction-70d5a5b1d52a
The awesome guys at Analytics Vidhya
Jason Brownlee -1
Jason Brownlee -2

Written on February 23, 2018