Vector Autoregression (VAR) Models

A vector autoregression (VAR) of order \(p\), often abbreviated as VAR(\(p\)), is the following data-generating process (DGP):

\[y_t = \upsilon + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t \, ,\]

for \(t = 0, 1, 2, \ldots\), where \(y_t = (y_{1t}, \ldots, y_{Kt})'\) is a (\(K \times 1\)) random vector of observed data, the \(A_i\) are fixed (\(K \times K\)) coefficient matrices, \(\upsilon = (\upsilon_1 , \ldots , \upsilon_K)'\) is a fixed (\(K \times 1\)) vector of intercept terms, and \(u_t = (u_{1t} , \ldots , u_{Kt})'\) is a \(K\)-dimensional innovation process with \(E(u_t) = 0\), \(E(u_t u_t') = \Sigma_u\), and \(E(u_t u_s') = 0\) for \(s \neq t\). Simply put, a VAR(\(p\)) is a model of the DGP underlying some random data vector \(y_t\) for all \(t\) as a function of \(1, \ldots , p\) of its own lags, along with identically and independently distributed (iid) innovations.

Any given VAR(\(p\)) process has an equivalent VAR(1) representation:

\[Y_t = \boldsymbol{\upsilon} + \boldsymbol{A} Y_{t-1} + U_t \, ,\]

where

\[Y_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,\] \[\boldsymbol{\upsilon} = \begin{bmatrix} \upsilon \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, ,\] \[A = \begin{bmatrix} A_1 & A_2 & \ldots & A_{p-1} & A_p \\ I_K & 0 & \ldots & 0 & 0 \\ 0 & I_K & & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & I_K & 0 \end{bmatrix} \, ,\]

and

\[U_t = \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, .\]

By the above ubiquitous formulation, any given VAR(\(p\)) is stable if \(\text{det}(I_{Kp} - \boldsymbol{A}z) \neq 0\) for \(|z| \leq 1\). In other words, if all eigenvalues of \(\boldsymbol{A}\) live within the complex unit circle, we may express the VAR(1) model as

\[Y_t = \boldsymbol{\mu} + \sum_{i=0}^\infty \boldsymbol{A}^i U_{t-i} \, ,\]

where \(\boldsymbol{\mu} = E(Y_t) = (I_{Kp} - \boldsymbol{A})^{-1} \boldsymbol{\upsilon}\), \(\Gamma_Y(h) = \sum_{i=0}^\infty \boldsymbol{A}^{h+i} \Sigma_U (\boldsymbol{A}^i)'\), and \(\frac{\partial Y_t}{U_{t-i}} = \boldsymbol{A}^i \rightarrow 0\) as \(i \rightarrow \infty\). Intuitively, this means that the impulse response of \(Y_t\) to innovations converges to zero over time. Furthermore, a stable VAR(\(p\)) process is stationary – its first and second moments are time invariant.

VAR(\(p\)) models may be estimated using a variety of statistical methods, with one of the most popular approaches being multivariate least squares estimation. Suppose we observe a sample time series \(y_1, \ldots, y_T\), along with \(p\) presample values for each variable (effectively a combined sample size of \(T+p\)). Define

\[Y = (y_1, \ldots, y_T) \, ,\] \[B = (\upsilon, A_1, \ldots, A_p) \, ,\] \[Z_t = \begin{bmatrix} 1 \\ y_t \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,\] \[Z = (Z_0 , \ldots, Z_{T-1}) \, ,\] \[U = (u_1, \ldots, u_T) \, ,\] \[\boldsymbol{y} = \text{vec}(Y) \,\] \[\boldsymbol{\beta} = \text{vec}(B) \,\] \[\boldsymbol{b} = \text{vec}(B') \, ,\] \[\boldsymbol{u} = \text{vec}(U) \, .\]

Using the above notvation, we may express any given VAR(\(p\)) model as

\[Y = BZ + U \, ,\]

or equivalently as

\[\text{vec}(Y) = \text{vec}(B Z) + \text{vec}(U) = (Z' \otimes I_K) \text{vec}(B) + \text{vec}(U) \, ,\]

\[\boldsymbol{y} = (Z' \otimes I_K) \boldsymbol{\beta} + \boldsymbol{u} \, ,\]

with the covariance matrix of \(\boldsymbol{u}\) being \(\Sigma_{\boldsymbol{u}} = I_t \otimes \Sigma_u\).

It can be shown that the least-squares (LS) estimator for the given model is

\[\widehat{\boldsymbol{b}} = \text{vec}(\widehat{B}') = (I_K \otimes (Z Z')^{-1} Z) \text{vec}(Y') \, ,\]

which is equivalent to separately estimating each of the \(K\) equations in the standard formulation of a VAR(\(p\)) model using OLS.

It can also be shown that if \(y_t\) is stable with standard white noise disturbances, we can ues the \(t\)-ratios provided by common regression programs in setting up confidence intervals and tests for individual coefficients. These \(t\)-statistics can be obtained by dividing the elements of \(\widehat{B}\) by square roots of the corresponding diagonal elements of \((Z Z')^{-1} \otimes \widehat{\Sigma}_u\).

Keep in Mind

VARs are often used for impulse response analysis, which is plagued with a multitude of identification limitations that you can read about in Lütkepohl’s (2005) and Kilian and Lütkepohl’s (2017) textbooks. VARs are reduced-form models – it is necessary to impose structural restrictions to identify the relevant innovations and impulse responses.

Implementations

R

Begin by loading relevant packages. dplyr provides us with data manipulation capabilities, lubridate allows us to generate and work with date data, and vars contains VAR-related tools.

if (!require("pacman")) install.packages("pacman")
library(pacman)
p_load(dplyr, lubridate, vars)

Then we create an arbitrary dataset containing two different time series. The actual relationship between these time series is irrelevant for this demonstration – the focus is on estimating VARs.

gdp   <- read.csv("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Time_Series/Data/GDPC1.csv")
fdefx <- read.csv("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Time_Series/Data/FDEFX.csv")
data  <- inner_join(gdp, fdefx) %>% # Join the two data sources into a single data frame
          mutate(DATE = as.Date(DATE), 
                 GDPC1 = log(GDPC1), # log GDPC1
                 FDEFX = log(FDEFX)) # log FDEFX

We may use the vars::VARselect function to obtain optimal lag orders under a variety of information criteria. Notice that we are excluding the date vector when inputting the data into VARselect.

lagorders <- VARselect(data[,c("GDPC1","FDEFX")])$selection
lagorders

Now we estimate the VAR by defaulting to the Akaike Information Criterium (AIC) optimal lag order. We include an intercept in the model by passing the type = "const" argument inside of VAR.

lagorder <- lagorders[1]
estim <- VAR(data[,c("GDPC1","FDEFX")], p = lagorder, type = "const")

Print the estimated VAR roots – we must make sure that the VAR is stable (all roots lie within the unit circle).

summary(estim)$roots

Regardless of stability issues, we are able to generate the non-cumulative impulse response function of FDEFX responding to an orthogonal shock to GDPC1.

irf <- irf(estim, impulse = "FDEFX", response = "GDPC1")
plot(irf)

Lastly, we may also generate forecast error variance decompositions.

fevd <- fevd(estim)
plot(fevd)