# Vector Autoregression (VAR) Models

A vector autoregression (VAR) of order $$p$$, often abbreviated as VAR($$p$$), is the following data-generating process (DGP):

$y_t = \upsilon + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t \, ,$

for $$t = 0, 1, 2, \ldots$$, where $$y_t = (y_{1t}, \ldots, y_{Kt})'$$ is a ($$K \times 1$$) random vector of observed data, the $$A_i$$ are fixed ($$K \times K$$) coefficient matrices, $$\upsilon = (\upsilon_1 , \ldots , \upsilon_K)'$$ is a fixed ($$K \times 1$$) vector of intercept terms, and $$u_t = (u_{1t} , \ldots , u_{Kt})'$$ is a $$K$$-dimensional innovation process with $$E(u_t) = 0$$, $$E(u_t u_t') = \Sigma_u$$, and $$E(u_t u_s') = 0$$ for $$s \neq t$$. Simply put, a VAR($$p$$) is a model of the DGP underlying some random data vector $$y_t$$ for all $$t$$ as a function of $$1, \ldots , p$$ of its own lags, along with identically and independently distributed (iid) innovations.

Any given VAR($$p$$) process has an equivalent VAR(1) representation:

$Y_t = \boldsymbol{\upsilon} + \boldsymbol{A} Y_{t-1} + U_t \, ,$

where

$Y_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,$ $\boldsymbol{\upsilon} = \begin{bmatrix} \upsilon \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, ,$ $A = \begin{bmatrix} A_1 & A_2 & \ldots & A_{p-1} & A_p \\ I_K & 0 & \ldots & 0 & 0 \\ 0 & I_K & & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & I_K & 0 \end{bmatrix} \, ,$

and

$U_t = \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, .$

By the above ubiquitous formulation, any given VAR($$p$$) is stable if $$\text{det}(I_{Kp} - \boldsymbol{A}z) \neq 0$$ for $$|z| \leq 1$$. In other words, if all eigenvalues of $$\boldsymbol{A}$$ live within the complex unit circle, we may express the VAR(1) model as

$Y_t = \boldsymbol{\mu} + \sum_{i=0}^\infty \boldsymbol{A}^i U_{t-i} \, ,$

where $$\boldsymbol{\mu} = E(Y_t) = (I_{Kp} - \boldsymbol{A})^{-1} \boldsymbol{\upsilon}$$, $$\Gamma_Y(h) = \sum_{i=0}^\infty \boldsymbol{A}^{h+i} \Sigma_U (\boldsymbol{A}^i)'$$, and $$\frac{\partial Y_t}{U_{t-i}} = \boldsymbol{A}^i \rightarrow 0$$ as $$i \rightarrow \infty$$. Intuitively, this means that the impulse response of $$Y_t$$ to innovations converges to zero over time. Furthermore, a stable VAR($$p$$) process is stationary – its first and second moments are time invariant.

VAR($$p$$) models may be estimated using a variety of statistical methods, with one of the most popular approaches being multivariate least squares estimation. Suppose we observe a sample time series $$y_1, \ldots, y_T$$, along with $$p$$ presample values for each variable (effectively a combined sample size of $$T+p$$). Define

$Y = (y_1, \ldots, y_T) \, ,$ $B = (\upsilon, A_1, \ldots, A_p) \, ,$ $Z_t = \begin{bmatrix} 1 \\ y_t \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,$ $Z = (Z_0 , \ldots, Z_{T-1}) \, ,$ $U = (u_1, \ldots, u_T) \, ,$ $\boldsymbol{y} = \text{vec}(Y) \,$ $\boldsymbol{\beta} = \text{vec}(B) \,$ $\boldsymbol{b} = \text{vec}(B') \, ,$ $\boldsymbol{u} = \text{vec}(U) \, .$

Using the above notvation, we may express any given VAR($$p$$) model as

$Y = BZ + U \, ,$

or equivalently as

$\text{vec}(Y) = \text{vec}(B Z) + \text{vec}(U) = (Z' \otimes I_K) \text{vec}(B) + \text{vec}(U) \, ,$

or

$\boldsymbol{y} = (Z' \otimes I_K) \boldsymbol{\beta} + \boldsymbol{u} \, ,$

with the covariance matrix of $$\boldsymbol{u}$$ being $$\Sigma_{\boldsymbol{u}} = I_t \otimes \Sigma_u$$.

It can be shown that the least-squares (LS) estimator for the given model is

$\widehat{\boldsymbol{b}} = \text{vec}(\widehat{B}') = (I_K \otimes (Z Z')^{-1} Z) \text{vec}(Y') \, ,$

which is equivalent to separately estimating each of the $$K$$ equations in the standard formulation of a VAR($$p$$) model using OLS.

It can also be shown that if $$y_t$$ is stable with standard white noise disturbances, we can ues the $$t$$-ratios provided by common regression programs in setting up confidence intervals and tests for individual coefficients. These $$t$$-statistics can be obtained by dividing the elements of $$\widehat{B}$$ by square roots of the corresponding diagonal elements of $$(Z Z')^{-1} \otimes \widehat{\Sigma}_u$$.

# Keep in Mind

• VARs are often used for impulse response analysis, which is plagued with a multitude of identification limitations that you can read about in Lütkepohl’s (2005) and Kilian and Lütkepohl’s (2017) textbooks. VARs are reduced-form models – it is necessary to impose structural restrictions to identify the relevant innovations and impulse responses.

# Implementations

## R

Begin by loading relevant packages. dplyr provides us with data manipulation capabilities, lubridate allows us to generate and work with date data, and vars contains VAR-related tools.

if (!require("pacman")) install.packages("pacman")
library(pacman)


Then we create an arbitrary dataset containing two different time series. The actual relationship between these time series is irrelevant for this demonstration – the focus is on estimating VARs.

gdp   <- read.csv("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Time_Series/Data/GDPC1.csv")
data  <- inner_join(gdp, fdefx) %>% # Join the two data sources into a single data frame
mutate(DATE = as.Date(DATE),
GDPC1 = log(GDPC1), # log GDPC1
FDEFX = log(FDEFX)) # log FDEFX


We may use the vars::VARselect function to obtain optimal lag orders under a variety of information criteria. Notice that we are excluding the date vector when inputting the data into VARselect.

lagorders <- VARselect(data[,c("GDPC1","FDEFX")])$selection lagorders  Now we estimate the VAR by defaulting to the Akaike Information Criterium (AIC) optimal lag order. We include an intercept in the model by passing the type = "const" argument inside of VAR. lagorder <- lagorders[1] estim <- VAR(data[,c("GDPC1","FDEFX")], p = lagorder, type = "const")  Print the estimated VAR roots – we must make sure that the VAR is stable (all roots lie within the unit circle). summary(estim)$roots


Regardless of stability issues, we are able to generate the non-cumulative impulse response function of FDEFX responding to an orthogonal shock to GDPC1.

irf <- irf(estim, impulse = "FDEFX", response = "GDPC1")
plot(irf)


Lastly, we may also generate forecast error variance decompositions.

fevd <- fevd(estim)
plot(fevd)