
Vector Autoregression (VAR) Models

A vector autoregression (VAR) of order \(p\), often abbreviated as VAR(\(p\)), is the following data-generating process (DGP):

\[y_t = \upsilon + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t \, ,\]

for \(t = 0, 1, 2, \ldots\), where \(y_t = (y_{1t}, \ldots, y_{Kt})'\) is a (\(K \times 1\)) random vector of observed data, the \(A_i\) are fixed (\(K \times K\)) coefficient matrices, \(\upsilon = (\upsilon_1 , \ldots , \upsilon_K)'\) is a fixed (\(K \times 1\)) vector of intercept terms, and \(u_t = (u_{1t} , \ldots , u_{Kt})'\) is a \(K\)-dimensional innovation process with \(E(u_t) = 0\), \(E(u_t u_t') = \Sigma_u\), and \(E(u_t u_s') = 0\) for \(s \neq t\). Simply put, a VAR(\(p\)) models the DGP underlying some random data vector \(y_t\) as a linear function of its own first \(p\) lags plus a white-noise innovation term.
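To make the DGP concrete, here is a minimal base-R sketch that simulates a bivariate VAR(1); the intercept, coefficient matrix, and innovation scale are arbitrary illustrative choices, not taken from any dataset.

```r
set.seed(42)
K     <- 2                                  # number of variables
T_obs <- 200                                # sample length
v  <- c(0.5, 0.2)                           # intercept vector (upsilon)
A1 <- matrix(c(0.5, 0.1, 0.2, 0.4), K, K)   # coefficient matrix A_1
y  <- matrix(0, K, T_obs)                   # columns are the y_t vectors
for (t in 2:T_obs) {
  u_t <- rnorm(K, sd = 0.1)                 # iid Gaussian innovations
  y[, t] <- v + A1 %*% y[, t - 1] + u_t
}
```

Because the eigenvalues of this \(A_1\) have modulus below one, the simulated series fluctuates around its unconditional mean \((I_K - A_1)^{-1} \upsilon\).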

Any given VAR(\(p\)) process has an equivalent VAR(1) representation:

\[Y_t = \boldsymbol{\upsilon} + \boldsymbol{A} Y_{t-1} + U_t \, ,\]


where

\[Y_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,\] \[\boldsymbol{\upsilon} = \begin{bmatrix} \upsilon \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, ,\] \[\boldsymbol{A} = \begin{bmatrix} A_1 & A_2 & \ldots & A_{p-1} & A_p \\ I_K & 0 & \ldots & 0 & 0 \\ 0 & I_K & & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & I_K & 0 \end{bmatrix} \, ,\]


\[U_t = \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \, .\]

Under this companion-form representation, any given VAR(\(p\)) is stable if \(\text{det}(I_{Kp} - \boldsymbol{A}z) \neq 0\) for \(|z| \leq 1\). In other words, if all eigenvalues of \(\boldsymbol{A}\) lie strictly inside the complex unit circle, we may express the VAR(1) model as

\[Y_t = \boldsymbol{\mu} + \sum_{i=0}^\infty \boldsymbol{A}^i U_{t-i} \, ,\]

where \(\boldsymbol{\mu} = E(Y_t) = (I_{Kp} - \boldsymbol{A})^{-1} \boldsymbol{\upsilon}\), \(\Gamma_Y(h) = \sum_{i=0}^\infty \boldsymbol{A}^{h+i} \Sigma_U (\boldsymbol{A}^i)'\), and \(\frac{\partial Y_t}{\partial U_{t-i}} = \boldsymbol{A}^i \rightarrow 0\) as \(i \rightarrow \infty\). Intuitively, this means that the impulse response of \(Y_t\) to innovations converges to zero over time. Furthermore, a stable VAR(\(p\)) process is stationary – its first and second moments are time invariant.
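As a sketch of the stability condition, the base-R snippet below builds the companion matrix \(\boldsymbol{A}\) for a hypothetical VAR(2), using illustrative coefficient matrices, and checks that every eigenvalue has modulus below one.

```r
K  <- 2
A1 <- matrix(c(0.5, 0.1, 0.2, 0.4), K, K)  # illustrative A_1
A2 <- matrix(c(0.1, 0.0, 0.0, 0.1), K, K)  # illustrative A_2
# Companion form: top block row holds (A_1, A_2); the I_K block shifts lags
A_comp <- rbind(cbind(A1, A2),
                cbind(diag(K), matrix(0, K, K)))
moduli <- Mod(eigen(A_comp)$values)
stable <- all(moduli < 1)  # TRUE: this VAR(2) is stable
```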

VAR(\(p\)) models may be estimated using a variety of statistical methods, with one of the most popular approaches being multivariate least squares estimation. Suppose we observe a sample time series \(y_1, \ldots, y_T\), along with \(p\) presample values for each variable (effectively a combined sample size of \(T+p\)). Define

\[Y = (y_1, \ldots, y_T) \, ,\] \[B = (\upsilon, A_1, \ldots, A_p) \, ,\] \[Z_t = \begin{bmatrix} 1 \\ y_t \\ \vdots \\ y_{t-p+1} \end{bmatrix} \, ,\] \[Z = (Z_0 , \ldots, Z_{T-1}) \, ,\] \[U = (u_1, \ldots, u_T) \, ,\] \[\boldsymbol{y} = \text{vec}(Y) \, ,\] \[\boldsymbol{\beta} = \text{vec}(B) \, ,\] \[\boldsymbol{b} = \text{vec}(B') \, ,\] \[\boldsymbol{u} = \text{vec}(U) \, .\]

Using the above notation, we may express any given VAR(\(p\)) model as

\[Y = BZ + U \, ,\]

or equivalently as

\[\text{vec}(Y) = \text{vec}(B Z) + \text{vec}(U) = (Z' \otimes I_K) \text{vec}(B) + \text{vec}(U) \, ,\]


\[\boldsymbol{y} = (Z' \otimes I_K) \boldsymbol{\beta} + \boldsymbol{u} \, ,\]

with the covariance matrix of \(\boldsymbol{u}\) being \(\Sigma_{\boldsymbol{u}} = I_T \otimes \Sigma_u\).

It can be shown that the least-squares (LS) estimator for the given model is

\[\widehat{\boldsymbol{b}} = \text{vec}(\widehat{B}') = (I_K \otimes (Z Z')^{-1} Z) \text{vec}(Y') \, ,\]

which is equivalent to separately estimating each of the \(K\) equations in the standard formulation of a VAR(\(p\)) model using OLS.
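To illustrate this equivalence, the sketch below estimates \(B\) by multivariate LS on simulated VAR(1) data and checks that its first row matches single-equation OLS via lm; the coefficients and sample size are illustrative.

```r
set.seed(1)
K <- 2; p <- 1; T_obs <- 500
A1 <- matrix(c(0.5, 0.1, 0.2, 0.4), K, K)
y  <- matrix(0, K, T_obs + p)               # includes p presample values
for (t in (p + 1):(T_obs + p)) y[, t] <- A1 %*% y[, t - 1] + rnorm(K)
Y <- y[, (p + 1):(T_obs + p)]               # K x T matrix of observations
Z <- rbind(1, y[, p:(T_obs + p - 1)])       # regressors: intercept and lag
B_hat <- Y %*% t(Z) %*% solve(Z %*% t(Z))   # multivariate LS: (v, A_1)
# Single-equation OLS for the first variable recovers the first row of B_hat
fit1 <- lm(Y[1, ] ~ t(y[, p:(T_obs + p - 1)]))
all.equal(unname(coef(fit1)), unname(B_hat[1, ]), tolerance = 1e-6)  # TRUE
```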

It can also be shown that if \(y_t\) is stable with standard white noise disturbances, we can use the \(t\)-ratios provided by common regression programs to set up confidence intervals and tests for individual coefficients. These \(t\)-statistics can be obtained by dividing the elements of \(\widehat{B}\) by the square roots of the corresponding diagonal elements of \((Z Z')^{-1} \otimes \widehat{\Sigma}_u\).
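A sketch of these \(t\)-ratios on simulated data follows; the DGP is re-generated here so the block runs standalone, and the degrees-of-freedom adjustment in \(\widehat{\Sigma}_u\) is one common convention, not the only one.

```r
set.seed(1)
K <- 2; p <- 1; T_obs <- 500
A1 <- matrix(c(0.5, 0.1, 0.2, 0.4), K, K)
y  <- matrix(0, K, T_obs + p)
for (t in (p + 1):(T_obs + p)) y[, t] <- A1 %*% y[, t - 1] + rnorm(K)
Y <- y[, (p + 1):(T_obs + p)]
Z <- rbind(1, y[, p:(T_obs + p - 1)])
B_hat <- Y %*% t(Z) %*% solve(Z %*% t(Z))
U_hat <- Y - B_hat %*% Z                               # residual matrix
Sigma_hat <- U_hat %*% t(U_hat) / (T_obs - K * p - 1)  # dof-adjusted Sigma_u
# Var(vec(B_hat)) = (ZZ')^{-1} (x) Sigma_u; take sqrt of its diagonal
se <- sqrt(diag(kronecker(solve(Z %*% t(Z)), Sigma_hat)))
t_ratios <- as.vector(B_hat) / se                      # column-stacked order
```

Note that `as.vector(B_hat)` stacks \(\widehat{B}\) column-wise, which matches the ordering of the diagonal of \((ZZ')^{-1} \otimes \widehat{\Sigma}_u\).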

Keep in Mind

  • VARs are often used for impulse response analysis, which comes with a host of identification limitations that you can read about in Lütkepohl’s (2005) and Kilian and Lütkepohl’s (2017) textbooks. VARs are reduced-form models – structural restrictions must be imposed to identify the relevant innovations and impulse responses.



Begin by loading relevant packages. dplyr provides us with data manipulation capabilities, lubridate allows us to generate and work with date data, and vars contains VAR-related tools.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, lubridate, vars)

Then we load a dataset containing two different time series. The actual relationship between these time series is irrelevant for this demonstration – the focus is on estimating VARs.

gdp   <- read.csv("")
fdefx <- read.csv("")
data  <- inner_join(gdp, fdefx) %>% # Join the two data sources into a single data frame
          mutate(DATE = as.Date(DATE), 
                 GDPC1 = log(GDPC1), # log GDPC1
                 FDEFX = log(FDEFX)) # log FDEFX

We may use the vars::VARselect function to obtain optimal lag orders under a variety of information criteria. Notice that we are excluding the date vector when inputting the data into VARselect.

lagorders <- VARselect(data[,c("GDPC1","FDEFX")])$selection

Now we estimate the VAR by defaulting to the Akaike Information Criterion (AIC) optimal lag order. We include an intercept in the model by passing the type = "const" argument to VAR.

lagorder <- lagorders[1]
estim <- VAR(data[,c("GDPC1","FDEFX")], p = lagorder, type = "const")

Print the estimated VAR roots – we must make sure that the VAR is stable (all roots lie within the unit circle). The vars::roots function returns the moduli of the companion-matrix eigenvalues.

roots(estim)
Regardless of any stability issues, we are able to generate the non-cumulative impulse response function of FDEFX responding to an orthogonal shock to GDPC1. In vars::irf, the impulse argument names the shocked variable and response names the variable whose reaction we trace.

irfs <- irf(estim, impulse = "GDPC1", response = "FDEFX")

Lastly, we may also generate forecast error variance decompositions.

fevd <- fevd(estim)