Creating a Time Series Dataset
Time-series estimators are, by definition, a function of the temporal ordering of the observations in the estimation sample. So a number of programmed time-series econometric routines can only be used if the software is told ahead of time that it is working with a time-series dataset.
Keep in Mind

Date data can be notoriously difficult to work with. Be sure before declaring your data set as a time series that your date variable has been imported properly.
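One way to check that a date column was imported properly is sketched below in Python with pandas. The small inline table, its made-up values, and the column names are assumptions for illustration; they are not the GDP file itself.

```python
import pandas as pd

# Hypothetical raw import: dates arrive as strings, one of them malformed.
# The GDPC1 values are made up for illustration.
raw = pd.DataFrame({
    "DATE": ["1947-01-01", "1947-04-01", "not a date", "1947-10-01"],
    "GDPC1": [1.0, 2.0, 3.0, 4.0],
})

# errors="coerce" turns anything unparseable into NaT instead of raising
parsed = pd.to_datetime(raw["DATE"], format="%Y-%m-%d", errors="coerce")

# Rows whose date failed to parse; inspect these before going further
bad_rows = raw[parsed.isna()]
print(bad_rows)
```

If `bad_rows` is empty, every date parsed under the stated format and the column is likely safe to use as a time index.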

As an example, we will use data on U.S. quarterly real Gross Domestic Product (GDP). To get an Excel spreadsheet holding the GDP data, go to the Saint Louis Federal Reserve Bank FRED website.
Implementations
Python
pandas supports time series data. Here is an example which downloads quarterly data, casts the date column (read in as an object series) as a datetime series, and creates a year-quarter column.
import pandas as pd
# Read in data
gdp = pd.read_csv("https://github.com/LOSTSTATS/loststats.github.io/raw/source/Time_Series/Data/GDPC1.csv")
# Convert date column to be of data type datetime64
gdp['DATE'] = pd.to_datetime(gdp['DATE'])
# Create a column with year-quarter combinations (e.g. '1947-1')
gdp['yrqtr'] = gdp['DATE'].apply(lambda x: str(x.year) + '-' + str(x.quarter))
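pandas has no separate command for declaring a dataset as a time series; the closest equivalent is to make the date the index. The following is a minimal sketch of that idea, using a small inline stand-in with made-up values rather than the downloaded file.

```python
import pandas as pd

# Stand-in for the GDP data: quarter-start dates with made-up values
gdp = pd.DataFrame({
    "DATE": ["1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01"],
    "GDPC1": [1.0, 2.0, 3.0, 4.0],
})
gdp["DATE"] = pd.to_datetime(gdp["DATE"])

# Use the dates as the index, then convert them to quarterly periods
gdp_q = gdp.set_index("DATE").to_period("Q")

# shift(1) moves values down one row; because the quarterly index has
# no gaps here, that is the same as a one-quarter lag
gdp_q["lag1"] = gdp_q["GDPC1"].shift(1)
print(gdp_q)
```

With a quarterly `PeriodIndex` in place, each row is labelled by its quarter (for example `1947Q1`) rather than by a calendar day.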
R
There are many different kinds of time series data set objects in R. Instead of R-based time series objects such as ts, zoo and xts, here we will use tsibble, which preserves time indices as the essential data column and makes heterogeneous data structures possible.
The tsibble package extends the tidyverse to temporal data and is built on top of the tibble, and so is a data- and model-oriented object.
For more detailed information on using tsibble, such as the key and index, check the tsibble page and the Introduction to tsibble.
STEP 1) Load necessary packages
# If necessary
# install.packages(c('tsibble','tidyverse'))
library(tsibble)
library(tidyverse)
STEP 2) Import data into R.
gdp <- read.csv("https://github.com/LOSTSTATS/loststats.github.io/raw/source/Time_Series/Data/GDPC1.csv")
# read.csv() has read in our date variable as a character string (a factor in R versions before 4.0). We need a date!
gdp$DATE <- as.Date(gdp$DATE)
# If it were a little less well-behaved than this, we could use the lubridate package to fix it.
STEP 3) Convert the date variable to a quarterly format
gdp_ts <- as_tsibble(gdp,
                     index = DATE,
                     regular = FALSE) %>%
  index_by(qtr = ~ yearquarter(.))
Applying yearquarter() to the index variable (referred to as .) creates a new variable named qtr with a quarterly interval, which corresponds to the year-quarter of the original variable DATE.
Since the tsibble handles regularly-spaced temporal data whereas our data (GDPC1) has an irregular time interval (since it's not the exact same number of days between quarters every time), we set the option regular = FALSE.
Now, we have a quarterly time-series dataset with the new variable qtr.
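The irregular day spacing behind the regular = FALSE choice can be checked directly. The sketch below uses Python with pandas and a handful of quarter-start dates typed in by hand (an assumption for illustration, not the downloaded file).

```python
import pandas as pd

# Quarter-start dates for 1947 plus the start of 1948
dates = pd.to_datetime(
    ["1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01", "1948-01-01"]
)

# Days between consecutive quarter starts are not constant
gaps_in_days = pd.Series(dates).diff().dt.days.dropna().astype(int).tolist()
print(gaps_in_days)  # [90, 91, 92, 92]

# Expressed as quarterly periods, the same observations are labelled
# by consecutive quarters
quarters = dates.to_period("Q")
print([str(q) for q in quarters])
```

Measured in days the series is irregular, but measured in quarters each observation is exactly one step after the previous one, which is what the quarterly index captures.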
References for more information:
 If you want to learn how to build various types of time-series forecasting models, Forecasting: Principles and Practice provides very useful information on working with time-series data in R.
 If you need more detailed information on tsibble, visit the tsibble page or tsibble on RDRR.io.
 The fable package provides a collection of commonly used univariate and multivariate time-series forecasting models. For more information, visit fable.
Stata
STEP 1) Import Data to Stata
import delimited "https://github.com/LOSTSTATS/loststats.github.io/raw/source/Time_Series/Data/GDPC1.csv", clear
STEP 2) Generate the new date variable
generate date_index = tq(1947q1) + _n-1
The function tq() converts a quarterly date written in the form 1947q1 into Stata's integer representation, the number of quarters since 1960q1 (the starting point of our data is 1947q1).
_n is a Stata system variable that gives the index number of the current row.
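To make the arithmetic behind tq(1947q1) + _n-1 concrete, here is a small Python sketch. The helper name quarters_since_1960q1 is hypothetical; it only mimics Stata's convention that quarterly dates are stored as the number of quarters since 1960q1.

```python
def quarters_since_1960q1(year: int, quarter: int) -> int:
    """Integer that represents a quarterly date, with 1960q1 equal to 0."""
    return (year - 1960) * 4 + (quarter - 1)

# Starting point of the GDP data
start = quarters_since_1960q1(1947, 1)
print(start)  # -52

# Mimic tq(1947q1) + _n-1 for the first four rows (_n runs 1, 2, 3, ...)
date_index = [start + n - 1 for n in range(1, 5)]
print(date_index)  # [-52, -51, -50, -49]
```

Each row's value is one larger than the previous row's, so consecutive observations are treated as consecutive quarters. This only holds if the rows are sorted by date and no quarter is missing.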
STEP 3) Format the new variable as a quarterly date
format date_index %tq
This command formats date_index so that it displays as quarterly dates, which correspond to the observation dates in our original date variable.
STEP 4) Convert a variable into time-series data
tsset date_index
Now, we have a quarterly Stata time-series dataset. Any data you add to this file in the future will be interpreted as time-series data.