deaths
We’ll work with the deaths series of package MASS.
data(deaths)
Give some basic statistics on the series.
Visualize the data, its ACF and PACF.
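A minimal sketch of these first steps, assuming the MASS, forecast and ggplot2 packages (from the package list at the end of this lab) are used:

library(MASS)       # deaths series
library(forecast)   # autoplot, ggAcf, ggPacf, auto.arima
library(ggplot2)

data(deaths)
summary(deaths)                          # basic statistics
start(deaths); end(deaths); frequency(deaths)

autoplot(deaths)                         # raw series
ggAcf(deaths)                            # autocorrelation function
ggPacf(deaths)                           # partial autocorrelation function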
Is the series stationary in terms of trend, variance and seasonality? Explain how you reach these conclusions.
Do an automatic model selection via the auto.arima function and call this model model_auto.
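For example, with the forecast package:

model_auto <- auto.arima(deaths)   # automatic order selection (stepwise search by default)
summary(model_auto)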
What is surprising about the orders chosen?
We’ll show in the next questions that this first model is not satisfactory.
The following code generates 4 files with plots of the differenced series and its ACF, for differencing lags between 1 and 16 (uncomment it).
# path_fig = "yourpath"
#
# for (k in 1:4) {
#   plots = list()
#   for (d in 1:4) {
#     plots[[2*(d-1) + 1]] = autoplot(diff(deaths, d + 4*(k-1)), main = "", xlab = paste("lag", d + 4*(k-1)), ylab = "")
#     plots[[2*(d-1) + 2]] = ggAcf(diff(deaths, d + 4*(k-1)), main = "", xlab = paste("lag", d + 4*(k-1)), ylab = "")
#   }
#   g <- arrangeGrob(grobs = plots, ncol = 2)   # generates g
#   filename = paste(c("serieslag", 4*(k-1) + 1, "_", 4*k, ".pdf"), collapse = "")
#   ggsave(filename, g, path = path_fig, width = 11, height = 8.5)
# }
In the SARIMA model ARIMA\((p,d,q) \times (P,D,Q)_s\), which parameters are now fixed and what are their values?
We want to fit a SARIMA model to the seasonally differenced series. Execute the following code: should there be a difference between the two calls to the auto.arima function?
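The exercise’s own code chunk is not reproduced in this excerpt; purely as an illustration of the kind of contrast being asked about (the lag-12 difference and the object names model_D1 and model_D1_bis are my assumptions), two such calls could be:

deaths_D1 <- diff(deaths, lag = 12)            # seasonally differenced series (lag 12 = one year)

model_D1     <- auto.arima(deaths_D1)          # fit on the manually differenced series
model_D1_bis <- auto.arima(deaths, D = 1)      # ask auto.arima to apply the seasonal difference itself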
Which model do you choose? Why?
Difference the previous series (\(d=1\)) and fit a SARIMA model, called model_D1d1. Did you improve on the previous model?
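A sketch, reusing the deaths_D1 object from the illustration above (the intermediate object name is mine, not the exercise’s):

deaths_D1d1 <- diff(deaths_D1)           # additional non-seasonal difference (d = 1)
model_D1d1  <- auto.arima(deaths_D1d1)
summary(model_D1d1)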
Compare the models model_auto, model_D1 and model_D1d1.
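For instance, assuming the three fitted objects are in the workspace, one can look at their information criteria and residual diagnostics:

# Caution: information criteria are only directly comparable for models
# fitted to the same (identically differenced) series.
rbind(model_auto = c(aic = model_auto$aic, bic = model_auto$bic),
      model_D1   = c(aic = model_D1$aic,   bic = model_D1$bic),
      model_D1d1 = c(aic = model_D1d1$aic, bic = model_D1d1$bic))

checkresiduals(model_auto)   # Ljung-Box test and residual plots (forecast package)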
stl function
Apply the stl function to the deaths series and get the remainder. Fit a SARIMA model to the remainder with auto.arima. Why does the model chosen by the automatic selection show that the stl function did not decompose the deaths series well?
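A minimal sketch; the choice s.window = "periodic" is an assumption, the exercise does not fix it:

dec_deaths <- stl(deaths, s.window = "periodic")
deaths_rem <- dec_deaths$time.series[, "remainder"]   # remainder component of the decomposition

model_rem <- auto.arima(deaths_rem)
model_rem   # non-zero seasonal orders here would mean seasonality survived the decomposition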
We’ll work with the nottem series.
Create (with the window function) a train series called nottem_train by removing the last three years of observations, and a test series nottem_test with the last three years of observations.
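Since the forecasts asked for later cover January 1937 to December 1939 and nottem ends in December 1939, a possible split is:

data(nottem)   # monthly air temperatures at Nottingham, 1920-1939 (datasets package)

nottem_train <- window(nottem, end   = c(1936, 12))   # training data: 1920-1936
nottem_test  <- window(nottem, start = c(1937, 1))    # test data: the last three years, 1937-1939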
Decompose the nottem_train series with the stl function. Apply the stl decomposition to the nottem_train series and create a new series nottem_train_seasonrem by removing the seasonal part of the decomposition.
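A sketch, again with the assumption s.window = "periodic":

dec_train <- stl(nottem_train, s.window = "periodic")
season    <- dec_train$time.series[, "seasonal"]       # estimated seasonal component

nottem_train_seasonrem <- nottem_train - season        # deseasonalised training series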
Visualize the new series, its ACF and PACF. Which ARIMA model(s) would you suggest?
Verify whether the stl function did a good job of removing the seasonality.
Fit an ARIMA model (called model_stl_seasonrem) to the nottem_train_seasonrem series and verify whether the model(s) you suggested earlier are competitive.
Predict from model_stl_seasonrem the future values from January 1937 to December 1939. Caution: do not forget the seasonality that you removed earlier.
Compare the predictions to the actual values (in the nottem_test series) by calculating the mean squared error.
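One way to chain these steps (a sketch, reusing the season object defined above; re-seasonalising the forecasts by recycling the last estimated seasonal year is my choice, but with s.window = "periodic" that year is the same every year):

model_stl_seasonrem <- auto.arima(nottem_train_seasonrem)   # one possible fit for the previous question

fc <- forecast(model_stl_seasonrem, h = 36)                 # Jan 1937 to Dec 1939

# Re-add the seasonality removed by stl (last estimated seasonal year, recycled 3 times)
season_fc <- ts(rep(tail(season, 12), 3), start = c(1937, 1), frequency = 12)
pred      <- fc$mean + season_fc

mse_stl <- mean((pred - nottem_test)^2)
mse_stl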
How many parameters does this model have? Deduce its BIC.
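As a reminder, if \(\hat L\) is the maximised likelihood, \(k\) the number of estimated parameters and \(n\) the number of observations, then \( \mathrm{BIC} = -2 \log \hat L + k \log n \).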
Build a SARIMA model on the nottem_train series, predict, and compute the MSE and the BIC.
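A sketch of the corresponding steps for the seasonal model (object names are mine):

model_sarima <- auto.arima(nottem_train)              # SARIMA orders chosen automatically
fc_sarima    <- forecast(model_sarima, h = 36)

mse_sarima <- mean((fc_sarima$mean - nottem_test)^2)
mse_sarima
model_sarima$bic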
Compare the two models in terms of BIC and MSE. Conclude.
You’ll find in the folder “series” 20 series corresponding to measures of acceleration for 10 walkers with an Android smartphone positioned in the chest pocket. All participants walked over a predefined path (more details can be found at https://archive.ics.uci.edu/ml/datasets/User+Identification+From+Walking+Activity; caution: I’ve sampled and preprocessed the original data).
The records were taken every 0.03 sec.
For walkers 1 and 2, you’ll find
a train series (named “series_train1.csv” and “series_train2.csv”), corresponding to 30 sec of acceleration records, and
a test series (named “series_test1.csv” and “series_test2.csv”), corresponding to the next 30 sec of acceleration records.
For walkers 4, 6, 7, 8, 9, 10, 11, 12, you’ll find
a train series (named “series_train4.csv”, “series_train6.csv”, etc.), corresponding to 30 sec of acceleration records, and
a test series corresponding to the next 30 sec of acceleration records. I’ve randomly assigned a number to each of these walkers, so you’ll find files named “series_test18.csv”, “series_test45.csv”, etc.
Your task is to assign each series of the test set to a walker of the train set. To that end, you’ll need to construct SARIMA models for each series and measure the distance between two series as the distance between the estimated parameters of the fitted SARIMA models.
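A rough sketch of one possible way to organise this; the file-reading convention, the fixed common order and the Euclidean distance between coefficient vectors are all assumptions of mine, not part of the assignment statement:

library(forecast)

# Read one walker file and turn it into a ts with an estimated frequency
# (assumes the acceleration values are in the first column of the csv)
read_walk <- function(file) {
  x    <- read.csv(file)[[1]]
  freq <- findfrequency(ts(x, start = 1, deltat = 0.03))
  ts(x, start = 1, frequency = freq)
}

# Fit a SARIMA model with a fixed common order so that all coefficient
# vectors have the same length and can be compared directly
fit_common <- function(series, order = c(2, 0, 1), seasonal = c(1, 0, 1)) {
  Arima(series, order = order, seasonal = seasonal)
}

# Distance between two series = Euclidean distance between estimated parameters
param_dist <- function(fit_a, fit_b) {
  sqrt(sum((coef(fit_a) - coef(fit_b))^2))
}

Each test series can then be assigned to the training walker whose fitted coefficients are closest under param_dist.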
You have to send me your code in a .pdf or html file (which results from clicking the Knit button), not a .Rmd file. The last chunk of your code has to display your assignment results as follows:
walkers = c(1,2,4,6,7,8,9,10,11,12)
aleatoire = c( 1, 2 , 146 , 141 , 124 , 53 , 18 , 48 , 77 , 45) ## here I chose randomly....
myresults = matrix(c(walkers,aleatoire),ncol=2)
colnames(myresults) = c("train","test")
print(myresults)
## train test
## [1,] 1 1
## [2,] 2 2
## [3,] 4 146
## [4,] 6 141
## [5,] 7 124
## [6,] 8 53
## [7,] 9 18
## [8,] 10 48
## [9,] 11 77
## [10,] 12 45
Here is some code that can help you find the frequencies and set them.
# series1 = ts(walker_records, start = 1, deltat = 0.03)   # raw records: one observation every 0.03 sec
# start(series1)
# end(series1)
# frequency(series1)
# deltat(series1)
#
# freq = findfrequency(series1)   # estimate the dominant period (forecast package)
#
# series1_train = ts(walker_records, start = 1, frequency = freq)   # same data, with the estimated frequency
Comments on the Lab
deaths
Nottingham
You have to send me your lab results (Exercises 1 and 2) at the end of this lab at agathe.guilloux@math.cnrs.fr.
Name the files you send me with your name (“yourname.pdf”).
The challenge has to be sent before the 1st of April.
Caution: I’ll only accept a .pdf or html file (which results from clicking the Knit button), not a .Rmd file.
You’ll need the following R packages: