Comments on the Lab

library(tidyverse)
library(ggfortify)
library(astsa)
library(forecast)   # masks gas (astsa) and gglagplot (ggfortify)
library(tseries)
library(gridExtra)  # masks combine (dplyr)
library(MASS)       # masks select (dplyr)

Exercise 1: Data deaths

We’ll work with the deaths series from the MASS package.

data(deaths)
  1. Raw data
  1. Automatic selection

The following questions show that this first model is not satisfactory.
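As an illustration of the automatic selection step, here is a minimal sketch. It assumes that auto.arima from the forecast package is the intended tool (the lab text does not name the function), and the object name fit_auto is hypothetical.

```r
library(forecast)  # auto.arima
library(MASS)      # deaths series

data(deaths)

# Assumption: "automatic selection" means an information-criterion search
# over seasonal ARIMA orders, as performed by auto.arima.
fit_auto = auto.arima(deaths)

print(fit_auto)      # selected orders and estimated coefficients
print(fit_auto$bic)  # BIC of the selected model
```

Inspecting the ACF of residuals(fit_auto) is one way to check whether the selected model leaves some structure unexplained.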

  1. Selection by hand (algorithm of slide 120)

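The following code generates 4 PDF files. Each file shows the differenced series and its ACF for four consecutive differencing lags, so that the 4 files cover lags 1 to 16. To run it, uncomment it and set path_fig.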

# path_fig = "yourpath"
#
# for (k in 1:4) {
#   plots = list()
#   for (d in 1:4) {
#     lag_kd = d + 4 * (k - 1)               # differencing lag, from 1 to 16
#     x_diff = diff(deaths, lag = lag_kd)
#     plots[[2 * (d - 1) + 1]] = autoplot(x_diff, main = "", xlab = paste("lag", lag_kd), ylab = "")
#     plots[[2 * (d - 1) + 2]] = ggAcf(x_diff, main = "", xlab = paste("lag", lag_kd), ylab = "")
#   }
#   g <- arrangeGrob(grobs = plots, ncol = 2)  # series on the left, ACF on the right
#   filename = paste0("serieslag", 4 * (k - 1) + 1, "_", 4 * k, ".pdf")
#   ggsave(filename, g, path = path_fig, width = 11, height = 8.5)
# }
  1. Diagnostic tests on the residuals
  1. Via the stl function.
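The last two steps can be sketched as follows. This is only an illustration: the SARIMA orders below are a hypothetical candidate, not the model the lab arrives at, and the choice of the Ljung-Box and Shapiro-Wilk tests is an assumption about which diagnostics are meant.

```r
library(MASS)  # deaths series

data(deaths)

# Hypothetical candidate model, for illustration only: SARIMA(0,0,0)(0,1,1) with period 12.
fit = arima(deaths, order = c(0, 0, 0),
            seasonal = list(order = c(0, 1, 1), period = 12))
res = residuals(fit)

# Ljung-Box test: H0 is "no autocorrelation in the residuals up to lag 12".
# fitdf = 1 because one ARMA coefficient (sma1) is estimated.
lb = Box.test(res, lag = 12, type = "Ljung-Box", fitdf = 1)
print(lb)

# Shapiro-Wilk test: H0 is "the residuals are Gaussian".
sw = shapiro.test(res)
print(sw)

# stl decomposition into seasonal, trend and remainder components.
dec = stl(deaths, s.window = "periodic")
print(head(dec$time.series))
```

Small p-values in the Ljung-Box test would suggest that the residuals are still correlated and that the model should be revised.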

Exercise 2: Data Nottingham

We’ll work with the nottem series from the datasets package.

  1. Construct the following series (use the window function)
  1. The nottem_train series
  1. Build a model on the nottem_train series with the stl function.
  1. Build a SARIMA model on the nottem_train series, predict, and compute the MSE and the BIC.

  2. Compare the two models in terms of BIC and MSE. Conclude.
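The workflow of this exercise can be sketched as below. Several choices are assumptions made for illustration: the split years, the SARIMA orders, and the use of forecast's default forecasting method for stl objects. Since stl is a loess-based decomposition rather than a likelihood fit, this sketch reports a BIC for the SARIMA model only. How a BIC is attached to the stl-based model depends on the forecasting method used with it, and is left out here.

```r
library(forecast)  # forecast() method for stl objects

# Assumption: illustrative split, 1920-1936 for training and 1937-1939 for testing.
nottem_train = window(nottem, end = c(1936, 12))
nottem_test  = window(nottem, start = c(1937, 1))
h = length(nottem_test)

# Model 1: stl decomposition, forecast with the default method for stl objects.
fit_stl = stl(nottem_train, s.window = "periodic")
fc_stl  = forecast(fit_stl, h = h)
mse_stl = mean((nottem_test - fc_stl$mean)^2)

# Model 2: hypothetical SARIMA(0,0,1)(0,1,1) with period 12 (orders for illustration only).
fit_sarima = arima(nottem_train, order = c(0, 0, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
fc_sarima  = predict(fit_sarima, n.ahead = h)$pred
mse_sarima = mean((nottem_test - fc_sarima)^2)

print(c(stl = mse_stl, sarima = mse_sarima))  # out-of-sample MSE of both models
print(BIC(fit_sarima))                        # in-sample BIC of the SARIMA model
```

The MSE is computed on the test window only, so it measures predictive accuracy, whereas the BIC is computed on the training window and penalises the number of parameters.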

Challenge: walkers series

You’ll find in the folder “series” 20 series of acceleration measurements for 10 walkers, recorded with an Android smartphone placed in the chest pocket. All participants walked along a predefined path. More details can be found at https://archive.ics.uci.edu/ml/datasets/User+Identification+From+Walking+Activity (caution: I have sampled and preprocessed the original data).

The measurements were recorded every 0.03 s.

For walkers 1 and 2, you’ll find

For walkers 4,6,7,8,9,10,11,12, you’ll find

Instructions for the challenge

Your task is to assign each series of the test set to a walker of the training set. To that end, you’ll need to fit a SARIMA model to each series and measure the distance between two series as the distance between the estimated parameters of their fitted SARIMA models.
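The idea of a distance between estimated parameters can be sketched as follows. Everything here is illustrative: the walker files are replaced by simulated series, a non-seasonal AR(2) model stands in for the SARIMA models, and the Euclidean distance is only one possible choice. All names (ar_true, fit_coef, dist_mat, assignment) are hypothetical.

```r
set.seed(1)

# Hypothetical stand-ins for the walker series: two simulated AR(2) processes,
# each observed once in the "train" set and once in the "test" set.
ar_true = list(a = c(0.6, -0.3), b = c(-0.5, 0.2))
train = lapply(ar_true, function(phi) arima.sim(list(ar = phi), n = 500))
test  = lapply(ar_true, function(phi) arima.sim(list(ar = phi), n = 500))

# Fit the same model order to every series so that the coefficient
# vectors have the same length and meaning, and can be compared.
fit_coef = function(x) coef(arima(x, order = c(2, 0, 0), include.mean = FALSE))

coef_train = t(sapply(train, fit_coef))  # one row per train series
coef_test  = t(sapply(test, fit_coef))   # one row per test series

# Euclidean distance between each test vector and each train vector.
dist_mat = matrix(NA_real_, nrow(coef_test), nrow(coef_train),
                  dimnames = list(test = rownames(coef_test),
                                  train = rownames(coef_train)))
for (i in seq_len(nrow(coef_test))) {
  for (j in seq_len(nrow(coef_train))) {
    dist_mat[i, j] = sqrt(sum((coef_test[i, ] - coef_train[j, ])^2))
  }
}

# Assign each test series to the closest train series.
assignment = colnames(dist_mat)[apply(dist_mat, 1, which.min)]
names(assignment) = rownames(dist_mat)
print(round(dist_mat, 3))
print(assignment)
```

With real data, the same structure would apply with a seasonal argument in arima. Note that this nearest-neighbour rule can assign the same walker to several test series, so a one-to-one matching may need an extra step.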

You have to send me your code as a .pdf or .html file (the result of clicking the Knit button), not as a .Rmd file. The last chunk of your code has to display your assignment results as follows

walkers = c(1,2,4,6,7,8,9,10,11,12)
aleatoire = c( 1, 2 , 146 , 141 , 124 , 53 , 18 , 48  , 77 , 45) ## here I chose randomly....
myresults = matrix(c(walkers,aleatoire),ncol=2)
colnames(myresults) = c("train","test")
print(myresults)
##       train test
##  [1,]     1    1
##  [2,]     2    2
##  [3,]     4  146
##  [4,]     6  141
##  [5,]     7  124
##  [6,]     8   53
##  [7,]     9   18
##  [8,]    10   48
##  [9,]    11   77
## [10,]    12   45

Here is some code that can help you find the frequency of a series and set it.

# series1 = ts(walker_records,start=1,deltat = 0.03)
# start(series1)
# end(series1)
# frequency(series1)
# deltat(series1)
# 
# freq = findfrequency(series1)
# 
# series1_train = ts(walker_records,start=1, frequency = freq)