1 Abstract

In phonetics many data sets are encountered which deal with dynamic data collected over time. Examples include dynamic formant trajectories, or tongue position trajectories obtained via electromagnetic articulography. Traditional approaches to analyzing this type of data generally aggregate data over a certain timespan, or only include measurements at a certain fixed time point (e.g., formant measurements at the midpoint of a vowel). While these types of analyses are relatively easy to understand and conduct, I argue in this paper for a more elaborate approach, generalized additive modeling, which is able to take into account the non-linear patterns over time while simultaneously taking into account subject and item-related variability. I will illustrate its use in this tutorial using articulatory trajectories from L1 and L2 speakers of English. All data and R code is made available for readers to replicate the analysis presented in this paper.

Journal: Submitted (August, 2017) to Journal of Phonetics

Preprint: http://www.martijnwieling.nl/files/GAM-tutorial-Wieling.pdf

Keywords: Generalized additive modeling; Tutorial; Articulography; Second language acquisition

## Generated on: August 18, 2017 - 12:16:45

2 Libraries and functions

The following commands load the necessary functions and libraries and show the version information.

# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

# load required packages
library(mgcv)
library(itsadug)
library(lme4)

# version information
R.version.string
## [1] "R version 3.4.1 (2017-06-30)"
cat(paste('mgcv version:',packageVersion('mgcv')))
## mgcv version: 1.8.18
cat(paste('itsadug version:',packageVersion('itsadug')))
## itsadug version: 2.2

3 Dataset

The following shows the columns of the full dataset and their explanation.

if (!file.exists('full.rda')) { 
    download.file('http://www.let.rug.nl/wieling/Tutorial/full.rda','full.rda') # 2 MB
}

load('full.rda')

3.1 Column names

The dataset consists of 126177 rows and 13 columns with the following column names:

colnames(full)
##  [1] "Speaker"     "Lang"        "Word"        "Sound"       "Loc"        
##  [6] "RecBlock"    "SeqNr"       "Trial"       "Time"        "Pos"        
## [11] "Pos01"       "start.event" "IsEN"

3.2 Data description

  • Speaker – ID of the speaker
  • Lang – Native language of the speaker ("NL" for Dutch, or "EN" for English)
  • Word – The label of the word
  • Sound – The sound contrast ("TH" for words with the dental fricative, "T" for words with the stop
  • Loc – The location where in the word the sound contrasts occurs ("FRONT" when it occurs at the beginning of the word or "BACK" when it occurs at the back of the word
  • Trial – The trial number of the word for each speaker
  • Time – The normalized (between 0: beginning of the word, to 1: end of the word)
  • Pos – The standardized (mean 0, standard deviation 1) position for each speaker of the T1 sensor in the anterior-posterior direction (higher values, more anterior)

4 Comparing ‘faith’ and ‘fate’

In the following, several models are fitted for the two words ‘faith’ and ‘fate’.

words = c('fate','faith')
dat = droplevels(full[full$Word %in% words,])
dat$Word = relevel(dat$Word,'fate') # fate is set as the reference level

4.1 Fitting a linear model

m1 <- bam(Pos ~ Word, data=dat, method="fREML")
(smry1 <- summary(m1))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.12164    0.01047   11.62   <2e-16 ***
## Wordfaith    0.35773    0.01461   24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.0453   Deviance explained = 4.54%
## -REML =  15417  Scale est. = 0.673     n = 12622

4.2 Fitting a non-linear model

m2 <- bam(Pos ~ Word + s(Time, by=Word, bs="tp", k=10), data=dat)
(smry2 <- summary(m2))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word, bs = "tp", k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.121657   0.009722   12.51   <2e-16 ***
## Wordfaith   0.358124   0.013569   26.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df      F p-value    
## s(Time):Wordfate  6.002  7.161  34.32  <2e-16 ***
## s(Time):Wordfaith 7.576  8.492 208.07  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.177   Deviance explained = 17.7%
## fREML =  14509  Scale est. = 0.58049   n = 12622

4.3 Visualizing the results

par(mfrow=c(1,2))
plot_smooth(m2, view="Time", plot_all="Word", main='', rug=FALSE) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000.
plot_diff(m2, view="Time", comp=list(Word=c("faith","fate")))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000.

## 
## Time window(s) of significant difference(s):
##  0.030303 - 0.070707
##  0.353535 - 1.000000

4.4 Complexity necessary?

4.4.1 Model comparison

m2a.ml <- bam(Pos ~ Word + s(Time), data=dat, method="ML")
m2b.ml <- bam(Pos ~ Word + s(Time, by=Word), data=dat, 
              method="ML")
compareML(m2a.ml, m2b.ml)
## m2a.ml: Pos ~ Word + s(Time)
## 
## m2b.ml: Pos ~ Word + s(Time, by = Word)
## 
## Chi-square test of ML scores
## -----
##    Model    Score Edf   Chisq    Df  p.value Sig.
## 1 m2a.ml 14739.41   4                            
## 2 m2b.ml 14499.11   6 240.297 2.000  < 2e-16  ***
## 
## AIC difference: 492.78, model m2b.ml has lower AIC.

4.4.2 Binary difference smooth

dat$IsFaith <- (dat$Word == "faith")*1

m2.bin <- bam(Pos ~ s(Time) + s(Time,by=IsFaith), data=dat)
(smry2bin <- summary(m2.bin))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ s(Time) + s(Time, by = IsFaith)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.121631   0.009722   12.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                   edf Ref.df      F p-value    
## s(Time)         6.391  7.447  34.27  <2e-16 ***
## s(Time):IsFaith 7.211  8.278 146.27  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.177   Deviance explained = 17.7%
## fREML =  14506  Scale est. = 0.58047   n = 12622
plot(m2.bin, select=2, shade=TRUE)
abline(h=0)

4.4.3 Ordered factor difference

dat$WordO <- as.ordered(dat$Word)
contrasts(dat$WordO) <- "contr.treatment"

m2.ord <- bam(Pos ~ s(Time) + s(Time,by=WordO) + WordO, data=dat)
(smrym2.ord <- summary(m2.ord))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ s(Time) + s(Time, by = WordO) + WordO
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.121714   0.009722   12.52   <2e-16 ***
## WordOfaith  0.358044   0.013569   26.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df     F p-value    
## s(Time)            6.392  7.448 34.27  <2e-16 ***
## s(Time):WordOfaith 6.211  7.278 71.06  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.177   Deviance explained = 17.7%
## fREML =  14506  Scale est. = 0.58047   n = 12622
plot(m2.ord, select=2, shade=TRUE)
abline(h=0)

4.5 Model criticism

gam.check(m2)

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-1.773701e-06,1.668991e-06]
## (score 14508.79 & scale 0.5804887).
## Hessian positive definite, eigenvalue range [2.621649,6309.003].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                     k'  edf k-index p-value
## s(Time):Wordfate  9.00 6.00    1.02    0.91
## s(Time):Wordfaith 9.00 7.58    1.02    0.92

4.6 Mixed-effects

4.6.1 Random intercepts

m3 <- bam(Pos ~ Word + s(Time, by=Word) + s(Speaker,bs="re"), data=dat)
(smrym3 <- summary(m3))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Speaker, bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.13811    0.06204   2.226    0.026 *  
## Wordfaith    0.34339    0.01185  28.973   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df      F p-value    
## s(Time):Wordfate   6.311  7.463  43.85  <2e-16 ***
## s(Time):Wordfaith  7.808  8.633 272.18  <2e-16 ***
## s(Speaker)        40.608 41.000 100.21  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.379   Deviance explained = 38.2%
## fREML =  12827  Scale est. = 0.43781   n = 12622
par(mfrow=c(1,2))
plot_smooth(m3, view="Time", plot_all="Word", main='m3', rug=FALSE, rm.ranef=T,ylim=c(-0.3,1.3)) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker)
## 
plot_diff(m3, view="Time", comp=list(Word=c("faith","fate")), rm.ranef=T,ylim=c(-0.3,1.3))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker)
## 

## 
## Time window(s) of significant difference(s):
##  0.363636 - 1.000000

4.6.2 Random slopes

m4 <- bam(Pos ~ Word + s(Time, by=Word) + s(Speaker,bs="re") + s(Speaker,Word,bs="re"), data=dat)
(smrym4 <- summary(m4))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Speaker, bs = "re") + s(Speaker, 
##     Word, bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.13244    0.07528   1.759   0.0785 .  
## Wordfaith    0.34841    0.08645   4.030 5.61e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df       F p-value    
## s(Time):Wordfate   6.517  7.655   51.72 < 2e-16 ***
## s(Time):Wordfaith  7.947  8.708  325.85 < 2e-16 ***
## s(Speaker)        20.818 41.000 2029.10 0.00336 ** 
## s(Speaker,Word)   60.177 82.000  958.37 0.00580 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.485   Deviance explained = 48.9%
## fREML =  11733  Scale est. = 0.36292   n = 12622
par(mfrow=c(1,2))
plot_smooth(m4, view="Time", plot_all="Word", main='m4', rug=FALSE, rm.ranef=T,ylim=c(-0.3,1.3)) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. (Might be canceled as random effect, check below.) 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker),s(Speaker,Word)
## 
plot_diff(m4, view="Time", comp=list(Word=c("faith","fate")), rm.ranef=T,ylim=c(-0.3,1.3))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker),s(Speaker,Word)
## 

## 
## Time window(s) of significant difference(s):
##  0.494949 - 1.000000

4.6.3 Factor smooths

m5 <- bam(Pos ~ Word + s(Time, by=Word) + s(Speaker,Word,bs="re") + s(Time,Speaker,bs="fs",m=1), data=dat)
(smrym5 <- summary(m5))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Speaker, Word, bs = "re") + 
##     s(Time, Speaker, bs = "fs", m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.11687    0.08898   1.313    0.189    
## Wordfaith    0.34868    0.08638   4.036 5.46e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                       edf  Ref.df       F  p-value    
## s(Time):Wordfate    5.764   6.491   8.828 4.61e-10 ***
## s(Time):Wordfaith   7.724   8.350  26.307  < 2e-16 ***
## s(Speaker,Word)    55.966  82.000  46.419  < 2e-16 ***
## s(Time,Speaker)   295.111 377.000 923.320 1.77e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.646   Deviance explained = 65.6%
## fREML = 9756.4  Scale est. = 0.24977   n = 12622
plot(m5, select=4)

compareML(m4,m5)
## m4: Pos ~ Word + s(Time, by = Word) + s(Speaker, bs = "re") + s(Speaker, 
##     Word, bs = "re")
## 
## m5: Pos ~ Word + s(Time, by = Word) + s(Speaker, Word, bs = "re") + 
##     s(Time, Speaker, bs = "fs", m = 1)
## 
## Chi-square test of fREML scores
## -----
##   Model     Score Edf    Chisq    Df  p.value Sig.
## 1    m4 11732.635   8                             
## 2    m5  9756.358   9 1976.277 1.000  < 2e-16  ***
## 
## AIC difference: 4451.54, model m5 has lower AIC.
par(mfrow=c(1,2))
plot_smooth(m5, view="Time", plot_all="Word", main='m5', rug=FALSE, rm.ranef=T,ylim=c(-0.3,1.3)) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. (Might be canceled as random effect, check below.) 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker,Word),s(Time,Speaker)
## 
plot_diff(m5, view="Time", comp=list(Word=c("faith","fate")), rm.ranef=T,ylim=c(-0.3,1.3))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Speaker : factor; set to the value(s): VENI_EN_10. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Speaker,Word),s(Time,Speaker)
## 

## 
## Time window(s) of significant difference(s):
##  0.494949 - 1.000000
dat$SpeakerWord = interaction(dat$Speaker, dat$Word)
m6 <- bam(Pos ~ Word + s(Time, by=Word) + s(Time,SpeakerWord,bs="fs",m=1), data=dat)
(smrym6 <- summary(m6))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Time, SpeakerWord, bs = "fs", 
##     m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.11373    0.09482   1.199  0.23038   
## Wordfaith    0.35832    0.13431   2.668  0.00764 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf  Ref.df      F  p-value    
## s(Time):Wordfate      5.648   6.167  6.652 4.32e-07 ***
## s(Time):Wordfaith     7.547   7.948 17.765  < 2e-16 ***
## s(Time,SpeakerWord) 637.237 754.000 42.642  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.768   Deviance explained =   78%
## fREML =   7526  Scale est. = 0.1635    n = 12622
compareML(m5,m6)
## m5: Pos ~ Word + s(Time, by = Word) + s(Speaker, Word, bs = "re") + 
##     s(Time, Speaker, bs = "fs", m = 1)
## 
## m6: Pos ~ Word + s(Time, by = Word) + s(Time, SpeakerWord, bs = "fs", 
##     m = 1)
## 
## Model m6 preferred: lower fREML score (2230.374), and lower df (1.000).
## -----
##   Model    Score Edf Difference    Df
## 1    m5 9756.358   9                 
## 2    m6 7525.983   8  -2230.374 1.000
## 
## AIC difference: 5074.96, model m6 has lower AIC.
par(mfrow=c(1,2))
plot_smooth(m6, view="Time", plot_all="Word", main='m6', rug=FALSE, rm.ranef=T,ylim=c(-0.3,1.3)) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_diff(m6, view="Time", comp=list(Word=c("faith","fate")), rm.ranef=T,ylim=c(-0.3,1.3))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

## 
## Time window(s) of significant difference(s):
##  0.555556 - 1.000000

4.7 Autocorrelation

m6acf <- acf_resid(m6)

dat <- start_event(dat,event=c("Speaker","Trial"))
## Warning in start_event(dat, event = c("Speaker", "Trial")): Column
## start.event already exists, will be overwritten.
m7 <- bam(Pos ~ Word + s(Time, by=Word) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=m6acf[2], AR.start=dat$start.event) 

acf_resid(m7)

m7.newrho <- bam(Pos ~ Word + s(Time, by=Word) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event) 

acf_resid(m7.newrho)

(smrym7.newrho <- summary(m7.newrho))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Time, SpeakerWord, bs = "fs", 
##     m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.07044    0.15659    0.45    0.653
## Wordfaith    0.32085    0.21981    1.46    0.144
## 
## Approximate significance of smooth terms:
##                         edf  Ref.df      F  p-value    
## s(Time):Wordfate      6.598   6.976  6.414 2.77e-07 ***
## s(Time):Wordfaith     7.935   8.140 20.629  < 2e-16 ***
## s(Time,SpeakerWord) 593.721 754.000  7.835  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.343   Deviance explained = 37.4%
## fREML = -15907  Scale est. = 1.915     n = 12622
par(mfrow=c(1,2))
plot_smooth(m7.newrho, view="Time", plot_all="Word", main='m7.newrho', rug=FALSE, rm.ranef=T,ylim=c(-0.3,1.3)) 
## Summary:
##  * Word : factor; set to the value(s): faith, fate. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_diff(m7.newrho, view="Time", comp=list(Word=c("faith","fate")), rm.ranef=T,ylim=c(-0.3,1.3))
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

## 
## Time window(s) of significant difference(s):
##  0.626263 - 1.000000

4.8 Tensor product interaction

system.time(m8 <- bam(Pos ~ Word + te(Time, Trial, by=Word) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event))
##    user  system elapsed 
##   32.51    3.29   35.85
(smrym8 <- summary(m8))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + te(Time, Trial, by = Word) + s(Time, SpeakerWord, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.04445    0.15573   0.285    0.775
## Wordfaith    0.35149    0.21879   1.606    0.108
## 
## Approximate significance of smooth terms:
##                              edf  Ref.df      F  p-value    
## te(Time,Trial):Wordfate    7.848   8.282  5.025 2.52e-06 ***
## te(Time,Trial):Wordfaith  16.509  18.392 13.133  < 2e-16 ***
## s(Time,SpeakerWord)      597.741 754.000  7.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.355   Deviance explained = 38.7%
## fREML = -15912  Scale est. = 1.9127    n = 12622
par(mfrow=c(2,2))
fvisgam(m8, view=c("Time","Trial"), cond=list(Word=c("fate")), main='m8: fate', rm.ranef=T,zlim=c(-0.3,1.1),color='gray') 
## Summary:
##  * Word : factor; set to the value(s): fate. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
fvisgam(m8, view=c("Time","Trial"), cond=list(Word=c("faith")), main='m8: faith', rm.ranef=T,zlim=c(-0.3,1.1), color='gray') 
## Summary:
##  * Word : factor; set to the value(s): faith. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_diff2(m8, view=c("Time","Trial"), comp=list(Word=c("faith","fate")), rm.ranef=T, main='Difference between faith and fate',zlim=c(-0.3,1.1), color='gray')
## Summary:
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

par(mfcol=c(3,2))
plot_diff2(m8, view=c("Time","Trial"), comp=list(Word=c("faith","fate")), rm.ranef=T, main='Difference between faith and fate',zlim=c(-0.3,1.1), color='gray')
## Summary:
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
abline(h=400,lty=2,lwd=2,col='white')
plot_diff2(m8, view=c("Time","Trial"), comp=list(Word=c("faith","fate")), rm.ranef=T, main='Difference between faith and fate',zlim=c(-0.3,1.1), color='gray')
## Summary:
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
abline(h=300,lty=2,lwd=2,col='white')
plot_diff2(m8, view=c("Time","Trial"), comp=list(Word=c("faith","fate")), rm.ranef=T, main='Difference between faith and fate',zlim=c(-0.3,1.1), color='gray')
## Summary:
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; with 30 values ranging from 53.000000 to 466.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
abline(h=200,lty=2,lwd=2,col='white')
plot_diff(m8, view="Time", comp=list(Word=c("faith","fate")), cond=list(Trial=400), rm.ranef=T, main='Difference for trial 400')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; set to the value(s): 400. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
## 
## Time window(s) of significant difference(s):
##  0.737374 - 0.868687
plot_diff(m8, view="Time", comp=list(Word=c("faith","fate")), cond=list(Trial=300), rm.ranef=T, main='Difference for trial 300')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; set to the value(s): 300. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
## 
## Time window(s) of significant difference(s):
##  0.575758 - 0.939394
plot_diff(m8, view="Time", comp=list(Word=c("faith","fate")), cond=list(Trial=200), rm.ranef=T, main='Difference for trial 200')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * Trial : numeric predictor; set to the value(s): 200. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

## 
## Time window(s) of significant difference(s):
##  0.636364 - 1.000000

4.9 Decomposition of tensor

m8.dc <- bam(Pos ~ Word + s(Time, by=Word) + s(Trial, by=Word) + ti(Time, Trial, by=Word) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event)
(smrym8.dc <- summary(m8.dc))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Word + s(Time, by = Word) + s(Trial, by = Word) + ti(Time, 
##     Trial, by = Word) + s(Time, SpeakerWord, bs = "fs", m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.06768    0.15665   0.432    0.666
## Wordfaith    0.32746    0.21989   1.489    0.136
## 
## Approximate significance of smooth terms:
##                              edf  Ref.df      F  p-value    
## s(Time):Wordfate           6.603   6.980  6.420 2.55e-07 ***
## s(Time):Wordfaith          7.936   8.140 20.807  < 2e-16 ***
## s(Trial):Wordfate          1.000   1.000  0.060    0.806    
## s(Trial):Wordfaith         1.000   1.000  0.023    0.880    
## ti(Time,Trial):Wordfate    1.001   1.002  0.132    0.716    
## ti(Time,Trial):Wordfaith   9.141  11.335  4.754 1.91e-07 ***
## s(Time,SpeakerWord)      593.372 754.000  7.730  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.35   Deviance explained = 38.2%
## fREML = -15923  Scale est. = 1.9067    n = 12622
pvisgam(m8.dc, view=c("Time","Trial"), select=6, color="gray", zlim=c(-0.3,0.4), main="m8.dc: ti for faith")
## [1] "Tensor(s) to be plotted: ti(Time,Trial):Wordfaith"

4.10 Including language difference

dat$WordLang <- interaction(dat$Word, dat$Lang)

m9 <- bam(Pos ~ WordLang + s(Time, by=WordLang) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event)
(smrym9 <- summary(m9))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ WordLang + s(Time, by = WordLang) + s(Time, SpeakerWord, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.03183    0.21184   0.150    0.881
## WordLangfaith.EN  0.42836    0.30009   1.427    0.153
## WordLangfate.NL   0.07542    0.31370   0.240    0.810
## WordLangfaith.NL  0.29328    0.30770   0.953    0.341
## 
## Approximate significance of smooth terms:
##                              edf  Ref.df      F  p-value    
## s(Time):WordLangfate.EN    5.552   5.999  3.775 0.000924 ***
## s(Time):WordLangfaith.EN   7.509   7.790 14.554  < 2e-16 ***
## s(Time):WordLangfate.NL    6.967   7.316  7.029 2.99e-06 ***
## s(Time):WordLangfaith.NL   7.369   7.650  4.080 0.000182 ***
## s(Time,SpeakerWord)      583.691 752.000  7.382  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.347   Deviance explained = 37.9%
## fREML = -15909  Scale est. = 1.9134    n = 12622
par(mfrow=c(2,2))
plot_smooth(m9, view="Time", cond=list(WordLang=c("fate.EN")), main='m9: English speakers', rug=FALSE, rm.ranef=T,ylim=c(-0.6,1.7), col=rainbow(2)[1]) 
## Summary:
##  * WordLang : factor; set to the value(s): fate.EN. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_smooth(m9, view="Time", cond=list(WordLang=c("faith.EN")), rm.ranef=T,add=T, col=rainbow(2)[2], rug=FALSE) 
## Summary:
##  * WordLang : factor; set to the value(s): faith.EN. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
# add legend (from code of plot_smooth with parameter plot_all) 
gfc <- getFigCoords()
legend(gfc[2], gfc[4], legend = c('fate','faith'), text.col = rainbow(2), text.font = 2, xjust = 1, yjust = 1, bty = "n", xpd = TRUE)

plot_diff(m9, view="Time", comp=list(WordLang=c("faith.EN","fate.EN")), rm.ranef=T,ylim=c(-0.6,1.7), main='Difference between faith and fate')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
## 
## Time window(s) of significant difference(s):
##  0.626263 - 1.000000
plot_smooth(m9, view="Time", cond=list(WordLang=c("fate.NL")), main='m9: Dutch speakers', rug=FALSE, rm.ranef=T,ylim=c(-0.6,1.7), col=rainbow(2)[1]) 
## Summary:
##  * WordLang : factor; set to the value(s): fate.NL. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_smooth(m9, view="Time", cond=list(WordLang=c("faith.NL")), rm.ranef=T,add=T, col=rainbow(2)[2], rug=FALSE) 
## Summary:
##  * WordLang : factor; set to the value(s): faith.NL. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
# add legend
gfc <- getFigCoords()
legend(gfc[2], gfc[4], legend = c('fate','faith'), text.col = rainbow(2), text.font = 2, xjust = 1, yjust = 1, bty = "n", xpd = TRUE)

plot_diff(m9, view="Time", comp=list(WordLang=c("faith.NL","fate.NL")), rm.ranef=T,ylim=c(-0.6,1.7), main='Difference between faith and fate')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

## 
## Difference is not significant.

4.10.1 Using an ordered factor

dat$ENFaithO <- as.ordered(dat$Lang == "EN" & dat$Word == "faith") 
contrasts(dat$ENFaithO) <- "contr.treatment"

dat$NLFaithO <- as.ordered(dat$Lang == "NL" & dat$Word == "faith") 
contrasts(dat$NLFaithO) <- "contr.treatment"

m9.ord <- bam(Pos ~ Lang + ENFaithO + NLFaithO + s(Time,by=Lang) + s(Time,by=ENFaithO) + s(Time,by=NLFaithO) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event) 
(smrym9.ord <- summary(m9.ord))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Lang + ENFaithO + NLFaithO + s(Time, by = Lang) + s(Time, 
##     by = ENFaithO) + s(Time, by = NLFaithO) + s(Time, SpeakerWord, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.02804    0.21151   0.133    0.895
## LangNL        0.07205    0.31305   0.230    0.818
## ENFaithOTRUE  0.43838    0.29932   1.465    0.143
## NLFaithOTRUE  0.22638    0.32064   0.706    0.480
## 
## Approximate significance of smooth terms:
##                          edf  Ref.df     F  p-value    
## s(Time):LangEN         5.873   6.271 3.995 0.000444 ***
## s(Time):LangNL         6.693   7.038 5.740 2.76e-06 ***
## s(Time):ENFaithOTRUE   6.764   7.098 5.219 8.55e-06 ***
## s(Time):NLFaithOTRUE   7.095   7.384 0.992 0.392697    
## s(Time,SpeakerWord)  584.871 752.000 7.437  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.348   Deviance explained =   38%
## fREML = -15919  Scale est. = 1.91      n = 12622
par(mfrow=c(1,2))
plot(m9.ord,select=3,shade=T,rug=F, ylim=c(-1.1,1.2), main='m9.ord: English difference')
abline(h=0)
plot(m9.ord,select=4,shade=T,rug=F, ylim=c(-1.1,1.2), main='m9.ord: Dutch difference')
abline(h=0)

4.10.2 Using a binary smooth

dat$IsENFaith <- (dat$Lang == "EN" & dat$Word == "faith")*1 
dat$IsNLFaith <- (dat$Lang == "NL" & dat$Word == "faith")*1 
m9.bin <- bam(Pos ~ Lang + s(Time,by=Lang) + s(Time,by=IsENFaith) + s(Time,by=IsNLFaith) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event) 
(smrym9.bin <- summary(m9.bin))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ Lang + s(Time, by = Lang) + s(Time, by = IsENFaith) + s(Time, 
##     by = IsNLFaith) + s(Time, SpeakerWord, bs = "fs", m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.02791    0.21151   0.132    0.895
## LangNL       0.07203    0.31304   0.230    0.818
## 
## Approximate significance of smooth terms:
##                         edf  Ref.df     F  p-value    
## s(Time):LangEN        5.873   6.271 3.995 0.000444 ***
## s(Time):LangNL        6.693   7.038 5.740 2.76e-06 ***
## s(Time):IsENFaith     7.764   8.098 4.660 6.98e-06 ***
## s(Time):IsNLFaith     8.095   8.384 0.949 0.454155    
## s(Time,SpeakerWord) 584.871 752.000 7.437  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.348   Deviance explained =   38%
## fREML = -15919  Scale est. = 1.91      n = 12622
par(mfrow=c(1,2))
plot(m9.bin,select=3,shade=T,rug=F, ylim=c(-0.6,1.7), main='m9.bin: English difference')
abline(h=0)
plot(m9.bin,select=4,shade=T,rug=F, ylim=c(-0.6,1.7), main='m9.bin: Dutch difference')
abline(h=0)

4.11 Faster computation

system.time(m9 <- bam(Pos ~ WordLang + s(Time, by=WordLang) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event))
##    user  system elapsed 
##   36.83    3.36   40.34
system.time(m9.discrete <- bam(Pos ~ WordLang + s(Time, by=WordLang) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event, discrete=TRUE))
##    user  system elapsed 
##    9.27    0.31    9.63
system.time(m9.discrete2 <- bam(Pos ~ WordLang + s(Time, by=WordLang) + s(Time,SpeakerWord,bs="fs",m=1), data=dat, rho=0.999, AR.start=dat$start.event, discrete=TRUE, nthreads=2))
##    user  system elapsed 
##   12.82    0.31    8.27
(smrym9.discrete <- summary(m9.discrete))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ WordLang + s(Time, by = WordLang) + s(Time, SpeakerWord, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.03676    0.21162   0.174    0.862
## WordLangfaith.EN  0.42878    0.29987   1.430    0.153
## WordLangfate.NL   0.06997    0.31346   0.223    0.823
## WordLangfaith.NL  0.29249    0.30748   0.951    0.342
## 
## Approximate significance of smooth terms:
##                              edf  Ref.df      F  p-value    
## s(Time):WordLangfate.EN    5.387   5.829  3.779 0.001025 ** 
## s(Time):WordLangfaith.EN   7.476   7.753 12.758  < 2e-16 ***
## s(Time):WordLangfate.NL    6.968   7.307  5.060  3.6e-06 ***
## s(Time):WordLangfaith.NL   7.370   7.641  3.849 0.000196 ***
## s(Time,SpeakerWord)      584.638 752.000  7.465  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.35   Deviance explained = 38.2%
## fREML = -15927  Scale est. = 1.9067    n = 12622
par(mfrow=c(2,2))
plot_smooth(m9.discrete, view="Time", cond=list(WordLang=c("fate.EN")), main='m9.discrete: English speakers', rug=FALSE, rm.ranef=T,ylim=c(-0.6,1.7), col=rainbow(2)[1]) 
## Summary:
##  * WordLang : factor; set to the value(s): fate.EN. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_smooth(m9.discrete, view="Time", cond=list(WordLang=c("faith.EN")), rm.ranef=T,add=T, col=rainbow(2)[2], rug=FALSE) 
## Summary:
##  * WordLang : factor; set to the value(s): faith.EN. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
# legend
gfc <- getFigCoords()
legend(gfc[2], gfc[4], legend = c('fate','faith'), text.col = rainbow(2), text.font = 2, xjust = 1, yjust = 1, bty = "n", xpd = TRUE)

plot_diff(m9.discrete, view="Time", comp=list(WordLang=c("faith.EN","fate.EN")), rm.ranef=T,ylim=c(-0.6,1.7), main='Difference between faith and fate')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
## 
## Time window(s) of significant difference(s):
##  0.626263 - 1.000000
plot_smooth(m9.discrete, view="Time", cond=list(WordLang=c("fate.NL")), main='m9.discrete: Dutch speakers', rug=FALSE, rm.ranef=T,ylim=c(-0.6,1.7), col=rainbow(2)[1]) 
## Summary:
##  * WordLang : factor; set to the value(s): fate.NL. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
plot_smooth(m9.discrete, view="Time", cond=list(WordLang=c("faith.NL")), rm.ranef=T,add=T, col=rainbow(2)[2], rug=FALSE) 
## Summary:
##  * WordLang : factor; set to the value(s): faith.NL. 
##  * Time : numeric predictor; with 30 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 
# legend
gfc <- getFigCoords()
legend(gfc[2], gfc[4], legend = c('fate','faith'), text.col = rainbow(2), text.font = 2, xjust = 1, yjust = 1, bty = "n", xpd = TRUE)

plot_diff(m9.discrete, view="Time", comp=list(WordLang=c("faith.NL","fate.NL")), rm.ranef=T,ylim=c(-0.6,1.7), main='Difference between faith and fate')
## Summary:
##  * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000. 
##  * SpeakerWord : factor; set to the value(s): VENI_EN_10.faith. (Might be canceled as random effect, check below.) 
##  * NOTE : The following random effects columns are canceled: s(Time,SpeakerWord)
## 

## 
## Difference is not significant.

5 Full model: all words

full$LangLoc <- interaction(full$Lang, full$Loc)

full$IsENTHFront <- (full$Lang == "EN" & full$Sound == "TH" & full$Loc == "Front")*1

full$IsNLTHFront <- (full$Lang == "NL" & full$Sound == "TH" & full$Loc == "Front")*1

full$IsENTHBack <- (full$Lang == "EN" & full$Sound == "TH" & full$Loc == "Back")*1

full$IsNLTHBack <- (full$Lang == "NL" & full$Sound == "TH" & full$Loc == "Back")*1

full$SpeakerSoundLoc <- interaction(full$Speaker, full$Sound, full$Loc)

system.time(fm1 <- bam(Pos ~ LangLoc + s(Time,by=LangLoc) + s(Time,by=IsENTHFront) + s(Time,by=IsENTHBack) + s(Time,by=IsNLTHFront) + s(Time,by=IsNLTHBack) + s(Time,SpeakerSoundLoc,bs="fs",m=1) + s(Time,Word,bs="fs",m=1), data=full, discrete=TRUE, rho=0.999, AR.start=full$start.event))
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
##    user  system elapsed 
##  214.77    2.20  217.26
acf_resid(fm1)

(smryfm1 <- summary(fm1))
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ LangLoc + s(Time, by = LangLoc) + s(Time, by = IsENTHFront) + 
##     s(Time, by = IsENTHBack) + s(Time, by = IsNLTHFront) + s(Time, 
##     by = IsNLTHBack) + s(Time, SpeakerSoundLoc, bs = "fs", m = 1) + 
##     s(Time, Word, bs = "fs", m = 1)
## 
## Parametric coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -0.13761    0.13601  -1.012    0.312
## LangLocNL.Back   0.13834    0.19919   0.694    0.487
## LangLocEN.Front  0.06257    0.19965   0.313    0.754
## LangLocNL.Front  0.14605    0.20604   0.709    0.478
## 
## Approximate significance of smooth terms:
##                              edf   Ref.df      F  p-value    
## s(Time):LangLocEN.Back     3.064    3.394  3.210  0.04360 *  
## s(Time):LangLocNL.Back     3.621    3.963  1.481  0.21470    
## s(Time):LangLocEN.Front    6.903    7.090  3.243  0.00093 ***
## s(Time):LangLocNL.Front    6.108    6.334  1.459  0.18149    
## s(Time):IsENTHFront        7.408    7.651  8.717 1.11e-11 ***
## s(Time):IsENTHBack         6.189    6.480  2.491  0.02750 *  
## s(Time):IsNLTHFront        4.552    4.878  1.666  0.09827 .  
## s(Time):IsNLTHBack         6.041    6.337  0.228  0.97153    
## s(Time,SpeakerSoundLoc) 1233.950 1504.000 16.385  < 2e-16 ***
## s(Time,Word)             146.990  176.000 59.271  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.424   Deviance explained = 43.1%
## fREML = -1.2168e+05  Scale est. = 3.7858    n = 126177
par(mfrow=c(2,2))
plot(fm1,select=5,shade=T,rug=F, ylim=c(-0.6,1.8), main='fm1: English difference front')
abline(h=0)
plot(fm1,select=6,shade=T,rug=F, ylim=c(-0.6,1.8), main='fm1: English difference back')
abline(h=0)
plot(fm1,select=7,shade=T,rug=F, ylim=c(-0.6,1.8), main='fm1: Dutch difference front')
abline(h=0)
plot(fm1,select=8,shade=T,rug=F, ylim=c(-0.6,1.8), main='fm1: Dutch difference back')
abline(h=0)

6 Replication

To replicate the analysis presented above, you can just copy the following lines to the most recent version of R. Please note that you first need to install Pandoc.

download.file('http://www.let.rug.nl/wieling/Tutorial/analysis.Rmd', 'analysis.Rmd')
if (length(setdiff('rmarkdown', rownames(installed.packages()))) > 0) {
  install.packages('rmarkdown')  
}
library(rmarkdown)
render('analysis.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysis.html'), sep='')) # shows result