Lecture 3: Sample autocorrelation and testing for
independence
STAT352: Applied Time Series
Tilman M. Davies
Dept. of Mathematics & Statistics
University of Otago
This Lecture
Estimation of autocovariance/autocorrelation
Testing for an independent series
Estimating dependence from data
Last lecture, we examined the first- and second-order
properties of a time series, and defined the mean, covariance
and correlation functions for some simple time series models.
In practice, however, we don’t start with the model. We start
with observed data.
To assess the strength and nature of the dependence in our
data, we make use of the ‘sample’ analogues of the functions
defined earlier for stationary time series models.
Sample estimators
Let $x_1, \ldots, x_N$ be observations of a time series. The stationary
sample mean is
\[ \bar{x} = \frac{1}{N}\sum_{t=1}^{N} x_t. \]
The sample autocovariance function is
\[ \hat{\gamma}(h) = \frac{1}{N}\sum_{t=1}^{N-|h|} (x_{t+|h|} - \bar{x})(x_t - \bar{x}), \qquad -N < h < N. \]
The sample autocorrelation function is
\[ \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \qquad -N < h < N. \]
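As a concrete illustration, here is a minimal Python sketch (an assumed tool, not part of the course materials; the helper names sample_acov and sample_acf are hypothetical) that computes these estimators directly from the definitions above.

```python
import numpy as np

def sample_acov(x, h):
    """Sample autocovariance at lag h, following the definition above
    (divide by N, and centre every term at the overall sample mean)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    h = abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:N - h] - xbar)) / N

def sample_acf(x, h):
    """Sample autocorrelation at lag h: gamma-hat(h) / gamma-hat(0)."""
    return sample_acov(x, h) / sample_acov(x, 0)

# Quick check on 50 simulated white-noise observations: rho-hat(0) is 1
# and the remaining values should be small.
rng = np.random.default_rng(1)
w = rng.standard_normal(50)
print([round(sample_acf(w, h), 3) for h in range(6)])
```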
Sample estimators (cont.)
Note that we can compute the sample estimates for any data
set.
A common task is to plot the sample autocorrelation function
(abbreviated to an 'ACF plot') and inspect it (a) for any
evidence of dependence in the data; and (b) to assess
whether there exists a suitable stationary model we can fit to the
observed time series (subsequently used for
prediction/forecasting).
Typically we only plot the ACF for h ≥ 0, since it is symmetric
about zero: $\hat{\rho}(-h) = \hat{\rho}(h)$.
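As a rough illustration (Python with statsmodels and matplotlib is assumed here; it is not necessarily the course's own tooling), an ACF plot of this kind can be drawn as follows. Note that the default confidence bands drawn by plot_acf may be computed slightly differently from the ±1.96/√N rule described on a later slide.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# A white-noise series of length 50, similar to Example I below.
rng = np.random.default_rng(1)
w = rng.standard_normal(50)

plot_acf(w, lags=15)   # sample ACF for h = 0, ..., 15, with confidence bands
plt.show()
```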
Example I: White noise sample ACF plot
As we would expect, the purely independent nature of white noise
means that for lags h ≠ 0, the sample autocorrelation is very small.
[Figure: a realisation of the white noise series W_t plotted against t (t = 1, ..., 50), alongside its sample ACF plot against lag (up to lag 15).]
ACF plot confidence bands
It can be shown, for an iid series with finite variance, that
$\hat{\rho}(h)$ for h > 0 is approximately normally distributed with zero
mean and variance 1/N, as N becomes large.
The dashed horizontal lines on the ACF plot are typically
included by default.
They indicate a 95% confidence band corresponding to a null
hypothesis of independent terms in the time series.
Hence, they are computed as $\pm 1.96/\sqrt{N}$; for example, with
N = 50 the bands sit at roughly ±0.28.
Any breach of these bands by $\hat{\rho}(h)$ for h > 0 constitutes
evidence against the null hypothesis of independent terms (a coded
sketch of this check is given under Method 1 below).
This isn’t necessarily a bad thing... if we can fit a model
which represents the observed dependence structure, we can
provide sensible ‘future’ predictions of the series using that
model.
Example II: Random walk sample ACF plot
A realisation of the nonstationary standard normal random walk
model, and its ACF plot, is given below:
[Figure: the random walk S_t plotted against t (t = 1, ..., 50), alongside its sample ACF plot against lag.]
Example III: Standard normal MA(1) model
A realisation of the stationary standard normal first order moving
average model with θ = −3, and its ACF plot, is given below:
[Figure: the MA(1) series X_t plotted against t (t = 1, ..., 50), alongside its sample ACF plot against lag.]
Interpreting an ACF plot
We are typically interested in two distinct features of a sample
ACF plot:
1. The individual values close to h = 0, and
2. The behaviour of the correlations as a whole.
That is, we care less about individual correlation values for
larger h than for smaller h.
The overall behaviour of the plot can aid in (a) selecting an
appropriate model to represent our observed data, (b) assessing
the presence of trend and/or seasonality in the observed data
and thus, more generally, (c) indicating the possibility of
nonstationarity.
Interpreting an ACF plot: important notes
VERY IMPORTANT:
Just because an ACF plot indicates statistically significant
correlations does not necessarily mean the underlying process is
nonstationary. A breach of the confidence bands simply provides
evidence against the null hypothesis of purely independent terms.
Remember, stationary processes are of course allowed dependence;
it's just that the nature of that dependence has certain simple
conditions attached to it.
It can be difficult to use the ACF plot alone to assess stationarity
vs. nonstationarity, but there are certain features of such a plot
that can be recognised as unique to either situation.
Testing the hypothesis of independent terms
Method 1: 95% ACF plot confidence bands.
Computation: $\pm 1.96/\sqrt{N}$, where N is the sample size.
Interpretation: Inspect the ACF plot and conclude there is statistically
significant evidence against the null hypothesis of independent
terms if more than 5% of the $\hat{\rho}(h)$ values exceed the limits.
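A minimal sketch of this decision rule, assuming Python/numpy (the helper name acf_band_test is hypothetical and not from the course materials):

```python
import numpy as np

def acf_band_test(x, H=15):
    """Flag evidence against independence if more than 5% of the sample
    autocorrelations at lags 1, ..., H fall outside +/- 1.96/sqrt(N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xbar = x.mean()
    gamma0 = np.sum((x - xbar) ** 2) / N
    rho = np.array([np.sum((x[h:] - xbar) * (x[:N - h] - xbar)) / N / gamma0
                    for h in range(1, H + 1)])
    band = 1.96 / np.sqrt(N)
    frac = np.mean(np.abs(rho) > band)   # fraction of lags breaching the band
    return frac, frac > 0.05             # (breach fraction, reject null?)

# Example on simulated white noise; the result depends on the realisation.
rng = np.random.default_rng(0)
print(acf_band_test(rng.standard_normal(50)))
```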
Testing the hypothesis of independent terms
Method 2: Ljung-Box test (single-value version of Method 1).
Computation: Under the null hypothesis, the test statistic
\[ Q = N(N+2)\sum_{k=1}^{h} \frac{\hat{\rho}(k)^2}{N-k} \]
follows a Chi-squared distribution with h degrees of freedom.
Interpretation: For a fixed value of h, compute Q; the p-value is then
given by P(q > Q), where q ∼ χ²(h). Reject the null hypothesis of
independent terms at the 5% level if P(q > Q) < 0.05.
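A sketch of the Ljung-Box computation, written directly from the formula above (Python with scipy assumed; the function name ljung_box is hypothetical). In practice a packaged implementation such as statsmodels' acorr_ljungbox would typically be used instead.

```python
import numpy as np
from scipy import stats

def ljung_box(x, h=15):
    """Ljung-Box statistic Q at lag h and its p-value P(q > Q), q ~ chi2(h)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xbar = x.mean()
    gamma0 = np.sum((x - xbar) ** 2) / N
    rho = np.array([np.sum((x[k:] - xbar) * (x[:N - k] - xbar)) / N / gamma0
                    for k in range(1, h + 1)])
    Q = N * (N + 2) * np.sum(rho ** 2 / (N - np.arange(1, h + 1)))
    return Q, stats.chi2.sf(Q, df=h)

# Reject the null of independent terms at the 5% level if the p-value < 0.05.
rng = np.random.default_rng(0)
print(ljung_box(rng.standard_normal(50), h=15))
```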
Testing the hypothesis of independent terms
Method 3: Rank test (can be useful for detecting a linear trend in
the observed data).
Computation: Suppose the observations of our time series are
denoted $X = \{x_1, \ldots, x_N\}$. Let
\[ P = \sum_{t=1}^{N-1} \sum_{u=t+1}^{N} \mathbf{1}[x_u > x_t] \]
be the total number of pairs with the 'later' value in the time
series larger than the 'earlier' value in the time series, where
$\mathbf{1}[\,\cdot\,]$ denotes the indicator function.
Testing the hypothesis of independent terms
Method 3: Rank test (cont.)
Furthermore, let
\[ M_P = \frac{N(N-1)}{4} \qquad \text{and} \qquad V_P = \frac{N(N-1)(2N+5)}{72}. \]
Then, under the null hypothesis of iid terms, P is approximately
normally distributed with mean $M_P$ and standard deviation $\sqrt{V_P}$.
Interpretation: Reject the null hypothesis of independent terms at a
5% level of significance if $|P - M_P|/\sqrt{V_P} > 1.96$ (and you can
get a p-value in the usual way from the standard normal
distribution).
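A sketch of the rank test, again assuming Python/scipy (the helper name rank_test is hypothetical). It counts the pairs P, standardises using M_P and V_P, and returns a p-value from the standard normal distribution, consistent with the |P − M_P|/√V_P > 1.96 rule above.

```python
import numpy as np
from scipy import stats

def rank_test(x):
    """Rank test: P = number of pairs (t, u), t < u, with x_u > x_t."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    P = sum(np.sum(x[t + 1:] > x[t]) for t in range(N - 1))
    MP = N * (N - 1) / 4
    VP = N * (N - 1) * (2 * N + 5) / 72
    z = (P - MP) / np.sqrt(VP)
    return z, 2 * stats.norm.sf(abs(z))   # two-sided p-value

# A series with a clear upward trend should be flagged by this test.
rng = np.random.default_rng(0)
print(rank_test(np.arange(30) + rng.normal(size=30)))
```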
Testing for iid terms in Example I: White noise
[Figure: the white noise realisation W_t from Example I and its sample ACF plot (up to lag 15), reproduced here alongside the test results below.]
ACF plot test: NO EVIDENCE, RETAIN NULL
Ljung-Box test (h = 15): Q = 4.8944, p = 0.993, RETAIN NULL
Rank test: (P − M_P)/√V_P = 0.811, p = 0.209, RETAIN NULL
Testing for iid terms in Example II: Random walk
[Figure: the random walk realisation S_t from Example II and its sample ACF plot, reproduced here alongside the test results below.]
ACF plot test: STRONG EVIDENCE, REJECT NULL
Ljung-Box test (h = 15): Q = 191.0684, p < 0.00001, REJECT NULL
Rank test: (P − M_P)/√V_P = 3.873, p = 0.00005, REJECT NULL
Testing for iid terms in Example III: MA(1)
[Figure: the MA(1) realisation X_t from Example III and its sample ACF plot, reproduced here alongside the test results below.]
ACF plot test: WEAK EVIDENCE, REJECT NULL
Ljung-Box test (h = 15): Q = 26.3322, p = 0.1468, RETAIN NULL
Rank test: (P − M_P)/√V_P = 1.21, p = 0.225, RETAIN NULL
Next lecture...
What have we really been talking about when we’ve
mentioned trend and seasonality?
Classical decomposition