Practical small sample inference for single lag subset autoregressive models

Robert L. Paige (Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, USA)
A. Alexandre Trindade (Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA)

Journal of Statistical Planning and Inference 138 (2008) 1934-1949. doi:10.1016/j.jspi.2007.07.006
Received 22 December 2005; received in revised form 19 April 2007; accepted 16 July 2007. Available online 10 August 2007.
Abstract
We propose a method for saddlepoint approximating the distribution of estimators in single lag subset autoregressive models of
order one. By viewing the estimator as the root of an appropriate estimating equation, the approach circumvents the difficulty inherent
in more standard methods that require an explicit expression for the estimator to be available. Plots of the densities reveal that the
distributions of the Burg and maximum likelihood estimators are nearly identical. We show that one possible reason for this is the fact
that Burg enjoys the property of estimating equation optimality among a class of estimators expressible as a ratio of quadratic forms
in normal random variables, which includes Yule–Walker and least squares. By inverting a two-sided hypothesis test, we show how
small sample confidence intervals for the parameters can be constructed from the saddlepoint approximations. Simulation studies
reveal that the resulting intervals generally outperform traditional ones based on asymptotics and have good robustness properties
with respect to heavy-tailed and skewed innovations. The applicability of the models is illustrated by analyzing a longitudinal data
set in a novel manner.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Yule–Walker; Burg; Maximum likelihood; Estimating equation; Saddlepoint approximation; Confidence interval; Longitudinal data
1. Introduction
This paper considers approximations to distributions of various estimators of the parameters in a Gaussian autoregressive model of order p, AR(p), where the coefficients of the first p − 1 lags are zero. We call these single lag subset AR models of order p, henceforth abbreviated to SAR(p). Note that the usual AR(1) model is a special case, being a
SAR(1). Primarily motivated by the desire to improve inference in small samples, and using a result of Daniels (1983),
we show that saddlepoint approximations for the estimators can be easily constructed by viewing them as solutions of
appropriate estimating equations. The main benefit of this approach lies in the fact that the estimating equation does
not have to be solved for the estimator in question, an otherwise nontrivial task in the case of the maximum likelihood
estimator (MLE). The approximations can be subsequently inverted to yield highly accurate confidence intervals for
the parameters. For small samples, the resulting coverage probabilities and their lengths are generally superior to those
obtained from first order asymptotics, the traditional way of constructing confidence intervals.
A secondary motivation for our work is the observation that AR models fitted via Burg’s method tend to exhibit
consistently larger Gaussian likelihoods than those fitted via Yule–Walker’s (YW) method (e.g. Brockwell and Davis,
2002, Section 5.1.2). Recent work by Brockwell et al. (2005) suggests this feature extends to multivariate subset AR
models, and is accentuated by proximity of the roots of the AR polynomial to the unit circle. Comparing the distributions of the YW, Burg, and maximum likelihood (ML) estimators should provide further insight into their different finite-sample performances, and the question of whether or not the densities of the Burg and ML estimators are closer in some
sense than those of YW and ML. When the data come from a Gaussian SAR(p) model, the YW and Burg estimators
take the form of ratios of quadratic forms in normal random variables. Results concerning the closeness of the YW and
Burg estimators to the MLEs are important, since the former are easily constructed and are frequently used as initial
inputs in optimization algorithms that search for the latter. This is especially important in higher order AR models and
multivariate settings.
There is a vast literature on approximating the distribution of estimators of the coefficient in an AR(1) model.
Daniels (1956) derived saddlepoint approximations for the density of the Burg estimator. White (1961) computed asymptotic expansions for the mean and variance of the least squares and ML estimators. Phillips (1978) obtained
the Edgeworth and saddlepoint approximations to the density of the least squares estimator (LSE). Durbin (1980)
explored the approximate distribution of partial serial correlation coefficients, which included the YW estimator. Using
Edgeworth approximations, Ochi (1983) obtained asymptotic expansions to terms of order $n^{-1}$ for the distribution of the AR(1) version of the generalized coefficient estimator $\hat\phi_{c_1,c_2}$ presented in the next section. Fujikoshi and Ochi (1984) used the same technique to approximate the distribution of the ML estimator. Liebermann (1994a, b) obtained
saddlepoint approximations to the cumulative distribution and probability density functions of ratios of quadratic
forms in normal random variables and the LSE, by direct contour integration of the inversion formula. Saddlepoint
approximations to the cumulative distribution function of the LSE in an AR(1) are also derived by Wang (1992). More
recently, and using the methods of Daniels (1954), Butler and Paolella (1998a, b) obtained saddlepoint approximations
to general ratios of quadratic forms in normal random variables.
To the best of our knowledge, the general SAR(p) model has only been investigated by Chan (1989), who deals with
asymptotic results in the nearly nonstationary case. In the particular (nonstationary) instance when the SAR coefficient
is equal to unity, the resulting model is equivalent to seasonally differencing the series and has been called the seasonal
random walk by Latour and Roy (1987). This branch of the literature dealing with inference for nonstationary AR(1)s
and unit roots is particularly popular in econometrics, but is not our concern here; we are interested in the stationary case.
Some of these results are obtained through asymptotic expansions, a related technique to saddlepoint approximations.
We mention in passing the recent works of Perron (1996), Dufour and Kiviet (1998), and Kiviet and Phillips (2005).
Noteworthy are also the papers of Andrews (1993) dealing with median-bias-corrected least squares estimation of a
stationary and nonstationary (unit root) AR(1) in the presence of a polynomial time trend, and the very recent work of
Broda et al. (2007) which extends this by considering other forms of bias adjustment to the LSE and incorporates the
use of saddlepoint approximations for increased accuracy.
Our work differs from and builds on the above cited references in several respects. First, we extend the saddlepoint approximations from the AR(1) to the more general SAR(p) case. Secondly, we approach the inference problem from the point of view of estimating equations, allowing us to tackle more complex estimators like the MLE, which have not
been dealt with before. The method we employ allows us to not only accurately approximate the distributions of the
various parameter estimators, but to also invert them to produce confidence intervals with good properties. In the case
of the LSE, the resulting confidence intervals are identical to those of Andrews (1993), but while he can only handle
least-squares inference, our method requires only knowledge of the moment generating function of the estimating
equation of which the desired estimator is the unique solution. Additionally we shed new light on the Burg estimator,
showing it to be optimal in at least two respects, and find it to be robust to departures from normality with respect
to heavy-tailed and skewed innovations. Lastly, we illustrate a novel application of the SAR(p) model, providing a
simpler time series-based alternative to standard parametric longitudinal data modeling.
The rest of the paper is organized as follows. Section 2 introduces the YW, Burg, and ML estimators of the AR coefficient in a SAR(p) process. We show that all but the MLE are special cases of a generalized estimator, $\hat\phi_{c_1,c_2}$, expressible as a ratio of quadratic forms in normal random variables. In Section 3, saddlepoint approximations to the probability distribution and density functions of the family of estimators $\{\hat\phi_{c_1,c_2} : c_1 \ge 0,\ c_2 \ge 0\}$ and the MLE are derived. Plots of the densities of the various estimators are given in Section 4, and an estimating equation optimality property is proved for Burg. Section 5 shows how small sample confidence intervals can be constructed from the saddlepoint approximations, and assesses their goodness in terms of coverage probability and interval length. The robustness of the intervals is studied in Section 6. We conclude in Section 7 by demonstrating the applicability of SAR(p) models on a longitudinal data set.
2. SAR(p) model parameter estimation
Consider the zero-mean process $\{X_t\}$, $t = \ldots, -1, 0, 1, \ldots$, satisfying the SAR(p) model equations
$$X_t = \phi X_{t-p} + Z_t, \qquad \{Z_t\} \sim \text{iid } N(0, \sigma^2), \tag{1}$$
with $|\phi| < 1$. It follows that $X_t$ is stationary with autocovariance function (ACVF) at lag $h \ge 0$:
$$\gamma_h \equiv E(X_t X_{t+h}) = \begin{cases} \sigma^2\phi^k/(1-\phi^2) & \text{if } h = kp,\ k = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$
A realization $X = [X_1, \ldots, X_n]'$ from the model has the multivariate normal distribution, $X \sim N_n(0, \Gamma_n)$, where the $(i,j)$ entry of covariance matrix $\Gamma_n$ is just $\gamma_{|i-j|}$.
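To make the covariance structure concrete, here is a minimal Python sketch (ours, not from the paper; function names such as `simulate_sar` are hypothetical) that simulates a SAR(p) path and checks empirical autocovariances against the formula above.

```python
import numpy as np

def simulate_sar(n, p, phi, sigma=1.0, burn=500, rng=None):
    """Simulate X_t = phi * X_{t-p} + Z_t with iid N(0, sigma^2) noise;
    a burn-in period is discarded so the retained path is near-stationary."""
    rng = np.random.default_rng(rng)
    z = rng.normal(0.0, sigma, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        x[t] = phi * x[t - p] + z[t]
    return x[burn:]

def sar_acvf(h, p, phi, sigma=1.0):
    """Theoretical ACVF: sigma^2 * phi^k / (1 - phi^2) if h = k*p, else 0."""
    k, rem = divmod(h, p)
    return sigma**2 * phi**k / (1 - phi**2) if rem == 0 else 0.0

x = simulate_sar(200_000, p=3, phi=0.5, rng=1)
for h in range(7):
    emp = np.mean(x[:len(x) - h] * x[h:])          # empirical ACVF at lag h
    print(h, round(emp, 3), round(sar_acvf(h, 3, 0.5), 3))
```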
Given observations $\{x_1, \ldots, x_n\}$ from a stationary time series, let
$$\hat\gamma_h = \frac{1}{n}\sum_{t=1}^{n-h} x_t x_{t+h} \qquad \text{and} \qquad \hat\gamma_0^{(p)} = \frac{1}{n}\sum_{t=p+1}^{n-p} x_t^2$$
denote, respectively, the sample-based estimates of the ACVF at lag $h$ and the truncated ACVF at lag 0 obtained by omitting the first and last $p$ observations. For any given nonnegative constants $c_1$ and $c_2$, we can then define the generalized estimator of $\phi$,
$$\hat\phi_{c_1,c_2} = \frac{\sum_{t=p+1}^{n} x_t x_{t-p}}{c_1\sum_{t=1}^{p} x_t^2 + \sum_{t=p+1}^{n-p} x_t^2 + c_2\sum_{t=n-p+1}^{n} x_t^2} \equiv \frac{S}{c_1 T_1 + T_2 + c_2 T_3}. \tag{2}$$
Some of the more common estimators of $\phi$ and $\sigma^2$ can be shown to be the following special cases of (2) (Brockwell et al., 2005):

Least squares,
$$\hat\phi_{LS} = \hat\phi_{1,0} = \frac{S}{T_1 + T_2}. \tag{3}$$

Yule-Walker,
$$\hat\phi_{YW} = \frac{\hat\gamma_p}{\hat\gamma_0} = \hat\phi_{1,1} = \frac{S}{T_1 + T_2 + T_3}. \tag{4}$$

Burg,
$$\hat\phi_{BG} = \frac{2\hat\gamma_p}{\hat\gamma_0 + \hat\gamma_0^{(p)}} = \hat\phi_{1/2,1/2} = \frac{S}{T_1/2 + T_2 + T_3/2}. \tag{5}$$
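The generalized estimator (2) and its special cases (3)-(5) are straightforward to compute from a sample; the following sketch (ours; `phi_hat` is a hypothetical helper, and `simulate_sar` comes from the previous sketch) illustrates.

```python
import numpy as np

def phi_hat(x, p, c1, c2):
    """Generalized estimator phi_hat_{c1,c2} of eq. (2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    S  = np.sum(x[p:] * x[:n - p])   # sum_{t=p+1}^n x_t x_{t-p}
    T1 = np.sum(x[:p]**2)            # squares of the first p observations
    T2 = np.sum(x[p:n - p]**2)       # squares of the middle n-2p observations
    T3 = np.sum(x[n - p:]**2)        # squares of the last p observations
    return S / (c1*T1 + T2 + c2*T3)

# special cases (3)-(5)
x = simulate_sar(30, p=2, phi=0.5, rng=2)
print(phi_hat(x, 2, 1.0, 0.0),       # least squares
      phi_hat(x, 2, 1.0, 1.0),       # Yule-Walker
      phi_hat(x, 2, 0.5, 0.5))       # Burg
```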
Two main estimates of the white noise variance $\sigma^2$ can be viewed as functions of $\phi$. The first, arising from the Durbin-Levinson algorithm, we define as $\sigma^2_{AL}(\phi) \equiv (1-\phi^2)\hat\gamma_0$. The second, originating from a sum of squares expression in the likelihood, we define as $\sigma^2_{SS}(\phi) \equiv \hat\gamma_0 - 2\hat\gamma_p\phi + \hat\gamma_0^{(p)}\phi^2$. Note that these are both quadratics in $\phi$. $\sigma^2_{AL}(\phi)$ is bounded above with roots at $\pm 1$; while $\sigma^2_{SS}(\phi)$ is bounded below, and being a sum of squares, satisfies $\sigma^2_{SS}(\phi) \ge 0$ for all $|\phi| \le 1$. They intersect at $\phi = 0$ and $\phi = \hat\phi_{BG}$.

Remark 1. The Burg algorithm white noise variance estimate thus coincides with that from ML evaluated at the Burg estimator of $\phi$, i.e. $\sigma^2_{SS}(\hat\phi_{BG}) = \sigma^2_{AL}(\hat\phi_{BG})$.
A straightforward argument allows us to determine sufficient conditions for when the estimator $\hat\phi_{c_1,c_2}$ is causal, i.e. $|\hat\phi_{c_1,c_2}| < 1$. Note that by the Cauchy-Schwarz inequality
$$|S| \le (T_1 + T_2)^{1/2}(T_2 + T_3)^{1/2} \le [(T_1 + T_2) + (T_2 + T_3)]/2 = T_1/2 + T_2 + T_3/2,$$
where the second inequality follows from the fact that
$$0 \le \left(\sqrt{T_1 + T_2} - \sqrt{T_2 + T_3}\right)^2 = (T_1 + T_2) + (T_2 + T_3) - 2\sqrt{(T_1 + T_2)(T_2 + T_3)},$$
and therefore choosing $c_1 \ge \tfrac12$ and $c_2 \ge \tfrac12$ ensures $|\hat\phi_{c_1,c_2}| \le 1$.
Remark 2. It follows immediately from the above result that while both the YW and Burg estimators are always causal, this is not necessarily the case for the LSE. An immediate consequence of the causality of Burg is that $2|\hat\gamma_p| \le \hat\gamma_0 + \hat\gamma_0^{(p)}$.
It is easily shown that the $-2\log$ likelihood for the parameters is given by
$$L(\phi, \sigma^2) = n\log(2\pi\sigma^2) - p\log(1-\phi^2) + n\sigma^2_{SS}(\phi)/\sigma^2, \tag{6}$$
where
$$\sigma^2_{SS}(\phi) = \frac{1}{n}\left[(1-\phi^2)\sum_{t=1}^{p} X_t^2 + \sum_{t=p+1}^{n}(X_t - \phi X_{t-p})^2\right] = \hat\gamma_0 - 2\hat\gamma_p\phi + \hat\gamma_0^{(p)}\phi^2$$
is the MLE of $\sigma^2$ for fixed $\phi$. Substituting $\sigma^2_{SS}(\phi)$ for $\sigma^2$ in (6) gives the reduced (or profile) $-2\log$ likelihood
$$l(\phi) = n\log(\hat\gamma_0 - 2\hat\gamma_p\phi + \hat\gamma_0^{(p)}\phi^2) - p\log(1-\phi^2) \propto \log\frac{\sigma^2_{SS}(\phi)^n}{\sigma^2_{AL}(\phi)^p}, \tag{7}$$
whose minimizer in $(-1, 1)$ is the MLE of $\phi$, $\hat\phi_{ML}$.
3. Saddlepoint approximations for the estimators
Saddlepoint approximations are methods for obtaining accurate expressions for the probability density function
(pdf) and cumulative distribution function (cdf) of a random variable when it possesses a tractable moment generating
function (mgf), or equivalently its logarithm, the cumulant generating function (cgf). The basic idea for saddlepoint
approximating the pdf is to start from the inversion formula that expresses it as a complex-valued integral in terms of
the cgf. After deforming the contour associated with this integral so that it passes through the saddlepoint, a Laplace approximation is then used to obtain an expression for the pdf by retaining terms up to second order in the integrand.
The method, pioneered by Daniels (1954), can achieve an error rate of O(n−3/2 ) for independent and identically
distributed (iid) data over sets of bounded central tendency when the approximation is renormalized to integrate to one.
Approximations for the cdf can be obtained by either numerically integrating the renormalized saddlepoint pdf, or via
a direct saddlepoint approximation of the tail area developed by Lugannani and Rice (1980). An illuminating overview
of the topic is given by Goutis and Casella (1999).
In this section we obtain saddlepoint approximations to the distributions of the estimators $\hat\phi_{c_1,c_2}$ and $\hat\phi_{ML}$, by viewing them as solutions of estimating equations. We begin by defining the $(i,j)$ entry of the $n \times n$ matrix $A$ to be zero everywhere, except when $|i-j| = p$, in which case it is equal to $\tfrac12$. Similarly, define $B(c_1, c_2)$ to be the $n \times n$ identity matrix, with the first (last) $p$ diagonal elements multiplied by $c_1$ ($c_2$). If $X = [X_1, \ldots, X_n]'$ is a realization from model (1), we can then express the generalized estimator $\hat\phi_{c_1,c_2}$ as a ratio of quadratic forms in normal random variables
$$\hat\phi_{c_1,c_2} = \frac{X'AX}{X'B(c_1,c_2)X} \equiv \frac{Q_1}{Q_2(c_1,c_2)}.$$
(For simplicity we may sometimes suppress the dependence on $c_1$ and $c_2$ when writing $B$ and $Q_2$.) The joint mgf of $Q_1$ and $Q_2$ is given by
$$M_{Q_1,Q_2}(s,t) = E\exp\{X'(sA + tB)X\} = |I_n - 2\Gamma_n(sA + tB)|^{-1/2} \tag{8}$$
and is defined for all $s$ and $t$ such that $I_n - 2\Gamma_n(sA + tB)$ is positive definite. Defining $D(r) \equiv \Gamma_n(A - rB)$, with eigenvalues $d_1(r) < \cdots < 0 < \cdots < d_n(r)$, and the random variable $\psi(r) \equiv Q_1 - rQ_2 = X'(A - rB)X$, we obtain by linearity of the trace, $E\psi(r) = \mathrm{tr}[D(r)] = \sum_{i=1}^n d_i(r)$. The mgf of $\psi(r)$ is given by
$$M_{\psi(r)}(s) = E\exp\{X'(sA - srB)X\} = |\Delta(r,s)|^{-1/2}, \tag{9}$$
where $\Delta(r,s) \equiv I_n - 2sD(r)$. Its cgf is then $K_{\psi(r)}(s) = -\tfrac12\log|\Delta(r,s)|$, and using standard matrix calculus (e.g. Lütkepohl, 1996, Chapter 10), we find its first three derivatives to be
$$K^{(j)}_{\psi(r)}(s) = a_j\,\mathrm{tr}[(\Delta^{-1}(r,s)D(r))^j], \qquad j = 1, 2, 3, \tag{10}$$
where $a_1 = 1$, $a_2 = 2$, $a_3 = 8$.

Defining $h(x) \equiv x'Ax/(x'Bx)$, the lower and upper bounds on the support of $\hat\phi_{c_1,c_2}$ satisfy, with probability 1, $r_L = \min\{h(x) : 0 \ne x \in \mathbb{R}^n\}$ and $r_U = \max\{h(x) : 0 \ne x \in \mathbb{R}^n\}$. By differentiating $h(x)$, it follows immediately that any extremum point, $r^* = h(x^*)$, must satisfy the generalized eigenvalue problem
$$|A - r^*B| = 0, \tag{11}$$
so that $r_L$ and $r_U$ are, respectively, the minimum and maximum of such extremum points $r^*$. If $B$ is nonsingular, this reduces to the standard eigenvalue problem, so that $r_L = \lambda_{\min}(B^{-1}A)$ and $r_U = \lambda_{\max}(B^{-1}A)$, where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote, respectively, the smallest and largest eigenvalues of their arguments. If $B$ is singular, (11) must be solved by specialized numerical techniques (e.g. Parlett, 1980).
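As an illustration, here is a sketch (ours) of the support computation when $B$ is positive definite ($c_1, c_2 > 0$); it reproduces, for example, the Yule-Walker support of roughly $(-0.81, 0.81)$ for $n = 10$, $p = 3$ quoted in Section 4.

```python
import numpy as np
from scipy.linalg import eigh

def support_bounds(n, p, c1, c2):
    """Support endpoints (r_L, r_U) of phi_hat_{c1,c2} when B(c1,c2) is
    positive definite: the extreme eigenvalues of B^{-1} A, via the
    generalized symmetric-definite eigenproblem |A - r B| = 0."""
    A = np.zeros((n, n))
    i = np.arange(n - p)
    A[i, i + p] = A[i + p, i] = 0.5             # entry 1/2 where |i - j| = p
    d = np.ones(n); d[:p] = c1; d[-p:] = c2     # diagonal of B(c1, c2)
    vals = eigh(A, np.diag(d), eigvals_only=True)
    return vals[0], vals[-1]

print(support_bounds(10, 3, 1.0, 1.0))          # Yule-Walker, n=10, p=3
```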
3.1. The probability density functions
Daniels (1983) pioneered an approach that works directly with the equation that has as solution the estimator whose saddlepoint approximation is desired. Generalized estimator $\hat\phi_{c_1,c_2}$ is determined as the unique root in $r$ of the estimating equation
$$\psi(r) = Q_1 - rQ_2(c_1, c_2). \tag{12}$$
This estimating equation is itself a random variable, with cgf $K_{\psi(r)}(s) = -\tfrac12\ln|\Delta(r,s)|$. Since $\hat\phi_{c_1,c_2}$ is the unique root of (12) and $\psi(r)$ is a monotonically decreasing function in $r$ for every realization $X$, we have $\hat\phi_{c_1,c_2} \le r \Leftrightarrow \psi(r) \le 0$, which leads to the device $P(\hat\phi_{c_1,c_2} \le r) = P(\psi(r) \le 0)$.

Daniels (1983) shows via complex-analytic arguments that this device yields the following saddlepoint approximation to the pdf of $\hat\phi_{c_1,c_2}$ at $r$:
$$\hat f_{\hat\phi_{c_1,c_2}}(r|\phi) = |\dot K_{\psi(r)}(\hat s)/\hat s|\,[2\pi K''_{\psi(r)}(\hat s)]^{-1/2}\exp\{K_{\psi(r)}(\hat s)\} \tag{13}$$
$$= \frac{|\mathrm{tr}[\Delta^{-1}(r,\hat s)\,\partial D(r)/\partial r]|}{2\sqrt{\pi}\,\sqrt{|\Delta(r,\hat s)|\;\mathrm{tr}[(\Delta^{-1}(r,\hat s)D(r))^2]}}, \tag{14}$$
where $\dot K_{\psi(r)}(s) = \partial K_{\psi(r)}(s)/\partial r$, and $\hat s$ solves the saddlepoint equation
$$\partial K_{\psi(r)}(s)/\partial s\big|_{s=\hat s} \equiv K'_{\psi(r)}(\hat s) = \mathrm{tr}[\Delta^{-1}(r,\hat s)D(r)] = 0. \tag{15}$$
The saddlepoint is the solution in the neighborhood of zero whose endpoints satisfy $|I_n - 2sD(r)| = 0$. Since this occurs when $(2s)^{-1}$ is any eigenvalue of $D(r)$, $\hat s$ is the unique solution to (15) in the interval
$$(2d_1(r))^{-1} < \hat s < (2d_n(r))^{-1}. \tag{16}$$
Notice that the saddlepoint distribution depends on the true value of $\phi$. This dependence enters (14) through $\Gamma_n$.
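Assembled into code, the density approximation might be evaluated as follows (a sketch under our reading of (13)-(16); the root bracketing, tolerances, and helper names are ours).

```python
import numpy as np
from scipy.optimize import brentq

def sar_gamma(n, p, phi):
    """Covariance matrix Gamma_n of model (1) with sigma^2 = 1."""
    h = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.where(h % p == 0, phi**(h // p) / (1 - phi**2), 0.0)

def saddlepoint_pdf(r, n, p, phi, c1, c2):
    """Density approximation (13)-(14) for phi_hat_{c1,c2} at r, true value phi."""
    A = np.zeros((n, n)); i = np.arange(n - p)
    A[i, i + p] = A[i + p, i] = 0.5              # entry 1/2 where |i - j| = p
    d = np.ones(n); d[:p] = c1; d[-p:] = c2      # diagonal of B(c1, c2)
    G = sar_gamma(n, p, phi)
    D = G @ (A - r * np.diag(d))                 # D(r) = Gamma_n (A - r B)
    dD = -G @ np.diag(d)                         # dD(r)/dr = -Gamma_n B
    ev = np.sort(np.linalg.eigvals(D).real)
    lo, hi = 1/(2*ev[0]), 1/(2*ev[-1])           # interval (16) containing s_hat
    I = np.eye(n)
    K1 = lambda s: np.trace(np.linalg.solve(I - 2*s*D, D))   # K'(s), eq. (10)
    eps = 1e-6 * (hi - lo)
    s = brentq(K1, lo + eps, hi - eps)           # saddlepoint equation (15)
    Dinv = np.linalg.inv(I - 2*s*D)              # Delta^{-1}(r, s_hat)
    num  = abs(np.trace(Dinv @ dD))              # |K_dot(s_hat) / s_hat|
    det  = np.linalg.det(I - 2*s*D)              # |Delta(r, s_hat)|
    tr2  = np.trace((Dinv @ D) @ (Dinv @ D))     # tr[(Delta^{-1} D)^2]
    return num / (2 * np.sqrt(np.pi * det * tr2))
```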
The validity of (13) requires that the Jacobian term have a removable singularity at $s = 0$, i.e.
$$\lim_{s \to 0}|\dot K_{\psi(r)}(s)/s| < \infty. \tag{17}$$
This condition is satisfied for a SAR(p) model, since (12) is a linear combination of quadratic forms, whence we obtain immediately
$$\dot K_{\psi(r)}(s) = \frac{-s\,\partial M_{Q_1,Q_2}(s,t)/\partial t\big|_{t=-rs}}{M_{\psi(r)}(s)},$$
so that $|\dot K_{\psi(r)}(s)/s| \to EQ_2 < \infty$ as $s \to 0$.
The saddlepoint approximation for the distribution of the MLE can be obtained in similar fashion. Differentiating (7) enables us to determine that the MLE of $\phi$, $\hat\phi_{ML}$, is a root in $r$ of the estimating equation
$$\vartheta(r) = (n-p)\hat\gamma_0^{(p)}r^3 - (n-2p)\hat\gamma_p r^2 - (n\hat\gamma_0^{(p)} + p\hat\gamma_0)r + n\hat\gamma_p \tag{18}$$
in the interval $[-1, 1]$. With the definitions
$$\beta_1 \equiv -\frac{(n-2p)\hat\gamma_p}{(n-p)\hat\gamma_0^{(p)}}, \qquad \beta_2 \equiv -\frac{n\hat\gamma_0^{(p)} + p\hat\gamma_0}{(n-p)\hat\gamma_0^{(p)}}, \qquad \beta_3 \equiv \frac{n\hat\gamma_p}{(n-p)\hat\gamma_0^{(p)}},$$
$\hat\phi_{ML}$ satisfies the cubic written in standard form, $r^3 + \beta_1 r^2 + \beta_2 r + \beta_3 = 0$. If we now define
$$Q \equiv \frac{3\beta_2 - \beta_1^2}{9}, \qquad R \equiv \frac{9\beta_1\beta_2 - 27\beta_3 - 2\beta_1^3}{54}, \qquad D \equiv Q^3 + R^2,$$
we arrive at the following result, whose proof is deferred to the Appendix.

Proposition 1. The MLE of $\phi$ in model (1) is given by
$$\hat\phi_{ML} = 2\sqrt{-Q}\,\cos\!\left(\frac{1}{3}\arccos\frac{R}{\sqrt{-Q^3}} + \frac{4\pi}{3}\right) - \frac{\beta_1}{3}. \tag{19}$$
In the proof of this result we show that (18) has three real roots, $r_1 < -1 \le r_2 \le 1 < r_3$, and by definition the MLE is therefore $r_2$. As such, one of two conditions must be satisfied in the interval $[-1, 1]$: either (i) $\hat\phi_{ML} \le r \Leftrightarrow \vartheta(r) \le 0$ or (ii) $\hat\phi_{ML} \le r \Leftrightarrow \vartheta(r) \ge 0$. The fact that the coefficient of $r^3$ in $\vartheta(r)$ is positive then implies that $\vartheta(r) \ge 0$ on the interval $[-1, \hat\phi_{ML}]$. For a SAR(p) therefore, condition (i) is satisfied, and we thus have the device $P(\hat\phi_{ML} \le r) = P(\vartheta(r) \le 0)$. With $\vartheta(r)$ replacing $\psi(r)$,
$$D(r) = \Gamma_n\{[(n-p)r^3 - nr]B(0,0) + [n - (n-2p)r^2]A - rpB(1,1)\},$$
and $\Delta(r,s) = I_n - 2sD(r)$, this yields the saddlepoint approximation to the density of $\hat\phi_{ML}$ as in (14). As before, this approximation requires that the Jacobian term have a removable singularity at $s = 0$. For the MLE of the SAR(p) model, this condition is satisfied once again since $\vartheta(r)$ is a linear combination of quadratic forms. The technique of bounding the saddlepoint described above in (16) also carries over immediately to this setting.
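A sketch of Proposition 1 in code (ours; it can be sanity-checked against a grid minimization of the profile likelihood (7), here for $n = 30$, $p = 2$, reusing `simulate_sar` from the Section 2 sketch).

```python
import numpy as np

def sar_mle(x, p):
    """Closed-form MLE of phi via the trigonometric root (19) of the cubic (18)."""
    x = np.asarray(x, dtype=float); n = len(x)
    g0  = np.sum(x**2) / n                 # gamma_hat_0
    gp  = np.sum(x[p:] * x[:n - p]) / n    # gamma_hat_p
    g0p = np.sum(x[p:n - p]**2) / n        # gamma_hat_0^(p)
    b1 = -(n - 2*p) * gp / ((n - p) * g0p)
    b2 = -(n * g0p + p * g0) / ((n - p) * g0p)
    b3 = n * gp / ((n - p) * g0p)
    Q = (3*b2 - b1**2) / 9
    R = (9*b1*b2 - 27*b3 - 2*b1**3) / 54
    return 2*np.sqrt(-Q) * np.cos(np.arccos(R / np.sqrt(-Q**3))/3
                                  + 4*np.pi/3) - b1/3

# sanity check against a grid minimization of the profile likelihood (7)
x = simulate_sar(30, p=2, phi=0.5, rng=4)
grid = np.linspace(-0.999, 0.999, 20001)
l = 30*np.log(np.sum(x**2)/30 - 2*np.sum(x[2:]*x[:-2])/30*grid
              + np.sum(x[2:28]**2)/30*grid**2) - 2*np.log(1 - grid**2)
print(sar_mle(x, 2), grid[np.argmin(l)])   # should agree closely
```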
Although renormalized saddlepoint density approximations can generally attain relative errors of O(n−3/2 ) for iid
data over sets of bounded central tendency, this is not necessarily the case in our dependent data setting. Liebermann
(1994a) notes the difficulty in determining the order of errors for saddlepoint approximations to distributions of ratios
of quadratic forms. In fact, such results have been obtained analytically in only a few instances. Liebermann (1994b)
develops a saddlepoint approximation for the LSE in an AR(1) model which is shown to have an error of O(n−1 ). Wang
(1992) develops a different saddlepoint approximation for the LSE in an AR(1) and shows that his approximation also
has error O(n−1 ). We do not consider order of error calculations in this paper, but present instead results from several
simulation studies which will demonstrate the remarkable accuracy of our saddlepoint-based procedures.
3.2. The cumulative distribution functions
The cdf of $\hat\phi_{c_1,c_2}$ at $r$ can be expressed as
$$F_{\hat\phi_{c_1,c_2}}(r|\phi) = P\left(\frac{Q_1}{Q_2(c_1,c_2)} \le r\right) = P(Q_1 - rQ_2(c_1,c_2) \le 0) = P(\psi(r) \le 0),$$
where $r \in (r_L, r_U)$ lies in the interior of the support of $\hat\phi_{c_1,c_2}$. The Lugannani and Rice (1980) approximation to the cdf of $\hat\phi_{c_1,c_2}$ at $r$ can then be defined in terms of the approximation to the cdf of $\psi(r)$ at 0 as
$$\hat F_{\hat\phi_{c_1,c_2}}(r|\phi) = \hat F_{\psi(r)}(0|\phi) = \begin{cases} \Phi(\hat w) + \phi(\hat w)[\hat w^{-1} - \hat u^{-1}] & \text{if } E[\psi(r)] \ne 0, \\[4pt] \dfrac12 + \dfrac{1}{\sqrt{72\pi}}\,K'''_{\psi(r)}(0)K''_{\psi(r)}(0)^{-3/2} & \text{if } E[\psi(r)] = 0, \end{cases}$$
where $\Phi(\cdot)$ and $\phi(\cdot)$ denote, respectively, the cdf and pdf of a standard normal random variable, $\hat w = \mathrm{sgn}(\hat s)[-2K_{\psi(r)}(\hat s)]^{1/2}$, and $\hat u = \hat s[K''_{\psi(r)}(\hat s)]^{1/2}$. As before, $\hat s$ solves saddlepoint equation (15) in the interval defined by (16).

The saddlepoint approximation to the cdf of the MLE $\hat\phi_{ML}$ at point $r$, $\hat F_{\hat\phi_{ML}}(r|\phi)$, is obtained similarly by replacing $\psi(r)$ with $\vartheta(r)$ in the expression for $\hat F_{\psi(r)}(0|\phi)$ above.
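The tail-area formula translates directly into code; the sketch below (ours) evaluates the approximation to $P(\psi(r) \le 0)$ from the matrix $D(r)$ of Section 3, omitting the degenerate branch where $E[\psi(r)] = 0$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def lr_cdf_at_zero(D):
    """Lugannani-Rice approximation to P(psi(r) <= 0), where psi(r) has cgf
    K(s) = -0.5 log|I - 2 s D(r)|; the E[psi(r)] = 0 branch is omitted."""
    n = D.shape[0]
    ev = np.sort(np.linalg.eigvals(D).real)
    lo, hi = 1/(2*ev[0]), 1/(2*ev[-1])                       # interval (16)
    I = np.eye(n)
    K  = lambda s: -0.5 * np.linalg.slogdet(I - 2*s*D)[1]
    K1 = lambda s: np.trace(np.linalg.solve(I - 2*s*D, D))   # eq. (10), j = 1
    eps = 1e-6 * (hi - lo)
    s = brentq(K1, lo + eps, hi - eps)                       # eq. (15)
    Dinv = np.linalg.inv(I - 2*s*D)
    K2 = 2 * np.trace((Dinv @ D) @ (Dinv @ D))               # eq. (10), j = 2
    w = np.sign(s) * np.sqrt(-2.0 * K(s))
    u = s * np.sqrt(K2)
    return norm.cdf(w) + norm.pdf(w) * (1.0/w - 1.0/u)
```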
Remark 3. Note that all the estimators of $\phi$ we consider are formed from ratios of quadratic forms and are therefore independent of the scale parameter $\sigma^2$. Ratios of quadratic forms are invariant under scale transformations of $X$, and therefore have a distribution which does not depend on $\sigma^2$. Thus, when inference on $\phi$ is the objective, no generality is lost by restricting attention to the special case $\sigma^2 = 1$, and we do so for the remainder of the paper.
4. Plots of densities and optimality of the Burg estimator
In this section we compute some saddlepoint approximations to the densities of the various estimators of $\phi$ under model (1). It will be seen that the density of the Burg estimator closely tracks that of the MLE. Two possible reasons for this phenomenon are discussed. The ML, YW, and Burg estimators all have the same asymptotic distribution under model (1) (e.g. Brockwell et al., 2004). If $\hat\phi_{AN}$ designates any of these estimators, then asymptotically $\hat\phi_{AN} \sim N(\phi, (1-\phi^2)/n)$. The abbreviation AN will be used when referring to this large sample distribution.

Fig. 1 shows the normalized saddlepoint approximations to the pdfs of the ML, Burg, and YW estimators, when a sample of size 10 is drawn from model (1). Plots are presented for true values of $\phi = 0.5$ and $0.9$, respectively, in combination with $p = 1$ and 3. The emerging pattern, particularly visible in the plots at right, is how closely Burg tracks the ML pdf, whilst YW gains increasing bias with larger values of $p$. This is in agreement with empirical results that find the likelihoods of models fitted via Burg to be consistently larger than those fitted via YW. The limiting AN distribution is shown for reference. Note also that while the support of the Burg and ML estimators is always $(-1, 1)$, that of YW narrows as $p \to \infty$, being $(-0.96, 0.96)$ and $(-0.81, 0.81)$ for $p = 1$ and 3, respectively.
Although the distribution of the Burg estimator tends to more closely track that of the MLE than is the case for YW, it is important to bear in mind that in finite samples no optimality property is conferred on the MLE. The efficiency of the latter is only guaranteed asymptotically. Nevertheless, we can see at least two possible reasons for explaining why the distribution of Burg tends to be "closer" to that of the MLE. The first is revealed when one examines the asymptotic expansions derived by Fujikoshi and Ochi (1984) for the respective cdfs. In the $p = 1$ case, they obtain third order Edgeworth expansions for standardized versions of the cdfs of $\hat\phi_{c_1,c_2}$ and $\hat\phi_{ML}$, which are correct to $O(n^{-1})$. Specifically, they show that
$$P\left(\sqrt{n}\,(\hat\phi_{ML} - \phi)/\sqrt{1-\phi^2} \le r\right) = G_{ML}(r, n, \phi) + o(n^{-1})$$
[Fig. 1. Saddlepoint approximations to the pdfs of the ML, Burg (BG), and YW estimators of $\phi$ in model (1) with $\sigma = 1$, based on samples of size $n = 10$. In the top (bottom) plots, the true value of the parameter is $\phi = 0.5$ ($\phi = 0.9$). The plots on the left (right) have $p = 1$ ($p = 3$). The limiting asymptotically normal distribution (AN) for the three estimators is shown for reference.]
for some function $G_{ML}(r, n, \phi)$ depending on $r$, $n$, and $\phi$. When $c_1 + c_2 = 1$, they show further that
$$P\left(\sqrt{n}\,(\hat\phi_{c_1,c_2} - \phi)/\sqrt{1-\phi^2} \le r\right) = G_{ML}(r, n, \phi) - \frac{r\,\phi(r)\,\phi^2}{n(1-\phi^2)}\,g(c_1) + o(n^{-1}),$$
where $g(c_1) = 2c_1(c_1 - 1) + 1$ is minimized for $c_1 = \tfrac12$. The implication is that for the cdfs of $\hat\phi_{c_1,c_2}$ and $\hat\phi_{ML}$ to differ only in terms $O(n^{-1})$, we must have $c_1 + c_2 = 1$, and this difference is then minimized for $c_1 = \tfrac12 = c_2$; the Burg estimator.
The second reason involves the concept of estimating equation optimality introduced by Godambe (1960). Since $\vartheta(\phi)$ in (18) is a score function, it is automatically an unbiased estimating equation (Godambe, 1960). It is easy to show that
$$E[\psi(\phi)] = p\phi\sigma^2[1 - (c_1 + c_2)]/(1-\phi^2),$$
so that $\psi(\phi)$ defined by (12) is unbiased if and only if $c_1 + c_2 = 1$. (Note that for YW $c_1 + c_2 = 2$, and the bias of $\psi(\phi)$ approaches infinity in absolute value as $|\phi| \to 1$.) Godambe (1960) defines an estimating equation $\psi^*(\phi)$ for parameter $\phi \in \Theta$ to be optimal, if it is unbiased and if
$$E[\psi^*(\phi)/E(\partial\psi^*(\phi)/\partial\phi)]^2 \le E[\psi(\phi)/E(\partial\psi(\phi)/\partial\phi)]^2, \qquad \forall\phi \in \Theta, \tag{20}$$
for any other unbiased estimating equation $\psi(\phi)$ in a given class. This definition is motivated by observing that when $\phi$ is the true value, $\psi(\phi)$ should be as close as possible to zero, and $\psi(\phi + \delta)$ as far as possible from zero. More details and justifications can be found in Godambe (1991, Chapter 1). Now, for the class of unbiased estimators $\hat\phi_{c_1,c_2}$, consider finding the value of $c_1 = 1 - c_2$ that makes $\psi(\phi)$ optimal in this sense. A straightforward computation reveals that $E[\partial\psi(\phi)/\partial\phi] = -\sigma^2(n-p)/(1-\phi^2)$, which does not depend on $c_1$. Therefore (20) reduces to minimizing $E[\psi(\phi)]^2$ over all $c_1 \in \mathbb{R}$. A tedious computation presented in the Appendix shows $c_1 = \tfrac12$ to be the minimizer; again the Burg estimator. We state this formally as a proposition for ease of reference.
Proposition 2. For a random vector $X = [X_1, \ldots, X_n]'$ from model (1), let $c_1$ be any real number and
$$S = \sum_{t=p+1}^{n} X_tX_{t-p}, \quad T_1 = \sum_{t=1}^{p} X_t^2, \quad T_2 = \sum_{t=p+1}^{n-p} X_t^2, \quad T_3 = \sum_{t=n-p+1}^{n} X_t^2.$$
If $\psi(c_1; \phi) = S - \phi[T_2 + T_3 + c_1(T_1 - T_3)]$, then $c_1 = 1/2$ is the minimizer of $E[\psi(c_1; \phi)]^2$.
5. A comparative study of confidence intervals
In this section we show how a confidence interval (c.i.) for $\phi$ can be constructed, by pivoting the saddlepoint approximation to the cdf. This essentially involves inverting the p-value in a corresponding two-sided hypothesis test. We also compare the lengths and coverage probabilities of the resulting c.i.s with those obtained from standard asymptotic theory.

Let $\hat F_{\hat\phi}(\cdot\,|\,\phi_0, \sigma^2)$ denote the saddlepoint approximation to the cdf of the estimator $\hat\phi$ in a SAR(p) model with known $\sigma^2$, under a hypothesized value of $\phi = \phi_0$. A test of $H_0: \phi = \phi_0$ versus $H_a: \phi \ne \phi_0$ at significance level $\alpha$ would then fail to reject $H_0$ if $\alpha/2 \le \hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_0, \sigma^2) \le 1 - \alpha/2$, where $\hat\phi_{obs}$ is the value of $\hat\phi$ computed from the observed sample. Since failure to reject $H_0$ at level $\alpha$ is equivalent to $\phi_0$ being in the associated $(1-\alpha)100\%$ c.i. for $\phi$, we determine the lower and upper confidence bounds for this interval, $\phi_L$ and $\phi_U$, by solving the pair of equations
$$\hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_L, \sigma^2) = 1 - \alpha/2 \qquad \text{and} \qquad \hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_U, \sigma^2) = \alpha/2. \tag{21}$$
The resulting confidence set is guaranteed to be an interval if $\hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi, \sigma^2)$ is monotone in $\phi$ (Casella and Berger, 2002, Theorem 9.2.12). In our case simulations show that $\hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi, \sigma^2)$ is actually monotone decreasing, and hence $\phi_L$ is associated with the higher quantile. If, as is usually the case, $\sigma^2$ is unknown, Remark 3 means that we can simply set $\sigma^2 = 1$ in (21). Each of these equations can be solved via a grid search over a subinterval of $[-1, 1]$, with a step size of say $10^{-3}$. To find $\phi_U$ for example, select the closest values $\phi_{U,1}$ and $\phi_{U,2}$ such that $\hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_{U,1}, 1) \le \alpha/2 \le \hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_{U,2}, 1)$. Then take the linear interpolation, $\phi_U = (1-d)\phi_{U,1} + d\phi_{U,2}$, where
$$d = \frac{\alpha/2 - \hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_{U,1}, 1)}{\hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_{U,2}, 1) - \hat F_{\hat\phi}(\hat\phi_{obs}\,|\,\phi_{U,1}, 1)}.$$
Letting $z_{1-\alpha/2}$ denote the $(1-\alpha/2)$ quantile from a standard normal distribution, the corresponding $(1-\alpha)100\%$ c.i. from asymptotic theory would be
$$\hat\phi_{obs} \pm z_{1-\alpha/2}\sqrt{(1 - \hat\phi_{obs}^2)/n}. \tag{22}$$
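A minimal sketch of this interval construction (ours; `F_hat(phi_obs, phi0)` stands for any saddlepoint cdf approximation $\hat F_{\hat\phi}(\hat\phi_{obs}|\phi_0, 1)$, e.g. built from the Lugannani-Rice sketch of Section 3.2).

```python
import numpy as np
from scipy.stats import norm

def saddlepoint_ci(phi_obs, F_hat, alpha=0.05, step=1e-3):
    """Invert the pair of equations (21) by grid search plus linear
    interpolation; F_hat(phi_obs, phi0) approximates P(phi_hat <= phi_obs)
    under phi = phi0 (with sigma^2 = 1, by Remark 3)."""
    grid = np.arange(-0.999, 0.999 + step, step)
    F = np.array([F_hat(phi_obs, g) for g in grid])   # decreasing in phi0
    def cross(level):
        i = np.searchsorted(-F, -level)               # first index with F <= level
        d = (level - F[i]) / (F[i-1] - F[i])          # interpolation weight
        return (1 - d) * grid[i] + d * grid[i-1]
    return cross(1 - alpha/2), cross(alpha/2)         # (phi_L, phi_U)

def asymptotic_ci(phi_obs, n, alpha=0.05):
    """The asymptotic interval (22)."""
    half = norm.ppf(1 - alpha/2) * np.sqrt((1 - phi_obs**2) / n)
    return phi_obs - half, phi_obs + half
```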
Example 1. Suppose a sample of size $n = 30$ from a SAR(p = 2) model produces an observed value of $\hat\phi_{ML} = 0.4$. The corresponding 95% saddlepoint and asymptotic c.i.s would be (0.050, 0.755) and (0.072, 0.728), respectively.
We now turn our attention to assessing the performance of these two types of c.i.s, saddlepoint and asymptotic, in terms of interval length and coverage probability. Since the distribution of the Burg estimator is very close to that of the MLE, we will present results only for YW and Burg. In addition, we focus only on sample size n = 10 (small samples); the differences and problems observed in the ensuing discussion diminish rapidly for sample sizes in excess of about 30 (large samples). The nominal significance level is set to $\alpha = 0.05$ throughout.

Consider first c.i. length. On close inspection of Eqs. (21) and (22), we note that the lengths of the resulting intervals depend only on $\alpha$ and the estimated $\phi$. In fact, there is a functional relationship between the lengths of the saddlepoint and asymptotic intervals, parameterized by $\alpha$ and $\hat\phi_{obs}$, as can be observed in Fig. 2. While the asymptotic c.i.s have smaller lengths over estimates close to zero, the saddlepoint c.i.s are generally shorter for estimated values of $\phi$ larger than about 0.5. Although YW is consistently shorter than Burg, it suffers from problems as the estimates approach ±1. This is particularly evident when p = 3, and happens because the YW saddlepoint density is only supported on (−0.81, 0.81) there, while for p = 1 the support is (−0.96, 0.96). The Burg saddlepoint density is always supported on (−1, 1), irrespective of p. (See also the bottom right plot of Fig. 1.)
[Fig. 2. Saddlepoint and asymptotic 95% c.i. lengths for $\phi$ based on the YW and Burg estimators, as a function of the estimated $\phi$. Lengths are based on Eqs. (21) and (22), for n = 10 and p = 1, 3.]
[Fig. 3. Asymptotic and saddlepoint Burg and YW c.i. coverage and underage probabilities based on 1000 simulated realizations from SAR(p) models with n = 10 and p = 1. The nominal levels of 0.95 and 0.025 are indicated by the solid horizontal line.]
As $\hat\phi_{obs}$ approaches the upper boundary of the support, $r_U$, $\phi_U$ eventually becomes equal to $r_U$, and cannot increase further. For small samples therefore, YW saddlepoint c.i.s cannot be constructed if the estimates are close to ±1.

Consider now Burg and YW c.i. coverage and underage probabilities for p = 1 and 3, presented in Figs. 3 and 4, respectively. These plot the proportion of 1000 simulated realizations from the SAR(p) model for which the true value of $\phi$ falls inside (coverage) and below (underage) the respective c.i. based on the estimated $\phi$. The nominal levels of 0.95 and 0.025 are indicated by the solid horizontal line. We observe that the saddlepoint c.i. coverages and underages (and therefore overages) are very close to their nominal levels. With the exception of YW with p = 1, the asymptotic coverages and underages are generally not as good, and fare worse for larger p. Note how the YW saddlepoint coverages and underages become zero once the true $\phi$ exceeds the support of the sampling density. In summary, saddlepoint c.i.s outperform asymptotic c.i.s; the latter are sometimes shorter but this comes at the expense of accuracy of coverage.

[Fig. 4. Asymptotic and saddlepoint Burg and YW c.i. coverage and underage probabilities based on 1000 simulated realizations from SAR(p) models with n = 10 and p = 3. The nominal levels of 0.95 and 0.025 are indicated by the solid horizontal line.]
6. Assessing robustness of the confidence intervals
Pivoting or inverting the approximated cdf of the sampling distribution of the estimator of interest, as proposed in
the previous section, is a method of constructing confidence intervals that is gaining in popularity as computing power
increases. An early attempt at this was undertaken by Andrews (1993) in the context of the LSE for an AR(1). Since from
(3) the LSE can be written as a ratio of quadratic forms in normal random variables, Andrews (1993) used the Imhof
(1961) algorithm to compute the cdf. More recently, Broda et al. (2007) have increased the accuracy and efficiency of
this approach by utilizing saddlepoint approximations to compute both the pdf and cdf.
Our method likewise saddlepoint approximates the cdf of the estimator, and there is consequently some overlap with
the work of Broda et al. (2007). Since these approaches work by numerical inversion of the characteristic function, they
should all perform similarly for the LSE, or for that matter any of the generalized estimators $\hat\phi_{c_1,c_2}$. However, while
the other methods can only deal with the cdf of estimators that can be expressed as linear combinations of chi-square
random variates (which rules out the MLE), our method applies in the less restrictive case where the mgf of merely
the estimating equation that defines the estimator of interest can be computed (or approximated, see e.g. Easton and
Ronchetti, 1986). Additionally, the saddlepoint method enjoys some numerical advantages over other approximation
methods as pointed out by Butler and Paolella (1998b).
A natural question that arises when a new method is proposed is the issue of robustness with respect to departures from
the assumptions. This was also addressed by Andrews (1993) for the LSE and Broda et al. (2007) for its bias-adjusted
variants, using a variety of nonnormal distributions for the driving white noise or innovations. A moderate amount
of robustness was found. Our intent in this section is to build on this work by investigating the robustness properties
of the MLE. Since Burg closely approximates the MLE, we will focus on it instead due to its smaller computational burden.

[Fig. 5. Asymptotic (dots), bootstrap (dots-dashes), and saddlepoint (dashes) Burg c.i. coverage and underage probabilities based on 1000 simulated realizations from an AR(1) model of sample size n = 10. The model is driven by symmetric Laplace noise in the top plots ($\kappa = 1$), and right-skewed Laplace noise in the bottom plots ($\kappa = 0.5$). The nominal confidence levels of 0.95 and 0.025 are indicated by the solid horizontal lines. The bootstrap values are based on 500 resamplings.]
For the innovations we will consider the family of Asymmetric Laplace (AL) distributions introduced by Kotz et al. (2001). These generalize the symmetric Laplace location-scale family to allow for skewness. Our choice in considering it here is motivated by the fact that the AL distribution is leptokurtic (heavy-tailed) and shows promise in financial modeling (Kotz et al., 2001, Chapter 8). Its rich structure also allows natural extensions to stable laws, commonly used in such applications (Kotz et al., 2001, Chapter 4). We will draw $\{Z_t\}$ of model (1) from an AL($\theta$, $\kappa$, $\sigma$) distribution with scale parameter $\sigma = 1$, skewness parameter $0 < \kappa \le 1$, and location parameter $\theta = (\kappa - 1/\kappa)/\sqrt{2}$. These choices of location and scale will ensure the innovations have zero mean and unit variance. The choice of skewness $\kappa = 1$ corresponds to the standard symmetric Laplace case, with $\kappa \downarrow 0$ producing an increasingly right-skewed distribution.
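Such innovations are easy to generate via the standard difference-of-exponentials representation of the AL law; below is a sketch (ours; the representation is standard for this parameterization, and the location choice follows the text).

```python
import numpy as np

def al_noise(size, kappa, sigma=1.0, rng=None):
    """AL(theta, kappa, sigma) innovations with theta = (kappa - 1/kappa)/sqrt(2),
    generated as a scaled difference of independent standard exponentials."""
    rng = np.random.default_rng(rng)
    theta = (kappa - 1.0/kappa) / np.sqrt(2.0)
    e1 = rng.exponential(size=size)
    e2 = rng.exponential(size=size)
    return theta + (sigma/np.sqrt(2.0)) * (e1/kappa - kappa*e2)

z = al_noise(10_000, kappa=0.5, rng=0)
print(z.mean())   # approximately 0 by the location choice above
```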
Fig. 5 shows the results of a simulation study similar to those of Section 5, by displaying coverage and underage probabilities for samples of size 10 drawn from an AR(1) model driven by the above AL noise. The top plots are for symmetric noise ($\kappa = 1$), while the bottom plots correspond to right-skewed noise with $\kappa = 0.5$. Ninety-five percent c.i.s for three estimators are compared: asymptotic as before, saddlepoint as before, and bootstrap. The latter is obtained by resampling with replacement 500 times from the estimated innovations, and obtaining the 0.025 and 0.975 quantiles empirically from the resulting cdf. This is the standard bootstrap method for ARMA models; see e.g. Example 2.30 of Shumway and Stoffer (2000) for more details.
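A sketch of this bootstrap for the Burg estimator (ours; it reuses the hypothetical `phi_hat` helper from the Section 2 sketch and resamples the estimated innovations with replacement).

```python
import numpy as np

def bootstrap_ci(x, p, n_boot=500, alpha=0.05, rng=None):
    """Percentile bootstrap c.i. for phi: resample estimated innovations,
    regenerate series from the fitted SAR(p), and re-estimate via Burg."""
    rng = np.random.default_rng(rng)
    n = len(x)
    phi0 = phi_hat(x, p, 0.5, 0.5)            # Burg point estimate
    resid = x[p:] - phi0 * x[:n - p]          # estimated innovations
    reps = np.empty(n_boot)
    for b in range(n_boot):
        z = rng.choice(resid, size=n, replace=True)
        xb = np.zeros(n)
        xb[:p] = x[:p]                        # initialize with the data
        for t in range(p, n):
            xb[t] = phi0 * xb[t - p] + z[t]   # regenerate the series
        reps[b] = phi_hat(xb, p, 0.5, 0.5)
    return tuple(np.quantile(reps, [alpha/2, 1 - alpha/2]))
```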
Both the asymptotic and saddlepoint coverages are mostly consistently below the nominal level, the latter being substantially better than the former. The bootstrap coverages are systematically and severely off the nominal level for anything other than $0.4 < \phi < 0.6$. This may be due to the small sample size entertained here. The remarkable feat of the saddlepoint coverages is that they are only slightly off, despite the fact that they correspond to the only incorrect method here. That is, while the asymptotic and bootstrap methods are applicable in any situation where the innovations are iid (regardless of distribution), the saddlepoint method relies on the innovations being both iid and normal, and is therefore misspecified in this situation. It could in principle be improved by computing (or approximating) the correct cgf of $\psi(r)$ in (12) for the case of AL noise.
7. An application to repeated measurements data
Although the applicability of the usual AR(1) model is ubiquitous, being appropriate whenever serial correlation decays geometrically with time lag, the reader might question the practical utility of a SAR(p) model with $p \ge 2$. In this section we demonstrate a novel application of a general SAR(p) model to analyzing repeated measurements or longitudinal data.
Recall from Section 2 that the autocorrelation function (ACF) of model (1) at lag $h$ is $\rho_h \equiv \gamma_h/\gamma_0 = \phi^k$ if $h = kp$ is an integer multiple of $p$, and zero otherwise. Likewise, it can be shown that the partial autocorrelation function (PACF) at lag $p$ is just $\phi$, and zero at all other positive lags. In using sample ACF and PACF plots as model identification tools, the characteristic signature of a SAR(p) model would therefore have gaps indistinguishable from a white noise sequence at all lags, except at the appropriate multiples of p. In particular, any contiguous stretch of p − 1 observations is serially uncorrelated, with observations separated by p lags having a geometrically decaying correlation structure. This autocorrelation structure would be identical to one formed by concatenating together p independent realizations of an AR(1) process, grouping first by realization and then by time. This is well suited for analyzing several independent short sequences of repeated measurements prevalent in longitudinal data, each sequence emanating from a different subject. Specifically, let $Y_i(t_j)$ denote the response for subject $i$, $i = 1, \ldots, m$, observed at time $t_j$, $j = 1, \ldots, n$. A parametric longitudinal data analysis would typically proceed by grouping repeated measurements for each subject into vectors, $Y_i = [Y_i(t_1), \ldots, Y_i(t_n)]'$, and then proposing a multivariate normal model, $Y_i \sim N(\mu, \Sigma)$, $i = 1, \ldots, m$. The hope is that the nonstationary behavior of each time series can be captured by the mean profile, here entertained as being identical for all subjects. Some stationary covariance structure is then postulated for the matrix $\Sigma$, so that the number of parameters is reduced and useful inferences can be drawn. The choice of covariance structure can be gleaned by inspecting for example the sample variogram (Diggle, 1990).
An alternative time series-based analysis of such data when the time gaps are equally-spaced would proceed as follows. Order the $Y_i(t_j)$ by subject within time,
$$Y_1(t_1), \ldots, Y_m(t_1), Y_1(t_2), \ldots, Y_m(t_2), \ldots, Y_1(t_n), \ldots, Y_m(t_n), \tag{23}$$
and call this rearranged series $\{X_t\}$, $t = 1, \ldots, mn$. If the observations within each subject follow an AR(1) with coefficient $\phi$, then $\{X_t\}$ will follow a SAR(m) with coefficient $\phi$. It is then straightforward to apply the usual time series methodology to make both in-sample and out-of-sample inferences (e.g. parameter estimation and forecasting).
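The rearrangement (23) is a one-liner on an $m \times n$ array of responses; a sketch (ours):

```python
import numpy as np

def interleave(Y):
    """Rearrange an (m subjects) x (n times) array into the series (23):
    Y_1(t_1), ..., Y_m(t_1), Y_1(t_2), ..., Y_m(t_n)."""
    Y = np.asarray(Y)
    return Y.T.reshape(-1)          # time-major, subject-fastest ordering

# m = 3 subjects, n = 4 times: rows are subjects, columns are times.
Y = np.arange(12).reshape(3, 4)
print(interleave(Y))                # [0 4 8 1 5 9 2 6 10 3 7 11]
```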
As an example, consider the body weights of rats data set analyzed in Diggle (1990). To focus on small sample inferences, we consider only the control group consisting of 10 rats whose body weight was recorded weekly for five successive weeks. To correct for heteroscedasticity, we let $Y_i(t_j)$ denote the log body weight of the $i$th rat, $i = 1, \ldots, 10$, recorded at week $j$, $j = 1, \ldots, 5$. Let $\{W_t\}$, $t = 1, \ldots, 50$, denote the rearranged $\{Y_i(t_j)\}$ as in (23). A time series plot of $\{W_t\}$ can be seen in Fig. 6, whence it is immediately apparent that the data need to be detrended before proceeding with a stationarity-based analysis. Subtracting a quadratic time trend $\mu(t) = \beta_0 + \beta_1 t + \beta_2 t^2$ as proposed in Diggle (1990, Chapter 5) leads to the residual series $X_t \equiv W_t - \mu(t)$, which exhibits the characteristic signature of a SAR(10) with coefficient close to unity, as exemplified by the sample ACF and PACF plots of Fig. 6. (The sample ACF and PACF are based on the empirical series $\hat X_t \equiv W_t - \hat\mu(t)$, where $\hat\mu(t)$ is the MLE of $\mu(t)$ discussed below.)
The proposed regression with SAR(10) errors model is then
$$W_t = \beta_0 + \beta_1 t + \beta_2 t^2 + X_t, \qquad X_t = \phi X_{t-10} + Z_t, \qquad \{Z_t\} \sim \text{iid } N(0, \sigma^2), \tag{24}$$
which can be fitted with standard time series software. Employing for example the ITSM2000 package of Brockwell and Davis (2002), which fits via exact maximum (Gaussian) likelihood for all parameters, leads to the estimates in Table 1. The 95% AS c.i.s are asymptotic, based on the observed joint Fisher information matrix for all parameters. The SP c.i. for $\phi$ was computed via the saddlepoint approximation method as in Example 1, using the residual series $\hat X_t$ as the data. This is appropriate since for fixed $\phi$ the MLEs of $\mu(t)$ (which coincide with the generalized LSEs) can be profiled out in the likelihood. As expected with this sample size, the AS and SP c.i.s are very similar. The fitted model passes all the standard goodness of fit checks.
The main attraction of this time series-based analysis is that routine software can quickly handle what could otherwise be a fairly laborious longitudinal modeling exercise. In addition, the time series model has the advantage of being immediately capable of producing linear minimum mean squared error forecasts. Fitting model (24) only to the first 40 observations ($t = 1, \ldots, 40$) leads to the predicted values for $t = 41, \ldots, 50$ displayed in Fig. 6 (dashed line). That is, the forecast at time $t = 40 + h$, $h = 1, \ldots, 10$, is the linear combination $\hat W_{40}(h) = a_1W_1 + \cdots + a_{40}W_{40}$, such that $E[W_{40+h} - \hat W_{40}(h)]^2$ is minimum. The associated 95% prediction bands (dotted lines) are exact, based on the normality assumption for $Z_t$. Note that the corresponding observed values fall within the bands, further evidence for the plausibility of the fitted model.

[Fig. 6. Time series analysis of the log body weights of rats data. The top plot shows the original data, $\{W_t\}$, along with SAR(10) model-based forecasts (predicted values and 95% prediction bounds) obtained from the first 40 values. The bottom plots show sample ACF and PACF values for $\{X_t\}$, the detrended $\{W_t\}$.]

Table 1
Maximum likelihood parameter estimates and 95% c.i.s for the regression with SAR(10) errors model (24) fitted to the log body weights of rats data

Parameter   Estimate    95% AS c.i.              95% SP c.i.
β0          3.769       (3.677, 3.861)
β1          4.166E-2    (3.798E-2, 4.534E-2)
β2          2.842E-4    (2.178E-4, 3.506E-4)
φ           0.9400      (0.8921, 0.9879)         (0.8730, 0.9807)

The AS c.i. uses the asymptotic distribution; the SP c.i. approximates the finite sample distribution via the saddlepoint method. The estimate of the white noise variance is $\hat\sigma^2 = 2.331\mathrm{E}{-3}$.
Acknowledgments
The second author would like to thank Peter J. Brockwell, Ronald W. Butler, and Richard A. Davis, for the helpful
comments and suggestions provided on a precursor of this manuscript, as part of his Ph.D. dissertation at Colorado
State University. We are also indebted to two anonymous referees for suggestions that resulted in an improved paper.
Appendix A. Proof of Proposition 1

Let $\{r_1, r_2, r_3\}$ be the three roots of the cubic $\vartheta(r)$ defined by (18). We will show that there is a unique real root with absolute value less than unity, and this will by definition be $\hat\phi_{ML}$. First note that by Remark 2, $2|\hat\gamma_p| \le (\hat\gamma_0 + \hat\gamma_0^{(p)})$, so that except in the degenerate cases when this becomes an equality,
$$\vartheta(-1) = p[2\hat\gamma_p + (\hat\gamma_0 + \hat\gamma_0^{(p)})] > 0, \qquad \vartheta(1) = p[2\hat\gamma_p - (\hat\gamma_0 + \hat\gamma_0^{(p)})] < 0,$$
and $\vartheta(0) = n\hat\gamma_p$. Therefore one root, $r_1$ say, is smaller than $-1$; and one root, $r_3$ say, is larger than 1. If $\hat\gamma_p < 0$, we must then have $-1 < r_2 < 0$; if $\hat\gamma_p = 0$, $r_2 = 0$; and if $\hat\gamma_p > 0$, $0 < r_2 < 1$. Thus $\vartheta(r)$ has three distinct real roots, $r_1 < -1 < r_2 \equiv \hat\phi_{ML} < 1 < r_3$, and so the discriminant $D < 0$.

The roots of such a cubic can be obtained via trigonometry; they are
$$r_i = 2\sqrt{-Q}\cos(\theta + \theta_i) - \frac{\beta_1}{3}, \qquad i = 1, 2, 3,$$
where $\theta = \frac13\arccos(R/\sqrt{-Q^3})$, and the possible values for $\theta_i$ belong to the set $\{0, 2\pi/3, 4\pi/3\}$ (not necessarily in that order). It remains to show that $\theta_2 = 4\pi/3$. Now, since the principal values of the inverse cosine function lie in $[0, \pi]$, we have $0 \le \theta \le \pi/3$, whence,
$$\frac{2\pi}{3} \le \theta + \frac{2\pi}{3} \le \pi \qquad \text{and} \qquad \frac{4\pi}{3} \le \theta + \frac{4\pi}{3} \le \frac{5\pi}{3}.$$
But this means that
$$\cos\left(\theta + \frac{2\pi}{3}\right) \le \cos\left(\theta + \frac{4\pi}{3}\right) \le \cos(\theta),$$
and since $2\sqrt{-Q} > 0$, the assignment $\theta_1 \equiv 2\pi/3$, $\theta_2 \equiv 4\pi/3$, and $\theta_3 \equiv 0$ is the one that ensures $r_1 < r_2 < r_3$, as required.
Appendix B. Proof of Proposition 2

First note that $E[\psi(c_1; \phi)]^2 = E_1 - c_1E_2 + c_1^2E_3$ is a quadratic in $c_1$, where
$$E_1 = E[S^2 + \phi^2(T_2 + T_3)^2 - 2\phi S(T_2 + T_3)], \qquad E_3 = E[\phi^2(T_3 - T_1)^2],$$
and
$$E_2 = E[2\phi S(T_1 - T_3) + 2\phi^2(T_2 + T_3)(T_3 - T_1)].$$
Since $E_3 \ge 0$, $E[\psi(c_1; \phi)]^2$ is minimized when $c_1 = E_2/(2E_3)$. It therefore remains to show that $E_2 = E_3$. To this end, note the following facts: (i) $E[T_1T_2] = E[T_2T_3]$, (ii) $ET_1^2 = ET_3^2$, and (iii) $E[S(T_1 - T_3)] = 0$. This then gives
$$E_2 = 2\phi^2E[(T_2 + T_3)(T_3 - T_1)] \quad \text{by (iii)}, \qquad = 2\phi^2\{ET_3^2 - E[T_1T_3]\} \quad \text{by (i)}.$$
On the other hand, expanding $E_3$ and using (ii), we obtain that $E_3 = E_2$ as above.

To verify the three facts, notice that (i) and (ii) follow easily from stationarity of the covariance structure of $X$ and symmetry considerations, since $T_1$ and $T_3$, being sums of squares of the first and last $p$ observations, have identical moments when squared or crossed with $T_2$, the sum of squares of the middle $(n - 2p)$ observations. To show (iii), we have that
$$T_1S = (X_1^2 + \cdots + X_p^2)(X_{p+1}X_1 + \cdots + X_nX_{n-p}),$$
$$T_3S = (X_{n-p+1}^2 + \cdots + X_n^2)(X_{p+1}X_1 + \cdots + X_nX_{n-p}).$$
Since $X$ has a multivariate normal distribution with stationary covariance structure $\gamma_{|i-j|} = E[X_iX_j]$, for any integers $i \le j \le k \le l$ its fourth order moments satisfy
$$E[X_iX_jX_kX_l] = \gamma_{|i-j|}\gamma_{|k-l|} + \gamma_{|i-k|}\gamma_{|j-l|} + \gamma_{|i-l|}\gamma_{|j-k|}.$$
This fact and the obvious symmetry between $T_1S$ and $T_3S$ suffices to show that $E[T_1S] = E[T_3S]$, which proves (iii).
References
Andrews, D.W.K., 1993. Exactly median-unbiased estimation of first order autoregressive/unit root models. Econometrica 61, 139–166.
Brockwell, P.J., Davis, R.A., 2002. Introduction to Time Series and Forecasting. second ed. Springer, New York.
Brockwell, P.J., Davis, R.A., Trindade, A.A., 2004. Asymptotic properties of some subset vector autoregressive process estimators. J. Multivariate
Analysis 90, 327–347.
Brockwell, P.J., Dahlhaus, R., Trindade, A.A., 2005. Modified Burg algorithms for multivariate subset autoregression. Statist. Sin. 15, 197–213.
Broda, S., Carstensen, K., Paolella, M.S., 2007. Bias-adjusted estimation in the ARX(1) model. Comput. Statist. Data Anal. 51, 3355–3367.
Butler, R.W., Paolella, M.S., 1998a. Approximate distributions for the various serial correlograms. Bernoulli 4, 497–518.
Butler, R.W., Paolella, M.S., 1998b. Saddlepoint approximations to the density and distribution of ratios of quadratic forms in normal variables with
application to the sample autocorrelation function. Technical Report 98/16, Department of Statistics, Colorado State University.
Casella, G., Berger, R.L., 2002. Statistical Inference. second ed. Duxbury, Pacific Grove, CA.
Chan, N.H., 1989. On the nearly nonstationary seasonal time series. Canad. J. Statist. 17, 279–284.
Daniels, H.E., 1954. Saddlepoint approximations in statistics. Ann. Math. Statist. 25, 631–650.
Daniels, H.E., 1956. The approximate distribution of serial correlation coefficients. Biometrika 43, 169–185.
Daniels, H.E., 1983. Saddlepoint approximations for estimating equations. Biometrika 70, 89–96.
Diggle, P., 1990. Time Series: A Biostatistical Introduction. Oxford University Press, Oxford.
Dufour, J., Kiviet, J., 1998. Exact inference for first-order autoregressive distributed lag models. Econometrica 66, 79–104.
Durbin, J., 1980. The approximate distribution of partial serial correlation coefficients calculated from residuals from regression on Fourier series.
Biometrika 67, 335–349.
Easton, G.S., Ronchetti, E., 1986. General saddlepoint approximations with applications to L statistics. J. Amer. Statist. Assoc. 81, 420–430.
Fujikoshi, Y., Ochi, Y., 1984. Asymptotic properties of the maximum likelihood estimate in the first order autoregressive process. Ann. Inst. Statist.
Math. 36, 119–128.
Godambe, V.P., 1960. An optimum property of regular maximum likelihood estimation. Ann. Math. Statist. 31, 1208–1211.
Godambe, V.P., 1991. Estimating Equations. Oxford University Press, Oxford.
Goutis, C., Casella, G., 1999. Explaining the saddlepoint approximation. Amer. Statist. 53, 216–224.
Imhof, J.P., 1961. Computing the distribution of quadratic forms in normal variates. Biometrika 48, 419–426.
Kiviet, J., Phillips, G.D.A., 2005. Moment approximations for least-squares estimators in dynamic regression models with a unit root. Econometrics
J. 8, 115–142.
Kotz, S., Kozubowski, T., Podgórski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications,
Economics, Engineering, and Finance. Birkhäuser, Boston.
Latour, A., Roy, R., 1987. Some exact results on the sample autocovariances of a seasonal ARIMA model. Canad. J. Statist. 15, 283–291.
Liebermann, O., 1994a. Saddlepoint approximation for the distribution of a ratio of quadratic forms in normal variables. J. Amer. Statist. Assoc. 89,
924–928.
Liebermann, O., 1994b. Saddlepoint approximation for the least squares estimator in first-order autoregression. Biometrika 81, 807–811.
Lugannani, R., Rice, S.O., 1980. Saddle point approximation for the distribution of the sum of independent random variables. Adv. Appl. Probab. 12, 475–490.
Lütkepohl, H., 1996. Handbook of Matrices. Wiley, Chichester.
Ochi, Y., 1983. Asymptotic expansions for the distribution of an estimator in the first order autoregressive process. J. Time Ser. Anal. 4, 57–67.
Parlett, B.N., 1980. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, NJ.
Perron, P., 1996. The adequacy of asymptotic approximations in the near-integrated autoregressive model with dependent errors. J. Econometrics
70, 317–350.
Phillips, P.C.B., 1978. Edgeworth and saddlepoint approximations to the first-order noncircular autoregression. Biometrika 65, 91–98.
Shumway, R.H., Stoffer, D.S., 2000. Time Series Analysis and its Applications. Springer, New York.
Wang, S., 1992. Tail probability approximations in the first-order noncircular autoregression. Biometrika 79, 431–434.
White, J.S., 1961. Asymptotic expansions for the mean and variance of the serial correlation coefficient. Biometrika 48, 85–94.