Published in: Journal of Statistical Planning and Inference 151-152 (2014), 37–59.
Parameter estimation for a subcritical affine two factor model
Mátyás Barczy∗, Leif Döring, Zenghu Li, Gyula Pap
Abstract
For an affine two factor model, we study the asymptotic properties of the maximum likelihood and
least squares estimators of some appearing parameters in the so-called subcritical (ergodic) case based on
continuous time observations. We prove strong consistency and asymptotic normality of the estimators in
question.
1 Introduction
We consider the following 2-dimensional affine process (affine two factor model)

(1.1)    dY_t = (a − bY_t) dt + √Y_t dL_t,
         dX_t = (m − θX_t) dt + √Y_t dB_t,        t ≥ 0,

where a > 0, b, m, θ ∈ R, and (L_t)_{t≥0} and (B_t)_{t≥0} are independent standard Wiener processes. Note that the process (Y_t)_{t≥0} given by the first SDE of (1.1) is the so-called Cox–Ingersoll–Ross (CIR) process, which is a continuous state branching process with branching mechanism bz + z²/2, z ≥ 0, and with immigration mechanism az, z ≥ 0. Chen and Joslin [9] applied (1.1) for modelling the quantitative impact of stochastic recovery on the pricing of defaultable bonds, see their equations (25) and (26).
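It may help to keep a simulated trajectory of (1.1) in mind throughout. The following sketch (our illustration, not part of the original paper) simulates the model with a full-truncation Euler–Maruyama scheme; the function name, step size, horizon and parameter values are illustrative assumptions, and no convergence analysis of the scheme is attempted here.

    import numpy as np

    def simulate_affine_2f(a, b, m, theta, y0, x0, T, dt, rng):
        """Full-truncation Euler-Maruyama sketch for the SDE (1.1):
        dY = (a - b*Y) dt + sqrt(Y) dL,  dX = (m - theta*X) dt + sqrt(Y) dB."""
        n = int(T / dt)
        y = np.empty(n + 1); x = np.empty(n + 1)
        y[0], x[0] = y0, x0
        for i in range(n):
            # independent Wiener increments for L and B
            dL, dB = rng.normal(0.0, np.sqrt(dt), size=2)
            ypos = max(y[i], 0.0)  # truncate: the square root needs Y >= 0
            y[i + 1] = y[i] + (a - b * y[i]) * dt + np.sqrt(ypos) * dL
            x[i + 1] = x[i] + (m - theta * x[i]) * dt + np.sqrt(ypos) * dB
        return y, x

    rng = np.random.default_rng(0)
    y, x = simulate_affine_2f(a=1.0, b=0.5, m=0.2, theta=0.8,
                              y0=1.0, x0=0.0, T=100.0, dt=1e-3, rng=rng)

With b > 0 and θ > 0 (the subcritical case studied below), such a trajectory settles into a stationary regime, which is the content of Theorem 2.5.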
The process (Y, X) given by (1.1) is a special affine diffusion process. The set of affine processes contains a
large class of important Markov processes such as continuous state branching processes and Ornstein–Uhlenbeck
processes. Further, a lot of models in financial mathematics are affine such as the Heston model [15], the model
of Barndorff-Nielsen and Shephard [4] or the model due to Carr and Wu [8]. A precise mathematical formulation
and a complete characterization of regular affine processes are due to Duffie et al. [13]. These processes are
widely used in financial mathematics due to their computational tractability, see Gatheral [14].
This article is devoted to estimating the parameters a, b, m and θ from some continuously observed real
data set (Yt , Xt )t∈[0,T ] , where T > 0. To the best knowledge of the authors the parameter estimation problem
for multi-dimensional affine processes has not been tackled so far. Since affine processes are frequently used
in financial mathematics, the question of parameter estimation for them is of high importance. In Barczy et
al. [1] we started the discussion with a simple non-trivial two-dimensional affine diffusion process given by (1.1)
in the so called critical case: b > 0, θ = 0 or b = 0, θ > 0 (for the definition of criticality, see Section
2). In the special critical case b = 0, θ = 0 we described the asymptotic behavior of least squares estimator
(LSE) of (m, θ) from some discretely observed low frequency real data set X0 , X1 , . . . , Xn as n → ∞. The
description of the asymptotic behavior of the LSE of (m, θ) in the other critical cases b = 0, θ > 0 or b > 0,
θ = 0 remained open. In this paper we deal with the same model (1.1) but in the so-called subcritical
(ergodic) case: b > 0, θ > 0, and we consider the maximum likelihood estimator (MLE) of (a, b, m, θ) using
some continuously observed real data set (Yt , Xt )t∈[0,T ] , where T > 0, and the LSE of (m, θ) using some
continuously observed real data set (Xt )t∈[0,T ] , where T > 0. For studying the asymptotic behaviour of
the MLE and LSE in the subcritical (ergodic) case, one first needs to examine the question of existence of a
unique stationary distribution and ergodicity for the model given by (1.1). In a companion paper Barczy et
al. [2] we solved this problem, see also Theorem 2.5. Further, in a more general setup by replacing the CIR
process (Yt )t>0 in the first SDE of (1.1) by a so-called α-root process (stable CIR process) with α ∈ (1, 2),
the existence of a unique stationary distribution for the corresponding model was proved in Barczy et al. [2].
In general, parameter estimation for subcritical (also called ergodic) models has a long history, see, e.g., the
monographs of Liptser and Shiryaev [27, Chapter 17], Kutoyants [23], Bishwal [7] and the papers of Klimko
∗ Corresponding author
2010 Mathematics Subject Classifications: 62F12, 60J25.
Key words and phrases: affine process, maximum likelihood estimator, least squares estimator.
The research of M. Barczy and G. Pap was realized in the frames of TÁMOP 4.2.4.A/2-11-1-2012-0001 "National Excellence Program – Elaborating and operating an inland student and researcher personal support system". The project was subsidized by the European Union and co-financed by the European Social Fund. Z. Li has been partially supported by NSFC under Grant No. 11131003 and 973 Program under Grant No. 2011CB808001.
and Nelson [22] and Sørensen [32]. In case of the one-dimensional CIR process Y, the parameter estimation of a and b goes back to Overbeck and Rydén [28] (conditional LSE) and Overbeck [29] (MLE); see also Bishwal [7, Example 7.6] and the very recent papers of Ben Alaya and Kebaier [5], [6] (MLE). In Ben Alaya and Kebaier [5], [6] one can find a systematic study of the asymptotic behavior of the quadruplet

    ( log(Y_t), Y_t, ∫_0^t Y_s ds, ∫_0^t (1/Y_s) ds )    as t → ∞.

Finally, we note that Li and Ma [25] started to investigate the
asymptotic behaviour of the (weighted) conditional LSE of the drift parameters for a CIR model driven by a
stable noise (they call it a stable CIR model) from some discretely observed low frequency real data set. To
give another example besides the one-dimensional CIR process, we mention a model that is somewhat related to (1.1) and for which parameter estimation of the appearing parameters based on continuous time observations has been considered: the so-called Ornstein–Uhlenbeck process driven by α-stable Lévy motions, i.e.,

    dU_t = (m − θU_t) dt + dZ_t,    t ≥ 0,

where θ > 0, m ≠ 0, and (Z_t)_{t≥0} is an α-stable Lévy motion with α ∈ (1, 2). For this model Hu and Long investigated the question of parameter estimation, see [17], [18] and [19].
It would be possible to calculate the discretized version of the estimators presented in this paper using the
same procedure as in Ben Alaya and Kebaier [5, Section 4] valid for discrete time observations of high frequency.
However, it is out of the scope of the present paper.
We give a brief overview of the structure of the paper. Section 2 is devoted to a preliminary discussion of
the existence and uniqueness of a strong solution of the SDE (1.1), we make a classification of the model (see
Definition 2.4), we also recall our results in Barczy et al. [2] on the existence of a unique stationary distribution
and ergodicity for the affine process given by SDE (1.1), see Theorem 2.5. Further, we recall some limit theorems
for continuous local martingales that will be used later on for studying the asymptotic behaviour of the MLE
of (a, b, m, θ) and the LSE of (m, θ), respectively. In Sections 3–8 we study the asymptotic behavior of the
MLE of (a, b, m, θ) and LSE of (m, θ) proving that the estimators are strongly consistent and asymptotically
normal under appropriate conditions on the parameters. We call attention to the fact that for the MLE of (a, b, m, θ)
we require a continuous time observation (Yt , Xt )t∈[0,T ] of the process (Y, X), but for the LSE of (m, θ) we
need a continuous time observation (Xt )t∈[0,T ] only for the process X. We note that in the critical case we
obtained a different limit behaviour for the LSE of (m, θ), see Barczy et al. [1, Theorem 3.2].
A common generalization of the model (1.1) and the well-known Heston model [15] is a general affine diffusion two factor model

(1.2)    dY_t = (a − bY_t) dt + σ_1 √Y_t dL_t,
         dX_t = (m − κY_t − θX_t) dt + σ_2 √Y_t ( ϱ dL_t + √(1 − ϱ²) dB_t ),        t ≥ 0,

where a > 0, σ_1, σ_2 > 0, b, m, κ, θ ∈ R, ϱ ∈ (−1, 1), and (L_t)_{t≥0} and (B_t)_{t≥0} are independent standard Wiener processes. One does not need to estimate the parameters σ_1, σ_2 and ϱ, since these parameters could —in principle, at least— be determined (rather than estimated) using an arbitrarily short continuous
time observation of (Y, X), see Remark 2.5 in Barczy and Pap [3]. For studying the parameter estimation of
a, b, m, κ and θ in the subcritical case, one needs to investigate ergodicity properties of the model (1.2).
For the submodel (1.1), this has been proved in Barczy et al. [2], see also Theorem 2.5. For the Heston model,
ergodicity of the first coordinate process Y is sufficient for statistical purposes, see Barczy and Pap [3]; the
existence of a unique stationary distribution and the ergodicity for Y has been proved by Cox et al. [10,
Equation (20)] and Li and Ma [25, Theorem 2.6]. After showing appropriate ergodicity properties of the model
(1.2), one could obtain the asymptotic behavior of the MLE and LSE of (a, b, m, κ, θ) with a similar method
used in the present paper.
2 Preliminaries

Let N, Z_+, R and R_+ denote the sets of positive integers, non-negative integers, real numbers and non-negative real numbers, respectively. By ∥x∥ and ∥A∥ we denote the Euclidean norm of a vector x ∈ R^m and the induced matrix norm ∥A∥ = sup{∥Ax∥ : x ∈ R^m, ∥x∥ = 1} of a matrix A ∈ R^{n×m}, respectively.
The next proposition is about the existence and uniqueness of a strong solution of the SDE (1.1); it follows from the theorem due to Yamada and Watanabe, and it further clarifies that the process given by the SDE (1.1) belongs to the family of regular affine processes, see Barczy et al. [2, Theorem 2.2 with α = 2].
2.1 Proposition. Let (Ω, F, P) be a probability space, and (L_t, B_t)_{t∈R_+} be a 2-dimensional standard Wiener process. Let (η_0, ζ_0) be a random vector independent of (L_t, B_t)_{t≥0} satisfying P(η_0 ≥ 0) = 1. Then, for all a ≥ 0 and b, m, θ ∈ R, there is a (pathwise) unique strong solution (Y_t, X_t)_{t≥0} of the SDE (1.1) such that P((Y_0, X_0) = (η_0, ζ_0)) = 1 and P(Y_t ≥ 0, ∀ t ≥ 0) = 1. Further, we have
(2.1)    Y_t = e^{−b(t−s)} ( Y_s + a ∫_s^t e^{−b(s−u)} du + ∫_s^t e^{−b(s−u)} √Y_u dL_u ),    0 ≤ s ≤ t,

and

(2.2)    X_t = e^{−θ(t−s)} ( X_s + m ∫_s^t e^{−θ(s−u)} du + ∫_s^t e^{−θ(s−u)} √Y_u dB_u ),    0 ≤ s ≤ t.
Moreover, (Y_t, X_t)_{t≥0} is a regular affine process with infinitesimal generator

    (Af)(y, x) = (a − by) f_1′(y, x) + (m − θx) f_2′(y, x) + (1/2) y ( f_{1,1}′′(y, x) + f_{2,2}′′(y, x) ),

where (y, x) ∈ R_+ × R, f ∈ C_c²(R_+ × R, R), f_i′, i = 1, 2, and f_{i,j}′′, i, j ∈ {1, 2}, denote the first and second order partial derivatives of f with respect to its i-th variable and its i-th and j-th variables, respectively, and C_c²(R_+ × R, R) is the set of twice continuously differentiable real-valued functions defined on R_+ × R having compact support.
2.2 Remark. Note that in Proposition 2.1 the unique strong solution (Y_t, X_t)_{t≥0} of the SDE (1.1) is adapted to the augmented filtration (F_t)_{t∈R_+} corresponding to (L_t, B_t)_{t∈R_+} and (η_0, ζ_0), constructed as in Karatzas and Shreve [21, Section 5.2]. Note also that (F_t)_{t∈R_+} satisfies the usual conditions, i.e., the filtration (F_t)_{t∈R_+} is right-continuous and F_0 contains all the P-null sets in F. Further, (L_t)_{t∈R_+} and (B_t)_{t∈R_+} are independent (F_t)_{t∈R_+}-standard Wiener processes. In Proposition 2.1 it is the assumption a ≥ 0 which ensures P(Y_t ≥ 0, ∀ t ≥ 0) = 1.    □
In what follows we will make a classification of the affine processes given by the SDE (1.1). First we recall
a result about the first moment of (Yt , Xt )t∈R+ , see Proposition 3.2 in Barczy et al. [1].
2.3 Proposition. Let (Y_t, X_t)_{t∈R_+} be an affine diffusion process given by the SDE (1.1) with a random initial value (η_0, ζ_0) independent of (L_t, B_t)_{t≥0} such that P(η_0 ≥ 0) = 1, E(η_0) < ∞ and E(|ζ_0|) < ∞. Then

    ( E(Y_t) )   ( e^{−bt}     0     ) ( E(η_0) )   ( ∫_0^t e^{−bs} ds         0          ) ( a )
    ( E(X_t) ) = (    0     e^{−θt}  ) ( E(ζ_0) ) + (        0          ∫_0^t e^{−θs} ds ) ( m ),    t ∈ R_+.
Proposition 2.3 shows that the asymptotic behavior of the first moment of (Yt , Xt )t∈R+ as t → ∞ is
determined by the spectral radius of the diagonal matrix diag(e−bt , e−θt ), which motivates our classification
of the affine processes given by the SDE (1.1).
2.4 Definition. Let (Y_t, X_t)_{t∈R_+} be an affine diffusion process given by the SDE (1.1) with a random initial value (η_0, ζ_0) independent of (L_t, B_t)_{t≥0} satisfying P(η_0 ≥ 0) = 1. We call (Y_t, X_t)_{t∈R_+} subcritical, critical or supercritical if the spectral radius of the matrix

    ( e^{−bt}     0     )
    (    0     e^{−θt}  )

is less than 1, equal to 1 or greater than 1, respectively.

Note that, since the spectral radius of the matrix given in Definition 2.4 is max(e^{−bt}, e^{−θt}), the affine process given in Definition 2.4 is

    subcritical     if b > 0 and θ > 0,
    critical        if b = 0, θ ≥ 0 or b ≥ 0, θ = 0,
    supercritical   if b < 0 or θ < 0.
Further, under the conditions of Proposition 2.3, by an easy calculation, if b > 0 and θ > 0, then

    lim_{t→∞} ( E(Y_t), E(X_t) )^⊤ = ( a/b, m/θ )^⊤;

if b = 0 and θ = 0, then

    lim_{t→∞} ( (1/t) E(Y_t), (1/t) E(X_t) )^⊤ = ( a, m )^⊤;

if b = 0 and θ > 0, then

    lim_{t→∞} ( (1/t) E(Y_t), E(X_t) )^⊤ = ( a, m/θ )^⊤;

if b > 0 and θ = 0, then

    lim_{t→∞} ( E(Y_t), (1/t) E(X_t) )^⊤ = ( a/b, m )^⊤;

and if b < 0 and θ < 0, then

    lim_{t→∞} ( e^{bt} E(Y_t), e^{θt} E(X_t) )^⊤ = ( E(η_0) − a/b, E(ζ_0) − m/θ )^⊤.

Remark also that Definition 2.4 of criticality is in accordance with the corresponding definition for one-dimensional continuous state branching processes, see, e.g., Li [24, page 58].
In the sequel →^P and →^L will denote convergence in probability and convergence in distribution, respectively.

The following result states the existence of a unique stationary distribution and the ergodicity for the affine process given by the SDE (1.1), see Theorem 3.1 with α = 2 and Theorem 4.2 in Barczy et al. [2].
2.5 Theorem. Let us consider the 2-dimensional affine model (1.1) with a > 0, b > 0, m ∈ R, θ > 0, and with a random initial value (η_0, ζ_0) independent of (L_t, B_t)_{t≥0} satisfying P(η_0 ≥ 0) = 1. Then

(i) (Y_t, X_t) →^L (Y_∞, X_∞) as t → ∞, and the distribution of (Y_∞, X_∞) is given by

(2.3)    E( e^{−λ_1 Y_∞ + iλ_2 X_∞} ) = exp{ −a ∫_0^∞ v_s(λ_1, λ_2) ds + i (m/θ) λ_2 },    (λ_1, λ_2) ∈ R_+ × R,

where v_t(λ_1, λ_2), t ≥ 0, is the unique non-negative solution of the (deterministic) differential equation

(2.4)    ∂v_t/∂t (λ_1, λ_2) = −b v_t(λ_1, λ_2) − (1/2) ( v_t(λ_1, λ_2) )² + (1/2) e^{−2θt} λ_2²,    t ≥ 0,
         v_0(λ_1, λ_2) = λ_1.

(ii) supposing that the random initial value (η_0, ζ_0) has the same distribution as (Y_∞, X_∞) given in part (i), the process (Y_t, X_t)_{t≥0} is strictly stationary.

(iii) for all Borel measurable functions f : R² → R such that E(|f(Y_∞, X_∞)|) < ∞, we have

(2.5)    P( lim_{T→∞} (1/T) ∫_0^T f(Y_s, X_s) ds = E(f(Y_∞, X_∞)) ) = 1,

where the distribution of (Y_∞, X_∞) is given by (2.3) and (2.4).

Moreover, the random variable (Y_∞, X_∞) is absolutely continuous, the Laplace transform of Y_∞ takes the form

(2.6)    E( e^{−λ_1 Y_∞} ) = ( 1 + λ_1/(2b) )^{−2a},    λ_1 ∈ R_+,

i.e., Y_∞ has Gamma distribution with parameters 2a and 2b, all the (mixed) moments of (Y_∞, X_∞) of any order are finite, i.e., E(Y_∞^n |X_∞|^p) < ∞ for all n, p ∈ Z_+, and especially,

    E(Y_∞) = a/b,    E(X_∞) = m/θ,    E(Y_∞²) = a(2a + 1)/(2b²),    E(Y_∞ X_∞) = ma/(θb),    E(X_∞²) = (aθ + 2bm²)/(2bθ²),

    E(Y_∞ X_∞²) = ( a / ( (b + 2θ) 2b² θ² ) ) ( θ(ab + 2aθ + θ) + 2m²b(2θ + b) ).
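As a quick sanity check (our addition, not part of the original text), the displayed moments of Y_∞ can be recovered directly from the Laplace transform (2.6), using that Y_∞ has Gamma distribution with shape 2a and rate 2b:

    E(Y_∞) = − (d/dλ_1) ( 1 + λ_1/(2b) )^{−2a} |_{λ_1=0} = (2a/(2b)) ( 1 + λ_1/(2b) )^{−2a−1} |_{λ_1=0} = a/b,

and, more generally, E(Y_∞^n) = 2a(2a + 1)⋯(2a + n − 1)/(2b)^n, so that

    E(Y_∞²) = 2a(2a + 1)/(2b)² = a(2a + 1)/(2b²),

in agreement with the formula above.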
In all that follows we will suppose that we have continuous time observations for the process (Y, X), i.e., (Y_t, X_t)_{t∈[0,T]} can be observed for some T > 0, and our aim is to deal with parameter estimation of (a, b, m, θ). We also deal with parameter estimation of θ provided that the parameter m ∈ R is supposed to be known.
Next we recall some limit theorems for continuous local martingales. We will use these limit theorems in the
sequel for studying the asymptotic behaviour of different kinds of estimators for (a, b, m, θ). First we recall a
strong law of large numbers for continuous local martingales, see, e.g., Liptser and Shiryaev [27, Lemma 17.4].
2.6 Theorem. Let (Ω, F, (F_t)_{t≥0}, P) be a filtered probability space satisfying the usual conditions. Let (M_t)_{t≥0} be a square-integrable continuous local martingale with respect to the filtration (F_t)_{t≥0} started from 0. Let (ξ_t)_{t≥0} be a progressively measurable process such that

    P( ∫_0^t (ξ_u)² d⟨M⟩_u < ∞ ) = 1,    t ≥ 0,

and

(2.7)    P( lim_{t→∞} ∫_0^t (ξ_u)² d⟨M⟩_u = ∞ ) = 1,

where (⟨M⟩_t)_{t≥0} denotes the quadratic variation process of M. Then

(2.8)    P( lim_{t→∞} ( ∫_0^t ξ_u dM_u ) / ( ∫_0^t (ξ_u)² d⟨M⟩_u ) = 0 ) = 1.

In case of M_t = B_t, t ≥ 0, where (B_t)_{t≥0} is a standard Wiener process, the progressive measurability of (ξ_t)_{t≥0} can be relaxed to measurability and adaptedness to the filtration (F_t)_{t≥0}.
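As a simple illustration (our addition), taking ξ_u ≡ 1 and M = B a standard Wiener process, one has ⟨B⟩_t = t, so both conditions above hold and (2.8) reduces to the classical strong law for Brownian motion:

    B_t / t → 0    a.s. as t → ∞.

In the sequel the theorem is applied with martingales such as M_t = ∫_0^t (1/√Y_s) dL_s or M_t = ∫_0^t (X_s/√Y_s) dB_s (with ξ ≡ 1), whose quadratic variations ∫_0^t (1/Y_s) ds and ∫_0^t (X_s²/Y_s) ds tend to infinity almost surely by ergodicity.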
The next theorem is about the asymptotic behaviour of continuous multivariate local martingales.

2.7 Theorem. (van Zanten [33, Theorem 4.1]) Let (Ω, F, (F_t)_{t≥0}, P) be a filtered probability space satisfying the usual conditions. Let (M_t)_{t≥0} be a d-dimensional square-integrable continuous local martingale with respect to the filtration (F_t)_{t≥0} started from 0. Suppose that there exists a function Q : [0, ∞) → R^{d×d} such that Q(t) is a non-random, invertible matrix for all t ≥ 0, lim_{t→∞} ∥Q(t)∥ = 0 and

    Q(t) ⟨M⟩_t Q(t)^⊤ →^P ηη^⊤    as t → ∞,

where η is a d × d random matrix defined on (Ω, F, P). Then, for each R^k-valued random variable V defined on (Ω, F, P), it holds that

    ( Q(t) M_t, V ) →^L ( ηZ, V )    as t → ∞,

where Z is a d-dimensional standard normally distributed random variable independent of (η, V).

We note that Theorem 2.7 remains true if the function Q, instead of the interval [0, ∞), is defined only on an interval [t_0, ∞) with some t_0 > 0.
3 Existence and uniqueness of maximum likelihood estimator

Let P_{(a,b,m,θ)} denote the probability measure on the measurable space (C(R_+, R_+ × R), B(C(R_+, R_+ × R))) induced by the process (Y_t, X_t)_{t≥0} corresponding to the parameters (a, b, m, θ) and initial value (Y_0, X_0). Here C(R_+, R_+ × R) denotes the set of continuous R_+ × R-valued functions defined on R_+, B(C(R_+, R_+ × R)) is the Borel σ-algebra on it, and we suppose that the space (C(R_+, R_+ × R), B(C(R_+, R_+ × R))) is endowed with the natural filtration (A_t)_{t≥0}, given by A_t := φ_t^{−1}(B(C(R_+, R_+ × R))), where φ_t : C(R_+, R_+ × R) → C(R_+, R_+ × R) is the mapping φ_t(f)(s) := f(t ∧ s), s ≥ 0. For all T > 0, let P_{(a,b,m,θ),T} := P_{(a,b,m,θ)}|_{A_T} be the restriction of P_{(a,b,m,θ)} to A_T.
3.1 Lemma. Let a ≥ 1/2, b, m, θ ∈ R, T > 0, and suppose that P(Y_0 > 0) = 1. Let P_{(a,b,m,θ)} and P_{(1,0,0,0)} denote the probability measures induced by the unique strong solutions of the SDE (1.1) corresponding to the parameters (a, b, m, θ) and (1, 0, 0, 0) with the same initial value (Y_0, X_0), respectively. Then P_{(a,b,m,θ),T} and P_{(1,0,0,0),T} are absolutely continuous with respect to each other, and the Radon–Nikodym derivative of P_{(a,b,m,θ),T} with respect to P_{(1,0,0,0),T} (so called likelihood ratio) takes the form

    L_T^{(a,b,m,θ),(1,0,0,0)}((Y_s, X_s)_{s∈[0,T]})
        = exp{ ∫_0^T ( ((a − bY_s − 1)/Y_s) dY_s + ((m − θX_s)/Y_s) dX_s )
               − (1/2) ∫_0^T ( (a − bY_s − 1)(a − bY_s + 1) + (m − θX_s)² ) / Y_s ds },

where (Y_t, X_t)_{t≥0} denotes the unique strong solution of the SDE (1.1) corresponding to the parameters (a, b, m, θ) and the initial value (Y_0, X_0).
Proof. First note that the SDE (1.1) can be written in the form

    d( Y_t, X_t )^⊤ = ( ( −b 0 ; 0 −θ ) ( Y_t, X_t )^⊤ + ( a, m )^⊤ ) dt + diag( √Y_t, √Y_t ) ( dL_t, dB_t )^⊤,    t ≥ 0.

Note also that under the condition a ≥ 1/2, for all y_0 > 0, we have P(Y_t > 0, ∀ t ∈ R_+ | Y_0 = y_0) = 1, see, e.g., page 442 in Revuz and Yor [30]. Since P(Y_0 > 0) = 1, by the law of total probability, P(Y_t > 0, ∀ t ∈ R_+) = 1.

We intend to use Lemma A.1. By Proposition 2.1, under the conditions of the present lemma, there is a pathwise unique strong solution of the SDE (1.1). We have to check that

    ∫_0^T ( (a − bY_s)² + (m − θX_s)² + 1 ) / Y_s ds < ∞    a.s. for all T ∈ R_+.

Since (Y, X) has continuous sample paths almost surely, this holds if

(3.1)    ∫_0^T (1/Y_s) ds < ∞    a.s. for all T ∈ R_+.

Under the conditions a ≥ 1/2 and P(Y_0 > 0) = 1, Theorems 1 and 3 in Ben Alaya and Kebaier [6] yield (3.1). More precisely, if a ≥ 1/2 and y_0 > 0, then Theorems 1 and 3 in Ben Alaya and Kebaier [6] yield

    P( ∫_0^T (1/Y_s) ds < ∞ | Y_0 = y_0 ) = 1,    T ∈ R_+.

Since P(Y_0 > 0) = 1, by the law of total probability, we get (3.1). We give another, direct proof of (3.1). Namely, since Y has continuous sample paths almost surely and P(Y_t > 0, ∀ t ∈ R_+) = 1, we have P(inf_{t∈[0,T]} Y_t > 0) = 1 for all T ∈ R_+, which yields (3.1).    □
By Lemma 3.1, under its conditions the log-likelihood function takes the form

    log L_T^{(a,b,m,θ),(1,0,0,0)}((Y_s, X_s)_{s∈[0,T]})
        = (a − 1) ∫_0^T (1/Y_s) dY_s − b(Y_T − Y_0) + m ∫_0^T (1/Y_s) dX_s − θ ∫_0^T (X_s/Y_s) dX_s
          − ((a² − 1)/2) ∫_0^T (1/Y_s) ds + abT − (b²/2) ∫_0^T Y_s ds
          − (m²/2) ∫_0^T (1/Y_s) ds + mθ ∫_0^T (X_s/Y_s) ds − (θ²/2) ∫_0^T (X_s²/Y_s) ds
        =: f_T(a, b, m, θ),    T > 0.
We remark that for all T > 0 and all initial values (Y_0, X_0), the probability measures P_{(a,b,m,θ),T}, a ≥ 1/2, b, m, θ ∈ R, are absolutely continuous with respect to each other, and hence it does not matter which measure is taken as a reference measure for defining the MLE (we have chosen P_{(1,0,0,0),T}). For more details, see, e.g., Liptser and Shiryaev [26, page 35]. Then the equation ∂f_T/∂θ (a, b, m, θ) = 0 takes the form

    −∫_0^T (X_s/Y_s) dX_s + m ∫_0^T (X_s/Y_s) ds − θ ∫_0^T (X_s²/Y_s) ds = 0.
Moreover, the system of equations

    ∂f_T/∂a (a, b, m, θ) = 0,    ∂f_T/∂b (a, b, m, θ) = 0,    ∂f_T/∂m (a, b, m, θ) = 0,    ∂f_T/∂θ (a, b, m, θ) = 0,

takes the form

    ( ∫_0^T (1/Y_s) ds        −T                    0                      0                   ) ( a )   (  ∫_0^T (1/Y_s) dY_s   )
    ( −T                      ∫_0^T Y_s ds          0                      0                   ) ( b )   (  −(Y_T − Y_0)         )
    ( 0                       0                     ∫_0^T (1/Y_s) ds      −∫_0^T (X_s/Y_s) ds  ) ( m ) = (  ∫_0^T (1/Y_s) dX_s   )
    ( 0                       0                     −∫_0^T (X_s/Y_s) ds    ∫_0^T (X_s²/Y_s) ds ) ( θ )   ( −∫_0^T (X_s/Y_s) dX_s ).
First, we suppose that a ≥ 1/2, and that b ∈ R and m ∈ R are known. By maximizing log L_T^{(a,b,m,θ),(1,0,0,0)} in θ ∈ R, we get the MLE of θ based on the observations (Y_t, X_t)_{t∈[0,T]},

(3.2)    θ̃_T^MLE := ( −∫_0^T (X_s/Y_s) dX_s + m ∫_0^T (X_s/Y_s) ds ) / ∫_0^T (X_s²/Y_s) ds,    T > 0,

provided that ∫_0^T (X_s²/Y_s) ds > 0. Indeed,

    ∂²f_T/∂θ² (a, b, m, θ) = −∫_0^T (X_s²/Y_s) ds < 0.

Using the SDE (1.1), one can also get

(3.3)    θ̃_T^MLE − θ = − ( ∫_0^T (X_s/√Y_s) dB_s ) / ∫_0^T (X_s²/Y_s) ds,    T > 0,

provided that ∫_0^T (X_s²/Y_s) ds > 0. Note that the estimator θ̃_T^MLE does not depend on the parameters a ≥ 1/2 and b ∈ R. In fact, if we maximize log L_T^{(a,b,m,θ),(1,0,0,0)} in (a, b, θ) ∈ R³, then we obtain the MLE of (a, b, θ) supposing that m ∈ R is known, and one can observe that the MLE of θ by this procedure coincides with θ̃_T^MLE.
By maximizing log L_T^{(a,b,m,θ),(1,0,0,0)} in (a, b, m, θ) ∈ R⁴, the MLE of (a, b, m, θ) based on the observations (Y_t, X_t)_{t∈[0,T]} takes the form

(3.4)    â_T^MLE := ( ∫_0^T Y_s ds ∫_0^T (1/Y_s) dY_s − T(Y_T − Y_0) ) / ( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ),    T > 0,

(3.5)    b̂_T^MLE := ( T ∫_0^T (1/Y_s) dY_s − (Y_T − Y_0) ∫_0^T (1/Y_s) ds ) / ( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ),    T > 0,

(3.6)    m̂_T^MLE := ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) dX_s − ∫_0^T (X_s/Y_s) ds ∫_0^T (X_s/Y_s) dX_s ) / ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ),    T > 0,

(3.7)    θ̂_T^MLE := ( ∫_0^T (X_s/Y_s) ds ∫_0^T (1/Y_s) dX_s − ∫_0^T (1/Y_s) ds ∫_0^T (X_s/Y_s) dX_s ) / ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ),    T > 0,

provided that ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² > 0 and ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² > 0. Indeed,

    ( ∂²f_T/∂a² (a, b, m, θ)    ∂²f_T/∂a∂b (a, b, m, θ) )   ( −∫_0^T (1/Y_s) ds     T              )
    ( ∂²f_T/∂b∂a (a, b, m, θ)   ∂²f_T/∂b² (a, b, m, θ)  ) = ( T                    −∫_0^T Y_s ds   ),

    ( ∂²f_T/∂m² (a, b, m, θ)    ∂²f_T/∂m∂θ (a, b, m, θ) )   ( −∫_0^T (1/Y_s) ds      ∫_0^T (X_s/Y_s) ds   )
    ( ∂²f_T/∂θ∂m (a, b, m, θ)   ∂²f_T/∂θ² (a, b, m, θ)  ) = ( ∫_0^T (X_s/Y_s) ds    −∫_0^T (X_s²/Y_s) ds  ),

and the positivity of ∫_0^T (1/Y_s) ds, together with the positivity of ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² and ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )², yields the negative definiteness of these two matrices, respectively. Using the SDE (1.1) one can check that

    ( â_T^MLE − a )   ( ∫_0^T (1/Y_s) ds    −T            )⁻¹ (  ∫_0^T (1/√Y_s) dL_s )
    ( b̂_T^MLE − b ) = ( −T                  ∫_0^T Y_s ds  )   ( −∫_0^T √Y_s dL_s     ),

    ( m̂_T^MLE − m )   ( ∫_0^T (1/Y_s) ds      −∫_0^T (X_s/Y_s) ds  )⁻¹ (  ∫_0^T (1/√Y_s) dB_s   )
    ( θ̂_T^MLE − θ ) = ( −∫_0^T (X_s/Y_s) ds   ∫_0^T (X_s²/Y_s) ds  )   ( −∫_0^T (X_s/√Y_s) dB_s ),

and hence

(3.8)    â_T^MLE − a = ( ∫_0^T Y_s ds ∫_0^T (1/√Y_s) dL_s − T ∫_0^T √Y_s dL_s ) / ( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ),    T > 0,

(3.9)    b̂_T^MLE − b = ( T ∫_0^T (1/√Y_s) dL_s − ∫_0^T (1/Y_s) ds ∫_0^T √Y_s dL_s ) / ( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ),    T > 0,

(3.10)    m̂_T^MLE − m = ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/√Y_s) dB_s − ∫_0^T (X_s/Y_s) ds ∫_0^T (X_s/√Y_s) dB_s ) / ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ),    T > 0,

and

(3.11)    θ̂_T^MLE − θ = ( ∫_0^T (X_s/Y_s) ds ∫_0^T (1/√Y_s) dB_s − ∫_0^T (1/Y_s) ds ∫_0^T (X_s/√Y_s) dB_s ) / ( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ),    T > 0,

provided that ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² > 0 and ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² > 0.

3.2 Remark. For the stochastic integrals ∫_0^T (1/Y_s) dY_s, ∫_0^T (X_s/Y_s) dX_s and ∫_0^T (1/Y_s) dX_s in (3.4), (3.5), (3.6) and (3.7), we have

(3.12)    Σ_{i=1}^{⌊nT⌋} (1/Y_{(i−1)/n}) ( Y_{i/n} − Y_{(i−1)/n} ) →^P ∫_0^T (1/Y_s) dY_s    as n → ∞,

          Σ_{i=1}^{⌊nT⌋} (X_{(i−1)/n}/Y_{(i−1)/n}) ( X_{i/n} − X_{(i−1)/n} ) →^P ∫_0^T (X_s/Y_s) dX_s    as n → ∞,

          Σ_{i=1}^{⌊nT⌋} (1/Y_{(i−1)/n}) ( X_{i/n} − X_{(i−1)/n} ) →^P ∫_0^T (1/Y_s) dX_s    as n → ∞,

following from Proposition I.4.44 in Jacod and Shiryaev [20] with the Riemann sequence of deterministic subdivisions (i/n ∧ T)_{i∈N}, n ∈ N. Thus, there exist measurable functions Φ, Ψ, Ξ : C([0, T], R_+ × R) → R such that ∫_0^T (1/Y_s) dY_s = Φ((Y_s, X_s)_{s∈[0,T]}), ∫_0^T (X_s/Y_s) dX_s = Ψ((Y_s, X_s)_{s∈[0,T]}) and ∫_0^T (1/Y_s) dX_s = Ξ((Y_s, X_s)_{s∈[0,T]}), since the convergences in (3.12) hold almost surely along suitable subsequences, the members of the sequences in (3.12) are measurable functions of (Y_s, X_s)_{s∈[0,T]}, and one can use Theorems 4.2.2 and 4.2.8 in Dudley [12]. Hence the right hand sides of (3.4), (3.5), (3.6) and (3.7) are measurable functions of (Y_s, X_s)_{s∈[0,T]}, i.e., they are statistics.    □
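Remark 3.2 also indicates how the MLE could be approximated from high-frequency discrete samples, replacing each integral in (3.4)–(3.7) by a left-point Riemann or Itô sum. The following sketch is our illustration only (the function name and the discretization are assumptions; it presupposes strictly positive samples of Y, and no error analysis is attempted — for the discretization of related estimators see Ben Alaya and Kebaier [5, Section 4]):

    import numpy as np

    def mle_from_samples(y, x, dt):
        """Approximate the MLE (3.4)-(3.7) from samples y[i] = Y_{i*dt}, x[i] = X_{i*dt},
        replacing integrals by left-point Riemann/Ito sums as in Remark 3.2."""
        T = dt * (len(y) - 1)
        yl, xl = y[:-1], x[:-1]            # left endpoints
        dY, dX = np.diff(y), np.diff(x)
        I_y   = np.sum(yl) * dt            # int_0^T Y_s ds
        I_1y  = np.sum(1.0 / yl) * dt      # int_0^T (1/Y_s) ds
        I_xy  = np.sum(xl / yl) * dt       # int_0^T (X_s/Y_s) ds
        I_x2y = np.sum(xl**2 / yl) * dt    # int_0^T (X_s^2/Y_s) ds
        J_1y  = np.sum(dY / yl)            # int_0^T (1/Y_s) dY_s
        J_1x  = np.sum(dX / yl)            # int_0^T (1/Y_s) dX_s
        J_xy  = np.sum(xl * dX / yl)       # int_0^T (X_s/Y_s) dX_s
        D1 = I_y * I_1y - T**2             # denominator of (3.4)-(3.5)
        D2 = I_x2y * I_1y - I_xy**2        # denominator of (3.6)-(3.7)
        a_hat  = (I_y * J_1y - T * (y[-1] - y[0])) / D1
        b_hat  = (T * J_1y - (y[-1] - y[0]) * I_1y) / D1
        m_hat  = (I_x2y * J_1x - I_xy * J_xy) / D2
        th_hat = (I_xy * J_1x - I_1y * J_xy) / D2
        return a_hat, b_hat, m_hat, th_hat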
The next lemma is about the existence of θ̃_T^MLE (supposing that a ≥ 1/2, b ∈ R and m ∈ R are known).

3.3 Lemma. If a ≥ 1/2, b, m, θ ∈ R, and P(Y_0 > 0) = 1, then

(3.13)    P( ∫_0^T (X_s²/Y_s) ds ∈ (0, ∞) ) = 1    for all T > 0,

and hence there exists a unique MLE θ̃_T^MLE which has the form given in (3.2).
Proof. First note that under the condition a ≥ 1/2, for all y_0 > 0, we have P(Y_t > 0, ∀ t ∈ R_+ | Y_0 = y_0) = 1, see, e.g., page 442 in Revuz and Yor [30]. Since P(Y_0 > 0) = 1, by the law of total probability, we get P(Y_t > 0, ∀ t ∈ R_+) = 1. Note also that, since X has continuous trajectories almost surely, by the proof of Lemma 3.1, we have P( ∫_0^T (X_s²/Y_s) ds ∈ [0, ∞) ) = 1 for all T > 0. Further, for any ω ∈ Ω, ∫_0^T ( X_s²(ω)/Y_s(ω) ) ds = 0 holds if and only if X_s(ω) = 0 for almost every s ∈ [0, T]. Using that X has continuous trajectories almost surely, we have

    P( ∫_0^T (X_s²/Y_s) ds = 0 ) > 0

holds if and only if P(X_s = 0, ∀ s ∈ [0, T]) > 0. Due to a ≥ 1/2 there does not exist a constant c ∈ R such that P(X_s = c, ∀ s ∈ [0, T]) > 0. Indeed, if c ∈ R is such that P(X_s = c, ∀ s ∈ [0, T]) > 0, then using (2.2), we have

    c = e^{−θs} ( c + m ∫_0^s e^{θu} du + ∫_0^s e^{θu} √Y_u dB_u ),    s ∈ [0, T],

on the event {ω ∈ Ω : X_s(ω) = c, ∀ s ∈ [0, T]}. In case θ ≠ 0, the process ( ∫_0^s e^{θu} √Y_u dB_u )_{s∈[0,T]} would be equal to the deterministic process ( (c − m/θ)(e^{θs} − 1) )_{s∈[0,T]} on the event {ω ∈ Ω : X_s(ω) = c, ∀ s ∈ [0, T]} having positive probability. Since the quadratic variation of the deterministic process ( (c − m/θ)(e^{θs} − 1) )_{s∈[0,T]} is the identically zero process (due to the fact that the quadratic variation process is a process starting from 0 almost surely), the quadratic variation of ( ∫_0^s e^{θu} √Y_u dB_u )_{s∈[0,T]} should be identically zero on the event {ω ∈ Ω : X_s(ω) = c, ∀ s ∈ [0, T]}. This would imply that ∫_0^s e^{2θu} Y_u du = 0 for all s ∈ [0, T] on the event {ω ∈ Ω : X_s(ω) = c, ∀ s ∈ [0, T]}. Using the almost sure continuity and non-negativity of the sample paths of Y, we have Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : X_s(ω) = c, ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (Y_s(ω))_{s∈[0,T]} is continuous}

having positive probability. If θ = 0, then replacing ( (c − m/θ)(e^{θs} − 1) )_{s∈[0,T]} by (−ms)_{s∈[0,T]}, one can arrive at the same conclusion. Hence P(inf{t ∈ R_+ : Y_t = 0} = 0) > 0, which leads us to a contradiction. This implies (3.13).

The above argument also shows that we can make a shortcut by arguing in a slightly different way. Indeed, if C is a random variable such that P(X_s = C, ∀ s ∈ [0, T]) > 0, then on the event {ω ∈ Ω : X_s(ω) = C(ω), ∀ s ∈ [0, T]}, the quadratic variation of X would be identically zero. Since, by the SDE (1.1), the quadratic variation of X is the process ( ∫_0^t Y_u du )_{t≥0}, it would imply that ∫_0^s Y_u du = 0 for all s ∈ [0, T] on the event {ω ∈ Ω : X_s(ω) = C(ω), ∀ s ∈ [0, T]}. Using the almost sure continuity and non-negativity of the sample paths of Y, we have Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : X_s(ω) = C(ω), ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (Y_s(ω))_{s∈[0,T]} is continuous}

having positive probability. As before, this leads us to a contradiction.    □
The next lemma is about the existence of (â_T^MLE, b̂_T^MLE, m̂_T^MLE, θ̂_T^MLE).

3.4 Lemma. If a ≥ 1/2, b, m, θ ∈ R, and P(Y_0 > 0) = 1, then

(3.14)    P( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ∈ (0, ∞) ) = 1    for all T > 0,

(3.15)    P( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ∈ (0, ∞) ) = 1    for all T > 0,

and hence there exists a unique MLE (â_T^MLE, b̂_T^MLE, m̂_T^MLE, θ̂_T^MLE) which has the form given in (3.4), (3.5), (3.6) and (3.7).
Proof. First note that under the condition a ≥ 1/2, as it was detailed in the proof of Lemma 3.3, we have P(Y_t > 0, ∀ t ∈ R_+) = 1. By the Cauchy–Schwarz inequality, we have

(3.16)    ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ≥ 0,    T > 0,

and hence, using also that Y has continuous trajectories almost surely and ∫_0^T (1/Y_s) ds < ∞ (see the proof of Lemma 3.1), we have

    P( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² ∈ [0, ∞) ) = 1,    T > 0.

Further, equality holds in (3.16) if and only if KY_s = L/Y_s for almost every s ∈ [0, T] with some K, L ≥ 0, K² + L² > 0 (K and L may depend on ω ∈ Ω and T > 0), or equivalently KY_s² = L for almost every s ∈ [0, T] with some K, L ≥ 0, K² + L² > 0. Note that if K were 0, then L would be 0, too, hence K cannot be 0, implying that equality holds in (3.16) if and only if Y_s² = L/K for almost every s ∈ [0, T] with some K > 0 and L ≥ 0 (K and L may depend on ω ∈ Ω and T). Using that Y has continuous trajectories almost surely, we have

(3.17)    P( ∫_0^T Y_s ds ∫_0^T (1/Y_s) ds − T² = 0 ) > 0

holds if and only if P(Y_s² = L/K, ∀ s ∈ [0, T]) > 0 with some random variables K and L such that P(K > 0, L ≥ 0) = 1 (K and L may depend on T). Similarly, as it was explained at the end of the proof of Lemma 3.3, this implies that the quadratic variation of the process (Y_s²)_{s∈[0,T]} would be identically zero on the event {ω ∈ Ω : Y_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} having positive probability. Since, by Itô's formula,

    dY_t² = 2Y_t dY_t + Y_t dt = ( 2Y_t(a − bY_t) + Y_t ) dt + 2Y_t √Y_t dL_t,    t ≥ 0,

the quadratic variation of (Y_t²)_{t≥0} is the process ( ∫_0^t 4Y_u³ du )_{t≥0}. If (3.17) holds, then ∫_0^s 4Y_u³ du = 0 for all s ∈ [0, T] on the event {ω ∈ Ω : Y_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} having positive probability. Using the almost sure continuity and non-negativity of the sample paths of Y, we have Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : Y_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (Y_t(ω))_{t≥0} is continuous}

having positive probability. This leads us to a contradiction, since P(Y_t > 0, ∀ t ≥ 0) = 1, implying (3.14).

Now we turn to prove (3.15). By the Cauchy–Schwarz inequality, we have

(3.18)    ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ≥ 0,    T > 0,

and hence, since X has continuous trajectories almost surely, by the proof of Lemma 3.1, we have

    P( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² ∈ [0, ∞) ) = 1,    T > 0.

Further, equality holds in (3.18) if and only if KX_s²/Y_s = L/Y_s for almost every s ∈ [0, T] with some K, L ≥ 0, K² + L² > 0 (K and L may depend on ω ∈ Ω and T > 0), or equivalently KX_s² = L for almost every s ∈ [0, T] with some K, L ≥ 0, K² + L² > 0. Note that if K were 0, then L would be 0, too, hence K cannot be 0, implying that equality holds in (3.18) if and only if X_s² = L/K for almost every s ∈ [0, T] with some K > 0 and L ≥ 0 (K and L may depend on ω ∈ Ω and T). Using that X has continuous trajectories almost surely, we have

(3.19)    P( ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² = 0 ) > 0

holds if and only if P(X_s² = L/K, ∀ s ∈ [0, T]) > 0 with some random variables K and L such that P(K > 0, L ≥ 0) = 1 (K and L may depend on T). Similarly, as it was explained at the end of the proof of Lemma 3.3, this implies that the quadratic variation of the process (X_s²)_{s∈[0,T]} would be identically zero on the event {ω ∈ Ω : X_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} having positive probability. Since, by Itô's formula,

    dX_t² = 2X_t dX_t + Y_t dt = ( 2X_t(m − θX_t) + Y_t ) dt + 2X_t √Y_t dB_t,    t ≥ 0,

the quadratic variation of (X_t²)_{t≥0} is the process ( ∫_0^t 4X_u² Y_u du )_{t≥0}. If (3.19) holds, then ∫_0^s 4X_u² Y_u du = 0 for all s ∈ [0, T] on the event {ω ∈ Ω : X_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} having positive probability. Using the almost sure continuity and non-negativity of the sample paths of X²Y, we have X_s² Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : X_s²(ω) = L(ω)/K(ω), ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (X_t²(ω)Y_t(ω))_{t≥0} is continuous} =: A

having positive probability. Since P(Y_t > 0, ∀ t ≥ 0) = 1, we have X_s = 0 for all s ∈ [0, T] on the event A having positive probability. Repeating the argument given in the proof of Lemma 3.3, we arrive at a contradiction. This implies (3.15).    □
4 Existence and uniqueness of least squares estimator

Studying the LSE for the model (1.1), the parameters a > 0 and b ∈ R will not be supposed to be known. However, we will not consider the LSEs of a and b; we will focus only on the LSEs of m and θ, since, studying LSEs, we would like to use a continuous time observation only for the process X, and not for the process (Y, X).

First we give a motivation for the LSE based on continuous time observations using the form of the LSE based on discrete time low frequency observations.
Let us suppose that m ∈ R is known (a > 0 and b ∈ R are not supposed to be known). The LSE of θ based on the discrete time observations X_i, i = 0, 1, . . . , n, can be obtained by solving the following extremum problem

    θ̃_n^LSE,D := arg min_{θ∈R} Σ_{i=1}^n ( X_i − X_{i−1} − (m − θX_{i−1}) )².

Here in the notation θ̃_n^LSE,D the letter D refers to discrete time observations, and we note that X_0 denotes an observation for the second coordinate of the initial value of the process (Y, X). This definition of the LSE of θ can be considered as the counterpart of the one given in Hu and Long [18, formula (1.2)] for generalized Ornstein–Uhlenbeck processes driven by α-stable motions, see also Hu and Long [19, formula (3.1)]. For a motivation of the LSE of θ based on the discrete observations X_i, i ∈ {0, 1, . . . , n}, see Remark 3.4 in Barczy et al. [1]. Further, by Barczy et al. [1, formula (3.5)],

(4.1)    θ̃_n^LSE,D = − ( Σ_{i=1}^n (X_i − X_{i−1}) X_{i−1} − m Σ_{i=1}^n X_{i−1} ) / Σ_{i=1}^n X_{i−1}²,

provided that Σ_{i=1}^n X_{i−1}² > 0. Motivated by (4.1), the LSE of θ based on the continuous time observations (X_t)_{t∈[0,T]} is defined by

(4.2)    θ̃_T^LSE := − ( ∫_0^T X_s dX_s − m ∫_0^T X_s ds ) / ∫_0^T X_s² ds,

provided that ∫_0^T X_s² ds > 0, and using the SDE (1.1) we have

(4.3)    θ̃_T^LSE − θ = − ( ∫_0^T X_s √Y_s dB_s ) / ∫_0^T X_s² ds,

provided that ∫_0^T X_s² ds > 0.
Let us suppose that the parameters a > 0 and b, m ∈ R are not known. The LSE of (m, θ) based on the discrete time observations X_i, i = 0, 1, . . . , n, can be obtained by solving the following extremum problem

    (m̂_n^LSE,D, θ̂_n^LSE,D) := arg min_{(m,θ)∈R²} Σ_{i=1}^n ( X_i − X_{i−1} − (m − θX_{i−1}) )².

By Barczy et al. [1, formulas (3.27) and (3.28)],

(4.4)    m̂_n^LSE,D = ( Σ_{i=1}^n X_{i−1}² Σ_{i=1}^n (X_i − X_{i−1}) − Σ_{i=1}^n X_{i−1} Σ_{i=1}^n (X_i − X_{i−1}) X_{i−1} ) / ( n Σ_{i=1}^n X_{i−1}² − ( Σ_{i=1}^n X_{i−1} )² )

and

(4.5)    θ̂_n^LSE,D = ( Σ_{i=1}^n X_{i−1} Σ_{i=1}^n (X_i − X_{i−1}) − n Σ_{i=1}^n (X_i − X_{i−1}) X_{i−1} ) / ( n Σ_{i=1}^n X_{i−1}² − ( Σ_{i=1}^n X_{i−1} )² ),

provided that n Σ_{i=1}^n X_{i−1}² − ( Σ_{i=1}^n X_{i−1} )² > 0. Motivated by (4.4) and (4.5), the LSE of (m, θ) based on the continuous time observations (X_t)_{t∈[0,T]} is defined by

(4.6)    m̂_T^LSE := ( (X_T − X_0) ∫_0^T X_s² ds − ∫_0^T X_s ds ∫_0^T X_s dX_s ) / ( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ),

(4.7)    θ̂_T^LSE := ( (X_T − X_0) ∫_0^T X_s ds − T ∫_0^T X_s dX_s ) / ( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ),

provided that T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² > 0. Note that, by the Cauchy–Schwarz inequality, T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ≥ 0, and T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² > 0 yields that ∫_0^T X_s² ds > 0. Then

    ( m̂_T^LSE )   ( T                −∫_0^T X_s ds )⁻¹ (  X_T − X_0       )
    ( θ̂_T^LSE ) = ( −∫_0^T X_s ds    ∫_0^T X_s² ds )   ( −∫_0^T X_s dX_s  ),

and using the SDE (1.1) one can check that

    ( m̂_T^LSE − m )   ( T                −∫_0^T X_s ds )⁻¹ (  ∫_0^T √Y_s dB_s      )
    ( θ̂_T^LSE − θ ) = ( −∫_0^T X_s ds    ∫_0^T X_s² ds )   ( −∫_0^T X_s √Y_s dB_s  ),

and hence

(4.8)    m̂_T^LSE − m = ( −∫_0^T X_s ds ∫_0^T X_s √Y_s dB_s + ∫_0^T X_s² ds ∫_0^T √Y_s dB_s ) / ( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² )

and

(4.9)    θ̂_T^LSE − θ = ( −T ∫_0^T X_s √Y_s dB_s + ∫_0^T X_s ds ∫_0^T √Y_s dB_s ) / ( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ),

provided that T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² > 0.

4.1 Remark. For the stochastic integral ∫_0^T X_s dX_s in (4.2), (4.6) and (4.7), we have

    Σ_{i=1}^{⌊nT⌋} X_{(i−1)/n} ( X_{i/n} − X_{(i−1)/n} ) →^P ∫_0^T X_s dX_s    as n → ∞,

following from Proposition I.4.44 in Jacod and Shiryaev [20]. For more details, see Remark 3.2.    □
The next lemma is about the existence of θ̃_T^LSE (supposing that m ∈ R is known, but a > 0 and b are unknown).

4.2 Lemma. If a > 0, b, m, θ ∈ R, and P(Y_0 > 0) = 1, then

(4.10)    P( ∫_0^T X_s² ds ∈ (0, ∞) ) = 1    for all T > 0,

and hence there exists a unique LSE θ̃_T^LSE which has the form given in (4.2).

Proof. First note that under the condition on the parameters, for all y_0 > 0, we have

    P( inf{t ∈ R_+ : Y_t = 0} > 0 | Y_0 = y_0 ) = 1,

and hence, by the law of total probability, P(inf{t ∈ R_+ : Y_t = 0} > 0) = 1. Since X has continuous trajectories almost surely, we readily have

    P( ∫_0^T X_s² ds ∈ [0, ∞) ) = 1,    T > 0.

Observe also that for any ω ∈ Ω, ∫_0^T X_s²(ω) ds = 0 holds if and only if X_s(ω) = 0 for almost every s ∈ [0, T]. Using that X has continuous trajectories almost surely, we have

    P( ∫_0^T X_s² ds = 0 ) > 0

holds if and only if P(X_s = 0, ∀ s ∈ [0, T]) > 0. By the end of the proof of Lemma 3.3, if P(X_s = 0, ∀ s ∈ [0, T]) > 0, then Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : X_s(ω) = 0, ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (Y_s(ω))_{s∈[0,T]} is continuous}

having positive probability. This would yield that P(inf{t ∈ R_+ : Y_t = 0} = 0) > 0, leading us to a contradiction, which implies (4.10).    □
The next lemma is about the existence of (m̂_T^LSE, θ̂_T^LSE).

4.3 Lemma. If a > 0, b, m, θ ∈ R, and P(Y_0 > 0) = 1, then

(4.11)    P( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ∈ (0, ∞) ) = 1    for all T > 0,

and hence there exists a unique LSE (m̂_T^LSE, θ̂_T^LSE) which has the form given in (4.6) and (4.7).

Proof. Just as in the proof of Lemma 4.2, we have P(inf{t ∈ R_+ : Y_t = 0} > 0) = 1. By the Cauchy–Schwarz inequality, we have

(4.12)    T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ≥ 0,    T > 0,

and hence, since X has continuous trajectories almost surely, we readily have

    P( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² ∈ [0, ∞) ) = 1,    T > 0.

Further, equality holds in (4.12) if and only if KX_s² = L for almost every s ∈ [0, T] with some K, L ≥ 0, K² + L² > 0 (K and L may depend on ω ∈ Ω and T). Note that if K were 0, then L would be 0, too, hence K cannot be 0, implying that equality holds in (4.12) if and only if X_s² = L/K for almost every s ∈ [0, T] with some K > 0 and L ≥ 0 (K and L may depend on ω ∈ Ω and T). Using that X has continuous trajectories almost surely, we have

(4.13)    P( T ∫_0^T X_s² ds − ( ∫_0^T X_s ds )² = 0 ) > 0

holds if and only if P(X_s² = L/K, ∀ s ∈ [0, T]) > 0 with some random variables K and L such that P(K > 0, L ≥ 0) = 1 (K and L may depend on T). Using again that X has continuous trajectories almost surely, (4.13) holds if and only if P(X_s = C, ∀ s ∈ [0, T]) > 0 with some random variable C (note that C = √(L/K) if X is non-negative and C = −√(L/K) if X is negative). This leads us to a contradiction by the end of the proof of Lemma 3.3. Indeed, if P(X_s = C, ∀ s ∈ [0, T]) > 0 with some random variable C, then, by the end of the proof of Lemma 3.3, we have Y_s = 0 for all s ∈ [0, T] on the event

    {ω ∈ Ω : X_s(ω) = C(ω), ∀ s ∈ [0, T]} ∩ {ω ∈ Ω : (Y_s(ω))_{s∈[0,T]} is continuous}

having positive probability. This would yield that P(inf{t ∈ R_+ : Y_t = 0} = 0) > 0, leading us to a contradiction, which implies (4.11).    □
5 Consistency of maximum likelihood estimator

5.1 Theorem. If a ≥ 1/2, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the MLE of θ is strongly consistent: P( lim_{T→∞} θ̃_T^MLE = θ ) = 1.

Proof. By Lemma 3.3, there exists a unique MLE θ̃_T^MLE of θ which has the form given in (3.2). Further, by (3.3),

    θ̃_T^MLE − θ = − ( ∫_0^T (X_s/√Y_s) dB_s ) / ∫_0^T (X_s²/Y_s) ds,    T > 0.

The strong consistency of θ̃_T^MLE will follow from a strong law of large numbers for continuous local martingales (see, e.g., Theorem 2.6) provided that we check that

(5.1)    P( ∫_0^T (X_s²/Y_s) ds < +∞ ) = 1,    T > 0,

(5.2)    P( lim_{T→∞} ∫_0^T (X_s²/Y_s) ds = +∞ ) = 1.

Since (Y, X) has continuous trajectories almost surely, we have (5.1). Next we check (5.2). By Theorem 2.5, we have

    lim_{T→∞} (1/T) ∫_0^T X_s²/(1 + Y_s) ds = E( X_∞²/(1 + Y_∞) )    a.s.,

where

    E( X_∞²/(1 + Y_∞) ) ≤ E(X_∞²) = (aθ + 2bm²)/(2bθ²) < ∞.

Next we check that E(X_∞²/(1 + Y_∞)) > 0. Since X_∞²/(1 + Y_∞) ≥ 0, E(X_∞²/(1 + Y_∞)) = 0 holds if and only if P(X_∞²/(1 + Y_∞) = 0) = 1, or equivalently P(X_∞ = 0) = 1, which leads us to a contradiction since X_∞ is absolutely continuous by Theorem 2.5. Hence we have

    P( lim_{T→∞} ∫_0^T X_s²/(1 + Y_s) ds = ∞ ) = 1,

which yields (5.2), since X_s²/Y_s ≥ X_s²/(1 + Y_s), s ∈ [0, T].    □
5.2 Theorem. If a > 1/2, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the MLE of (a, b, m, θ) is strongly consistent: P( lim_{T→∞} (â_T^MLE, b̂_T^MLE, m̂_T^MLE, θ̂_T^MLE) = (a, b, m, θ) ) = 1.

Proof. By Lemma 3.4, there exists a unique MLE (â_T^MLE, b̂_T^MLE, m̂_T^MLE, θ̂_T^MLE) of (a, b, m, θ) which has the form given in (3.4), (3.5), (3.6) and (3.7). First we check that P(lim_{T→∞} θ̂_T^MLE = θ) = 1. By (3.11), using also that ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds − ( ∫_0^T (X_s/Y_s) ds )² > 0 yields ∫_0^T (X_s²/Y_s) ds ∫_0^T (1/Y_s) ds > 0, we have

(5.3)    θ̂_T^MLE − θ = [ ( (1/T) ∫_0^T (X_s/Y_s) ds / (1/T) ∫_0^T (X_s²/Y_s) ds ) · ( ∫_0^T (1/√Y_s) dB_s / ∫_0^T (1/Y_s) ds ) − ( ∫_0^T (X_s/√Y_s) dB_s / ∫_0^T (X_s²/Y_s) ds ) ]
                       / [ 1 − ( (1/T) ∫_0^T (X_s/Y_s) ds )² / ( (1/T) ∫_0^T (X_s²/Y_s) ds · (1/T) ∫_0^T (1/Y_s) ds ) ]    a.s.

due to (3.15). Next, we show that E(1/Y_∞) < ∞, E(X_∞/Y_∞) < ∞ and E(X_∞²/Y_∞) < ∞, which, by part (iii) of Theorem 2.5, will imply that

(5.4)    P( lim_{T→∞} (1/T) ∫_0^T (1/Y_s) ds = E(1/Y_∞) ) = 1,

(5.5)    P( lim_{T→∞} (1/T) ∫_0^T (X_s/Y_s) ds = E(X_∞/Y_∞) ) = 1,

(5.6)    P( lim_{T→∞} (1/T) ∫_0^T (X_s²/Y_s) ds = E(X_∞²/Y_∞) ) = 1.

We only prove E(X_∞²/Y_∞) < ∞ noting that E(1/Y_∞) < ∞ and E(X_∞/Y_∞) < ∞ can be checked in the very same way. First note that, by Theorem 2.5, P(Y_∞ > 0) = 1, and hence the random variable X_∞²/Y_∞ is well-defined with probability one. By Hölder's inequality, for all p, q > 1 with 1/p + 1/q = 1, we have

    E( X_∞²/Y_∞ ) ≤ ( E(X_∞^{2p}) )^{1/p} ( E(1/Y_∞^q) )^{1/q}.

Since, by Theorem 2.5, E(|X_∞|^n) < ∞ for all n ∈ N, to prove that E(X_∞²/Y_∞) < ∞ it is enough to check that there exists some ε > 0 such that E(1/Y_∞^{1+ε}) < ∞. Using that, by Theorem 2.5, Y_∞ has Gamma distribution with density function

    ( (2b)^{2a} / Γ(2a) ) x^{2a−1} e^{−2bx} 1_{{x>0}},

we have, for any ε > 0,

    E( 1/Y_∞^{1+ε} ) = ∫_0^∞ x^{−(1+ε)} ( (2b)^{2a} / Γ(2a) ) x^{2a−1} e^{−2bx} dx = ( (2b)^{2a} / Γ(2a) ) ∫_0^∞ x^{2a−2−ε} e^{−2bx} dx.

Due to our assumption a > 1/2, one can choose an ε such that max(0, 2a − 2) < ε < 2a − 1, and hence for all M > 0,

    ∫_0^∞ x^{2a−2−ε} e^{−2bx} dx ≤ ∫_0^M x^{2a−2−ε} dx + M^{2a−2−ε} ∫_M^∞ e^{−2bx} dx
        = M^{2a−1−ε}/(2a − 1 − ε) + M^{2a−2−ε} lim_{L→∞} ( e^{−2bM} − e^{−2bL} )/(2b)
        = M^{2a−1−ε}/(2a − 1 − ε) + M^{2a−2−ε} e^{−2bM}/(2b) < ∞,

which yields that E(1/Y_∞^{1+ε}) < ∞ for ε satisfying max(0, 2a − 2) < ε < 2a − 1.

Further, since E(1/Y_∞) > 0 and E(X_∞²/Y_∞) > 0 (due to the absolute continuity of X_∞, as it was explained in the proof of Theorem 5.1), (5.4) and (5.6) yield that

    P( lim_{T→∞} ∫_0^T (1/Y_s) ds = ∞ ) = P( lim_{T→∞} ∫_0^T (X_s²/Y_s) ds = ∞ ) = 1.

Using (5.3), (5.4), (5.5), (5.6), Slutsky's lemma and a strong law of large numbers for continuous local martingales (see, e.g., Theorem 2.6), we get

    lim_{T→∞} ( θ̂_T^MLE − θ ) = [ ( E(X_∞/Y_∞)/E(X_∞²/Y_∞) ) · 0 − 0 ] / [ 1 − ( E(X_∞/Y_∞) )² / ( E(X_∞²/Y_∞) E(1/Y_∞) ) ] = 0    a.s.,

where we also used that the denominator is strictly positive. Indeed, by the Cauchy–Schwarz inequality,

(5.7)    ( E(X_∞/Y_∞) )² ≤ E( X_∞²/Y_∞ ) E( 1/Y_∞ ),

and equality would hold if and only if P(K X_∞²/Y_∞ = L/Y_∞) = 1 with some K, L ≥ 0, K² + L² > 0. By Theorem 2.5, (Y_∞, X_∞) is absolutely continuous and then P(X_∞ = c) = 0 for all c ∈ R. This implies that (5.7) holds with strict inequality.

Similarly, by (3.8), (3.9) and (3.10), we have

    â_T^MLE − a = [ ( ∫_0^T (1/√Y_s) dL_s / ∫_0^T (1/Y_s) ds ) − ( 1 / ( (1/T) ∫_0^T (1/Y_s) ds ) ) · ( ∫_0^T √Y_s dL_s / ∫_0^T Y_s ds ) ]
                  / [ 1 − 1 / ( (1/T) ∫_0^T Y_s ds · (1/T) ∫_0^T (1/Y_s) ds ) ]    a.s.,

    b̂_T^MLE − b = [ ( 1 / ( (1/T) ∫_0^T Y_s ds ) ) · ( ∫_0^T (1/√Y_s) dL_s / ∫_0^T (1/Y_s) ds ) − ( ∫_0^T √Y_s dL_s / ∫_0^T Y_s ds ) ]
                  / [ 1 − 1 / ( (1/T) ∫_0^T Y_s ds · (1/T) ∫_0^T (1/Y_s) ds ) ]    a.s.,

    m̂_T^MLE − m = [ ( ∫_0^T (1/√Y_s) dB_s / ∫_0^T (1/Y_s) ds ) − ( (1/T) ∫_0^T (X_s/Y_s) ds / (1/T) ∫_0^T (1/Y_s) ds ) · ( ∫_0^T (X_s/√Y_s) dB_s / ∫_0^T (X_s²/Y_s) ds ) ]
                  / [ 1 − ( (1/T) ∫_0^T (X_s/Y_s) ds )² / ( (1/T) ∫_0^T (X_s²/Y_s) ds · (1/T) ∫_0^T (1/Y_s) ds ) ]    a.s.,

due to (3.14) and (3.15). Since E(Y_∞) < ∞, by part (iii) of Theorem 2.5,

(5.8)    P( lim_{T→∞} (1/T) ∫_0^T Y_s ds = E(Y_∞) ) = P( lim_{T→∞} ∫_0^T Y_s ds = ∞ ) = 1,

and then, similarly as before, one can argue that

    lim_{T→∞} ( â_T^MLE − a ) = ( 0 − (1/E(1/Y_∞)) · 0 ) / ( 1 − 1/( E(Y_∞) E(1/Y_∞) ) ) = 0    a.s.,

    lim_{T→∞} ( b̂_T^MLE − b ) = ( (1/E(Y_∞)) · 0 − 0 ) / ( 1 − 1/( E(Y_∞) E(1/Y_∞) ) ) = 0    a.s.,

    lim_{T→∞} ( m̂_T^MLE − m ) = ( 0 − ( E(X_∞/Y_∞)/E(1/Y_∞) ) · 0 ) / ( 1 − ( E(X_∞/Y_∞) )²/( E(X_∞²/Y_∞) E(1/Y_∞) ) ) = 0    a.s.,

where we also used that E(Y_∞) E(1/Y_∞) > 1 (which can be checked using the Cauchy–Schwarz inequality and the absolute continuity of Y_∞). Using that the intersection of four events with probability one is an event with probability one, we have the assertion.    □

5.3 Remark. If a = 1/2, b > 0, θ > 0, m ∈ R, and P(Y_0 > 0) = 1, then one should find another approach for studying the consistency behaviour of the MLE of (a, b, m, θ), since in this case

    E( 1/Y_∞ ) = ∫_0^∞ (1/x) 2b e^{−2bx} dx = ∞,

and hence one cannot use part (iii) of Theorem 2.5. In this paper we do not pursue this case.    □
6 Consistency of least squares estimator

6.1 Theorem. If a > 0, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the LSE of θ is strongly consistent: P( lim_{T→∞} θ̃_T^LSE = θ ) = 1.

Proof. By Lemma 4.2, there exists a unique LSE θ̃_T^LSE of θ which has the form given in (4.2). By (4.3), we have

(6.1)    θ̃_T^LSE − θ = − ( ∫_0^T X_s √Y_s dB_s ) / ∫_0^T X_s² ds = − ( ∫_0^T X_s √Y_s dB_s / ∫_0^T X_s² Y_s ds ) · ( (1/T) ∫_0^T X_s² Y_s ds / (1/T) ∫_0^T X_s² ds ).

By Theorem 2.5, we have

(6.2)    P( lim_{T→∞} (1/T) ∫_0^T X_s² Y_s ds = E(X_∞² Y_∞) ) = 1    and    P( lim_{T→∞} (1/T) ∫_0^T X_s² ds = E(X_∞²) ) = 1.

We note that E(X_∞² Y_∞) and E(X_∞²) are calculated explicitly in Theorem 2.5. Note also that E(X_∞² Y_∞) is positive (due to the fact that X_∞² Y_∞ is non-negative and absolutely continuous), and hence we also have

    P( lim_{T→∞} ∫_0^T X_s² Y_s ds = ∞ ) = 1.

Then, by a strong law of large numbers for continuous local martingales (see, e.g., Theorem 2.6), we get

    P( lim_{T→∞} ( ∫_0^T X_s √Y_s dB_s ) / ∫_0^T X_s² Y_s ds = 0 ) = 1,

and hence (6.1) yields the assertion.    □

6.2 Theorem. If a > 0, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the LSE of (m, θ) is strongly consistent: P( lim_{T→∞} (m̂_T^LSE, θ̂_T^LSE) = (m, θ) ) = 1.
Proof. By Lemma 4.3, there exists a unique LSE (m̂_T^LSE, θ̂_T^LSE) of (m, θ) which has the form given in (4.6) and (4.7). By (4.9), we have

    θ̂_T^LSE − θ = [ − ( (1/T) ∫_0^T X_s² Y_s ds / (1/T) ∫_0^T X_s² ds ) · ( ∫_0^T X_s √Y_s dB_s / ∫_0^T X_s² Y_s ds )
                    + ( (1/T) ∫_0^T X_s ds · (1/T) ∫_0^T Y_s ds / (1/T) ∫_0^T X_s² ds ) · ( ∫_0^T √Y_s dB_s / ∫_0^T Y_s ds ) ]
                  / [ 1 − ( (1/T) ∫_0^T X_s ds )² / ( (1/T) ∫_0^T X_s² ds ) ]    a.s.

due to (4.11). Using (5.8), (6.2) and that

(6.3)    P( lim_{T→∞} (1/T) ∫_0^T X_s ds = E(X_∞) ) = 1,

similarly to the proof of Theorem 5.2, we get

    lim_{T→∞} ( θ̂_T^LSE − θ ) = [ − ( E(X_∞² Y_∞)/E(X_∞²) ) · 0 + ( E(X_∞) E(Y_∞)/E(X_∞²) ) · 0 ] / [ 1 − ( E(X_∞) )²/E(X_∞²) ] = 0    a.s.,

where for the last step we also used that (E(X_∞))² < E(X_∞²) (which holds since there does not exist a constant c ∈ R such that P(X_∞ = c) = 1, due to the fact that X_∞ is absolutely continuous).

Similarly, by (4.8),

    lim_{T→∞} ( m̂_T^LSE − m )
        = lim_{T→∞} [ − (1/T) ∫_0^T X_s ds · ( (1/T) ∫_0^T X_s² Y_s ds / (1/T) ∫_0^T X_s² ds ) · ( ∫_0^T X_s √Y_s dB_s / ∫_0^T X_s² Y_s ds )
                      + (1/T) ∫_0^T Y_s ds · ( ∫_0^T √Y_s dB_s / ∫_0^T Y_s ds ) ] / [ 1 − ( (1/T) ∫_0^T X_s ds )² / ( (1/T) ∫_0^T X_s² ds ) ]
        = [ − E(X_∞) · ( E(X_∞² Y_∞)/E(X_∞²) ) · 0 + E(Y_∞) · 0 ] / [ 1 − ( E(X_∞) )²/E(X_∞²) ] = 0    a.s.

Using that the intersection of two events with probability one is an event with probability one, we have the assertion.    □
7 Asymptotic behaviour of maximum likelihood estimator

7.1 Theorem. If a > 1/2, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the MLE of θ is asymptotically normal, i.e.,

    √T ( θ̃_T^MLE − θ ) →^L N( 0, 1 / E(X_∞²/Y_∞) )    as T → ∞,

where E(X_∞²/Y_∞) is positive and finite.

Proof. First note that, by (3.3),

(7.1)    √T ( θ̃_T^MLE − θ ) = − ( (1/√T) ∫_0^T (X_s/√Y_s) dB_s ) / ( (1/T) ∫_0^T (X_s²/Y_s) ds )    a.s.

due to (3.13). Recall that, by (5.2), P( lim_{T→∞} ∫_0^T (X_s²/Y_s) ds = +∞ ) = 1. Using Theorem 2.7 with the following choices

    d := 1,    M_t := ∫_0^t (X_s/√Y_s) dB_s, t ≥ 0,    (F_t)_{t≥0} given in Remark 2.2,
    Q(t) := 1/√t, t > 0,    η := √( E(X_∞²/Y_∞) ),

we have

    (1/√T) ∫_0^T (X_s/√Y_s) dB_s →^L √( E(X_∞²/Y_∞) ) ξ    as T → ∞,

where ξ is a standard normally distributed random variable. Then Slutsky's lemma, (5.6) and (7.1) yield the assertion.    □
7.2 Theorem. If a > 1/2, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the MLE of (a, b, m, θ) is asymptotically normal, i.e.,

    √T ( â_T^MLE − a, b̂_T^MLE − b, m̂_T^MLE − m, θ̂_T^MLE − θ )^⊤ →^L N₄( 0, Σ^MLE )    as T → ∞,

where N₄(0, Σ^MLE) denotes a 4-dimensional normal distribution with mean vector 0 ∈ R⁴ and with covariance matrix Σ^MLE := diag(Σ₁^MLE, Σ₂^MLE) with diagonal blocks given by

    Σ₁^MLE := ( 1 / ( E(1/Y_∞) E(Y_∞) − 1 ) ) D₁,        D₁ := ( E(Y_∞)   1
                                                                1        E(1/Y_∞) ),

    Σ₂^MLE := ( 1 / ( E(1/Y_∞) E(X_∞²/Y_∞) − ( E(X_∞/Y_∞) )² ) ) D₂,        D₂ := ( E(X_∞²/Y_∞)   E(X_∞/Y_∞)
                                                                                    E(X_∞/Y_∞)    E(1/Y_∞)   ).

Proof. By (3.8), (3.9), (3.10) and (3.11), we have

    √T ( â_T^MLE − a ) = [ (1/T) ∫_0^T Y_s ds · (1/√T) ∫_0^T (1/√Y_s) dL_s − (1/√T) ∫_0^T √Y_s dL_s ]
                         / [ (1/T) ∫_0^T Y_s ds · (1/T) ∫_0^T (1/Y_s) ds − 1 ]    a.s.,

    √T ( b̂_T^MLE − b ) = [ (1/√T) ∫_0^T (1/√Y_s) dL_s − (1/T) ∫_0^T (1/Y_s) ds · (1/√T) ∫_0^T √Y_s dL_s ]
                         / [ (1/T) ∫_0^T Y_s ds · (1/T) ∫_0^T (1/Y_s) ds − 1 ]    a.s.,

    √T ( m̂_T^MLE − m ) = [ (1/T) ∫_0^T (X_s²/Y_s) ds · (1/√T) ∫_0^T (1/√Y_s) dB_s − (1/T) ∫_0^T (X_s/Y_s) ds · (1/√T) ∫_0^T (X_s/√Y_s) dB_s ]
                         / [ (1/T) ∫_0^T (X_s²/Y_s) ds · (1/T) ∫_0^T (1/Y_s) ds − ( (1/T) ∫_0^T (X_s/Y_s) ds )² ]    a.s.,

    √T ( θ̂_T^MLE − θ ) = [ (1/T) ∫_0^T (X_s/Y_s) ds · (1/√T) ∫_0^T (1/√Y_s) dB_s − (1/T) ∫_0^T (1/Y_s) ds · (1/√T) ∫_0^T (X_s/√Y_s) dB_s ]
                         / [ (1/T) ∫_0^T (X_s²/Y_s) ds · (1/T) ∫_0^T (1/Y_s) ds − ( (1/T) ∫_0^T (X_s/Y_s) ds )² ]    a.s.,

due to (3.14) and (3.15). Next, we show that

(7.2)    (1/√T) M_T := (1/√T) ( ∫_0^T √Y_s dL_s, ∫_0^T (1/√Y_s) dL_s, ∫_0^T (X_s/√Y_s) dB_s, ∫_0^T (1/√Y_s) dB_s )^⊤ →^L ηZ    as T → ∞,

where Z is a 4-dimensional standard normally distributed random variable and η is a non-random 4 × 4 matrix such that ηη^⊤ = diag(D₁, D₂). Here the matrices D₁ and D₂ are positive definite, since their principal minors are positive, and η denotes the unique symmetric positive definite square root of diag(D₁, D₂) (see, e.g., Horn and Johnson [16, Theorem 7.2.6]). Indeed, by the absolute continuity of (Y_∞, X_∞) (see Theorem 2.5), there do not exist constants c, d ∈ R_+ such that P(Y_∞² = c) = 1 or P(X_∞² = d) = 1, and, by the Cauchy–Schwarz inequality,

    E(1/Y_∞) E(Y_∞) − 1 > 0    and    E(1/Y_∞) E(X_∞²/Y_∞) − ( E(X_∞/Y_∞) )² > 0,

where equalities would hold if and only if P(Y_∞² = c) = 1 and P(X_∞² = d) = 1 with some constants c, d ∈ R_+, respectively. Note also that the quantity E(1/Y_∞) E(Y_∞) − 1 could have been calculated explicitly, since Y_∞ has Gamma distribution with parameters 2a and 2b.

Let us use Theorem 2.7 with the choices d = 4, M_t, t ≥ 0, defined in (7.2), (F_t)_{t≥0} given in Remark 2.2, and Q(t) := diag(t^{−1/2}, t^{−1/2}, t^{−1/2}, t^{−1/2}), t > 0. We have

    ⟨M⟩_t = ( ∫_0^t Y_s ds          t                     0                      0
              t                     ∫_0^t (1/Y_s) ds      0                      0
              0                     0                     ∫_0^t (X_s²/Y_s) ds    ∫_0^t (X_s/Y_s) ds
              0                     0                     ∫_0^t (X_s/Y_s) ds     ∫_0^t (1/Y_s) ds   ),    t ≥ 0,

where we used the independence of (L_t)_{t≥0} and (B_t)_{t≥0}. Recall that (under the conditions of the theorem) in the proof of Theorem 5.2 it was shown that E(Y_∞) < ∞, E(1/Y_∞) < ∞, E(X_∞/Y_∞) < ∞, and E(X_∞²/Y_∞) < ∞, and hence, by Theorem 2.5, we have

    Q(t) ⟨M⟩_t Q(t)^⊤ → diag(D₁, D₂)    as t → ∞ a.s.

Hence, Theorem 2.7 yields (7.2). Then Slutsky's lemma and the continuous mapping theorem yield that

    √T ( â_T^MLE − a, b̂_T^MLE − b, m̂_T^MLE − m, θ̂_T^MLE − θ )^⊤ →^L diag(A₁, A₂) ηZ    as T → ∞,

where

    A₁ := ( 1 / ( E(1/Y_∞) E(Y_∞) − 1 ) ) B₁,        B₁ := ( −1          E(Y_∞)
                                                            −E(1/Y_∞)    1      ),

    A₂ := ( 1 / ( E(1/Y_∞) E(X_∞²/Y_∞) − ( E(X_∞/Y_∞) )² ) ) B₂,        B₂ := ( −E(X_∞/Y_∞)   E(X_∞²/Y_∞)
                                                                                −E(1/Y_∞)     E(X_∞/Y_∞)  ).

Using that ηZ is a 4-dimensional normally distributed random variable with mean vector zero and with covariance matrix ηη^⊤ = diag(D₁, D₂), the covariance matrix of diag(A₁, A₂)ηZ takes the form

    diag(A₁, A₂) diag(D₁, D₂) diag(A₁^⊤, A₂^⊤) = diag(A₁ D₁ A₁^⊤, A₂ D₂ A₂^⊤),

which yields the assertion. Indeed,

    B₁ D₁ B₁^⊤ = ( E(1/Y_∞) E(Y_∞) − 1 ) D₁,

and

    B₂ D₂ B₂^⊤ = ( E(1/Y_∞) E(X_∞²/Y_∞) − ( E(X_∞/Y_∞) )² ) D₂.    □

7.3 Remark. The asymptotic variance 1/E(X_∞²/Y_∞) of θ̃_T^MLE in Theorem 7.1 is less than the asymptotic variance

    E(1/Y_∞) / ( E(1/Y_∞) E(X_∞²/Y_∞) − ( E(X_∞/Y_∞) )² )

of θ̂_T^MLE in Theorem 7.2. This is in accordance with the fact that θ̃_T^MLE is the MLE of θ provided that the value of the parameter m is known, which gives extra information, so the MLE of θ becomes better.    □
8 Asymptotic behaviour of least squares estimator

8.1 Theorem. If a > 0, b > 0, m ∈ R, θ > 0, and P(Y_0 > 0) = 1, then the LSE of θ is asymptotically normal, i.e.,

    √T ( θ̃_T^LSE − θ ) →^L N( 0, E(X_∞² Y_∞) / ( E(X_∞²) )² )    as T → ∞,

where E(X_∞² Y_∞) and E(X_∞²) are given explicitly in Theorem 2.5.

Proof. First note that, by (4.3),

(8.1)    √T ( θ̃_T^LSE − θ ) = − ( (1/√T) ∫_0^T X_s √Y_s dB_s ) / ( (1/T) ∫_0^T X_s² ds )    a.s.

due to (4.10). Using that E(X_∞² Y_∞) is positive (since X_∞² Y_∞ is non-negative and absolutely continuous), by (6.2), we have

    P( lim_{T→∞} ∫_0^T X_s² Y_s ds = +∞ ) = 1.

Further, an application of Theorem 2.7 with the following choices

    d := 1,    M_t := ∫_0^t X_s √Y_s dB_s, t ≥ 0,    (F_t)_{t≥0} given in Remark 2.2,
    Q(t) := 1/√t, t > 0,    η := √( E(X_∞² Y_∞) ),

yields that

    (1/√T) ∫_0^T X_s √Y_s dB_s →^L √( E(X_∞² Y_∞) ) ξ    as T → ∞,

where ξ is a standard normally distributed random variable. Using again (6.2), Slutsky's lemma and (8.1), we get the assertion.    □

8.2 Remark. The asymptotic variance E(X_∞² Y_∞)/(E(X_∞²))² of the LSE θ̃_T^LSE in Theorem 8.1 is greater than the asymptotic variance 1/E(X_∞²/Y_∞) of the MLE θ̃_T^MLE in Theorem 7.1, since, by the Cauchy–Schwarz inequality,

    ( E(X_∞²) )² = ( E( (X_∞/√Y_∞) · X_∞ √Y_∞ ) )² < E( X_∞²/Y_∞ ) E( X_∞² Y_∞ ).

Note also that using the limit theorem for θ̃_T^LSE given in Theorem 8.1, one cannot give asymptotic confidence intervals for θ, since the variance of the limit normal distribution depends on the unknown parameters a and b as well.    □
8.3 Theorem. If a > 0, b > 0, m ∈ R, θ > 0, and P(Y0 > 0) = 1, then the LSE of (m, θ) is
asymptotically normal, i.e.,
[
]
√ m
(
)
b LSE
−m L
T
T bLSE
−→ N2 0, ΣLSE
as T → ∞,
θT − θ
(
)
where N2 0, ΣLSE denotes a 2-dimensional normally distribution with mean vector 0 ∈ R2 and with
2
covariance matrix ΣLSE = (ΣLSE
i,j )i,j=1 , where
2
2
2
(E(X∞ ))2 E(X∞
Y∞ ) − 2 E(X∞ ) E(X∞
) E(X∞ Y∞ ) + (E(X∞
))2 E(Y∞ )
,
2
2
2
(E(X∞ ) − (E(X∞ )) )
(
)
(
)
2
2
2
E(X∞ ) E(X∞
Y∞ ) + E(X∞
) E(Y∞ ) − E(X∞ Y∞ ) E(X∞
) + (E(X∞ ))2
LSE
= Σ2,1 :=
,
2 ) − (E(X ))2 )2
(E(X∞
∞
ΣLSE
1,1 :=
ΣLSE
1,2
ΣLSE
2,2 :=
2
E(X∞
Y∞ ) − 2 E(X∞ ) E(X∞ Y∞ ) + (E(X∞ ))2 E(Y∞ )
.
2 ) − (E(X ))2 )2
(E(X∞
∞
20
Proof. By (4.9) and (4.8), we have

    √T (m̂_T^LSE − m) = [ −(1/T) ∫_0^T X_s ds · (1/√T) ∫_0^T X_s √(Y_s) dB_s + (1/T) ∫_0^T X_s^2 ds · (1/√T) ∫_0^T √(Y_s) dB_s ] / [ (1/T) ∫_0^T X_s^2 ds − ( (1/T) ∫_0^T X_s ds )^2 ]    a.s.

and

    √T (θ̂_T^LSE − θ) = [ −(1/√T) ∫_0^T X_s √(Y_s) dB_s + (1/T) ∫_0^T X_s ds · (1/√T) ∫_0^T √(Y_s) dB_s ] / [ (1/T) ∫_0^T X_s^2 ds − ( (1/T) ∫_0^T X_s ds )^2 ]    a.s.

due to (4.11). Next we show that

(8.2)    ( (1/√T) ∫_0^T X_s √(Y_s) dB_s , (1/√T) ∫_0^T √(Y_s) dB_s ) →^L ( (ηZ)_1 , (ηZ)_2 )    as T → ∞,

where Z is a 2-dimensional standard normally distributed random variable and η is a non-random 2 × 2 matrix such that

    ηη^⊤ = [ E(X_∞^2 Y_∞) , E(X_∞ Y_∞) ; E(X_∞ Y_∞) , E(Y_∞) ].

Here the matrix ηη^⊤ is positive definite, since its principal minors are positive, and η denotes its unique symmetric positive definite square root. Indeed, by the absolute continuity of (Y_∞, X_∞) (see Theorem 2.5), we have P(X_∞^2 Y_∞ = 0) = 0 and, by the Cauchy–Schwarz inequality, E(X_∞^2 Y_∞) E(Y_∞) − (E(X_∞ Y_∞))^2 > 0, where equality would hold if and only if P(K X_∞^2 Y_∞ = L Y_∞) = 1 with some constants K, L ∈ R_+ such that K^2 + L^2 > 0, or equivalently (using that P(Y_∞ > 0) = 1, since Y_∞ has Gamma distribution) if and only if P(X_∞^2 = L/K) = 1, which contradicts the absolute continuity of X_∞.

Let us use Theorem 2.7 with the following choices:

    d := 2,    (F_t)_{t>0} given in Remark 2.2,    M_t := [ ∫_0^t X_s √(Y_s) dB_s ; ∫_0^t √(Y_s) dB_s ], t > 0,    Q(t) := [ 1/√t , 0 ; 0 , 1/√t ], t > 0.

Then

    ⟨M⟩_t = [ ∫_0^t X_s^2 Y_s ds , ∫_0^t X_s Y_s ds ; ∫_0^t X_s Y_s ds , ∫_0^t Y_s ds ],    t > 0,

and hence, by Theorem 2.5 (similarly as detailed in the proof of Theorem 7.2),

    Q(t) ⟨M⟩_t Q(t)^⊤ = [ (1/t) ∫_0^t X_s^2 Y_s ds , (1/t) ∫_0^t X_s Y_s ds ; (1/t) ∫_0^t X_s Y_s ds , (1/t) ∫_0^t Y_s ds ] → [ E(X_∞^2 Y_∞) , E(X_∞ Y_∞) ; E(X_∞ Y_∞) , E(Y_∞) ]    as t → ∞ a.s.

By (5.8), (6.2), Slutsky's lemma and the continuous mapping theorem, we get

    √T ( m̂_T^LSE − m , θ̂_T^LSE − θ )^⊤ →^L ( E(X_∞^2) − (E(X_∞))^2 )^{-1} [ −E(X_∞) , E(X_∞^2) ; −1 , E(X_∞) ] ηZ    as T → ∞.

Using that ηZ is a 2-dimensional normally distributed random variable with mean vector zero and with covariance matrix ηη^⊤ given above, the covariance matrix of

    [ −E(X_∞) , E(X_∞^2) ; −1 , E(X_∞) ] ηZ

takes the form

    [ −E(X_∞) , E(X_∞^2) ; −1 , E(X_∞) ] η E(ZZ^⊤) η^⊤ [ −E(X_∞) , −1 ; E(X_∞^2) , E(X_∞) ]
       = [ −E(X_∞) , E(X_∞^2) ; −1 , E(X_∞) ] [ E(X_∞^2 Y_∞) , E(X_∞ Y_∞) ; E(X_∞ Y_∞) , E(Y_∞) ] [ −E(X_∞) , −1 ; E(X_∞^2) , E(X_∞) ]
       = [ E(X_∞^2) E(X_∞ Y_∞) − E(X_∞) E(X_∞^2 Y_∞) , E(X_∞^2) E(Y_∞) − E(X_∞) E(X_∞ Y_∞) ; E(X_∞) E(X_∞ Y_∞) − E(X_∞^2 Y_∞) , E(X_∞) E(Y_∞) − E(X_∞ Y_∞) ] [ −E(X_∞) , −1 ; E(X_∞^2) , E(X_∞) ],

which yields the assertion. □
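The matrix algebra closing the proof is easy to verify symbolically. The following sketch (a verification aid, not part of the paper; it assumes sympy is available and treats the five stationary moments as free positive symbols) forms (1/D^2) · A ηη^⊤ A^⊤ with A := [ −E(X_∞) , E(X_∞^2) ; −1 , E(X_∞) ] and D := E(X_∞^2) − (E(X_∞))^2, and compares the result entrywise with Σ^LSE from Theorem 8.3.

```python
import sympy as sp

# free symbols for the stationary moments appearing in Theorem 8.3
Ex, Ex2, Ey, Exy, Ex2y = sp.symbols('Ex Ex2 Ey Exy Ex2y', positive=True)

A = sp.Matrix([[-Ex, Ex2], [-1, Ex]])     # linear transform from the proof
C = sp.Matrix([[Ex2y, Exy], [Exy, Ey]])   # eta eta^T
D = Ex2 - Ex**2                           # common denominator before squaring

Sigma = (A * C * A.T) / D**2              # asymptotic covariance of the LSE

# entries of Sigma^LSE as stated in Theorem 8.3
S11 = (Ex**2 * Ex2y - 2 * Ex * Ex2 * Exy + Ex2**2 * Ey) / D**2
S12 = (Ex * (Ex2y + Ex2 * Ey) - Exy * (Ex2 + Ex**2)) / D**2
S22 = (Ex2y - 2 * Ex * Exy + Ex**2 * Ey) / D**2

assert sp.simplify(Sigma[0, 0] - S11) == 0
assert sp.simplify(Sigma[0, 1] - S12) == 0
assert sp.simplify(Sigma[1, 1] - S22) == 0
print("Sigma^LSE matches the matrix product in the proof")
```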
8.4 Remark. Using the explicit forms of the mixed moments given in (iii) of Theorem 2.5, one can check that the asymptotic variance E(X_∞^2 Y_∞)/(E(X_∞^2))^2 of θ̃_T^LSE in Theorem 8.1 is less than the asymptotic variance Σ^LSE_{2,2} of θ̂_T^LSE in Theorem 8.3. This can be interpreted similarly as in Remark 7.3. Note also that using the limit theorem for (m̂_T^LSE, θ̂_T^LSE) given in Theorem 8.3, one cannot give asymptotic confidence regions for (m, θ), since the covariance matrix of the limit normal distribution depends on the unknown parameters a and b as well. □
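For a numerical illustration of Theorem 8.3, one can simulate (1.1) and evaluate the LSE of (m, θ) on each trajectory. The closed-form estimators used below are the standard continuous-time least squares solutions; their error decompositions coincide with the ones displayed at the beginning of the proof of Theorem 8.3, but since the paper's own definitions (4.8)–(4.9) are not reproduced here, the formulas should be read as an assumption of this sketch, as should the Euler discretization and the numerical values.

```python
import numpy as np

rng = np.random.default_rng(1)

def lse_m_theta(a, b, m, theta, T=500.0, dt=0.01):
    """Simulate (1.1) by Euler-Maruyama and return the continuous-time
    least squares estimates of (m, theta) based on (X_t)_{t in [0,T]}."""
    n = int(T / dt)
    y, x = a / b, m / theta
    x0 = x
    Ix = Ix2 = IxdX = 0.0            # int X ds, int X^2 ds, int X dX
    for _ in range(n):
        dL, dB = rng.normal(0.0, np.sqrt(dt), size=2)
        sy = np.sqrt(y)
        dx = (m - theta * x) * dt + sy * dB
        Ix += x * dt
        Ix2 += x * x * dt
        IxdX += x * dx               # Ito sum: left endpoint
        y = max(y + (a - b * y) * dt + sy * dL, 0.0)
        x += dx
    dX = x - x0                      # X_T - X_0
    det = T * Ix2 - Ix**2
    m_hat = (dX * Ix2 - Ix * IxdX) / det
    theta_hat = (Ix * dX - T * IxdX) / det
    return m_hat, theta_hat

T = 500.0
est = np.array([lse_m_theta(1.0, 1.0, 0.5, 1.0, T=T) for _ in range(50)])
errs = est - np.array([0.5, 1.0])
print("mean of (m_hat, theta_hat):", est.mean(axis=0))
print("empirical cov of sqrt(T) * errors:\n", np.cov(errs.T) * T)  # ~ Sigma^LSE
```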
Appendix

A    Radon–Nikodym derivatives for certain diffusions
We consider the SDEs

(A.1)    dξ_t = (Aξ_t + a) dt + σ(ξ_t) dW_t,    t ∈ R_+,

(A.2)    dη_t = (Bη_t + b) dt + σ(η_t) dW_t,    t ∈ R_+,

with the same initial values ξ_0 = η_0, where A, B ∈ R^{2×2}, a, b ∈ R^2, σ : R^2 → R^{2×2} is a Borel measurable function, and (W_t)_{t∈R_+} is a two-dimensional standard Wiener process. Suppose that the SDEs (A.1) and (A.2) admit pathwise unique strong solutions. Let P_{(A,a)} and P_{(B,b)} denote the probability measures on the measurable space (C(R_+, R^2), B(C(R_+, R^2))) induced by the processes (ξ_t)_{t∈R_+} and (η_t)_{t∈R_+}, respectively. Here C(R_+, R^2) denotes the set of continuous R^2-valued functions defined on R_+, B(C(R_+, R^2)) is the Borel σ-algebra on it, and we suppose that the space (C(R_+, R^2), B(C(R_+, R^2))) is endowed with the natural filtration (A_t)_{t∈R_+} given by A_t := φ_t^{-1}(B(C(R_+, R^2))), where φ_t : C(R_+, R^2) → C(R_+, R^2) is the mapping φ_t(f)(s) := f(t ∧ s), s ∈ R_+. For all T > 0, let P_{(A,a),T} := P_{(A,a)}|_{A_T} and P_{(B,b),T} := P_{(B,b)}|_{A_T} be the restrictions of P_{(A,a)} and P_{(B,b)} to A_T, respectively.
From the very general result in Section 7.6.4 of Liptser and Shiryaev [26], one can deduce the following
lemma.
A.1 Lemma. Let A, B ∈ R^{2×2}, a, b ∈ R^2, and let σ : R^2 → R^{2×2} be a Borel measurable function. Suppose that the SDEs (A.1) and (A.2) admit pathwise unique strong solutions. Let P_{(A,a)} and P_{(B,b)} denote the probability measures induced by the unique strong solutions of the SDEs (A.1) and (A.2) with the same initial value ξ_0 = η_0, respectively. Suppose that P(∃ σ(ξ_t)^{-1}) = 1 and P(∃ σ(η_t)^{-1}) = 1 for all t ∈ R_+, and

    P( ∫_0^t [ (Aξ_s + a)^⊤ (σ(ξ_s)σ(ξ_s)^⊤)^{-1} (Aξ_s + a) + (Bξ_s + b)^⊤ (σ(ξ_s)σ(ξ_s)^⊤)^{-1} (Bξ_s + b) ] ds < ∞ ) = 1

and

    P( ∫_0^t [ (Aη_s + a)^⊤ (σ(η_s)σ(η_s)^⊤)^{-1} (Aη_s + a) + (Bη_s + b)^⊤ (σ(η_s)σ(η_s)^⊤)^{-1} (Bη_s + b) ] ds < ∞ ) = 1

for all t ∈ R_+. Then for all T > 0, the probability measures P_{(A,a),T} and P_{(B,b),T} are absolutely continuous with respect to each other, and the Radon–Nikodym derivative of P_{(A,a),T} with respect to P_{(B,b),T} (the so-called likelihood ratio) takes the form

    L_T^{(A,a),(B,b)}((ξ_s)_{s∈[0,T]}) = exp{ ∫_0^T (Aξ_s + a − Bξ_s − b)^⊤ (σ(ξ_s)σ(ξ_s)^⊤)^{-1} dξ_s − (1/2) ∫_0^T (Aξ_s + a − Bξ_s − b)^⊤ (σ(ξ_s)σ(ξ_s)^⊤)^{-1} (Aξ_s + a + Bξ_s + b) ds }.
We point out that conditions (4.110) and (4.111) are also required for Section 7.6.4 in Liptser and Shiryaev [26], but the Lipschitz condition (4.110) in Liptser and Shiryaev [26] does not hold in general for the SDEs (A.1) and (A.2). However, we can use formula (7.139) in Liptser and Shiryaev [26], since their conditions (4.110) and (4.111) serve only to ensure that the SDE considered in Section 7.6.4 admits a pathwise unique strong solution (see the proof of Theorem 7.19 in Liptser and Shiryaev [26]), which is assumed in Lemma A.1.
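As a concrete illustration of how Lemma A.1 applies to model (1.1), where σ(y, x) = √y · I_2 and hence (σσ^⊤)^{-1} = y^{-1} I_2, the log-likelihood ratio can be approximated along a discretely sampled path. The sketch below is only an illustrative discretization (the paper works with continuous-time observations): it uses an Itô sum with left endpoints for the stochastic integral, a Riemann sum for the drift term, hypothetical parameter matrices, and a numerical floor of 1e-12 in place of the invertibility assumption P(∃ σ(ξ_t)^{-1}) = 1.

```python
import numpy as np

def log_likelihood_ratio(path, dt, A, a_vec, B, b_vec):
    """Discretized log L_T^{(A,a),(B,b)} of Lemma A.1 for sigma(y, x) =
    sqrt(y) * I_2, as in model (1.1); `path` holds rows (Y_{k dt}, X_{k dt})."""
    xi = path[:-1]                               # left endpoints (Ito convention)
    dxi = np.diff(path, axis=0)
    diff = xi @ (A - B).T + (a_vec - b_vec)      # (A xi + a) - (B xi + b)
    summ = xi @ (A + B).T + (a_vec + b_vec)      # (A xi + a) + (B xi + b)
    inv = 1.0 / np.maximum(xi[:, :1], 1e-12)     # (sigma sigma^T)^{-1} = Y^{-1} I_2
    stoch = np.sum(diff * inv * dxi)             # int (...)^T (sigma sigma^T)^{-1} d xi
    drift = np.sum(diff * inv * summ) * dt       # the Lebesgue-integral term
    return stoch - 0.5 * drift

# model (1.1) written in the form (A.1): d(Y,X)^T = (A (Y,X)^T + a) dt + sqrt(Y) dW,
# with A = [[-b, 0], [0, -theta]] and a = (a, m); hypothetical parameter values:
A1, a1 = np.array([[-1.0, 0.0], [0.0, -1.0]]), np.array([1.0, 0.5])
B1, b1 = np.array([[-0.8, 0.0], [0.0, -1.2]]), np.array([1.0, 0.5])

rng = np.random.default_rng(2)
dt, n = 0.01, 20_000
path = np.empty((n + 1, 2))
path[0] = (1.0, 0.5)
for k in range(n):                               # Euler path under (A1, a1)
    y, x = path[k]
    dW = rng.normal(0.0, np.sqrt(dt), size=2)
    sy = np.sqrt(y)
    path[k + 1] = (max(y + (1.0 - y) * dt + sy * dW[0], 0.0),
                   x + (0.5 - x) * dt + sy * dW[1])

print("log L_T:", log_likelihood_ratio(path, dt, A1, a1, B1, b1))
```

Maximizing such a likelihood ratio in (A, a) is, in spirit, the route to the MLE studied earlier in the paper.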
Acknowledgements

We are grateful to both referees for their several valuable comments, which led to an improvement of the manuscript.
References
[1] Barczy, M., Döring, L., Li, Z. and Pap, G. (2013). On parameter estimation for critical affine processes. Electronic Journal of Statistics 7 647–696.
[2] Barczy, M., Döring, L., Li, Z. and Pap, G. (2013). Stationarity and ergodicity for an affine two factor model. To appear in Advances in Applied Probability. Available also on ArXiv: http://arxiv.org/abs/1302.2534
[3] Barczy, M. and Pap, G. (2013). Maximum likelihood estimation for Heston models. Available on ArXiv:
http://arxiv.org/abs/1310.4783
[4] Barndorff-Nielsen, O. and Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models
and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63 167–
241.
[5] Ben Alaya, M. and Kebaier, A. (2012). Parameter estimation for the square root diffusions: ergodic
and nonergodic cases. Stochastic Models 28(4) 609–634.
[6] Ben Alaya, M. and Kebaier, A. (2013). Asymptotic behavior of the maximum likelihood estimator for
ergodic and nonergodic square-root diffusions. Stochastic Analysis and Applications 31(4) 552–573.
[7] Bishwal, J. P. N. (2008). Parameter Estimation in Stochastic Differential Equations. Springer-Verlag
Berlin Heidelberg.
[8] Carr, P. and Wu, L. (2003). The finite moment log stable process and option pricing. The Journal of
Finance 58 753–777.
[9] Chen, H. and Joslin, S. (2012). Generalized transform analysis of affine processes and applications in
finance. Review of Financial Studies 25(7) 2225–2256.
[10] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates.
Econometrica 53(2) 385–407.
[11] Dawson, D. A. and Li, Z. (2006). Skew convolution semigroups and affine Markov processes. The Annals
of Probability 34(3) 1103–1142.
[12] Dudley, R. M. (1989). Real Analysis and Probability. Wadsworth & Brooks/Cole Advanced Books &
Software, Pacific Grove, California.
[13] Duffie, D., Filipović, D. and Schachermayer, W. (2003). Affine processes and applications in finance. Annals of Applied Probability 13 984–1053.
[14] Gatheral, J. (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc., Hoboken, New Jersey.
[15] Heston, S. (1993). A closed-form solution for options with stochastic volatilities with applications to bond
and currency options. The Review of Financial Studies 6 327–343.
[16] Horn, R. A. and Johnson, Ch. R. (1985). Matrix Analysis. Cambridge University Press, Cambridge.
[17] Hu, Y. and Long, H. (2007). Parameter estimation for Ornstein–Uhlenbeck processes driven by α-stable Lévy motions. Communications on Stochastic Analysis 1(2) 175–192.
[18] Hu, Y. and Long, H. (2009). Least squares estimator for Ornstein–Uhlenbeck processes driven by α-stable
motions. Stochastic Processes and their Applications 119(8) 2465–2480.
[19] Hu, Y. and Long, H. (2009). On the singularity of least squares estimator for mean-reverting α-stable
motions. Acta Mathematica Scientia 29B(3) 599–608.
[20] Jacod, J. and Shiryaev, A. N. (2003). Limit Theorems for Stochastic Processes, 2nd ed. Springer-Verlag,
Berlin.
[21] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer-Verlag.
[22] Klimko, L. A. and Nelson, P. I. (1978). On conditional least squares estimation for stochastic processes.
The Annals of Statistics 6(3) 629–642.
[23] Kutoyants, Yu. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer-Verlag London
Limited.
[24] Li, Z. (2011). Measure-Valued Branching Markov Processes. Springer-Verlag, Heidelberg.
[25] Li, Z. and Ma, C. (2013). Asymptotic properties of estimators in a stable Cox-Ingersoll-Ross model.
Available on the ArXiv: http://arxiv.org/abs/1301.3243
[26] Liptser, R. S. and Shiryaev, A. N. (2001). Statistics of Random Processes I. Applications, 2nd edition.
Springer-Verlag, Berlin, Heidelberg.
[27] Liptser, R. S. and Shiryaev, A. N. (2001). Statistics of Random Processes II. Applications, 2nd edition.
Springer-Verlag, Berlin, Heidelberg.
[28] Overbeck, L. and Rydén, T. (1997). Estimation in the Cox-Ingersoll-Ross model. Econometric Theory 13(3) 430–461.
[29] Overbeck, L. (1998). Estimation for continuous branching processes. Scandinavian Journal of Statistics
25(1) 111–126.
[30] Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd ed. Springer-Verlag
Berlin Heidelberg.
[31] Sandrić, N. (2013). Long-time behaviour of stable-like processes. Stochastic Processes and their Applications 123(4) 1276–1300.
[32] Sørensen, H. (2004). Parametric inference for diffusion processes observed at discrete points in time: a
survey. International Statistical Review 72(3) 337–354.
[33] van Zanten, H. (2000). A multivariate central limit theorem for continuous local martingales. Statistics
& Probability Letters 50(3) 229–235.
Mátyás Barczy, Faculty of Informatics, University of Debrecen, Pf. 12, H–4010 Debrecen, Hungary. Tel.: +36-52-512900, Fax: +36-52-512996, e-mail: [email protected]

Leif Döring, Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. Tel.: +41-(0)44-63 55892, Fax: +41-(0)44-63 55705, e-mail: [email protected]

Zenghu Li, School of Mathematical Sciences, Beijing Normal University, Beijing 100875, People's Republic of China. Tel.: +86-10-58802900, Fax: +86-10-58808202, e-mail: [email protected]

Gyula Pap, Bolyai Institute, University of Szeged, Aradi vértanúk tere 1, H–6720 Szeged, Hungary. Tel.: +36-62-544033, Fax: +36-62-544548, e-mail: [email protected]