§Chapter 7 Sufficiency
✒ Def.
Let X = (X1, . . . , Xn) be a sample from {Fθ : θ ∈ Ω}. A statistic T = T (X) is
sufficient for θ or for the family of distributions {Fθ : θ ∈ Ω} ⇔ the conditional
distribution of X, given T = t, is independent of θ.
✇ Note:
The outcome X1, . . . , Xn is always sufficient, but we will exclude this trivial
statistic from consideration.
☞ EX.
Let X1, . . . , Xn iid∼ Bernoulli(p). Show that T(X) = Σ_{i=1}^n Xi is sufficient for p.
✍ Sol.
P(X1 = x1, . . . , Xn = xn | Σ_{i=1}^n Xi = t)
= P(X1 = x1, . . . , Xn = xn, T = t) / P(T = t)
= p^t (1 − p)^{n−t} / [ C(n, t) p^t (1 − p)^{n−t} ]
= 1 / C(n, t),
which is independent of p.
MathStat(II) ∼ Lecture 1 ∼ page 1
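A quick numerical check of this calculation (my own sketch, not part of the notes; the function name and parameters are illustrative): conditioning on the value of Σ Xi, every sample pattern should be equally likely, whatever p is.

```python
# Sketch: empirically check that P(X = x | sum X_i = t) does not depend on p.
import numpy as np
from collections import Counter

def conditional_pattern_freqs(p, n=4, t=2, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.binomial(1, p, size=(reps, n))
    keep = X[X.sum(axis=1) == t]              # condition on the sufficient statistic
    counts = Counter(map(tuple, keep))
    total = sum(counts.values())
    return {pat: c / total for pat, c in sorted(counts.items())}

# Both should be close to 1/C(4,2) = 1/6 for every pattern, regardless of p.
print(conditional_pattern_freqs(p=0.3))
print(conditional_pattern_freqs(p=0.7))
```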
☞ EX.
Let X1, X2 iid∼ Poisson(λ). Show that X1 + X2 is sufficient for λ.
✍ Sol.
P(X1 = x1, X2 = x2 | X1 + X2 = t)
= P(X1 = x1, X2 = t − x1) / P(X1 + X2 = t)
= [ e^{−λ} λ^{x1}/x1! × e^{−λ} λ^{t−x1}/(t − x1)! ] / [ e^{−2λ} (2λ)^t/t! ]
= C(t, x1) (1/2)^t,
where xi = 0, 1, 2, . . ., i = 1, 2, and x1 + x2 = t.
∵ C(t, x1)(1/2)^t is independent of λ.
∴ X1 + X2 is sufficient for λ.
MathStat(II) ∼ Lecture 1 ∼ page 4
✇ Note:
P(X1 = 0, X2 = 1 | X1 + 2X2 = 2)
= P(X1 = 0, X2 = 1) / P(X1 + 2X2 = 2)
= P(X1 = 0, X2 = 1) / [P(X1 = 0, X2 = 1) + P(X1 = 2, X2 = 0)]
= λe^{−2λ} / [λe^{−2λ} + (λ²/2)e^{−2λ}]
= 1 / (1 + λ/2),
which depends on λ.
⇒ X1 + 2X2 is not sufficient for λ.
✑ Theorem: (The Factorization Criterion) ✯✯✯✯✯
Let X1, . . . , Xn be r.v.s with the joint pdf fθ (x), θ ∈ Ω. Then T (X1, . . . , Xn)
is sufficient for θ ⇔
fθ (x) = h(x)gθ (T (x)),
where h is a nonnegative function of x only and does not depend on θ, and g is
a nonnegative function of θ and T (x) only. The statistic T (X) and parameter
θ may be vectors.
MathStat(II) ∼ Lecture 1 ∼ page 5
☞ EX. Let X1, . . . , Xn iid∼ Bernoulli(p).
P_p(X = x) = p^{Σ_{i=1}^n xi} (1 − p)^{n − Σ_{i=1}^n xi} = ( p/(1 − p) )^{Σ_{i=1}^n xi} (1 − p)^n.
Taking h(x) = 1 and g_p(T(x)) = ( p/(1 − p) )^{Σ_{i=1}^n xi} (1 − p)^n,
by the factorization criterion, T(X) = Σ_{i=1}^n Xi is sufficient for p.
☞ EX. Let X1, . . . , Xn iid∼ N(µ, σ²), where both µ and σ² are unknown. Find a joint sufficient statistic for (µ, σ²).
✍ Sol.
f_{µ,σ²}(x) = (σ√(2π))^{−n} exp( − Σ_{i=1}^n (xi − µ)²/(2σ²) )
= (σ√(2π))^{−n} exp( − Σ_{i=1}^n xi²/(2σ²) + µ Σ_{i=1}^n xi/σ² − nµ²/(2σ²) ).
By the factorization criterion, T(X) = ( Σ_{i=1}^n Xi, Σ_{i=1}^n Xi² ) is jointly sufficient for the parameter (µ, σ²). An equivalent sufficient statistic that is frequently used is T1(X) = (X̄, S²), where S² = Σ_{i=1}^n (Xi − X̄)²/(n − 1).
MathStat(II) ∼ Lecture 1 ∼ page 6
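To make the equivalence concrete (an illustrative sketch of my own, not from the notes): (ΣXi, ΣXi²) and (X̄, S²) are one-to-one functions of each other, so either form carries the same information.

```python
# Sketch: recover (xbar, S^2) from (sum x, sum x^2) and check against numpy.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=50)
n = x.size

t1, t2 = x.sum(), (x**2).sum()            # T = (sum Xi, sum Xi^2)
xbar = t1 / n
s2 = (t2 - n * xbar**2) / (n - 1)         # S^2 recovered from the two sums

print(np.isclose(xbar, x.mean()), np.isclose(s2, x.var(ddof=1)))
```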
✎ HW.
iid
Let X1, . . . , Xn ∼ N (µ, σ 2). Prove that the conditional distribution of X given
(X, S 2) is independent of (µ, σ 2).
☞ EX.
Let X1, . . . , Xn iid∼ U(−θ/2, θ/2), θ > 0. Find a sufficient statistic for θ.
✍ Sol.
The joint pdf of X is given by
f_θ(x) = (1/θ^n) I(x_(1) ≥ −θ/2) I(x_(n) ≤ θ/2).
By the factorization criterion, T(X) = (X_(1), X_(n)) is sufficient for θ.
✇ Note:
f(x | X_(1) = x_(1), X_(n) = x_(n))
= (1/θ^n) / [ (n!/(n − 2)!) ((x_(n) − x_(1))/θ)^{n−2} (1/θ)² ]
= 1 / [ n(n − 1)(x_(n) − x_(1))^{n−2} ],
which is independent of θ.
MathStat(II) ∼ Lecture 1 ∼ page 7
✒ Def.
1. An estimator T(X) that is not unbiased is called biased, and the function b(θ, T), defined by b(θ, T) = E_θ(T(X)) − θ, is called the bias of T(X).
2. The mean squared error (MSE) of an estimator T(X) of a parameter θ is the function of θ defined by E_θ[(T(X) − θ)²].
✇ Note:
E_θ[(T(X) − θ)²] = E_θ[(T(X) − E[T(X)] + E[T(X)] − θ)²]
= Var_θ(T(X)) + [b(θ, T)]²
  (precision)      (accuracy)
Therefore, for an unbiased estimator we have E_θ[(T(X) − θ)²] = Var_θ(T(X)).
MathStat(II) ∼ Lecture 1 ∼ page 11
☞ EX.
Let X1, · · · , Xn iid∼ N(µ, σ²), where µ and σ² are both unknown. We know that S² = Σ_{i=1}^n (Xi − X̄)²/(n − 1) is unbiased for σ². Note that S is not, in general, unbiased for σ. Find the bias of S for σ.
✍ Sol.
Let W = (n − 1)S²/σ² ∼ χ²(n − 1).
Thus, E(√W) = √2 Γ(n/2) [Γ((n − 1)/2)]^{−1}.
⇒ E_σ(S) = σ (2/(n − 1))^{1/2} Γ(n/2) [Γ((n − 1)/2)]^{−1}.
Therefore, the bias of S can be evaluated by b(σ, S) = E_σ(S) − σ.
MathStat(II) ∼ Lecture 1 ∼ page 12
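A numerical illustration (my own sketch, with assumed variable names): the exact expression for E_σ(S) from the gamma functions can be compared with a Monte Carlo estimate of E(S).

```python
# Sketch: exact E[S] via the chi-square moment vs. a simulation estimate.
import numpy as np
from scipy.special import gammaln

def exact_ES(n, sigma):
    # E[S] = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return sigma * np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))

n, sigma, reps = 10, 2.0, 200_000
rng = np.random.default_rng(2)
samples = rng.normal(0.0, sigma, size=(reps, n))
S = samples.std(axis=1, ddof=1)

print("exact bias    :", exact_ES(n, sigma) - sigma)
print("simulated bias:", S.mean() - sigma)
```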
✇ Note: ✯✯✯✯✯
In particular, it is sometimes the case that a trade-off occurs between variance and bias in such a way that a small increase in bias can be traded for a
large decrease in variance, resulting in an improvement in MSE.
☞ EX.
Let X1, · · · , Xn iid∼ N(µ, σ²), where µ ∈ R and σ² ∈ R+ are both unknown. Let S² = Σ_{i=1}^n (Xi − X̄)²/(n − 1). We know that S² is an unbiased estimator, and an alternative estimator for σ² is the mle σ̂² = Σ_{i=1}^n (Xi − X̄)²/n = ((n − 1)/n) S².
∵ Σ_{i=1}^n (Xi − X̄)²/σ² ∼ χ²(n − 1)   ∴ E[σ̂²] = E[((n − 1)/n) S²] = ((n − 1)/n) σ².
Moreover, Var[σ̂²] = Var[((n − 1)/n) S²] = 2(n − 1)σ⁴/n².
Thus, the MSE of σ̂² is
E[(σ̂² − σ²)²] = Var(σ̂²) + (E[σ̂²] − σ²)²
= 2(n − 1)σ⁴/n² + ( ((n − 1)/n)σ² − σ² )²
= (2n − 1)σ⁴/n².
MathStat(II) ∼ Lecture 1 ∼ page 13
∵ Var(S²) = 2σ⁴/(n − 1) and
(2n − 1)/n² − 2/(n − 1) = −2/[n(n − 1)] − 1/n² < 0, ∀n ≥ 2
∴ The MSE of σ̂² is smaller than that of S².
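As a quick illustration (my own sketch, not from the notes), the two closed-form MSEs above can be checked by simulation:

```python
# Sketch: Monte Carlo comparison of MSE(S^2) and MSE(sigma_hat^2) against
# the closed forms 2 sigma^4/(n-1) and (2n-1) sigma^4/n^2.
import numpy as np

n, sigma2, reps = 10, 4.0, 200_000
rng = np.random.default_rng(3)
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

S2 = x.var(axis=1, ddof=1)
sig_hat2 = x.var(axis=1, ddof=0)          # the mle, divides by n

print("MSE S^2        :", np.mean((S2 - sigma2) ** 2), "theory:", 2 * sigma2**2 / (n - 1))
print("MSE sigma_hat^2:", np.mean((sig_hat2 - sigma2) ** 2), "theory:", (2 * n - 1) * sigma2**2 / n**2)
```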
✎ HW.
Find the minimum MSE estimator of the form αS² for the parameter σ².
Ans: α = (n − 1)/(n + 1).
✒ Def.
Let θ0 ∈ Ω and U(θ0) = {T(X) : E_{θ0}[T(X)] = θ0} such that E_{θ0}[T²(X)] < ∞. Then T0 ∈ U(θ0) is called a locally minimum variance unbiased estimator (LMVUE) at θ0 if
E_{θ0}[(T0 − θ0)²] ≤ E_{θ0}[(T − θ0)²]   ∀T ∈ U(θ0),
or, equivalently, Var_{θ0}(T0) ≤ Var_{θ0}(T).
MathStat(II) ∼ Lecture 1 ∼ page 14
✒ Def.
Let U = {T(X) | E_θ[T(X)] = θ, ∀θ ∈ Ω} such that E_θ[T²] < ∞ for all θ ∈ Ω.
An estimator T0 ∈ U is called the uniformly minimum variance unbiased estimator (UMVUE) of θ if
Var_θ(T0) ≤ Var_θ(T)   ∀θ ∈ Ω and every T ∈ U.
✎ HW.
Let X1, · · · , Xn be independent r.v.s, and a1, · · · , an be real numbers such that Σ_{i=1}^n ai = 1. Assume that E[Xi²] < ∞ with Var(Xi) = σi², ∀i. Write S = Σ_{i=1}^n ai Xi. Then Var(S) = Σ_{i=1}^n ai² σi² = σ. Find the weights ai such that σ is minimum.
❉ Hint:
Using the Cauchy-Schwarz inequality, the weights ai = (1/σi²) / (Σ_{j=1}^n 1/σj²), ∀i, minimize σ, with the value σ_min = 1/(Σ_{j=1}^n 1/σj²).
MathStat(II) ∼ Lecture 1 ∼ page 15
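The Cauchy-Schwarz answer can also be checked against a brute-force numerical minimization (a sketch of mine; the variances in sig2 and the helper names are arbitrary choices):

```python
# Sketch: verify the inverse-variance weights minimize Var(sum a_i X_i)
# subject to sum a_i = 1, by comparing with a constrained optimizer.
import numpy as np
from scipy.optimize import minimize

sig2 = np.array([1.0, 4.0, 0.25, 9.0])          # assumed variances sigma_i^2

def var_S(a):
    return np.sum(a**2 * sig2)

a_opt = (1 / sig2) / np.sum(1 / sig2)            # Cauchy-Schwarz solution
print("closed form:", a_opt, "Var =", var_S(a_opt), "=", 1 / np.sum(1 / sig2))

cons = {"type": "eq", "fun": lambda a: a.sum() - 1}   # constraint sum a_i = 1
res = minimize(var_S, x0=np.full(4, 0.25), constraints=[cons])
print("numerical  :", res.x, "Var =", res.fun)
```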
✑ Theorem: (Rao-Blackwell) ✯✯✯✯✯
Let {Fθ : θ ∈ Ω} be a family of distribution functions and
h(X) ∈ U = {W (X)|Eθ (W ) = θ, ∀θ ∈ Ω}
with Eθ [h2(X)] < ∞ (or V arθ [h(X)] < ∞).
Let T (X) be a sufficient statistic for {Fθ , θ ∈ Ω}. Then ① the conditional
expectation Eθ (h|T ) is independent of θ and is an unbiased estimate of θ.
Moreover, ②
E_θ{[E(h|T) − θ]²} ≤ E_θ[(h − θ)²],   ∀θ ∈ Ω
or
Var_θ[E(h|T)] ≤ Var_θ(h),   ∀θ ∈ Ω.
The equality holds ⇔ Pθ (h = E(h|T )) = 1 ∀θ ∈ Ω.
↑ That is h(X) is a function of T (X).
MathStat(II) ∼ Lecture 1 ∼ page 16
✍ Proof:
Because of the definition of sufficiency, Eθ (h|T ) is independent of θ.
∵ Eθ [E(h|T )] = Eθ (h) = θ.
∴ Eθ (h|T ) is unbiased for θ.
Varθ (h) = Varθ [E(h|T )] + Eθ [Varθ (h|T )]
≥ Varθ [E(h|T )],
∀θ ∈ Ω.
The equality in the above inequality holds
⇔ E_θ[Var_θ(h|T)] = 0
⇔ Var_θ(h|T) = 0 ⇔ E(h²|T) = [E(h|T)]²
⇔ P_θ(h = E(h|T)) = 1,   ∀θ ∈ Ω.
☞ EX.
Let X and Y be two r.v.s with the joint pdf
f(x, y) = (2/θ²) e^{−(x+y)/θ},   θ > 0,   0 < x < y < ∞.
(a) Show that E[Y ] = 3θ/2 and Var[Y ] = 5θ2/4.
(b) Show that E[Y |X = x] = x + θ and Var(X + θ) <Var[Y ].
MathStat(II) ∼ Lecture 1 ∼ page 17
✍ Sol.
The marginal pdf of X is given by
f_X(x) = ∫_x^∞ f(x, y) dy = (2/θ²) e^{−x/θ} ∫_x^∞ e^{−y/θ} dy = (2/θ²) e^{−x/θ} θ e^{−x/θ}
= (2/θ) e^{−2x/θ},   0 < x < ∞,   i.e. X ∼ exp(θ/2).
The conditional pdf of Y given X = x is
f_{Y|X}(y|x) = (1/θ) e^{−(y−x)/θ},   0 < x < y < ∞
⇒ E[Y − X|X] = θ and Var[Y − X|X] = θ²
⇒ E[Y|X] = X + θ and Var[Y|X] = θ²
E[Y] = E{E[Y|X]} = E[X] + θ = θ/2 + θ = 3θ/2
Var[Y] = E{Var[Y|X]} + Var{E[Y|X]} = θ² + Var(X + θ) = θ² + θ²/4 = 5θ²/4
⇒ Var(X + θ) < Var(Y).
✎ HW. §7.1— 1∼ 4; §7.3 — 2,3,6.
MathStat(II) ∼ Lecture 1 ∼ page 18
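The variance reduction in part (b) can also be seen by simulation (a sketch with assumed names; sampling uses the factorization X ∼ exp(θ/2) and Y − X | X ∼ exp(θ) derived above):

```python
# Sketch: simulate (X, Y) from f(x, y) = (2/theta^2) exp(-(x+y)/theta), 0 < x < y,
# and compare Var(Y) with Var(E[Y|X]) = Var(X + theta).
import numpy as np

theta, reps = 2.0, 500_000
rng = np.random.default_rng(4)
X = rng.exponential(scale=theta / 2, size=reps)   # marginal of X
Y = X + rng.exponential(scale=theta, size=reps)   # Y - X | X ~ exp(theta)

print("E[Y]         :", Y.mean(), "theory:", 1.5 * theta)
print("Var(Y)       :", Y.var(), "theory:", 1.25 * theta**2)
print("Var(X+theta) :", (X + theta).var(), "theory:", theta**2 / 4)
```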
§7.4 Completeness and Uniqueness
✒ Def.
Let {fθ (x), θ ∈ Ω} be a family of pdf’s. We say that this family is complete
if
Eθ [g(X)] = 0, ∀θ ∈ Ω implies Pθ (g(X) = 0) = 1,
∀θ ∈ Ω.
✒ Def.
A statistic T(X) is said to be complete if the family of distributions of T(X) is complete.
☞ EX.
Let X1, · · · , Xn iid∼ Bernoulli(p). Then T(X) = Σ_{i=1}^n Xi is a sufficient statistic.
Show that T(X) is also complete.
MathStat(II) ∼ Lecture 2 ∼ page 1
✍ Sol.
It’s equivalent to show that the family of distributions of T (X),
{b(n, p), 0 < p < 1}, is complete. Let
Ep[g(T )] =
n
�
t=0
Thus,
(1 − p)
n
n
�
t=0
� �
n t
g(t)
p (1 − p)n−t = 0
t
∀p ∈ (0, 1).
� �
n t
g(t)
φ = 0, if φ = p/(1 − p) > 0, ∀p ∈ (0, 1).
t
This is a polynomial in φ with order n. Therefore the coefficients must vanish.
That is g(t) = 0, t = 0, 1, 2, . . . , n. (or that is g is a zero function)
☞ EX.
Let X ∼ N (0, θ), θ ∈ Ω = (0, ∞). Show that X is not complete.
✍ Sol.
Let g(x) = x, then Eθ [g(X)] = Eθ [X] = 0, ∀θ ∈ Ω.
However, g(x) = x is not identically zero. Thus, X is not complete.
MathStat(II) ∼ Lecture 2 ∼ page 2
✑ Theorem: (Lehmann-Scheffé) ✯✯✯✯✯
If T is a complete sufficient statistic and there exists an unbiased estimator h of θ, then there exists a unique UMVUE of θ, which is given by E[h|T].
✍ Proof:
If h1, h2 ∈ U = {W : Eθ [W ] = θ, ∀θ ∈ Ω}, then E[h1|T ] and E[h2|T ] are
both unbiased and
Eθ {E[h1|T ] − E[h2|T ]} = 0, ∀θ ∈ Ω.
∵ T is complete.
∴ E[h1|T ] = E[h2|T ].
Therefore, E[h|T] is the unique UMVUE.
MathStat(II) ∼ Lecture 2 ∼ page 3
☞ EX.
Let X1, · · · , Xn be iid r.v.s with the pdf Pθ (X = x) = f (x; θ) = 1/θ, x =
1, 2, · · · , θ, where θ is an unknown positive integer. Find the UMVUE of θ.
✍ Sol.
step 1.
∵ f_θ(x) = (1/θ^n) I(x_(n) ≤ θ)
∴ By the factorization criterion, T = X_(n) is sufficient for θ.
MathStat(II) ∼ Lecture 2 ∼ page 4
step 2.
The pdf of T is given by
P(T = x) = P(T ≤ x) − P(T ≤ x − 1) = (x/θ)^n − ((x − 1)/θ)^n,   x = 1, 2, · · · , θ.
Let E_θ[g(T)] = Σ_{x=1}^θ g(x) P(T = x) = 0,   ∀θ ≥ 1.
We want to show that g(x) = 0 for all x = 1, 2, . . ..
For θ = 1: E_1[g(T)] = g(1) = 0, so g(1) = 0.
For θ = 2: E_2[g(T)] = g(1)[(1/2)^n − (0/2)^n] + g(2)[(2/2)^n − (1/2)^n] = 0 ⇒ g(2) = 0.
Using mathematical induction, we conclude that
g(1) = g(2) = · · · = g(θ) = 0,
i.e. T is a complete sufficient statistic.
MathStat(II) ∼ Lecture 2 ∼ page 5


∵ E[X1] = (θ + 1)/2   ∴ U(X) = 2X1 − 1 is unbiased for θ.
By the Lehmann-Scheffé Theorem, E[U(X)|T] is the UMVUE of θ.
step 3.
P(X1 = x1 | T = y) = P(X1 = x1, T = y)/P(T = y),   if x1 ≠ y and x1 < y
= P(X1 = x1, max{X2, · · · , Xn} = y)/P(T = y)
= (1/θ)[(y/θ)^{n−1} − ((y − 1)/θ)^{n−1}] / [(y/θ)^n − ((y − 1)/θ)^n]
= [y^{n−1} − (y − 1)^{n−1}] / [y^n − (y − 1)^n],   x1 = 1, 2, · · · , y − 1
P(X1 = y | T = y) = 1 − Σ_{x1=1}^{y−1} P(X1 = x1 | T = y) = y^{n−1} / [y^n − (y − 1)^n]
Finally, we have E[U(X)|T] = [T^{n+1} − (T − 1)^{n+1}] / [T^n − (T − 1)^n].
MathStat(II) ∼ Lecture 2 ∼ page 6
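A small numerical check of the final formula (my own sketch): the estimator (T^{n+1} − (T − 1)^{n+1})/(T^n − (T − 1)^n), with T = X_(n), is exactly unbiased for θ, which can be verified by summing over the exact pmf of T.

```python
# Sketch: exact unbiasedness check of the UMVUE for the discrete uniform {1,...,theta}.
from fractions import Fraction

def E_umvue(theta, n):
    total = Fraction(0)
    for t in range(1, theta + 1):
        p_t = Fraction(t**n - (t - 1)**n, theta**n)                    # P(X_(n) = t)
        est = Fraction(t**(n + 1) - (t - 1)**(n + 1), t**n - (t - 1)**n)
        total += p_t * est
    return total

for theta in (1, 3, 7):
    print(theta, E_umvue(theta, n=4))      # each result equals theta exactly
```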
§7.5 The Exponential Class (Family) of Distributions
✑ Theorem:
Let {fθ : θ ∈ Ω} be a k-parameter exponential family given by
f_θ(x) = exp{ Σ_{j=1}^k Q_j(θ) T_j(x) + D(θ) + S(x) },
where θ = (θ1, · · · , θk) ∈ Ω, T = (T1, T2, · · · , Tk), and x = (x1, · · · , xn), k ≤ n.
Suppose the range of Q = (Q1, · · · , Qk) contains a k-dimensional open set. Then
T = (T1(X), T2(X), · · · , Tk (X))
is a complete sufficient statistic. (or to say that the family is of full
rank)
MathStat(II) ∼ Lecture 2 ∼ page 7
☞ EX.
iid
Let X1, · · · , Xn iid∼ N(µ, σ²), where µ ∈ (−∞, ∞) and σ² is assumed to be known. Explain that T = Σ_{i=1}^n Xi is a complete sufficient statistic of µ.
✍ Proof:
f(x; µ) = (√(2π)σ)^{−n} exp( − Σ_{i=1}^n (xi − µ)²/(2σ²) )
= exp( − n ln(√(2π)σ) − [Σ_{i=1}^n xi² − 2µ Σ_{i=1}^n xi + nµ²]/(2σ²) )
= exp( − n ln(√(2π)σ) − Σ_{i=1}^n xi²/(2σ²) + (µ/σ²) Σ_{i=1}^n xi − nµ²/(2σ²) ).
Let Q(µ) = µ/σ², T(x) = Σ_{i=1}^n xi, D(µ) = −nµ²/(2σ²), and S(x) = −n ln(√(2π)σ) − Σ_{i=1}^n xi²/(2σ²).
Therefore, it is a one-parameter exponential family, and T = Σ_{i=1}^n Xi is a complete sufficient statistic of µ.
MathStat(II) ∼ Lecture 2 ∼ page 8
☞ EX.
Let X1, · · · , Xn iid∼ U(0, θ), θ ∈ (0, ∞). Show that T = X_(n) is a complete sufficient statistic of θ.
✍ Sol.
Since
f_θ(x) = (1/θ^n) I(x_(n) ≤ θ),
we know that T = X_(n) is sufficient for θ by the factorization criterion. However, the joint pdf of X is not a one-parameter exponential family. We need to show that
E_θ[g(T)] = ∫_0^θ n t^{n−1}/θ^n g(t) dt = 0, ∀θ > 0 implies g(t) = 0, ∀t.
Differentiating both sides of E_θ[g(T)] = 0 with respect to θ gives g(θ) = 0, ∀θ > 0.
Hence we have shown that T = X_(n) is a complete statistic of θ.
MathStat(II) ∼ Lecture 2 ∼ page 9
☞ EX.
Let X1, · · · , Xn iid∼ N(θ, θ²). Show that T = ( Σ_{i=1}^n Xi, Σ_{i=1}^n Xi² ) is sufficient for θ. Is it also complete for θ?
✍ Sol.
f_θ(x) = exp( − n ln(√(2π)θ) − [Σ_{i=1}^n xi² − 2θ Σ_{i=1}^n xi + nθ²]/(2θ²) )
= exp( − n ln(√(2π)θ) − Σ_{i=1}^n xi²/(2θ²) + Σ_{i=1}^n xi/θ − n/2 ).
By the factorization criterion, T = ( Σ_{i=1}^n Xi, Σ_{i=1}^n Xi² ) is sufficient for θ.
∵ f_θ(x) = exp( − n ln(√(2π)θ) − (Σ_{i=1}^n xi²/2)[Q1(θ)]² + Σ_{i=1}^n xi Q1(θ) − n/2 ), where Q1(θ) = 1/θ.
∴ This family is not of full rank.
In fact, T = ( Σ_{i=1}^n Xi, Σ_{i=1}^n Xi² ) is not complete for θ.
Though E_θ[ 2(Σ_{i=1}^n Xi)² − (n + 1) Σ_{i=1}^n Xi² ] = 0,   ∀θ ∈ R,
g(x) = 2(Σ_{i=1}^n xi)² − (n + 1) Σ_{i=1}^n xi² is not identically zero.
MathStat(II) ∼ Lecture 2 ∼ page 10
§7.6 Functions of a parameter
☞ EX.
Let X1, · · · , Xn iid∼ Bernoulli(θ), θ ∈ (0, 1). Find the UMVUE of the parameter δ = θ(1 − θ).
✍ Sol.
∵ f_θ(x) = exp{ (Σ_{i=1}^n xi) ln θ + (n − Σ_{i=1}^n xi) ln(1 − θ) }
= exp{ (Σ_{i=1}^n xi) ln(θ/(1 − θ)) + n ln(1 − θ) }
∴ {f_θ : θ ∈ (0, 1)} is a one-parameter exponential family, so Y = Σ_{i=1}^n Xi is complete and sufficient for θ.
The following task is to find an estimator T(Y) such that E[T(Y)] = δ.
By the Lehmann-Scheffé theorem, we know that T(Y) is the UMVUE of δ.
MathStat(II) ∼ Lecture 2 ∼ page 11
Method I:
∵ E[ (Y/n)(1 − Y/n) ] = (1/n) E[Y] − (1/n²) E[Y²]
with E[Y] = nθ and E[Y²] = nθ(1 − θ) + n²θ².
∴ E[ (Y/n)(1 − Y/n) ] = ((n − 1)/n) θ(1 − θ).
Therefore, we can take T(Y) = (n/(n − 1)) (Y/n)(1 − Y/n).
Method II:
step 1: Let U(X) = 1 if X1 = 1, X2 = 0; 0 otherwise,   so E_θ[U(X)] = δ.
step 2: Further, let T(Y) = E_θ[U(X)|Y].
MathStat(II) ∼ Lecture 2 ∼ page 12
E_θ[U(X)|Y = t] = P_θ( X1 = 1, X2 = 0 | Σ_{i=1}^n Xi = t )
= P_θ(X1 = 1) P_θ(X2 = 0) P_θ(Σ_{i=3}^n Xi = t − 1) / P_θ(Σ_{i=1}^n Xi = t)
= θ(1 − θ) C(n − 2, t − 1) θ^{t−1}(1 − θ)^{n−t−1} / [ C(n, t) θ^t (1 − θ)^{n−t} ]
= t(n − t) / [n(n − 1)].
That is, E_θ[U(X)|Y] = (n/(n − 1)) (Y/n)(1 − Y/n).
MathStat(II) ∼ Lecture 2 ∼ page 13
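Both methods give the same estimator; its unbiasedness can be checked exactly by summing over the binomial pmf (an illustrative sketch of mine; n and θ values are arbitrary):

```python
# Sketch: exact check that T(Y) = (n/(n-1)) (Y/n)(1 - Y/n), Y ~ binomial(n, theta),
# is unbiased for delta = theta (1 - theta).
from math import comb

def E_T(theta, n):
    return sum(comb(n, y) * theta**y * (1 - theta)**(n - y)
               * (n / (n - 1)) * (y / n) * (1 - y / n)
               for y in range(n + 1))

for theta in (0.2, 0.5, 0.9):
    print(theta, E_T(theta, n=6), theta * (1 - theta))
```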
☞ EX.
Let X1, . . . , Xn iid∼ Poisson(θ). Find the UMVUE of P(X1 = r).
✍ Sol.
∵ f_θ(x) = e^{−nθ} θ^{Σ_{i=1}^n xi} / Π_{i=1}^n xi! = exp( ln θ Σ_{i=1}^n xi − nθ − ln Π_{i=1}^n xi! )
∴ T = Σ_{i=1}^n Xi is a complete sufficient statistic for θ.
(It is a one-parameter exponential family.)
Let U(X) = 1 if X1 = r; 0 otherwise. Then E[U(X)] = P(X1 = r).
That is, U(X) is an unbiased estimator of P(X1 = r).
By the Lehmann-Scheffé theorem, we know that E[U(X)|T] is the UMVUE of P(X1 = r). Moreover, (X1, . . . , Xn) given T = t follows multinomial(t; 1/n, · · · , 1/n), so X1 | T = t ∼ Binomial(t, 1/n). Thus,
E[U(X)|T = t] = P(X1 = r|T = t) = C(t, r) (1/n)^r (1 − 1/n)^{t−r}, t ≥ r;   = 0, t < r.
MathStat(II) ∼ Lecture 2 ∼ page 14
☞ EX.
Let X1, . . . , Xn iid∼ Exp(1/θ), θ > 0.
(a) Explain that T = Σ_{i=1}^n Xi is a complete sufficient statistic of θ.
(b) Prove that (n − 1)/T is the UMVUE of θ.
✍ Sol.
(a) ∵ f_θ(x) = θ^n exp( −θ Σ_{i=1}^n xi ) = exp( −θ Σ_{i=1}^n xi + n ln θ )
∴ T = Σ_{i=1}^n Xi is a complete sufficient statistic of θ.
(It is a one-parameter exponential family.)
(b) ∵ T = Σ_{i=1}^n Xi ∼ gamma(n, 1/θ).
∴ E[1/T] = ∫_0^∞ (1/t) θ^n t^{n−1} e^{−θt}/Γ(n) dt = θ^n Γ(n − 1)/[θ^{n−1} Γ(n)] = θ/(n − 1).
By the Lehmann-Scheffé theorem, (n − 1)/T is the UMVUE of θ.
MathStat(II) ∼ Lecture 2 ∼ page 15
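A quick Monte Carlo check of part (b) (my own sketch; the rate and sample size are arbitrary):

```python
# Sketch: check that E[(n-1)/T] = theta when T is the sum of n iid Exp variables
# with rate theta (mean 1/theta each), i.e. T ~ gamma(n, 1/theta).
import numpy as np

theta, n, reps = 3.0, 8, 400_000
rng = np.random.default_rng(5)
T = rng.exponential(scale=1 / theta, size=(reps, n)).sum(axis=1)
print("E[(n-1)/T] ~", ((n - 1) / T).mean(), "  theta =", theta)
```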
☞ EX.
Let X1, . . . , Xn iid∼ Bernoulli(p), p ∈ [0, 1]. Show that the sample mean X̄ is the UMVUE of p.
✍ Sol.
By the Cramér-Rao inequality, for any unbiased estimator T(X) of p, we have
Var_p(T(X)) ≥ p(1 − p)/n   and   Var_p(X̄) = p(1 − p)/n.
It follows that Var_p(X̄) attains the lower bound of the Cramér-Rao inequality,
and hence X̄ is the UMVUE of p.
MathStat(II) ∼ Lecture 2 ∼ page 16
☞ EX.
Let X1, . . . , Xn iid∼ N(θ, 1), θ ∈ R. Find the UMVUE of θ².
✍ Sol.
∵ {N(θ, 1) : θ ∈ R} is a one-parameter exponential family.
∴ X̄ or Σ_{i=1}^n Xi is sufficient and complete for θ.
∵ E[(X̄)²] = 1/n + θ², which is derived from X̄ ∼ N(θ, 1/n).
∴ E[(X̄)² − 1/n] = θ².
By the Lehmann-Scheffé theorem, T(X) = X̄² − 1/n is the UMVUE of θ².
✇ Note:
X̄² − 1/n may occasionally be negative, so the UMVUE of θ² is not very sensible in this case.
MathStat(II) ∼ Lecture 2 ∼ page 17
☞ EX. (Sometimes an unbiased estimate may be absurd)
Let X ∼ Poisson(λ), and d(λ) = e^{−3λ}. We have T(X) = (−2)^X, which is unbiased for d(λ). Note that
E[T(X)] = e^{−λ} Σ_{x=0}^∞ (−2)^x λ^x/x! = e^{−λ} Σ_{x=0}^∞ (−2λ)^x/x! = e^{−λ} e^{−2λ} = d(λ).
However, T(x) = (−2)^x > 0 if x is even, and < 0 if x is odd, which is absurd since d(λ) > 0.
☞ EX. Let X1, . . . , Xn iid∼ Exp(θ), θ > 0. Find the UMVUE of P(X1 ≤ 2).
✍ Sol.
Let U(X) = 1 if X1 ≤ 2, and = 0 if X1 > 2, so E[U(X)] = P(X1 ≤ 2).
∵ {Exp(θ) : θ > 0} is a one-parameter exponential family.
∴ T = Σ_{i=1}^n Xi is complete and sufficient for θ, and T ∼ gamma(n, θ).
Using the Lehmann-Scheffé theorem gives that E[U(X)|T] is the UMVUE of P(X1 ≤ 2).
MathStat(II) ∼ Lecture 2 ∼ page 18
The conditional pdf of X1 given T is
f(x1|T = y) = [ (1/θ)e^{−x1/θ} (y − x1)^{n−2} e^{−(y−x1)/θ} / (Γ(n − 1)θ^{n−1}) ] / [ y^{n−1} e^{−y/θ} / (Γ(n)θ^n) ]
= ((n − 1)/y) (1 − x1/y)^{n−2},   0 < x1 < y.
Thus, for y ≥ 2,
E[U(X)|T = y] = P(X1 ≤ 2|T = y)
= ∫_0^2 ((n − 1)/y) (1 − x1/y)^{n−2} dx1
= ∫_0^2 −(n − 1)(1 − x1/y)^{n−2} d(1 − x1/y)
= −(1 − x1/y)^{n−1} |_0^2 = 1 − (1 − 2/y)^{n−1}.
MathStat(II) ∼ Lecture 2 ∼ page 19
Therefore, the UMVUE of P(X1 ≤ 2) is
E[U(X)|T] = 1 − (1 − 2/T)^{n−1}, T ≥ 2;   = 1, T < 2.
✎ HW. §7.4 — 1, 5, 9; §7.5 — 1, 3, 4, 11, 13; §7.6 — 1, 2, 4, 7.
MathStat(II) ∼ Lecture 2 ∼ page 20
§6.4 Multiparameter case: Estimation
✒ Def.
Let X1, . . . , Xn be iid with common pdf f (x; θ), where θ ∈ Ω ⊂ Rp. As before,
the likelihood function and its natural logarithm are given by
L(θ) = Π_{i=1}^n f(xi; θ)   and   l(θ) = ln L(θ) = Σ_{i=1}^n ln f(xi; θ),   ∀θ ∈ Ω.
We will consider the value which maximizes L(θ) or l(θ). If it exists, this value will be called the maximum likelihood estimator (mle).
☞ EX.
Let X1, . . . Xn be iid N (θ, σ 2), with both θ and σ 2 unknown. Find the mle of θ and σ 2.
MathStat(II) ∼ Lecture 3 ∼ page 1
✍ Sol. Method I:
L(θ, σ²|x) = (2πσ²)^{−n/2} exp{ −(1/2) Σ_{i=1}^n (xi − θ)²/σ² }
l(θ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/2) Σ_{i=1}^n (xi − θ)²/σ²
⇒ ∂l(θ, σ²)/∂θ = (1/σ²) Σ_{i=1}^n (xi − θ)
and ∂l(θ, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (xi − θ)².
Setting these partial derivatives equal to zero and solving yields the solution
θ̂ = x̄   and   σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)².
The remaining task is to verify this solution is a global maximum.
∵ Σ_{i=1}^n (xi − θ)² > Σ_{i=1}^n (xi − x̄)²   ∀θ ≠ x̄
∴ (2πσ²)^{−n/2} exp( −(1/2) Σ_{i=1}^n (xi − x̄)²/σ² ) ≥ (2πσ²)^{−n/2} exp( −(1/2) Σ_{i=1}^n (xi − θ)²/σ² )
MathStat(II) ∼ Lecture 3 ∼ page 2
Then this problem is reduced to a 1-dimensional problem: verifying that (σ²)^{−n/2} exp( −(1/2) Σ_{i=1}^n (xi − x̄)²/σ² ) achieves its global maximum at σ² = n^{−1} Σ_{i=1}^n (xi − x̄)².
✇ Note:
To use two-variable calculus to verify that a function H(θ1, θ2) has a global maximum at (θ̂1, θ̂2), it must be shown that the following four conditions hold.
(i) ∂H(θ1, θ2)/∂θ1 = 0 and ∂H(θ1, θ2)/∂θ2 = 0 at (θ1, θ2) = (θ̂1, θ̂2).
(ii) ∂²H(θ1, θ2)/∂θ1² < 0 or ∂²H(θ1, θ2)/∂θ2² < 0 at θ = θ̂.
(iii) The determinant of the Hessian matrix is positive at θ = θ̂, that is,
| ∂²H/∂θ1²      ∂²H/∂θ1∂θ2 |
| ∂²H/∂θ1∂θ2    ∂²H/∂θ2²   |  > 0.
(iv) Check boundaries.
MathStat(II) ∼ Lecture 3 ∼ page 3
✍ Sol. Method II:
∂²l(θ, σ²)/∂θ² = −n/σ² < 0,   ∂²l(θ, σ²)/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ_{i=1}^n (xi − θ)²,
and ∂²l(θ, σ²)/∂θ∂σ² = −(1/σ⁴) Σ_{i=1}^n (xi − θ).
The determinant of the Hessian matrix evaluated at (θ̂, σ̂²) is (1/σ̂⁶)(n²/2), so it is positive.
Check boundaries:
θ → ±∞ and σ² → ∞ ⇒ L(θ, σ²) → 0
θ → ±∞ and σ² → 0 ⇒ L(θ, σ²) → 0
Therefore, (θ̂, σ̂²) = ( X̄, (1/n) Σ_{i=1}^n (Xi − X̄)² ) is the mle of (θ, σ²).
MathStat(II) ∼ Lecture 3 ∼ page 4
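The closed-form mle can be cross-checked against a direct numerical maximization of l(θ, σ²) (a sketch of mine; the optimizer and the log-variance parametrization are convenience choices, not part of the notes):

```python
# Sketch: closed-form normal mle vs. numerical maximization of the log-likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(loc=5.0, scale=2.0, size=100)
n = x.size

def neg_loglik(par):
    mu, log_s2 = par                       # parametrize sigma^2 = exp(log_s2) > 0
    s2 = np.exp(log_s2)
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.sum((x - mu) ** 2) / s2

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]))
print("numerical  :", res.x[0], np.exp(res.x[1]))
print("closed form:", x.mean(), x.var(ddof=0))   # the mle divides by n
```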
✑ Theorem: ✯✯✯✯✯
Let X = (X1, . . . , Xn) be distributed according to a k-parameter exponential
family with pdf
f(x; θ) = exp{ Σ_{j=1}^k Q_j(θ) T_j(x) + D(θ) + S(x) },
where θ ∈ Ω ⊂ R^k.
If the equations E_θ[T_j(X)] = T_j(x), j = 1, 2, . . . , k (the method of moments equations), have a solution θ̂(x) = (θ̂1(x), θ̂2(x), . . . , θ̂k(x)), then θ̂ is the unique mle of θ.
MathStat(II) ∼ Lecture 3 ∼ page 5
✍ Proof:
For a 1-parameter exponential family, if the pdf is
f(x; θ) = exp{ θT(x) + D(θ) + S(x) },
then l(θ) = ln f(x; θ) = θT(x) + D(θ) + S(x)
⇒ l′(θ) = T(x) + D′(θ)   and   l″(θ) = D″(θ).
Note that the mgf of T(X) is given by
E[e^{tT(X)}] = ∫_{R^n} e^{(t+θ)T(x)+D(θ)+S(x)} dx = e^{D(θ)}/e^{D(t+θ)} = exp{D(θ) − D(θ + t)}
⇒ E[T(X)] = (∂/∂t) E[e^{tT(X)}] |_{t=0} = −D′(θ)   and
E{[T(X)]²} = (∂²/∂t²) E[e^{tT(X)}] |_{t=0} = [D′(θ)]² − D″(θ).
MathStat(II) ∼ Lecture 3 ∼ page 6
So Var[T(X)] = −D″(θ).
Then l′(θ) = 0 ⇒ E[T(X)] = T(x), and l″(θ) = −Var[T(X)] < 0.
Therefore, if the equation E[T(X)] = T(x) has a solution θ̂(x), then θ̂(x) is the unique MLE of the parameter θ.
☞ EX. Let X1, . . . , Xn be iid N (µ, σ 2), where µ and σ 2 are unknown. Find the mle
of (µ, σ 2).
✍ Sol.
f(x; µ, σ²) = (2πσ²)^{−n/2} exp( −(1/2) Σ_{i=1}^n (xi − µ)²/σ² )
= exp( −(1/(2σ²)) Σ_{i=1}^n xi² + (µ/σ²) Σ_{i=1}^n xi − nµ²/(2σ²) − (n/2) ln(2πσ²) )
MathStat(II) ∼ Lecture 3 ∼ page 7
Therefore, it is a 2-parameter exponential family. We just need to solve the equations
E[Σ_{i=1}^n Xi] = Σ_{i=1}^n xi   and   E[Σ_{i=1}^n Xi²] = Σ_{i=1}^n xi².
That is,
nµ̂ = Σ_{i=1}^n xi   and   n(σ̂² + µ̂²) = Σ_{i=1}^n xi²
⇒ µ̂ = x̄   and   σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)².
MathStat(II) ∼ Lecture 3 ∼ page 8
☞ EX.
Let (X1, . . . , Xs) ∼ Multinomial(n, p), where p = (p1, . . . , ps) is unknown. Find the mle of p.
✍ Sol.
f_X(x; p) = n!/(Π_{i=1}^s xi!) p1^{x1} · · · ps^{xs} = exp( ln n! − ln(Π_{i=1}^s xi!) + Σ_{i=1}^s xi ln pi )
= exp( ln n! − ln(Π_{i=1}^s xi!) + (n − x2 − x3 − . . . − xs) ln p1 + Σ_{i=2}^s xi ln pi )
= exp( ln n! − ln(Π_{i=1}^s xi!) + n ln p1 + Σ_{i=2}^s xi ln(pi/p1) ).
Therefore, it is an (s − 1)-parameter exponential family. Then we can obtain the mle of p by solving the equations
E[Xi] = xi,   i = 2, . . . , s   ⇒ p̂i = xi/n,   i = 1, 2, . . . , s.
Consequently, the mle of pi is p̂i = Xi/n,   i = 1, 2, . . . , s.
MathStat(II) ∼ Lecture 3 ∼ page 9
☞ EX.
X1, . . . , Xn iid∼ U(θ − ρ, θ + ρ), where Ω = {(θ, ρ) : θ ∈ R and ρ > 0}. Find the mle's for θ and ρ. Are these two unbiased estimators?
✍ Sol.
The likelihood function is
L(θ, ρ|x) = (1/(2ρ)^n) I(θ − ρ ≤ x_(1) ≤ x_(n) ≤ θ + ρ).
To maximize L, make ρ as small as possible, which is accomplished by setting
θ̂ − ρ̂ = x_(1) and θ̂ + ρ̂ = x_(n)   ⇒   θ̂ = (x_(n) + x_(1))/2 and ρ̂ = (x_(n) − x_(1))/2.
Let Yi = [Xi − (θ − ρ)]/(2ρ) ∼ U(0, 1); it follows that Y_(r) ∼ beta(r, n − r + 1).
∵ θ̂ = ρY_(n) + ρY_(1) + (θ − ρ)   and   ρ̂ = ρY_(n) − ρY_(1)
MathStat(II) ∼ Lecture 3 ∼ page 10
∴ E(θ̂) = ρ n/(n + 1) + ρ 1/(n + 1) + θ − ρ = θ   and   E(ρ̂) = ρ (n − 1)/(n + 1).
We conclude that θ̂ is unbiased but ρ̂ is not.
✎ HW.
Let X1, . . . , X_{n1} and Y1, . . . , Y_{n2} be independent random samples from N(µ1, σ²) and N(µ2, σ²), respectively. Find the mle of (µ1, µ2, σ²), where Ω = {(µ1, µ2, σ²) | µi ∈ R, i = 1, 2, 0 < σ² < ∞}.
Ans: µ̂1 = X̄, µ̂2 = Ȳ, and σ̂² = [ Σ_{i=1}^{n1} (Xi − X̄)² + Σ_{i=1}^{n2} (Yi − Ȳ)² ] / (n1 + n2).
MathStat(II) ∼ Lecture 3 ∼ page 11
✒ Def. Multiparameter case: Fisher Information Matrix
Fisher information in the scalar case is the variance of the random variable (∂/∂θ) ln f(X; θ). The analog in the multiparameter case is the variance-covariance matrix of the gradient of ln f(X; θ); that is, the variance-covariance matrix of the random vector given by
∇ ln f(X; θ) = ( ∂ ln f(X; θ)/∂θ1, . . . , ∂ ln f(X; θ)/∂θp )′.
Fisher information is then defined by the p × p matrix
I(θ) = Cov( ∇ ln f(X; θ) ).
The (j, k)th entry of I(θ) is given by
I_{j,k} = Cov( ∂ ln f(X; θ)/∂θj, ∂ ln f(X; θ)/∂θk ) = −E[ ∂² ln f(X; θ)/∂θj∂θk ].
MathStat(II) ∼ Lecture 3 ∼ page 12
✇ Note:
Information for a sample follows in the same way as the scalar case. Then
Fisher information for a sample is given by nI(θ).
✑ Theorem: Multiparameter case: Cramér-Rao lower bound
Let X1, . . . , Xn be iid r.v.s with common pdf f(x; θ) for θ ∈ Ω ⊂ R^s. Assume that regularity conditions hold. Let T(X) be an unbiased estimator of g(θ). Then
Var[T(X)] ≥ α′[nI(θ)]^{−1}α,
where α′ is the row vector whose ith element is αi = ∂g(θ)/∂θi, i = 1, 2, . . . , s.
✇ Note:
For the scalar case, the lower bound reduces to Var[T(X)] ≥ 1/(nI(θ)), provided g(θ) = θ.
MathStat(II) ∼ Lecture 3 ∼ page 13
☞ EX.
Let X1, . . . , Xn iid∼ N(µ, σ²), with µ = θ1 and σ² = θ2 both unknown.
(i) Find Fisher information matrix for this sample.
(ii) Find the Cramér-Rao lower bound for any unbiased estimator of σ².
✍ Sol.
(i) f(x; θ1, θ2) = (1/√(2πθ2)) e^{−(x−θ1)²/(2θ2)}
Let
S1 = ∂ ln f(X; θ1, θ2)/∂θ1 = (X − θ1)/θ2
S2 = ∂ ln f(X; θ1, θ2)/∂θ2 = −1/(2θ2) + (X − θ1)²/(2θ2²).
We can derive
Var(S1) = Var(X)/θ2² = 1/θ2 = 1/σ²,
Var(S2) = Var( (1/(2θ2)) ((X − θ1)/√θ2)² ) = (1/(4θ2²)) × 2 = 1/(2σ⁴), and
Cov(S1, S2) = Cov( (1/√θ2) (X − θ1)/√θ2, (1/(2θ2)) ((X − θ1)/√θ2)² ) = 0.
MathStat(II) ∼ Lecture 3 ∼ page 14
Consequently, Fisher information matrix for the sample is
nI(θ) = n × [ Var(S1)  Cov(S1, S2) ; Cov(S1, S2)  Var(S2) ] = [ n/σ²  0 ; 0  n/(2σ⁴) ].
(ii) ∵ g(θ) = θ2 = σ²
∴ α′ = ( ∂g(θ)/∂θ1, ∂g(θ)/∂θ2 ) = (0, 1)
⇒ the Cramér-Rao lower bound is given by
Var[T(X)] ≥ (0, 1) [ σ²/n  0 ; 0  2σ⁴/n ] (0, 1)′ = 2σ⁴/n.
MathStat(II) ∼ Lecture 3 ∼ page 15
✎ HW.
Is the unbiased estimator σ̂² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² efficient?
✑ Theorem: (The asymptotic behavior of the mle of the vector θ)
Let X1, . . . , Xn be iid r.v.s with pdf f(x; θ) for θ ∈ Ω. Assume that regularity conditions hold. Then
(i) The likelihood equations ∂l(θ)/∂θi = 0, i = 1, 2, . . . , s, have a solution θ̂ such that θ̂ →P θ, that is, θ̂i →P θi, i = 1, 2, . . . , s.
(ii) For any sequence which satisfies (i),
√n(θ̂ − θ) →D MVN(0, I^{−1}(θ))   (the limiting covariance does not involve n)
(iii) √n(θ̂i − θi) →D N(0, I_{ii}^{−1}(θ)), i = 1, 2, . . . , s.
MathStat(II) ∼ Lecture 3 ∼ page 16
(iv) Let g be a transformation g(θ) = (g1(θ), . . . , gk(θ))′ such that 1 ≤ k ≤ s and the k × s matrix of partial derivatives B = (∂gi/∂θj) exists. Let η̂ = g(θ̂). Then η̂ is the mle of η = g(θ), and (using the ∆-method)
√n(η̂ − η) →D MVN(0, B I^{−1}(θ) B′).
Moreover, Fisher information matrix for η is I(η) = [ B I^{−1}(θ) B′ ]^{−1}.
MathStat(II) ∼ Lecture 3 ∼ page 17
✇ Note:
For the scalar case, let I(θ) be Fisher information for the family {p(x, θ), θ ∈ Ω}. If θ = h(ξ) and h is differentiable, then Fisher information that X contains about ξ is
I*(ξ) = E[ (∂ ln p(X; θ)/∂ξ)² ] = E[ (∂ ln p(X; θ)/∂θ × ∂h(ξ)/∂ξ)² ] = I(θ)[h′(ξ)]².
MathStat(II) ∼ Lecture 3 ∼ page 18
☞ EX.
Let X ∼ N(0, σ²). If θ = σ², then I(θ) = 1/(2σ⁴) = 1/(2θ²). Find I*(σ).
✍ Sol.
Let ξ = σ; then θ = h(ξ) = ξ², so h′(ξ) = 2ξ = 2σ.
Thus, I*(ξ) = I*(σ) = I(σ²)(2σ)² = (1/(2σ⁴)) × 4σ² = 2/σ².
✎ HW.
Let (Xi, Yi), i = 1, 2, . . . , n, be a r.s. from BVN(0, Σ), where
Σ = σ² [ 1  ρ ; ρ  1 ],   Ω = {(σ², ρ) : 0 < σ² < ∞, |ρ| ≤ 1}.
(a) Find the mle of σ² and ρ.
(b) Find Fisher information matrix.
(c) Establish the limiting normality of ρ̂ in (a).
MathStat(II) ∼ Lecture 3 ∼ page 19
❉ Hint:
(a) σ̂² = [ Σ_{1}^n Xi² + Σ_{1}^n Yi² ]/(2n)   and   ρ̂ = 2 Σ_{1}^n XiYi / [ Σ_{1}^n Xi² + Σ_{1}^n Yi² ]
(b) nI((σ², ρ)) = [ n/σ⁴   −nρ/(σ²(1 − ρ²)) ; −nρ/(σ²(1 − ρ²))   n(1 + ρ²)/(1 − ρ²)² ]
(c) ∵ √n ( σ̂² − σ², ρ̂ − ρ )′ →D MVN( (0, 0)′, I^{−1}(σ², ρ) ), where
I^{−1}(σ², ρ) = [ σ⁴(1 + ρ²)   −σ²(1 − ρ²)ρ ; −σ²(1 − ρ²)ρ   (1 − ρ²)² ].
Let g(σ², ρ) = ρ. Then ∂g/∂σ² = 0 and ∂g/∂ρ = 1, i.e., B = (0, 1),
∴ √n(ρ̂ − ρ) →D N(0, (1 − ρ²)²).
✎ HW. §6.4 — 1, 2, 3, 4, 6.
MathStat(II) ∼ Lecture 3 ∼ page 20
§5.5 Introduction to Hypothesis Testing
✒ Def.
(1) A parametric hypothesis is an assertion about the unknown parameter θ.
It is usually referred to as the null hypothesis, H0 : θ ∈ Ω0 ⊂ Ω.
The statement H1 : θ ∈ Ω1 = Ω \ Ω0 is usually referred to as
the alternative hypothesis.
(2) If Ω0(Ω1) contains only one point, we say that Ω0(Ω1) is simple; otherwise,
composite. Thus, if a hypothesis is simple, the probability distribution of
X, if X ∼ fθ , θ ∈ Ω0(Ω1), is completely specified under the hypothesis.
☞ EX.
Let X ∼ N (µ, σ 2). If both µ and σ 2 are unknown, Ω = {(µ, σ 2) : µ ∈ R, σ 2 >
0}. The hypothesis H0 : µ ≤ µ0, σ 2 > 0, where µ0 is known constant, is a
composite null hypothesis. The alternative hypothesis is H1 : µ > µ0, σ 2 > 0,
which is also composite. Similarly, the null hypothesis H0′ : µ = µ0, σ² > 0 is also composite. If σ² = σ0² is known, the hypothesis H0″ : µ = µ0 is a simple hypothesis.
MathStat(II) ∼ Lecture 6 ∼ page 1
☞ EX.
Let X1, . . . , Xn be iid Bernoulli(p). Some hypotheses of interest are p = 1/2,
p ≤ 1/2, p ≥ 1/2 or, quite generally, p = p0, p ≤ p0, p ≥ p0, where p0 is
known number, p0 ∈ (0, 1).
✇ Note:
The problem of testing of hypotheses may be described as follows:
Given the sample point x = (x1, . . . , xn) ∈ R^n, find a decision rule (function) that will lead to a decision to reject or not reject the null hypothesis. In other words, partition the space R^n into two disjoint sets C and C^c such that, if x ∈ C, we reject H0 : θ ∈ Ω0, and if x ∈ C^c, we do not reject H0.
✒ Def.
(1) Let X1, . . . , Xn be a sample with the joint pdf fθ (x), θ ∈ Ω. A subset C
of Rn such that, if x ∈ C, then H0 is rejected (with probability 1) is called
the critical region (set):
C = {x ∈ Rn : H0 is rejected if x ∈ C}
MathStat(II) ∼ Lecture 6 ∼ page 2
(2) There are two types of errors that can be made if one uses such a procedure. One may reject H0 when in fact it is true, called a type I error, or not reject H0 when it is false, called a type II error:
True state \ Decision:   Don't reject H0   /   Reject H0
H0:   correct   /   type I error
H1:   type II error   /   correct
That is,
P_θ(type I error) = P_θ(X ∈ C),   θ ∈ Ω0
P_θ(type II error) = P_θ(X ∈ C^c),   θ ∈ Ω1
MathStat(II) ∼ Lecture 6 ∼ page 3
✇ Note:
Ideally one would like to find a critical region for which both these probabilities
are 0. Unfortunately situations such as this do not arise in practice.
Usually, if a critical region is such that P (type I error)=0, it will be of the
form “always do not reject H0”, and P (type II error) will then be 1. For example, let H0 : exp(θ) vs. H1 : poisson(θ) and the critical region {X : X < 0}.
The procedure used in practice is to limit P (type I error) to some preassigned
level α (usually 0.01 or 0.05) that is small and to minimize the probability of
type II error.
Sometimes α is called significance level or, simply, level.
✒ Def. Let ϕ : Rn −→ [0, 1], the function is known as a test function.
MathStat(II) ∼ Lecture 6 ∼ page 4
☞ EX.
Let C be a critical region for some test.
If a test function is
ϕ(x) = 1, if x ∈ C; 0, otherwise,
then E_θ[ϕ(X)] = P(type I error)   ∀θ ∈ Ω0
and 1 − E_θ[ϕ(X)] = P(type II error)   ∀θ ∈ Ω1 = Ω \ Ω0.
✒ Def.
The mapping ϕ is said to be a test of hypothesis H0 : θ ∈ Ω0 against the
alternative H1 : θ ∈ Ω1 with error probability (or level) α if
Eθ [ϕ(X)] ≤ α
∀θ ∈ Ω0
we shall say, in short, that ϕ is a test for the problem (α, Ω0, Ω1)
MathStat(II) ∼ Lecture 6 ∼ page 5
✇ Note:
Let us write βϕ(θ) = E_θ[ϕ(X)]. Our objective, in practice, will be to seek a test ϕ, for a given α ∈ [0, 1], such that
sup βϕ(θ) ≤ α
θ∈Ω0
The left-hand side of the above inequality is usually known as the size of the
test ϕ
✒ Def.
Let ϕ be a test function for the problem (α, Ω0, Ω1). For every θ ∈ Ω define
βϕ(θ) = Eθ [ϕ(X)] = Pθ ( Reject H0)
As a function of θ, βϕ(θ) is called the power function of the test ϕ. For any
θ ∈ Ω1, βϕ(θ) is called the power of ϕ against the alternative θ.
MathStat(II) ∼ Lecture 6 ∼ page 6
☞ EX.
Let X1, X2, . . . , X20 iid∼ Bernoulli(p), where p ∈ (0, 1). It is clear that Y = Σ_{i=1}^{20} Xi ∼ binomial(20, p). Testing H0 : p = 1/2 vs. H1 : p < 1/2 is of interest. Let the critical region be C = {y : y ≤ 6}. Find the probability of type I error and the power function of this test.
✍ Sol.
Let the test function ϕ be
ϕ(y) = 1, if y ≤ 6; 0, otherwise.
P(type I error) = P(Y ≤ 6 | p = 1/2) = Σ_{y=0}^{6} C(20, y) (1/2)^{20} ≈ 0.0577.
The power function of ϕ is given by
βϕ(p) = E_p[ϕ(Y)] = Σ_{y=0}^{6} C(20, y) p^y (1 − p)^{20−y},   0 < p < 1.
MathStat(II) ∼ Lecture 6 ∼ page 7
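These quantities are easy to evaluate numerically (my own sketch, using scipy's binomial cdf; the cutoff and sample size follow the example above):

```python
# Sketch: size and power function of the test "reject H0 if Y <= 6", Y ~ binomial(20, p).
from scipy.stats import binom

def power(p, n=20, cut=6):
    return binom.cdf(cut, n, p)            # P(Y <= cut) when the success prob. is p

print("size (p = 1/2):", power(0.5))       # ~ 0.0577
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, round(power(p), 4))
```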
✒ Def.
Often the partitioning of the sample space is specified in terms of the values of
a statistic called the test statistic.
✫ Skill:
Given a sample point x, find a test ϕ(x) such that βϕ(θ) ≤ α ∀θ ∈ Ω0, and
βϕ(θ) is maximum for θ ∈ Ω1.
✎ HW. §5.5 — 2, 4, 8 ∼ 13.
MathStat(II) ∼ Lecture 6 ∼ page 8
§8.1 Most Powerful Tests
✒ Def.
Let Φα be the class of all test for the problem (α, Ω0, Ω1). A test ϕ0 ∈ Φα is
said to be a most powerful (MP) test against an alternative θ ∈ Ω1 if
βϕ0 (θ) ≥ βϕ(θ)
∀ϕ ∈ Φα
✇ Note:
If Ω1 contains only one point, this definition suffices. If on the other hand, Ω1
contains at least two points, as will usually be the case, we will have an MP
test corresponding to each θ ∈ Ω1
✒ Def.
A test ϕ0 ∈ Φα for the (α, Ω0, Ω1) is said to be a uniformly most powerful
(UMP) test if
βϕ0 (θ) ≥ βϕ(θ)
∀ϕ ∈ Φα
uniformly in θ ∈ Ω1
MathStat(II) ∼ Lecture 6 ∼ page 9
✒ Def. ✯✯✯✯✯
The p-value associated with a test is the probability that we obtain the observed value of the test statistic or a value that is more extreme in the direction
of the alternative hypothesis, calculated when H0 is true.
☞ EX.
Let X ∼ N (µ, 100). To test H0 : µ = 80, v.s. H1 : µ > 80, let the critical
region be defined by C = {(x1, . . . , x25) : x̄ > 83}, where x̄ is the sample mean
of a r.s. of size n = 25 from this distribution.
(i) How is the power function βϕ(θ) defined for this test?
(ii) Find P (type I error).
(iii) What are the value of βϕ(80), βϕ(83), and βϕ(86)?
(iv) What is the p-value corresponding to x̄ = 83.41?
MathStat(II) ∼ Lecture 6 ∼ page 10
✍ Sol.
(i) The test function can be defined by
ϕ(X) = 1, if X̄ > 83; 0, otherwise.
βϕ(µ) = P_µ(X̄ > 83) = P( Z ≥ (83 − µ)/(10/5) ) = 1 − Φ((83 − µ)/2),
where Φ is the df of the standard normal distribution.
(ii) P(Type I error) = βϕ(80) = 1 − Φ(1.5) ≈ 0.0668
(iii) βϕ(83) = 1 − Φ(0) ≈ 0.5;   βϕ(86) = 1 − Φ(−1.5) ≈ 0.9332 (Do you have any discovery?)
(iv) p-value = P(X̄ > 83.41 | µ = 80) = P(Z > 1.705) = 1 − Φ(1.705) ≈ 0.0441
MathStat(II) ∼ Lecture 6 ∼ page 11
☞ EX.
Let X̄ ∼ N(µ, 16/n), where n is the sample size. To test H0 : µ = 20 vs. H1 : µ < 20, assume that there is a test function ϕ(x) = 1 if x̄ ≤ c; = 0, otherwise. Find the constant c and sample size n so that E[ϕ(X)|µ = 20] = 0.05 and E[ϕ(X)|µ = 19] = 0.9, approximately.
✍ Sol.
βϕ(µ) = E_µ[ϕ(X)] = P( (X̄ − µ)/(4/√n) ≤ (c − µ)/(4/√n) | µ ) = Φ( (c − µ)/(4/√n) ).
We are given
βϕ(20) = Φ( (c − 20)/(4/√n) ) = 0.05 ⇒ (c − 20)/(4/√n) = −1.645
βϕ(19) = Φ( (c − 19)/(4/√n) ) = 0.9 ⇒ (c − 19)/(4/√n) = 1.282
Solving simultaneously yields n ≈ 137 and c ≈ 19.4.
MathStat(II) ∼ Lecture 6 ∼ page 12
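The simultaneous equations can be solved directly (a sketch of mine; the quantile function from scipy is only a convenience):

```python
# Sketch: solve Phi((c-20)/(4/sqrt(n))) = 0.05 and Phi((c-19)/(4/sqrt(n))) = 0.9 for (n, c).
import numpy as np
from scipy.stats import norm

z_low, z_high = norm.ppf(0.05), norm.ppf(0.9)     # -1.645 and 1.282
# Subtracting the two standardized equations gives 1 / (4/sqrt(n)) = z_high - z_low
sqrt_n = 4 * (z_high - z_low)
n = sqrt_n**2
c = 20 + z_low * 4 / sqrt_n
print("n ~", n, "  c ~", c)                       # roughly 137 and 19.4
```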
✒ Def.
To every x ∈ Rn we assign a number ϕ(x), 0 ≤ ϕ(x) ≤ 1, which is the
probability of rejecting H0 that X ∼ fθ , θ ∈ Ω0 if x is observed. The restriction
βϕ(θ) ≤ α for θ ∈ Ω0 then says that, if H0 were true, ϕ rejects it with a
probability ≤ α. We will call such a test a randomized test function.
If ϕ(x) = I_A(x), ϕ will be called a nonrandomized test. If x ∈ A, we reject H0 with probability 1; and if x ∉ A, this probability is 0.
MathStat(II) ∼ Lecture 6 ∼ page 13
☞ EX.
Let X1, X2, . . . , Xn iid∼ N(µ, 1), where µ is unknown but it is known that µ ∈ Ω = {µ0, µ1}, µ0 < µ1. Let
H0 : Xi ∼ N(µ0, 1)   vs.   H1 : Xi ∼ N(µ1, 1).
Intuitively, one would not reject H0 if the sample mean X̄ is "closer" to µ0 than to µ1; that is to say, one would reject H0 if X̄ > k, and not reject H0 otherwise. The constant k is determined from the level requirement.
Given 0 < α < 1, we have
P_{µ0}(X̄ > k) = P_{µ0}( (X̄ − µ0)/(1/√n) > (k − µ0)/(1/√n) ) = P(Type I error) = α,
so that k = µ0 + Z_α/√n.
Note that for Z ∼ N(0, 1), P(Z > Z_α) = α.
MathStat(II) ∼ Lecture 6 ∼ page 14
The test, therefore, is
ϕ(x) = 1, if x̄ > µ0 + Z_α/√n; 0, otherwise.
X̄ is known as a test statistic, and the test ϕ is nonrandomized with critical region C = {x : x̄ > µ0 + Z_α/√n}.
The power of the test at µ1 is given by
E_{µ1}[ϕ(X)] = P_{µ1}(X̄ > µ0 + Z_α/√n)
= P_{µ1}( (X̄ − µ1)/(1/√n) > √n(µ0 − µ1) + Z_α )
= P_{µ1}( Z > Z_α − √n(µ1 − µ0) ),   Z ∼ N(0, 1).
Note that the probability of type II error can be evaluated directly by
P(type II error) = 1 − E_{µ1}[ϕ(X)] = P( Z ≤ Z_α − √n(µ1 − µ0) ).
MathStat(II) ∼ Lecture 6 ∼ page 15
MathStat(II) ∼ Lecture 6 ∼ page 16
☞ EX.
Let X1, X2, . . . , X5 be a sample from Bernoulli(p), where p is unknown and p ∈ [0, 1]. Let
H0 : p = 1/2   vs.   H1 : p ≠ 1/2.
It is reasonable to reject H0 if |X̄ − 1/2| > c, where X̄ is the sample mean and c is to be determined as below:
Let α = 0.1. Then we would like to choose c such that the size of our test is α, that is, 0.1 = P_{p=1/2}(|X̄ − 1/2| > c), or
0.9 = P( −5c ≤ Σ_{1}^{5} Xi − 5/2 ≤ 5c | p = 1/2 ) = P( −k ≤ Σ_{1}^{5} Xi − 5/2 ≤ k | p = 1/2 ),
letting k = 5c.
How do you find the critical point k?
MathStat(II) ∼ Lecture 6 ∼ page 17
Note that Σ_{1}^{5} Xi ∼ binomial(5, 1/2) under H0, so we have
Σ xi:                      0        1        2        3        4        5
Σ xi − 5/2:             −2.5     −1.5     −0.5      0.5      1.5      2.5
P(Σ Xi = Σ xi | H0): 0.03125  0.15625  0.31250  0.31250  0.15625  0.03125
Note that we cannot choose any k such that
P( |Σ_{1}^{5} Xi − 5/2| ≤ k | H0 is true ) = 0.9.
MathStat(II) ∼ Lecture 6 ∼ page 18
☞ For example.
(i) If k = 1.5, we reject H0 when Σ_{1}^{5} Xi = 0 or 5; then
P( |Σ_{1}^{5} Xi − 5/2| > 1.5 | p = 1/2 ) = 0.03125 × 2 = 0.0625 < 0.1.
(ii) If k = 0.5, we reject H0 when Σ_{i=1}^{5} Xi = 0, 1, 4, 5; then
P( |Σ_{1}^{5} Xi − 5/2| > 0.5 | p = 1/2 ) = 0.0625 + 0.15625 × 2 = 0.375 > 0.1.
If we insist on achieving α = 0.1, we can take a randomized test function.
Assume that we reject H0 with probability γ when Σ_{1}^{5} Xi = 1 or 4. Then
0.1 = P( Σ_{1}^{5} Xi = 0 or 5 | p = 1/2 ) + γ P( Σ_{1}^{5} Xi = 1 or 4 | p = 1/2 )
⇒ γ = 0.0375/0.3125 = 0.12.
MathStat(II) ∼ Lecture 6 ∼ page 19
It follows that we have a randomized test function
ϕ(x) = 1, if Σ_{1}^{5} xi = 0 or 5;   0.12, if Σ_{1}^{5} xi = 1 or 4;   0, otherwise.
The power of this test is
E_{p≠1/2}[ϕ(X)] = P( Σ_{1}^{5} Xi = 0 or 5 | p ≠ 1/2 ) + 0.12 P( Σ_{1}^{5} Xi = 1 or 4 | p ≠ 1/2 ).
MathStat(II) ∼ Lecture 6 ∼ page 20
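The randomization constant can be computed directly (a sketch of mine; it reproduces the exact value γ = 0.12 used above):

```python
# Sketch: compute gamma for the two-sided Bernoulli test with n = 5, alpha = 0.1
# (reject outright at sum = 0 or 5, randomize at sum = 1 or 4).
from scipy.stats import binom

n, alpha = 5, 0.10
p_reject = binom.pmf(0, n, 0.5) + binom.pmf(5, n, 0.5)        # 0.0625
p_boundary = binom.pmf(1, n, 0.5) + binom.pmf(4, n, 0.5)      # 0.3125
gamma = (alpha - p_reject) / p_boundary
print("gamma =", gamma)                                        # 0.12
```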
☞ EX.
Let X1, X2 iid∼ exp(θ), where θ ∈ Ω = {2, 4}. To test H0 : θ = 2 vs. H1 : θ = 4, assume that there is a test function
ϕ(x) = 1, if x1 + x2 ≥ 9.5; 0, otherwise.
That is, we reject H0 when X1 + X2 ≥ 9.5.
(i) Find the probabilities of the type I error and the type II error, respectively.
(ii) Evaluate the power of this test.
✍ Sol.
(i)
P(Type I error) = E_{θ=2}[ϕ(X)] = P(X1 + X2 ≥ 9.5 | θ = 2)
= 1 − ∫_0^{9.5} ∫_0^{9.5−x2} (1/4) e^{−(x1+x2)/2} dx1 dx2
= 1 − ∫_0^{9.5} (1/2) e^{−x2/2} ( 1 − e^{−(9.5−x2)/2} ) dx2
= 1 − ( 1 − e^{−9.5/2} − (9.5/2) e^{−9.5/2} )
= (11.5/2) e^{−9.5/2} ≈ 0.05   [It is the size of this test]
P(Type II error) = P(X1 + X2 < 9.5 | θ = 4) = 1 − E_{θ=4}[ϕ(X)]
= ∫_0^{9.5} ∫_0^{9.5−x2} (1/16) e^{−(x1+x2)/4} dx1 dx2
= 1 − (13.5/4) e^{−9.5/4} ≈ 0.69
MathStat(II) ∼ Lecture 6 ∼ page 22
(ii) By the result of (i), we can obtain the power of this test directly by
βϕ(4) = 1 − P(Type II error) ≈ 0.31.
☞ EX.
Let X1, X2 iid∼ exp(θ), θ ∈ Ω = {1, 2}, where f(x; θ) = (1/θ) e^{−x/θ}. For H0 : θ = 2 vs. H1 : θ = 1, we reject H0 if the observed values of x1 and x2 are such that
f(x1; 2)f(x2; 2) / [ f(x1; 1)f(x2; 1) ] ≤ 1/2.
Find the size of this test and the power of the test.
✍ Sol.
f(x1; 2)f(x2; 2)/[f(x1; 1)f(x2; 1)] = (1/4)e^{−(x1+x2)/2}/e^{−(x1+x2)} = (1/4)e^{(x1+x2)/2} ≤ 1/2 ⇐⇒ x1 + x2 ≤ 2 ln 2.
Therefore, the size of the test is
P(X1 + X2 ≤ 2 ln 2 | θ = 2) = 1 − e^{−ln 2} − ln 2 e^{−ln 2} = 1/2 − (1/2) ln 2,
and the power of the test is
P(X1 + X2 ≤ 2 ln 2 | θ = 1) = 1 − e^{−2 ln 2} − 2 ln 2 e^{−2 ln 2} = 3/4 − (1/2) ln 2.
MathStat(II) ∼ Lecture 6 ∼ page 24
✑ Theorem: (The Neyman-Pearson Fundamental Lemma) ✯✯✯✯✯
Let {fθ, θ ∈ Ω}, where Ω = {θ0, θ1}, be a family of possible distributions of X. Test H0 : θ = θ0 vs. H1 : θ = θ1 at level α.
Let
E_{θ0}[ϕ(X)] = α   (1)
and
ϕ(x) = 1, if f_{θ1}(x) > k f_{θ0}(x);   γ(x), if f_{θ1}(x) = k f_{θ0}(x);   0, if f_{θ1}(x) < k f_{θ0}(x)   (2)
for some k ≥ 0 and 0 ≤ γ(x) ≤ 1.
(i) Sufficient condition for an MP test (or UMP test): If a test satisfies (1) and (2), then it is an MP test.
(ii) Necessary condition for an MP test: If ϕ* is an MP test of level α, then for some k it satisfies (2).
MathStat(II) ∼ Lecture 6 ∼ page 25
✍ Proof:
(i) In the continuous case, suppose that ϕ is a test satisfying (1) and (2) and that ϕ* is any other test with E_{θ0}[ϕ*(X)] ≤ α.
Denote S+ = {x : ϕ(x) − ϕ*(x) > 0} and S− = {x : ϕ(x) − ϕ*(x) < 0}.
∵ ∫· · ·∫ [ϕ(x) − ϕ*(x)][f_{θ1}(x) − k f_{θ0}(x)] dx
= ∫· · ·∫_{S+∪S−} [ϕ(x) − ϕ*(x)][f_{θ1}(x) − k f_{θ0}(x)] dx ≥ 0
(Note that f_{θ1}(x) − k f_{θ0}(x) > 0 when x ∈ S+ and f_{θ1}(x) − k f_{θ0}(x) < 0 when x ∈ S−.)
∴ ∫· · ·∫ [ϕ(x) − ϕ*(x)] f_{θ1}(x) dx ≥ k ∫· · ·∫ [ϕ(x) − ϕ*(x)] f_{θ0}(x) dx ≥ 0
(Note that E_{θ0}[ϕ*(X)] ≤ α and E_{θ0}[ϕ(X)] = α imply E_{θ0}[ϕ(X) − ϕ*(X)] ≥ 0.)
⇒ E_{θ1}[ϕ(X)] ≥ E_{θ1}[ϕ*(X)].
MathStat(II) ∼ Lecture 6 ∼ page 26
(ii) In the continuous case, let ϕ* be an MP test at level α and let ϕ satisfy (1) and (2).
Let S = (S+ ∪ S−) ∩ A, where S+ and S− are defined in (i) and A = {x : f_{θ1}(x) ≠ k f_{θ0}(x)}. Assume that P(S) > 0; then
∫· · ·∫_S [ϕ(x) − ϕ*(x)][f_{θ1}(x) − k f_{θ0}(x)] dx > 0,
so E_{θ1}[ϕ(X)] > E_{θ1}[ϕ*(X)].
That is a contradiction, and therefore P(S) = 0, i.e., P(ϕ(X) = ϕ*(X)) = 1.
MathStat(II) ∼ Lecture 6 ∼ page 27
☞ EX.
Let X ∼ N(0, 1) under H0 and X ∼ Cauchy distribution under H1. Find an MP test with size α ≤ 0.1 of H0 against H1.
✍ Sol.
f1(x)/f0(x) = [ 1/(π(1 + x²)) ] / [ (1/√(2π)) e^{−x²/2} ] = √(2/π) e^{x²/2}/(1 + x²).
Using the Neyman-Pearson Lemma, the MP test is of the form
ϕ(x) = 1, if √(2/π) e^{x²/2}/(1 + x²) > k; 0, otherwise,
where k is determined so that E0[ϕ(X)] = α.
MathStat(II) ∼ Lecture 6 ∼ page 28
[Figure: plot of y = exp(x²/2)/(1 + x²) against x ∈ (−2, 2); the threshold line y ≈ 1.044 intersects the curve at x = ±1.645.]
MathStat(II) ∼ Lecture 6 ∼ page 29
∵ The size α ≤ 0.1.
∴ The test function can be rewritten as
ϕ(x) = 1, if |x| > k1; 0, otherwise,   (intuition?)
where k1 is determined from
∫_{−k1}^{k1} (1/√(2π)) e^{−x²/2} dx = 1 − α.
It follows that k1 = Z_{α/2}. The power of the test is given by
E[ϕ(X)|H1] = 1 − ∫_{−k1}^{k1} dx/(π(1 + x²)) = 1 − (2/π) tan^{−1} k1 = 1 − (2/π) tan^{−1} Z_{α/2}.
MathStat(II) ∼ Lecture 6 ∼ page 30
☞ EX.
Let X1, . . . , Xn iid∼ Bernoulli(p), and let H0 : p = p0, H1 : p = p1 with p1 > p0. Find the MP size α test of H0 against H1.
✍ Sol.
Using the Neyman-Pearson Lemma, the MP size α test of H0 vs. H1 is of the form
ϕ(x) = 1, if λ(x) > k;   γ, if λ(x) = k;   0, if λ(x) < k,
where λ(x) = p1^{Σxi}(1 − p1)^{n−Σxi} / [ p0^{Σxi}(1 − p0)^{n−Σxi} ]
and k and γ are determined from E_{p0}[ϕ(X)] = α.
∵ λ(x) = [ (p1/p0) × ((1 − p0)/(1 − p1)) ]^{Σxi} × ( (1 − p1)/(1 − p0) )^n, where the last factor does not depend on x, and p1 > p0
∴ λ(x) is an increasing function of Σ xi.
MathStat(II) ∼ Lecture 6 ∼ page 31
Thus the MP size α test is of the form
ϕ(x) = 1, if Σ xi > k1;   γ, if Σ xi = k1;   0, otherwise.   (intuition?)
Also, k1 and γ are determined from
α = E_{p0}[ϕ(X)] = P_{p0}( Σ_{1}^{n} Xi > k1 ) + γ P_{p0}( Σ_{1}^{n} Xi = k1 ).
MathStat(II) ∼ Lecture 6 ∼ page 32
✇ Note:
This MP size α test is independent of p1 as long as p1 > p0; that is, it remains an MP size α test against any p > p0 and is therefore a UMP test of p = p0 against p > p0. For the same example, in particular, let n = 5, p0 = 1/2, p1 = 3/4, and α = 0.05. Then the MP test is given by
ϕ(x) = 1, if Σ xi > k;   γ, if Σ xi = k;   0, if Σ xi < k,
where k and γ are determined from
0.05 = α = Σ_{x=k+1}^{5} C(5, x) (1/2)^5 + γ C(5, k) (1/2)^5.
It follows that k = 4 and γ = 0.12, i.e.,
ϕ(x) = 1, if Σ_{1}^{5} xi > 4;   0.12, if Σ_{1}^{5} xi = 4;   0, otherwise.
MathStat(II) ∼ Lecture 6 ∼ page 33
☞ EX.
Let X1, . . . , Xn iid∼ N(µ, σ²), where both µ and σ² are unknown. One wishes to test H0 : µ = µ0, σ² = σ0² against H1 : µ = µ1, σ² = σ0². Find an MP size α test of H0 vs. H1.
✍ Sol.
The Neyman-Pearson lemma leads to the following MP test:
ϕ(x) = 1, if λ(x) > k; 0, if λ(x) < k,
where λ(x) = exp( −Σ(xi − µ1)²/(2σ0²) ) / exp( −Σ(xi − µ0)²/(2σ0²) )
and k is determined from E_{µ0,σ0²}[ϕ(X)] = α.
λ(x) can be simplified to
λ(x) = exp( (Σ_{1}^{n} xi)(µ1 − µ0)/σ0² + n(µ0² − µ1²)/(2σ0²) ).
MathStat(II) ∼ Lecture 6 ∼ page 34
If µ1 > µ0, then λ(x) > k ⇔ Σ_{1}^{n} xi > k1, where k1 is determined from
α = P_{µ0,σ0²}( Σ_{1}^{n} Xi > k1 ) = P( Z > (k1 − nµ0)/(√n σ0) ),   Z ∼ N(0, 1),
giving k1 = Z_α √n σ0 + nµ0.
The case µ1 < µ0 is treated similarly. The test function is
ϕ(x) = 1, if Σ_{1}^{n} xi < −Z_α √n σ0 + nµ0; 0, otherwise.
MathStat(II) ∼ Lecture 6 ∼ page 35
✇ Note:
(1) If σ0 is known, the test determined above is independent of µ1 as long as µ1 > µ0 (or µ1 < µ0), and it follows that the test is UMP against H1′ : µ > µ0, σ² = σ0² (or H1′ : µ < µ0, σ² = σ0²).
(2) If σ0 is not known, that is, the null hypothesis is a composite hypothesis H0″ : µ = µ0, σ² > 0 to be tested against the alternatives H1″ : µ = µ1, σ² > 0 (µ1 > µ0), then the MP test determined above depends on σ². In other words, an MP test against the alternative H1‴ : µ = µ1, σ² = σ0² will not be an MP test against H1⁗ : µ = µ1, σ² = σ1², where σ1² ≠ σ0².
✎ HW. §8.1 — 2, 4 ∼ 10.
MathStat(II) ∼ Lecture 6 ∼ page 36
§8.2 Uniformly Most Powerful Tests
✇ Note:
In general, for testing H0 : θ ≤ θ0 vs. H1 : θ > θ0, or its dual H0′ : θ ≥ θ0 vs. H1′ : θ < θ0, it is not always possible to find a UMP test. But we can consider a special class of distributions, large enough to include the one-parameter exponential family, for which a UMP test of a one-sided hypothesis exists.
✒ Def.
Let {fθ , θ ∈ Ω} be a family of pdf’s, Ω ⊆ R1(one parameter). We say that
{fθ } has a monotone likelihood ratio (MLR) in the statistic T (X) if
for θ1 < θ2, whenever fθ1 , fθ2 are distinct, the ratio fθ2 (x)/fθ1 (x) is a nondecreasing function of T (X) for x ∈ {x|fθ1 (x) > 0, or fθ2 (x) > 0}.
MathStat(II) ∼ Lecture 7 ∼ page 1
☞ EX.
Let X1, . . . , Xn iid∼ U(0, θ), θ > 0. The joint pdf of X1, . . . , Xn is
f_θ(x) = 1/θ^n,   0 ≤ max{x1, . . . , xn} ≤ θ.
Let θ1 > θ2 and consider the ratio
f_{θ1}(x)/f_{θ2}(x) = [ (1/θ1^n) I{0 ≤ x_(n) ≤ θ1} ] / [ (1/θ2^n) I{0 ≤ x_(n) ≤ θ2} ] = (θ2/θ1)^n × I(0 ≤ x_(n) ≤ θ1)/I(0 ≤ x_(n) ≤ θ2).
Define R(x) = I(0 ≤ x_(n) ≤ θ1)/I(0 ≤ x_(n) ≤ θ2) = 1, if x_(n) ∈ [0, θ2]; = ∞, if x_(n) ∈ (θ2, θ1].
It follows that f_{θ1}/f_{θ2} is a nondecreasing function of x_(n), and the family U(0, θ) has an MLR in x_(n).
MathStat(II) ∼ Lecture 7 ∼ page 2
✑ Theorem:
The one-parameter exponential family
f_θ(x) = exp{ Q(θ)T(x) + S(x) + D(θ) },
where Q(θ) is nondecreasing, has an MLR in T(x).
✍ Proof:
For θ2 > θ1,
f_{θ2}(x)/f_{θ1}(x) = exp{ T(x)[Q(θ2) − Q(θ1)] + D(θ2) − D(θ1) }.
∵ Q(θ) is nondecreasing in θ.
∴ The ratio f_{θ2}(x)/f_{θ1}(x) is a nondecreasing function of T(x).
✇ Note:
We have already seen that U (0, θ), which is not an exponential family, has an
MLR.
MathStat(II) ∼ Lecture 7 ∼ page 3
✑ Theorem:
Let X ∼ fθ, θ ∈ Ω ⊆ R, where {fθ} has an MLR in T(x).
For testing H0 : θ ≤ θ0 vs. H1 : θ > θ0, θ0 ∈ Ω, any test of the form
ϕ(x) = 1, if T(x) > t0;   γ, if T(x) = t0;   0, if T(x) < t0
has a nondecreasing power function (intuition?) and is UMP of its size E_{θ0}[ϕ(X)] = α.
Moreover, for every 0 ≤ α ≤ 1 and every θ0 ∈ Ω, there exist a t0, −∞ ≤ t0 ≤ ∞, and γ ∈ [0, 1] such that the test described above is the UMP size α test of H0 against H1.
➳ Remark:
By interchanging inequalities throughout in the above theorem, we see that
this theorem also provides a solution of the dual problem H0� : θ ≥ θ0 against
H1� : θ < θ0.
MathStat(II) ∼ Lecture 7 ∼ page 4
☞ EX.
Let X have the pdf (or pmf)
P_M(X = x) = C(M, x) C(N − M, n − x) / C(N, n),   max{0, M + n − N} ≤ x ≤ min{M, n},
where M is an unknown positive integer. Find a UMP size α test of H0 : M ≤ M0 vs. H1 : M > M0 if it exists.
✍ Sol.
Is this distribution a one-parameter exponential family? (No!)
Because the ratio
R(x) = P_{M+1}(X = x)/P_M(X = x) = [(M + 1)/(N − M)] × [(N − M − n + x)/(M + 1 − x)],
we see that {P_M} has an MLR in x, i.e., the ratio is a nondecreasing function of x. Note that (M + 1)/(N − M) in R(x) is independent of x. It follows that there exists a UMP test of H0 : M ≤ M0 vs. H1 : M > M0, which rejects H0 when X is too large, i.e., the UMP size α test is given by
ϕ(x) = 1, if x > k;   γ, if x = k;   0, if x < k,
where k and γ are determined from E_{M0}[ϕ(X)] = α.
MathStat(II) ∼ Lecture 7 ∼ page 5
☞ EX.
Let X1, . . . , X25 iid∼ N(θ, 100). Find a UMP size α = 0.1 test for H0 : θ = 75 against H1 : θ > 75.
✍ Sol.
∵ {N(θ, 100), θ ∈ R} is a one-parameter exponential family [HW]
∴ {N(θ, 100), θ ∈ R} has an MLR in Σ_{1}^{25} Xi (or X̄). It follows that the UMP test function is given by ϕ(x) = 1, if x̄ > k; = 0, otherwise, where k is determined from P(X̄ > k | θ = 75) = 0.1.
∵ X̄ ∼ N(75, 4) under H0.
∴ Z_{0.1} = (k − 75)/2. Note that for Z ∼ N(0, 1), P(Z > Z_{0.1}) = 0.1, Z_{0.1} = 1.28.
⇒ k = 77.56.
☞ EX.
Let X1, . . . , Xn iid∼ N(θ, 16). Find the sample size n and a UMP test of H0 : θ = 25 vs. H1 : θ < 25 with power function K(θ) so that approximately K(25) = 0.1 and K(23) = 0.9.
✍ Sol.
∵ {N(θ, 16), θ ∈ R} is a one-parameter exponential family.
∴ This family has an MLR in X̄ (or Σ Xi).
Then we have the UMP test ϕ(x) = 1, if x̄ < k; 0, otherwise.
∵ K(25) = 0.1 and K(23) = 0.9
∴ P( Z < (k − 25)/(4/√n) ) = 0.1 and P( Z < (k − 23)/(4/√n) ) = 0.9.
MathStat(II) ∼ Lecture 7 ∼ page 7
It follows that
−Z_{0.1} = Z_{0.9} = (k − 25)/(4/√n)  (➀)   and   Z_{0.1} = (k − 23)/(4/√n)  (➁)
We have −1 = (k − 25)/(k − 23) by ➀/➁, so k = 24.
Substituting k = 24 into ➁ gives n ≈ 26.2. We can take n to be 26 or 27.
Accordingly, the test function of this UMP test is
ϕ(x) = 1, if x̄ < 24; 0, otherwise.
MathStat(II) ∼ Lecture 7 ∼ page 8
☞ EX.
Let X1, X2 iid∼ U(µ, µ + 1). For testing H0 : µ = 0 vs. H1 : µ > 0, we have two competing tests:
① ϕ1(x1): Reject H0 if x1 > 0.95, and ② ϕ2(x1, x2): Reject H0 if x1 + x2 > c.
(a) Find the value of c so that ϕ2 has the same size as ϕ1.
(b) Calculate the power function of each test. Draw a well-labeled graph of each power function.
✍ Sol.
(a) Let W = X1 + X2, so the pdf of W under H0 is given by
f_W(w) = w, 0 < w < 1;   2 − w, 1 < w < 2.
Claim P_{µ=0}(X1 > 0.95) = P_{µ=0}(W > c).
⇒ 0.05 = ∫_c^2 (2 − w) dw   if 1 < c ≤ 2
⇒ 0.05 = 2 − 2c + c²/2 ⇒ c = 2 − 1/√10.
MathStat(II) ∼ Lecture 7 ∼ page 9
(b) The power function of the first test is
E_µ[ϕ1(X1)] = 0, µ < −0.05;   ∫_{0.95}^{µ+1} dx = µ + 0.05, −0.05 ≤ µ < 0.95;   1, µ ≥ 0.95.
For the general case, the pdf of W is
f_W(w) = w − 2µ, 2µ < w < 2µ + 1;   2µ + 2 − w, 2µ + 1 < w < 2µ + 2.
(This follows from the change of variables Z = X1, W = X1 + X2, for which f_{W,Z}(w, z) = 1 on µ ≤ z ≤ µ + 1, µ ≤ w − z ≤ µ + 1.)
MathStat(II) ∼ Lecture 7 ∼ page 11
E_µ[ϕ2(X)] = P_µ(X1 + X2 > c)
= 0, if µ < c/2 − 1 (i.e., c > 2µ + 2);
= ∫_c^{2µ+1} (w − 2µ) dw + 1/2, if 2µ < c < 2µ + 1;
= ∫_c^{2µ+2} (2µ + 2 − w) dw, if 2µ + 1 < c < 2µ + 2;
= 1, if µ > c/2 (i.e., c < 2µ).
That is,
E_µ[ϕ2(X)] = 0, if µ ≤ c/2 − 1;
= (1/2)(2µ + 2 − c)², if c/2 − 1 < µ ≤ (c − 1)/2;
= 1 − (c − 2µ)²/2, if (c − 1)/2 < µ ≤ c/2;
= 1, if µ > c/2.
MathStat(II) ∼ Lecture 7 ∼ page 12
✑ Theorem:
For the one-parameter exponential family, there exists a UMP test of the hypothesis H0 : θ ≤ θ1 or θ ≥ θ2 (θ1 < θ2) against H1 : θ1 < θ < θ2 that is of the form
ϕ(x) = 1, if c1 < T(x) < c2;   γi, if T(x) = ci, i = 1, 2 (c1 < c2);   0, if T(x) < c1 or T(x) > c2,
where the c's and the γ's are given by
E_{θ1}[ϕ(X)] = E_{θ2}[ϕ(X)] = α.
☞ EX.
Let X1, . . . , Xn be iid N (µ, 1). Find an UMP size α test for H0 : µ ≤ µ0 or
µ ≥ µ1 (µ1 > µ0) against H1 : µ0 < µ < µ1.
MathStat(II) ∼ Lecture 7 ∼ page 13
✍ Sol.
It is clear that {N(µ, 1), µ ∈ R} is a one-parameter exponential family. Then there exists the UMP test given by
ϕ(x) = 1, if c1 < Σ xi < c2;   γi, if Σ xi = c1 or Σ xi = c2;   0, otherwise,
where c1 and c2 are determined from
α = P_{µ0}( c1 < Σ Xi < c2 ) = P_{µ1}( c1 < Σ Xi < c2 ) and γ1 = γ2 = 0.
Thus,
α = P_{µ0}( (c1 − nµ0)/√n < Z < (c2 − nµ0)/√n ) = P_{µ1}( (c1 − nµ1)/√n < Z < (c2 − nµ1)/√n ).
Given α, n, µ0, and µ1, we can solve for c1 and c2.
✇ Note:
UMP two-sided tests for H0 : θ1 ≤ θ ≤ θ2 and H0′ : θ = θ0 for the one-parameter exponential family do not exist.
✦ Counter Example:
Let X1, . . . , Xn iid∼ U(0, θ). For testing H0 : θ = θ0 vs. H1 : θ ≠ θ0, a UMP test does exist in this case.
☞ EX.
Let X1, . . . , Xn iid∼ N(0, σ²). Since {N(0, σ²) : σ² > 0} has an MLR in Σ_{1}^{n} Xi², it follows that UMP tests exist for the one-sided hypotheses σ ≥ σ0 and σ ≤ σ0. Consider now the null hypothesis H0 : σ = σ0 vs. H1 : σ ≠ σ0. We will show that a UMP test of H0 : σ = σ0 does not exist. For testing σ = σ0 against σ > σ0, a test of the form
ϕ1(x) = 1, if Σ_{1}^{n} xi² > c1; 0, otherwise
is UMP, and for testing σ = σ0 against σ < σ0, a test of the form
ϕ2(x) = 1, if Σ_{1}^{n} xi² < c2; 0, otherwise
is UMP. If the size is chosen as α, then c1 = σ0² χ²_{n,α} and c2 = σ0² χ²_{n,1−α}, where P(W ≥ χ²_{n,α}) = α and W is distributed as a chi-square distribution with n degrees of freedom. Clearly, neither ϕ1 nor ϕ2 is UMP for H0 : σ = σ0 vs. H1 : σ ≠ σ0.
✎ HW. §8.2 — 1 ∼ 4, 7 ∼ 9, 11, 13.
MathStat(II) ∼ Lecture 7 ∼ page 16
§8.3 Likelihood Ratio Tests
✒ Def.
For testing H0 : θ ∈ Ω0 against H1 : θ ∈ Ω1, a test of the form
reject H0 if and only if λ(x) < c, where c is some constant and
λ(x) = sup_{θ∈Ω0} f_θ(x) / sup_{θ∈Ω} f_θ(x),
where θ ∈ Ω and X is a random vector with pdf f_θ, is called a likelihood ratio test (LR test).
➳ Remark:
It is clear that 0 ≤ λ ≤ 1. The constant c is determined from the size restriction
sup_{θ∈Ω0} P_θ(x : λ(x) < c) = α.
MathStat(II) ∼ Lecture 8 ∼ page 1
☞ EX.
Let X ∼ b(n, p). Find a size α likelihood ratio test of H0 : p ≤ p0 against H1 : p > p0, p0 ∈ (0, 1).
✍ Sol.
Let λ(x) = sup_{p≤p0} C(n, x) p^x(1 − p)^{n−x} / sup_{0≤p≤1} C(n, x) p^x(1 − p)^{n−x}.
It is clear that sup_{p∈[0,1]} p^x(1 − p)^{n−x} = (x/n)^x (1 − x/n)^{n−x} and
sup_{p≤p0} p^x(1 − p)^{n−x} = p0^x(1 − p0)^{n−x}, if p0 < x/n;   (x/n)^x(1 − x/n)^{n−x}, if x/n ≤ p0.
It follows that
λ(x) = p0^x(1 − p0)^{n−x} / [ (x/n)^x (1 − x/n)^{n−x} ], if p0 < x/n;   1, if p0 ≥ x/n.
Note that λ(x) < 1 for np0 < x and λ(x) = 1 for x ≤ np0, and it follows that λ(x) is a nonincreasing function of x. Thus we have a test function defined by
ϕ(x) = 1, x > c;   γ, x = c;   0, x < c.
γ and c can be determined by E_{p0}[ϕ(X)] = α.
MathStat(II) ∼ Lecture 8 ∼ page 3
☞ EX.
Let X1, . . . , Xn iid∼ N(µ, σ²), where both µ and σ² are unknown. Find a size α likelihood ratio test of H0 : µ = µ0 against H1 : µ ≠ µ0.
✍ Sol.
Let θ = (µ, σ²), Ω0 = {θ : µ = µ0, σ² > 0}, and Ω = {θ : µ ∈ R, σ² > 0}.
sup_{θ∈Ω0} f_θ(x) = sup_{σ²>0} ( 1/√(2πσ²) )^n exp( − Σ_{1}^n (xi − µ0)²/(2σ²) ) = f_{µ0,σ̂0²}(x),
where σ̂0² is the MLE under H0, σ̂0² = (1/n) Σ_{i=1}^n (xi − µ0)².
Thus
sup_{θ∈Ω0} f_θ(x) = e^{−n/2} / [ (2π/n)^{n/2} { Σ_{1}^n (xi − µ0)² }^{n/2} ].
The MLE of θ = (µ, σ²) when both µ and σ² are unknown is
( Σ_{i=1}^n xi/n,  Σ_{i=1}^n (xi − x̄)²/n ).
MathStat(II) ∼ Lecture 8 ∼ page 4
Thus
sup_{θ∈Ω} f_θ(x) = e^{−n/2} / [ (2π/n)^{n/2} { Σ_{1}^n (xi − x̄)² }^{n/2} ].
Therefore, the likelihood ratio can be simplified to
λ(x) = { Σ_{1}^n (xi − x̄)² / Σ_{1}^n (xi − µ0)² }^{n/2} = { 1 / ( 1 + n(x̄ − µ0)²/Σ_{1}^n (xi − x̄)² ) }^{n/2}.
Note that
Σ_{1}^n (xi − µ0)² = Σ_{1}^n (xi − x̄ + x̄ − µ0)² = Σ_{1}^n (xi − x̄)² + n(x̄ − µ0)².
∵ λ(x) is a decreasing function of n(x̄ − µ0)²/Σ_{1}^n (xi − x̄)².
∴ We reject H0 if |√n(x̄ − µ0)/s| > c, where s² = (n − 1)^{−1} Σ_{1}^n (xi − x̄)².
MathStat(II) ∼ Lecture 8 ∼ page 5
The statistic
t(X) = √n(X̄ − µ0)/S = [ (X̄ − µ0)/(σ/√n) ] / [ ((n − 1)S²/σ²)/(n − 1) ]^{1/2}
has a t-distribution with n − 1 degrees of freedom under H0 : µ = µ0. A test function can be defined by
ϕ(x) = 1, if |t| > t_{n−1,α/2}; 0, otherwise.
✒ Def.
Let the random variable W be N(δ, 1) and the random variable V be χ²(r); moreover, W and V are assumed to be independent. The quotient T = W/√(V/r) is said to have a noncentral t-distribution with r degrees of freedom and noncentrality parameter δ.
MathStat(II) ∼ Lecture 8 ∼ page 6
☞ EX.
In the previous example, under H1 the statistic t(X) has a noncentral t-distribution with n − 1 degrees of freedom and noncentrality parameter δ = (µ − µ0)/(σ/√n).
☞ EX.
Let X1, X2, . . . , Xm and Y1, Y2, . . . , Yn be independent random samples from N(µ1, σ1²) and N(µ2, σ2²), respectively. Find a size α likelihood ratio test of H0 : σ1² = σ2² against H1 : σ1² ≠ σ2² if µ1 and µ2 are both unknown.
✍ Sol.
Let Ω = {θ : µi ∈ R, σi² > 0, i = 1, 2}, θ = (µ1, σ1², µ2, σ2²), and Ω0 = {θ : µi ∈ R, i = 1, 2, σ1² = σ2² > 0}.
The joint pdf of X and Y is
f_θ(x, y) = (2π)^{−(m+n)/2} σ1^{−m} σ2^{−n} exp( −(1/(2σ1²)) Σ_{1}^m (xi − µ1)² − (1/(2σ2²)) Σ_{1}^n (yi − µ2)² ).
The MLEs of θ on Ω are µ̂1 = x̄, µ̂2 = ȳ, σ̂1² = (1/m) Σ_{1}^m (xi − x̄)², and σ̂2² = (1/n) Σ_{1}^n (yi − ȳ)².
MathStat(II) ∼ Lecture 8 ∼ page 7
If the joint pdf is restricted to Ω0, then the MLE of θ is µ̂̂1 = x̄, µ̂̂2 = ȳ, and
σ̂² = [ Σ_{1}^m (xi − x̄)² + Σ_{1}^n (yi − ȳ)² ] / (m + n)   for σ² = σ1² = σ2².
Thus
sup_{θ∈Ω0} f_θ(x, y) = e^{−(m+n)/2} / ( [2π/(m + n)]^{(m+n)/2} { Σ_{1}^m (xi − x̄)² + Σ_{1}^n (yi − ȳ)² }^{(m+n)/2} )
and
sup_{θ∈Ω} f_θ(x, y) = e^{−(m+n)/2} / ( (2π/m)^{m/2} (2π/n)^{n/2} { Σ_{1}^m (xi − x̄)² }^{m/2} { Σ_{1}^n (yi − ȳ)² }^{n/2} ),
so that
λ(x, y) = [ (m + n)^{(m+n)/2} / (m^{m/2} n^{n/2}) ] × { Σ_{1}^m (xi − x̄)² }^{m/2} { Σ_{1}^n (yi − ȳ)² }^{n/2} / { Σ_{1}^m (xi − x̄)² + Σ_{1}^n (yi − ȳ)² }^{(m+n)/2}.
Let s1² = Σ_{1}^m (xi − x̄)²/(m − 1) and s2² = Σ_{1}^n (yi − ȳ)²/(n − 1). Then
λ(x, y) = ((m + n)/m)^{m/2} ((m + n)/n)^{n/2} / [ {1 + [(n − 1)/(m − 1)] s2²/s1²}^{m/2} {1 + [(m − 1)/(n − 1)] s1²/s2²}^{n/2} ].
MathStat(II) ∼ Lecture 8 ∼ page 8
Let f = s1²/s2². Then λ(x, y) < c is equivalent to f < c1 or f > c2 (H.W.).
Under H0, the statistic
F = [ Σ_{1}^m (Xi − X̄)²/(m − 1) ] / [ Σ_{1}^n (Yi − Ȳ)²/(n − 1) ]
has an F(m − 1, n − 1) distribution, so that c1, c2 can be selected. It is usual to take
P(F ≤ c1) = P(F ≥ c2) = α/2.
Under H1, (σ2²/σ1²)F has an F(m − 1, n − 1) distribution.
MathStat(II) ∼ Lecture 8 ∼ page 9
☞ EX.
A random sample X1, . . . , Xn is drawn from a Pareto population with pdf
f(x|θ, ν) = θν^θ/x^{θ+1},   ν ≤ x, θ > 0, ν > 0.
(i) Find the MLEs of θ and ν.
(ii) Show that the LR test of H0 : θ = 1, ν unknown, against H1 : θ ≠ 1, ν unknown, has a critical region of the form C = {x : T(x) ≤ c1 or T(x) ≥ c2}, where 0 < c1 < c2 and T = ln[ Π_{i=1}^n Xi / X_(1)^n ].
(iii) Show that, under H0, 2T ∼ χ²(2n − 2).
✍ Sol.
(i) ν̂ = X_(1) and θ̂ = n / ln[ Π_{i=1}^n Xi / X_(1)^n ]   (H.W.)
(ii) (H.W.)
MathStat(II) ∼ Lecture 8 ∼ page 10
(iii) 2T = 2 ln( Π_{1}^n Xi ) − 2 ln( X_(1)^n )
⇒ 2T + 2n ln(X_(1)/ν) = 2 ln( Π_{1}^n (Xi/ν) ), using T = ln[ Π_{1}^n (Xi/ν) / (X_(1)/ν)^n ].
Let Yi = Xi/ν; then the pdf of Yi is f_Y(y) = y^{−2}, 1 < y < ∞.
Thus, T is an ancillary statistic. Under H0 : θ = 1, X_(1) is a complete and sufficient statistic. Then Basu's theorem gives that X_(1) and T are independent. It follows that φ1(t)φ2(t) = φ3(t), where φ1, φ2, and φ3 are the mgf's of 2T, 2n ln(X_(1)/ν), and 2 ln[ Π_{i=1}^n (Xi/ν) ], respectively.
∵ 2 ln Yi iid∼ χ²(2), i = 1, 2, . . . , n, and 2n ln(X_(1)/ν) = 2n ln Y_(1) ∼ χ²(2).
∴ φ1(t)(1 − 2t)^{−1} = (1 − 2t)^{−n}   ⇒   φ1(t) = (1 − 2t)^{−(n−1)}   ⇒   2T ∼ χ²(2n − 2).
MathStat(II) ∼ Lecture 8 ∼ page 11
✑ Theorem:
Under some regularity conditions on fθ (x), the random variable −2 ln λ(X) is
asymptotically distributed as a chi-square r.v. with degrees of freedom equal
to dim(Ω)−dim(Ω0).
✇ Note:
These conditions are mainly concerned with the existence and behavior of the
derivatives (with respect to the parameter) of the likelihood function, and the
support of the distribution (it cannot depend on the parameter).
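For the normal-mean LRT derived earlier, −2 ln λ(X) = n ln(1 + t²/(n − 1)), and dim(Ω) − dim(Ω0) = 2 − 1 = 1; the asymptotic chi-square approximation can be checked by simulation (my own sketch, with assumed parameter values):

```python
# Sketch: check the asymptotic chi-square(1) law of -2 ln lambda for H0: mu = mu0
# in the normal model with unknown variance.
import numpy as np
from scipy.stats import chi2

n, reps, mu0 = 50, 100_000, 0.0
rng = np.random.default_rng(7)
x = rng.normal(mu0, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
t2 = n * (xbar - mu0) ** 2 / s2
stat = n * np.log(1 + t2 / (n - 1))        # -2 ln lambda from the LRT derivation

print("P(stat > chi2_{1,0.05}) ~", np.mean(stat > chi2.ppf(0.95, df=1)))   # ~ 0.05
```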
✎ HW. §8.3 — 7, 9 ∼ 12.
MathStat(II) ∼ Lecture 8 ∼ page 12
