§Chapter 7 Sufficiency

✒ Def. Let $X = (X_1, \dots, X_n)$ be a sample from $\{F_\theta : \theta \in \Omega\}$. A statistic $T = T(X)$ is sufficient for $\theta$, or for the family of distributions $\{F_\theta : \theta \in \Omega\}$, $\Leftrightarrow$ the conditional distribution of $X$, given $T = t$, is independent of $\theta$.

✇ Note: The outcome $X_1, \dots, X_n$ is always sufficient, but we will exclude this trivial statistic from consideration.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(p)$. Show that $T(X) = \sum_{i=1}^n X_i$ is sufficient for $p$.

✍ Sol.
$$P\Big(X_1 = x_1, \dots, X_n = x_n \,\Big|\, \sum_{i=1}^n X_i = t\Big) = \frac{P(X_1 = x_1, \dots, X_n = x_n, T = t)}{P(T = t)} = \frac{p^t(1-p)^{n-t}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{1}{\binom{n}{t}},$$
which is independent of $p$.

☞ EX. Let $X_1, X_2 \overset{iid}{\sim}$ Poisson$(\lambda)$. Show that $X_1 + X_2$ is sufficient for $\lambda$.

✍ Sol.
$$P(X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t) = \frac{P(X_1 = x_1, X_2 = t - x_1)}{P(X_1 + X_2 = t)} = \frac{t!}{e^{-2\lambda}(2\lambda)^t}\times\frac{e^{-\lambda}\lambda^{x_1}}{x_1!}\times\frac{e^{-\lambda}\lambda^{t-x_1}}{(t-x_1)!} = \binom{t}{x_1}\left(\frac{1}{2}\right)^t,$$
where $x_i = 0, 1, 2, \dots$, $i = 1, 2$, and $x_1 + x_2 = t$.
∵ $\binom{t}{x_1}\left(\frac{1}{2}\right)^t$ is independent of $\lambda$. ∴ $X_1 + X_2$ is sufficient for $\lambda$.

✇ Note:
$$P(X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2) = \frac{P(X_1 = 0, X_2 = 1)}{P(X_1 = 0, X_2 = 1) + P(X_1 = 2, X_2 = 0)} = \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2)e^{-2\lambda}} = \frac{1}{1 + \lambda/2}.$$
⇒ $X_1 + 2X_2$ is not sufficient for $\lambda$.

✑ Theorem: (The Factorization Criterion) ✯✯✯✯✯
Let $X_1, \dots, X_n$ be r.v.s with the joint pdf $f_\theta(x)$, $\theta \in \Omega$. Then $T(X_1, \dots, X_n)$ is sufficient for $\theta$ $\Leftrightarrow$ $f_\theta(x) = h(x)\,g_\theta(T(x))$, where $h$ is a nonnegative function of $x$ only and does not depend on $\theta$, and $g_\theta$ is a nonnegative function of $\theta$ and $T(x)$ only. The statistic $T(X)$ and the parameter $\theta$ may be vectors.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(p)$.
$$P_p(X = x) = p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i} = \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}(1-p)^n.$$
Taking $h(x) = 1$ and $g_p(T(x)) = \left(\frac{p}{1-p}\right)^{T(x)}(1-p)^n$, by the factorization criterion $T(X) = \sum_{i=1}^n X_i$ is sufficient for $p$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Find a joint sufficient statistic for $(\mu, \sigma^2)$.

✍ Sol.
$$f_{\mu,\sigma^2}(x) = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n(x_i-\mu)^2}{2\sigma^2}\right\} = \frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2} + \frac{\mu\sum_{i=1}^n x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}\right\}.$$
By the factorization criterion, $T(X) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient statistic that is frequently used is $T_1(X) = (\bar{X}, S^2)$, where $S^2 = \sum_{i=1}^n(X_i - \bar{X})^2/(n-1)$.

✎ HW. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$. Prove that the conditional distribution of $X$ given $(\bar{X}, S^2)$ is independent of $(\mu, \sigma^2)$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} U(-\theta/2, \theta/2)$, $\theta > 0$. Find a sufficient statistic for $\theta$.

✍ Sol. The joint pdf of $X$ is given by $f_\theta(x) = \dfrac{1}{\theta^n}I(x_{(1)} \ge -\theta/2)\,I(x_{(n)} \le \theta/2)$. By the factorization criterion, $T(X) = (X_{(1)}, X_{(n)})$ is sufficient for $\theta$.

✇ Note:
$$f(x \mid X_{(1)} = x_{(1)}, X_{(n)} = x_{(n)}) = \frac{1/\theta^n}{\dfrac{n!}{(n-2)!}\left(\dfrac{x_{(n)} - x_{(1)}}{\theta}\right)^{n-2}\dfrac{1}{\theta^2}} = \frac{1}{n(n-1)\left(x_{(n)} - x_{(1)}\right)^{n-2}},$$
which is independent of $\theta$.

✒ Def.
1. An estimator $T(X)$ that is not unbiased is called biased, and the function $b(\theta, T) = E_\theta(T(X)) - \theta$ is called the bias of $T(X)$.
2. The mean squared error (MSE) of an estimator $T(X)$ of a parameter $\theta$ is the function of $\theta$ defined by $E_\theta\left[(T(X) - \theta)^2\right]$.

✇ Note:
$$E_\theta\left[(T(X)-\theta)^2\right] = E_\theta\left[(T(X) - E[T(X)] + E[T(X)] - \theta)^2\right] = \underbrace{\mathrm{Var}_\theta(T(X))}_{\text{precision}} + \underbrace{[b(\theta, T)]^2}_{\text{accuracy}}.$$
Therefore, for an unbiased estimator we have $E_\theta\left[(T(X)-\theta)^2\right] = \mathrm{Var}_\theta(T(X))$.
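The sufficiency calculation in the Bernoulli example above can be checked by simulation: conditionally on $\sum X_i = t$, every arrangement of the successes should be equally likely, no matter what $p$ is. The following is a minimal sketch, not part of the original notes; numpy is assumed, and the sample size, $t$, and the two values of $p$ are arbitrary illustrative choices.

```python
import numpy as np
from collections import Counter

def conditional_pattern_freqs(p, n=4, t=2, reps=200_000, seed=0):
    # Simulate Bernoulli(p) samples of size n, keep those with sum t;
    # the conditional distribution over the C(n, t) arrangements should be
    # uniform (1 / C(n, t)) regardless of p, illustrating sufficiency of the sum.
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, p, size=(reps, n))
    keep = x[x.sum(axis=1) == t]
    freqs = Counter(map(tuple, keep))
    return {pat: cnt / len(keep) for pat, cnt in sorted(freqs.items())}

print(conditional_pattern_freqs(0.3))  # all six patterns near 1/6
print(conditional_pattern_freqs(0.8))  # same conditional frequencies
```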
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are both unknown. We know that $S^2 = \sum_{i=1}^n(X_i - \bar{X})^2/(n-1)$ is unbiased for $\sigma^2$. Note that $S$ is not, in general, unbiased for $\sigma$. Find the bias of $S$ for $\sigma$.

✍ Sol. Let $W = \dfrac{(n-1)S^2}{\sigma^2}$; then $W \sim \chi^2(n-1)$. Thus
$$E(\sqrt{W}) = \sqrt{2}\,\Gamma\!\left(\frac{n}{2}\right)\left[\Gamma\!\left(\frac{n-1}{2}\right)\right]^{-1} \quad\Rightarrow\quad E_\sigma(S) = \sigma\left(\frac{2}{n-1}\right)^{1/2}\Gamma\!\left(\frac{n}{2}\right)\left[\Gamma\!\left(\frac{n-1}{2}\right)\right]^{-1}.$$
Therefore, the bias of $S$ can be evaluated by $b(\sigma, S) = E_\sigma(S) - \sigma$.

✇ Note: ✯✯✯✯✯ In particular, it is sometimes the case that a trade-off occurs between variance and bias in such a way that a small increase in bias can be traded for a large decrease in variance, resulting in an improvement in MSE.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where $\mu \in \mathbb{R}$ and $\sigma^2 \in \mathbb{R}^+$ are both unknown. Let $S^2 = \sum_{i=1}^n(X_i - \bar{X})^2/(n-1)$. We know that $S^2$ is an unbiased estimator, and an alternative estimator for $\sigma^2$ is the mle $\hat{\sigma}^2 = \sum_{i=1}^n(X_i - \bar{X})^2/n = \frac{n-1}{n}S^2$.
∵ $\sum_{i=1}^n(X_i - \bar{X})^2/\sigma^2 \sim \chi^2(n-1)$ ∴ $E[\hat{\sigma}^2] = E\left[\frac{n-1}{n}S^2\right] = \frac{n-1}{n}\sigma^2$. Moreover, $\mathrm{Var}[\hat{\sigma}^2] = \mathrm{Var}\left[\frac{n-1}{n}S^2\right] = \frac{2(n-1)\sigma^4}{n^2}$.
Thus, the MSE of $\hat{\sigma}^2$ is
$$E\left[(\hat{\sigma}^2 - \sigma^2)^2\right] = \mathrm{Var}[\hat{\sigma}^2] + \left(E[\hat{\sigma}^2] - \sigma^2\right)^2 = \frac{2(n-1)\sigma^4}{n^2} + \left(\frac{n-1}{n}\sigma^2 - \sigma^2\right)^2 = \frac{2n-1}{n^2}\sigma^4.$$
∵ $\mathrm{Var}(S^2) = \dfrac{2}{n-1}\sigma^4$ and
$$\frac{2n-1}{n^2} - \frac{2}{n-1} = \frac{2}{n} - \frac{1}{n^2} - \frac{2}{n-1} = -\frac{2}{n(n-1)} - \frac{1}{n^2} < 0, \quad \forall n \ge 2,$$
∴ the MSE of $\hat{\sigma}^2$ is smaller than that of $S^2$.

✎ HW. Find the minimum MSE estimator of the form $\alpha S^2$ for the parameter $\sigma^2$. Ans: $\alpha = \dfrac{n-1}{n+1}$.

✒ Def. Let $\theta_0 \in \Omega$ and $U(\theta_0) = \{T(X) : E_{\theta_0}[T(X)] = \theta_0\}$ such that $E_{\theta_0}[T^2(X)] < \infty$. Then $T_0 \in U(\theta_0)$ is called a locally minimum variance unbiased estimator (LMVUE) at $\theta_0$ if
$$E_{\theta_0}[(T_0 - \theta_0)^2] \le E_{\theta_0}[(T - \theta_0)^2] \ \ \forall T \in U(\theta_0), \quad\text{or equivalently}\quad \mathrm{Var}_{\theta_0}(T_0) \le \mathrm{Var}_{\theta_0}(T).$$

✒ Def. Let $U = \{T(X) : E_\theta[T(X)] = \theta, \ \forall\theta \in \Omega\}$ such that $E_\theta[T^2] < \infty$ for all $\theta \in \Omega$. An estimator $T_0 \in U$ is called a uniformly minimum variance unbiased estimator (UMVUE) of $\theta$ if
$$\mathrm{Var}_\theta(T_0) \le \mathrm{Var}_\theta(T) \quad \forall\theta \in \Omega \text{ and every } T \in U.$$

✎ HW. Let $X_1, \dots, X_n$ be independent r.v.s, and $a_1, \dots, a_n$ be real numbers such that $\sum_{i=1}^n a_i = 1$. Assume that $E[X_i^2] < \infty$ with $\mathrm{Var}(X_i) = \sigma_i^2$, $\forall i$. Write $S = \sum_{i=1}^n a_iX_i$; then $\mathrm{Var}(S) = \sum_{i=1}^n a_i^2\sigma_i^2 = \sigma$. Find the weights $a_i$ such that $\sigma$ is minimum.
❉ Hint: Using the Cauchy-Schwarz inequality, the weights $a_i = \dfrac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2}$, $\forall i$, minimize $\sigma$, with the value $\sigma_{\min} = \dfrac{1}{\sum_{j=1}^n 1/\sigma_j^2}$.

✑ Theorem: (Rao-Blackwell) ✯✯✯✯✯ Let $\{F_\theta : \theta \in \Omega\}$ be a family of distribution functions and $h(X) \in U = \{W(X) : E_\theta(W) = \theta, \ \forall\theta \in \Omega\}$ with $E_\theta[h^2(X)] < \infty$ (or $\mathrm{Var}_\theta[h(X)] < \infty$). Let $T(X)$ be a sufficient statistic for $\{F_\theta, \theta \in \Omega\}$. Then
① the conditional expectation $E_\theta(h \mid T)$ is independent of $\theta$ and is an unbiased estimator of $\theta$; moreover,
② $E_\theta\left\{[E(h\mid T) - \theta]^2\right\} \le E_\theta\left[(h-\theta)^2\right]$, $\forall\theta \in \Omega$, or $\mathrm{Var}_\theta[E(h\mid T)] \le \mathrm{Var}_\theta(h)$, $\forall\theta \in \Omega$.
Equality holds $\Leftrightarrow$ $P_\theta(h = E(h\mid T)) = 1$ $\forall\theta \in \Omega$, that is, $h(X)$ is a function of $T(X)$.

✍ Proof: Because of the definition of sufficiency, $E_\theta(h\mid T)$ is independent of $\theta$.
∵ $E_\theta[E(h\mid T)] = E_\theta(h) = \theta$. ∴ $E_\theta(h\mid T)$ is unbiased for $\theta$.
$$\mathrm{Var}_\theta(h) = \mathrm{Var}_\theta[E(h\mid T)] + E_\theta[\mathrm{Var}_\theta(h\mid T)] \ge \mathrm{Var}_\theta[E(h\mid T)], \quad \forall\theta \in \Omega.$$
Equality in the above inequality holds $\Leftrightarrow E_\theta[\mathrm{Var}_\theta(h\mid T)] = 0 \Leftrightarrow \mathrm{Var}_\theta(h\mid T) = 0 \Leftrightarrow E(h^2\mid T) = [E(h\mid T)]^2 \Leftrightarrow P_\theta(h = E(h\mid T)) = 1$, $\forall\theta \in \Omega$.
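The bias and MSE computations above lend themselves to a quick numerical check. The sketch below is an illustration added to these notes, with arbitrary values of $n$ and $\sigma$: it compares the simulated MSEs of $S^2$ and $\hat\sigma^2 = (n-1)S^2/n$ with the formulas $2\sigma^4/(n-1)$ and $(2n-1)\sigma^4/n^2$, and evaluates the bias of $S$ from the Gamma-function identity.

```python
import numpy as np
from math import lgamma, exp, sqrt

sigma, n, reps = 2.0, 10, 200_000        # illustrative values, not from the notes
rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=(reps, n))

s2 = x.var(axis=1, ddof=1)               # unbiased S^2
sig2_hat = x.var(axis=1, ddof=0)         # mle (n-1)S^2/n
sigma2 = sigma**2

print("MSE(S^2)          sim:", np.mean((s2 - sigma2)**2),       " theory:", 2*sigma2**2/(n-1))
print("MSE(sigma^2_hat)  sim:", np.mean((sig2_hat - sigma2)**2), " theory:", (2*n-1)*sigma2**2/n**2)

# bias of S from E(S) = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
ES = sigma * sqrt(2.0/(n-1)) * exp(lgamma(n/2) - lgamma((n-1)/2))
print("bias of S  formula:", ES - sigma, "  sim:", np.sqrt(s2).mean() - sigma)
```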
☞ EX. Let $X$ and $Y$ be two r.v.s with the joint pdf
$$f(x, y) = \frac{2}{\theta^2}e^{-(x+y)/\theta}, \quad \theta > 0, \ 0 < x < y < \infty.$$
(a) Show that $E[Y] = 3\theta/2$ and $\mathrm{Var}[Y] = 5\theta^2/4$.
(b) Show that $E[Y \mid X = x] = x + \theta$ and $\mathrm{Var}(X + \theta) < \mathrm{Var}[Y]$.

✍ Sol. The marginal pdf of $X$ is given by
$$f_X(x) = \int_x^\infty f(x, y)\,dy = \frac{2}{\theta^2}e^{-x/\theta}\int_x^\infty e^{-y/\theta}\,dy = \frac{2}{\theta^2}e^{-x/\theta}\,\theta e^{-x/\theta} = \frac{2}{\theta}e^{-2x/\theta}, \quad 0 < x < \infty,$$
i.e. $X \sim \exp(\theta/2)$. The conditional pdf of $Y$ given $X = x$ is
$$f_{Y\mid X}(y \mid x) = \frac{1}{\theta}e^{-(y-x)/\theta}, \quad 0 < x < y < \infty.$$
⇒ $E[Y - X \mid X] = \theta$ and $\mathrm{Var}[Y - X \mid X] = \theta^2$ ⇒ $E[Y \mid X] = X + \theta$ and $\mathrm{Var}[Y \mid X] = \theta^2$.
$$E[Y] = E\{E[Y\mid X]\} = E[X] + \theta = \theta/2 + \theta = 3\theta/2,$$
$$\mathrm{Var}[Y] = E\{\mathrm{Var}[Y\mid X]\} + \mathrm{Var}\{E[Y\mid X]\} = \theta^2 + \mathrm{Var}(X + \theta) = \theta^2 + \theta^2/4 = 5\theta^2/4$$
⇒ $\mathrm{Var}(X + \theta) < \mathrm{Var}(Y)$.

✎ HW. §7.1 — 1∼4; §7.3 — 2, 3, 6.

§7.4 Completeness and Uniqueness

✒ Def. Let $\{f_\theta(x), \theta \in \Omega\}$ be a family of pdf's. We say that this family is complete if $E_\theta[g(X)] = 0$, $\forall\theta \in \Omega$, implies $P_\theta(g(X) = 0) = 1$, $\forall\theta \in \Omega$.

✒ Def. A statistic $T(X)$ is said to be complete if the family of distributions of $T(X)$ is complete.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(p)$. Then $T(X) = \sum_{i=1}^n X_i$ is a sufficient statistic. Show that $T(X)$ is also complete.

✍ Sol. It is equivalent to show that the family of distributions of $T(X)$, $\{b(n, p), 0 < p < 1\}$, is complete. Let
$$E_p[g(T)] = \sum_{t=0}^n g(t)\binom{n}{t}p^t(1-p)^{n-t} = 0 \quad \forall p \in (0, 1).$$
Thus $(1-p)^n\sum_{t=0}^n g(t)\binom{n}{t}\phi^t = 0$, where $\phi = p/(1-p) > 0$, $\forall p \in (0, 1)$. This is a polynomial in $\phi$ of order $n$, so the coefficients must vanish. That is, $g(t) = 0$, $t = 0, 1, 2, \dots, n$ (i.e., $g$ is a zero function).

☞ EX. Let $X \sim N(0, \theta)$, $\theta \in \Omega = (0, \infty)$. Show that $X$ is not complete.

✍ Sol. Let $g(x) = x$; then $E_\theta[g(X)] = E_\theta[X] = 0$, $\forall\theta \in \Omega$. However, $g(x) = x$ is not identically zero. Thus, $X$ is not complete.

✑ Theorem: (Lehmann-Scheffé) ✯✯✯✯✯ If $T$ is a complete sufficient statistic and there exists an unbiased estimator $h$ of $\theta$, then there exists a unique UMVUE of $\theta$, which is given by $E_\theta[h \mid T]$.

✍ Proof: If $h_1, h_2 \in U = \{W : E_\theta[W] = \theta, \forall\theta \in \Omega\}$, then $E[h_1\mid T]$ and $E[h_2\mid T]$ are both unbiased and $E_\theta\{E[h_1\mid T] - E[h_2\mid T]\} = 0$, $\forall\theta \in \Omega$.
∵ $T$ is complete. ∴ $E[h_1\mid T] = E[h_2\mid T]$. Therefore, $E[h\mid T]$ is the unique UMVUE.

☞ EX. Let $X_1, \dots, X_n$ be iid r.v.s with the pmf $P_\theta(X = x) = f(x; \theta) = 1/\theta$, $x = 1, 2, \dots, \theta$, where $\theta$ is an unknown positive integer. Find the UMVUE of $\theta$.

✍ Sol.
step 1. ∵ $f_\theta(x) = \dfrac{1}{\theta^n}I(x_{(n)} \le \theta)$ ∴ By the factorization criterion, $T = X_{(n)}$ is sufficient for $\theta$.

step 2. The pmf of $T$ is given by
$$P(T = x) = P(T \le x) - P(T \le x-1) = \left(\frac{x}{\theta}\right)^n - \left(\frac{x-1}{\theta}\right)^n, \quad x = 1, 2, \dots, \theta.$$
Suppose $E_\theta[g(T)] = \sum_{x=1}^\theta g(x)P(T = x) = 0$, $\forall\theta \ge 1$. We want to show that $g(x) = 0$ for all $x$.
For $\theta = 1$: $E_1[g(T)] = g(1) = 0$, so $g(1) = 0$.
For $\theta = 2$: $E_2[g(T)] = g(1)\left[\left(\frac{1}{2}\right)^n - \left(\frac{0}{2}\right)^n\right] + g(2)\left[\left(\frac{2}{2}\right)^n - \left(\frac{1}{2}\right)^n\right] = 0$ ⇒ $g(2) = 0$.
Using mathematical induction, we conclude that $g(1) = g(2) = \dots = g(\theta) = 0$, i.e. $T$ is a complete sufficient statistic.

step 3. ∵ $E[X_1] = (\theta+1)/2$ ∴ $U(X) = 2X_1 - 1$ is unbiased for $\theta$. By the Lehmann-Scheffé Theorem, $E[U(X)\mid T]$ is the UMVUE of $\theta$. For $x_1 \ne y$ and $x_1 < y$,
$$P(X_1 = x_1 \mid T = y) = \frac{P(X_1 = x_1, T = y)}{P(T = y)} = \frac{P(X_1 = x_1, \max\{X_2, \dots, X_n\} = y)}{P(T = y)} = \frac{\frac{1}{\theta}\left[\left(\frac{y}{\theta}\right)^{n-1} - \left(\frac{y-1}{\theta}\right)^{n-1}\right]}{\left(\frac{y}{\theta}\right)^n - \left(\frac{y-1}{\theta}\right)^n} = \frac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n}, \quad x_1 = 1, \dots, y-1,$$
$$P(X_1 = y \mid T = y) = 1 - \sum_{x_1=1}^{y-1}P(X_1 = x_1 \mid T = y) = \frac{y^{n-1}}{y^n - (y-1)^n}.$$
Finally, we have
$$E[U(X)\mid T] = \frac{T^{n+1} - (T-1)^{n+1}}{T^n - (T-1)^n}.$$
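The UMVUE just derived for the discrete uniform family can be checked by simulation: averaged over many samples it should reproduce $\theta$, just like the naive unbiased estimator $2X_1 - 1$, but with far smaller variance. A hedged sketch follows; $\theta$, $n$, and the replication count are arbitrary choices, not values from the notes.

```python
import numpy as np

def umvue_discrete_uniform(sample):
    # UMVUE of theta for the discrete uniform {1,...,theta}:
    # (T^(n+1) - (T-1)^(n+1)) / (T^n - (T-1)^n) with T = max(sample).
    n, T = len(sample), int(sample.max())
    return (T**(n+1) - (T-1)**(n+1)) / (T**n - (T-1)**n)

rng = np.random.default_rng(2)
theta, n, reps = 17, 5, 100_000
samples = rng.integers(1, theta + 1, size=(reps, n))
estimates = np.array([umvue_discrete_uniform(s) for s in samples])
print("mean of UMVUE       :", estimates.mean(), " (target theta =", theta, ")")
print("variance of UMVUE   :", estimates.var())
print("variance of 2*X1 - 1:", (2*samples[:, 0] - 1).var())
```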
§7.5 The Exponential Class (Family) of Distributions

✑ Theorem: Let $\{f_\theta : \theta \in \Omega\}$ be a $k$-parameter exponential family given by
$$f_\theta(x) = \exp\left\{\sum_{j=1}^k Q_j(\theta)T_j(x) + D(\theta) + S(x)\right\},$$
where $\theta = (\theta_1, \dots, \theta_k) \in \Omega$, $T = (T_1, T_2, \dots, T_k)$, and $x = (x_1, \dots, x_n)$, $k \le n$. Write $Q = (Q_1, \dots, Q_k)$. If the family is of full rank (the range of $Q(\theta)$ over $\Omega$ contains a $k$-dimensional open set), then $T = (T_1(X), T_2(X), \dots, T_k(X))$ is a complete sufficient statistic.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where $\mu \in (-\infty, \infty)$ and $\sigma^2$ is assumed to be known. Explain why $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\mu$.

✍ Proof:
$$f(x; \mu) = (\sqrt{2\pi}\sigma)^{-n}\exp\left\{-\frac{\sum_{i=1}^n(x_i-\mu)^2}{2\sigma^2}\right\} = \exp\left\{-n\ln(\sqrt{2\pi}\sigma) - \frac{\sum_{i=1}^n x_i^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\sum_{i=1}^n x_i - \frac{n\mu^2}{2\sigma^2}\right\}.$$
Let $Q(\mu) = \mu/\sigma^2$, $T(x) = \sum_{i=1}^n x_i$, $D(\mu) = -n\mu^2/(2\sigma^2)$, and $S(x) = -n\ln(\sqrt{2\pi}\sigma) - \sum_{i=1}^n x_i^2/(2\sigma^2)$. Therefore, it is a one-parameter exponential family, and $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\mu$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} U(0, \theta)$, $\theta \in (0, \infty)$. Show that $T = X_{(n)}$ is a complete sufficient statistic for $\theta$.

✍ Sol. Since $f_\theta(x) = \theta^{-n}I(x_{(n)} \le \theta)$, we know that $T = X_{(n)}$ is sufficient for $\theta$ by the factorization criterion. However, the joint pdf of $X$ is not a one-parameter exponential family. We need to show that
$$E_\theta[g(T)] = \int_0^\theta g(t)\,\frac{nt^{n-1}}{\theta^n}\,dt = 0, \ \forall\theta > 0 \quad\text{implies}\quad g(t) = 0, \ \forall t.$$
Differentiating both sides of $E_\theta[g(T)] = 0$ with respect to $\theta$ gives $g(\theta) = 0$, $\forall\theta > 0$. Hence $T = X_{(n)}$ is a complete statistic for $\theta$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\theta, \theta^2)$. Show that $T = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is sufficient for $\theta$. Is it also complete for $\theta$?

✍ Sol.
$$f_\theta(x) = \exp\left\{-n\ln(\sqrt{2\pi}\theta) - \frac{\sum_{i=1}^n x_i^2 - 2\theta\sum_{i=1}^n x_i + n\theta^2}{2\theta^2}\right\} = \exp\left\{-n\ln(\sqrt{2\pi}\theta) - \frac{\sum_{i=1}^n x_i^2}{2\theta^2} + \frac{\sum_{i=1}^n x_i}{\theta} - \frac{n}{2}\right\}.$$
By the factorization criterion, $T = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is sufficient for $\theta$.
∵ $f_\theta(x) = \exp\left\{-n\ln(\sqrt{2\pi}\theta) - \frac{\sum_{i=1}^n x_i^2}{2}[Q_1(\theta)]^2 + \sum_{i=1}^n x_i\,Q_1(\theta) - \frac{n}{2}\right\}$ with $Q_1(\theta) = 1/\theta$,
∴ this family is not of full rank. In fact, $T = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is not complete for $\theta$: though
$$E_\theta\left[2\Big(\sum_{i=1}^n X_i\Big)^2 - (n+1)\sum_{i=1}^n X_i^2\right] = 0, \quad \forall\theta \in \mathbb{R},$$
the function $g(x) = 2\left(\sum_{i=1}^n x_i\right)^2 - (n+1)\sum_{i=1}^n x_i^2$ is not identically zero.

§7.6 Functions of a Parameter

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(\theta)$, $\theta \in (0, 1)$. Find the UMVUE of the parameter $\delta = \theta(1-\theta)$.

✍ Sol.
∵ $f_\theta(x) = \exp\left\{\left(\sum_{i=1}^n x_i\right)\ln\theta + \left(n - \sum_{i=1}^n x_i\right)\ln(1-\theta)\right\} = \exp\left\{\left(\sum_{i=1}^n x_i\right)\ln\dfrac{\theta}{1-\theta} + n\ln(1-\theta)\right\}$
∴ $\{f_\theta : \theta \in (0,1)\}$ is a one-parameter exponential family, so $Y = \sum_{i=1}^n X_i$ is complete and sufficient for $\theta$.
The remaining task is to find an estimator $T(Y)$ such that $E[T(Y)] = \delta$; by the Lehmann-Scheffé theorem, such a $T(Y)$ is the UMVUE of $\delta$.

Method I:
∵ $E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{1}{n}E[Y] - \frac{1}{n^2}E[Y^2]$, with $E[Y] = n\theta$ and $E[Y^2] = n\theta(1-\theta) + n^2\theta^2$,
∴ $E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{n-1}{n}\theta(1-\theta)$. Therefore, we can take $T(Y) = \dfrac{n}{n-1}\left(\dfrac{Y}{n}\right)\left(1 - \dfrac{Y}{n}\right)$.

Method II:
step 1: Let $U(X) = 1$ if $X_1 = 1, X_2 = 0$, and $0$ otherwise, so $E_\theta[U(X)] = \delta$.
step 2: Further, let $T(Y) = E_\theta[U(X)\mid Y]$. Then
$$E_\theta[U(X)\mid Y = t] = P_\theta\left(X_1 = 1, X_2 = 0 \,\Big|\, \sum_{i=1}^n X_i = t\right) = \frac{P_\theta(X_1 = 1)P_\theta(X_2 = 0)P_\theta\left(\sum_{i=3}^n X_i = t-1\right)}{P_\theta\left(\sum_{i=1}^n X_i = t\right)} = \frac{\theta(1-\theta)\binom{n-2}{t-1}\theta^{t-1}(1-\theta)^{n-t-1}}{\binom{n}{t}\theta^t(1-\theta)^{n-t}} = \frac{t(n-t)}{n(n-1)}.$$
That is, $E_\theta[U(X)\mid Y] = \dfrac{n}{n-1}\left(\dfrac{Y}{n}\right)\left(1 - \dfrac{Y}{n}\right)$.
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Poisson$(\theta)$. Find the UMVUE of $P(X_1 = r)$.

✍ Sol.
∵ $f_\theta(x) = \dfrac{e^{-n\theta}\theta^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!} = \exp\left\{\ln\theta\sum_{i=1}^n x_i - n\theta - \sum_{i=1}^n\ln x_i!\right\}$
∴ $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\theta$ (it is a one-parameter exponential family).
Let $U(X) = 1$ if $X_1 = r$ and $0$ otherwise; then $E[U(X)] = P(X_1 = r)$, that is, $U(X)$ is an unbiased estimator of $P(X_1 = r)$. By the Lehmann-Scheffé theorem, $E[U(X)\mid T]$ is the UMVUE of $P(X_1 = r)$. Moreover, $(X_1, \dots, X_n)$ given $T = t$ follows multinomial$(t; 1/n, \dots, 1/n)$, so $X_1\mid T = t \sim$ Binomial$(t, 1/n)$. Thus
$$E[U(X)\mid T = t] = P(X_1 = r\mid T = t) = \binom{t}{r}\left(\frac{1}{n}\right)^r\left(1 - \frac{1}{n}\right)^{t-r}, \ t \ge r; \qquad = 0, \ t < r.$$

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Exp$(1/\theta)$, $\theta > 0$.
(a) Explain why $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\theta$.
(b) Prove that $(n-1)/T$ is the UMVUE of $\theta$.

✍ Sol.
(a) ∵ $f_\theta(x) = \theta^n\exp\left\{-\theta\sum_{i=1}^n x_i\right\} = \exp\left\{-\theta\sum_{i=1}^n x_i + n\ln\theta\right\}$
∴ $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\theta$ (it is a one-parameter exponential family).
(b) ∵ $T = \sum_{i=1}^n X_i \sim$ gamma$(n, 1/\theta)$.
∴ $E[1/T] = \displaystyle\int_0^\infty\frac{1}{t}\,\frac{\theta^n t^{n-1}e^{-\theta t}}{\Gamma(n)}\,dt = \frac{\theta^n\Gamma(n-1)}{\theta^{n-1}\Gamma(n)} = \frac{\theta}{n-1}$.
By the Lehmann-Scheffé theorem, $\dfrac{n-1}{T}$ is the UMVUE of $\theta$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(p)$, $p \in [0, 1]$. Show that the sample mean $\bar{X}$ is the UMVUE of $p$.

✍ Sol. By the Cramér-Rao inequality, for any unbiased estimator $T(X)$ of $p$ we have
$$\mathrm{Var}_p(T(X)) \ge \frac{p(1-p)}{n}, \quad\text{and}\quad \mathrm{Var}_p(\bar{X}) = \frac{p(1-p)}{n}.$$
It follows that $\mathrm{Var}_p(\bar{X})$ attains the lower bound of the Cramér-Rao inequality, and hence $\bar{X}$ is the UMVUE of $p$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\theta, 1)$, $\theta \in \mathbb{R}$. Find the UMVUE of $\theta^2$.

✍ Sol. ∵ $\{N(\theta, 1) : \theta \in \mathbb{R}\}$ is a one-parameter exponential family. ∴ $\bar{X}$ (or $\sum_{i=1}^n X_i$) is sufficient and complete for $\theta$.
∵ $E[(\bar{X})^2] = \frac{1}{n} + \theta^2$, which is derived from $\bar{X} \sim N(\theta, 1/n)$. ∴ $E\left[(\bar{X})^2 - \frac{1}{n}\right] = \theta^2$.
By the Lehmann-Scheffé theorem, $T(X) = \bar{X}^2 - 1/n$ is the UMVUE of $\theta^2$.

✇ Note: $\bar{X}^2 - 1/n$ may occasionally be negative, so the UMVUE for $\theta^2$ is not very sensible in this case.

☞ EX. (Sometimes an unbiased estimate may be absurd) Let $X \sim$ Poisson$(\lambda)$, and $d(\lambda) = e^{-3\lambda}$. We have $T(X) = (-2)^X$, which is unbiased for $d(\lambda)$. Note that
$$E[T(X)] = e^{-\lambda}\sum_{x=0}^\infty(-2)^x\frac{\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^\infty\frac{(-2\lambda)^x}{x!} = e^{-\lambda}e^{-2\lambda} = d(\lambda).$$
However, $T(x) = (-2)^x > 0$ if $x$ is even and $< 0$ if $x$ is odd, which is absurd since $d(\lambda) > 0$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Exp$(\theta)$, $\theta > 0$. Find the UMVUE of $P(X_1 \le 2)$.

✍ Sol. Let $U(X) = 1$ if $X_1 \le 2$ and $= 0$ if $X_1 > 2$, so $E[U(X)] = P(X_1 \le 2)$.
∵ $\{\exp(\theta) : \theta > 0\}$ is a one-parameter exponential family. ∴ $T = \sum_{i=1}^n X_i$ is complete and sufficient for $\theta$, and $T \sim$ gamma$(n, \theta)$.
Using the Lehmann-Scheffé theorem gives that $E[U(X)\mid T]$ is the UMVUE of $P(X_1 \le 2)$. The conditional pdf of $X_1$ given $T = y$ is
$$f(x_1\mid T = y) = \frac{\dfrac{1}{\theta}e^{-x_1/\theta}\cdot\dfrac{(y-x_1)^{n-2}e^{-(y-x_1)/\theta}}{\Gamma(n-1)\theta^{n-1}}}{\dfrac{y^{n-1}e^{-y/\theta}}{\Gamma(n)\theta^n}} = \frac{n-1}{y}\left(1 - \frac{x_1}{y}\right)^{n-2}, \quad 0 < x_1 < y.$$
Thus, for $y \ge 2$,
$$E[U(X)\mid T = y] = P(X_1 \le 2\mid T = y) = \int_0^2\frac{n-1}{y}\left(1 - \frac{x_1}{y}\right)^{n-2}dx_1 = \left[-\left(1 - \frac{x_1}{y}\right)^{n-1}\right]_0^2 = 1 - \left(1 - \frac{2}{y}\right)^{n-1}.$$
Therefore, the UMVUE of $P(X_1 \le 2)$ is
$$E[U(X)\mid T] = \begin{cases}1 - \left(1 - \dfrac{2}{T}\right)^{n-1}, & T \ge 2\\[4pt] 1, & T < 2.\end{cases}$$

✎ HW. §7.4 — 1, 5, 9; §7.5 — 1, 3, 4, 11, 13; §7.6 — 1, 2, 4, 7.
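As a sanity check on the last example, the sketch below simulates exponential samples (with $\theta$ taken as the mean, matching the conditional density used above), evaluates the UMVUE $1 - (1 - 2/T)^{n-1}$ at $T = \sum X_i$, and compares its average with the true value $P(X_1 \le 2) = 1 - e^{-2/\theta}$. This is an illustration added to the notes; the parameter values are arbitrary assumptions.

```python
import numpy as np

def umvue_prob_le2(t_sum, n):
    # UMVUE of P(X1 <= 2) evaluated at the complete sufficient statistic T = sum(X_i)
    return 1.0 if t_sum < 2 else 1.0 - (1.0 - 2.0/t_sum)**(n - 1)

rng = np.random.default_rng(3)
theta, n, reps = 3.0, 6, 200_000          # illustrative values, not from the notes
x = rng.exponential(scale=theta, size=(reps, n))
est = np.array([umvue_prob_le2(t, n) for t in x.sum(axis=1)])

print("mean of UMVUE :", est.mean())
print("true P(X1<=2) :", 1 - np.exp(-2/theta))
print("naive I(X1<=2):", (x[:, 0] <= 2).mean())
```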
§6.4 Multiparameter Case: Estimation

✒ Def. Let $X_1, \dots, X_n$ be iid with common pdf $f(x; \theta)$, where $\theta \in \Omega \subset \mathbb{R}^p$. As before, the likelihood function and its natural logarithm are given by
$$L(\theta) = \prod_{i=1}^n f(x_i; \theta) \quad\text{and}\quad l(\theta) = \ln L(\theta) = \sum_{i=1}^n\ln f(x_i; \theta), \quad \forall\theta \in \Omega.$$
We will consider the value which maximizes $L(\theta)$ or $l(\theta)$. If it exists, this value will be called the maximum likelihood estimator (mle).

☞ EX. Let $X_1, \dots, X_n$ be iid $N(\theta, \sigma^2)$, with both $\theta$ and $\sigma^2$ unknown. Find the mle of $\theta$ and $\sigma^2$.

✍ Sol. Method I:
$$L(\theta, \sigma^2\mid x) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\tfrac{1}{2}\sum_1^n(x_i-\theta)^2/\sigma^2\right\}, \qquad l(\theta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2}\frac{\sum_1^n(x_i-\theta)^2}{\sigma^2},$$
$$\Rightarrow\quad \frac{\partial l(\theta,\sigma^2)}{\partial\theta} = \frac{1}{\sigma^2}\sum_1^n(x_i-\theta) \quad\text{and}\quad \frac{\partial l(\theta,\sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_1^n(x_i-\theta)^2.$$
Setting these partial derivatives equal to zero and solving yields the solution
$$\hat{\theta} = \bar{x} \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n}\sum_1^n(x_i-\bar{x})^2.$$
The remaining task is to verify that this solution is a global maximum.
∵ $\sum_1^n(x_i-\theta)^2 > \sum_1^n(x_i-\bar{x})^2$ $\forall\theta \ne \bar{x}$,
∴ $\dfrac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\dfrac{1}{2}\dfrac{\sum_1^n(x_i-\bar{x})^2}{\sigma^2}\right\} \ge \dfrac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\dfrac{1}{2}\dfrac{\sum_1^n(x_i-\theta)^2}{\sigma^2}\right\}$.
Then the problem is reduced to a one-dimensional problem: verifying that $(\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2}\frac{\sum_1^n(x_i-\bar{x})^2}{\sigma^2}\right\}$ achieves its global maximum at $\sigma^2 = \frac{1}{n}\sum_1^n(x_i-\bar{x})^2$.

✇ Note: To use two-variable calculus to verify that a function $H(\theta_1, \theta_2)$ has a global maximum at $(\hat{\theta}_1, \hat{\theta}_2)$, it must be shown that the following four conditions hold.
(i) $\dfrac{\partial}{\partial\theta_1}H(\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat{\theta}_1,\hat{\theta}_2)} = 0$ and $\dfrac{\partial}{\partial\theta_2}H(\theta_1,\theta_2)\Big|_{(\theta_1,\theta_2)=(\hat{\theta}_1,\hat{\theta}_2)} = 0$.
(ii) $\dfrac{\partial^2}{\partial\theta_1^2}H(\theta_1,\theta_2)\Big|_{\theta=\hat{\theta}} < 0$ or $\dfrac{\partial^2}{\partial\theta_2^2}H(\theta_1,\theta_2)\Big|_{\theta=\hat{\theta}} < 0$.
(iii) The determinant of the Hessian matrix is positive, that is,
$$\begin{vmatrix}\dfrac{\partial^2}{\partial\theta_1^2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2)\\[4pt] \dfrac{\partial^2}{\partial\theta_1\partial\theta_2}H(\theta_1,\theta_2) & \dfrac{\partial^2}{\partial\theta_2^2}H(\theta_1,\theta_2)\end{vmatrix}_{\theta=\hat{\theta}} > 0.$$
(iv) Check boundaries.

✍ Sol. Method II:
$$\frac{\partial^2}{\partial\theta^2}l(\theta,\sigma^2) = \frac{-n}{\sigma^2} < 0, \qquad \frac{\partial^2}{\partial(\sigma^2)^2}l(\theta,\sigma^2) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_1^n(x_i-\theta)^2, \qquad \frac{\partial^2}{\partial\theta\,\partial\sigma^2}l(\theta,\sigma^2) = -\frac{1}{\sigma^4}\sum_1^n(x_i-\theta).$$
At $(\hat{\theta}, \hat{\sigma}^2)$ the determinant of the Hessian matrix is $\dfrac{n^2}{2}\left(\dfrac{1}{\hat{\sigma}^2}\right)^3$, so it is positive.
Check boundaries: $\theta \to \pm\infty$ or $\sigma^2 \to \infty$ ⇒ $L(\theta,\sigma^2) \to 0$; $\sigma^2 \to 0$ ⇒ $L(\theta,\sigma^2) \to 0$.
Therefore, $(\hat{\theta}, \hat{\sigma}^2) = \left(\bar{X}, \frac{1}{n}\sum_1^n(X_i-\bar{X})^2\right)$ is the mle of $(\theta, \sigma^2)$.

✑ Theorem: ✯✯✯✯✯ Let $X = (X_1, \dots, X_n)$ be distributed according to a $k$-parameter exponential family with pdf
$$f(x; \theta) = \exp\left\{\sum_{j=1}^k Q_j(\theta)T_j(x) + D(\theta) + S(x)\right\}, \quad \theta \in \Omega \subset \mathbb{R}^k.$$
If the equations $E[T_j(X)] = T_j(x)$, $j = 1, 2, \dots, k$ (the method-of-moments equations), have a solution $\hat{\theta}(x) = \left(\hat{\theta}_1(x), \hat{\theta}_2(x), \dots, \hat{\theta}_k(x)\right)$, then $\hat{\theta}$ is the unique mle of $\theta$.

✍ Proof: For a 1-parameter exponential family in natural form, the pdf is $f(x; \theta) = \exp\{\theta T(x) + D(\theta) + S(x)\}$, so
$$l(\theta) = \ln f(x;\theta) = \theta T(x) + D(\theta) + S(x), \qquad l'(\theta) = T(x) + D'(\theta) \ \Rightarrow\ l''(\theta) = D''(\theta).$$
Note that the mgf of $T(X)$ is given by
$$E\left[e^{tT(X)}\right] = \int_{\mathbb{R}^n}e^{(t+\theta)T(x) + D(\theta) + S(x)}dx = \frac{e^{D(\theta)}}{e^{D(t+\theta)}} = \exp\{D(\theta) - D(\theta + t)\},$$
$$\Rightarrow\quad E[T(X)] = \frac{\partial}{\partial t}E\left[e^{tT(X)}\right]\Big|_{t=0} = -D'(\theta) \quad\text{and}\quad E\{[T(X)]^2\} = \frac{\partial^2}{\partial t^2}E\left[e^{tT(X)}\right]\Big|_{t=0} = [D'(\theta)]^2 - D''(\theta).$$
So $\mathrm{Var}[T(X)] = -D''(\theta)$. Then $l'(\theta) = 0 \Rightarrow E[T(X)] = T(x)$, and $l''(\theta) = D''(\theta) = -\mathrm{Var}[T(X)] < 0$. Therefore, if the equation $E[T(X)] = T(x)$ has a solution $\hat{\theta}(x)$, then $\hat{\theta}(x)$ is the unique mle of the parameter $\theta$.
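The closed-form normal mle derived above can also be obtained by direct numerical maximization of the log-likelihood, which is a useful template when no closed form exists. The sketch below is an added illustration, assuming scipy is available; the data-generating values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, scale=2.0, size=200)    # illustrative data, not from the notes

def neg_loglik(par):
    mu, log_sigma2 = par                        # log-parametrize sigma^2 so it stays positive
    sigma2 = np.exp(log_sigma2)
    return 0.5*len(x)*np.log(2*np.pi*sigma2) + 0.5*np.sum((x - mu)**2)/sigma2

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
print("numerical mle:", mu_hat, sigma2_hat)
print("closed form  :", x.mean(), x.var(ddof=0))   # theta_hat = x_bar, sigma2_hat = (1/n) sum (x_i - x_bar)^2
```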
☞ EX. Let $X_1, \dots, X_n$ be iid $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown. Find the mle of $(\mu, \sigma^2)$.

✍ Sol.
$$f(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2}\frac{\sum_1^n(x_i-\mu)^2}{\sigma^2}\right\} = \exp\left\{-\frac{1}{2\sigma^2}\sum_1^n x_i^2 + \frac{\mu}{\sigma^2}\sum_1^n x_i - \frac{n\mu^2}{2\sigma^2} - \frac{n}{2}\ln(2\pi\sigma^2)\right\}.$$
Therefore, it is a 2-parameter exponential family. We just need to solve the equations
$$E\left[\sum_1^n X_i\right] = \sum_1^n x_i \quad\text{and}\quad E\left[\sum_1^n X_i^2\right] = \sum_1^n x_i^2,$$
that is,
$$n\hat{\mu} = \sum_1^n x_i \quad\text{and}\quad n(\hat{\sigma}^2 + \hat{\mu}^2) = \sum_1^n x_i^2 \qquad\Rightarrow\qquad \hat{\mu} = \bar{x}, \quad \hat{\sigma}^2 = \frac{1}{n}\sum_1^n(x_i-\bar{x})^2.$$

☞ EX. Let $(X_1, \dots, X_s) \sim$ Multinomial$(n, p)$, where $p = (p_1, \dots, p_s)$ is unknown. Find the mle of $p$.

✍ Sol.
$$f_X(x; p) = \frac{n!}{\prod_{i=1}^s x_i!}p_1^{x_1}\cdots p_s^{x_s} = \exp\left\{\ln n! - \ln\Big(\prod_{i=1}^s x_i!\Big) + \sum_{i=1}^s x_i\ln p_i\right\} = \exp\left\{\ln n! - \ln\Big(\prod_{i=1}^s x_i!\Big) + n\ln p_1 + \sum_{i=2}^s x_i\ln(p_i/p_1)\right\},$$
using $x_1 = n - x_2 - \dots - x_s$. Therefore, it is an $(s-1)$-parameter exponential family. Then we can obtain the mle of $p$ by solving the equations $E[X_i] = x_i$, $i = 2, \dots, s$, which give $\hat{p}_i = x_i/n$ (and hence $\hat{p}_1 = x_1/n$). Consequently, the mle of $p_i$ is $\hat{p}_i = X_i/n$, $i = 1, 2, \dots, s$.

☞ EX. $X_1, \dots, X_n \overset{iid}{\sim} U(\theta-\rho, \theta+\rho)$, where $\Omega = \{(\theta, \rho) : \theta \in \mathbb{R}, \rho > 0\}$. Find the mle's for $\theta$ and $\rho$. Are these two unbiased estimators?

✍ Sol. The likelihood function is
$$L(\theta, \rho\mid x) = \frac{1}{(2\rho)^n}I(\theta-\rho \le x_{(1)} \le x_{(n)} \le \theta+\rho).$$
To maximize $L$, make $\rho$ as small as possible, which is accomplished by setting
$$\hat{\theta} - \hat{\rho} = x_{(1)}, \quad \hat{\theta} + \hat{\rho} = x_{(n)} \qquad\Rightarrow\qquad \hat{\theta} = \frac{x_{(n)} + x_{(1)}}{2}, \quad \hat{\rho} = \frac{x_{(n)} - x_{(1)}}{2}.$$
Let $Y_i = [X_i - (\theta-\rho)]/(2\rho) \sim U(0, 1)$; it follows that $Y_{(r)} \sim$ beta$(r, n-r+1)$.
∵ $\hat{\theta} = \rho Y_{(n)} + \rho Y_{(1)} + (\theta - \rho)$ and $\hat{\rho} = \rho Y_{(n)} - \rho Y_{(1)}$,
∴ $E(\hat{\theta}) = \rho\dfrac{n}{n+1} + \rho\dfrac{1}{n+1} + \theta - \rho = \theta$ and $E(\hat{\rho}) = \rho\dfrac{n-1}{n+1}$.
We conclude that $\hat{\theta}$ is unbiased but $\hat{\rho}$ is not.

✎ HW. Let $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ be independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Find the mle of $(\mu_1, \mu_2, \sigma^2)$, where $\Omega = \{(\mu_1, \mu_2, \sigma^2) : \mu_i \in \mathbb{R}, i = 1, 2, \ 0 < \sigma^2 < \infty\}$.
Ans: $\hat{\mu}_1 = \bar{X}$, $\hat{\mu}_2 = \bar{Y}$, and $\hat{\sigma}^2 = \dfrac{\sum_1^{n_1}(X_i-\bar{X})^2 + \sum_1^{n_2}(Y_i-\bar{Y})^2}{n_1 + n_2}$.

✒ Def. Multiparameter case: Fisher Information Matrix. Fisher information in the scalar case is the variance of the random variable $\frac{\partial}{\partial\theta}\ln f(X;\theta)$. The analog in the multiparameter case is the variance-covariance matrix of the gradient of $\ln f(X;\theta)$, that is, of the random vector
$$\nabla\ln f(X;\theta) = \left(\frac{\partial\ln f(X;\theta)}{\partial\theta_1}, \dots, \frac{\partial\ln f(X;\theta)}{\partial\theta_p}\right)'.$$
Fisher information is then defined by the $p\times p$ matrix $I(\theta) = \mathrm{Cov}(\nabla\ln f(X;\theta))$. The $(j, k)$th entry of $I(\theta)$ is given by
$$I_{j,k} = \mathrm{Cov}\left(\frac{\partial}{\partial\theta_j}\ln f(X;\theta), \frac{\partial}{\partial\theta_k}\ln f(X;\theta)\right) = -E\left[\frac{\partial^2}{\partial\theta_j\partial\theta_k}\ln f(X;\theta)\right].$$

✇ Note: Information for a sample follows in the same way as in the scalar case: Fisher information for a sample is given by $nI(\theta)$.

✑ Theorem: Multiparameter case: Cramér-Rao lower bound. Let $X_1, \dots, X_n$ be iid r.v.s with common pdf $f(x;\theta)$ for $\theta \in \Omega \subset \mathbb{R}^s$. Assume that the regularity conditions hold. Let $T(X)$ be an unbiased estimator of $g(\theta)$. Then
$$\mathrm{Var}[T(X)] \ge \alpha'[nI(\theta)]^{-1}\alpha,$$
where $\alpha'$ is the row vector with $i$th element $\alpha_i = \frac{\partial}{\partial\theta_i}g(\theta)$, $i = 1, 2, \dots, s$.

✇ Note: For the scalar case, the lower bound reduces to $\mathrm{Var}[T(X)] \ge \dfrac{1}{nI(\theta)}$, provided $g(\theta) = \theta$.
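The Fisher information matrix defined above can be approximated by the sample covariance of the score vector. The sketch below, an added illustration with arbitrary parameter values, does this for the $N(\mu, \sigma^2)$ family treated in the example that follows and compares the result with the analytic matrix $\mathrm{diag}\big(1/\sigma^2,\ 1/(2\sigma^4)\big)$.

```python
import numpy as np

# Monte Carlo check of the per-observation Fisher information for N(mu, sigma^2):
# estimate Cov of the score (d/dmu, d/dsigma^2) ln f(X) and compare with the
# analytic matrix diag(1/sigma^2, 1/(2*sigma^4)).
mu, sigma2 = 1.0, 4.0                      # illustrative values, not from the notes
rng = np.random.default_rng(5)
x = rng.normal(mu, np.sqrt(sigma2), size=500_000)

score_mu = (x - mu) / sigma2
score_s2 = -0.5/sigma2 + (x - mu)**2 / (2*sigma2**2)
print("simulated I(theta):\n", np.cov(np.vstack([score_mu, score_s2])))
print("analytic  I(theta):\n", np.array([[1/sigma2, 0.0], [0.0, 1/(2*sigma2**2)]]))
```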
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, with $\mu = \theta_1$ and $\sigma^2 = \theta_2$ both unknown.
(i) Find the Fisher information matrix for this sample.
(ii) Find the Cramér-Rao lower bound for any unbiased estimator of $\sigma^2$.

✍ Sol.
(i) $f(x; \theta_1, \theta_2) = \dfrac{1}{\sqrt{2\pi\theta_2}}e^{-\frac{(x-\theta_1)^2}{2\theta_2}}$. Let
$$S_1 = \frac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_1} = \frac{1}{\theta_2}(X - \theta_1), \qquad S_2 = \frac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_2} = -\frac{1}{2\theta_2} + \frac{(X-\theta_1)^2}{2\theta_2^2}.$$
We can derive
$$\mathrm{Var}(S_1) = \frac{1}{\theta_2^2}\mathrm{Var}(X) = \frac{1}{\theta_2} = \frac{1}{\sigma^2}, \qquad \mathrm{Var}(S_2) = \mathrm{Var}\left[\frac{1}{2\theta_2}\left(\frac{X-\theta_1}{\sqrt{\theta_2}}\right)^2\right] = \frac{1}{4\theta_2^2}\times 2 = \frac{1}{2\sigma^4}, \qquad \mathrm{Cov}(S_1, S_2) = 0.$$
Consequently, the Fisher information matrix for the sample is
$$nI(\theta) = n\begin{pmatrix}\mathrm{Var}(S_1) & \mathrm{Cov}(S_1,S_2)\\ \mathrm{Cov}(S_1,S_2) & \mathrm{Var}(S_2)\end{pmatrix} = \begin{pmatrix}\dfrac{n}{\sigma^2} & 0\\[4pt] 0 & \dfrac{n}{2\sigma^4}\end{pmatrix}.$$
(ii) ∵ $g(\theta) = \theta_2 = \sigma^2$ ∴ $\alpha' = \left(\dfrac{\partial g(\theta)}{\partial\theta_1}, \dfrac{\partial g(\theta)}{\partial\theta_2}\right) = (0, 1)$
⇒ the Cramér-Rao lower bound is given by
$$\mathrm{Var}[T(X)] \ge (0, 1)\begin{pmatrix}\dfrac{n}{\sigma^2} & 0\\[4pt] 0 & \dfrac{n}{2\sigma^4}\end{pmatrix}^{-1}\begin{pmatrix}0\\ 1\end{pmatrix} = \frac{2\sigma^4}{n}.$$

✎ HW. Is the unbiased estimator $\hat{\sigma}^2 = \dfrac{1}{n-1}\sum_1^n(X_i-\bar{X})^2$ efficient?

✑ Theorem: (The asymptotic behavior of the mle of the vector $\theta$) Let $X_1, \dots, X_n$ be iid r.v.s with pdf $f(x;\theta)$ for $\theta \in \Omega$. Assume that the regularity conditions hold. Then
(i) The likelihood equations $\dfrac{\partial}{\partial\theta_i}l(\theta) = 0$, $i = 1, 2, \dots, s$, have a solution $\hat{\theta}$ such that $\hat{\theta}\overset{P}{\longrightarrow}\theta$, that is, $\hat{\theta}_i\overset{P}{\longrightarrow}\theta_i$, $i = 1, 2, \dots, s$.
(ii) For any sequence which satisfies (i), $\sqrt{n}(\hat{\theta} - \theta)\overset{D}{\longrightarrow}$ MVN$(0, I^{-1}(\theta))$ ← the limiting covariance does not involve $n$.
(iii) $\sqrt{n}(\hat{\theta}_i - \theta_i)\overset{D}{\longrightarrow}N\big(0, [I^{-1}(\theta)]_{ii}\big)$, $i = 1, 2, \dots, s$.
(iv) Let $g$ be a transformation $g(\theta) = (g_1(\theta), \dots, g_k(\theta))'$ such that $1 \le k \le s$, with the $k\times s$ matrix of partial derivatives
$$B = \left[\frac{\partial g_i}{\partial\theta_j}\right] = \begin{pmatrix}\frac{\partial g_1}{\partial\theta_1} & \frac{\partial g_1}{\partial\theta_2} & \dots & \frac{\partial g_1}{\partial\theta_s}\\ \vdots & \vdots & & \vdots\\ \frac{\partial g_k}{\partial\theta_1} & \frac{\partial g_k}{\partial\theta_2} & \dots & \frac{\partial g_k}{\partial\theta_s}\end{pmatrix}.$$
Let $\hat{\eta} = g(\hat{\theta})$. Then $\hat{\eta}$ is the mle of $\eta = g(\theta)$, and (using the $\Delta$-method)
$$\sqrt{n}(\hat{\eta} - \eta)\overset{D}{\longrightarrow}\text{MVN}(0, BI^{-1}(\theta)B').$$
Moreover, the Fisher information matrix for $\eta$ is $I(\eta) = \left[BI^{-1}(\theta)B'\right]^{-1}$.

✇ Note: For the scalar case, let $I(\theta)$ be the Fisher information for the family $\{p(x, \theta), \theta \in \Omega\}$. If $\theta = h(\xi)$ and $h$ is differentiable, then the Fisher information that $X$ contains about $\xi$ is
$$I^*(\xi) = E\left[\left(\frac{\partial}{\partial\xi}\ln p(X;\theta)\right)^2\right] = E\left[\left(\frac{\partial}{\partial\theta}\ln p(X;\theta)\times\frac{\partial h(\xi)}{\partial\xi}\right)^2\right] = E\left[\left(\frac{\partial}{\partial\theta}\ln p(X;\theta)\right)^2\right][h'(\xi)]^2 = I(\theta)[h'(\xi)]^2.$$

☞ EX. Let $X \sim N(0, \sigma^2)$. If $\theta = \sigma^2$, then $I(\theta) = 1/(2\sigma^4) = 1/(2\theta^2)$. Find $I^*(\sigma)$.

✍ Sol. Let $\xi = \sigma$; then $\theta = h(\xi) = \xi^2$, so $h'(\xi) = 2\xi = 2\sigma$. Thus $I^*(\xi) = I^*(\sigma) = I(\sigma^2)(2\sigma)^2 = \dfrac{1}{2\sigma^4}\times 4\sigma^2 = \dfrac{2}{\sigma^2}$.

✎ HW. Let $(X_i, Y_i)$, $i = 1, 2, \dots, n$, be a r.s. from BVN$(0, \Sigma)$, where
$$\Sigma = \sigma^2\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}, \qquad \Omega = \{(\sigma^2, \rho) : 0 < \sigma^2 < \infty, \ |\rho| \le 1\}.$$
(a) Find the mle of $\sigma^2$ and $\rho$. (b) Find the Fisher information matrix. (c) Establish the limiting normality of $\hat{\rho}$ in (a).
❉ Hint:
(a) $\hat{\sigma}^2 = \dfrac{\sum_1^n X_i^2 + \sum_1^n Y_i^2}{2n}$ and $\hat{\rho} = \dfrac{2\sum_1^n X_iY_i}{\sum_1^n X_i^2 + \sum_1^n Y_i^2}$.
(b) $nI((\sigma^2, \rho)) = \begin{pmatrix}\dfrac{n}{\sigma^4} & \dfrac{-n\rho}{\sigma^2(1-\rho^2)}\\[6pt] \dfrac{-n\rho}{\sigma^2(1-\rho^2)} & \dfrac{n(1+\rho^2)}{(1-\rho^2)^2}\end{pmatrix}$.
(c) ∵ $\sqrt{n}\begin{pmatrix}\hat{\sigma}^2 - \sigma^2\\ \hat{\rho} - \rho\end{pmatrix}\overset{D}{\longrightarrow}$ MVN$\left(\begin{pmatrix}0\\ 0\end{pmatrix}, I^{-1}(\sigma^2, \rho)\right)$ with
$$I^{-1}(\sigma^2, \rho) = \begin{pmatrix}\sigma^4(1+\rho^2) & \sigma^2\rho(1-\rho^2)\\ \sigma^2\rho(1-\rho^2) & (1-\rho^2)^2\end{pmatrix}.$$
Let $g(\sigma^2, \rho) = \rho$. Then $\dfrac{\partial g}{\partial\sigma^2} = 0$ and $\dfrac{\partial g}{\partial\rho} = 1$, i.e. $B = (0, 1)$, so
$$\sqrt{n}(\hat{\rho} - \rho)\overset{D}{\longrightarrow}N(0, (1-\rho^2)^2).$$

✎ HW. §6.4 — 1, 2, 3, 4, 6.

§5.5 Introduction to Hypothesis Testing

✒ Def.
(1) A parametric hypothesis is an assertion about the unknown parameter $\theta$. It is usually referred to as the null hypothesis, $H_0 : \theta \in \Omega_0 \subset \Omega$. The statement $H_1 : \theta \in \Omega_1 = \Omega\setminus\Omega_0$ is usually referred to as the alternative hypothesis.
(2) If $\Omega_0$ ($\Omega_1$) contains only one point, we say that $\Omega_0$ ($\Omega_1$) is simple; otherwise, composite. Thus, if a hypothesis is simple, the probability distribution of $X$, if $X \sim f_\theta$, $\theta \in \Omega_0$ ($\Omega_1$), is completely specified under the hypothesis.
☞ EX. Let $X \sim N(\mu, \sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown, $\Omega = \{(\mu, \sigma^2) : \mu \in \mathbb{R}, \sigma^2 > 0\}$. The hypothesis $H_0 : \mu \le \mu_0, \sigma^2 > 0$, where $\mu_0$ is a known constant, is a composite null hypothesis. The alternative hypothesis is $H_1 : \mu > \mu_0, \sigma^2 > 0$, which is also composite. Similarly, the null hypothesis $H_0' : \mu = \mu_0, \sigma^2 > 0$ is also composite. If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0'' : \mu = \mu_0$ is a simple hypothesis.

☞ EX. Let $X_1, \dots, X_n$ be iid Bernoulli$(p)$. Some hypotheses of interest are $p = 1/2$, $p \le 1/2$, $p \ge 1/2$ or, quite generally, $p = p_0$, $p \le p_0$, $p \ge p_0$, where $p_0$ is a known number, $p_0 \in (0, 1)$.

✇ Note: The problem of testing of hypotheses may be described as follows: given the sample point $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, find a decision rule (function) that will lead to a decision to reject or not reject the null hypothesis. In other words, partition the space $\mathbb{R}^n$ into two disjoint sets $C$ and $C^c$ such that, if $x \in C$, we reject $H_0 : \theta \in \Omega_0$, and if $x \in C^c$, we do not reject $H_0$.

✒ Def.
(1) Let $X_1, \dots, X_n$ be a sample with the joint pdf $f_\theta(x)$, $\theta \in \Omega$. A subset $C$ of $\mathbb{R}^n$ such that, if $x \in C$, then $H_0$ is rejected (with probability 1) is called the critical region (set):
$$C = \{x \in \mathbb{R}^n : H_0 \text{ is rejected if } x \in C\}.$$
(2) There are two types of errors that can be made if one uses such a procedure. One may reject $H_0$ when in fact it is true, called a type I error, or not reject $H_0$ when it is false, called a type II error:

| True state | Do not reject $H_0$ | Reject $H_0$ |
|---|---|---|
| $H_0$ | correct | type I error |
| $H_1$ | type II error | correct |

That is,
$$P_\theta(\text{type I error}) = P_\theta(X \in C), \ \theta \in \Omega_0; \qquad P_\theta(\text{type II error}) = P_\theta(X \in C^c), \ \theta \in \Omega_1.$$

✇ Note: Ideally one would like to find a critical region for which both these probabilities are 0. Unfortunately, situations such as this do not arise in practice. Usually, if a critical region is such that $P(\text{type I error}) = 0$, it will be of the form "always do not reject $H_0$", and $P(\text{type II error})$ will then be 1. For example, let $H_0 : \exp(\theta)$ vs. $H_1 :$ Poisson$(\theta)$ and the critical region $\{X : X < 0\}$. The procedure used in practice is to limit $P(\text{type I error})$ to some preassigned level $\alpha$ (usually 0.01 or 0.05) that is small and to minimize the probability of a type II error. Sometimes $\alpha$ is called the significance level or, simply, the level.

✒ Def. Let $\varphi : \mathbb{R}^n \longrightarrow [0, 1]$; the function $\varphi$ is known as a test function.

☞ EX. Let $C$ be a critical region for some test. If the test function is $\varphi(x) = 1$ for $x \in C$ and $0$ otherwise, then
$$E_\theta[\varphi(X)] = P(\text{type I error}) \ \ \forall\theta \in \Omega_0 \quad\text{and}\quad 1 - E_\theta[\varphi(X)] = P(\text{type II error}) \ \ \forall\theta \in \Omega_1 = \Omega\setminus\Omega_0.$$

✒ Def. The mapping $\varphi$ is said to be a test of hypothesis $H_0 : \theta \in \Omega_0$ against the alternative $H_1 : \theta \in \Omega_1$ with error probability (or level) $\alpha$ if $E_\theta[\varphi(X)] \le \alpha$ $\forall\theta \in \Omega_0$. We shall say, in short, that $\varphi$ is a test for the problem $(\alpha, \Omega_0, \Omega_1)$.

✇ Note: Let us write $\beta_\varphi(\theta) = E_\theta[\varphi(X)]$. Our object, in practice, will be to seek a test $\varphi$ for a given $\alpha \in [0, 1]$ such that $\sup_{\theta\in\Omega_0}\beta_\varphi(\theta) \le \alpha$. The left-hand side of this inequality is usually known as the size of the test $\varphi$.

✒ Def. Let $\varphi$ be a test function for the problem $(\alpha, \Omega_0, \Omega_1)$. For every $\theta \in \Omega$ define $\beta_\varphi(\theta) = E_\theta[\varphi(X)] = P_\theta(\text{Reject }H_0)$. As a function of $\theta$, $\beta_\varphi(\theta)$ is called the power function of the test $\varphi$. For any $\theta \in \Omega_1$, $\beta_\varphi(\theta)$ is called the power of $\varphi$ against the alternative $\theta$.
☞ EX. Let $X_1, X_2, \dots, X_{20} \overset{iid}{\sim}$ Bernoulli$(p)$, where $p \in (0, 1)$. It is clear that $Y = \sum_{i=1}^{20}X_i \sim$ binomial$(20, p)$. Testing $H_0 : p = 1/2$ vs. $H_1 : p < 1/2$ is of interest. Let the critical region be $C = \{y : y \le 6\}$. Find the probability of type I error and the power function of this test.

✍ Sol. Let the test function $\varphi$ be $\varphi(y) = 1$ if $y \le 6$ and $0$ otherwise.
$$P(\text{type I error}) = P(Y \le 6 \mid p = 1/2) = \sum_{y=0}^6\binom{20}{y}(1/2)^{20} \approx 0.0577.$$
The power function of $\varphi$ is given by
$$\beta_\varphi(p) = E_p[\varphi(Y)] = \sum_{y=0}^6\binom{20}{y}p^y(1-p)^{20-y}, \quad 0 < p < 1.$$

✒ Def. Often the partitioning of the sample space is specified in terms of the values of a statistic called the test statistic.

✫ Skill: Given a sample point $x$, find a test $\varphi(x)$ such that $\beta_\varphi(\theta) \le \alpha$ $\forall\theta \in \Omega_0$, and $\beta_\varphi(\theta)$ is maximum for $\theta \in \Omega_1$.

✎ HW. §5.5 — 2, 4, 8∼13.

§8.1 Most Powerful Tests

✒ Def. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Omega_0, \Omega_1)$. A test $\varphi_0 \in \Phi_\alpha$ is said to be a most powerful (MP) test against an alternative $\theta \in \Omega_1$ if $\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta)$ $\forall\varphi \in \Phi_\alpha$.

✇ Note: If $\Omega_1$ contains only one point, this definition suffices. If, on the other hand, $\Omega_1$ contains at least two points, as will usually be the case, we will have an MP test corresponding to each $\theta \in \Omega_1$.

✒ Def. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Omega_0, \Omega_1)$ is said to be a uniformly most powerful (UMP) test if $\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta)$ $\forall\varphi \in \Phi_\alpha$, uniformly in $\theta \in \Omega_1$.

✒ Def. ✯✯✯✯✯ The p-value associated with a test is the probability that we obtain the observed value of the test statistic or a value that is more extreme in the direction of the alternative hypothesis, calculated when $H_0$ is true.

☞ EX. Let $X \sim N(\mu, 100)$. To test $H_0 : \mu = 80$ vs. $H_1 : \mu > 80$, let the critical region be defined by $C = \{(x_1, \dots, x_{25}) : \bar{x} > 83\}$, where $\bar{x}$ is the sample mean of a r.s. of size $n = 25$ from this distribution.
(i) How is the power function $\beta_\varphi(\mu)$ defined for this test?
(ii) Find $P(\text{type I error})$.
(iii) What are the values of $\beta_\varphi(80)$, $\beta_\varphi(83)$, and $\beta_\varphi(86)$?
(iv) What is the p-value corresponding to $\bar{x} = 83.41$?

✍ Sol.
(i) The test function can be defined by $\varphi(X) = 1$ if $\bar{X} > 83$ and $0$ otherwise.
$$\beta_\varphi(\mu) = P_\mu(\bar{X} > 83) = P\left(Z \ge \frac{83-\mu}{10/5}\right) = 1 - \Phi\left(\frac{83-\mu}{2}\right),$$
where $\Phi$ is the df of the standard normal distribution.
(ii) $P(\text{Type I error}) = \beta_\varphi(80) = 1 - \Phi(1.5) \approx 0.0668$.
(iii) $\beta_\varphi(83) = 1 - \Phi(0) = 0.5$; $\beta_\varphi(86) = 1 - \Phi(-1.5) \approx 0.9332$. (Do you have any discovery?)
(iv) p-value $= P(\bar{X} > 83.41 \mid \mu = 80) = P(Z > 1.705) = 1 - \Phi(1.705) \approx 0.0441$.

☞ EX. Let $\bar{X} \sim N(\mu, 16/n)$, where $n$ is the sample size. To test $H_0 : \mu = 20$ vs. $H_1 : \mu < 20$, assume that there is a test function $\varphi(x) = 1$ if $\bar{x} \le c$ and $0$ otherwise. Find the constant $c$ and sample size $n$ so that $E[\varphi(X)\mid\mu = 20] = 0.05$ and $E[\varphi(X)\mid\mu = 19] = 0.9$, approximately.

✍ Sol.
$$\beta_\varphi(\mu) = E_\mu[\varphi(X)] = P\left(\frac{\bar{X}-\mu}{4/\sqrt{n}} \le \frac{c-\mu}{4/\sqrt{n}}\,\Big|\,\mu\right) = \Phi\left(\frac{c-\mu}{4/\sqrt{n}}\right).$$
We are given
$$\beta_\varphi(20) = \Phi\left(\frac{c-20}{4/\sqrt{n}}\right) = 0.05 \ \Rightarrow\ \frac{c-20}{4/\sqrt{n}} = -1.645, \qquad \beta_\varphi(19) = \Phi\left(\frac{c-19}{4/\sqrt{n}}\right) = 0.9 \ \Rightarrow\ \frac{c-19}{4/\sqrt{n}} = 1.282.$$
Solving simultaneously yields $n \approx 137$ and $c \approx 19.4$.

✒ Def. To every $x \in \mathbb{R}^n$ we assign a number $\varphi(x)$, $0 \le \varphi(x) \le 1$, which is the probability of rejecting $H_0$ (that $X \sim f_\theta$, $\theta \in \Omega_0$) if $x$ is observed. The restriction $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Omega_0$ then says that, if $H_0$ were true, $\varphi$ rejects it with probability $\le \alpha$. We will call such a test a randomized test function. If $\varphi(x) = I_A(x)$, $\varphi$ will be called a nonrandomized test: if $x \in A$, we reject $H_0$ with probability 1, and if $x \notin A$, this probability is 0.
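The sample-size example above ($c \approx 19.4$, $n \approx 137$) amounts to solving two equations in two unknowns. A short numerical sketch, added here as an illustration and assuming scipy, uses the exact normal quantiles rather than the rounded values $-1.645$ and $1.282$.

```python
from scipy.stats import norm

# Solve (c - 20)/(4/sqrt(n)) = z_{0.05} and (c - 19)/(4/sqrt(n)) = z_{0.90}.
z_lo, z_hi = norm.ppf(0.05), norm.ppf(0.90)      # about -1.645 and 1.282
n = ((z_hi - z_lo) * 4.0 / (20.0 - 19.0))**2     # subtract the two equations
c = 20.0 + z_lo * 4.0 / n**0.5
print("n ~", n, "-> use n =", round(n), ";  c ~", round(c, 2))
print("check:", norm.cdf((c - 20)/(4/n**0.5)), norm.cdf((c - 19)/(4/n**0.5)))
```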
☞ EX. Let $X_1, X_2, \dots, X_n \overset{iid}{\sim} N(\mu, 1)$, where $\mu$ is unknown but it is known that $\mu \in \Omega = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let
$$H_0 : X_i \sim N(\mu_0, 1) \qquad\text{vs.}\qquad H_1 : X_i \sim N(\mu_1, 1).$$
Intuitively, one would not reject $H_0$ if the sample mean $\bar{X}$ is "closer" to $\mu_0$ than to $\mu_1$; that is to say, one would reject $H_0$ if $\bar{X} > k$, and not reject $H_0$ otherwise. The constant $k$ is determined from the level requirement. Given $0 < \alpha < 1$, we have
$$P_{\mu_0}(\bar{X} > k) = P_{\mu_0}\left(\frac{\bar{X}-\mu_0}{1/\sqrt{n}} > \frac{k-\mu_0}{1/\sqrt{n}}\right) = P(\text{Type I error}) = \alpha,$$
so that $k = \mu_0 + Z_\alpha/\sqrt{n}$. Note that for $Z \sim N(0, 1)$, $P(Z > Z_\alpha) = \alpha$. The test, therefore, is
$$\varphi(x) = \begin{cases}1, & \text{if } \bar{x} > \mu_0 + Z_\alpha/\sqrt{n}\\ 0, & \text{o.w.}\end{cases}$$
$\bar{X}$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical region $C = \{x : \bar{x} > \mu_0 + Z_\alpha/\sqrt{n}\}$. The power of the test at $\mu_1$ is given by
$$E_{\mu_1}[\varphi(X)] = P_{\mu_1}(\bar{X} > \mu_0 + Z_\alpha/\sqrt{n}) = P_{\mu_1}\left(\frac{\bar{X}-\mu_1}{1/\sqrt{n}} > (\mu_0-\mu_1)\sqrt{n} + Z_\alpha\right) = P\big(Z > Z_\alpha - \sqrt{n}(\mu_1-\mu_0)\big), \quad Z \sim N(0, 1).$$
Note that the probability of type II error can be evaluated directly by
$$P(\text{type II error}) = 1 - E_{\mu_1}[\varphi(X)] = P\big(Z \le Z_\alpha - \sqrt{n}(\mu_1-\mu_0)\big).$$

☞ EX. Let $X_1, X_2, \dots, X_5$ be a sample from Bernoulli$(p)$, where $p$ is unknown and $p \in [0, 1]$. Let $H_0 : p = 1/2$ vs. $H_1 : p \ne 1/2$. It is reasonable to reject $H_0$ if $|\bar{X} - 1/2| > c$, where $\bar{X}$ is the sample mean and $c$ is to be determined as below. Let $\alpha = 0.1$. Then we would like to choose $c$ such that the size of our test is $\alpha$, that is,
$$0.1 = P_{p=1/2}(|\bar{X} - 1/2| > c) \quad\text{or}\quad 0.9 = P\left(-5c \le \sum_1^5 X_i - 5/2 \le 5c \,\Big|\, p = 1/2\right) = P\left(-k \le \sum_1^5 X_i - 5/2 \le k \,\Big|\, p = 1/2\right),$$
with $k = 5c$. How do you find the critical point $k$? Note that $\sum_1^5 X_i \sim$ binomial$(5, 1/2)$ under $H_0$, so we have

| $\sum_1^5 x_i$ | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| $\sum_1^5 x_i - 5/2$ | $-2.5$ | $-1.5$ | $-0.5$ | $0.5$ | $1.5$ | $2.5$ |
| $P\left(\sum_1^5 X_i = \sum_1^5 x_i \mid H_0\right)$ | 0.03125 | 0.15625 | 0.31250 | 0.31250 | 0.15625 | 0.03125 |

Note that we cannot choose any $k$ such that $P\left(\left|\sum_1^5 X_i - 5/2\right| \le k \mid H_0\right) = 0.9$ exactly.

☞ For example:
(i) If $k = 1.5$, we reject $H_0$ when $\sum_1^5 X_i = 0$ or $5$; then $P\left(\left|\sum_1^5 X_i - 5/2\right| > 1.5 \mid p = 1/2\right) = 0.03125\times 2 = 0.0625 < 0.1$.
(ii) If $k = 0.5$, we reject $H_0$ when $\sum_{i=1}^5 X_i = 0, 1, 4, 5$; then $P\left(\left|\sum_1^5 X_i - 5/2\right| > 0.5 \mid p = 1/2\right) = 0.0625 + 0.15625\times 2 = 0.375 > 0.1$.
If we insist on achieving $\alpha = 0.1$, we can take a randomized test function. Assume that we reject $H_0$ with probability $\gamma$ when $\sum_1^5 X_i = 1$ or $4$. Then
$$0.1 = P\left(\sum_1^5 X_i = 0 \text{ or } 5 \mid p = 1/2\right) + \gamma P\left(\sum_1^5 X_i = 1 \text{ or } 4 \mid p = 1/2\right) \ \Rightarrow\ \gamma = 0.0375/0.3125 = 0.12.$$
It follows that we have a randomized test function
$$\varphi(x) = \begin{cases}1, & \text{if } \sum_1^5 x_i = 0 \text{ or } 5\\ 0.12, & \text{if } \sum_1^5 x_i = 1 \text{ or } 4\\ 0, & \text{o.w.}\end{cases}$$
The power of this test is
$$E_{p\ne 1/2}[\varphi(X)] = P\left(\sum_1^5 X_i = 0 \text{ or } 5 \mid p \ne 1/2\right) + 0.12\,P\left(\sum_1^5 X_i = 1 \text{ or } 4 \mid p \ne 1/2\right).$$
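A quick computational check of the randomized test just constructed: the sketch below (an added illustration, scipy assumed) recomputes $\gamma$ from the binomial$(5, 1/2)$ probabilities and evaluates the power at a few alternative values of $p$, which are arbitrary choices.

```python
from scipy.stats import binom

n, p0, alpha = 5, 0.5, 0.10
p_reject = binom.pmf(0, n, p0) + binom.pmf(n, n, p0)        # reject outright: sum = 0 or 5
p_randomize = binom.pmf(1, n, p0) + binom.pmf(4, n, p0)     # randomize: sum = 1 or 4
gamma = (alpha - p_reject) / p_randomize
print("gamma =", gamma)                                     # 0.12

def power(p):
    # power of the randomized two-sided test at a given p
    return (binom.pmf(0, n, p) + binom.pmf(n, n, p)
            + gamma * (binom.pmf(1, n, p) + binom.pmf(4, n, p)))

for p in (0.5, 0.3, 0.1):
    print(f"power at p={p}: {power(p):.4f}")
```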
☞ EX. Let $X_1, X_2 \overset{iid}{\sim}$ exp$(\theta)$, where $\theta \in \Omega = \{2, 4\}$. To test $H_0 : \theta = 2$ vs. $H_1 : \theta = 4$, assume that there is a test function $\varphi(x) = 1$ if $x_1 + x_2 \ge 9.5$ and $0$ otherwise; that is, we reject $H_0$ when $X_1 + X_2 \ge 9.5$.
(i) Find the probabilities of the type I error and the type II error, respectively.
(ii) Evaluate the power of this test.

✍ Sol.
(i)
$$P(\text{Type I error}) = E_{\theta=2}[\varphi(X)] = P(X_1 + X_2 \ge 9.5 \mid \theta = 2) = 1 - \int_0^{9.5}\!\!\int_0^{9.5-x_2}\frac{1}{4}e^{-(x_1+x_2)/2}dx_1dx_2 = 1 - \int_0^{9.5}\frac{1}{2}e^{-x_2/2}\left(1 - e^{-(9.5-x_2)/2}\right)dx_2 = 1 - \left(1 - e^{-9.5/2} - \frac{9.5}{2}e^{-9.5/2}\right) = \frac{11.5}{2}e^{-9.5/2} \approx 0.05.$$
[This is the size of the test.]
$$P(\text{Type II error}) = P(X_1 + X_2 < 9.5 \mid \theta = 4) = 1 - E_{\theta=4}[\varphi(X)] = \int_0^{9.5}\!\!\int_0^{9.5-x_2}\frac{1}{16}e^{-(x_1+x_2)/4}dx_1dx_2 = 1 - \frac{13.5}{4}e^{-9.5/4} \approx 0.69.$$
(ii) By the result of (i), we obtain the power of this test directly: $\beta_\varphi(4) = 1 - P(\text{Type II error}) \approx 0.31$.

☞ EX. Let $X_1, X_2 \overset{iid}{\sim}$ exp$(\theta)$, $\theta \in \Omega = \{1, 2\}$, with $f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}$. For $H_0 : \theta = 2$ vs. $H_1 : \theta = 1$, we reject $H_0$ if the observed values $x_1$ and $x_2$ are such that
$$\frac{f(x_1; 2)f(x_2; 2)}{f(x_1; 1)f(x_2; 1)} \le \frac{1}{2}.$$
Find the size of this test and the power of the test.

✍ Sol.
$$\frac{f(x_1;2)f(x_2;2)}{f(x_1;1)f(x_2;1)} = \frac{(1/4)e^{-(x_1+x_2)/2}}{e^{-(x_1+x_2)}} = \frac{1}{4}e^{(x_1+x_2)/2} \le \frac{1}{2} \iff e^{(x_1+x_2)/2} \le 2 \iff x_1 + x_2 \le 2\ln 2.$$
Therefore, the size of the test is
$$P(X_1 + X_2 \le 2\ln 2 \mid \theta = 2) = 1 - e^{-\ln 2} - \ln 2\,e^{-\ln 2} = \frac{1}{2} - \frac{1}{2}\ln 2,$$
and the power of the test is
$$P(X_1 + X_2 \le 2\ln 2 \mid \theta = 1) = 1 - e^{-2\ln 2} - 2\ln 2\,e^{-2\ln 2} = \frac{3}{4} - \frac{1}{2}\ln 2.$$

✑ Theorem: (The Neyman-Pearson Fundamental Lemma) ✯✯✯✯✯ Let $\{f_\theta, \theta \in \Omega\}$, where $\Omega = \{\theta_0, \theta_1\}$, be a family of possible distributions of $X$. Test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta_1$ at level $\alpha$. Consider the conditions
$$E_{\theta_0}[\varphi(X)] = \alpha \tag{1}$$
and
$$\varphi(x) = \begin{cases}1, & \text{if } f_{\theta_1}(x) > kf_{\theta_0}(x)\\ \gamma(x), & \text{if } f_{\theta_1}(x) = kf_{\theta_0}(x)\\ 0, & \text{if } f_{\theta_1}(x) < kf_{\theta_0}(x)\end{cases} \tag{2}$$
for some $k \ge 0$ and $0 \le \gamma(x) \le 1$.
(i) Sufficient condition for an MP test (or UMP test): if a test satisfies (1) and (2), then it is an MP test.
(ii) Necessary condition for an MP test: if $\varphi^*$ is an MP test, then for some $k$ it satisfies (2).

✍ Proof:
(i) In the continuous case, suppose that $\varphi$ is a test satisfying (1) and (2) and that $\varphi^*$ is any other test with $E_{\theta_0}[\varphi^*(X)] \le \alpha$. Denote $S^+ = \{x : \varphi(x) - \varphi^*(x) > 0\}$ and $S^- = \{x : \varphi(x) - \varphi^*(x) < 0\}$.
∵ $\displaystyle\int\!\cdots\!\int[\varphi(x)-\varphi^*(x)][f_{\theta_1}(x) - kf_{\theta_0}(x)]\,dx = \int\!\cdots\!\int_{S^+\cup S^-}[\varphi(x)-\varphi^*(x)][f_{\theta_1}(x) - kf_{\theta_0}(x)]\,dx \ge 0$
(note that $f_{\theta_1}(x) - kf_{\theta_0}(x) > 0$ when $x \in S^+$ and $f_{\theta_1}(x) - kf_{\theta_0}(x) < 0$ when $x \in S^-$),
∴ $\displaystyle\int\!\cdots\!\int[\varphi(x)-\varphi^*(x)]f_{\theta_1}(x)\,dx \ \ge\ k\int\!\cdots\!\int[\varphi(x)-\varphi^*(x)]f_{\theta_0}(x)\,dx \ \ge\ 0$
(note that $E_{\theta_0}[\varphi^*(X)] \le \alpha$ and $E_{\theta_0}[\varphi(X)] = \alpha$ imply $E_{\theta_0}[\varphi(X) - \varphi^*(X)] \ge 0$)
⇒ $E_{\theta_1}[\varphi(X)] \ge E_{\theta_1}[\varphi^*(X)]$.
(ii) In the continuous case, let $\varphi^*$ be an MP test at level $\alpha$ and let $\varphi$ satisfy (1) and (2). Let $S = (S^+\cup S^-)\cap A$, where $S^+$ and $S^-$ are defined in (i) and $A = \{x : f_{\theta_1}(x) \ne kf_{\theta_0}(x)\}$. Assume that $P(S) > 0$; then
$$\int\!\cdots\!\int_S[\varphi(x)-\varphi^*(x)][f_{\theta_1}(x) - kf_{\theta_0}(x)]\,dx > 0,$$
so $E_{\theta_1}[\varphi(X)] > E_{\theta_1}[\varphi^*(X)]$. That is a contradiction, and therefore $P(S) = 0$, i.e., $P(\varphi(X) = \varphi^*(X)) = 1$.

☞ EX. Let $X \sim N(0, 1)$ under $H_0$ and $X \sim$ Cauchy distribution under $H_1$. Find an MP test with size $\alpha \le 0.1$ of $H_0$ against $H_1$.

✍ Sol.
$$\frac{f_1(x)}{f_0(x)} = \frac{\dfrac{1}{\pi(1+x^2)}}{\dfrac{1}{\sqrt{2\pi}}e^{-x^2/2}} = \sqrt{\frac{2}{\pi}}\,\frac{e^{x^2/2}}{1+x^2}.$$
Using the Neyman-Pearson Lemma, the MP test is of the form
$$\varphi(x) = \begin{cases}1, & \text{if } \sqrt{\dfrac{2}{\pi}}\dfrac{e^{x^2/2}}{1+x^2} > k\\[4pt] 0, & \text{o.w.}\end{cases}$$
where $k$ is determined so that $E_0[\varphi(X)] = \alpha$.

[Figure: plot of $y = e^{x^2/2}/(1+x^2)$ for $x \in (-2, 2)$; the horizontal threshold $y = 1.044$ is attained at $x = \pm 1.645$.]

∵ The size $\alpha \le 0.1$. ∴ The test function can be rewritten as
$$\varphi(x) = \begin{cases}1, & \text{if } |x| > k_1\\ 0, & \text{o.w.}\end{cases} \quad\text{(intuition?)}$$
where $k_1$ is determined from $\displaystyle\int_{-k_1}^{k_1}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx = 1 - \alpha$. It follows that $k_1 = Z_{\alpha/2}$. The power of the test is given by
$$E[\varphi(X)\mid H_1] = 1 - \int_{-k_1}^{k_1}\frac{1}{\pi(1+x^2)}dx = 1 - \frac{2}{\pi}\tan^{-1}k_1 = 1 - \frac{2}{\pi}\tan^{-1}Z_{\alpha/2}.$$
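For the normal-versus-Cauchy example, the sketch below (an added illustration, assuming scipy's norm and cauchy distributions) evaluates the cutoff $k_1 = Z_{\alpha/2}$, the corresponding likelihood-ratio threshold $k$, and the size and power for $\alpha = 0.1$.

```python
import numpy as np
from scipy.stats import norm, cauchy

alpha = 0.10
k1 = norm.ppf(1 - alpha/2)                 # rejection region |x| > k1 under the N(0,1) null

def ratio(x):
    # Neyman-Pearson likelihood ratio f1/f0 = sqrt(2/pi) * exp(x^2/2) / (1 + x^2)
    return np.sqrt(2/np.pi) * np.exp(x**2/2) / (1 + x**2)

print("k1 =", k1)                          # about 1.645
print("likelihood-ratio threshold k =", ratio(k1))
print("size  :", 2*(1 - norm.cdf(k1)))     # equals alpha
print("power :", 1 - (cauchy.cdf(k1) - cauchy.cdf(-k1)))
print("check :", 1 - 2/np.pi*np.arctan(k1))   # 1 - (2/pi) arctan(Z_{alpha/2})
```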
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim}$ Bernoulli$(p)$, and let $H_0 : p = p_0$, $H_1 : p = p_1$ with $p_1 > p_0$. Find the MP size $\alpha$ test of $H_0$ against $H_1$.

✍ Sol. Using the Neyman-Pearson Lemma, the MP size $\alpha$ test of $H_0$ vs. $H_1$ is of the form
$$\varphi(x) = \begin{cases}1, & \lambda(x) > k\\ \gamma, & \lambda(x) = k\\ 0, & \lambda(x) < k\end{cases} \qquad\text{where}\quad \lambda(x) = \frac{p_1^{\sum x_i}(1-p_1)^{n-\sum x_i}}{p_0^{\sum x_i}(1-p_0)^{n-\sum x_i}},$$
and $k$ and $\gamma$ are determined from $E_{p_0}[\varphi(X)] = \alpha$.
∵ $\lambda(x) = \left[\dfrac{p_1}{p_0}\times\dfrac{1-p_0}{1-p_1}\right]^{\sum x_i}\times\underbrace{\left[\dfrac{1-p_1}{1-p_0}\right]^n}_{\text{does not depend on }x}$ and $p_1 > p_0$,
∴ $\lambda(x)$ is an increasing function of $\sum x_i$. Thus the MP size $\alpha$ test is of the form
$$\varphi(x) = \begin{cases}1, & \sum x_i > k_1\\ \gamma, & \sum x_i = k_1\\ 0, & \text{o.w.}\end{cases} \quad\text{(intuition?)}$$
Here $k_1$ and $\gamma$ are determined from
$$\alpha = E_{p_0}[\varphi(X)] = P_{p_0}\left(\sum_1^n X_i > k_1\right) + \gamma P_{p_0}\left(\sum_1^n X_i = k_1\right).$$

✇ Note: This MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$; that is, it remains an MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP test of $p = p_0$ against $p > p_0$. For the same example, in particular, let $n = 5$, $p_0 = 1/2$, $p_1 = 3/4$, and $\alpha = 0.05$. Then the MP test is given by
$$\varphi(x) = \begin{cases}1, & \sum x_i > k\\ \gamma, & \sum x_i = k\\ 0, & \sum x_i < k\end{cases}$$
where $k$ and $\gamma$ are determined from
$$0.05 = \alpha = \sum_{x=k+1}^5\binom{5}{x}\left(\frac{1}{2}\right)^5 + \gamma\binom{5}{k}\left(\frac{1}{2}\right)^5.$$
It follows that $k = 4$ and $\gamma = 0.12$, i.e.,
$$\varphi(x) = \begin{cases}1, & \sum_1^5 x_i > 4\\ 0.12, & \sum_1^5 x_i = 4\\ 0, & \text{o.w.}\end{cases}$$

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. One wishes to test $H_0 : \mu = \mu_0, \sigma^2 = \sigma_0^2$ against $H_1 : \mu = \mu_1, \sigma^2 = \sigma_0^2$. Find an MP size $\alpha$ test of $H_0$ vs. $H_1$.

✍ Sol. The Neyman-Pearson lemma leads to the following MP test:
$$\varphi(x) = \begin{cases}1, & \text{if } \lambda(x) > k\\ 0, & \text{if } \lambda(x) < k\end{cases} \qquad\text{where}\quad \lambda(x) = \frac{\exp\left\{-\sum(x_i-\mu_1)^2/(2\sigma_0^2)\right\}}{\exp\left\{-\sum(x_i-\mu_0)^2/(2\sigma_0^2)\right\}}$$
and $k$ is determined from $E_{\mu_0,\sigma_0^2}[\varphi(X)] = \alpha$. $\lambda(x)$ can be simplified to
$$\lambda(x) = \exp\left\{\frac{\sum_1^n x_i}{\sigma_0^2}(\mu_1-\mu_0) + \frac{n}{2\sigma_0^2}(\mu_0^2-\mu_1^2)\right\}.$$
If $\mu_1 > \mu_0$, then $\lambda(x) > k \iff \sum_1^n x_i > k_1$, where $k_1$ is determined from
$$\alpha = P_{\mu_0,\sigma_0^2}\left(\sum_1^n X_i > k_1\right) = P\left(Z > \frac{k_1 - n\mu_0}{\sqrt{n}\sigma_0}\right), \quad Z \sim N(0, 1),$$
giving $k_1 = Z_\alpha\sqrt{n}\sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly; the test function is
$$\varphi(x) = \begin{cases}1, & \text{if } \sum_1^n x_i < n\mu_0 - Z_\alpha\sqrt{n}\sigma_0\\ 0, & \text{o.w.}\end{cases}$$

✇ Note:
(1) If $\sigma_0$ is known, the test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$ (or $\mu_1 < \mu_0$), and it follows that the test is UMP against $H_1' : \mu > \mu_0, \sigma^2 = \sigma_0^2$ (or $H_1' : \mu < \mu_0, \sigma^2 = \sigma_0^2$).
(2) If $\sigma_0$ is not known, that is, the null hypothesis is the composite hypothesis $H_0'' : \mu = \mu_0, \sigma^2 > 0$ to be tested against the alternative $H_1'' : \mu = \mu_1, \sigma^2 > 0$ (with $\mu_1 > \mu_0$), then the MP test determined above depends on $\sigma^2$. In other words, an MP test against the alternative $H_1''' : \mu = \mu_1, \sigma^2 = \sigma_0^2$ will not be an MP test against $H_1^{(4)} : \mu = \mu_1, \sigma^2 = \sigma_1^2$ where $\sigma_1^2 \ne \sigma_0^2$.

✎ HW. §8.1 — 2, 4∼10.

§8.2 Uniformly Most Powerful Tests

✇ Note: In general, to test $H_0 : \theta \le \theta_0$ vs. $H_1 : \theta > \theta_0$, or its dual $H_0' : \theta \ge \theta_0$ vs. $H_1' : \theta < \theta_0$, it is not possible to find a UMP test. But we can consider a special class of distributions, large enough to include the one-parameter exponential family, for which a UMP test of a one-sided hypothesis exists.

✒ Def. Let $\{f_\theta, \theta \in \Omega\}$ be a family of pdf's, $\Omega \subseteq \mathbb{R}^1$ (one parameter). We say that $\{f_\theta\}$ has a monotone likelihood ratio (MLR) in the statistic $T(X)$ if for $\theta_1 < \theta_2$, whenever $f_{\theta_1}$ and $f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing function of $T(x)$ for $x \in \{x : f_{\theta_1}(x) > 0 \text{ or } f_{\theta_2}(x) > 0\}$.
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} U(0, \theta)$, $\theta > 0$. The joint pdf of $X_1, \dots, X_n$ is
$$f_\theta(x) = \frac{1}{\theta^n}, \quad 0 \le \max\{x_1, \dots, x_n\} \le \theta.$$
Let $\theta_1 > \theta_2$ and consider the ratio
$$\frac{f_{\theta_1}(x)}{f_{\theta_2}(x)} = \frac{(1/\theta_1^n)\,I\{0 \le \max\{x_1,\dots,x_n\} \le \theta_1\}}{(1/\theta_2^n)\,I\{0 \le \max\{x_1,\dots,x_n\} \le \theta_2\}} = \left(\frac{\theta_2}{\theta_1}\right)^n\frac{I(0 \le x_{(n)} \le \theta_1)}{I(0 \le x_{(n)} \le \theta_2)}.$$
Define
$$R(x) = \frac{I(0 \le x_{(n)} \le \theta_1)}{I(0 \le x_{(n)} \le \theta_2)} = \begin{cases}1, & x_{(n)} \in [0, \theta_2]\\ \infty, & x_{(n)} \in (\theta_2, \theta_1].\end{cases}$$
It follows that $f_{\theta_1}/f_{\theta_2}$ is a nondecreasing function of $x_{(n)}$, and the family of $U(0, \theta)$ distributions has an MLR in $x_{(n)}$.

✑ Theorem: The one-parameter exponential family $f_\theta(x) = \exp\{Q(\theta)T(x) + S(x) + D(\theta)\}$, where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.

✍ Proof: For $\theta_2 > \theta_1$,
$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \exp\{T(x)[Q(\theta_2) - Q(\theta_1)] + D(\theta_2) - D(\theta_1)\}.$$
∵ $Q(\theta)$ is nondecreasing in $\theta$. ∴ The ratio $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing function of $T(x)$.

✇ Note: We have already seen that $U(0, \theta)$, which is not an exponential family, has an MLR.

✑ Theorem: Let $X \sim f_\theta$, $\theta \in \Omega \subseteq \mathbb{R}$, where $\{f_\theta\}$ has an MLR in $T(x)$. For testing $H_0 : \theta \le \theta_0$ vs. $H_1 : \theta > \theta_0$, $\theta_0 \in \Omega$, any test of the form
$$\varphi(x) = \begin{cases}1, & \text{if } T(x) > t_0\\ \gamma, & \text{if } T(x) = t_0\\ 0, & \text{if } T(x) < t_0\end{cases}$$
has a nondecreasing power function (intuition?) and is UMP of its size $E_{\theta_0}[\varphi(X)] = \alpha$. Moreover, for every $0 \le \alpha \le 1$ and every $\theta_0 \in \Omega$, there exist a $t_0$, $-\infty \le t_0 \le \infty$, and $\gamma \in [0, 1]$ such that the test described above is the UMP size $\alpha$ test of $H_0$ against $H_1$.

➳ Remark: By interchanging inequalities throughout the above theorem, we see that this theorem also provides a solution of the dual problem $H_0' : \theta \ge \theta_0$ against $H_1' : \theta < \theta_0$.

☞ EX. Let $X$ have the pmf
$$P_M(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad \max\{0, M+n-N\} \le x \le \min\{M, n\},$$
where $M$ is an unknown positive integer. Find a UMP size $\alpha$ test of $H_0 : M \le M_0$ vs. $H_1 : M > M_0$, if it exists.

✍ Sol. Is this distribution a one-parameter exponential family? (No!) Because the ratio
$$R(x) = \frac{P_{M+1}(X = x)}{P_M(X = x)} = \frac{M+1}{N-M}\times\frac{N-M-n+x}{M+1-x},$$
we see that $\{P_M\}$ has an MLR in $x$, i.e., the ratio is a nondecreasing function of $x$. Note that $(M+1)/(N-M)$ in $R(x)$ is independent of $x$. It follows that there exists a UMP test of $H_0 : M \le M_0$ vs. $H_1 : M > M_0$, which rejects $H_0$ when $X$ is too large; i.e., the UMP size $\alpha$ test is given by
$$\varphi(x) = \begin{cases}1, & x > k\\ \gamma, & x = k\\ 0, & x < k\end{cases}$$
where $k$ and $\gamma$ are determined from $E_{M_0}[\varphi(X)] = \alpha$.

☞ EX. Let $X_1, \dots, X_{25} \overset{iid}{\sim} N(\theta, 100)$. Find a UMP size $\alpha = 0.1$ test for $H_0 : \theta = 75$ against $H_1 : \theta > 75$.

✍ Sol. ∵ $\{N(\theta, 100), \theta \in \mathbb{R}\}$ is a one-parameter exponential family [HW]. ∴ $\{N(\theta, 100), \theta \in \mathbb{R}\}$ has an MLR in $\sum_1^{25}X_i$ (or $\bar{X}$). It follows that the UMP test function is given by $\varphi(x) = 1$ if $\bar{x} > k$ and $= 0$ otherwise, where $k$ is determined from $P(\bar{X} > k \mid \theta = 75) = 0.1$.
∵ $\bar{X} \sim N(75, 4)$ under $H_0$. ∴ $Z_{0.1} = (k-75)/2$. Note that for $Z \sim N(0, 1)$, $P(Z > Z_{0.1}) = 0.1$ with $Z_{0.1} = 1.28$. ⇒ $k = 77.56$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\theta, 16)$. Find the sample size $n$ and a UMP test of $H_0 : \theta = 25$ vs. $H_1 : \theta < 25$ with power function $K(\theta)$ such that, approximately, $K(25) = 0.1$ and $K(23) = 0.9$.

✍ Sol. ∵ $\{N(\theta, 16), \theta \in \mathbb{R}\}$ is a one-parameter exponential family. ∴ This family has an MLR in $\bar{X}$ (or $\sum X_i$). Then we have the UMP test $\varphi(x) = 1$ if $\bar{x} < k$ and $0$ otherwise.
∵ $K(25) = 0.1$ and $K(23) = 0.9$, i.e.,
$$P\left(Z < \frac{k-25}{4/\sqrt{n}}\right) = 0.1 \quad\text{and}\quad P\left(Z < \frac{k-23}{4/\sqrt{n}}\right) = 0.9,$$
it follows that
$$-Z_{0.1} = Z_{0.9} = \frac{k-25}{4/\sqrt{n}} \ \ \text{①} \qquad\text{and}\qquad Z_{0.1} = \frac{k-23}{4/\sqrt{n}} \ \ \text{②}.$$
Dividing ① by ② gives $-1 = \dfrac{k-25}{k-23}$, so $k = 24$. Substituting $k = 24$ into ② gives $n \approx 26.2$; we can take $n$ to be 26 or 27. Accordingly, the test function of this UMP test is $\varphi(x) = 1$ if $\bar{x} < 24$ and $0$ otherwise.
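The design equations of the last example can also be solved numerically. The following sketch is an added illustration (scipy assumed): it recovers $k = 24$ and $n \approx 26$, then reports the attained $K(25)$ and $K(23)$ for $n = 26$ and $n = 27$.

```python
from scipy.stats import norm

# Solve for k and n: K(25) = 0.1 and K(23) = 0.9 for the test that rejects when
# x_bar < k, with sigma = 4.
z01 = norm.ppf(0.90)                     # Z_{0.1}, about 1.2816
k = 24.0                                 # from -(k - 25) = (k - 23)
n = (z01 * 4.0 / (k - 23.0))**2          # from (k - 23)/(4/sqrt(n)) = Z_{0.1}
print("k =", k, " n ~", n, "-> use n =", round(n))

for n_use in (26, 27):
    K25 = norm.cdf((k - 25) / (4 / n_use**0.5))
    K23 = norm.cdf((k - 23) / (4 / n_use**0.5))
    print(f"n={n_use}: K(25)={K25:.3f}, K(23)={K23:.3f}")
```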
☞ EX. Let $X_1, X_2 \overset{iid}{\sim} U(\mu, \mu+1)$. For testing $H_0 : \mu = 0$ vs. $H_1 : \mu > 0$, we have two competing tests:
① $\varphi_1(x_1)$: Reject $H_0$ if $x_1 > 0.95$;
② $\varphi_2(x_1, x_2)$: Reject $H_0$ if $x_1 + x_2 > c$.
(a) Find the value of $c$ so that $\varphi_2$ has the same size as $\varphi_1$.
(b) Calculate the power function of each test. Draw a well-labeled graph of each power function.

✍ Sol.
(a) Let $W = X_1 + X_2$, so the pdf of $W$ under $H_0$ is given by
$$f_W(w) = \begin{cases}w, & 0 < w < 1\\ 2-w, & 1 < w < 2.\end{cases}$$
We require $P_{\mu=0}(X_1 > 0.95) = P_{\mu=0}(W > c)$, i.e.,
$$0.05 = \int_c^2(2-w)\,dw \ \text{ if } 1 < c \le 2 \quad\Rightarrow\quad 0.05 = 2 - 2c + c^2/2 \quad\Rightarrow\quad c = 2 - 1/\sqrt{10}.$$
(b) The power function of the first test is
$$E_\mu[\varphi_1(X_1)] = P_\mu(X_1 > 0.95) = \begin{cases}0, & \mu < -0.05\\ \displaystyle\int_{0.95}^{\mu+1}dx = \mu + 0.05, & -0.05 \le \mu < 0.95\\ 1, & \mu \ge 0.95.\end{cases}$$
For the general case, the pdf of $W$ is
$$f_W(w) = \begin{cases}w - 2\mu, & 2\mu < w < 2\mu+1\\ 2\mu+2-w, & 2\mu+1 < w < 2\mu+2.\end{cases}$$
(Let $W = X_1 + X_2$ and $Z = X_1$; then $X_1 = Z$, $X_2 = W - Z$, and $f_{W,Z}(w, z) = 1$ on $\{\mu \le z \le \mu+1, \ \mu \le w - z \le \mu+1\}$.) Hence
$$E_\mu[\varphi_2(X)] = P_\mu(X_1 + X_2 > c) = \begin{cases}0, & \mu \le \dfrac{c}{2} - 1 \ (\text{i.e., } c \ge 2\mu+2)\\[4pt] \displaystyle\int_c^{2\mu+2}(2\mu+2-w)\,dw = \frac{1}{2}(2\mu+2-c)^2, & \dfrac{c}{2}-1 < \mu \le \dfrac{c-1}{2} \ (\text{i.e., } 2\mu+1 \le c < 2\mu+2)\\[4pt] \displaystyle\int_c^{2\mu+1}(w-2\mu)\,dw + \frac{1}{2} = 1 - \frac{(c-2\mu)^2}{2}, & \dfrac{c-1}{2} < \mu \le \dfrac{c}{2} \ (\text{i.e., } 2\mu \le c < 2\mu+1)\\[4pt] 1, & \mu > \dfrac{c}{2} \ (\text{i.e., } c < 2\mu).\end{cases}$$

✑ Theorem: For the one-parameter exponential family, there exists a UMP test of the hypothesis $H_0 : \theta \le \theta_1$ or $\theta \ge \theta_2$ ($\theta_1 < \theta_2$) against $H_1 : \theta_1 < \theta < \theta_2$ that is of the form
$$\varphi(x) = \begin{cases}1, & \text{if } c_1 < T(x) < c_2\\ \gamma_i, & \text{if } T(x) = c_i, \ i = 1, 2 \ (c_1 < c_2)\\ 0, & \text{if } T(x) < c_1 \text{ or } T(x) > c_2,\end{cases}$$
where the $c$'s and the $\gamma$'s are given by $E_{\theta_1}[\varphi(X)] = E_{\theta_2}[\varphi(X)] = \alpha$.

☞ EX. Let $X_1, \dots, X_n$ be iid $N(\mu, 1)$. Find a UMP size $\alpha$ test for $H_0 : \mu \le \mu_0$ or $\mu \ge \mu_1$ ($\mu_1 > \mu_0$) against $H_1 : \mu_0 < \mu < \mu_1$.

✍ Sol. It is clear that $\{N(\mu, 1), \mu \in \mathbb{R}\}$ is a one-parameter exponential family. Then there exists the UMP test given by
$$\varphi(x) = \begin{cases}1, & \text{if } c_1 < \sum x_i < c_2\\ \gamma_i, & \text{if } \sum x_i = c_1 \text{ or } \sum x_i = c_2\\ 0, & \text{o.w.}\end{cases}$$
where $\gamma_1 = \gamma_2 = 0$ and $c_1$, $c_2$ are determined from
$$\alpha = P_{\mu_0}\left(c_1 < \sum X_i < c_2\right) = P_{\mu_1}\left(c_1 < \sum X_i < c_2\right).$$
Thus,
$$\alpha = P_{\mu_0}\left(\frac{c_1 - n\mu_0}{\sqrt{n}} < Z < \frac{c_2 - n\mu_0}{\sqrt{n}}\right) = P_{\mu_1}\left(\frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}}\right).$$
Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$.

✇ Note: UMP two-sided tests for $H_0 : \theta_1 \le \theta \le \theta_2$ and for $H_0' : \theta = \theta_0$ for the one-parameter exponential family do not exist.

✦ Counter Example: Let $X_1, \dots, X_n \overset{iid}{\sim} U(0, \theta)$. To test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta \ne \theta_0$, a UMP test does exist (recall that this family is not a one-parameter exponential family).
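The two power functions in the $U(\mu, \mu+1)$ example above can be tabulated directly from the piecewise formulas, which is handy for drawing the requested graphs. A minimal added sketch, with arbitrarily chosen values of $\mu$:

```python
import numpy as np

c = 2 - 1/np.sqrt(10)                    # size-0.05 cutoff for the test based on x1 + x2

def power_phi1(mu):
    # power of phi_1: reject when X1 > 0.95, with X1 ~ U(mu, mu + 1)
    return float(np.clip(mu + 0.05, 0.0, 1.0))

def power_phi2(mu):
    # piecewise power of phi_2: reject when X1 + X2 > c (triangular density of the sum)
    if mu <= c/2 - 1:
        return 0.0
    if mu <= (c - 1)/2:
        return 0.5*(2*mu + 2 - c)**2
    if mu <= c/2:
        return 1.0 - 0.5*(c - 2*mu)**2
    return 1.0

for mu in (0.0, 0.1, 0.3, 0.5, 0.7, 0.95):
    print(f"mu={mu:4.2f}  phi1: {power_phi1(mu):.3f}   phi2: {power_phi2(mu):.3f}")
```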
☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(0, \sigma^2)$. Since $\{N(0, \sigma^2) : \sigma^2 > 0\}$ has an MLR in $\sum_1^n X_i^2$, it follows that UMP tests exist for the one-sided hypotheses $\sigma \ge \sigma_0$ and $\sigma \le \sigma_0$. Consider now the null hypothesis $H_0 : \sigma = \sigma_0$ vs. $H_1 : \sigma \ne \sigma_0$. We will show that a UMP test of $H_0 : \sigma = \sigma_0$ does not exist. For testing $\sigma = \sigma_0$ against $\sigma > \sigma_0$, a test of the form
$$\varphi_1(x) = \begin{cases}1, & \text{if } \sum_1^n x_i^2 > c_1\\ 0, & \text{otherwise}\end{cases}$$
is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form
$$\varphi_2(x) = \begin{cases}1, & \text{if } \sum_1^n x_i^2 < c_2\\ 0, & \text{o.w.}\end{cases}$$
is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2\chi^2_{n,\alpha}$ and $c_2 = \sigma_0^2\chi^2_{n,1-\alpha}$, where $P(W \ge \chi^2_{n,\alpha}) = \alpha$ and $W$ is distributed as a chi-square distribution with $n$ degrees of freedom. Clearly, neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0 : \sigma = \sigma_0$ vs. $H_1 : \sigma \ne \sigma_0$.

✎ HW. §8.2 — 1∼4, 7∼9, 11, 13.

§8.3 Likelihood Ratio Tests

✒ Def. For testing $H_0 : \theta \in \Omega_0$ against $H_1 : \theta \in \Omega_1$, a test of the form "reject $H_0$ if and only if $\lambda(x) < c$", where $c$ is some constant and
$$\lambda(x) = \frac{\sup_{\theta\in\Omega_0}f_\theta(x)}{\sup_{\theta\in\Omega}f_\theta(x)},$$
with $\theta \in \Omega$ and $X$ a random vector with pdf $f_\theta$, is called a likelihood ratio test (LR test).

➳ Remark: It is clear that $0 \le \lambda \le 1$. The constant $c$ is determined from the size restriction
$$\sup_{\theta\in\Omega_0}P_\theta\big(X : \lambda(X) < c\big) = \alpha.$$

☞ EX. Let $X \sim b(n, p)$. Find a size $\alpha$ likelihood ratio test of $H_0 : p \le p_0$ against $H_1 : p > p_0$, $p_0 \in (0, 1)$.

✍ Sol. Let
$$\lambda(x) = \frac{\sup_{p\le p_0}\binom{n}{x}p^x(1-p)^{n-x}}{\sup_{0\le p\le 1}\binom{n}{x}p^x(1-p)^{n-x}}.$$
It is clear that $\sup_{p\in[0,1]}p^x(1-p)^{n-x} = \left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x}$ and
$$\sup_{p\le p_0}p^x(1-p)^{n-x} = \begin{cases}p_0^x(1-p_0)^{n-x}, & \text{if } p_0 < \dfrac{x}{n}\\[4pt] \left(\dfrac{x}{n}\right)^x\left(1-\dfrac{x}{n}\right)^{n-x}, & \text{if } \dfrac{x}{n} \le p_0.\end{cases}$$
It follows that
$$\lambda(x) = \begin{cases}\dfrac{p_0^x(1-p_0)^{n-x}}{\left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x}}, & \text{if } p_0 < \dfrac{x}{n}\\[6pt] 1, & \text{if } p_0 \ge \dfrac{x}{n}.\end{cases}$$
Note that $\lambda(x) < 1$ for $np_0 < x$ and $\lambda(x) = 1$ for $x \le np_0$, and it follows that $\lambda(x)$ is a decreasing function of $x$. Thus we have a test function defined by
$$\varphi(x) = \begin{cases}1, & x > c\\ \gamma, & x = c\\ 0, & x < c,\end{cases}$$
where $\gamma$ and $c$ can be determined by $E_{p_0}[\varphi(X)] = \alpha$.

☞ EX. Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Find a size $\alpha$ likelihood ratio test of $H_0 : \mu = \mu_0$ against $H_1 : \mu \ne \mu_0$.

✍ Sol. Let $\theta = (\mu, \sigma^2)$, $\Omega_0 = \{\theta : \mu = \mu_0, \sigma^2 > 0\}$, and $\Omega = \{\theta : \mu \in \mathbb{R}, \sigma^2 > 0\}$.
$$\sup_{\theta\in\Omega_0}f_\theta(x) = \sup_{\sigma^2>0}\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n\exp\left\{-\frac{\sum(x_i-\mu_0)^2}{2\sigma^2}\right\} = f_{\mu_0,\hat{\sigma}_0^2}(x), \quad\text{where the MLE under }\Omega_0\text{ is } \hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^n(x_i-\mu_0)^2.$$
Thus
$$\sup_{\theta\in\Omega_0}f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left\{\sum_1^n(x_i-\mu_0)^2\right\}^{n/2}}.$$
The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\left(\bar{x}, \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right)$, so
$$\sup_{\theta\in\Omega}f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left\{\sum_1^n(x_i-\bar{x})^2\right\}^{n/2}}.$$
Therefore, the likelihood ratio can be simplified to
$$\lambda(x) = \left[\frac{\sum_1^n(x_i-\bar{x})^2}{\sum_1^n(x_i-\mu_0)^2}\right]^{n/2} = \left[\frac{1}{1 + n(\bar{x}-\mu_0)^2/\sum_1^n(x_i-\bar{x})^2}\right]^{n/2},$$
using $\sum_1^n(x_i-\mu_0)^2 = \sum_1^n(x_i-\bar{x}+\bar{x}-\mu_0)^2 = \sum_1^n(x_i-\bar{x})^2 + n(\bar{x}-\mu_0)^2$.
∵ $\lambda(x)$ is a decreasing function of $n(\bar{x}-\mu_0)^2/\sum_1^n(x_i-\bar{x})^2$.
∴ We reject $H_0$ if $\left|\dfrac{\sqrt{n}(\bar{x}-\mu_0)}{s}\right| > c$, where $s^2 = (n-1)^{-1}\sum_1^n(x_i-\bar{x})^2$.
Note that the statistic
$$t(X) = \frac{\sqrt{n}(\bar{X}-\mu_0)}{S} = \frac{(\bar{X}-\mu_0)/(\sigma/\sqrt{n})}{\left[\dfrac{(n-1)S^2/\sigma^2}{n-1}\right]^{1/2}}$$
has a t-distribution with $n-1$ degrees of freedom under $H_0 : \mu = \mu_0$. A test function can be defined by
$$\varphi(x) = \begin{cases}1, & |t| > t_{n-1,\alpha/2}\\ 0, & \text{otherwise.}\end{cases}$$

✒ Def. Let the random variable $W$ be $N(\delta, 1)$ and the random variable $V$ be $\chi^2(r)$; moreover, $W$ and $V$ are assumed to be independent. The quotient $T = W/\sqrt{V/r}$ is said to have a noncentral t-distribution with $r$ degrees of freedom and noncentrality parameter $\delta$.

☞ EX. In the previous example, the statistic $t(X)$ has a noncentral t-distribution with $n-1$ d.f. and noncentrality parameter $\delta = (\mu-\mu_0)/(\sigma/\sqrt{n})$ under $H_1$.

☞ EX. Let $X_1, X_2, \dots, X_m$ and $Y_1, Y_2, \dots, Y_n$ be independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Find a size $\alpha$ likelihood ratio test of $H_0 : \sigma_1^2 = \sigma_2^2$ against $H_1 : \sigma_1^2 \ne \sigma_2^2$ if $\mu_1$ and $\mu_2$ are both unknown.
✍ Sol. Let $\Omega = \{\theta : \mu_i \in \mathbb{R}, \sigma_i^2 > 0, i = 1, 2\}$, $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$, and $\Omega_0 = \{\theta : \mu_i \in \mathbb{R}, i = 1, 2, \ \sigma_1^2 = \sigma_2^2 > 0\}$. The joint pdf of $X$ and $Y$ is
$$f_\theta(x, y) = \frac{1}{(2\pi)^{(m+n)/2}\sigma_1^m\sigma_2^n}\exp\left\{-\frac{1}{2\sigma_1^2}\sum_1^m(x_i-\mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_1^n(y_i-\mu_2)^2\right\}.$$
The MLEs of $\theta$ on $\Omega$ are $\hat{\mu}_1 = \bar{x}$, $\hat{\mu}_2 = \bar{y}$, $\hat{\sigma}_1^2 = \frac{1}{m}\sum_1^m(x_i-\bar{x})^2$, and $\hat{\sigma}_2^2 = \frac{1}{n}\sum_1^n(y_i-\bar{y})^2$. If the joint pdf is restricted to $\Omega_0$, then the MLEs are $\hat{\hat{\mu}}_1 = \bar{x}$, $\hat{\hat{\mu}}_2 = \bar{y}$, and, with $\sigma^2 = \sigma_1^2 = \sigma_2^2$,
$$\hat{\hat{\sigma}}^2 = \frac{\sum_1^m(x_i-\bar{x})^2 + \sum_1^n(y_i-\bar{y})^2}{m+n}.$$
Thus
$$\sup_{\theta\in\Omega_0}f_\theta(x, y) = \frac{e^{-(m+n)/2}}{[2\pi/(m+n)]^{(m+n)/2}\left\{\sum_1^m(x_i-\bar{x})^2 + \sum_1^n(y_i-\bar{y})^2\right\}^{(m+n)/2}}$$
and
$$\sup_{\theta\in\Omega}f_\theta(x, y) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}(2\pi/n)^{n/2}\left\{\sum_1^m(x_i-\bar{x})^2\right\}^{m/2}\left\{\sum_1^n(y_i-\bar{y})^2\right\}^{n/2}},$$
so that
$$\lambda(x, y) = \frac{(m+n)^{(m+n)/2}}{m^{m/2}n^{n/2}}\cdot\frac{\left\{\sum_1^m(x_i-\bar{x})^2\right\}^{m/2}\left\{\sum_1^n(y_i-\bar{y})^2\right\}^{n/2}}{\left\{\sum_1^m(x_i-\bar{x})^2 + \sum_1^n(y_i-\bar{y})^2\right\}^{(m+n)/2}}.$$
Let $s_1^2 = \dfrac{\sum_1^m(x_i-\bar{x})^2}{m-1}$ and $s_2^2 = \dfrac{\sum_1^n(y_i-\bar{y})^2}{n-1}$. Then
$$\lambda(x, y) = \frac{(m+n)^{(m+n)/2}}{m^{m/2}n^{n/2}}\times\frac{1}{\left\{1 + \dfrac{m-1}{n-1}\dfrac{s_1^2}{s_2^2}\right\}^{n/2}\left\{1 + \dfrac{n-1}{m-1}\dfrac{s_2^2}{s_1^2}\right\}^{m/2}}.$$
Let $f = s_1^2/s_2^2$. Then $\lambda(x, y) < c$ is equivalent to $f < c_1$ or $f > c_2$ (H.W.). Under $H_0$, the statistic
$$F = \frac{\sum_1^m(X_i-\bar{X})^2/(m-1)}{\sum_1^n(Y_i-\bar{Y})^2/(n-1)}$$
has an $F(m-1, n-1)$ distribution, so that $c_1$ and $c_2$ can be selected. It is usual to take
$$P(F \le c_1) = P(F \ge c_2) = \frac{\alpha}{2}.$$
Under $H_1$, $(\sigma_2^2/\sigma_1^2)F$ has an $F(m-1, n-1)$ distribution.

☞ EX. A random sample $X_1, \dots, X_n$ is drawn from a Pareto population with pdf
$$f(x\mid\theta, \nu) = \frac{\theta\nu^\theta}{x^{\theta+1}}, \quad \nu \le x, \ \theta > 0, \ \nu > 0.$$
(i) Find the MLEs of $\theta$ and $\nu$.
(ii) Show that the LR test of $H_0 : \theta = 1$, $\nu$ unknown, against $H_1 : \theta \ne 1$, $\nu$ unknown, has a critical region of the form $C = \{x : T(x) \le c_1 \text{ or } T(x) \ge c_2\}$, where $0 < c_1 < c_2$ and $T = \ln\left[\dfrac{\prod_{i=1}^n X_i}{X_{(1)}^n}\right]$.
(iii) Show that, under $H_0$, $2T \sim \chi^2(2n-2)$.

✍ Sol.
(i) $\hat{\nu} = X_{(1)}$ and $\hat{\theta} = n\Big/\ln\left[\dfrac{\prod_{i=1}^n X_i}{X_{(1)}^n}\right]$. (H.W.)
(ii) (H.W.)
(iii)
$$2T = 2\ln\left[\prod_1^n X_i\right] - 2n\ln X_{(1)} \quad\Rightarrow\quad 2T + 2n\ln\left(X_{(1)}/\nu\right) = 2\ln\left[\prod_1^n(X_i/\nu)\right] = 2\sum_1^n\ln(X_i/\nu),$$
using $T = \ln\left[\prod_1^n X_i/X_{(1)}^n\right]$. Let $Y_i = X_i/\nu$; under $H_0 : \theta = 1$ the pdf of $Y_i$ is $f_Y(y) = y^{-2}$, $1 < y < \infty$. Thus $T$ is an ancillary statistic. Under $H_0 : \theta = 1$, $X_{(1)}$ is a complete and sufficient statistic, so Basu's theorem gives that $X_{(1)}$ and $T$ are independent. It follows that $\phi_1(t)\phi_2(t) = \phi_3(t)$, where $\phi_1$, $\phi_2$, and $\phi_3$ are the mgf's of $2T$, $2n\ln\left(X_{(1)}/\nu\right)$, and $2\ln\left[\prod_1^n(X_i/\nu)\right]$, respectively.
∵ $2\ln Y_i \overset{iid}{\sim}\chi^2(2)$, $i = 1, 2, \dots, n$, and $2n\ln\left(X_{(1)}/\nu\right) = 2n\ln Y_{(1)} \sim \chi^2(2)$,
∴ $\phi_1(t)(1-2t)^{-1} = (1-2t)^{-n}$ ⇒ $\phi_1(t) = (1-2t)^{-(n-1)}$ ⇒ $2T \sim \chi^2(2n-2)$.

✑ Theorem: Under some regularity conditions on $f_\theta(x)$, the random variable $-2\ln\lambda(X)$ is asymptotically distributed as a chi-square r.v. with degrees of freedom equal to $\dim(\Omega) - \dim(\Omega_0)$.

✇ Note: These conditions are mainly concerned with the existence and behavior of the derivatives (with respect to the parameter) of the likelihood function, and with the support of the distribution (it cannot depend on the parameter).

✎ HW. §8.3 — 7, 9∼12.
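Part (iii) of the Pareto example can be checked by Monte Carlo: under $H_0$ the statistic $2T$ should behave like a $\chi^2(2n-2)$ variable. The sketch below is an added illustration assuming numpy/scipy; $n$, $\nu$, and the replication count are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2, kstest

# Under H0 (theta = 1), 2T should follow chi2(2n - 2).
rng = np.random.default_rng(7)
n, nu, reps = 8, 3.0, 50_000
u = rng.uniform(size=(reps, n))
x = nu / u                                   # inverse-cdf draw from Pareto(theta = 1, nu)
T = np.log(x).sum(axis=1) - n*np.log(x.min(axis=1))

df = 2*n - 2
print("mean of 2T:", (2*T).mean(), " chi2 mean:", df)
print("var  of 2T:", (2*T).var(),  " chi2 var :", 2*df)
print("KS test vs chi2(2n-2):", kstest(2*T, chi2(df).cdf))
```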