Chapter 5: Properties of a Random Sample

1 Basic Concepts of a Random Sample
Def: The random variables X_1, ..., X_n are called a random sample of size n
from the population f(x) if X_1, ..., X_n are mutually independent random
variables and the marginal pdf or pmf of each X_i is the same function f(x).
They are also called independent and identically distributed (iid) random
variables with pdf or pmf f(x).

The joint pdf or pmf of the iid random variables (X_1, ..., X_n) is given by

    f(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^n f(x_i).
Example: Assume X_1, ..., X_n is an iid sample from exp(β).
(1) Find the joint distribution of the sample.
(2) Compute P(X_1 > 2, ..., X_n > 2).
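A sketch of a solution, assuming the scale parameterization f(x) = (1/β)e^{−x/β},
x > 0 (the convention consistent with Gamma(α, β) later in this chapter): by
independence the joint pdf factors,

    f(x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{\beta} e^{-x_i/\beta}
        = \beta^{-n} \exp\Bigl(-\frac{1}{\beta}\sum_{i=1}^n x_i\Bigr),  x_i > 0,

and the joint tail event factors the same way,

    P(X_1 > 2, \ldots, X_n > 2) = \prod_{i=1}^n P(X_i > 2)
        = \bigl(e^{-2/\beta}\bigr)^n = e^{-2n/\beta}.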
Remarks:
• A sample drawn from a finite population of size N with replacement is iid.
• A sample drawn from a finite population of size N without replacement is
identically distributed but not independent.
2 Summary of a Random Sample
Def: A statistic is a real- or vector-valued function of the random sample:
Y = T(X_1, ..., X_n).
Statistics provide good summaries of the sample.
Def: The probability distribution of Y is called the sampling distribution of
Y (because this distribution is derived from the distributions of the random
variables in the random sample).
Examples of commonly used statistics:
• sample sum: T = ∑ X_i
• sample mean: X̄ = (1/n) ∑ X_i
• sample variance: S^2 = \frac{1}{n-1} ∑ (X_i − X̄)^2
• sample standard deviation: S = \sqrt{S^2}
• minimum of the sample: T = min(X_1, ..., X_n)
• maximum of the sample: T = max(X_1, ..., X_n)
• part of the sample: T = (∑ X_i, ∑ X_i^2)
Remark: A statistic cannot be a function of any unknown parameter.
Example: Assume X_1, ..., X_n is an iid sample from exp(β). Let T =
min(X_1, ..., X_n). Find the sampling distribution of T.
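A sketch, again assuming f(x) = (1/β)e^{−x/β}: for t > 0,

    P(T > t) = P(X_1 > t, \ldots, X_n > t) = \prod_{i=1}^n e^{-t/\beta} = e^{-nt/\beta},

so T ∼ exp(β/n), an exponential with scale parameter β/n.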
2.1 Sum and Mean of a Random Sample
Theorem: Let X_1, ..., X_n be a random sample from a population with
mean µ and variance σ^2. Then

(1) E(\sum_{i=1}^n X_i) = nµ and Var(\sum_{i=1}^n X_i) = nσ^2.

(2) E(X̄) = µ and Var(X̄) = σ^2/n.

(3) E(S^2) = σ^2.
Remark: We say X̄ is an unbiased estimator of µ, and S^2 is an unbiased
estimator of σ^2.
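A quick numerical check of (1)–(3) and the unbiasedness remark (a sketch using
NumPy; the exponential population and the parameter values are arbitrary
choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
beta = 3.0                      # exp(beta) population: mean beta, variance beta**2

x = rng.exponential(beta, size=(reps, n))
xbar = x.mean(axis=1)           # sample means
s2 = x.var(axis=1, ddof=1)      # sample variances (n-1 divisor)

print(xbar.mean())   # ~ beta          : E(X-bar) = mu
print(xbar.var())    # ~ beta**2 / n   : Var(X-bar) = sigma^2 / n
print(s2.mean())     # ~ beta**2       : E(S^2) = sigma^2  (unbiased)
```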
Theorem (distribution of the sum and mean of a random sample): Let
X_1, ..., X_n be iid with mgf M_X(t). Define Y = X_1 + ··· + X_n and X̄ = Y/n.
Then

    M_Y(t) = [M_X(t)]^n,
    M_{X̄}(t) = M_Y(t/n) = [M_X(t/n)]^n.
Example 1:
• X_i ∼ Binomial(k, p) iid  ⟹  \sum_{i=1}^n X_i ∼ Binomial(nk, p)
• X_i ∼ Poisson(λ) iid  ⟹  \sum_{i=1}^n X_i ∼ Poisson(nλ)
• X_i ∼ N(µ, σ^2) iid  ⟹  \sum_{i=1}^n X_i ∼ N(nµ, nσ^2)
• X_i ∼ Gamma(α, β) iid  ⟹  \sum_{i=1}^n X_i ∼ Gamma(nα, β)
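For instance, the Poisson case follows at once from the mgf theorem above:

    M_X(t) = \exp\{\lambda(e^t - 1)\}
    ⟹  M_Y(t) = [M_X(t)]^n = \exp\{n\lambda(e^t - 1)\},

which is the mgf of a Poisson(nλ) distribution.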
3 Sampling From the Normal Distribution

3.1 Sample Mean and Sample Variance
Theorem: For a random sample X_1, ..., X_n from N(µ, σ^2), let the sample mean
be X̄ = (1/n) ∑ X_i and the sample variance be S^2 = \frac{1}{n-1} ∑ (X_i − X̄)^2.
Then

(a) X̄ ⊥ S^2 (they are independent);
(b) X̄ ∼ N(µ, σ^2/n);
(c) (n − 1)S^2/σ^2 ∼ χ^2_{n−1}.
Proof:
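The proof is worked in lecture. As a numerical sanity check of (a) and (c),
here is a short simulation sketch (NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000
mu, sigma = 0.0, 2.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

# (a) X-bar and S^2 are independent, hence in particular uncorrelated:
print(np.corrcoef(xbar, s2)[0, 1])   # ~ 0

# (c) (n-1) S^2 / sigma^2 should behave like chi-square with n-1 df,
# whose mean is n-1 and whose variance is 2(n-1):
w = (n - 1) * s2 / sigma**2
print(w.mean(), w.var())             # ~ 4 and ~ 8 here
```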
Theorem: Assume X_j ∼ N(µ_j, σ_j^2), j = 1, ..., n, are independent. For
constants a_{ij} and b_{rj} (j = 1, ..., n; i = 1, ..., k; r = 1, ..., m), where
k + m ≤ n, define

    U_i = \sum_{j=1}^n a_{ij} X_j,  i = 1, ..., k,
    V_r = \sum_{j=1}^n b_{rj} X_j,  r = 1, ..., m.

(a) The random variables U_i and V_r are independent if and only if
Cov(U_i, V_r) = 0.

(b) Cov(U_i, V_r) = \sum_{j=1}^n a_{ij} b_{rj} σ_j^2.

(c) The random vectors (U_1, ..., U_k) and (V_1, ..., V_m) are independent if
and only if U_i is independent of V_r for every pair (i, r), i = 1, ..., k,
r = 1, ..., m.
Remark:
(i) Based on (a), if we start with independent normal random variables, zero
covariance and independence are equivalent for linear functions of these
random variables. Thus we can check independence of such linear functions by
merely checking the covariance term.
Example: Let X_1, ..., X_n be a random sample from the N(µ, σ^2) distribution.
Then Cov(X̄, X_j − X̄) = 0 for all j = 1, ..., n.
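The covariance computation, for reference: since Cov(X̄, X_j) =
\frac{1}{n}\sum_i Cov(X_i, X_j) = σ^2/n,

    Cov(X̄, X_j − X̄) = Cov(X̄, X_j) − Var(X̄)
        = \frac{\sigma^2}{n} − \frac{\sigma^2}{n} = 0,

and by part (a), X̄ is independent of each X_j − X̄.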
(ii) Based on part (c), pairwise independence implies vector independence
for linear functions of independent normal random variables. (This is not
true in general.)
3.2 Distributions Derived from Normal
χ^2_p distribution: the same as Gamma(p/2, 2), with pdf

    f(x) = \frac{1}{2^{p/2}\Gamma(p/2)} x^{(p/2)-1} e^{-x/2},  x > 0.

(1) If X ∼ N(0, 1), then X^2 ∼ χ^2_1.
(2) If X_1, ..., X_n are independent and X_i ∼ χ^2_{p_i}, then
\sum_{i=1}^n X_i ∼ χ^2_{p_1 + ··· + p_n}.
Note:
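For example, (1) can be verified directly from the cdf of X^2: for y > 0,

    P(X^2 ≤ y) = P(−\sqrt{y} ≤ X ≤ \sqrt{y}) = 2\Phi(\sqrt{y}) − 1,

and differentiating in y gives the density

    f(y) = \frac{1}{\sqrt{2\pi}} y^{-1/2} e^{-y/2},

which is the Gamma(1/2, 2) = χ^2_1 pdf, since Γ(1/2) = \sqrt{\pi}.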
Student's t distribution:
If U ∼ N(0, 1) and V ∼ χ^2_p, and U and V are independent, then U/\sqrt{V/p}
has a Student's t_p distribution, with pdf

    f(t) = \frac{\Gamma((p+1)/2)}{\Gamma(p/2)\sqrt{p\pi}} (1 + t^2/p)^{-(p+1)/2},  −∞ < t < ∞.

If T_p is a random variable with a t_p distribution, then
(i) T_p has no mgf, since it does not have moments of all orders;
(ii) T_p has only p − 1 moments (those of order less than p);
(iii) T_1 has no mean, and T_2 has no variance;
(iv) E(T_p) = 0 for p > 1, and Var(T_p) = \frac{p}{p-2} for p > 2.
Example: Let X_1, ..., X_n be a random sample from N(µ, σ^2). Then

    T = \frac{\bar{X} - \mu}{S/\sqrt{n}} ∼ t_{n-1}.
F distribution:
If U ∼ χ^2_p and V ∼ χ^2_q, and U and V are independent, then \frac{U/p}{V/q}
has a Snedecor's F_{p,q} distribution, with pdf

    f(x) = \frac{\Gamma((p+q)/2)}{\Gamma(p/2)\Gamma(q/2)} \Bigl(\frac{p}{q}\Bigr)^{p/2}
           \frac{x^{(p/2)-1}}{[1 + (p/q)x]^{(p+q)/2}},  0 < x < ∞.

(i) F_{p,q} has no mgf, since it does not have moments of all orders;
(ii) F_{p,q} has (finite) moments of order m only for m < q/2;
(iii) E(F_{p,q}) = \frac{q}{q-2} if q > 2.
Example (variance ratio): Let X_1, ..., X_n be a random sample from N(µ_X, σ_X^2)
and, independently, Y_1, ..., Y_m a random sample from N(µ_Y, σ_Y^2). Then

    F = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} ∼ F_{n-1, m-1}.
Theorem:
• If X ∼ F_{p,q}, then 1/X ∼ F_{q,p}.
• If X ∼ t_q, then X^2 ∼ F_{1,q}.
• If X ∼ F_{p,q}, then \frac{(p/q)X}{1 + (p/q)X} ∼ Beta(p/2, q/2).
4 Convergence Concepts
Assume there is a sequence of random variables X_1, X_2, ..., X_n, .... In this
section we study several senses in which X_n converges to a limiting random
variable or distribution as n → ∞.
4.1 Convergence in Probability
Def: A sequence of random variables X_1, X_2, ..., X_n, ... converges in
probability to X (written X_n →p X) if for every ε > 0,

    lim_{n→∞} P(|X_n − X| < ε) = 1,

or equivalently,

    lim_{n→∞} P(|X_n − X| ≥ ε) = 0.
Remark:
• The sequence X1 , X2 , ... here is not required to be iid. Actually, in
many cases, the distribution of Xn changes as n changes.
• The limiting variable X may be a constant or a random variable.
Example: Let X be a random variable having the exp(1) distribution.
Define X_n = (1 + 1/n)X for n = 1, 2, .... Show that X_n →p X.
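A sketch: here |X_n − X| = X/n, so for every ε > 0,

    P(|X_n − X| ≥ ε) = P(X ≥ nε) = e^{-n\varepsilon} → 0  as n → ∞.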
Weak Law of Large Numbers (WLLN): Let X_1, ..., X_n be iid with
E(X) = µ and Var(X) = σ^2 < ∞. Then

    \frac{X_1 + \cdots + X_n}{n} →p µ.

Proof: Chebychev's inequality; see the sketch below.

[The sample mean goes to the population mean in probability; this
property is called consistency.]
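The one-line argument, using E(X̄_n) = µ and Var(X̄_n) = σ^2/n:

    P(|X̄_n − µ| ≥ ε) ≤ \frac{Var(X̄_n)}{\varepsilon^2}
        = \frac{\sigma^2}{n\varepsilon^2} → 0  for every ε > 0.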
Examples:
• X_1, ..., X_n ∼ N(µ, σ^2) iid. Then X̄ →p µ.
• X_1, ..., X_n ∼ Bin(1, p) iid. Then X̄ →p p. (The sample proportion goes
to the population proportion.)
Convergence of Transformations:
Assume Xn →p X, Yn →p Y , then
(1) aXn + bYn →p aX + bY, Xn Yn →p XY
(2) Xn /Yn →p X/Y if P (Y = 0) = 0.
(3) Assume g is a continuous function. Then g(Xn ) →p g(X).
(4) Assume h is a continuous function. Then h(Xn , Yn ) →p h(X, Y ).
Proof for (3):
Example: If Var(X_i^2) < ∞ (equivalently, E(X_i^4) < ∞), then S_n^2 →p σ^2,
and by (3), S_n = \sqrt{S_n^2} →p σ, i.e., S_n is a consistent estimator of σ.
4.2 Almost Sure Convergence
Def: We say X_n → X almost surely (a.s.) if

    P({ω ∈ S : X_n(ω) → X(ω)}) = 1,

where S is the sample space. In some sense this convergence can be regarded
as pointwise convergence (almost everywhere).

Example: Let S = [0, 1] with the uniform probability distribution. Define
X_n(ω) = ω + ω^n and X(ω) = ω. For every ω ∈ [0, 1) we have ω^n → 0, so
X_n(ω) → X(ω); the convergence fails only at ω = 1, a set of probability zero.
Hence X_n → X a.s.
Borel–Cantelli Lemma: If for every ε > 0, \sum_{n=1}^∞ P(|X_n − X| > ε) < ∞,
then X_n → X a.s.
Strong Law of Large Numbers (SLLN):
Suppose X_1, ..., X_n are iid with E(X) = µ and Var(X) = σ^2 < ∞. Then

    \frac{X_1 + \cdots + X_n}{n} → µ  a.s.
Remark: The sample mean goes to the population mean a.s. In particular,
the sample proportion goes to the population proportion.
Convergence of Transformations:
Assume Xn → X a.s., Yn → Y a.s., then
(1) aXn + bYn → aX + bY a.s., Xn Yn → XY a.s.
(2) Xn /Yn → X/Y a.s. if P (Y = 0) = 0.
(3) Assume g is a continuous function. Then g(Xn ) → g(X) a.s.
(4) Assume h is a continuous function. Then h(Xn , Yn ) → h(X, Y ) a.s.
Example: S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i − X̄_n)^2 → σ^2 a.s.
4.3 Convergence in Distribution
Def: We say X_n →d X if the sequence of distribution functions F_{X_n} of X_n
converges to that of X in an appropriate sense: F_{X_n}(x) → F_X(x) at every x
where F_X is continuous.
Remark: Unlike convergence in probability or almost sure convergence,
convergence in distribution is a property of the distributions, not of the
specific random variables: the random variables are secondary, and the X_n's
and X need not even be related.
Example: Let X_n be a random variable following the exp(β_n) distribution,
where β_n = 1 + 1/n, for n = 1, 2, .... Let X be a random variable having the
exp(1) distribution. Show that X_n →d X.
Result: Direct verification of X_n →d X is often difficult. A very useful
criterion for X_n →d X is the convergence of mgfs, that is,

    M_{X_n}(t) → M_X(t)  for all t in a neighborhood of 0.
We have seen this previously:

Example: Consider the sequence X_1, ..., X_r, ..., with X_r ∼ NB(r, p). If
r → ∞ and p → 1 so that r(1 − p) → λ, then X_r →d X with X ∼ Poisson(λ).

Example: Consider the sequence X_1, ..., X_n, ..., with X_n ∼ Bin(n, p). If
n → ∞ and p → 0 so that np → λ, then X_n →d X where X ∼ Poisson(λ).
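A sketch for the binomial case via the mgf criterion (writing p = p_n with
np_n → λ):

    M_{X_n}(t) = (1 − p_n + p_n e^t)^n
        = \Bigl(1 + \frac{np_n(e^t - 1)}{n}\Bigr)^n
        → \exp\{\lambda(e^t - 1)\} = M_X(t),

using (1 + a_n/n)^n → e^a when a_n → a.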
Theorem: If Xn →d X and g is a continuous function, then g(Xn ) →d
g(X). This holds even if X is a random vector.
Example: Let X_1, X_2, ... be random variables. If X_n →d X, where X has
the N(0, 1) distribution, then X_n^2 →d X^2, which has the χ^2_1 distribution.
Example: Let (X_1, Y_1), (X_2, Y_2), ... be a sequence of bivariate random
vectors satisfying (X_n, Y_n) →d (X, Y), where X and Y are independent N(0, 1).
Then X_n/Y_n →d X/Y, which has the Cauchy distribution.
4.4 Relationships Among the Different Types of Convergence
(i) X_n → X a.s.  ⟹  X_n →p X  ⟹  X_n →d X.

(ii) In general, the converse implications are not true.
Example 5.5.8.

(iii) When X = c is a constant (i.e., X takes the deterministic value c with
probability one), X_n →p c is equivalent to X_n →d c.

(iv) Slutsky's Theorem: If X_n →d X and Y_n →p c (a constant), then
(a) X_n + Y_n →d X + c,
(b) X_n Y_n →d cX,
(c) X_n / Y_n →d X/c if c ≠ 0.
5 Central Limit Theorem

Let X_1, X_2, ... be iid with mean µ and variance σ^2 < ∞. Then

    \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} →d N(0, 1).
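A numerical sketch of the CLT (NumPy; the skewed exponential population is an
arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100_000
beta = 2.0                      # exp(beta): mean beta, variance beta**2

x = rng.exponential(beta, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - beta) / beta   # standardized sample means

# If the CLT is at work, z should be close to standard normal:
print(z.mean(), z.std())        # ~ 0 and ~ 1
print(np.mean(z <= 1.645))      # ~ 0.95, the N(0,1) cdf at 1.645
```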
Example: A shooter hits a target with probability p independently in each
attempt. She decides to hit the target r times. Let X stand for the number
of attempts she needs.
(a) State the distribution of X.
(b) If r → ∞ and 0 < p < 1 remains fixed, show that the distribution
of r^{−1/2}(X − r/p) converges to a normal distribution. Specify the mean
and variance of the limiting distribution.
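A sketch: X is negative binomial (the number of trials up to the r-th success),
so X = \sum_{i=1}^r G_i with G_i iid geometric(p), E(G_i) = 1/p and
Var(G_i) = (1 − p)/p^2. The CLT then gives

    r^{-1/2}\Bigl(X - \frac{r}{p}\Bigr) = \sqrt{r}\,(\bar{G}_r - 1/p)
        →d N\Bigl(0, \frac{1-p}{p^2}\Bigr).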
Example (a useful application): By the CLT, \sqrt{n}(X̄_n − µ)/σ →d N(0, 1).
However, σ is often unknown, so we use its estimate

    S_n = \Bigl[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2\Bigr]^{1/2}.

Since S_n →p σ, Slutsky's theorem gives

    \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} →d N(0, 1).

This can be used to test hypotheses or to construct confidence intervals for µ.
6 Delta Methods

If X is normal, any linear transformation of X is also normal. This is,
however, not true for nonlinear transformations.

If X_n is asymptotically normal (with mean µ and variance σ_n^2 going to
zero), then for any smooth function g, g(X_n) is also asymptotically normal.
In most applications the CLT gives σ_n^2 = τ^2/n for sample averages; the
delta method can be applied in such situations to calculate the asymptotic
distribution of functions of the sample average.

Intuitively, when σ_n^2 is small, X_n is concentrated near µ and thus only
the behavior of g(x) near µ matters. Any smooth function behaves locally
like a linear function. More formally, g(x) can be expanded near µ as

    g(x) = g(µ) + g'(µ)(x − µ) + o(|x − µ|).

Thus

    g(X_n) = g(µ) + g'(µ)(X_n − µ) + remainder.
(i) First-order delta method:
Let X_n be a sequence of r.v.s satisfying \sqrt{n}(X_n − µ) →d N(0, σ^2). If
g'(µ) ≠ 0, then

    \sqrt{n}[g(X_n) − g(µ)] →d N(0, σ^2 [g'(µ)]^2)

by Slutsky's theorem. In other words, g(X_n) is asymptotically normal with
mean g(µ) and variance [g'(µ)]^2 σ^2/n.
Example 1: X_1, ..., X_n iid Poisson(λ). Find the asymptotic distributions of
X̄_n and \sqrt{X̄_n}. Further, λ in the limiting variance can be replaced by
its estimate, by Slutsky.
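A sketch: by the CLT, \sqrt{n}(X̄_n − λ) →d N(0, λ). Taking g(x) = \sqrt{x},
so that g'(λ) = 1/(2\sqrt{λ}), the first-order delta method gives

    \sqrt{n}\bigl(\sqrt{\bar{X}_n} - \sqrt{\lambda}\bigr)
        →d N\Bigl(0, \lambda \cdot \frac{1}{4\lambda}\Bigr) = N(0, 1/4),

a limiting variance free of λ (a variance-stabilizing transformation).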
Example 2: X_1, ..., X_n iid Binomial(1, p). Define Y_n = X̄_n.
(1) Find the asymptotic distribution of −log Y_n.
(2) Find the asymptotic distribution of X̄_n(1 − X̄_n), assuming p ≠ 1/2.
Further, p in the limiting variance can be replaced by its estimate, by Slutsky.
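A sketch, using \sqrt{n}(X̄_n − p) →d N(0, p(1 − p)):

(1) With g(x) = −log x, g'(p) = −1/p:

    \sqrt{n}(−\log Y_n + \log p) →d N\Bigl(0, \frac{p(1-p)}{p^2}\Bigr)
        = N\Bigl(0, \frac{1-p}{p}\Bigr).

(2) With g(x) = x(1 − x), g'(p) = 1 − 2p, which is nonzero when p ≠ 1/2:

    \sqrt{n}\bigl(\bar{X}_n(1-\bar{X}_n) - p(1-p)\bigr)
        →d N\bigl(0, p(1-p)(1-2p)^2\bigr).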
(ii) Second-order delta method:
Let X_n be a sequence of r.v.s satisfying \sqrt{n}(X_n − µ) →d N(0, σ^2). If
g'(µ) = 0 and g''(µ) ≠ 0, then

    n[g(X_n) − g(µ)] →d σ^2 \frac{g''(µ)}{2} χ^2_1.

Note that g(X_n) = g(µ) + 0 + \frac{g''(µ)}{2}(X_n − µ)^2 + remainder.
Example: X_1, ..., X_n iid Binomial(1, p). Find the asymptotic distribution
of X̄_n(1 − X̄_n), assuming p = 1/2.
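A sketch: with g(x) = x(1 − x) we have g'(1/2) = 0 and g''(1/2) = −2, while
σ^2 = p(1 − p) = 1/4, so the second-order delta method gives

    n\Bigl[\bar{X}_n(1-\bar{X}_n) - \frac{1}{4}\Bigr]
        →d \frac{1}{4}\cdot\frac{-2}{2}\,\chi^2_1 = -\frac{1}{4}\chi^2_1.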
(iii) Multivariate delta method:
Let X_1, ..., X_n be iid p-dimensional random vectors with E(X) = µ and
D(X) = Σ. Define X̄_j = \frac{1}{n}\sum_{i=1}^n X_{ij} for j = 1, ..., p. Then

    \sqrt{n}[g(X̄_1, ..., X̄_p) − g(µ_1, ..., µ_p)] →d N(0, [g'(µ)]^T Σ [g'(µ)]),

where g'(µ) = (\frac{\partial g(µ)}{\partial µ_1}, ..., \frac{\partial g(µ)}{\partial µ_p})^T.
In other words, g(X̄_1, ..., X̄_p) is AN(g(µ), n^{−1}[g'(µ)]^T Σ [g'(µ)]).

Note: g(x) = g(µ) + [g'(µ)]^T (x − µ) + o(‖x − µ‖).
Some Exercises
Example: Let X_1, ..., X_n be N(0, θ) and Y_1, ..., Y_n be N(0, 1), and let all
the variables be mutually independent. Consider

    V_n = \frac{X_1^2 + \cdots + X_n^2}{Y_1^2 + \cdots + Y_n^2}.

(a) Show that E(V_n) = \frac{nθ}{n−2}.

(b) Using the fact that

    V_n − θ = \frac{\sum_{i=1}^n (X_i^2 − θY_i^2)}{\sum_{i=1}^n Y_i^2},

show that \sqrt{n}(V_n − θ) converges in distribution to N(0, 4θ^2).

(c) Is it true that \sqrt{n}(V_n − E(V_n)) converges in distribution to N(0, 4θ^2)?

(d) Obtain the asymptotic distribution of log V_n.
Example: Let X_i, i = 1, 2, ..., be independent uniform random variables on
the interval (0, 1). Define Y_n = \prod_{i=1}^n X_i, where n is a positive integer.

(a) Derive the distribution of −log(Y_n).

(b) Define T_n = (Y_n)^{−1/n}. Show that T_n converges in probability to e^1 as
n goes to infinity.

(c) Show that \sqrt{n}(T_n − e^1) converges in distribution to a normal random
variable N(0, τ) as n goes to infinity. Determine the value of τ.
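A numerical sanity check for (b), not a proof (a sketch using NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

# T_n = (prod X_i)^(-1/n) = exp( (1/n) * sum(-log X_i) ), which by the law of
# large numbers should settle near exp(E[-log X]) = exp(1).
for n in (10, 100, 10_000):
    x = rng.uniform(size=n)
    t_n = np.exp(-np.log(x).mean())   # computed on the log scale for stability
    print(n, t_n)                     # approaches e ≈ 2.71828 as n grows
```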
Example: (X, Y) is a bivariate random variable; define θ = P(X < Y).

(a) Define the function H(X, Y) to take the value 1 if X < Y and 0
otherwise. Show that E[H(X, Y)] = θ.

(b) Let the pairs (X_i, Y_i), i = 1, ..., n, be iid samples with the same
distribution as (X, Y). Define T = \sum_{i=1}^n H(X_i, Y_i). What is the
distribution of T?

(c) Show that T/n →p θ.

(d) Describe the limiting distribution of n^{−1/2}(T − nθ).
Example: Let X_i, i = 1, ..., n, and Y_j, j = 1, ..., m, be independent normal
random variables with mean µ and variance σ^2. Let X̄ and S_x^2 denote the
sample mean and sample variance of the X_i's; similarly define Ȳ and S_y^2 for
the Y_j's.

(a) Find the distribution of X̄ − Ȳ. (5 pts)

(b) Find the distribution of [(n − 1)S_x^2 + (m − 1)S_y^2]/σ^2. (5 pts)

(c) Define

    T = \sqrt{\frac{nm(n + m - 2)}{n + m}} \cdot
        \frac{\bar{X} - \bar{Y}}{\sqrt{(n - 1)S_x^2 + (m - 1)S_y^2}}.

Show that T follows a t-distribution with m + n − 2 degrees of freedom. (6 pts)