Estimation and Hypothesis Testing with Large Sample of Data: Part II

Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/~bchu
(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for
details. Examples will be given and explained in the class.)
1 Objectives
We have learnt various ways to characterize the behaviour of discrete and continuous random
variables. Given a specific distribution, we can calculate the probabilities of various events and
the moments of the random variable. We have also learnt the WLLN and the CLT designed
to characterize the in-the-limit behaviour of sample means. In view of the above concepts and
methods, the purpose of this lecture is to introduce the basic types of statistical inference: the phrase
“stat. inference” describes any procedure for extracting information about a prob. distribution
from a sample of observations. More specifically, I will discuss three types of stat. inference: point
estimation, hypothesis testing, and confidence intervals, in the context of drawing inferences
about a population mean. Throughout this lecture, the following assumptions are maintained:
1. X1 , . . . , Xn are IID.
2. Both E[Xi ] = µ and Var(Xi ) = σ² exist and are finite. We are interested in drawing inferences
about µ.
3. The sample size, n, is large so that we can use the CLT to approximate a standardized random
variable with the standard normal random variable.
2 Preamble about Stat. Inference
We should begin with the classic example – spinning a coin 100 times. Each spin is a Bernoulli
trial, and the experiment is a sequence of 100 trials. Let Xi denote the outcome of spin i, where
Xi = 1 if Heads turns up and Xi = 0 if Tails turns up. Then X1 , . . . , X100 ∼ Bernoulli(p), where
p is the fixed, but unknown probability that a single spin will result in Heads; p is also equal
to the expected value of each trial, E[Xi ]. Let $\bar{X}_n = \frac{1}{100}\sum_{i=1}^{100} X_i$ denote the sample mean. After
performing this experiment, we observe a sample of realizations (or observations), x1 , . . . , xn , and
compute $\bar{x}_n = \frac{1}{100}\sum_{i=1}^{100} x_i = 0.32$.
We emphasized that p is fixed, but unknown. Our goal is to draw inferences about this fixed, but unknown
quantity. Hence, we may ask ourselves the following three questions:
1. What can we guess about the true value of p based on the sample of observations?
2. Is the coin fair (i.e., p = 0.5)? If the answer is no, how compelling is the evidence that p ≠ 0.5?
3. What are plausible values of p? In particular, is there a subset of [0,1] that we can confidently
claim contains the true value of p?
The first question introduces a type of inference that statisticians call point estimation. In this case,
the point estimate of p is $\hat{p} = \bar{x}_n = 0.32$.
The second set of questions introduces a type of inference that statisticians call hypothesis
testing. Having computed $\hat{p} = 0.32$, we may be inclined to guess that the true value of p is not 0.5
(p ≠ 0.5). How much can we believe in this guess? To answer this question, we note that $Y = n\bar{X}_n$
is a Binomial random variable whose mean, under the hypothesis p = 0.5, is E[Y ] = np = 100 × 0.5 = 50. Then, the prob. that Y
will deviate from its mean by at least $100\,|\hat{p} - 0.50| = 18$ is
p = P (|Y − 50| ≥ 18)
= P (Y ≤ 32 or Y ≥ 68)
= P (Y ≤ 32) + P (Y ≥ 68)
≈ 0.000408,
where p is called the significance prob.: the prob. that Y deviates from its hypothesized mean of 50 by at
least 18, i.e., that the sample estimate $\hat{p}$ differs from the hypothetical true value p = 0.5 by at least 0.18. Hence, p is small enough for us
to believe that p ≠ 0.5.
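As a quick numerical check, the exact binomial tail probabilities can be computed as in the following sketch (assuming Python with scipy is available; the variable names are mine):

from scipy.stats import binom

n, p0 = 100, 0.5              # number of spins and hypothesized probability of Heads
y_obs = 32                    # observed number of Heads
dev = abs(y_obs - n * p0)     # observed deviation from the hypothesized mean, i.e. 18

# P(|Y - 50| >= 18) = P(Y <= 32) + P(Y >= 68) for Y ~ Binomial(100, 0.5)
p_value = binom.cdf(n * p0 - dev, n, p0) + binom.sf(n * p0 + dev - 1, n, p0)
print(p_value)                # approximately 0.0004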
3 Point Estimation
The purpose of point estimation is to make a reasonable guess of the unknown quantity, e.g., the
population mean and the population variance.
We already know that the sample estimate of a population mean, µ, is
$$ \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i. $$
The random variable $\bar{X}_n$ is also called a statistic. This estimation procedure is called an estimator.
The quality of an individual estimate depends on the individual sample from which it was computed
and is thus affected by chance variation. Moreover, it is rarely possible to assess how close to correct
an individual estimate may be. For these reasons, we shall study the statistical properties of an
estimator: unbiasedness and consistency.
We know that $E[\bar{X}_n] = \mu$. Thus, on average, our procedure for guessing the population mean
produces the correct value. We express this property by saying that $\bar{X}_n$ is an unbiased estimator
of µ. One appealing characteristic of the sample mean is that it has a rather small variance when the
sample size is large, i.e., $Var(\bar{X}_n) = \sigma^2/n$.
The WLLN says that $\bar{X}_n \xrightarrow{p} \mu$. This implies that $\bar{X}_n$ is a consistent estimator of µ. The property
of consistency is essential – it is difficult to conceive a circumstance in which one would be willing
to use an estimation procedure that might fail regardless of how much data one collected.
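To see unbiasedness and consistency at work, here is a small simulation sketch (assuming Python with numpy; the distribution and parameter values are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0                      # illustrative population mean and standard deviation

for n in (10, 100, 10_000):
    xbar = rng.normal(mu, sigma, size=n).mean()
    print(n, xbar)                        # the sample mean settles down around mu = 2.0 as n grows

Since $Var(\bar{X}_n) = \sigma^2/n$, the fluctuations around µ shrink at rate $1/\sqrt{n}$.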
We also know that the sample estimate of a population variance, $\sigma^2$, is
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2. $$
It is immediate to see that $\hat{\sigma}^2$ is biased because $E[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2 < \sigma^2$. We can obtain an unbiased
estimator by multiplying $\hat{\sigma}^2$ by the factor $\frac{n}{n-1}$. That is, $S_n^2 = \frac{n}{n-1}\,\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$; and
indeed, $E[S_n^2] = \sigma^2$.
In fact, both $\hat{\sigma}^2$ and $S_n^2$ are consistent estimators of $\sigma^2$, but we prefer $\hat{\sigma}^2$ for a subtle reason.
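The bias factor (n − 1)/n is easy to see by Monte Carlo; the following sketch (assuming numpy; the sample size and number of replications are arbitrary) approximates the expectation of each estimator by averaging it over many samples:

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 2.0, 10, 100_000        # a small n makes the bias visible

samples = rng.normal(mu, sigma, size=(reps, n))
sigma_hat2 = samples.var(axis=1, ddof=0)          # divides by n: the biased estimator
s_n2 = samples.var(axis=1, ddof=1)                # divides by n - 1: the unbiased estimator

print(sigma_hat2.mean())   # close to (n - 1)/n * sigma^2 = 3.6
print(s_n2.mean())         # close to sigma^2 = 4.0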
4 Hypothesis Testing
4.1 Heuristics
In the preamble, we considered the possibility that spinning a penny is fair (p = 0.5) vs. the
possibility that spinning a penny is not fair (p ≠ 0.5). The goal of hypothesis testing is to decide,
based on an experiment, which hypothesis is correct, i.e., which hypothesis contains the true state
of nature. However, even when a hypothesis is accepted, it does not imply that we can be 100%
sure about the true state of nature: there are some errors, which we have to accept, inherent in the
hypothesis-testing process. Thus, the essence of hypothesis testing can be described by the familiar
saying, “Where there is smoke, there is fire”.
Returning to the preamble, we noted that we observed only 32 Heads when we would expect to
observe 50 if p = 0.5. This is a discrepancy of |32 − 50| = 18, and we considered the possibility
that such a large discrepancy might have been produced by chance. Specifically, we calculated
the significance prob. p = P (|Y − 50| ≥ 18) under the assumption that p = 0.5, and obtained
p ≈ 0.0004. On this basis, we concluded that p ≠ 0.5. Loosely speaking, a significance prob. can be interpreted as
the prob. that chance would produce a coincidence at least as extraordinary as the phenomenon
observed.
Statisticians often describe hypothesis testing as a game that they play against Nature. To
study this game in detail, it becomes necessary to distinguish between the two hypotheses: the null
hypothesis (H0 ), which is the hypothesis that one maintains by default, and the alternative
hypothesis (H1 ), which is the hypothesis that one requires compelling evidence to accept.
As I have mentioned, there are some unavoidable errors in hypothesis testing. We define the type I
error as the error committed when we choose H1 when in fact H0 is true, and the type II error as
the error committed when we choose H0 when in fact H1 is true.
Because we are concerned with probabilistic evidence, any decision procedure (or testing procedure) that we devise will occasionally result in error. Obviously, we would like to devise a testing
procedure that minimizes the probabilities of committing errors. Unfortunately, there is an inevitable
tradeoff between Type I and Type II errors, so we cannot minimize both of these errors. On this
basis, statisticians often fix the probability of a Type I error within a certain bound and then minimize the probability of a Type II error.
A preference for Type II errors over Type I errors can often be seen in scientific applications
because science is conservative: it is generally considered better to wrongly accept than to wrongly reject
the conventional wisdom. A decision rule (or a test statistic) is called a level α test if the prob. of a Type I error
is not greater than α, where α, the significance level, specifies how small a significance prob. is
required in order to conclude that a phenomenon is not a coincidence. The significance level α is
often fixed by the statistician in advance.
In the preamble, we want to test H0 : p = 0.5 vs. H1 : p ≠ 0.5. A Type I error occurs if
we reject H0 when in fact p = 0.5. Here the significance prob. is p = P (|Y − 50| ≥ 18) ≈ 0.0004. If we set the level α = 0.05, then p < α and we reject H0 . In this case,
we have just constructed a level α test based on the test statistic |Y − 50|.
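Operationally, "level α" means that, when H0 is true, the test rejects at most 100α% of the time. A simulation sketch (assuming Python with numpy and scipy; all values are illustrative) that repeats the coin experiment many times under H0 makes this concrete:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)
n, p0, reps = 100, 0.5, 50_000

y = rng.binomial(n, p0, size=reps)          # data generated under H0: p = 0.5
dev = np.abs(y - n * p0)
# two-sided binomial significance prob. for each simulated experiment
p_vals = binom.cdf(n * p0 - dev, n, p0) + binom.sf(n * p0 + dev - 1, n, p0)
p_vals = np.minimum(p_vals, 1.0)            # the two tails overlap when dev == 0
print((p_vals <= 0.05).mean())              # rejection rate under H0: at most 0.05 (below it here, since Y is discrete)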
4.2 Testing Hypotheses About a Population Mean
We consider testing H0 : µ = µ0 vs. H1 : µ ≠ µ0 . The intuition that we are seeking to formalize
is straightforward. By the WLLN, the observed sample mean ought to be fairly close to the true
population mean. Hence, if the null hypothesis is true, then $\bar{X}_n$ ought to be fairly close to the
hypothesized mean, µ0 . If we observe $\bar{X}_n$ far from µ0 , then we guess that µ ≠ µ0 , i.e., we reject H0 .
Given a significance level, α, we want to calculate a significance prob., p, which is determined
by the sample. We reject H0 if p ≤ α. In the present situation, the significance prob. is given by
$p = P_{H_0}(|\bar{X}_n - \mu_0| \ge |\bar{x}_n - \mu_0|)$, where $\bar{x}_n$ is the average of the observations. Furthermore, we note that
• the hypothesized mean, µ0 , is a real number that is either fixed or known to the researcher.
• the average of the observations, $\bar{x}_n$, is a real number that is calculated from the sample of observations, and indeed it is known to the researcher. Hence, the quantity $|\bar{x}_n - \mu_0|$ is a fixed real
number.
• the sample mean, $\bar{X}_n$, is a random variable. Hence, the inequality $|\bar{X}_n - \mu_0| \ge |\bar{x}_n - \mu_0|$
defines an event that the sample mean assumes a value at least as far from the hypothesized
mean as the researcher observed.
• the significance probability, p, is the prob. that the above-defined event occurs. The notation
$P_{H_0}$ reminds us that we are interested in the prob. that this event occurs under the assumption
that the null hypothesis is true. The smaller the significance prob., the more confidently we
reject the null hypothesis.
Having formulated an appropriate significance prob. for testing H0 vs. H1 , we now need to formulate
a method to compute p.
4.2.1 Case 1: The population variance is known or specified by the null hypothesis.
We define the test statistic $Z_n = \frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}}$ and the real number $z = \frac{\bar{x}_n - \mu_0}{\sigma/\sqrt{n}}$. Under H0 : µ = µ0 , Zn
converges to the standard normal random variable by the CLT; therefore,
$$
p = P_{H_0}(|\bar{X}_n - \mu_0| \ge |\bar{x}_n - \mu_0|)
  = 1 - P_{H_0}(-|\bar{x}_n - \mu_0| < \bar{X}_n - \mu_0 < |\bar{x}_n - \mu_0|)
  = 1 - P_{H_0}(-|z| < Z_n < |z|) \approx 1 - (\Phi(|z|) - \Phi(-|z|))
  = 2\Phi(-|z|).
$$
Setting the level α equal to 0.05, we reject H0 if p ≤ 0.05.
Example 1. When Xi ∼ Bernoulli(µ), σ² = µ(1 − µ); hence, under the null hypothesis that µ = µ0 ,
we have $z = \frac{\bar{x}_n - \mu_0}{\sqrt{\mu_0(1-\mu_0)/n}}$. Suppose that µ0 = 0.5, n = 2500, and the observed sample mean is $\bar{x}_n = 0.48$; then z = −2 and p = 2Φ(−2) =
0.0456 < 0.05 = α. We reject H0 .
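Example 1 can be checked numerically. A sketch (assuming Python with scipy; the observed sample mean 0.48 is the value that yields z = −2):

from math import sqrt
from scipy.stats import norm

mu0, n, xbar = 0.5, 2500, 0.48                 # hypothesized mean, sample size, observed sample mean
z = (xbar - mu0) / sqrt(mu0 * (1 - mu0) / n)   # standard error under H0 is sqrt(mu0(1 - mu0)/n) = 0.01
p = 2 * norm.cdf(-abs(z))
print(z, p)                                    # z = -2.0, p ≈ 0.046 < 0.05, so H0 is rejected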
4.2.2 Case 2: The population variance is unknown.
Since σ² is unknown, we replace it with its unbiased, consistent estimator, $S_n^2$. In this case, we
define the test statistic $T_n = \frac{\bar{X}_n - \mu_0}{S_n/\sqrt{n}}$ and the real number $t = \frac{\bar{x}_n - \mu_0}{s_n/\sqrt{n}}$, where $s_n$ is the sample standard
deviation we observed. The same argument as in Case 1 yields p = 2Φ(−|t|).
Example 2. To test H0 : µ = 20 vs. H1 : µ ≠ 20 at the significance level α = 0.05, we collect
n = 400 observations, observing that $\bar{x}_n = 21.829$ and $s_n = 24.7$. The value of the test statistic is
t = 1.48123 and the significance prob. is p = 2Φ(−1.48123) = 0.1385 > 0.05 = α, so we accept H0 .
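The same computation for Example 2 (again a sketch assuming scipy; with n = 400, Tn is treated as standard normal, as in the text):

from math import sqrt
from scipy.stats import norm

mu0, n, xbar, s_n = 20.0, 400, 21.829, 24.7
t = (xbar - mu0) / (s_n / sqrt(n))
p = 2 * norm.cdf(-abs(t))
print(t, p)                                    # t ≈ 1.481, p ≈ 0.139 > 0.05, so we accept H0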
4.3 One-Sided Hypotheses
Consider testing each of the following:
a) H0 : µ ≤ µ0 vs. H1 : µ > µ0 .
b) H0 : µ ≥ µ0 vs. H1 : µ < µ0 .
Qualitatively, we are inclined to reject the null hypothesis in each case if
a) we observe $\bar{x}_n - \mu_0 \gg 0$; at level α = 0.05, this corresponds to a significance prob. $p_a = P_{H_0}(T_n \ge t) < 0.05 = \alpha$.
b) we observe $\bar{x}_n - \mu_0 \ll 0$; at level α = 0.05, this corresponds to a significance prob. $p_b = P_{H_0}(T_n \le t) < 0.05 = \alpha$ (see the sketch below).
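Under the large-sample normal approximation, these one-sided significance probs. are $p_a \approx 1 - \Phi(t)$ and $p_b \approx \Phi(t)$. A minimal sketch (assuming scipy; the value of t is taken from Example 2 purely for illustration):

from scipy.stats import norm

t = 1.481                     # value of the test statistic, taken from Example 2
p_a = norm.sf(t)              # P(Tn >= t) = 1 - Phi(t), relevant for H1: mu > mu0
p_b = norm.cdf(t)             # P(Tn <= t) = Phi(t),     relevant for H1: mu < mu0
print(p_a, p_b)               # p_a ≈ 0.069, p_b ≈ 0.931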
Remark 4.1. We conclude this section with a note that formulating suitable hypotheses requires
good judgement, which can only be acquired through practice. Depending on the specific case, there are
some key questions that must be answered: Why was the experiment performed? Who needs to be
convinced of what? Is one type of error perceived as more important than the other?
5 Confidence Interval for the Population Mean, µ
A confidence interval for an unknown µ is an interval, (L, U ), with endpoints L and U calculated from
the sample. Ideally, the resulting interval will have two properties: it will contain the population
mean a large proportion of the time, and it will be narrow. The upper (U ) and lower (L) limits of
the confidence interval are called the upper and lower confidence limits. The probability that a
confidence interval will contain µ is called the confidence coefficient.
As we know, the distribution of the standardized sample mean is approximately standard normal for a large
n, that is, $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$. Suppose that the statistician chooses a significance level α = 0.05, so that
the confidence coefficient is 1 − α = 0.95; hence,
$$ P(-z < Z_n < z) \approx \Phi(z) - \Phi(-z) = 1 - \alpha = 0.95, \qquad (5.1) $$
where z is the 97.5% quantile (i.e., the 1 − α/2 quantile) of the standard normal distribution, which is equal to 1.96.
With a little algebra, we have
$$ P(-z < Z_n < z) = P\left(\bar{X}_n - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X}_n + 1.96\,\sigma/\sqrt{n}\right) = 0.95. $$
Substituting $\bar{X}_n$ and σ with the sample average $\bar{x}_n$ and the sample standard deviation $s_n$ respectively, we obtain the 95% confidence interval $(\bar{x}_n - 1.96\,s_n/\sqrt{n},\; \bar{x}_n + 1.96\,s_n/\sqrt{n})$. We usually
state that we are 95% confident that this interval contains the population mean, µ; that is, intervals constructed in this way contain µ about 95% of the time.
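As an illustration (a sketch assuming Python with scipy; the numbers are those of Example 2), the 95% confidence interval is computed as follows:

from math import sqrt
from scipy.stats import norm

xbar, s_n, n, alpha = 21.829, 24.7, 400, 0.05
z = norm.ppf(1 - alpha / 2)                     # the 97.5% quantile, approximately 1.96
half_width = z * s_n / sqrt(n)
print(xbar - half_width, xbar + half_width)     # approximately (19.41, 24.25)

Note that this interval contains 20, which is consistent with our failure to reject H0 : µ = 20 in Example 2.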
6 Exercises
1. Suppose that the White House randomly selects 10,000 from the population of all eligible
voters for the next election and asks them whether they will vote Republican or not. Let p be
the true proportion of all eligible voters who will vote Republican; and the number of people
in the sample who responded that they will vote Republican is 2,000. Suppose that we want
to test H0 : p = p0 vs. H1 : p ≠ p0 , where p0 is fixed and known to you.
Derive a test with a significance level, α = 0.05. Your answer should involve p0 , n, and a
significance prob. from the standard normal distribution. Be sure to show exactly where and
how you use the CLT.
2. Suppose that you spun a penny 89 times and observed 2 Heads. Let p denote the true
probability that one spin will result in Heads.
(a) The significance probability for testing H0 : p ≥ 0.3 vs. H1 : p < 0.3 is p = P (Y ≤ 2),
where Y ∼ Binomial(89, 0.3).
i. Compute p, using the binomial distribution. [Hint: You can use the probability table
for the binomial random variable to find cdfs or http://www.adsciengineering.com/bpdcalc/.
For further details on the binomial random variable, follow the link:
http://en.wikipedia.org/wiki/Binomial_distribution.]
ii. Approximate p using the normal distribution. How good is this approximation?
(b) Construct a 95% confidence interval for p.
3. In a sample of five measurements, the diameter of a sphere was recorded by a scientist as 6.33,
6.37, 6.32, and 6.37 cm. Determine unbiased and consistent estimates of (a) the true mean
and (b) the true variance.
4. A random sample of 20 ECON 4002 grades out of a total of 72 showed a mean of 77 and a
standard deviation of 10.
(a) What are the 95% confidence limits for the mean of all 72 grades?
(b) With what degree of confidence can we say that the mean of all 72 grades is 77 ± 1?
5. Exercises 6.2.7, 6.2.9, and 6.2.11 (LM06, pp. 439)