Estimation and Hypothesis Testing with Large Sample of Data: Part II
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/~bchu

(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in class.)

1 Objectives

We have learnt various ways to characterize the behaviour of discrete and continuous random variables. Given a specific distribution, we can calculate the probabilities of various events and the moments of the random variable. We have also learnt the WLLN and the CLT, which are designed to characterize the in-the-limit behaviour of sample means.

In view of the above concepts and methods, the purpose of this lecture is to introduce the basic types of statistical inference – the phrase "stat. inference" describes any procedure for extracting information about a probability distribution from a sample of observations. More specifically, I will discuss 3 types of statistical inference – point estimation, hypothesis testing, and confidence intervals – in the context of drawing inferences about a population mean.

Throughout this lecture, the following assumptions are maintained:

1. X_1, ..., X_n are IID.

2. Both E[X_i] = µ and Var(X_i) exist and are finite. We are interested in drawing inferences about µ.

3. The sample size, n, is large, so that we can use the CLT to approximate a standardized random variable with the standard normal random variable.

2 Preamble about Stat. Inference

We should begin with the classic example – spinning a coin 100 times. Each spin is a Bernoulli trial, and the experiment is a sequence of 100 trials. Let X_i denote the outcome of spin i, where X_i = 1 if Heads turns up and X_i = 0 if Tails turns up. Then X_1, ..., X_100 ~ Bernoulli(p), where p is the fixed, but unknown, probability that a single spin will result in Heads – p is also equal to the expected value of each trial, E[X_i]. Let X̄_n = (1/100) Σ_{i=1}^{100} X_i denote the sample mean. After performing this experiment, we observe a sample of realizations (or observations), x_1, ..., x_n, and compute x̄_n = (1/100) Σ_{i=1}^{100} x_i = 0.32.

We emphasized that p is fixed, but unknown. Our goal is to draw inferences about this fixed, but unknown, quantity. Hence, we may ask ourselves the following 3 questions:

1. What can we guess about the true value of p based on the sample of observations?

2. Is the coin fair (i.e., p = 0.5)? If the answer is no, how compelling is the evidence that p ≠ 0.5?

3. What are plausible values of p? In particular, is there a subset of [0, 1] that we can confidently claim contains the true value of p?

The first question introduces a type of inference that statisticians call point estimation. In this case, the point estimate of p is p̂ = x̄_n = 0.32.

The second set of questions introduces a type of inference that statisticians call hypothesis testing. Having computed p̂ = 0.32, we may be inclined to guess that the true value of p is not 0.5 (p ≠ 0.5). How much can we believe in this guess? To answer this question, we note that Y = nX̄_n is a Binomial random variable with mean E[Y] = np = 100 × 0.5 = 50. Then, the probability that Y will deviate from its mean by at least |100(p̂ − 0.50)| = 18 is

p = P(|Y − 50| ≥ 18) = P(Y ≤ 32 or Y ≥ 68) = P(Y ≤ 32) + P(Y ≥ 68) ≈ 0.000408,

where p is called the significance probability – the probability that the sample estimate p̂ differs from the hypothetical true value p = 0.5 by at least as much as the difference we observed. Hence, p is small enough for us to believe that p ≠ 0.5.
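To make the calculation above concrete, here is a small Python sketch (my own illustration, not part of the original lecture; it assumes scipy is installed) that evaluates P(|Y − 50| ≥ 18) exactly from the Binomial(100, 0.5) distribution and also via the CLT approximation used later in this lecture:

from scipy.stats import binom, norm

n, p0 = 100, 0.5      # number of spins and the hypothesized probability of Heads
y_obs = 32            # observed number of Heads, so the deviation is |32 - 50| = 18
dev = abs(y_obs - n * p0)

# Exact significance probability: P(Y <= 32) + P(Y >= 68) for Y ~ Binomial(100, 0.5)
p_exact = binom.cdf(n * p0 - dev, n, p0) + binom.sf(n * p0 + dev - 1, n, p0)

# Large-sample (CLT) approximation: standardize the deviation by sd = sqrt(n*p0*(1-p0))
z = dev / (n * p0 * (1 - p0)) ** 0.5
p_clt = 2 * norm.cdf(-z)

print(p_exact)   # close to the 0.000408 quoted above
print(p_clt)     # the normal approximation gives a value of the same order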
3 Point Estimation

The purpose of point estimation is to make a reasonable guess of an unknown quantity, e.g., the population mean or the population variance. We already know that the sample estimate of a population mean, µ, is

X̄_n = (1/n) Σ_{i=1}^{n} X_i.

The random variable X̄_n is also called a statistic, and this estimation procedure is called an estimator. The quality of an individual estimate depends on the individual sample from which it was computed and is thus affected by chance variation. Moreover, it is rarely possible to assess how close to correct an individual estimate may be. For these reasons, we shall study two statistical properties of an estimator: unbiasedness and consistency.

We know that E[X̄_n] = µ. Thus, on average, our procedure for guessing the population mean produces the correct value. We express this property by saying that X̄_n is an unbiased estimator of µ. One appealing characteristic of the sample mean is that its variance becomes small as the sample size grows, i.e., Var(X̄_n) = σ²/n.

The WLLN says that X̄_n → µ in probability. This implies that X̄_n is a consistent estimator of µ. Consistency is essential – it is difficult to conceive of a circumstance in which one would be willing to use an estimation procedure that might fail regardless of how much data one collected.

We also know that the sample estimate of a population variance, σ², is

σ̂² = (1/n) Σ_{i=1}^{n} (X_i − X̄_n)².

It is immediate to see that σ̂² is biased because E[σ̂²] = ((n − 1)/n) σ² < σ². We can obtain an unbiased estimator by multiplying σ̂² by the factor n/(n − 1). That is, S_n² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄_n)², and indeed E[S_n²] = σ². In fact, both σ̂² and S_n² are consistent estimators of σ², but we prefer σ̂² for a subtle reason.
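To see the bias correction n/(n − 1) at work numerically, here is a short Python sketch (my own illustration with made-up data; numpy assumed available):

import numpy as np

# A hypothetical sample of n = 8 observations (numbers are purely illustrative)
x = np.array([4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.1, 4.9])
n = len(x)

x_bar = x.mean()                              # X̄_n, the unbiased estimator of µ
sigma2_hat = ((x - x_bar) ** 2).sum() / n     # σ̂², divides by n (biased downward)
s2 = ((x - x_bar) ** 2).sum() / (n - 1)       # S_n², divides by n - 1 (unbiased)

# numpy reproduces both through the ddof argument: ddof=0 gives σ̂², ddof=1 gives S_n²
assert np.isclose(sigma2_hat, x.var(ddof=0))
assert np.isclose(s2, x.var(ddof=1))

print(x_bar, sigma2_hat, s2)                  # note S_n² = σ̂² · n/(n − 1) > σ̂²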
4 Hypothesis Testing

4.1 Heuristics

In the preamble, we considered the possibility that spinning a penny is fair (p = 0.5) vs. the possibility that spinning a penny is not fair (p ≠ 0.5). The goal of hypothesis testing is to decide, based on an experiment, which hypothesis is correct, i.e., which hypothesis contains the true state of nature. However, even when a hypothesis is accepted, it does not imply that we can be 100% sure about the true state of nature – there are some subtle errors, which we have to accept, in the hypothesis testing process. Thus, the essence of hypothesis testing can be described by the familiar saying, "Where there is smoke, there is fire."

Returning to the preamble, we observed only 32 Heads when we would expect to observe 50 if p = 0.5. This is a discrepancy of |32 − 50| = 18, and we considered the possibility that such a large discrepancy might have been produced by chance. Specifically, we calculated the significance probability p = P(|Y − 50| ≥ 18) under the assumption that p = 0.5, and obtained p ≈ 0.0004. On this basis, we concluded that p ≠ 0.5. Loosely speaking, a significance probability can be interpreted as the probability that chance alone would produce a coincidence at least as extraordinary as the phenomenon observed.

Statisticians often describe hypothesis testing as a game that they play against Nature. To study this game in detail, it becomes necessary to distinguish between the two hypotheses: the null hypothesis (H0), which is the hypothesis that one maintains by default, and the alternative hypothesis (H1), which is the hypothesis that one requires compelling evidence to accept. As mentioned above, there are some subtle errors in hypothesis testing. We define the Type I error as the error committed when we choose H1 when in fact H0 is true, and the Type II error as the error committed when we choose H0 when in fact H1 is true.

Because we are concerned with probabilistic evidence, any decision procedure (or testing procedure) that we devise will occasionally result in error. Obviously, we would like to devise a testing procedure that minimizes the probabilities of committing errors. Unfortunately, there is an inevitable trade-off between Type I and Type II errors, so we cannot minimize both. On this basis, statisticians often fix the probability of a Type I error within a certain bound and then minimize the probability of a Type II error. A preference for Type II errors over Type I errors can often be seen in scientific applications: because science is conservative, it is generally considered better to wrongly accept than to wrongly reject the conventional wisdom.

A decision rule (or a test statistic) is called a level α test if the probability of a Type I error is not greater than α, where the significance level α specifies how small a significance probability is required in order to conclude that a phenomenon is not a coincidence. The significance level α is fixed by the statistician. In the preamble, we want to test H0: p = 0.5 vs. H1: p ≠ 0.5. A Type I error occurs if we reject H0 when in fact p = 0.5; here p = P(|Y − 50| ≥ 18) ≈ 0.0004. If we set the level α = 0.05, then p < α and we reject H0. In this case, we have just constructed a level α test based on the statistic |Y − 50|.

4.2 Testing Hypotheses About a Population Mean

We consider testing H0: µ = µ0 vs. H1: µ ≠ µ0. The intuition that we are seeking to formalize is straightforward. By the WLLN, the observed sample mean ought to be fairly close to the true population mean. Hence, if the null hypothesis is true, then X̄_n ought to be fairly close to the hypothesized mean, µ0. If we observe X̄_n far from µ0, then we guess that µ ≠ µ0, i.e., we reject H0.

Given a significance level, α, we want to calculate a significance probability, p, which is determined by the sample. We reject H0 if p ≤ α. In the present situation, the significance probability is given by

p = P_{H0}(|X̄_n − µ0| ≥ |x̄_n − µ0|),

where x̄_n is the average of the observations. Furthermore, we note that

• the hypothesized mean, µ0, is a real number that is fixed and known to the researcher.

• the average of the observations, x̄_n, is a real number that is calculated from the sample of observations, and indeed it is known to the researcher. Hence, the quantity |x̄_n − µ0| is a fixed real number.

• the sample mean, X̄_n, is a random variable. Hence, the inequality |X̄_n − µ0| ≥ |x̄_n − µ0| defines the event that the sample mean assumes a value at least as far from the hypothesized mean as the value the researcher observed.

• the significance probability, p, is the probability that the above-defined event occurs. The notation P_{H0} reminds us that we are interested in the probability that this event occurs under the assumption that the null hypothesis is true.

The smaller the significance probability, the more confidently we reject the null hypothesis. Having formulated an appropriate significance probability for testing H0 vs. H1, we now need a method to compute p.

4.2.1 Case 1: The population variance is known or specified by the null hypothesis.

We define the test statistic

Z_n = (X̄_n − µ0) / (σ/√n)

and the real number

z = (x̄_n − µ0) / (σ/√n).

Under H0: µ = µ0, Z_n converges to the standard normal random variable by the CLT; therefore,

p = P_{H0}(|X̄_n − µ0| ≥ |x̄_n − µ0|)
  = 1 − P_{H0}(−|x̄_n − µ0| < X̄_n − µ0 < |x̄_n − µ0|)
  = 1 − P_{H0}(−|z| < Z_n < |z|)
  ≈ 1 − (Φ(|z|) − Φ(−|z|))
  = 2Φ(−|z|).

Setting the level α equal to 0.05, we reject H0 if p ≤ 0.05.

Example 1. When X_i ~ Bernoulli(µ), σ² = µ(1 − µ); hence, under the null hypothesis that µ = µ0, we have z = (x̄_n − µ0)/√(µ0(1 − µ0)/n). Suppose that µ0 = 0.5, n = 2500, and x̄_n = 0.48; then z = −2 and p = 2Φ(−2) = 0.0456 < 0.05 = α. We reject H0.

4.2.2 Case 2: The population variance is unknown.

Since σ² is unknown, we replace it with its unbiased, consistent estimator, S_n². In this case, we define the test statistic

T_n = (X̄_n − µ0) / (S_n/√n)

and the real number

t = (x̄_n − µ0) / (s_n/√n),

where s_n is the sample standard deviation we observed. The same argument as in Case 1 yields p ≈ 2Φ(−|t|).

Example 2. To test H0: µ = 20 vs. H1: µ ≠ 20 at the significance level α = 0.05, we collect n = 400 observations, observing x̄_n = 21.829 and s_n = 24.7. The value of the test statistic is t = 1.48123 and the significance probability is p = 2Φ(−1.48123) = 0.1385 > 0.05 = α; we accept H0.
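The calculation in Example 2 is easy to reproduce. The sketch below (my own code, not from the lecture; scipy assumed available) computes t and p = 2Φ(−|t|) from the summary statistics n, x̄_n, s_n:

from math import sqrt
from scipy.stats import norm

def two_sided_p_value(x_bar, mu_0, s_n, n):
    # Large-sample two-sided test of H0: mu = mu_0 vs. H1: mu != mu_0 (Case 2).
    # Returns t = (x_bar - mu_0)/(s_n/sqrt(n)) and p = 2*Phi(-|t|), using the CLT approximation.
    t = (x_bar - mu_0) / (s_n / sqrt(n))
    p = 2 * norm.cdf(-abs(t))
    return t, p

# Summary statistics from Example 2: n = 400, x_bar = 21.829, s_n = 24.7, mu_0 = 20
t, p = two_sided_p_value(x_bar=21.829, mu_0=20.0, s_n=24.7, n=400)
print(t, p)    # t is about 1.48, p is about 0.14 > 0.05, so H0 is not rejected

For the one-sided tests discussed next, one would use norm.sf(t) (for p_a) or norm.cdf(t) (for p_b) in place of 2 * norm.cdf(-abs(t)).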
4.3 One-Sided Hypotheses

Consider testing each of the following:

a) H0: µ ≤ µ0 vs. H1: µ > µ0.

b) H0: µ ≥ µ0 vs. H1: µ < µ0.

Qualitatively, we are inclined to reject the null hypothesis if

a) we observe x̄_n − µ0 >> 0. This corresponds to the significance probability p_a = P_{H0}(T_n ≥ t), and we reject H0 if p_a < 0.05 = α.

b) we observe x̄_n − µ0 << 0. This corresponds to the significance probability p_b = P_{H0}(T_n ≤ t), and we reject H0 if p_b < 0.05 = α.

Remark 4.1. We conclude this section with a note that formulating suitable hypotheses requires good judgement, which can only be acquired through practice. Depending on the specific case, there are some key questions that must be answered: Why was the experiment performed? Who needs to be convinced of what? Is one type of error perceived as more important than the other?

5 Confidence Interval for the Population Mean, µ

A confidence interval for an unknown µ is an interval, (L, U), with endpoints L and U calculated from the sample. Ideally, the resulting interval will have two properties: it will contain the population mean a large proportion of the time, and it will be narrow. The upper (U) and lower (L) limits of the confidence interval are called the upper and lower confidence limits. The probability that a confidence interval will contain µ is called the confidence coefficient.

As we know, the distribution of the standardized sample mean is approximately standard normal for large n, that is,

Z_n = (X̄_n − µ) / (σ/√n) ≈ N(0, 1).

Suppose that a statistician chooses a significance level α = 0.05, so the confidence coefficient is 1 − α = 0.95; then

P(−z < Z_n < z) ≈ Φ(z) − Φ(−z) = 1 − α = 0.95,   (5.1)

where z is the 97.5% quantile of the standard normal distribution, which is equal to 1.96. With a little algebra, we have

P(−z < Z_n < z) = P(X̄_n − 1.96 σ/√n < µ < X̄_n + 1.96 σ/√n) = 0.95.

Substituting X̄_n and σ with the sample average x̄_n and the sample standard deviation s_n, respectively, we obtain the 95% confidence interval (x̄_n − 1.96 s_n/√n, x̄_n + 1.96 s_n/√n). We usually state that intervals constructed in this way contain the population mean, µ, 95% of the time.
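The confidence-interval recipe is equally short in code. Below is a minimal sketch (my own illustration; numpy assumed available, and the simulated data and true mean of 10.0 are made up) that builds the 95% interval x̄_n ± 1.96 s_n/√n from one simulated sample:

import numpy as np

rng = np.random.default_rng(0)

# Simulate one IID sample of size n = 400; the true mean (10.0) is known only to the simulation
x = rng.exponential(scale=10.0, size=400)
n = len(x)

x_bar = x.mean()
s_n = x.std(ddof=1)                  # square root of the unbiased variance estimator S_n²

z = 1.96                             # the 97.5% quantile of the standard normal distribution
half_width = z * s_n / np.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(lower, upper)                  # an interval that covers the true mean for about 95% of samples

Re-running this construction over many simulated samples and counting how often (lower, upper) covers 10.0 is a direct check of the interpretation that such intervals contain µ about 95% of the time.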
6 Exercises

1. Suppose that the White House randomly selects 10,000 people from the population of all eligible voters for the next election and asks them whether they will vote Republican or not. Let p be the true proportion of all eligible voters who will vote Republican; the number of people in the sample who responded that they will vote Republican is 2,000. Suppose that we want to test H0: p = p0 vs. H1: p ≠ p0, where p0 is fixed and known to you. Derive a test with a significance level α = 0.05. Your answer should involve p0, n, and a significance probability from the standard normal distribution. Be sure to show exactly where and how you use the CLT.

2. Suppose that you spun a penny 89 times and observed 2 Heads. Let p denote the true probability that one spin will result in Heads.

(a) The significance probability for testing H0: p ≥ 0.3 vs. H1: p < 0.3 is p = P(Y ≤ 2), where Y ~ Binomial(89, 0.3).

i. Compute p using the binomial distribution. [Hint: You can use the probability table for the binomial random variable to find cdfs or http://www.adsciengineering.com/bpdcalc/. For further details on the binomial random variable, follow the link: http://en.wikipedia.org/wiki/Binomial_distribution.]

ii. Approximate p using the normal distribution. How good is this approximation?

(b) Construct a 95% confidence interval for p.

3. In a sample of five measurements, the diameter of a sphere was recorded by a scientist as 6.33, 6.37, 6.32, and 6.37 cm. Determine unbiased and consistent estimates of (a) the true mean and (b) the true variance.

4. A random sample of 20 ECON 4002 grades out of a total of 72 showed a mean of 77 and a standard deviation of 10.

(a) What are the 95% confidence limits for the mean of the 72 grades?

(b) With what degree of confidence could we say that the mean of all 72 grades is 77 ± 1?

5. Exercises 6.2.7, 6.2.9, and 6.2.11 (LM06, pp. 439).