Chapter 18. Inference about a Population Mean. Theory & Problems.
Chapter 18. Inference about a Population Mean STAT 145

Conditions needed for realistic inference about a population mean:
• We can regard our data as a simple random sample (SRS) from the population. This condition is very important.
• Observations from the population have a Normal distribution with mean μ and standard deviation σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both μ and σ are unknown parameters.
• In this book: the population must be much larger than the sample, say at least 20 times as large.

Standard error
When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic. The standard error of the sample mean $\bar{x}$ is $s/\sqrt{n}$.

The one-sample t statistic and the t distributions
Draw an SRS of size n from a large population that has the Normal distribution with mean μ and standard deviation σ. The one-sample t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t distribution with n−1 degrees of freedom. Recall:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$

The t distributions
• The density curves of the t distributions are similar in shape to the Standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped.
• The spread of the t distributions is a bit greater than that of the Standard Normal distribution. The t distributions have more probability in the tails and less in the center than does the Standard Normal. This is true because substituting the estimate s for the fixed parameter σ introduces more variation into the statistic.
• As the degrees of freedom increase, the t density curve approaches the N(0, 1) curve ever more closely. This happens because s estimates σ more accurately as the sample size increases. So using s in place of σ causes little extra variation when the sample is large.
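The standard error and the one-sample t statistic defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names standard_error and one_sample_t are hypothetical helpers, and the five data values are invented, not taken from any problem in this chapter.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample SD, divides by n - 1
    return s / math.sqrt(len(sample))

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    x_bar = statistics.mean(sample)
    t = (x_bar - mu0) / standard_error(sample)
    return t, len(sample) - 1

# Hypothetical measurements, invented for illustration only.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
se = standard_error(sample)
t, df = one_sample_t(sample, mu0=4.0)
print(f"SE = {se:.4f}, t = {t:.4f}, df = {df}")
# prints: SE = 0.7071, t = 1.4142, df = 4
```

The resulting t would then be compared with the t(n−1) distribution, for example through Table C.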

The one-sample t confidence interval
Draw an SRS of size n from a large population having unknown mean μ. A level C confidence interval for μ is
$$\bar{x} \pm t^{*}\,\frac{s}{\sqrt{n}}$$
where $t^{*}$ is the critical value for the t(n−1) density curve with area C between $-t^{*}$ and $t^{*}$. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases.

The one-sample t test
Draw an SRS of size n from a large population having unknown mean μ. To test the hypothesis $H_0: \mu = \mu_0$, compute the one-sample t statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$
In terms of a variable T having the t(n−1) distribution, the P-value for a test of $H_0$ against
$H_a: \mu > \mu_0$ is $P(T \ge t)$,
$H_a: \mu < \mu_0$ is $P(T \le t)$,
$H_a: \mu \ne \mu_0$ is $2P(T \le -|t|)$ or, equivalently, $2P(T \ge |t|)$, whichever is more convenient to compute.
These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases.

Matched pairs t procedures
To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.

Robust procedures
A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated. The t procedures are quite robust against non-Normality of the population except when outliers or strong skewness are present.

Using t procedures
• Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
• Sample size less than 15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do not use t.
• Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
• Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.

Can we use t procedures for these data?
Figure 18.8 shows plots of several data sets. For which of these can we safely use the t procedures?
• Figure 18.8(a) is a histogram of the percent of each state’s adult residents who are college graduates. We have data on the entire population of 50 states, so inference is not needed. We can calculate the exact mean for the population. There is no uncertainty due to having only a sample from the population, and no need for a confidence interval or test. If these data were an SRS from a larger population, t inference would be safe despite the mild skewness because n = 50.
Answer: Percent of adult college graduates in the 50 states. No, this is an entire population, not a sample.
• Figure 18.8(b) is a stemplot of the force required to pull apart 20 pieces of Douglas fir. The data are strongly skewed to the left with possible low outliers, so we cannot trust the t procedures for n = 20.
Answer: Force required to pull apart 20 pieces of Douglas fir. No, there are just 20 observations and strong skewness.
• Figure 18.8(c) is a stemplot of the lengths of 23 specimens of the red variety of the tropical flower Heliconia. The data are mildly skewed to the right and there are no outliers. We can use the t distributions for such data.
Answer: Lengths of 23 tropical flowers of the same variety. Yes, the sample is large enough to overcome the mild skewness.
• Figure 18.8(d) is a histogram of the heights of the students in a college class. This distribution is quite symmetric and appears close to Normal. We can use the t procedures for any sample size.
Answer: Heights of college students.
Yes, for any size sample, because the distribution is close to Normal.

You may be surprised at how variable a sample of 10 observations from a Normal(0,1) distribution can look. Here are 25 samples.

And here are samples of size n = 30. By viewing many versions of these plots with varying sample sizes, you’ll develop your intuition about what a Normal sample looks like.

Problems 1 to 3 are for practice with Table C.

Problem 1.
A study of commuting times reports the travel times to work of a random sample of 1000 employed adults. The mean is $\bar{x}$ = 49.2 minutes and the standard deviation is s = 63.9 minutes. What is the standard error of the mean?

Problem 2.
Use Table C to find
a) the critical value for a one-sided test with level α = 0.05 based on the t(4) distribution.
b) the critical value for a 98% confidence interval based on the t(26) distribution.

Problem 3.
You have an SRS of size 30 and calculate the one-sample t statistic. What is the critical value $t^{*}$ such that
a) T has probability 0.025 to the right of $t^{*}$?
b) T has probability 0.75 to the left of $t^{*}$?

Problem 4.
What critical value $t^{*}$ from Table C would you use for a confidence interval for the mean of the population in each of the following situations?
a) A 95% confidence interval based on n = 12 observations.
b) A 99% confidence interval from an SRS of 18 observations.
c) A 90% confidence interval from a sample of size 6.

Problem 5. (Exercise 18.7)
The composition of the earth's atmosphere may have changed over time. To try to discover the nature of the atmosphere long ago, we can examine the gas in bubbles inside ancient amber. Amber is tree resin that has hardened and been trapped in rocks. The gas in bubbles within amber should be a sample of the atmosphere at the time the amber was formed.
Measurements on specimens of amber from the late Cretaceous era (75 to 95 million years ago) give these percents of nitrogen:
63.4 65 64.4 63.3 54.8 64.5 60.8 49.1 51.0
Assume that these observations are an SRS from the late Cretaceous atmosphere. Use a 90% confidence interval to estimate the mean percent of nitrogen in ancient air. (Our present-day atmosphere is about 78.1% nitrogen.)

Problem 6.
The one-sample t statistic for testing $H_0: \mu = 0$ vs. $H_a: \mu > 0$ from a sample of n = 20 observations has the value t = 1.84.
a) What are the degrees of freedom for this statistic?
b) Give the two critical values $t^{*}$ from Table C that bracket t. What are the one-sided P-values for these two entries?
c) Is the value t = 1.84 significant at the 5% level? Is it significant at the 1% level?

Problem 7.
The one-sample t statistic from a sample of n = 15 observations for the two-sided test of $H_0: \mu = 64$ vs. $H_a: \mu \ne 64$ has the value t = 2.12.
a) What are the degrees of freedom for t?
b) Locate the two critical values $t^{*}$ from Table C that bracket t. What are the two-sided P-values for these two entries?
c) Is the value t = 2.12 statistically significant at the 10% level? At the 5% level?

Problem 8.
Here's a new idea for treating advanced melanoma, the most serious kind of skin cancer. Genetically engineer white blood cells to better recognize and destroy cancer cells, then infuse these cells into patients. The subjects in a small initial study were 11 patients whose melanoma had not responded to existing treatments. One question was how rapidly the new cells would multiply after infusion, as measured by the doubling time in days. Here are the doubling times:
1.4 1.0 1.3 1.0 1.3 2.0 0.6 0.8 0.7 0.9 1.9
a) Examine the data. Is it reasonable to use the t procedures?
b) Give a 90% confidence interval for the mean doubling time.
Are you willing to use this interval to make an inference about the mean doubling time in a population of similar patients?

Problem 9.
Another outcome in the cancer experiment described above is measured by a test for the presence of cells that trigger an immune response in the body and so may help fight cancer. Here are data for the 11 subjects: counts of active cells per 100,000 cells before and after infusion of the modified cells. The difference (after minus before) is the response variable.
a) Examine the data. Is it reasonable to use the t procedures?
b) If your conclusion in part a) is "Yes", do the data give convincing evidence that the count of active cells is higher after treatment?

Problem 10.
A researcher claims that the yearly consumption of soft drinks per person is 51 gallons. In city “A”, the mayor thinks her residents are consuming more than the national average. In a sample of 20 randomly selected city residents, the mean was 53.6 gallons and the standard deviation was 5.4 gallons. With α = 0.05 (5% level of significance), is the researcher's claim valid?

Problem 11. (statement is different from Problem 10)
A researcher claims that the yearly consumption of soft drinks per person is 52 gallons. In a sample of 50 randomly selected people, the mean was 56.3 gallons and the standard deviation was 3.5 gallons. With α = 0.05 (5% level of significance), is the researcher's claim valid?
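
Several of the problems above give only summary statistics (mean, standard deviation, sample size). One possible way to check such hand calculations is sketched below. It assumes the critical value t* is looked up by hand in Table C and passed in as a number; the function names t_confidence_interval and t_statistic are hypothetical, and the summary values are invented for illustration rather than taken from any problem.

```python
import math

def t_confidence_interval(x_bar, s, n, t_star):
    """Level C interval x_bar ± t_star * s / sqrt(n).

    t_star must be the Table C critical value for n - 1 degrees of freedom.
    """
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def t_statistic(x_bar, s, n, mu0):
    """One-sample t statistic for testing H0: mu = mu0."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical summary statistics, invented for illustration only.
x_bar, s, n = 10.0, 2.0, 16
t_star = 2.131  # Table C value for 95% confidence with 15 degrees of freedom

low, high = t_confidence_interval(x_bar, s, n, t_star)
t = t_statistic(x_bar, s, n, mu0=9.0)
print(f"95% CI: ({low:.4f}, {high:.4f}), t = {t:.2f}")
# prints: 95% CI: (8.9345, 11.0655), t = 2.00
```

For a test, the computed t is compared with the Table C entries for n − 1 degrees of freedom to bracket the P-value, as in Problems 6 and 7.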