Worksheet
Hypothesis Tests – Means – 2 samples

1. More eggs? Can a food additive increase egg production? Agricultural researchers want to design an experiment to find out. They have 100 hens available and two kinds of feed: the regular feed and the new feed with the additive. They plan to run their experiment for a month, recording the number of eggs each hen produces.
a) Design an experiment that will require a two-sample t-procedure to analyze the results.
b) Design an experiment that will require a matched-pairs t-procedure to analyze the results.
c) Which experiment would you consider the stronger design? Why?

2. MTV. Some students do homework with the TV on. (Anyone come to mind?) Some researchers want to see if people can work as effectively with distraction as without it. The researchers will time some volunteers to see how long it takes them to complete some relatively easy crossword puzzles. During some of the trials the room will be quiet; during other trials in the same room, a TV will be on, tuned to MTV.
a) Design an experiment that will require a two-sample t-procedure to analyze the results.
b) Design an experiment that will require a matched-pairs t-procedure to analyze the results.
c) Which experiment would you consider the stronger design? Why?

3. Women. Values for the labor force participation rate of women (LFPR) are published by the U.S. Bureau of Labor Statistics. We are interested in whether there was a difference between female participation in 1968 and 1972, a time of rapid change for women. We check LFPR values for 19 randomly selected cities for 1968 and 1972. Shown below is software output for two possible tests.
Paired t-Test of (μ1 − μ2)
Test H0: μ(1972 − 1968) = 0 vs. HA: μ(1972 − 1968) ≠ 0
Mean of Paired Differences = 0.0337
t-Statistic = 2.458 w/ 18 df
p = 0.0244

2-Sample t-Test of μ1 − μ2
Test H0: μ(1972) − μ(1968) = 0 vs. HA: μ(1972) − μ(1968) ≠ 0
Difference Between Means = 0.0337
t-Statistic = 1.496 w/ 35 df
p = 0.1434

a) Which of these tests is appropriate for these data? Explain.
b) Using the test you selected, state your conclusion.

4. Learning math. The Core Plus Mathematics Project (CPMP) is an innovative approach to teaching mathematics that engages students in group investigations and mathematical modeling. After field tests in 36 high schools over a three-year period, researchers compared the performances of CPMP students with those of students taught using a traditional curriculum. In one test, students had to solve applied algebra problems without using calculators. The table below shows the results. Are the mean scores of the two groups significantly different? Test an appropriate hypothesis and state your conclusion.

Performance on Algebraic Symbolic Manipulation Without Use of Calculators
Math program   n     Mean   SD
CPMP           312   29.0   18.8
Traditional    265   38.4   16.2

a) Write an appropriate hypothesis.
b) Do you think the assumptions for inference are satisfied? Explain.
c) Here is computer output for this hypothesis test. Explain what the P-value means in this context.
   2-Sample t-Test of μ1 − μ2 ≠ 0
   t-Statistic = -6.496 w/ 583 df
   P < 0.0001
d) State a conclusion about the CPMP program.

5. Rain. Simpson, Olsen, and Eden (Technometrics 1975) report the results of trials in which clouds were seeded and the amount of rainfall recorded. The authors report on 26 seeded and 26 unseeded clouds in order of the amount of rainfall, largest amount first. Here are two possible tests to study the question of whether cloud seeding works.
Paired t-Test of (μ1 − μ2)
Mean of Paired Differences = -277.39615
t-Statistic = -3.641 w/ 25 df
p = 0.0012

2-Sample t-Test of μ1 − μ2
Difference Between Means = -277.4
t-Statistic = -1.998 w/ 33 df
p = 0.0538

a) Which of these tests is appropriate for these data? Explain.
b) Using the test you selected, state your conclusion.

7. CPMP and word problems. The study of the new CPMP mathematics methodology described in Exercise 4 also tested students' abilities to solve word problems. This table shows how the CPMP and traditional groups performed. What do you conclude?

Math program   n     Mean   SD
CPMP           320   57.4   32.1
Traditional    273   53.9   28.5

8. Streams. Researchers collected samples of water from streams in the Adirondack Mountains to investigate the effects of acid rain. They measured the pH (acidity) of the water and classified the streams with respect to the kind of substrate (type of rock over which they flow). A lower pH means the water is more acidic. Here is a plot of the pH of the streams by substrate (limestone, mixed, or shale).

Here are selected parts of a software analysis comparing the pH of streams with limestone and shale substrates:
2-Sample t-Test of μ1 − μ2
Difference Between Means = 0.735
t-Statistic = 16.30 w/ 133 df
p < 0.0001

a) State the null and alternative hypotheses for this test.
b) From the information you have, do the assumptions and conditions appear to be met?
c) What conclusion would you draw?

9. Hurricanes. The data below show the number of hurricanes recorded annually before and after 1970. Create an appropriate visual display and determine whether these data are appropriate for testing whether there has been a change in the frequency of hurricanes.
1944-1969: 3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2, 6, 0, 2, 5, 1, 3, 1, 0, 3
1970-2000: 2, 1, 0, 1, 2, 3, 2, 1, 2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0, 5, 6, 1, 3, 5, 3

10. Memory. Does ginkgo biloba enhance memory? In an experiment to find out, subjects were assigned randomly to take ginkgo biloba supplements or a placebo. Their memory was tested to see whether it improved. Here are boxplots comparing the two groups and some computer output from a two-sample t-test computed for the data.
2-Sample t-Test of μG − μP > 0
Difference Between Means = -0.9914
t-Statistic = -1.540 w/ 196 df
P = 0.9374

a) Explain in this context what the P-value means.
b) State your conclusion about the effectiveness of ginkgo biloba.
c) Proponents of ginkgo biloba continue to insist that it works. What type of error do they claim your conclusion makes? Explain.

11. Job satisfaction. A company institutes an exercise break for its workers to see if this will improve job satisfaction, as measured by a questionnaire that assesses workers' satisfaction. Scores for 10 randomly selected workers before and after the implementation of the exercise program are shown.

Job satisfaction index
Worker number   1   2   3   4   5   6   7   8   9   10
Before         34  28  29  45  26  27  24  15  15  27
After          33  36  50  41  37  41  39  21  20  37

a) Identify the procedure you would use to assess the effectiveness of the exercise program, and check to see if the conditions allow use of that procedure.
b) Test an appropriate hypothesis and state your conclusion.

13. Sleep. W. S. Gosset (Student) refers to data recording the number of hours of additional sleep gained by 10 patients from the use of laevohyoscyamine hydrobromide. We want to see if there is strong evidence that the herb can help people get more sleep.
a) State the null and alternative hypotheses clearly.
b) A t-test of the null hypothesis of no gain has a t-statistic of 3.680 with 9 degrees of freedom. Find the P-value.
c) Interpret this result by explaining the meaning of the P-value.
d) State your conclusion regarding the hypotheses.
e) This conclusion, of course, may be incorrect. If so, which type of error was made?

14. Gasoline.
Many drivers of cars that can run on regular gas actually buy premium in the belief that they will get better gas mileage. To test that belief, we use 10 cars in a company fleet in which all the cars run on regular gas. Each car is filled first with either regular or premium gasoline, decided by a coin toss, and the mileage for that tank is recorded. Then the mileage is recorded again for the same cars for a tank of the other kind of gasoline. We don't let the drivers know about this experiment. Here are the results (miles per gallon):

Car #      1   2   3   4   5   6   7   8   9   10
Regular   16  20  21  22  23  22  27  25  27  28
Premium   19  22  24  24  25  25  26  26  28  32

a) Is there evidence that cars get significantly better fuel economy with premium gasoline?
b) How big might that difference be? Check a 90% confidence interval.
c) Even if the difference is significant, why might the company choose to stick with regular gasoline?
d) Suppose you had done a Bad Thing. (We're sure you didn't.) Suppose you had mistakenly treated these data as two independent samples instead of matched pairs. What would the significance test have found? Carefully explain why the results are so different.

15. Yogurt. Do these data suggest that there is a significant difference in calories between servings of strawberry and vanilla yogurt? Test an appropriate hypothesis and state your conclusion. Don't forget to check assumptions and conditions!

Brand                 Strawberry   Vanilla
America's Choice      210          200
Breyer's Lowfat       220          220
Columbo               220          180
Dannon Light 'n Fit   120          120
Dannon Lowfat         210          230
Dannon laCreme        140          140
Great Value           180          80
La Yogurt             170          160
Mountain High         200          170
Stonyfield Farm       100          120
Yoplait Custard       190          190
Yoplait Light         100          100

16. Caffeine. A student experiment investigating the potential impact of caffeine on studying for a test involved 30 subjects, randomly divided into two groups. Each group took a memory test. The subjects then each drank two cups of regular (caffeinated) cola or caffeine-free cola.
Thirty minutes later they each took another version of the memory test, and the changes in their scores were noted. Among the 15 subjects who drank caffeine, scores fell an average of 0.933 points, with a standard deviation of 2.988 points. In the no-caffeine group, scores went up an average of 1.429 points, with a standard deviation of 2.441 points. Assumptions of Normality were deemed reasonable based on histograms of differences.
a) Did scores change significantly for the group who drank caffeine? Test an appropriate hypothesis and state your conclusion.
b) Did scores change significantly for the no-caffeine group? Test an appropriate hypothesis and state your conclusion.
c) Does this indicate that some mystery substance in noncaffeinated soda may aid memory? What other explanation is plausible?

17. Hard water. In an investigation of environmental causes of disease, data were collected on the annual mortality rate (deaths per 100,000) for males in 61 large towns in England and Wales. In addition, the water hardness was recorded as the calcium concentration (parts per million, ppm) in the drinking water. The data set also notes for each town whether it was south or north of Derby. Is there a significant difference in mortality rates in the two regions? Here are the summary statistics.

Summary of: mortality
For categories in: Derby
Group   Count   Mean      Median   StdDev
North   34      1631.59   1631     138.470
South   27      1388.85   1369     151.114

a) Test appropriate hypotheses and state your conclusion.
b) The boxplots of the two distributions show an outlier among the data north of Derby. What effect might that have had on your test?

18. Brain waves. An experiment was performed to see whether sensory deprivation over an extended period of time has any effect on the alpha-wave patterns produced by the brain. To determine this, 20 subjects, inmates in a Canadian prison, were randomly split into two groups. Members of one group were placed in solitary confinement.
Those in the other group were allowed to remain in their own cells. Seven days later, alpha-wave frequencies were measured for all subjects, as shown in the following table (P. Gendreau et al., "Changes in EEG Alpha Frequency and Evoked Response Latency During Solitary Confinement," Journal of Abnormal Psychology 79 [1972]: 54-59):

Nonconfined   Confined
10.7          9.6
10.7          10.4
10.4          9.7
10.9          10.3
10.5          9.2
10.3          9.3
9.6           9.9
11.1          9.5
11.2          9.0
10.4          10.9

a) What are the null and alternative hypotheses? Be sure to define all the terms and symbols you use.
b) Are the assumptions necessary for inference met?
c) Perform the appropriate test, indicating the formula you used, the calculated value of the test statistic, and the P-value.
d) State your conclusion.

19. Summer school. Having done poorly on their math final exams in June, six students repeat the course in summer school, then take another exam in August. If we consider these students representative of all students who might attend this summer school in other years, do these results provide evidence that the program is worthwhile?

June   54  49  68  66  62  62
Aug.   50  65  74  64  68  72

20. Lower scores? Newspaper headlines recently announced a decline in science scores among high school seniors. In 2000, 15,109 seniors tested by the National Assessment of Educational Progress (NAEP) scored a mean of 147 points. Four years earlier, 7537 seniors had averaged 150 points. The standard error of the difference in the mean scores for the two groups was 1.22.
a) Have the science scores declined significantly? Cite appropriate statistical evidence to support your conclusion.
b) The sample size in 2000 was almost double that in 1996. Does this make the results more convincing, or less? Explain.

21. The Internet. The NAEP report described in Exercise 20 compared science scores for students who had home Internet access with the scores of those who did not, as shown in the graph. They report that the differences are statistically significant.
a) Explain what "statistically significant" means in this context.
b) If their conclusion is incorrect, which type of error did the researchers commit?
c) Does this prove that using the Internet at home can improve a student's performance in science?

22. Music and memory. Is it a good idea to listen to music when studying for a big test? In a study conducted by some Statistics students, 62 people were randomly assigned to listen to rap music, music by Mozart, or no music while attempting to memorize objects pictured on a page. They were then asked to list all the objects they could remember. Here are summary statistics for each group:

         Rap     Mozart   No Music
Count    29      20       13
Mean     10.72   10.0     12.77
StDev    3.99    3.19     4.73

a) Does it appear that it is better to study while listening to Mozart than to rap music? Test an appropriate hypothesis and state your conclusion.
b) Create a 90% confidence interval for the mean difference in memory score between students who study to Mozart and those who listen to no music at all. Interpret your interval.

23. Rap. Using the results of the experiment described in Exercise 22, does it matter whether one listens to rap music while studying, or is it better to study without music at all?
a) Test an appropriate hypothesis and state your conclusion.
b) If you concluded there is a difference, estimate the size of that difference with a confidence interval and explain what your interval means.

Hypothesis Tests – Means – 2 samples
Answers:

1. a) Randomly assign 50 hens to each of the two kinds of feed. Compare production at the end of the month.
b) Give all 100 hens the new feed for 2 weeks and the old feed for 2 weeks, randomly selecting which feed each hen gets first. Analyze the differences in production for all 100 hens.
c) Matched pairs. Because hens vary in egg production, the matched-pairs design will control for that.

2. a) Randomly assign half the volunteers to do the puzzles in a quiet room, half to do them with MTV on. Compare the times.
b) Randomly assign half the volunteers to do a puzzle in a quiet room, half to do a puzzle with MTV on. Then have each do a puzzle under the other condition. Look at the differences in completion times.
c) Matched pairs. People vary in their ability to do crossword puzzles.

3. a) Matched pairs: the same cities are measured in different time periods.
b) There is a significant difference (P-value = 0.0244) in the labor force participation rate for women in these cities; women's participation increased between 1968 and 1972.

4. a) H0: μC − μT = 0 vs. HA: μC − μT ≠ 0
b) Yes. Groups are independent, though we don't know if students were randomly assigned to the programs. Sample sizes are large, so the Central Limit Theorem applies.
c) If the means for the two programs are really equal, there is less than a 1 in 10,000 chance of seeing a difference as large as or larger than the observed difference just from natural sampling variation.
d) On average, students who learn with the CPMP method do significantly worse on algebra tests that do not allow them to use calculators than students who learn by traditional methods.

5. a) 2-sample. The seeded and unseeded clouds are independent groups; listing each group in order of rainfall does not pair a seeded cloud with an unseeded one.
b) Based on these data, there is weak evidence (P-value = 0.0538, not quite significant at α = 0.05) of a difference in the amount of rain between seeded and unseeded clouds.

7. H0: μC − μT = 0 vs. HA: μC − μT ≠ 0. t = 1.406, df = 590.05, P-value = 0.1602. Because of the large P-value, we fail to reject H0. Based on this sample, there is no evidence of a difference in mean scores on a test of word problems, whether students learned with CPMP or traditional methods.

8. a) H0: μL − μS = 0 vs. HA: μL − μS ≠ 0
b) We don't know whether the streams were a random sample, or whether they are less than 10% of all Adirondack streams. The boxplots show outliers, and Shale may be skewed (its median equals Q1 or Q3), but the samples are large.
c) With t = 16.30 and P < 0.0001, we reject H0. Based on these data, it appears that water flowing over limestone is less acidic, on average, than water flowing over shale.

9. There are several concerns here.
First, we don't have a random sample. We have to assume that the actual number of hurricanes in a given year is a random sample of the hurricanes that might occur under similar weather conditions. Also, the data for 1944-1969 are not symmetric and have three outliers, which will tend to make the average for that period larger. These data are not appropriate for inference. The boxplots provide little evidence of a change in the mean number of hurricanes between the two periods.

10. a) If there is no difference between ginkgo and the placebo, there is a 93.74% chance of seeing a difference as large as or larger than the one observed, just from natural sampling variation.
b) There is no evidence based on this study that ginkgo biloba improves memory, as the difference in mean memory score was not significant.
c) Type II: proponents claim that ginkgo really does improve memory and that this study failed to detect it.

11. a) Paired-sample t-test. The data are before/after scores for the same workers; the workers were randomly selected; assume they are less than 10% of all this company's workers; a boxplot of the differences shows them to be symmetric with no outliers.
b) H0: μd = 0 vs. HA: μd > 0. t = 3.60, P-value = 0.0029. Because P < 0.01, reject H0. These data show that average job satisfaction has increased after implementation of the exercise program.

13. a) H0: μd = 0 vs. HA: μd > 0
b) P-value ≈ 0.0025
c) If there is no gain of additional hours of sleep with the herb, the chance of seeing a mean difference as large as or larger than the one observed is about one-quarter of one percent.
d) The data provide evidence that the herb is helpful in gaining additional sleep.
e) Type I

14. a) H0: μd = 0 vs. HA: μd > 0. t = 4.47, P-value = 0.0008. Because of the very small P-value, we reject H0. These data provide strong evidence that cars get significantly better mileage, on average, with premium than with regular gasoline.
b) (1.18, 2.82) miles per gallon
c) Premium gasoline costs more than regular, and a gain of roughly 1 to 3 mpg may not be worth the extra cost.
d) t = 1.25, P-value = 0.1144. We would have decided there is no difference. The variation in the cars' performances is larger than the differences, and the two-sample test treats that car-to-car variation as noise instead of removing it by pairing.
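The paired-t results in answers 11 and 14 (including the mistaken two-sample analysis in 14 d) can be checked numerically. The following is a minimal sketch using only Python's standard library, not the software that produced the answer key: the helper names `paired_t`, `welch_t`, and `t_upper_tail` are our own, the P-values come from integrating the t density with Simpson's rule (so expect agreement to about three decimal places), and 1.833 is the usual t-table critical value for 90% confidence with 9 df.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T > t) for Student's t with df degrees of freedom.

    Our own helper: integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even); not a library routine.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t(after, before):
    """Paired t: mean difference, its standard error, t, and df = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return d_bar, se, d_bar / se, n - 1


def welch_t(x, y):
    """Two-sample (Welch) t for mean(x) - mean(y), with Satterthwaite df."""
    vx, vy = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Exercise 11: job satisfaction, HA: mu_d > 0 with d = after - before
before = [34, 28, 29, 45, 26, 27, 24, 15, 15, 27]
after = [33, 36, 50, 41, 37, 41, 39, 21, 20, 37]
_, _, t11, df11 = paired_t(after, before)
p11 = t_upper_tail(t11, df11)

# Exercise 14: mileage, HA: mu_d > 0 with d = premium - regular
regular = [16, 20, 21, 22, 23, 22, 27, 25, 27, 28]
premium = [19, 22, 24, 24, 25, 25, 26, 26, 28, 32]
d_bar, se, t14, df14 = paired_t(premium, regular)
p14 = t_upper_tail(t14, df14)
ci90 = (d_bar - 1.833 * se, d_bar + 1.833 * se)  # t* = 1.833 (90%, 9 df)

# Exercise 14(d): the mistaken analysis as two independent samples
t14_wrong, df14_wrong = welch_t(premium, regular)
p14_wrong = t_upper_tail(t14_wrong, df14_wrong)

print(f"11: t = {t11:.2f}, P = {p11:.4f}")
print(f"14: t = {t14:.2f}, P = {p14:.4f}, 90% CI = ({ci90[0]:.2f}, {ci90[1]:.2f}) mpg")
print(f"14(d): t = {t14_wrong:.2f}, df = {df14_wrong:.1f}, P = {p14_wrong:.4f}")
```

The contrast between the two analyses of the same mileage data is the point of 14 d: pairing removes the car-to-car spread from the standard error, which is why the paired t is several times larger.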
15. H0: μd = 0 vs. HA: μd ≠ 0. Data are paired by brand; brands are independent of each other; less than 10% of all yogurts (questionable); the boxplot of differences shows an outlier (100) for Great Value. With the outlier included, the mean difference (Strawberry − Vanilla) is 12.5 calories, with a t-statistic of 1.332 with 11 df, for a P-value of 0.2098. Deleting the outlier, the difference is even smaller, 4.55 calories, with a t-statistic of only 0.833 and a P-value of 0.4241. With P-values so large, we do not reject H0. We conclude that the data do not provide evidence of a difference in mean calories.

16. a) H0: μd = 0 vs. HA: μd ≠ 0. t = −1.21, P-value = 0.2466. Since P > 0.05, fail to reject H0. There is no evidence that the mean score changes after drinking caffeine.
b) H0: μd = 0 vs. HA: μd ≠ 0. t = 2.27, P-value = 0.0397. Since P < 0.05, reject H0. There is evidence that the mean score increases when no caffeine is consumed.
c) No. The result might be due to variation among the people in the two groups, producing a Type I error. (Answers will vary.)

17. a) H0: μN − μS = 0 vs. HA: μN − μS ≠ 0. t = 6.47, df = 53.49, P-value = 3.2 × 10⁻⁸. Because the P-value is low, we reject H0. On the basis of these data, there is clear evidence that mortality rates are different; the mean rate in the north is significantly higher.
b) It will raise the mean for the north, but judging from the boxplots and the fact that the mean and median are nearly the same, it probably will not change the conclusion of the test.

18. a) H0: μNC − μC = 0 vs. HA: μNC − μC ≠ 0, where μNC is the mean alpha-wave frequency for nonconfined inmates and μC is the mean for inmates confined to solitary.
b) The groups are independent of each other, not paired; subjects were randomly assigned to groups; they are less than 10% of all inmates; boxplots show no outliers in either group.
c) 2-sample t-test: t = (ȳNC − ȳC) / √(s²NC/nNC + s²C/nC) = 3.357, P-value = 0.0038.
d) Because the P-value is so small, we reject H0. Solitary confinement makes a difference in mean alpha-wave frequencies; those subjected to confinement have lower frequencies.

19.
These are before and after scores for the same individuals, not independent samples.

20. a) The 95% confidence interval for the difference is (0.61, 5.39). Because 0 is not in the interval, scores in 1996 were significantly higher. (Equivalently, t = 3/1.22 = 2.459 with more than 7500 df, for a one-sided P-value of 0.0069.)
b) Because both samples are very large, the unequal sample sizes should make little difference to how convincing the results are.

21. a) The observed differences are too large to attribute to chance, that is, to natural sampling variation alone.
b) Type I
c) No. This is an observational study, not an experiment, so there may be many other factors that explain the difference.

22. a) H0: μM − μR = 0 vs. HA: μM − μR > 0. t = −0.70, df = 45.88, P-value = 0.7563. Because the P-value is so large, we do not reject H0. These data provide no evidence that listening to Mozart while studying is better than listening to rap.
b) With 90% confidence, the mean score is between 0.189 and 5.357 objects higher for those who listen to no music while studying than for those who listen to Mozart, based on these samples.

23. a) H0: μR − μN = 0 vs. HA: μR − μN < 0. t = −1.36, df = 20.00, P-value = 0.0944. Because the P-value is large, we fail to reject H0. These data show no evidence of a difference in mean number of objects recalled between listening to rap and listening to no music at all.
b) We did not conclude there is a difference, so no interval is needed.
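Answers 7, 17, 20, and 23 rest on two-sample t computations from summary statistics alone. This is a minimal sketch in Python's standard library for checking them; `welch_from_summary` and `t_upper_tail` are our own illustrative helpers (the tail probability is a Simpson's-rule approximation, not a library routine), and the 1.96 multiplier in the NAEP interval assumes that, with more than 7500 df, the t distribution can be treated as Normal.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t (our helper; Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b) + sum((4 if i % 2 else 2) * pdf(i * h)
                                    for i in range(1, steps))
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch t statistic for mean1 - mean2 and its Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Exercise 7: CPMP minus traditional, two-sided alternative
t7, df7 = welch_from_summary(320, 57.4, 32.1, 273, 53.9, 28.5)
p7 = 2 * t_upper_tail(abs(t7), df7)

# Exercise 17: north minus south mortality, two-sided alternative
t17, df17 = welch_from_summary(34, 1631.59, 138.470, 27, 1388.85, 151.114)
p17 = 2 * t_upper_tail(abs(t17), df17)

# Exercise 23: rap minus no music, HA: mu_R - mu_N < 0 (lower tail)
t23, df23 = welch_from_summary(29, 10.72, 3.99, 13, 12.77, 4.73)
p23 = t_upper_tail(-t23, df23)  # P(T < t23), by symmetry of the t density

# Exercise 20: 1996 minus 2000 NAEP means, using the reported SE of 1.22
diff, se = 150 - 147, 1.22
t20 = diff / se
ci95 = (diff - 1.96 * se, diff + 1.96 * se)  # Normal approximation (huge df)

for label, t, df, p in [("7", t7, df7, p7), ("17", t17, df17, p17), ("23", t23, df23, p23)]:
    print(f"{label}: t = {t:.3f}, df = {df:.2f}, P = {p:.4g}")
print(f"20: t = {t20:.3f}, 95% CI = ({ci95[0]:.2f}, {ci95[1]:.2f})")
```

Only n, mean, and SD enter the Welch formulas, which is why these answers can be reproduced without the raw data.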