More about Analysis of Variance
CHAPTER 30

In this chapter we cover...
30.1 Beyond one-way ANOVA
30.2 Follow-up analysis: Tukey pairwise multiple comparisons
30.3 Follow-up analysis: contrasts*
30.4 Two-way ANOVA: conditions, main effects, and interaction
30.5 Inference for two-way ANOVA
30.6 Some details of two-way ANOVA*

Analysis of variance (ANOVA) is a statistical method for comparing the means of several populations based on independent random samples or on the mean responses to several treatments in a randomized comparative experiment. When we compare just two means, we use the two-sample t procedures described in Chapter 21. ANOVA allows comparison of any number of means. The basic form of ANOVA is one-way ANOVA, which treats the means being compared as mean responses to different values of a single variable. For example, in Chapter 27 we used one-way ANOVA to compare the mean lengths of three varieties of tropical flowers and the mean number of species per plot after each of three logging conditions.

30.1 Beyond one-way ANOVA

You should recall or review the big ideas of one-way ANOVA from Chapter 27. One-way ANOVA compares the means $\mu_1, \mu_2, \ldots, \mu_I$ of $I$ populations based on samples of sizes $n_1, n_2, \ldots, n_I$ from these populations.

■ The conditions for ANOVA require independent random samples from each of the $I$ populations (or a randomized comparative experiment with $I$ treatments), Normal distributions for the response variable in each population, and a common standard deviation in all populations. Fortunately, ANOVA inference is quite robust against moderate violations of the Normality and common standard deviation conditions.

■ Using many separate two-sample t procedures to compare many pairs of means is a bad idea because we don’t get a P-value or a confidence level for the complete set of comparisons together.
This is the problem of multiple comparisons. One-way ANOVA gives a single test for the null hypothesis that all the population means are the same against the alternative hypothesis that not all are the same.

■ ANOVA works by comparing how far apart the sample means are relative to the variation among individual observations in the same sample. The test statistic is the ANOVA F statistic

$$F = \frac{\text{variation among the sample means}}{\text{variation among individuals in the same sample}}$$

The P-value comes from an F distribution.

■ In basic statistical practice, we combine the F test with data analysis to check the conditions for ANOVA and to see which means appear to differ and by how much.

Here is an example that illustrates one-way ANOVA.

EXAMPLE 30.1 Logging in the Rain Forest

STATE: How does logging in a tropical rain forest affect the forest in later years? Researchers compared forest plots in Borneo that had never been logged (Group A) with similar plots nearby that had been logged one year earlier (Group B) and eight years earlier (Group C). Although the study was not an experiment, the authors explain why we can consider the plots to be randomly selected. Table 30.1 displays data on the number of tree species found in each plot.[1] Is there evidence that the mean numbers of species in the three groups differ? [DATA: TREESPEC]

PLAN: Describe how the three samples appear to differ, check the conditions for ANOVA, and carry out the one-way ANOVA F test to compare the three population means.

TABLE 30.1 Counts of tree species in forest plots in Borneo

  Group A (never logged):            22 18 22 20 15 21 13 13 19 13 19 15
  Group B (logged one year ago):     11 11 14  7 18 15 15 12 13  2 15  8
  Group C (logged eight years ago):  17  4 18 14 18 15 15 10 12

[FIGURE 30.1 Side-by-side stemplots comparing the counts of tree species in three groups of forest plots, from Table 30.1.]

SOLVE: The stemplots in Figure 30.1 show that the distributions are irregular in shape, as is common for small samples. None of the distributions are strongly skewed, and there are no extreme outliers. The unlogged Group A plots tend to have more species than the two logged groups and also show less variation among plots. The Minitab ANOVA output in Figure 30.2 shows that the standard deviations satisfy our rule of thumb (page 657) that the largest (4.5) is no more than twice the smallest (3.529). The output also shows that the mean number of species in Group A is higher than in Groups B and C. The ANOVA F test is highly significant, with P-value P = 0.006.

  Minitab Session
  One-way ANOVA: Species versus Group

  Source   DF     SS     MS     F      P
  Group     2  204.4  102.2  6.02  0.006
  Error    30  509.2   17.0
  Total    32  713.6

  S = 4.120   R-Sq = 28.64%   R-Sq(adj) = 23.88%

  Level      N    Mean   StDev
  1 year    12  11.750   4.372
  8 years    9  13.667   4.500
  never     12  17.500   3.529

  Pooled StDev = 4.120

  [Plot omitted: individual 95% CIs for each mean, based on the pooled StDev, on an axis from 12.0 to 21.0.]

FIGURE 30.2 Minitab ANOVA output for the logging study of Example 30.1.

CONCLUDE: The data provide strong evidence (F = 6.02, P = 0.006) that the mean number of species per plot is not the same in all three groups.
It appears that the mean species count is higher in unlogged plots. ■

This chapter moves beyond basic one-way ANOVA in two directions.

Follow-up analysis. The ANOVA F test in Example 30.1 tells us only that the three population means are not the same. We would like to say which means differ and by how much. For example, do the data allow us to say that the “never logged” population does have a higher mean species count than the “logged one year ago” and the “logged eight years ago” populations of forest plots? This is a follow-up analysis to the F test that goes beyond data analysis to confidence intervals and tests of significance for specific comparisons of means.

Two-way ANOVA. Example 30.1, and one-way ANOVA in general, compares mean responses for several values of just one explanatory variable. In Example 30.1, that variable is “how long ago this plot was logged.” Suppose that we have data on two explanatory variables: say, how long ago the plot was logged and whether it is located in a river bottom or in the highlands. There are now six groups (Group 1 through Group 6), one for each combination of time since logging (never logged, one year, eight years) and plot location (river bottom, highlands).

One-way ANOVA will still tell us if there is evidence that mean species counts in these six types of plot differ. But we want more: Does location matter? Does time since logging matter? And do these two variables interact? That is, does the effect of logging change when we move from river bottoms to highlands? Perhaps trees regrow faster in river bottoms, so that logging has less effect there than in the highlands. To answer these questions, we must extend ANOVA to take into account the fact that the six groups are formed from two explanatory variables. This is two-way ANOVA.

We will first discuss follow-up analysis in one-way ANOVA.
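Before moving on, it may help to see where the numbers in Figure 30.2 come from. The following is a minimal sketch, not the textbook's or Minitab's procedure, that recomputes the sums of squares, the pooled standard deviation, and the F statistic from the counts in Table 30.1 using only plain Python. The function name `one_way_anova` and the dictionary labels are illustrative choices, not names from the text.

```python
# Tree species counts from Table 30.1.
groups = {
    "A (never logged)": [22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15],
    "B (logged 1 year ago)": [11, 11, 14, 7, 18, 15, 15, 12, 13, 2, 15, 8],
    "C (logged 8 years ago)": [17, 4, 18, 14, 18, 15, 15, 10, 12],
}


def one_way_anova(samples):
    """Return (SSG, SSE, df_groups, df_error, F) for a list of samples."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Variation among the sample means (sum of squares for groups).
    ssg = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation among individuals in the same sample (sum of squares for error).
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_groups = len(samples) - 1
    df_error = n_total - len(samples)
    f_stat = (ssg / df_groups) / (sse / df_error)
    return ssg, sse, df_groups, df_error, f_stat


ssg, sse, df_groups, df_error, f_stat = one_way_anova(list(groups.values()))
pooled_sd = (sse / df_error) ** 0.5  # square root of the mean square for error

print(f"SS(Group) = {ssg:.1f}, SS(Error) = {sse:.1f}")
print(f"df = ({df_groups}, {df_error}), F = {f_stat:.2f}, pooled SD = {pooled_sd:.3f}")
```

The P-value is left out on purpose: it requires the F distribution with 2 and 30 degrees of freedom, which in practice comes from software or a table.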
Fortunately, the distinction between one-way and two-way doesn’t affect the follow-up methods we will present. So once we have mastered these methods in the one-way setting, we can apply them immediately to two-way problems.

Apply Your Knowledge

These exercises review one-way ANOVA. Follow the Plan, Solve, and Conclude steps of the four-step process, as illustrated in Example 30.1. In the next section, you will do follow-up analysis in all these settings.

30.1 Good Weather and Tipping. Exercise 27.35 (page 672) gives the data for a study on the effect of a waitress’s weather prediction on the tipping percent. Carry out data analysis and ANOVA to determine whether there are differences among the mean tipping percents for the three experimental conditions. [DATA: TIPPING]

30.2 Which Color Attracts Beetles Best? To detect the presence of harmful insects in farm fields, we can put up boards covered with a sticky material and examine the insects trapped on the boards. Which colors attract insects best? Experimenters placed six boards of each of four colors at random locations in a field of oats and measured the number of cereal leaf beetles trapped. Here are the data:[2] [DATA: BEETLES]

  Board color   Beetles trapped
  Blue          16  11  20  21  14   7
  Green         37  32  20  29  37  32
  White         21  12  14  17  13  20
  Yellow        45  59  48  46  38  47

Is there evidence that the colors differ in their ability to attract beetles?

30.3 Dogs, Friends, and Stress. If you are a dog lover, perhaps having your dog along reduces the effect of stress. To examine the effect of pets in stressful situations, researchers recruited 45 women who said they were dog lovers. The EESEE story “Stress among Pets and Friends” describes the results. Fifteen of the subjects were randomly assigned to each of three groups to do a stressful task alone (the control group), with a good friend present, or with their dog present. The subject’s mean heart rate during the task is one measure of the effect of stress.
Table 30.2 displays the data. Are there significant differences among the mean heart rates under the three conditions? [DATA: STRSSPET]

TABLE 30.2 Heart rates (beats per minute) after exercise under three conditions: P = with pet, F = with friend, C = control group

  Group  Heart rate    Group  Heart rate    Group  Heart rate
  P        69.169      P        75.985      C        90.015
  P        68.862      F        91.354      F       101.062
  C        84.738      C        73.277      F        76.908
  F        99.692      F        83.400      C        99.046
  C        87.231      F       100.877      F        97.046
  C        84.877      C        84.523      P        69.538
  P        70.169      F       102.154      C        75.477
  P        64.169      C        77.800      C        62.646
  P        58.692      C        70.877      P        70.077
  C        80.369      P        86.446      F        88.015
  C        91.754      P        97.538      F        81.600
  P        79.662      F        89.815      F        86.985
  C        87.446      F        80.277      F        92.492
  C        87.785      P        85.000      P        72.262
  P        69.231      F        98.200      P        65.446

30.2 Follow-up analysis: Tukey pairwise multiple comparisons

In Example 30.1, we saw that there is good evidence that the mean number of tree species is not the same for plots that have never been logged, plots logged one year ago, and plots logged eight years ago. The sample means in Figure 30.2 suggest that (as we might expect) the mean species count is highest in never-logged plots and lowest in plots logged just a year ago.

EXAMPLE 30.2 Comparing Groups: Individual t Procedures

How much higher is the mean count of tree species in plots that have never been logged than in plots logged a year ago? A 95% confidence interval comparing Groups A and B answers this question. Because the conditions for ANOVA require the population standard deviation to be the same in all three populations of plots, we will use a version of the two-sample t confidence interval that also assumes equal standard deviations.

The standard error for the difference of sample means $\bar{x}_A - \bar{x}_B$ estimates the standard deviation of the difference, which is

$$\sqrt{\frac{\sigma^2}{n_A} + \frac{\sigma^2}{n_B}} = \sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$

because both populations have the same standard deviation $\sigma$.
The pooled standard deviation $s_p$ is an estimate of $\sigma$ based on all three samples (see page 663). So the standard error of $\bar{x}_A - \bar{x}_B$ is

$$s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$

The Minitab output in Figure 30.2 gives $s_p = 4.120$. This estimate has 30 degrees of freedom, the degrees of freedom for “Error” in the ANOVA. A 95% confidence interval for $\mu_A - \mu_B$ uses the critical value of the t distribution with 30 degrees of freedom:

$$(\bar{x}_A - \bar{x}_B) \pm t^* s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} = (17.50 - 11.75) \pm (2.042)(4.120)\sqrt{\frac{1}{12} + \frac{1}{12}}$$
$$= 5.75 \pm 3.43$$
$$= 2.32 \text{ to } 9.18$$

We are 95% confident that there are on the average between 2.32 and 9.18 more tree species in plots that have never been logged. Because this confidence interval does not contain zero, we can reject the null hypothesis of no difference, $H_0: \mu_A = \mu_B$, in favor of the two-sided alternative at the 5% significance level. ■

Example 30.2 gives a single 95% confidence interval. We would like to estimate all three pairwise differences among the population means,

$$\mu_A - \mu_B \qquad \mu_A - \mu_C \qquad \mu_B - \mu_C$$

CAUTION: Three 95% confidence intervals will not give us 95% confidence that all three simultaneously capture their true parameter values. This is the problem of multiple comparisons, discussed on page 647.

In general, we want to give confidence intervals for all pairwise differences among the population means $\mu_1, \mu_2, \ldots, \mu_I$ of $I$ populations. We want an overall confidence level of (say) 95%. That is, in very many uses of the method, all the intervals will simultaneously capture the true differences 95% of the time. To do this, take the number of comparisons into account by replacing the t critical value $t^*$ in Example 30.2 by another critical value based on the distribution of the difference between the largest and smallest of a set of $I$ sample means. We will call this critical value $m^*$, for multiple comparisons.
Values of $m^*$ depend on the number of populations we are comparing and on the total number of observations in the samples, as well as on the confidence level we want. Tables are therefore long and messy, so in practice we rely on software. This method is named after its inventor, John Tukey (1915–2000), who developed the ideas of modern data analysis.

Tukey Pairwise Multiple Comparisons

In the ANOVA setting, we have independent SRSs of size $n_i$ from each of $I$ populations having Normal distributions with means $\mu_i$ and a common standard deviation $\sigma$. Tukey simultaneous confidence intervals for all pairwise differences $\mu_i - \mu_j$ among the population means have the form

$$(\bar{x}_i - \bar{x}_j) \pm m^* s_p\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

Here $\bar{x}_i$ is the sample mean of the $i$th sample and $s_p$ is the pooled estimate of $\sigma$. The critical value $m^*$ depends on the confidence level $C$, the number of populations $I$, and the total number of observations.

To carry out simultaneous tests of the hypotheses

$$H_0: \mu_i = \mu_j \qquad H_a: \mu_i \neq \mu_j$$

for all pairs of population means at fixed significance level $\alpha = 1 - C$, reject $H_0$ for any pair whose confidence interval does not contain zero.

If all samples are the same size, Tukey simultaneous confidence intervals provide overall confidence level $C$. That is, $C$ is the probability that all of the intervals simultaneously capture the true pairwise differences. If the samples differ in size, the true confidence level is at least as large as $C$, so that conclusions are conservative.

Similarly, if all the samples are the same size, the Tukey simultaneous tests have overall significance level $1 - C$. That is, $1 - C$ is the probability that when all the population means are equal, any of the tests incorrectly rejects its null hypothesis. If the samples differ in size, the true significance level is smaller than $1 - C$, so that conclusions are conservative.
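The interval formula in the box differs from the individual t interval of Example 30.2 only in its critical value. The sketch below applies both to the summary statistics in Figure 30.2 so the widening is visible. It rests on one loud assumption: the value `M_STAR = 2.468` is not given in the text. It was back-solved here from the interval half-widths that Minitab reports in Figure 30.3, and it corresponds to dividing the studentized range critical value by the square root of 2. The names `pairwise_interval`, `T_STAR`, and `M_STAR` are illustrative.

```python
from itertools import combinations
from math import sqrt

# Summary statistics from the Minitab output in Figure 30.2.
n = {"A": 12, "B": 12, "C": 9}
mean = {"A": 17.500, "B": 11.750, "C": 13.667}
s_p = 4.120  # pooled standard deviation, 30 degrees of freedom for error

T_STAR = 2.042  # t critical value for 95% confidence, df = 30 (Example 30.2)
M_STAR = 2.468  # ASSUMPTION: back-solved from the half-widths in Figure 30.3


def pairwise_interval(i, j, critical_value):
    """Interval for mu_i - mu_j: (xbar_i - xbar_j) +/- crit * s_p * sqrt(1/n_i + 1/n_j)."""
    diff = mean[i] - mean[j]
    margin = critical_value * s_p * sqrt(1 / n[i] + 1 / n[j])
    return diff - margin, diff + margin


intervals = {}
for i, j in combinations(n, 2):  # (A, B), (A, C), (B, C)
    intervals[(i, j)] = {
        "individual": pairwise_interval(i, j, T_STAR),
        "tukey": pairwise_interval(i, j, M_STAR),
    }
    lo_t, hi_t = intervals[(i, j)]["individual"]
    lo_m, hi_m = intervals[(i, j)]["tukey"]
    print(f"mu_{i} - mu_{j}: individual {lo_t:.3f} to {hi_t:.3f}, "
          f"Tukey {lo_m:.3f} to {hi_m:.3f}")
```

Every Tukey interval is wider than its individual counterpart because `M_STAR` exceeds `T_STAR`. In real work the critical value would come from software rather than being typed in by hand.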
EXAMPLE 30.3 Logging in the Rain Forest: Multiple Intervals

Figure 30.3 contains more Minitab output for the ANOVA comparing the mean counts of tree species in three groups of forest plots in Borneo. We asked for Tukey multiple comparisons with an overall error rate of 5%. That is, the overall confidence level for the three intervals together is 95%.

  Minitab Session
  Tukey 95% Simultaneous Confidence Intervals
  All Pairwise Comparisons among Levels of Group

  Individual confidence level = 98.05%

  Group = 1 year subtracted from:
  Group      Lower  Center  Upper
  8 years   -2.567   1.917  6.400
  never      1.599   5.750  9.901

  Group = 8 years subtracted from:
  Group      Lower  Center  Upper
  never     -0.650   3.833  8.317

  [Plots omitted: each interval drawn on an axis from -5.0 to 10.0.]

FIGURE 30.3 Additional Minitab ANOVA output for the logging study of Example 30.3, showing Tukey simultaneous confidence intervals.

The format of the Minitab output takes some study. Be sure you can see that the Tukey confidence intervals are

   1.599 to 9.901   for $\mu_A - \mu_B$
  -0.650 to 8.317   for $\mu_A - \mu_C$
  -6.400 to 2.567   for $\mu_B - \mu_C$

The interval for $\mu_A - \mu_B$ is wider than the individual 95% confidence interval in Example 30.2. The wider interval is the price we pay for having 95% confidence not just in one interval but in all three simultaneously. ■

EXAMPLE 30.4 Logging in the Rain Forest: Multiple Tests

The ANOVA null hypothesis is that all population means are equal,

$$H_0: \mu_A = \mu_B = \mu_C$$

We know from the output in Figure 30.2 that the ANOVA F test rejects this hypothesis (F = 6.02, P = 0.006). So we have good evidence that some pairs of means are not the same. Which pairs? Look at the simultaneous 95% confidence intervals in Example 30.3. Which of these intervals do not contain zero? If an interval does not contain zero, we reject the hypothesis that this pair of population means are equal.
The conclusions are

  We can reject      $H_0: \mu_A = \mu_B$
  We cannot reject   $H_0: \mu_A = \mu_C$
  We cannot reject   $H_0: \mu_B = \mu_C$

This Tukey simultaneous test of three null hypotheses has the property that when all three hypotheses are true, there is only 5% probability that any of the three tests wrongly rejects its hypothesis. ■

Think for a moment about the conclusion of Example 30.4. At first glance, it appears to say that $\mu_A$ equals $\mu_C$ and that $\mu_B$ equals $\mu_C$, but that $\mu_A$ and $\mu_B$ are not equal. That sounds like nonsense. Now is the time to recall what a test at a fixed significance level such as 5% tells us: either we do have enough evidence to reject the null hypothesis, or the data do not give enough evidence to allow rejection. There is no contradiction in saying that

  We do have enough evidence to conclude that $\mu_A \neq \mu_B$
  We do not have enough evidence to conclude that $\mu_A \neq \mu_C$
  We do not have enough evidence to conclude that $\mu_B \neq \mu_C$

That is, $\bar{x}_A = 17.500$ and $\bar{x}_B = 11.750$ are far enough apart to conclude that the population means differ, but neither 17.500 nor 11.750 is far enough from $\bar{x}_C = 13.667$.

Notice that the Tukey method does not give a P-value for the three tests taken together. Rather, we have a set of “reject” or “fail to reject” conclusions with an overall significance level that we fixed in advance, 5% in this example.

There are many other multiple-comparisons procedures that produce various simultaneous confidence intervals with an overall confidence level or simultaneous tests with an overall probability of any false rejection. The Tukey procedures are probably the most useful. If you can interpret results from Tukey, you can understand output from any multiple-comparisons procedure.

Apply Your Knowledge

30.4 Good Weather and Tipping. In Exercise 30.1, you carried out basic ANOVA to compare the mean tipping percents for three experimental conditions.
[DATA: TIPPING]
(a) Find the Tukey simultaneous 95% confidence intervals for all pairwise differences among the three population means.
(b) Explain in simple language what “95% confidence” means for these intervals.
(c) Which pairs of means differ significantly at the overall 5% significance level?

30.5 Which Color Attracts Beetles Best? Exercise 30.2 presents data on the numbers of cereal leaf beetles trapped by boards of four different colors. Yellow boards appear most effective. ANOVA gives very strong evidence that the colors differ in their ability to attract beetles. [DATA: BEETLES]
(a) How many pairwise comparisons are there when we compare four colors?
(b) Which pairs of colors are significantly different when we require significance level 5% for all comparisons as a group? In particular, is yellow significantly better than every other color?

30.6 Dogs, Friends, and Stress. In Exercise 30.3, the ANOVA F test had a very small P-value, giving good reason to conclude that mean heart rates under stress do differ depending on whether a pet, a friend, or no one is present. Do the means for the two treatments (pet, friend) differ significantly from each other and from the mean for the control group? [DATA: STRSSPET]
(a) What are the three null hypotheses that formulate these questions?
(b) We want to be 90% confident that we don’t wrongly reject any of the three null hypotheses. Tukey pairwise comparisons can give conclusions that meet this condition. What are the conclusions?

30.3 Follow-up analysis: contrasts*

Multiple comparisons methods give conclusions about all comparisons in some class with a measure of confidence that applies to all the comparisons taken together. For example, Tukey’s method gives conclusions about all pairwise differences among a set of population means. These methods are most useful when we did not have any specific comparison in mind before we produced the data.
Multiple comparisons procedures sometimes give tests or confidence intervals for comparisons that don’t interest us. And they may leave out comparisons that do interest us. If we have specific questions in mind before we produce data, it is more efficient to plan an analysis that asks these specific questions.

CAUTION: Do note that it is not legitimate to look at the data and then formulate a question suggested by the data. If data suggest a specific effect, you need new data to give evidence that the effect isn’t just chance variation at work.

EXAMPLE 30.5 Which Color Attracts Beetles Best? [DATA: BEETLES]

What color should we use on sticky boards placed in a field of oats to attract cereal leaf beetles? Exercise 30.2 gives data from an experiment in which 24 boards, six of each color in blue, green, white, and yellow, were placed at random locations in a field. ANOVA shows that there are significant differences among the mean numbers of beetles trapped by these colors.

We might follow ANOVA with Tukey pairwise comparisons (Exercise 30.5). But in fact we have specific questions in mind: we suspect that warm colors are generally more attractive than cold colors. That is, before we produce any data, we suspect that blue and white boards have similar properties, that green and yellow boards are similar, and also that the average beetle count for green and yellow is greater than the average count for blue and white. We therefore want to test three hypotheses:

  Hypothesis 1   $H_0: \mu_B = \mu_W$   $H_a: \mu_B \neq \mu_W$
  Hypothesis 2   $H_0: \mu_G = \mu_Y$   $H_a: \mu_G \neq \mu_Y$
  Hypothesis 3   $H_0: (\mu_G + \mu_Y)/2 = (\mu_B + \mu_W)/2$   $H_a: (\mu_G + \mu_Y)/2 > (\mu_B + \mu_W)/2$

Two of these hypotheses involve pairwise comparisons. The third does not and also has a one-sided alternative. ■

We can ask questions about population means by specifying contrasts among the means.
Contrasts

In the ANOVA setting comparing the means $\mu_1, \mu_2, \ldots, \mu_I$ of $I$ populations, a population contrast is a combination of the means

$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_I\mu_I$$

with numerical coefficients that add to zero, $c_1 + c_2 + \cdots + c_I = 0$.

*This material is optional.

EXAMPLE 30.6 Attracting Beetles: Contrasts [DATA: BEETLES]

We can restate the three hypotheses in Example 30.5 in terms of three contrasts:

$$L_1 = \mu_B - \mu_W = (1)(\mu_B) + (0)(\mu_G) + (-1)(\mu_W) + (0)(\mu_Y)$$
$$L_2 = \mu_G - \mu_Y = (0)(\mu_B) + (1)(\mu_G) + (0)(\mu_W) + (-1)(\mu_Y)$$
$$L_3 = (\mu_G + \mu_Y)/2 - (\mu_B + \mu_W)/2 = (-1/2)(\mu_B) + (1/2)(\mu_G) + (-1/2)(\mu_W) + (1/2)(\mu_Y)$$

Check that the four coefficients in each line do add to zero. In terms of these contrasts, the hypotheses become

  Hypothesis 1   $H_0: L_1 = 0$   $H_a: L_1 \neq 0$
  Hypothesis 2   $H_0: L_2 = 0$   $H_a: L_2 \neq 0$
  Hypothesis 3   $H_0: L_3 = 0$   $H_a: L_3 > 0$
■

Some statistical software will test hypotheses and give confidence intervals for any contrasts you specify. Other software lacks this capability, but it is easy to work with just information from basic ANOVA output. Here’s how. To estimate a population contrast

$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_I\mu_I$$

use the corresponding sample contrast

$$\hat{L} = c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_I\bar{x}_I$$

The sample contrast $\hat{L}$ has standard error (estimated standard deviation)

$$SE_{\hat{L}} = s_p\sqrt{\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \cdots + \frac{c_I^2}{n_I}}$$

Inference about a Population Contrast

In the ANOVA setting, a level $C$ confidence interval for a population contrast $L$ is

$$\hat{L} \pm t^* SE_{\hat{L}}$$

where $\hat{L}$ is the corresponding sample contrast and $t^*$ is a critical value from the t distribution with the degrees of freedom for error in the ANOVA.

To test the hypothesis $H_0: L = 0$, use the t statistic

$$t = \frac{\hat{L}}{SE_{\hat{L}}}$$

with the same degrees of freedom.

For one-way ANOVA, the degrees of freedom for error are $N - I$, where $N$ is the total number of observations and $I$ is the number of populations compared (see page 660). The box states the result more generally so that it applies to two-way ANOVA as well as one-way.
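The recipe in the box needs only the sample means, the sample sizes, and the pooled standard deviation from basic ANOVA output. As a minimal sketch under those assumptions, the code below applies it to the three beetle contrasts, taking the means and $s_p = 5.672$ from the Minitab output in Figure 30.4 and $t^* = 2.086$ (df = 20) from Example 30.7. The function name `contrast_inference` is a hypothetical helper, not part of any statistics package.

```python
from math import sqrt


def contrast_inference(coefs, means, sizes, s_p):
    """Return (sample contrast, standard error, t statistic) for one contrast."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must add to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    std_err = s_p * sqrt(sum(c ** 2 / n for c, n in zip(coefs, sizes)))
    return estimate, std_err, estimate / std_err


# Order of groups: Blue, Green, White, Yellow (values from Figure 30.4).
means = [14.833, 31.167, 16.167, 47.167]
sizes = [6, 6, 6, 6]
s_p = 5.672      # pooled standard deviation, 20 degrees of freedom for error
T_STAR = 2.086   # t critical value for 95% confidence, df = 20 (Example 30.7)

contrasts = {
    "L1": [1, 0, -1, 0],            # mu_B - mu_W
    "L2": [0, 1, 0, -1],            # mu_G - mu_Y
    "L3": [-0.5, 0.5, -0.5, 0.5],   # (mu_G + mu_Y)/2 - (mu_B + mu_W)/2
}

results = {name: contrast_inference(c, means, sizes, s_p)
           for name, c in contrasts.items()}

for name, (estimate, std_err, t_stat) in results.items():
    print(f"{name}: estimate = {estimate:.3f}, SE = {std_err:.4f}, t = {t_stat:.2f}")

est3, se3, _ = results["L3"]
ci3 = (est3 - T_STAR * se3, est3 + T_STAR * se3)
print(f"95% CI for L3: {ci3[0]:.3f} to {ci3[1]:.3f}")
```

Results may differ from hand calculations in the last decimal place because the code does not round intermediate values.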
If the contrast is a pairwise difference between means, the contrast confidence interval is exactly the individual confidence interval illustrated in Example 30.2.

EXAMPLE 30.7 Attracting Beetles: Inference for Contrasts

Figure 30.4 displays the Minitab ANOVA output for the study on attracting cereal leaf beetles. The pooled estimate of $\sigma$ is $s_p = 5.672$, and the number of degrees of freedom for error is 20. Minitab does not offer contrasts, so we must use a calculator. The three sample contrasts and their standard errors are

  $\hat{L}_1 = -1.334$    $SE_1 = 3.2747$
  $\hat{L}_2 = -16.000$   $SE_2 = 3.2747$
  $\hat{L}_3 = 23.667$    $SE_3 = 2.3153$

  Minitab Session
  One-way ANOVA: Beetles versus Color

  Source   DF      SS      MS      F      P
  Color     3  4134.0  1378.0  42.84  0.000
  Error    20   643.3    32.2
  Total    23  4777.3

  S = 5.672   R-Sq = 86.53%   R-Sq(adj) = 84.51%

  Level     N    Mean  StDev
  Blue      6  14.833  5.345
  Green     6  31.167  6.306
  White     6  16.167  3.764
  Yellow    6  47.167  6.795

  Pooled StDev = 5.672

  [Plot omitted: individual 95% CIs for each mean, based on the pooled StDev, on an axis from 12 to 48.]

FIGURE 30.4 Minitab ANOVA output for the study on attracting cereal leaf beetles, Example 30.7.

Here are the details for the third contrast:

$$L_3 = (-1/2)(\mu_B) + (1/2)(\mu_G) + (-1/2)(\mu_W) + (1/2)(\mu_Y)$$
$$\hat{L}_3 = (-1/2)(14.833) + (1/2)(31.167) + (-1/2)(16.167) + (1/2)(47.167) = 23.667$$
$$SE_3 = (5.672)\sqrt{\frac{(-1/2)^2}{6} + \frac{(1/2)^2}{6} + \frac{(-1/2)^2}{6} + \frac{(1/2)^2}{6}} = (5.672)(0.4082) = 2.3153$$

If you use software, your answers may differ slightly due to roundoff error in the hand calculations. A 95% confidence interval for $L_3$ uses $t^* = 2.086$ from Table C with df = 20,

$$\hat{L}_3 \pm t^* SE_3 = 23.667 \pm (2.086)(2.3153)$$
$$= 23.667 \pm 4.830$$
$$= 18.837 \text{ to } 28.497$$

We are 95% confident that the average number of beetles attracted by green and yellow boards exceeds the average for blue and white boards by between about 18.8 and 28.5 beetles per board. There is very strong evidence that the population contrast $L_3$ is greater than zero.
The t statistic is

$$t = \frac{\hat{L}_3}{SE_3} = \frac{23.667}{2.3153} = 10.22$$

with 20 degrees of freedom, and P < 0.0005 from Table C. The other t tests conclude that $\mu_B$ and $\mu_W$ do not differ significantly but that there is a significant difference between $\mu_G$ and $\mu_Y$. ■

Our confidence intervals and tests for contrasts are individual procedures for each contrast. If we do inference about three contrasts, such as those in Examples 30.6 and 30.7, we face the problem of multiple comparisons again. That is, we do not have an overall confidence level for all three intervals together. There are more advanced multiple-comparisons methods that apply to contrasts just as Tukey’s method applies to pairwise differences.

Apply Your Knowledge

30.7 Green versus Yellow. Using the Minitab output in Figure 30.4, verify the values for the sample contrast $\hat{L}_2$ and its standard error given in Example 30.7. Give a 95% confidence interval for the population contrast $L_2$. Carry out a test of the hypothesis $H_0: L_2 = 0$ against the two-sided alternative. Be sure to state your conclusions in the setting of the study.

30.8 Logging in the Rain Forest: Contrasts. Figure 30.2 gives basic ANOVA output for the study of the effects of logging described in Example 30.1. We might describe the overall effect of logging by comparing the mean species count for unlogged plots (Group A) with the average of the mean counts for the two groups of logged plots (Groups B and C).
(a) What population contrast $L$ expresses this comparison?
(b) Starting from the output in Figure 30.2, give the sample contrast that estimates $L$ and its standard error.
(c) Is there good evidence that the mean species count in unlogged plots is higher than the average for the two groups of logged plots? State hypotheses in terms of the population contrast $L$, and carry out a test.
(d) How much higher is the mean count in unlogged plots than the average for the two groups of logged plots? Give a 95% confidence interval.
30.4 Two-way ANOVA: conditions, main effects, and interaction

One-way analysis of variance compares the mean responses from any set of populations or experimental treatments when the responses satisfy the ANOVA conditions. Often, however, a sample or experiment has some design structure that leads to more specific questions than those answered by the one-way ANOVA F test or by Tukey pairwise comparisons. It is common, for example, to compare treatments that are combinations of values of two explanatory variables, two factors in the language of experimental design. Here is an example that we already met in Chapter 9.

EXAMPLE 30.8 Effects of TV Advertising

What are the effects of repeated exposure to an advertising message? The answer may depend both on the length of the ad and on how often it is repeated. To investigate this question, assign undergraduate students (the subjects) to view a 40-minute television program that includes ads for a digital camera. Some subjects see a 30-second commercial; others, a 90-second version. The same commercial appears either one, three, or five times during the program.

This experiment has two factors: length of the commercial, with two values; and repetitions, with three values. The six combinations of one value of each factor form six treatments. The subjects are divided at random into six groups, one for each treatment:

                         Variable C: Repetitions
  Variable R: Length     1 Time     3 Times    5 Times
  30 seconds             Group 1    Group 2    Group 3
  90 seconds             Group 4    Group 5    Group 6

This is a two-way layout, with values of one factor forming rows and values of the other forming columns. After watching the TV program, all the subjects fill out a questionnaire that produces an “intention to buy” score with values between zero and 100. This is the response variable.
■

To analyze data from such a study, we impose some additional conditions. Here are the two-way ANOVA conditions that will govern our work on this topic:

1. We have responses for all combinations of values of the two factors (all six cells in Example 30.8). No combinations are missing in our data. In general, call the two explanatory variables R and C (for Row and Column). Variable R has $r$ different values and variable C has $c$ different values. The study compares all $rc$ combinations of these values. These are called crossed designs.
2. In an observational study, we have independent SRSs from each of the $rc$ populations. If the study is an experiment, the available subjects are allocated at random among all $rc$ treatments. That is, we have a completely randomized design. We also allow some randomized block designs in which we have an SRS from each of $r$ populations and the subjects in each SRS are then separately allocated at random among the same $c$ treatments.[3] We met block designs in Chapter 9 (see page 241).
3. The response variable has a Normal distribution in each population. The population mean responses may differ, but all $rc$ populations have a common standard deviation $\sigma$.
4. We have the same number of individuals $n$ in each of the $rc$ treatment groups or samples. These are called balanced designs.

As always, the design of the study is the most important condition for statistical inference. Conditions 1 and 2 describe the designs covered by two-way ANOVA. These designs are very common. Condition 3 lists the usual ANOVA conditions on the distribution of the response variable. ANOVA inference is reasonably robust against violations of these conditions. When you design a study, you should try to satisfy the fourth condition; that is, you should choose equal numbers of individuals for each treatment.
Balanced designs have several advantages in any ANOVA: F tests are most robust against violation of the “common standard deviation” condition when the subject counts are equal or close to equal, and Tukey’s method then gives exact overall confidence or significance levels. In the two-way layout, there is an even stronger reason to prefer balanced designs. If the numbers of individuals differ among treatments (an unbalanced design), several alternative analyses of the data are possible. These analyses answer different sets of questions, and you must decide which questions you want to answer. All the sets of questions and all the analyses collapse to just one in the balanced case. This makes interpreting your data much simpler.

To understand the questions that two-way ANOVA answers, return to the advertising study in Example 30.8. In this section, we will assume that we know the actual population mean responses for each treatment. That is, we deal with an ideal situation in which we don’t have to worry about random variation in the mean responses.

EXAMPLE 30.9 Main Effects with No Interaction

Here again is the two-way layout of the advertising study in Example 30.8. The numbers in the six cells are now made-up values of the population mean responses to the six treatments:

                         Variable C: Repetitions
Variable R: Length    1 Time    3 Times    5 Times
30 seconds              30        45         50
90 seconds              40        55         60

The mean “intention to buy” scores increase with more repetitions of the camera ad and also increase with the length of the ad. The means increase by the same amount (10 points) when we move from 30 seconds to 90 seconds, no matter how many times the ad is shown. Turning to the other variable, the effect of moving from one to three repetitions is the same (15 points) for both 30-second and 90-second ads, and the effect of moving from three to five repetitions is also the same (5 points).
Because the result of changing the value of one variable is the same for all values of the other variable, we say that there is no interaction between the two variables.

Now average the mean responses for 30 seconds and for 90 seconds. The average for 30 seconds is

(30 + 45 + 50)/3 = 125/3 = 41.7

and the average for 90-second ads is

(40 + 55 + 60)/3 = 155/3 = 51.7

Because the averages for the two lengths differ, we say that there is a main effect for length. Similarly average the mean responses for each number of repetitions. They are 35 for one repetition, 50 for three, and 55 for five. So changing the number of times the ad is shown has an “on the average” effect on the response. There is a main effect for repetitions. ■

[Figure 30.5: mean response (vertical axis, 20 to 70) plotted against repetitions (1, 3, and 5) for 30-second and 90-second ads, with their average shown as a dotted line.]
FIGURE 30.5 Plot of the means from Example 30.9. In addition to the means themselves, the plot displays their average for each number of repetitions as a dotted line. The parallel lines show that there is no interaction between the two factors.

Figure 30.5 plots the cell means from Example 30.9. The two solid lines joining one, three, and five repetitions for 30-second and for 90-second ads are parallel. This reflects the fact that the mean response always increases by 10 points when we move from 30-second to 90-second ads, no matter how many times the ad is shown. Parallel lines in a plot of means show that there is no interaction. It doesn’t matter which variable you choose to place on the horizontal axis.

To see the main effect of repetitions, look at the average response for 30 and 90 seconds at each number of repetitions. This average is the dotted line in the plot. It changes as we move from one to three to five repetitions. A variable has a main effect when the average response differs for different values of the variable. “Average” here means averaged over all the values of the other variable. A main effect of repetitions is present in Figure 30.5 because the dotted “average” line is not horizontal. A main effect for length is also present, but it can’t be seen directly in the plot.

ROCKED BY ROIDS
How do baseball fans react to accusations of steroid use by some of the game’s most high-profile players? Rafael Palmeiro, one of four players with 3000 hits and 500 home runs, was suspended for 10 days in 2005 for a steroid violation after insisting at a Congressional hearing that he never used steroids. When Barry Bonds hit his 756th home run to break Hank Aaron’s record, many fans thought his accomplishment was tainted by allegations of steroid use. Should an asterisk be placed behind the names of these players in record books? Public reaction depends on geographic region, gender, age, race, and many other factors. Any careful study of what fans think will include several factors.

EXAMPLE 30.10 Interactions and Main Effects

Now look at this different set of mean responses for the advertising study:

                         Variable C: Repetitions
Variable R: Length    1 Time    3 Times    5 Times
30 seconds              30        45         50
90 seconds              40        45         40

Figure 30.6 plots these means, with the means for 30-second ads connected by solid black lines and those for 90-second ads by solid red lines. These means reflect the case in which subjects grow annoyed when the ad is both longer and repeated more often. The mean responses for the 30-second ad are the same as in Example 30.9. But showing the 90-second ad three times increases the mean response by only 5 points, and showing it five times drops the mean back to 40. There is an interaction between length and repetitions: the difference between 30 and 90 seconds changes with the number of repetitions, so that the solid black and red lines in Figure 30.6 are not parallel. There is still a main effect of repetitions because the average response (dotted line in Figure 30.6) changes as we move from one to three to five repetitions.

[Figure 30.6: mean response (vertical axis, 20 to 70) plotted against repetitions (1, 3, and 5) for 30-second and 90-second ads, with their average shown as a dotted line.]
FIGURE 30.6 Plot of the means from Example 30.10, along with the average for each number of repetitions. The lines are not parallel, so there is an interaction between length and number of repetitions.

What about the effect of length? The average over all values of repetitions for 30-second ads is

(30 + 45 + 50)/3 = 125/3 = 41.7

For 90-second ads, this average is the same,

(40 + 45 + 40)/3 = 125/3 = 41.7

On the average over all numbers of repetitions, changing the length of the ad has no effect. There is no main effect for length. ■

Interactions and Main Effects
An interaction is present between factors R and C in a two-way layout if the change in mean response when we move between two values of R is different for different values of C. (We can interchange the roles of R and C in this statement.)
A main effect for factor R is present if, when we average the responses for a fixed value of R over all values of C, we do not get the same result for all values of R.

Main effects may have little meaning when interaction is present. After all, interaction says that the effect of changing one of the variables is different for different values of the other variable. The main effect, as an “on the average” effect, may not tell us much. In Example 30.10, there is no main effect for the length of the camera ad. But length certainly matters: Figure 30.6 shows that there is little point in paying for 90-second ads if the same ad will be shown several times during the program. That’s the interaction of length with number of repetitions.

EXAMPLE 30.11 Which Effect Is More Important?
There are no simple rules for interpreting results from two-way ANOVA when strong interaction is present. You must look at plots of means and think. Figure 30.7 displays two different mean plots for a study of the effects of classroom conditions on the performance of normal and hyperactive schoolchildren. The two conditions are “quiet” and “noisy,” where the noisy condition is actually the usual environment in elementary school classrooms.

[Figure 30.7: two panels, (a) and (b), each plotting performance against work condition (Quiet, Noisy), with one line for normal children and one for hyperactive children.]
FIGURE 30.7 Plots of two sets of means for a study comparing the performance of normal and hyperactive children under two conditions, for Example 30.11.

There is an interaction: normal children perform a bit better under noisy conditions, but hyperactive children perform slightly less well under noisy conditions. The interaction is exactly the same size in the two plots of Figure 30.7. To see this, look at the slopes of the “Normal” lines in the two plots: they are the same. The slopes of the two “Hyperactive” lines are also the same. So the size of the gap between normal and hyperactive changes by the same amount when we move from quiet to noisy in both plots even though the gap is much larger in Figure 30.7(b).

In Figure 30.7(a), this interaction is the most important conclusion of the study. Both main effects are small: normal children do a bit better than hyperactive children, for example, but not a great deal better on the average. In Figure 30.7(b), the main effect of “hyperactive or not” is the big story. Normal children perform much better than hyperactive children in both environments. The interaction is still there, but it is not very important in the face of the large difference in average performance between hyperactive and normal children.
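The checks for main effects and interaction used in Examples 30.9 and 30.10 are simple arithmetic on the table of cell means. The following is a minimal sketch of that arithmetic in Python, assuming the made-up population means from those two examples. The helper names (`row_averages`, `column_averages`, `has_interaction`) are hypothetical, introduced only for illustration, and `has_interaction` handles only a layout with two rows.

```python
# Cell means from Examples 30.9 and 30.10.
# Rows: ad length. Columns: 1, 3, and 5 repetitions.
example_30_9 = {"30 seconds": [30, 45, 50], "90 seconds": [40, 55, 60]}
example_30_10 = {"30 seconds": [30, 45, 50], "90 seconds": [40, 45, 40]}


def row_averages(cells):
    """Average each row over all columns (used to judge the row main effect)."""
    return {row: sum(vals) / len(vals) for row, vals in cells.items()}


def column_averages(cells):
    """Average each column over all rows (used to judge the column main effect)."""
    rows = list(cells.values())
    return [sum(col) / len(col) for col in zip(*rows)]


def has_interaction(cells, tol=1e-9):
    """True if the gap between the two rows changes from column to column.

    Hypothetical helper for a two-row layout: constant gaps correspond to
    parallel lines in the plot of means, that is, no interaction.
    """
    first, second = cells.values()
    gaps = [b - a for a, b in zip(first, second)]
    return any(abs(g - gaps[0]) > tol for g in gaps)


for name, cells in [("30.9", example_30_9), ("30.10", example_30_10)]:
    print("Example", name)
    print("  length averages:    ", row_averages(cells))
    print("  repetition averages:", column_averages(cells))
    print("  interaction present:", has_interaction(cells))
```

With real data the cell means are sample means, so the gaps will almost never be exactly equal; a check like this describes a pattern and is not a substitute for the F test for interaction.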
■

In this section, we pretended that we knew the population means so that we could discuss patterns without needing statistical inference. In practice, we don’t know the population means. However, plotting the sample means for all groups is an essential part of data analysis for a two-way layout. Look for interaction and main effects just as we did in this section. Of course, in real data you will almost never find exactly parallel lines representing exactly no interaction. Two-way ANOVA inference helps guide you because it assesses whether the interaction in the data is statistically significant. We are now ready to introduce two-way ANOVA inference.

Apply Your Knowledge

Figure 30.8 shows two different plots of means for a two-way study that compares the yields of two varieties of soybeans (V1 and V2) when three different herbicides (H1, H2, and H3) are applied to the fields. Exercises 30.9 and 30.10 ask you to interpret these plots.

[Figure 30.8: two panels, (a) and (b), each plotting yield against herbicide (H1, H2, H3) for varieties V1 and V2, with the average shown as a dotted line.]
FIGURE 30.8 Plots of two sets of possible means for a study in which the two factors are soybean variety and type of herbicide, for Exercises 30.9 and 30.10. The plots also show the average for each herbicide (dotted line).

30.9 Recognizing Effects. Consider the mean responses plotted in Figure 30.8(a).
(a) Is there an interaction between soybean variety and herbicide type? Why or why not?
(b) Is there a main effect of herbicide type? Why or why not?
(c) Is there a main effect of soybean variety? Why or why not?

30.10 Recognizing Effects. Consider the mean responses plotted in Figure 30.8(b).
(a) Is there an interaction between soybean variety and herbicide type? Why or why not?
(b) Is there a main effect of herbicide type? Why or why not?
(c) Is there a main effect of soybean variety? Why or why not?

30.11 Angry Women, Sad Men.
What are the relationships among the portrayal of anger or sadness, gender, and the degree of status conferred? Sixty-eight subjects were randomly assigned to view a videotaped interview in which either a male or a female professional described feeling either anger or sadness. The people being interviewed (we’ll call them the “targets”) wore professional attire and were ostensibly being interviewed for a job. The targets described an incident in which they and a colleague lost an account and, when asked by the interviewer how it made them feel, responded either that the incident made them feel angry or that it made them feel sad. The subjects were divided into four groups; each group evaluated one of the four types of interviews.4 After watching the interview, subjects evaluated the target on a composite measure of status conferral that included items assessing how much status, power, and independence the target deserved in his or her future job. The measure of status ranged from 1 = none to 11 = a great deal. Here are the summary statistics:

Treatment                       n      x̄       s
Males expressing anger         17    6.47    2.25
Females expressing anger       17    3.75    1.77
Males expressing sadness       17    4.05    1.61
Females expressing sadness     17    5.02    1.80

(a) Display the four treatment means in a two-way layout similar to those given in Exercises 30.9 and 30.10.
(b) Plot the means and discuss the interaction and the two main effects.

30.5 Inference for two-way ANOVA

Inference for two-way ANOVA is in many ways similar to inference for one-way ANOVA. Here is a brief outline:

1. Find and plot the group sample means. Study the plot to understand the interaction and main effects. Do data analysis to check the conditions for ANOVA.
2. Use software for basic ANOVA inference. There are now three F tests with three P-values, which answer the questions
   ■ Is the interaction statistically significant?
   ■ Is the main effect for variable R statistically significant?
   ■ Is the main effect for variable C statistically significant?
3. You may wish to carry out a follow-up analysis. For example, Tukey’s method makes pairwise comparisons among the means of all rc treatment groups.

We will illustrate two-way ANOVA inference with several examples. In the first example, the interaction is both small and insignificant, so that the message is in the main effects.

EXAMPLE 30.12 Computer-Assisted Instruction

STATE: A study in education concerned computer-aided instruction in the use of “Blissymbols” for communication. Blissymbols are pictographs (think of Egyptian hieroglyphs) sometimes used to help learning-disabled children. Typically abled schoolchildren were randomly assigned to treatment groups. There are four groups in a two-way layout:

                 Learning Style
Placement     Active     Passive
Before        Group 1    Group 2
During        Group 4    Group 3

The two factors are the learning style (active or passive) of the lesson and the placement of the material to be learned (Blissymbols presented before compounds, or Blissymbols and compounds shown during a joint presentation). The response variable is the number of symbols recognized correctly, out of 24, in a test taken after the lesson. Table 30.3 displays the data.5 (Data file: COMPINST.)

PLAN: Plot the sample means and discuss interaction and main effects. Check the conditions for ANOVA inference. Use two-way ANOVA F tests to determine the significance of interaction and main effects.

TABLE 30.3 Test scores in an education study
Group 1:   14  14  16  14   6  10  12  13  12  12  13   9
Group 2:    4   4   9   8  15   8   9  13  13  12  12  10
Group 3:   11   8   8   7  14   9   7   8  12  10   6   3
Group 4:   12  22   9  14  20  15   9  10  11  11  15   6

SOLVE: This is a balanced completely randomized experimental design. Figure 30.9 displays side-by-side stemplots of the scores for the 12 subjects in each group. Figure 30.10 shows two-way ANOVA output from Minitab.
The stemplots show no departures from Normality strong enough to rule out ANOVA. (One observation, the 6 in Group 1, is slightly outside the range allowed by the 1.5 × IQR rule. This would not be unusual among 48 Normally distributed observations.) You can check that the group standard deviations satisfy our rule of thumb that the largest (4.648) is no more than twice the smallest (2.678). We can proceed with ANOVA inference.

Figure 30.11 shows the plots of means produced by Minitab. The two plots display the same four sample means; Minitab helpfully gives a plot with each variable marked on the horizontal axis. The plots are easy to interpret: the interaction and the main effect of placement are both small, and the main effect of learning style is quite large. The three F tests in the Minitab output substantiate what the plots of means show: interaction (P = 0.349) and placement (P = 0.838) are not significant, but learning style (P = 0.002) is highly significant.

[Figure 30.9: side-by-side stemplots of the scores, with stems running from 3 to 22, for Groups 1 through 4.]
FIGURE 30.9 Side-by-side stemplots comparing the counts of correct answers for subjects in the four treatment groups from Example 30.12.

FIGURE 30.10 Two-way ANOVA output from Minitab for the computer-assisted instruction study, Example 30.12.
Minitab Session
ANOVA: score versus learn, place

Factor   Type    Levels   Values
learn    fixed   2        Act, Pas
place    fixed   2        Bef, Dur

Analysis of Variance for score
Source         DF       SS       MS       F       P
learn           1   130.02   130.02   10.56   0.002
place           1     0.52     0.52    0.04   0.838
learn*place     1    11.02    11.02    0.90   0.349
Error          44   541.75    12.31
Total          47   683.31

S = 3.50892   R-Sq = 20.72%   R-Sq(adj) = 15.31%

Means
learn    N    score
Act     24   12.458
Pas     24    9.167

place    N    score
Bef     24   10.917
Dur     24   10.708

learn   place    N    score
Act     Bef     12   12.083
Act     Dur     12   12.833
Pas     Bef     12    9.750
Pas     Dur     12    8.583

[Figure 30.11: two Minitab interaction plots (data means) for score, one with learning style (Act, Pas) and one with placement (Bef, Dur) on the horizontal axis.]
FIGURE 30.11 Minitab plots of the group means from the computer-assisted learning study, Example 30.12. The two plots use the same four means. They differ only in the choice of which variable to mark on the horizontal axis.

CONCLUDE: Educators know that active learning almost always beats passive learning. This is the only significant effect that appears in these data. In particular, placement of material has very little effect under either style of learning. ■

The second example illustrates the situation in which there is significant interaction, but main effects are larger and more important. Think of Figure 30.7(b).

EXAMPLE 30.13 Mycorrhizal Colonies and Plant Nutrition

STATE: Mycorrhizal fungi are present in the roots of many plants. This is a symbiotic relationship, in which the plant supplies nutrition to the fungus and the fungus helps the plant absorb nutrients from the soil. An experiment compared the effects of adding nitrogen fertilizer to two genetic types of tomato plants: a normal variety that is susceptible to mycorrhizal colonies, and a mutant that is not. Nitrogen was added at rates of 0, 28, or 160 kilograms per hectare (kg/ha). Here is the two-way layout for the six treatment combinations:

                          Nitrogen
Tomato Type    0 kg/ha    28 kg/ha    160 kg/ha
Mutant         Group 1    Group 2     Group 3
Normal         Group 4    Group 5     Group 6

Six plants of each type were assigned at random to each amount of fertilizer. The response variables describe the levels of nutrients in a plant after 19 weeks, when the tomatoes are fully ripe. We will look at one response, the percent of phosphorus in the plant. Table 30.4 contains the data.6 (Data file: TOMATOES.)

TABLE 30.4 Percent of phosphorus in tomato plants
Type      Group   Fert (kg/ha)   Percent phosphorus
Mutant      1          0         0.29  0.25  0.27  0.24  0.24  0.20
Mutant      2         28         0.21  0.24  0.21  0.22  0.19  0.17
Mutant      3        160         0.18  0.20  0.19  0.19  0.16  0.17
Normal      4          0         0.64  0.54  0.53  0.52  0.41  0.43
Normal      5         28         0.41  0.37  0.50  0.43  0.39  0.44
Normal      6        160         0.34  0.31  0.36  0.37  0.26  0.27

PLAN: Plot the sample means and discuss interaction and main effects. Check the conditions for ANOVA inference. Use two-way ANOVA F tests to determine the significance of interaction and main effects.

SOLVE: This is a randomized block design. We are willing to assume that each of our two sets of tomato plants is an SRS from its genetic type. We consider the blocks (genetic types) as one of the factors in analyzing the data because we are interested in comparing the two genetic types. In addition to the two-way ANOVA, we might consider separate one-way ANOVAs for mutant and normal types to draw separate conclusions about the effect of fertilizer on each type. Figure 30.12 displays Minitab’s plots of the six sample means. The lines are not parallel, so interaction is present.
The interaction is rather small compared with the main effects. The main effect of type is expected: normal plants have higher phosphorus levels than the mutants at all levels of fertilization because they benefit from symbiosis with the fungus. The main effect of fertilizer is a bit surprising: phosphorus level goes down as the level of nitrogen fertilizer increases.

[Figure 30.12: two Minitab interaction plots (data means) for percent phosphorus, one with nitrogen amount (0, 28, 160) and one with plant type (Mutant, Normal) on the horizontal axis.]
FIGURE 30.12 Minitab plots of the group means for the study of phosphorus levels in tomatoes, Example 30.13.

Examination of the data (we don’t show the details) finds no outliers or strong skewness. But the largest sample standard deviation (0.08329 in Group 4) is much larger than twice the smallest (0.01472 in Group 3). As the number of treatment groups increases, even samples from populations with exactly the same standard deviation are more likely to produce sample standard deviations that violate our “twice as large” rule of thumb. (Think of comparing the shortest and tallest person among more and more people.) So our rule of thumb is often conservative for two-way ANOVA. Nonetheless, ANOVA inference may not give correct P-values for these data. The P-values for the three two-way ANOVA F tests are P = 0.008 for interaction and P < 0.001 for both main effects. These agree with the mean plots and are so small that even if not accurate, they strongly suggest significance.

CONCLUDE: Normal plants, with their mycorrhizal colonies, have higher phosphorus levels than mutants that lack such colonies. Nitrogen fertilizer actually reduces phosphorus levels in both types of plants. The reduction is stronger for normal plants, but this interaction is not very large in practical terms.
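The standard deviation check described in Example 30.13 can be reproduced directly from Table 30.4. The following is a minimal sketch using only Python’s standard library. The dictionary layout and variable names are illustrative assumptions, not part of the original analysis, which was carried out in Minitab.

```python
from statistics import mean, stdev

# Percent phosphorus from Table 30.4, keyed by group number (six plants per group).
phosphorus = {
    1: [0.29, 0.25, 0.27, 0.24, 0.24, 0.20],  # mutant, 0 kg/ha
    2: [0.21, 0.24, 0.21, 0.22, 0.19, 0.17],  # mutant, 28 kg/ha
    3: [0.18, 0.20, 0.19, 0.19, 0.16, 0.17],  # mutant, 160 kg/ha
    4: [0.64, 0.54, 0.53, 0.52, 0.41, 0.43],  # normal, 0 kg/ha
    5: [0.41, 0.37, 0.50, 0.43, 0.39, 0.44],  # normal, 28 kg/ha
    6: [0.34, 0.31, 0.36, 0.37, 0.26, 0.27],  # normal, 160 kg/ha
}

group_means = {g: mean(v) for g, v in phosphorus.items()}
group_sds = {g: stdev(v) for g, v in phosphorus.items()}  # sample SD (divisor n - 1)

largest = max(group_sds, key=group_sds.get)
smallest = min(group_sds, key=group_sds.get)
ratio = group_sds[largest] / group_sds[smallest]

for g in sorted(phosphorus):
    print(f"Group {g}: mean = {group_means[g]:.4f}, s = {group_sds[g]:.5f}")
print(f"Largest s is in Group {largest}, smallest s is in Group {smallest}")
print(f"Ratio of largest to smallest s: {ratio:.2f} (rule of thumb: at most 2)")

# Drop in mean percent phosphorus from 0 to 160 kg/ha within each plant type.
drop_mutant = group_means[1] - group_means[3]
drop_normal = group_means[4] - group_means[6]
print(f"Drop for mutant: {drop_mutant:.3f}; drop for normal: {drop_normal:.3f}")
```

The two drops printed at the end are the sample version of the interaction discussed in the example: the reduction from 0 to 160 kg/ha is larger for normal plants than for mutants.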
■

Finally, here is an example in which strong interaction makes one of the main effects meaningless. Two-way ANOVA with strong interaction is often difficult to interpret simply, as this example also illustrates.

EXAMPLE 30.14 Better Corn for Heavier Chicks?

STATE: Corn varieties with altered amino acid content can have advantages in feeding animals. Here is an excerpt from a study that compared normal corn (“norm” in the data file) with two altered varieties called opaque-2 (“opaq”) and floury-2 (“flou”).

Nine treatments were arranged in a 3 × 3 factorial experiment to compare opaque-2, floury-2, and normal corn at dietary protein levels of 20, 16, and 12%. Corn–soybean meal diets containing either opaque-2, floury-2, or normal corn were formulated so that, for a given protein level, an equivalent amount of corn protein was supplied by each corn. Male broiler-type chicks were randomly allotted to treatments at 1 day of age. Feed and water were provided ad libitum. Chicks were weighed at weekly intervals until termination of the experiment at 21 days.7

There are 10 chicks in each group. The response variable is the weight in grams after 21 days. Which combinations of corn type and protein content lead to the fastest growth? (Data file: CORN.)

PLAN: Plot the sample means and discuss interaction and main effects. Check the conditions for ANOVA inference. Use two-way ANOVA F tests to determine the significance of interaction and main effects. If necessary, use Tukey pairwise comparisons to identify significant differences among treatments.

SOLVE: This is a balanced completely randomized experiment with nine treatments. Figure 30.13 shows Minitab’s plots of the sample means. The mean weight of the chicks increases with the percent of protein in the diet, as expected. We are primarily interested in comparing the three types of corn. There are important interaction effects. Normal corn does poorest of the three corn types at 12% protein and best at 20%. Floury is best at both 12% and 16%. Opaque is always inferior to floury and beats normal corn only at 12% protein.

[Figure 30.13: two Minitab interaction plots (data means) for weight, one with protein (12, 16, 20) and one with corn type (Flou, Norm, Opaq) on the horizontal axis.]
FIGURE 30.13 Minitab plots of the mean weights of 21-day-old chicks fed nine different diets, for Example 30.14.

Although we don’t show the details, ANOVA is justified. There are no outliers or strong skewness. Although the largest sample standard deviation (61.16 for Group 4) is a bit more than twice the smallest (25.99 for Group 6), this is common when we have nine groups, even when all populations really have the same σ.

A MOMENTARY LAPSE
Beware of distracted drivers. A recent study showed that putting on makeup, chatting on a cell phone, eating breakfast, reading, and reaching for a moving object significantly increase the risk of being involved in an accident. About 80 percent of crashes occur within three seconds of a driver’s distraction. As you might guess, the risk depends on the type of distraction. As usual, other factors also play a role: for example, young drivers are four times more likely to have crashes and near-crashes than drivers over the age of 35.

FIGURE 30.14 Two-way ANOVA output from Minitab for the study of the effect of diet on growth, Example 30.14.
Minitab Session
ANOVA: Weight versus Corn, Protein

Analysis of Variance for Weight
Source          DF        SS       MS        F       P
Corn             2     31137    15568     7.13   0.001
Protein          2    745621   372810   170.66   0.000
Corn*Protein     4     60894    15223     6.97   0.000
Error           81    176951     2185
Total           89   1014602

S = 46.7395   R-Sq = 82.56%   R-Sq(adj) = 80.84%

Means
Corn    N   Weight
flou   30   390.27
norm   30   356.63
opaq   30   346.84

Protein    N   Weight
12        30   243.49
16        30   387.33
20        30   462.93

Corn   Protein    N   Weight
flou   12        10   271.39
flou   16        10   434.22
flou   20        10   465.21
norm   12        10   195.34
norm   16        10   389.03
norm   20        10   485.53
opaq   12        10   263.73
opaq   16        10   338.75
opaq   20        10   438.05

Figure 30.14 contains Minitab’s ANOVA output. We included the means for the nine groups, for the three types of corn, and for the three levels of protein. The main effects can be seen in the different mean weights for the corn types and for the protein levels. The three F tests are all highly significant. Although there is a substantial main effect for corn (the means range from 346.84 grams for opaque to 390.27 grams for floury), this has little meaning in the light of the interaction that we just described.

When strong interaction makes one or both main effects hard to interpret, it is often useful to find the Tukey pairwise multiple comparisons for all the treatment groups. One way to do this is to do a one-way ANOVA with just “group” as the explanatory variable. There are 36 pairwise comparisons among nine groups. Minitab’s Tukey output is both long and hard to understand in such cases. Here is a condensed version using an idea that some software (though not Minitab) implements. Arrange the nine groups in the order of their sample means, from smallest to largest. We have identified the groups both by their group number in the data file and by the treatment.
Connect with an underline all pairs of treatments that do not differ significantly at the overall 5% level:

Treatment:  N12   O12   F12   O16   N16   F16   O20   F20   N20
Group:       4     7     1     8     5     2     9     3     6
                  ---------   ---------   ---------------------
                                    ---------------

Group 4 has significantly smaller mean weight than any other group. Groups 7 and 1 do not differ significantly but are higher than Group 4 and lower than all other groups. Group 8 is not significantly different from Group 5 but is higher than Groups 4, 7, and 1 and lower than Groups 2, 9, 3, and 6. And so on. The most interesting finding is that, at the high end, Groups 2, 9, 3, and 6 do not have significantly different mean weights. Three of these are the three 20% protein groups, but floury corn with 16% protein belongs with these three.

CONCLUDE: More protein clearly helps chicks grow faster. The three types of corn do not differ significantly when the diet has 20% protein. Floury corn is superior to both opaque and normal corn at middle (16%) and low (12%) protein levels, though not all differences at these levels are statistically significant. ■

Apply Your Knowledge

30.12 Hooded Rats: Social Play Times. How does social isolation during a critical developmental period affect the behavior of hooded rats? Psychology students assigned 24 young female rats at random to either isolated or group housing, then similarly assigned 24 young male rats. This is a randomized block design with the gender of the 48 rats as the blocking variable and housing type as the treatment. Later, the students observed the rats at play in a group setting and recorded data on three types of behavior (object play, locomotor play, and social play).8 The data file records the time (in seconds) that each rat devoted to social play during the observation period. (Data file: SCLTIME.)
(a) Make a plot of the four group means. Is there a large interaction between gender and housing type?
Which main effect appears to be more important?
(b) Verify that the conditions for ANOVA inference are satisfied.
(c) Give the complete two-way ANOVA table. What are the F statistics and P-values for interaction and the two main effects? Explain why the test results confirm the tentative conclusions you drew from the plot of means.

30.13 Hooded Rats: Social Play Counts. The researchers who conducted the study in Exercise 30.12 also recorded the number of times each of three types of behavior (object play, locomotor play, and social play) occurred. The data file contains the counts of social play episodes by each rat during the observation period. Use two-way ANOVA to analyze the effects of gender and housing. (Data file: SCLCOUNT.)

30.14 Metabolic Rates in Caterpillars. Professor Haruhiko Itagaki and his students have been measuring metabolic rates in tobacco hornworm caterpillars (Manduca sexta) for years. The researchers do not want the metabolic rates to depend on which analyzer they use to obtain the measurements. They therefore make six repeated measurements on each of three caterpillars with each of three analyzers.9 (Data file: CATRPLRS.)
(a) Use software to give the two-way ANOVA table. The researchers used caterpillar as a blocking variable because metabolic rates vary from individual to individual. These three caterpillars are not of interest in themselves. They represent the larger population of caterpillars of this species. We should therefore avoid inference about the main effect of caterpillars.
(b) Explain why the researchers will be unhappy if there is a significant interaction between analyzer and caterpillar. What does your analysis show about the interaction?
(c) Is there a significant effect for analyzer? Do you think the researchers will be happy with this result?
30.6 Some details of two-way ANOVA*

*This optional material requires the background on some details of ANOVA, presented in Section 27.7, pages 661–665.

All ANOVA F statistics work on the same principle: compare the variation due to the effect being tested with a benchmark level of variation that would be present even if that effect were absent. The three F tests for two-way ANOVA use the same benchmark as the one-way ANOVA F test, namely, the variation among individual responses within the same treatment group.

In two-way ANOVA, we have two factors (explanatory variables) that form treatments in a two-way layout. Factor R has r values and Factor C has c values, so there are rc treatments. The same number n of subjects are assigned to each treatment. The two-way layout that results is as follows:

                           Column Factor C
Row Factor R    1             2             ...    c
1               n subjects    n subjects    ...    n subjects
2               n subjects    n subjects    ...    n subjects
...             ...           ...                  ...
r               n subjects    n subjects    ...    n subjects

The number of treatments is I = rc. The total number of observations is N = rcn.

Minitab Session
One-way ANOVA: Score versus Groups

Source    DF        SS     MS      F       P
Groups     3   141.563   47.2   3.83   0.016
Error     44   541.750   12.3
Total     47   683.313

S = 3.509   R-Sq = 20.72%   R-Sq(adj) = 15.31%

Minitab Session
Two-way ANOVA: Score versus Learn, Place

Source         DF        SS        MS       F       P
Learn           1   130.021   130.021   10.56   0.002
Place           1     0.521     0.521    0.04   0.838
Interaction     1    11.021    11.021    0.90   0.349
Error          44   541.750    12.312
Total          47   683.313

S = 3.509   R-Sq = 20.72%   R-Sq(adj) = 15.31%

FIGURE 30.15 Compare the sums of squares in these one-way and two-way ANOVA outputs for the study in Example 30.12.

Figure 30.15 presents both one-way and two-way ANOVA output for the same data, from the computer-assisted education study in Example 30.12. The one-way analysis just compares the means of rc treatments, ignoring the two-way layout. In discussing one-way ANOVA, we called the number of treatments I. Now I = rc.
The two-way analysis takes into account that each treatment is formed by combining a value of R with a value of C. In both settings, analysis of variance breaks down the overall variation in the observations into several pieces. The overall variation is expressed numerically by the total sum of squares

$\mathrm{SSTO} = \sum (\text{individual observation} - \text{mean of all observations})^2$

where the sum runs over all N individual observations. If we divide SSTO by $N - 1$, we get the variance of the observations. So SSTO is closely related to a familiar measure of variability. Because SSTO uses just the N individual observations, it is the same for both one-way and two-way analyses. You can see in Figure 30.15 that SSTO = 683.313 for these data.

One-way ANOVA. We saw in Chapter 27 that the one-way ANOVA F test compares the variation among the I treatment means with the variation among responses to the same treatment. If the means vary more than we would expect based on the variation among subjects who receive the same treatment, that’s evidence of a difference among the mean responses in the I populations. Let’s give a bit more detail. One-way ANOVA breaks down the total variation into the sum of two parts:

total variation among responses = variation among treatment means + variation among responses to the same treatment

total sum of squares = sum of squares for groups + sum of squares for error

$\mathrm{SSTO} = \mathrm{SSG} + \mathrm{SSE}$

Formulas for the sums of squares SSG and SSE appear on page 662 as the numerators of the mean squares MSG and MSE, but we won’t concern ourselves with the algebra. Remember that “error” is the traditional term in ANOVA for variation among observations. It doesn’t imply that a mistake has been made. In the one-way output in Figure 30.15, you see that the breakdown for these data is

SSTO = SSG + SSE
683.313 = 141.563 + 541.750

The one-way ANOVA F test is formed in two stages (page 662):

1.
Divide each sum of squares by its degrees of freedom to get the mean squares MSG for groups and MSE for error,

$\mathrm{MSG} = \dfrac{\mathrm{SSG}}{I - 1} \qquad \mathrm{MSE} = \dfrac{\mathrm{SSE}}{N - I}$

2. The one-way ANOVA F statistic compares MSG to MSE,

$F = \dfrac{\mathrm{MSG}}{\mathrm{MSE}}$

Find the P-value from the F distribution with $I - 1$ and $N - I$ degrees of freedom.

The ANOVA table in the output reports sums of squares, their degrees of freedom, mean squares, and the F statistic with its P-value.

Two-way ANOVA. Now look at the two-way analysis of variance table in Figure 30.15:

■ The total sum of squares and the error sum of squares are the same as in the one-way analysis.
■ The sum of squares for groups in one-way is the sum of the three sums of squares for main effects and interaction in two-way.

This is the heart of two-way analysis of variance: break down the variation among the rc groups into three parts: variation due to the main effect of Factor R, variation due to the main effect of Factor C, and variation due to interaction between the two factors. Each type of variation is measured by a sum of squares. The formulas for the two main effects sums of squares are similar to that for the one-way sum of squares for groups, but we will again ignore the algebraic details. The interaction sum of squares is best thought of as what’s left over: the variation among treatments that isn’t explained by the two main effects.
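As a quick numerical check of the two stages and of the "what's left over" idea, the following Python sketch recomputes the one-way mean squares and F statistic from the sums of squares in Figure 30.15, then recovers the interaction sum of squares by subtracting the two main-effect sums of squares from SSG. The variable names are our own and the numbers are simply copied from the figure; this is an illustration, not part of the Minitab output.

```python
# Sums of squares and degrees of freedom copied from Figure 30.15.
ssg, sse = 141.563, 541.750      # one-way: groups and error
df_groups, df_error = 3, 44      # I - 1 and N - I

# Stage 1: mean squares. Stage 2: the F statistic.
msg = ssg / df_groups
mse = sse / df_error
f_one_way = msg / mse

# Two-way main-effect sums of squares (rows "Learn" and "Place" in the figure).
ss_learn, ss_place = 130.021, 0.521

# Interaction sum of squares as what is left of SSG after the main effects.
ss_interaction = ssg - ss_learn - ss_place

print(f"MSG = {msg:.2f}, MSE = {mse:.2f}, F = {f_one_way:.2f}")
print(f"Interaction SS = {ss_interaction:.3f}")
```

The printed F statistic (3.83) and interaction sum of squares (11.021) agree with the one-way and two-way tables in Figure 30.15.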
In symbols,

total sum of squares = sum of squares for main effect of Factor R + sum of squares for main effect of Factor C + sum of squares for interaction between R and C + sum of squares for error

$\mathrm{SSTO} = \mathrm{SSR} + \mathrm{SSC} + \mathrm{SSRC} + \mathrm{SSE}$

Each of these sums of squares has degrees of freedom, and these also break down in the same way:

total df = df for main effect of Factor R + df for main effect of Factor C + df for interaction between R and C + df for error

$rcn - 1 = (r - 1) + (c - 1) + (r - 1)(c - 1) + rc(n - 1)$

You can check that the total degrees of freedom in the line above are $N - 1$ and the degrees of freedom for error are $N - I$, the same as for one-way ANOVA. The one-way degrees of freedom for groups are the sum of the degrees of freedom for the three two-way effects.

EXAMPLE 30.15 Comparing One-Way and Two-Way ANOVA

The data behind Figure 30.15 appear in Table 30.3. The two factors are R = placement of Blissymbol elements and C = learning style. Factor R has r = 2 values: Before and During. Factor C has c = 2 values: Active and Passive. There are I = 4 treatments and n = 12 subjects assigned to each treatment, resulting in N = 48 observations.

The total degrees of freedom are $N - 1 = 47$. The degrees of freedom for error are $N - I = 48 - 4 = 44$. In the one-way analysis, the degrees of freedom for groups are $I - 1 = 3$. The two-way analysis breaks this into degrees of freedom $r - 1 = 1$ for Factor R, $c - 1 = 1$ for Factor C, and $(r - 1)(c - 1) = (1)(1) = 1$ for interaction. Here are the two breakdowns of the total variation and the degrees of freedom that appear in Figure 30.15:

One-Way
Sums of Squares       df
SSG      141.563       3
SSE      541.750      44
SSTO     683.313      47

Two-Way
Sums of Squares       df
SSR      130.021       1
SSC        0.521       1
SSRC      11.021       1
SSE      541.750      44
SSTO     683.313      47

The neat breakdown of SSG into three effects depends on the balance of the two-way layout. It doesn’t hold when the counts of observations are not the same for all treatments.
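The bookkeeping in Example 30.15 can be reproduced for any balanced layout. The sketch below uses a hypothetical helper, `two_way_df` (our own name, not a function from any statistical package), that returns the degrees of freedom for a balanced layout with r rows, c columns, and n subjects per treatment. It then checks that the sums of squares listed in the example add up to SSTO.

```python
def two_way_df(r, c, n):
    """Degrees of freedom for a balanced r x c layout with n subjects per cell."""
    return {
        "R": r - 1,
        "C": c - 1,
        "RC": (r - 1) * (c - 1),
        "error": r * c * (n - 1),
        "total": r * c * n - 1,
    }

df = two_way_df(r=2, c=2, n=12)   # the layout in Example 30.15
effects_df = df["R"] + df["C"] + df["RC"]

# The pieces add up to the total, and the three effects share the
# one-way degrees of freedom for groups, I - 1 = rc - 1.
assert effects_df + df["error"] == df["total"]
assert effects_df == 2 * 2 - 1

# Sums of squares as listed in Example 30.15.
ss = {"R": 130.021, "C": 0.521, "RC": 11.021, "error": 541.750}
ssto = sum(ss.values())

print(df)
print(f"SSTO = {ssto:.3f}")
```

The degrees of freedom come out as 1, 1, 1, 44, and 47, and the sum of squares total is 683.313, matching the example. These identities rely on the layout being balanced.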
That’s why two-way ANOVA for unbalanced data is more complicated and harder to interpret than for balanced data.

Two-way ANOVA F tests. Finally, form three F statistics exactly as in the one-way setting.

1. Divide each sum of squares by its degrees of freedom to get the mean squares for the three effects and for error:

$\mathrm{MSR} = \dfrac{\mathrm{SSR}}{r - 1} \qquad \mathrm{MSC} = \dfrac{\mathrm{SSC}}{c - 1} \qquad \mathrm{MSRC} = \dfrac{\mathrm{SSRC}}{(r - 1)(c - 1)} \qquad \mathrm{MSE} = \dfrac{\mathrm{SSE}}{N - I}$

2. The three F statistics compare the mean squares for the three effects with MSE.

Two-Way ANOVA F Tests

The F statistics for the three types of treatment effects in two-way ANOVA are

For the main effect of Factor R, $F = \dfrac{\mathrm{MSR}}{\mathrm{MSE}}$ with dfs $r - 1$ and $N - I$

For the main effect of Factor C, $F = \dfrac{\mathrm{MSC}}{\mathrm{MSE}}$ with dfs $c - 1$ and $N - I$

For the interaction of R and C, $F = \dfrac{\mathrm{MSRC}}{\mathrm{MSE}}$ with dfs $(r - 1)(c - 1)$ and $N - I$

In all cases, large values of F are evidence against the null hypothesis that the effect is not present in the populations.

Apply Your Knowledge

30.15 Hooded Rats: Social Play Times. In Exercise 30.12, you carried out two-way ANOVA for a study of the effect of social isolation on hooded rats. The response variable is the time (in seconds) that a rat devoted to social play during an observation period. Start your work in this exercise with your two-way ANOVA table from Exercise 30.12. SCLTIME
(a) Explain how the sums of squares from the two-way ANOVA table can be combined to obtain the one-way ANOVA sum of squares for the four groups (SSG). What is the value of SSG?
(b) Give the degrees of freedom, mean square (MSG), and F statistic for testing for the effect of groups.
(c) Is there a significant effect of group on the amount of time in play? Give and interpret the P-value in the context of this experiment.
(d) Use software to carry out one-way ANOVA of time on group. Verify that your results in parts (b), (c), and (d) agree with the software output.

30.16 Hooded Rats: Social Play Counts.
In Exercise 30.13, you carried out two-way ANOVA for a study of the effect of social isolation on hooded rats. The response variable is the count of social play episodes during an observation period. Start your work in this exercise with your two-way ANOVA table from Exercise 30.13. SCLCOUNT DATA 30-32 (a) How many treatment groups are there in this experiment? (b) Starting from the two-way ANOVA table, create a one-way ANOVA table to examine the effect of the four groups. (c) Is there a significant group effect? Give the appropriate hypotheses, test statistic, P-value, and conclusion in the context of this experiment. CHAPTER 30 SUMMARY Chapter Specifics ■ Two-way analysis of variance (ANOVA) compares the means of several populations formed by combinations of two factors R and C in a two-way layout. ■ The conditions for ANOVA state that we have an independent SRS from each population (or a completely randomized or randomized block experimental design); that each population has a Normal distribution; and that all populations have the same standard deviation. In this chapter, we consider only examples that satisfy the additional conditions that the design producing the data is crossed (all combinations of the factors are present) and balanced (all factor combinations are represented by the same number of individuals). ■ A factor has a main effect if the mean responses for each value of that factor, averaged over all values of the other factor, are not the same. The two factors interact if the effect of moving between two values of one factor is different for different values of the other factor. Plot the treatment mean responses to examine main effects and interaction. ■ There are three ANOVA F tests: for the null hypotheses of no main effect for Factor R, no main effect for Factor C, and no interaction between the two factors. ■ Follow-up analysis is often helpful in both one-way and two-way ANOVA settings. 
Tukey pairwise multiple comparisons give confidence intervals for all differences among treatment means with an overall level of confidence. That is, we can be (say) 95% confident that all the intervals simultaneously capture the true population differences between means. STATISTICS IN SUMMARY Here are the most important skills you should have acquired from reading this chapter. A. Recognition 1. Recognize the two-way layout, in which we have a quantitative response to treatments formed by combinations of values of two factors. Statistics in Summary 2. Recognize when comparing mean responses to the treatments in a two-way layout is helpful in understanding data. 3. Recognize when you can use two-way ANOVA to compare means. Check the data production, the presence of outliers, and the sample standard deviations for the groups you want to compare. Look for data production designs that are crossed and balanced. B. Interpreting Two-Way ANOVA 1. Plot the sample means for the treatments. Based on your plot, describe the main effects and interaction that appear to be present. 2. Decide which effects are most important in practice. Pay particular attention to whether a large interaction makes one or both main effects less meaningful. 3. Use software to carry out two-way ANOVA inference. From the P-values of the three F tests, learn which effects are statistically significant. C. Follow-up Analysis 1. Decide when it is helpful to know which differences among treatment means are significant. 2. Use software to carry out Tukey pairwise multiple comparisons among all the means you want to compare. 3. Understand the meaning of the overall confidence level and the overall level of significance provided by Tukey’s method for a set of confidence intervals or a set of significance tests. 
Link It In a one-way ANOVA, rejection of the null hypothesis is evidence of a difference in the means of the populations, but the result does not tell us which differences between the means are statistically significant. Typically, we perform a more detailed follow-up analysis to decide which of the means are different and to estimate how large these differences are. This can be done using Tukey’s pairwise multiple comparisons. These tell us which populations are significantly different from each other and, through the associated confidence intervals, how large these differences are. Sometimes we have more specific questions about the population means that can be answered using further contrasts among the means. The multiple regression models in Chapter 29 extend the simple linear regression model, allowing us to study the effect of several predictors on the mean of a quantitative response. Similarly, the two-way ANOVA extends the one-way ANOVA described in Chapter 27, allowing us to study the effect of two factors on the mean of a quantitative response. In the two-way ANOVA, we have specific questions about the two factors: what are the main effects of each factor, and do the two factors interact? These questions can be answered by examining appropriate plots of the means and the F tests in the ANOVA table. In the presence of strong interaction, the main effects may be of little interest. As in the one-way ANOVA, a follow-up analysis may be necessary to obtain more detailed information about the effects of the factors. 30-33 30-34 CHAP TER 3 0 ■ More about Analysis of Variance C H E C K YO U R S K I L L S 30.17 In an ANOVA that compares three treatments, how many pairwise comparisons between two of these treatments are there? (a) two (b) three (c) four 30.18 After an ANOVA that compares four treatments, you calculate a separate two-sample t 95% confidence interval for each pairwise difference formed from the four population means. 
Are you 95% confident that these intervals capture all the true differences?
(a) No, because 95% confidence applies to each interval alone, not to all the intervals together.
(b) No, because the ordinary two-sample t confidence intervals cannot be used for a difference between two population means when we have data from other populations as well.
(c) Yes, because under the ANOVA conditions, following the t procedure gives correct 95% confidence intervals.

30.19 As part of an ANOVA that compares three treatments, you carry out Tukey pairwise tests at the overall 5% significance level. The Tukey tests find that $\mu_1$ is significantly different from $\mu_2$ but that the other two comparisons are not significant. You can be 95% confident that
(a) $\mu_1 \neq \mu_2$ and $\mu_1 = \mu_3$ and $\mu_2 = \mu_3$.
(b) just $\mu_1 \neq \mu_2$; there is not enough evidence to draw conclusions about the other pairs of means.
(c) $\mu_1 = \mu_3$ and $\mu_2 = \mu_3$ and this implies that it must also be true that $\mu_1 = \mu_2$.

30.20 The purpose of two-way ANOVA is to learn about
(a) the means of two populations.
(b) the variances of two populations.
(c) the combined effects of two factors on a quantitative response.

30.21 A two-way ANOVA compares two exercise programs (A and B) and two exercise frequencies (twice a week and four times a week). The response variable is weight lost after eight weeks of exercise. An interaction is present in the data when
(a) the mean weight loss is not the same for Program A and Program B.
(b) the mean weight loss under Program A is different for twice-weekly and four-times-weekly subjects.
(c) the difference between the mean weight losses for Programs A and B is not the same for twice-weekly and four-times-weekly subjects.

30.22 When the F test for interaction in a two-way ANOVA is significant,
(a) the main effects are never meaningful and should be ignored.
(b) the main effects must be interpreted with caution.
(c) interpret each main effect just as you would in a one-way ANOVA.
A student project measured the increase in the heart rates of fellow students when they stepped up and down for three minutes to the beat of a metronome. The explanatory variables are step height (Lo = 5.75 inches, Hi = 11.5 inches) and metronome beat (Slow = 14 steps/minute, Med = 21 steps/minute, Fast = 28 steps/minute). The subject’s heart rate was measured for 20 seconds before and after stepping. The response variable is the increase in heart rate during exercise.10 Use the ANOVA table in Figure 30.16 to answer the following questions.

Minitab

Two-way ANOVA: HRinc versus Freq, Height

Source         DF       SS       MS      F      P
Freq            2   4048.8   2024.4  17.99  0.000
Height          1   1920.0   1920.0  17.07  0.000
Interaction     2    218.4    109.2   0.97  0.393
Error          24   2700.0    112.5
Total          29   8887.2

S = 10.61   R-Sq = 69.62%   R-Sq(adj) = 63.29%

FIGURE 30.16 Two-way ANOVA table for Exercises 30.23 to 30.27.

30.23 How many treatment groups were there for this experiment?
(a) 3 (b) 5 (c) 6

30.24 What is the value of the test statistic for interaction between frequency and height?
(a) 218.4 (b) 109.2 (c) 0.97 (d) 0.393

30.25 What is the estimate for the common standard deviation $\sigma$?
(a) 218.4 (b) 109.2 (c) 10.61

30.26 How many students were in the study?
(a) 29 (b) 30 (c) It cannot be determined from the ANOVA table.

30.27 Which statement provides the best summary of the interaction between frequency and height?
(a) The interaction is significant at the 1% level but not at the 5% level.
(b) The interaction is significant at the 5% level but not at the 1% level.
(c) The interaction is not significant at either the 1% or the 5% level.

CHAPTER 30 EXERCISES

6.752 to 9.021 for bihai − red
10.165 to 12.670 for bihai − yellow
2.375 to 4.688 for red − yellow

(a) How confident are you that all three of these intervals capture the true differences between pairs of population means?
(b) Write a short summary of the results of the ANOVA, including the multiple comparisons.
30.29 (Optional topic) Comparing tropical flowers: a contrast. The data in Table 27.1 contain lengths for the bihai variety of Heliconia and for two forms (red and yellow) of the caribaea variety. We wonder (before looking at the data) whether the population mean for bihai differs from the average of the means for the two forms of caribaea. FLOWER (a) What contrast expresses this comparison? (b) Give the null and alternative hypotheses for this comparison, find the sample contrast, and assess its statistical significance. What do you conclude? (c) Give a 90% confidence interval for the population contrast in part (a). 30.30 Does nature heal best? Our bodies have a natural electrical field that helps wounds heal. Might higher or lower levels speed healing? An experiment with newts investigated this question. Newts were randomly assigned to five groups. In four of the groups, an electrode applied to one hind limb (chosen at DATA FLOWER random) changed the natural field, whereas the other hind limb was not manipulated. Both limbs in the fifth (control) group remained in their natural state.11 Table 30.5 gives data from this experiment. The “Group” variable shows the field applied as a multiple of the natural field for each newt. For example, “0.5” is half the natural field, “1” is the natural level (the control group), and “1.5” indicates a field 1.5 times natural. “Diff” is the response variable, the difference in the healing rate (in micrometers per hour) of cuts made in the experimental and control limbs of that newt. Negative values mean that the experimental limb healed more slowly. The investigators conjectured that nature heals best, so that changing the field from the natural state (the “1” group) will slow healing. Carry out a one-way ANOVA to compare the mean healing rates for the five groups. Then perform Tukey multiple comparisons for the 10 pairs of population means. 
Use the “underline” method illustrated in Example 30.14 to display the complicated results. What do you conclude? NEWTHEAL 30.31 (Optional topic) Does nature heal best? A contrast. The researchers who carried out the study of healing in newts described in Exercise 30.30 conjectured that healing is fastest under natural conditions. If this is true, the population mean for the 1 group (natural field level) in Table 30.5 (page 30–36) should be larger (or less negative) than the average of the population means for the other four groups. NEWTHEAL DATA DATA 30.28 Comparing tropical flowers. Example 27.1 (page 645) describes a study of the lengths of three varieties of the tropical flower Heliconia. Table 27.1 gives the data. One-way ANOVA gives strong evidence (F ⫽ 259.12, P ⬍ 0.0001) that the three population mean lengths are not all equal. Software gives these Tukey 95% simultaneous confidence intervals: (a) What population contrast expresses this comparison? What null and alternative hypotheses in terms of this contrast should the researchers use to test their conjecture? (b) Carry out the test and state your conclusion. 30-36 CHAP TER 3 0 ■ More about Analysis of Variance T A B L E 3 0 . 5 EFFECT OF ELECTRICAL FIELD ON HEALING RATE IN NEWTS GROUP 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 DIFF 1 10 3 3 31 4 12 3 7 10 22 4 1 3 GROUP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 DATA (c) After you see the five sample means, it is tempting to contrast the average for the 1 and 1.25 groups with the average for the other three groups. Explain carefully why this is not legitimate. 30.32 Exercise and heart rate. A student project measured the increase in the heart rates of fellow students when they stepped up and down for three minutes to the beat of a metronome. The explanatory variables are step height (Lo 5.75 inches, Hi 11.5 inches) and metronome beat (Slow 14 steps/minute, Med 21 steps/minute, Fast 28 steps/minute). 
The subject’s heart rate was measured for 20 seconds before and after stepping. The response variable is the increase in heart rate during exercise.12 The data file has resting and final heart rates as well as the increase in heart rate for this study. If the randomization worked well, there should be no significant differences among the six groups in mean resting heart rate (variable HRrest in the data file). HEARTRTE (a) How many pairwise comparisons are there among the means of six populations? (b) Use Tukey’s method to compare these means at the overall 10% significance level. 30.33 Hooded rats: object play times. Exercise 30.12 describes an experiment to study the effects of social isolation on the behavior of hooded rats. You have analyzed the effects on social play. Now look at an- DIFF 7 15 4 16 2 13 5 4 2 14 5 11 10 3 6 1 13 8 GROUP 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 1.25 DIFF 1 8 15 14 7 1 11 8 11 4 7 14 0 5 2 GROUP 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 DIFF 13 49 16 8 2 35 11 46 22 2 10 4 10 2 5 other response variable, the time that a rat spends in object play during an observation period. The data file records the time (in seconds) that each rat devoted to object play. OBJTIME DATA DIFF 10 12 9 11 1 6 31 5 13 2 7 8 (a) Make a plot of the four group means. Is there a large interaction between gender and housing type? Which main effect appears to be more important? (b) Verify that the conditions for ANOVA inference are satisfied. (c) Give the complete two-way ANOVA table. What are the F statistics and P-values for interaction and the two main effects? Explain why the test results confirm the tentative conclusions you drew from the plot of means. 30.34 Hooded rats: object play counts. The researchers who conducted the study in Exercise 30.33 4step also recorded the number of times each of three types of behavior (object play, locomotor play, and social play) occurred. 
The data file contains the counts of object play episodes for each rat during the observation period. Carry out a complete analysis of the effects of gender and housing type. OBJCOUN DATA GROUP 0 0 0 0 0 0 0 0 0 0 0 0 30.35 Herbicide and corn hybrids. Genetic engineering has produced new corn hybrids that resist the effects of herbicides. This allows more effective control of weeds, because herbicides don’t damage 30-37 Exercises DATA (a) Construct a plot of means to examine the effects of application rate and hybrid and their interaction. (b) Are the conditions for ANOVA inference satisfied? Explain. 30.36 Girls’ cross-country times. Ten runners each completed 10 five-kilometer races for the Thomas Worthington High School girls’ cross-country team during the 2004 season. The data file contains each race time in minutes.14 GIRLSRUN (a) Prepare side-by-side boxplots to compare the distributions of race times for the 10 runners. Identify the best runner on the team. (b) Use runner as a blocking variable to see if there are significant differences in the overall mean race times for the different meets. Identify the appropriate parameters, state the null and alternative hypotheses, calculate the test statistic and P-value, and state your conclusion in context. (c) Should follow-up multiple comparisons be used to identify any significant differences? If so, summarize the appropriate procedure. If not, explain why not. 30.37 (Optional topic) Girls’ cross-country times: a contrast. Continue your analysis of the data from Exercise 30.36 by comparing average race time during the first half of the season with the average for the second half. Identify a population contrast L that makes this comparison, estimate this contrast, and state your conclusion. 30.38 Girls’ cross-country times: peak at the end? Coaches like to see their athletes “peak” at the end 4step of the season. 
Is there significant evidence that the average finish time for the 10th meet is significantly lower than the average finish time for the first nine meets? 30.39 Comparisons among means for one factor in a two-way analysis. We have illustrated Tukey pairwise comparisons among all treatment means in both one-way and two-way settings. The method can also compare the mean responses to just one of the two factors in a two-way setting: do a one-way ANOVA on the two-way data listing only one factor as an explanatory variable. This combines data for all levels of the other factor, so it is useful only when interactions are small. Return to the data on phosphorus in DATA tomato plants, Table 30.4. Do a one-way ANOVA that uses all 36 observations with fertilizer type as the only factor. Ask for Tukey pairwise comparisons among the three levels of fertilizer, with overall confidence level 95%. TOMATOES (a) How many observations per group does your analysis use? (b) What do you conclude from the F statistic and its P-value? (c) Use Tukey: which pairwise differences of means for the three fertilizer levels are significant at the overall 5% level? (d) Do you think these pairwise comparisons are useful for these data? (Hint: What population does each of the three samples represent?) 30.40 Fourth-graders composing music. The Orff xylophone is often used in teaching music to children because it has removable bars that allow the teacher to present different options to the students. An education researcher used the Orff xylophone to examine the effect of tonality (pentatonic or harmonic minor) and number of xylophone bars (five or 10) on the ability of fourth-graders to compose melodies that they could play repeatedly.15 © Arclight/Alamy Images DATA the corn. A study compared the effects of the herbicide glusofinate on a number of corn hybrids. 
The percents of necrosis (leaf burn) 10 days after application of glusofinate for several application rates (kilograms per hectare) and three corn hybrids, two resistant and one not, are provided in the data file.13 CORNHYB

(a) Twelve children were randomly assigned to each combination of tonality and bar count. Give the two-way layout for this experiment.
(b) Judges listened to tapes of the children’s work and assigned scores for several aspects of the melodies. One response variable measured the extent to which children generated new musical ideas in consecutive five-second intervals of their melodies. Here is the two-way ANOVA table for this variable (the publication does not give the group means):

SOURCE         DF   SUM OF SQUARES      F      P
Tonality        1            44.08   0.43   0.52
Bar count       1           705.33   6.81   0.01
Interaction     1            50.02   0.48   0.49
Error          44          4556.54

Comment on the significance of main effects and interaction. Then make a recommendation for teachers who want to use the Orff xylophone to encourage children to generate melodies with new musical ideas.

30.41 Cues for listening comprehension. In speaking, we often signal a shift to the next step in the discussion by saying things like “Let’s talk about . . .” or “Finally, . . .” These are “discourse-signaling cues.” Do such cues help listeners better comprehend a second language? One study of this question involved 80 Korean learners of English as a foreign language.16 Half of the 80 learners listened to a lecture in English with discourse-signaling cues, and the other half heard the same lecture without cues. After the lecture, half of each group were asked to summarize information from the lecture and the other half simply to recall information.
Here are the group means and two-way ANOVA table for one response variable, a measure of comprehension of high-level information:

                 LECTURE
Task          CUES   NO CUES
Summary         23        17
Recall          19        13

SOURCE         DF   SUM OF SQUARES        F      P
Lecture         1           649.80   16.582  0.000
Task            1           281.25    7.177  0.009
Interaction     1             0.20    0.005  0.943
Error          76          2978.30

Use this information to comment on the impact of discourse-signaling cues and the type of task performed. (Assume that the data satisfy the conditions for ANOVA.)

30.42 Fertilization of bromeliads. Does the type of fertilization have a significant effect on leaf development for bromeliads? Researchers compared leaf production and death over a seven-month period for a control group (C), a group fertilized only with nitrogen (N), a group fertilized only with phosphorus (P), and a group that received both nitrogen and phosphorus (NP).17 In the nitrogen and phosphorus columns in the data file, 0 indicates no fertilization and 1 indicates fertilization. The “new” and “dead” columns give the total number of new or dead leaves produced over the seven months following fertilization. “Change” gives the difference “new” minus “dead.” BROMLIAD
(a) Figure 30.17 (page 30–39) displays the one-way ANOVA table and Tukey pairwise comparisons for the number of new leaves. Are there significant differences among the four group means? List the four group means in order from smallest to largest and give a summary of the Tukey procedure similar to that in Example 30.14.
(b) Give the two-way ANOVA table for the number of new leaves. How do your conclusions from this analysis differ from your conclusion in part (a)?

30.43 Fertilization of bromeliads, continued. Repeat the analysis of Exercise 30.42 using number of dead leaves as the response variable.

30.44 Fertilization of bromeliads, continued. Repeat the analysis of Exercise 30.43 using change in the number of leaves as the response variable.

30.45 (Optional topic) ANOVA with one observation per treatment.
If a two-way layout has just one observation for each treatment, there is no variation within each treatment group with which to estimate the mean square for error (MSE): the error sum of squares and its degrees of freedom are both 0. The usual two-way ANOVA F tests can’t be done, because they have MSE in their denominators. But if we are willing to assume that there is no interaction between the two factors, the interaction mean square can be used as the denominator of F statistics for testing the two main effects. Software usually allows you to choose “no interaction.” Here is an example. Does the addition of sodium chloride change brick quality? One measure of quality is “resistance anisotropy,” which combines the results of ultrasound and mechanical tests. Researchers tested two types of bricks, one without sodium chloride (MPW) and one with sodium chloride (MPS), at three firing temperatures (850, 1000, and 1100 degrees Celsius). Here are the data for one brick per treatment:18

Data file: BRICKS

BRICK TYPE    TEMPERATURE (°C)    RESISTANCE
MPW            850                3.29
MPW           1000                5.78
MPW           1100                4.84
MPS            850                2.57
MPS           1000                3.90
MPS           1100                2.57

(a) Make a plot of the group means (which are the same as the individual observations). Is the interaction between type of brick and firing temperature small? If so, it may be reasonable to assume that in the population there is no interaction.

(b) The researchers did fit a two-way ANOVA model without interaction. Use statistical software to obtain this two-way ANOVA table. What do you conclude about the main effects?

Minitab Session

One-way ANOVA: New versus treatment

Source       DF       SS     MS      F      P
treatment     3    28.75   9.58   3.77  0.002
Error        28    71.25   2.54
Total        31   100.00

S = 1.595   R-Sq = 28.75%   R-Sq(adj) = 21.12%

Level    N     Mean   StDev
C        8   13.250   1.909
N        8   15.625   1.685
NP       8   14.625   1.302
P        8   13.500   1.414

Pooled StDev = 1.595

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of treatment
Individual confidence level = 98.92%

treatment = C subtracted from:
treatment    Lower   Center   Upper
N            0.198    2.375   4.552
NP          -0.802    1.375   3.552
P           -1.927    0.250   2.427

treatment = N subtracted from:
treatment    Lower   Center   Upper
NP          -3.177   -1.000   1.177
P           -4.302   -2.125   0.052

treatment = NP subtracted from:
treatment    Lower   Center   Upper
P           -3.302   -1.125   1.052

FIGURE 30.17 One-way ANOVA table and Tukey pairwise multiple comparisons from Minitab for Exercise 30.42.

30.46 Packaging and perceived healthiness (4step). Can the font and color used in a packaged product affect our perception of the product’s healthiness? It was hypothesized that use of a natural font, which looks more handwritten and tends to be more slanted and curved, might lead to a different perception of product healthiness than an unnatural font. Two fonts, Impact and Sketchflow Print, were used. These fonts were shown in a previous study to differ in their perceived naturalness, but otherwise were rated similarly on factors such as readability and likeability. The two package colors used were red and green. Images of four identical packages differing only in the font and color were presented to the subjects. Participants read statements such as “This product is healthy/natural/wholesome/organic,” and they rated how much they agreed with the statement on a seven-point scale with 1 indicating “strongly agree” and 7 indicating “strongly disagree.” Each subject’s responses were combined to create a perceived healthiness score.19 The researcher had 252 students available to serve as subjects, and randomly assigned 63 to each treatment, or combination of color and font. The data file records the perceived healthiness score for each subject. Carry out a complete analysis of the effects of color and font on perceived healthiness, using Example 30.12 to guide your work.
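For the brick exercise above, with one observation per treatment, the "no interaction" analysis can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the researchers' software or any package's API: the function name `additive_anova` and the labels `rows`, `columns`, and `residual` are made up for this sketch. It uses what is left after removing both main effects (the interaction sum of squares) as the denominator of the two F statistics, which is justified only if the no-interaction assumption holds.

```python
# Resistance anisotropy for the six bricks, keyed by (brick type, temperature in °C).
data = {
    ("MPW", 850): 3.29, ("MPW", 1000): 5.78, ("MPW", 1100): 4.84,
    ("MPS", 850): 2.57, ("MPS", 1000): 3.90, ("MPS", 1100): 2.57,
}


def additive_anova(cells):
    """Two-way ANOVA without interaction, one observation per cell.

    Returns a dict mapping each source to (SS, df, MS, F).
    The residual line has F = None.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    a, b = len(rows), len(cols)

    grand = sum(cells.values()) / (a * b)
    row_mean = {r: sum(cells[r, c] for c in cols) / b for r in rows}
    col_mean = {c: sum(cells[r, c] for r in rows) / a for c in cols}

    # Main-effect sums of squares.
    ss_row = b * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_col = a * sum((col_mean[c] - grand) ** 2 for c in cols)

    # What remains after removing both main effects. With one observation
    # per cell this is the interaction SS; assuming no interaction, it
    # plays the role of error.
    ss_res = sum(
        (cells[r, c] - row_mean[r] - col_mean[c] + grand) ** 2
        for r in rows for c in cols
    )

    df_row, df_col = a - 1, b - 1
    df_res = df_row * df_col
    ms_row, ms_col, ms_res = ss_row / df_row, ss_col / df_col, ss_res / df_res

    return {
        "rows": (ss_row, df_row, ms_row, ms_row / ms_res),
        "columns": (ss_col, df_col, ms_col, ms_col / ms_res),
        "residual": (ss_res, df_res, ms_res, None),
    }


table = additive_anova(data)  # rows = brick type, columns = temperature
for source, (ss, df, ms, f) in table.items():
    f_text = "" if f is None else f"{f:8.2f}"
    print(f"{source:<10}{df:>3}{ss:10.4f}{ms:10.4f}{f_text}")
```

Each F statistic is referred to an F distribution with the effect's degrees of freedom in the numerator and the residual degrees of freedom in the denominator. If SciPy is available, `scipy.stats.f.sf(F, df_effect, df_residual)` gives the corresponding P-value.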
Data file: COOKIES

Exploring the Web

30.47 Confidence in the banking system. The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The survey is conducted by the National Opinion Research Center of the University of Chicago, which interviews face-to-face a randomly selected sample of adults (18 and older). SDA (Survey Documentation and Analysis) is a set of programs that allows you to analyze survey data and includes the GSS as part of its archive. In Exercises 27.44 and 27.45 (page 674), you used data from the GSS to study the relationship between the average respondent age and confidence in the banking system. A one-way ANOVA showed statistically significant differences in the mean age of respondents among the levels of confidence in the banking system. Download the data file following the directions in Exercise 27.45, and do Tukey’s multiple comparisons at the 5% overall significance level to compare the mean age for the three levels of confidence in the banking system. Which pairs of means are different?

30.48 Find a two-way ANOVA. Find an example of a two-way ANOVA on the web. The Journal of the American Medical Association (jama.ama-assn.org), Science Magazine (www.sciencemag.org), the Canadian Medical Association Journal (www.cmaj.ca), and the Journal of Marketing Research (www.atypon-link.com/AMA/loi/jmkr) are possible sources. To help locate an article, look through the abstracts of articles. If you are having difficulty, consider study 1 in Psychological Science, 19 (2008), pp. 268–273. Once you find a suitable article, read the article and then briefly describe the study (including the two factors and the number of levels of each factor) and its conclusions. If the means are given, plot them and discuss the interaction. Whether or not the study is balanced, the test for interaction is interpreted in the same way.
If P-values are reported, be sure to discuss them in your summary. Also, be sure to give the reference (either the web link or the journal, issue, year, title of the paper, authors, and page numbers).

NOTES AND DATA SOURCES

Chapter 30 Notes

1. We thank Charles Cannon of Duke University for providing the data. The study report is C. H. Cannon, D. R. Peart, and M. Leighton, “Tree species diversity in commercially logged Bornean rainforest,” Science, 281 (1998), pp. 1366–1367.
2. Modified from M. C. Wilson and R. E. Shade, “Relative attractiveness of various luminescent colors to the cereal leaf beetle and the meadow spittlebug,” Journal of Economic Entomology, 60 (1967), pp. 578–580.
3. In some cases, we think of the blocks in a block design as a sample from some larger population of potential blocks. For example, a clinical trial may compare several medical therapies (the factor) using patients at several medical centers (the blocks). We are not interested in these particular medical centers; they simply represent all hospitals at which the therapies might be used. Such blocks are called “random.” ANOVA with one or more random factors is an advanced topic. In other cases, the specific blocks are of interest in their own right. For example, a clinical trial might compare several therapies (the factor) on both female and male patients (the blocks). We randomly assign therapies among the women and separately among the men, a randomized block design. We are interested in the effect of gender on response to the therapies. Such blocks are called “fixed.” In this chapter, we allow randomized block designs with fixed blocks, analyzing the data by treating the blocks as the second factor in two-way ANOVA.
4. Victoria L. Brescoll and Eric L. Uhlmann, “Can an angry woman get ahead? Status conferral, gender and expression of emotion in the workplace,” Psychological Science, 19 (2008), pp. 268–273. The description and data are based on study 1 in this article.
5. Orit E. Hetzroni, “The effects of active versus passive computer-assisted instruction on the acquisition, retention, and generalization of Blissymbols while using elements for teaching compounds,” PhD thesis, Purdue University, 1995.
6. Data courtesy of David LeBauer, University of California, Irvine. Provided by Brigitte Baldi.
7. Simulated data, based on data summaries in G. L. Cromwell et al., “A comparison of the nutritive value of opaque-2, floury-2, and normal corn for the chick,” Poultry Science, 57 (1968), pp. 840–847.
8. We thank Andy Niemiec and Robbie Molden for data from a summer science project at Kenyon College. Provided by Brad Hartlaub.
9. We thank Harihuko Itagaki and Andrew Veerde for data from a summer science project at Kenyon College. Provided by Brad Hartlaub.
10. Data from the EESEE story “Stepping Up Your Heart Rate.”
11. Data provided by Drina Iglesia, Purdue University. The data are part of a larger study reported in D. D. S. Iglesia, E. J. Cragoe, Jr., and J. W. Vanable, “Electric field strength and epithelization in the newt (Notophthalmus viridescens),” Journal of Experimental Zoology, 274 (1996), pp. 56–62.
12. See Note 10.
13. Christopher Eric Mowen, “Use of glusofinate in glusofinate resistant corn hybrids,” MS thesis, Purdue University, 1999.
14. We thank Coach Brian Luthy for the data. Provided by Brad Hartlaub.
15. John Kratus, “Effect of available tonality and pitch options on children’s compositional processes and products,” Journal of Research in Music Education, 49 (2001), pp. 294–306.
16. Euen Hyuk Jung, “The role of discourse signaling cues in second language listening comprehension,” Modern Language Journal, 87 (2003), pp. 562–577.
17. We thank Jacqueline Ngai for providing data from Jacqueline T. Ngai and Diane S. Srivastava, “Predators accelerate nutrient cycling in a bromeliad ecosystem,” Science, 314 (2006), p. 963. Counts for the last control plant are simulated because a monkey enjoyed snacking on this plant before the researchers were able to obtain all the counts.
18. G. Cultrone et al., “Ultrasound and mechanical tests combined with ANOVA to evaluate brick quality,” Ceramics International, 27 (2001), pp. 401–406.
19. Alysha Fligner and Xiaoyan Deng, “The effect of font naturalness on perceived healthiness of food products,” Senior thesis, The Ohio State University.