More about Analysis of Variance

Transcription

More about Analysis of Variance
CHAPTER
30
Robbie Shone/Science Source
More about Analysis
of Variance
In this chapter
we cover...
30.1 Beyond one-way ANOVA
30.2 Follow-up analysis:
Tukey pairwise multiple
comparisons
30.3 Follow-up analysis:
contrasts*
A
nalysis of variance (ANOVA) is a statistical method for comparing the means
of several populations based on independent random samples or on the mean
responses to several treatments in a randomized comparative experiment.
When we compare just two means, we use the two-sample t procedures described
in Chapter 21. ANOVA allows comparison of any number of means. The basic form
of ANOVA is one-way ANOVA, which treats the means being compared as mean
responses to different values of a single variable. For example, in Chapter 27 we used
one-way ANOVA to compare the mean lengths of three varieties of tropical flowers and the mean number of species per plot after each of three logging conditions.
30.4 Two-way ANOVA: conditions,
main effects, and interaction
30.5 Inference for two-way ANOVA
30.6 Some details of two-way
ANOVA*
30.1 Beyond one-way ANOVA
You should recall or review the big ideas of one-way ANOVA from Chapter 27.
One-way ANOVA compares the means ␮1, ␮2, # # # , ␮I of I populations based on
samples of sizes n1, n2, # # # , nI from these populations.
■ The conditions for ANOVA require independent random samples from
each of the I populations (or a randomized comparative experiment with I
treatments), Normal distributions for the response variable in each population,
and a common standard deviation ␴ in all populations. Fortunately, ANOVA
inference is quite robust against moderate violations of the Normality and
common standard deviation conditions.
30-1
30-2
CHAP TER 3 0
■
More about Analysis of Variance
■
multiple comparisons
■
■
ANOVA F statistic
Using many separate two-sample t procedures to compare many pairs of means is
a bad idea because we don’t get a P-value or a confidence level for the complete
set of comparisons together. This is the problem of multiple comparisons.
One-way ANOVA gives a single test for the null hypothesis that all the
population means are the same against the alternative hypothesis that not all
are the same.
ANOVA works by comparing how far apart the sample means are relative
to the variation among individual observations in the same sample. The test
statistic is the ANOVA F statistic
F
F distribution
E X A M P L E 3 0 .1
variation among the sample means
variation among individuals in the same sample
The P-value comes from an F distribution.
■ In basic statistical practice, we combine the F test with data analysis to check
the conditions for ANOVA and to see which means appear to differ and by
how much.
Here is an example that illustrates one-way ANOVA.
■
Logging in the Rain Forest
4step
Fletcher & Baylis/Science Source
STATE: How does logging in a tropical rain forest affect the
forest in later years? Researchers compared forest plots in
Borneo that had never been logged (Group A) with similar
plots nearby that had been logged one year earlier (Group B) and
eight years earlier (Group C). Although the study was not an experiment, the authors explain why we can consider the plots to be
randomly selected. Table 30.1 displays data on the number of tree
species found in each plot.1 Is there evidence that the mean numbers
of species in the three groups differ?
DATA
PLAN: Describe how the three samples appear to differ, check the
conditions for ANOVA, and carry out the one-way ANOVA F test to
compare the three population means.
TREESPEC
T A B L E 3 0 . 1 COUNTS OF TREE SPECIES IN FOREST PLOTS
IN BORNEO
GROUP A
NEVER LOGGED
GROUP B
LOGGED ONE YEAR AGO
GROUP C
LOGGED EIGHT YEARS AGO
22
18
22
20
15
21
13
13
19
13
19
15
11
11
14
7
18
15
15
12
13
2
15
8
17
4
18
14
18
15
15
10
12
30.1
Group A
(never)
Group B
(1 year)
Group C
(8 years)
2
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
2
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
2
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
000
00
0
00
0
0
00
0
0
0
00
0
0
0
000
0
Beyond One-Way ANOVA
30-3
FIG U RE 30.1
Side-by-side stemplots comparing the
counts of tree species in three groups of
forest plots, from Table 30.1.
0
0
0
0
00
0
00
SOLVE: The stemplots in Figure 30.1 show that the distributions are irregular
in shape, as is common for small samples. None of the distributions are strongly
skewed, and there are no extreme outliers. The unlogged Group A plots tend to have
more species than the two logged groups and also show less variation among plots.
The Minitab ANOVA output in Figure 30.2 shows that the standard deviations satisfy
our rule of thumb (page 657) that the largest (4.5) is no more than twice the smallest (3.529). The output also shows that the mean number of species in Group A is
higher than in Groups B and C. The ANOVA F test is highly significant, with P-value
P 0.006.
Minitab
Session
One-way ANOVA: Species versus Group
Source
Group
Error
Total
DF
2
30
32
S = 4.120
SS
204.4
509.2
713.6
MS
102.2
17.0
R-Sq = 28.64%
F
6.02
P
0.006
R-Sq(adj) = 23.88%
Individual 95% CIs for Mean
Based on Pooled StDev
Level
l year
8 years
never
N
12
9
12
Mean
11.750
13.667
17.500
StDev
4.372
4.500
3.529
12.0
Pooled StDev = 4.120
15.0
18.0
21.0
FIG URE 30.2
Minitab ANOVA output for the logging
study of Example 30.1.
30-4
CHAP TER 3 0
■
More about Analysis of Variance
CONCLUDE: The data provide strong evidence (F 6.02, P 0.006) that the
mean number of species per plot is not the same in all three groups. It appears that
the mean species count is higher in unlogged plots. ■
This chapter moves beyond basic one-way ANOVA in two directions.
Follow-up analysis. The ANOVA F test in Example 30.1 tells us only that the
three population means are not the same. We would like to say which means differ
and by how much. For example, do the data allow us to say that the “never logged”
population does have a higher mean species count than the “logged one year ago”
and the “logged eight years ago” populations of forest plots? This is a follow-up analysis to the F test that goes beyond data analysis to confidence intervals and tests of
significance for specific comparisons of means.
Two-way ANOVA. Example 30.1, and one-way ANOVA in general, compares mean responses for several values of just one explanatory variable. In
Example 30.1, that variable is “how long ago this plot was logged.” Suppose
that we have data on two explanatory variables: say, how long ago the plot was
logged and whether it is located in a river bottom or in the highlands. There
are now six groups formed by combinations of time since logging and location,
as follows:
Time since Logging
Never logged
One year
Eight years
Plot Location
River bottom Highlands
Group 1
Group 2
Group 3
Group 4
Group 5
Group 6
One-way ANOVA will still tell us if there is evidence that mean species counts
in these six types of plot differ. But we want more: Does location matter? Does
time since logging matter? And do these two variables interact? That is, does the
effect of logging change when we move from river bottoms to highlands? Perhaps
trees regrow faster in river bottoms, so that logging has less effect there than in
the highlands. To answer these questions, we must extend ANOVA to take into
account the fact that the six groups are formed from two explanatory variables.
This is two-way ANOVA.
We will first discuss follow-up analysis in one-way ANOVA. Fortunately, the
distinction between one-way and two-way doesn’t affect the follow-up methods we
will present. So once we have mastered these methods in the one-way setting, we
can apply them immediately to two-way problems.
Apply Your Knowledge
4step
These exercises review one-way ANOVA. Follow the Plan, Solve, and Conclude steps
of the four-step process, as illustrated in Example 30.1. In the next section, you will do
follow-up analysis in all these settings.
30.1 Good Weather and Tipping. Exercise 27.35 (page 672) gives the data
for a study on the effect of a waitress’s weather prediction on the tipping
percent. Carry out data analysis and ANOVA to determine whether there
30.1
DATA
are differences among the mean tipping percents for the three experimental
conditions.
TIPPING
30.2 Which Color Attracts Beetles Best? To detect the presence of harmful
DATA
insects in farm fields, we can put up boards covered with a sticky material and
examine the insects trapped on the boards. Which colors attract insects best?
Experimenters placed six boards of each of four colors at random locations in
a field of oats and measured the number of cereal leaf beetles trapped. Here
are the data:2
BEETLES
Board Color
Blue
Green
White
Yellow
Beetles Trapped
16
37
21
45
11
32
12
59
20
20
14
48
21
29
17
46
14
37
13
38
7
32
20
47
DATA
Is there evidence that the colors differ in their ability to attract beetles?
30.3 Dogs, Friends, and Stress. If you are a dog lover, perhaps having your
dog along reduces the effect of stress. To examine the effect of pets in
stressful situations, researchers recruited 45 women who said they were
dog lovers. The EESEE story “Stress among Pets and Friends” describes
the results. Fifteen of the subjects were randomly assigned to each of three
groups to do a stressful task alone (the control group), with a good friend
present, or with their dog present. The subject’s mean heart rate during
the task is one measure of the effect of stress. Table 30.2 displays the data.
Are there significant differences among the mean heart rates under the
three conditions?
STRSSPET
T A B L E 3 0 . 2 HEART RATES (BEATS PER MINUTE) AFTER
EXERCISE UNDER THREE CONDITIONS:
P ⫽ WITH PET, F ⴝ WITH FRIEND,
C ⫽ CONTROL GROUP
GROUP
P
P
C
F
C
C
P
P
P
C
C
P
C
C
P
HEART RATE
69.169
68.862
84.738
99.692
87.231
84.877
70.169
64.169
58.692
80.369
91.754
79.662
87.446
87.785
69.231
GROUP
P
F
C
F
F
C
F
C
C
P
P
F
F
P
F
HEART RATE
75.985
91.354
73.277
83.400
100.877
84.523
102.154
77.800
70.877
86.446
97.538
89.815
80.277
85.000
98.200
GROUP
C
F
F
C
F
P
C
C
P
F
F
F
F
P
P
HEART RATE
90.015
101.062
76.908
99.046
97.046
69.538
75.477
62.646
70.077
88.015
81.600
86.985
92.492
72.262
65.446
Beyond One-Way ANOVA
30-5
30-6
CHAP TER 3 0
■
More about Analysis of Variance
30.2 Follow-up analysis: Tukey pairwise
multiple comparisons
In Example 30.1, we saw that there is good evidence that the mean number of
tree species is not the same for plots that have never been logged, plots logged one
year ago, and plots logged eight years ago. The sample means in Figure 30.2 suggest
that (as we might expect) the mean species count is highest in never-logged plots
and lowest in plots logged just a year ago.
EX AM PLE 30.2
Comparing Groups: Individual t Procedures
How much higher is the mean count of tree species in plots that have never been
logged than in plots logged a year ago? A 95% confidence interval comparing
Groups A and B answers this question. Because the conditions for ANOVA require
the population standard deviation to be the same in all three populations of plots, we
will use a version of the two-sample t confidence interval that also assumes equal
standard deviations.
The standard error for the difference of sample means xA xB estimates the standard deviation of the difference, which is
␴2
␴2
1
1
␴
nB
nB
B nA
B nA
because both populations have the same standard deviation ␴. The pooled standard
deviation sp is an estimate of ␴ based on all three samples (see page 663). So the
standard error of xA xB is
sp
1
1
nB
B nA
The Minitab output in Figure 30.2 gives sp 4.120. This estimate has 30 degrees
of freedom, the degrees of freedom for “Error” in the ANOVA. A 95% confidence
interval for ␮A ␮B uses the critical value of the t distribution with 30 degrees of
freedom:
1xA xB 2 t*sp
1
1
nB
B nA
117.50 11.752 12.0422 14.1202
1
1
B 12
12
5.75 3.43
2.32 to 9.18
We are 95% confident that there are on the average between 2.32 and 9.18 more
tree species in plots that have never been logged. Because this confidence interval
does not contain zero, we can reject the null hypothesis of no difference, H0: ␮A ␮B,
in favor of the two-sided alternative at the 5% significance level. ■
pairwise differences
Example 30.2 gives a single 95% confidence interval. We would like to estimate
all three pairwise differences among the population means,
␮A ␮B
!
CAUTION
␮A ␮C
␮B ␮C
Three 95% confidence intervals will not give us 95% confidence that all three
simultaneously capture their true parameter values. This is the problem of
multiple comparisons, discussed on page 647.
30.2
Follow-up Analysis: Tukey Pairwise Multiple Comparisons
In general, we want to give confidence intervals for all pairwise differences
among the population means ␮1, ␮2, # # # , ␮I of I populations. We want an overall
confidence level of (say) 95%. That is, in very many uses of the method, all the
intervals will simultaneously capture the true differences 95% of the time. To
do this, take the number of comparisons into account by replacing the t critical
value t* in Example 30.2 by another critical value based on the distribution of
the difference between the largest and smallest of a set of I sample means. We
will call this critical value m*, for multiple comparisons. Values of m* depend
on the number of populations we are comparing and on the total number of
observations in the samples, as well as on the confidence level we want. Tables
are therefore long and messy, so in practice we rely on software. This method is
named after its inventor, John Tukey (1915–2000), who developed the ideas of
modern data analysis.
30-7
overall confidence level
Alfred Eisenstaedt/Time Life Pictures/Getty Images
Tu ke y Pa i r w i s e M u l t i p l e C o m p a r i s o n s
In the ANOVA setting, we have independent SRSs of size ni from each of I populations having Normal distributions with means ␮i and a common standard deviation
␴. Tukey simultaneous confidence intervals for all pairwise differences ␮i ⫺ ␮j
among the population means have the form
1x i ⫺ x j 2 ⫾ m*sp
1
1
⫹
nj
B ni
Here xi is the sample mean of the ith sample and sp is the pooled estimate of ␴. The
critical value m* depends on the confidence level C, the number of populations I,
and the total number of observations.
To carry out simultaneous tests of the hypotheses
H0: ␮i ⫽ ␮j
Ha: ␮i ⬆ ␮j
for all pairs of population means at fixed significance level ␣ ⫽ 1 ⫺ C, reject H0 for
any pair whose confidence interval does not contain zero.
If all samples are the same size, Tukey simultaneous confidence intervals
provide overall confidence level C. That is, C is the probability that all of the
intervals simultaneously capture the true pairwise differences. If the samples differ
in size, the true confidence level is at least as large as C, so that conclusions are
conservative. Similarly, if all the samples are the same size, the Tukey simultaneous
tests have overall significance level 1 ⫺ C. That is, 1 ⫺ C is the probability that
when all the population means are equal, any of the tests incorrectly rejects its null
hypothesis. If the samples differ in size, the true significance level is smaller than
1 ⫺ C, so that conclusions are conservative.
EX AM PLE 30.3
overall confidence level
overall significance level
Logging in the Rain Forest: Multiple Intervals
Figure 30.3 contains more Minitab output for the ANOVA comparing the mean counts
of tree species in three groups of forest plots in Borneo. We asked for Tukey multiple
comparisons with an overall error rate of 5%. That is, the overall confidence level for
the three intervals together is 95%.
30-8
CHAP TER 3 0
■
More about Analysis of Variance
Minitab
Session
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Group
Individual confidence level = 98.05%
Group = l year subtracted from:
Group
8 years
never
Lower
−2.567
1.599
Center
1.917
5.750
Upper
6.400
9.901
−5.0
0.0
5.0
10.0
−5.0
0.0
5.0
10.0
Group = 8 years subtracted from:
Group
never
Lower
−0.650
Center
3.833
Upper
8.317
F I G U R E 30.3
Additional Minitab ANOVA output for the logging study of Example 30.3, showing Tukey simultaneous
confidence intervals.
The format of the Minitab output takes some study. Be sure you can see that the
Tukey confidence intervals are
1.599 to 9.901
0.650 to 8.317
6.400 to 2.567
for
for
for
␮A ␮B
␮A ␮C
␮B ␮C
The interval for ␮A ␮B is wider than the individual 95% confidence interval in
Example 30.2. The wider interval is the price we pay for having 95% confidence not
just in one interval but in all three simultaneously. ■
E X A M P L E 3 0 .4
Logging in the Rain Forest: Multiple Tests
The ANOVA null hypothesis is that all population means are equal,
H0: ␮A ␮B ␮C
We know from the output in Figure 30.2 that the ANOVA F test rejects this hypothesis
(F 6.02, P 0.006). So we have good evidence that some pairs of means are
not the same. Which pairs? Look at the simultaneous 95% confidence intervals in
Example 30.3. Which of these intervals do not contain zero? If an interval does not
contain zero, we reject the hypothesis that this pair of population means are equal.
The conclusions are
We can reject
We cannot reject
We cannot reject
H0: ␮A ␮B
H0: ␮A ␮C
H0: ␮B ␮C
This Tukey simultaneous test of three null hypotheses has the property that when
all three hypotheses are true, there is only 5% probability that any of the three tests
wrongly rejects its hypothesis. ■
30.2
Follow-up Analysis: Tukey Pairwise Multiple Comparisons
Think for a moment about the conclusion of Example 30.4. At first glance, it appears to say that ␮A equals ␮C and that ␮B equals ␮C, but that ␮A and ␮B are not
equal. That sounds like nonsense. Now is the time to recall what a test at a fixed
significance level such as 5% tells us: either we do have enough evidence to reject the
null hypothesis, or the data do not give enough evidence to allow rejection. There is
no contradiction in saying that
We do have enough evidence to conclude that ␮A ⬆ ␮B
We do not have enough evidence to conclude that ␮A ⬆ ␮C
We do not have enough evidence to conclude that ␮B ⬆ ␮C
That is, xA 17.500 and xB 11.750 are far enough apart to conclude that
the population means differ, but neither 17.500 nor 11.750 is far enough from
xC 13.667. Notice that the Tukey method does not give a P-value for the three
tests taken together. Rather, we have a set of “reject” or “fail to reject” conclusions
with an overall significance level that we fixed in advance, 5% in this example.
There are many other multiple-comparisons procedures that produce various
simultaneous confidence intervals with an overall confidence level or simultaneous
tests with an overall probability of any false rejection. The Tukey procedures are
probably the most useful. If you can interpret results from Tukey, you can understand output from any multiple-comparisons procedure.
Apply Your Knowledge
30.4 Good Weather and Tipping. In Exercise 30.1, you carried out basic
DATA
ANOVA to compare the mean tipping percents for three experimental
conditions.
TIPPING
DATA
(a) Find the Tukey simultaneous 95% confidence intervals for all pairwise
differences among the three population means.
(b) Explain in simple language what “95% confidence” means for these
intervals.
(c) Which pairs of means differ significantly at the overall 5% significance
level?
30.5 Which Color Attracts Beetles Best? Exercise 30.2 presents data on
the numbers of cereal leaf beetles trapped by boards of four different colors.
Yellow boards appear most effective. ANOVA gives very strong evidence
that the colors differ in their ability to attract beetles.
BEETLES
DATA
(a) How many pairwise comparisons are there when we compare four
colors?
(b) Which pairs of colors are significantly different when we require
significance level 5% for all comparisons as a group? In particular, is
yellow significantly better than every other color?
30.6 Dogs, Friends, and Stress. In Exercise 30.3, the ANOVA F test had
a very small P-value, giving good reason to conclude that mean heart rates
under stress do differ depending on whether a pet, a friend, or no one is present. Do the means for the two treatments (pet, friend) differ significantly
from each other and from the mean for the control group?
STRSSPET
(a) What are the three null hypotheses that formulate these questions?
(b) We want to be 90% confident that we don’t wrongly reject any of the
three null hypotheses. Tukey pairwise comparisons can give conclusions
that meet this condition. What are the conclusions?
30-9
30-10
CHAP TER 3 0
■
More about Analysis of Variance
30.3 Follow-up analysis: contrasts*
Multiple comparisons methods give conclusions about all comparisons in
some class with a measure of confidence that applies to all the comparisons
taken together. For example, Tukey’s method gives conclusions about all pairwise differences among a set of population means. These methods are most useful when we did not have any specific comparison in mind before we produced
the data.
Multiple comparisons procedures sometimes give tests or confidence intervals
for comparisons that don’t interest us. And they may leave out comparisons that do
interest us. If we have specific questions in mind before we produce data, it is more
efficient to plan an analysis that asks these specific questions. Do note that it is not
legitimate to look at the data and then formulate a question suggested by the
! data. If data suggest a specific effect, you need new data to give evidence
CAUTION
that the effect isn’t just chance variation at work.
DATA
EX AM PLE 30.5
BEETLES
Which Color Attracts Beetles Best?
What color should we use on sticky boards placed in a field of oats to attract cereal
leaf beetles? Exercise 30.2 gives data from an experiment in which 24 boards, six
of each color in blue, green, white, and yellow, were placed at random locations in a
field. ANOVA shows that there are significant differences among the mean numbers
of beetles trapped by these colors. We might follow ANOVA with Tukey pairwise
comparisons (Exercise 30.5):
But in fact we have specific questions in mind: we suspect that warm colors are generally more attractive than cold colors. That is, before we produce any data, we suspect
that blue and white boards have similar properties, that green and yellow boards are
similar, and also that the average beetle count for green and yellow is greater than the
average count for blue and white. We therefore want to test three hypotheses:
Hypothesis 1
H0: ␮B ␮W
Ha: ␮B ⬆ ␮W
Hypothesis 2
H0: ␮G ␮Y
Ha: ␮G ⬆ ␮Y
Hypothesis 3
H0: 1␮G ␮Y 2兾2 1␮B ␮W 2兾2
Ha: 1␮G ␮Y 2兾2 1␮B ␮W 2兾2
Two of these hypotheses involve pairwise comparisons. The third does not and also
has a one-sided alternative. ■
We can ask questions about population means by specifying contrasts among the
means.
Contrasts
In the ANOVA setting comparing the means ␮1, ␮2, # # # , ␮I of I populations, a
population contrast is a combination of the means
L c1␮1 c2␮2 p cI␮I
with numerical coefficients that add to zero, c1 c2 p cI 0.
*This material is optional.
30.3
Attracting Beetles: Contrasts
We can restate the three hypotheses in Example 30.5 in terms of three contrasts:
L1 ␮B ␮W
112 1␮B 2 102 1␮G 2 112 1␮W 2 102 1␮Y 2
L2 ␮G ␮Y
102 1␮B 2 112 1␮G 2 102 1␮W 2 112 1␮Y 2
L3 1␮G ␮Y 2兾2 1␮B ␮W 2兾2
11兾22 1␮B 2 11兾22 1␮G 2 11兾22 1␮W 2 11兾22 1␮Y 2
DATA
E X A M P L E 3 0 .6
Follow-up Analysis: Contrasts
BEETLES
Check that the four coefficients in each line do add to zero. In terms of these contrasts, the hypotheses become
Hypothesis 1
H0: L1 0
Ha : L 1 ⬆ 0
Hypothesis 2
H0: L2 0
Ha : L 2 ⬆ 0
Hypothesis 3
H0: L3 0
Ha : L 3 0
■
Some statistical software will test hypotheses and give confidence intervals for
any contrasts you specify. Other software lacks this capability, but it is easy to work
with just information from basic ANOVA output. Here’s how.
To estimate a population contrast
L c1␮1 c2␮2 p cI␮I
use the corresponding sample contrast
ˆL c x c x p c x
1 1
2 2
I I
ˆ
The sample contrast L has standard error (estimated standard deviation)
SE ˆL sp
c21
c22
c2I
p
n2
nI
B n1
I n f e re n c e a b o u t a Po p u l a t i o n C o n t r a s t
In the ANOVA setting, a level C confidence interval for a population contrast
L is
ˆL t*SEˆ
L
where L̂ is the corresponding sample contrast and t* is a critical value from the t
distribution with the degrees of freedom for error in the ANOVA.
To test the hypothesis H0: L ⫽ 0, use the t statistic
ˆL
t
SEˆL
with the same degrees of freedom.
For one-way ANOVA, the degrees of freedom for error are N I, where N is
the total number of observations and I is the number of populations compared (see
page 660). The box states the result more generally so that it applies to two-way
ANOVA as well as one-way. If the contrast is a pairwise difference between means,
the contrast confidence interval is exactly the individual confidence interval illustrated in Example 30.2.
sample contrast
30-11
30-12
CHAP TER 3 0
■
E X A M P L E 3 0 .7
More about Analysis of Variance
Attracting Beetles: Inference for Contrasts
Figure 30.4 displays the Minitab ANOVA output for the study on attracting cereal
leaf beetles. The pooled estimate of ␴ is sp 5.672, and the number of degrees
of freedom for error is 20. Minitab does not offer contrasts, so we must use a
calculator.
The three sample contrasts and their standard errors are
L̂1 1.334
SE1 3.2747
L̂2 16.000
SE2 3.2747
L̂3 23.667
SE3 2.3153
Minitab
Session
One-way ANOVA: Beetles versus Color
Source
Color
Error
Total
DF
3
20
23
S = 5.672
SS
4134.0
643.3
4777.3
MS
1378.0
32.2
R-Sq = 86.53%
F
42.84
P
0.000
R-Sq(adj) = 84.51%
Individual 95% CIs for Mean
Based on Pooled StDev
Level
Blue
Green
White
Yellow
N
6
6
6
6
Mean
14.833
31.167
16.167
47.167
StDev
5.345
6.306
3.764
6.795
12
FIG U R E 3 0 . 4
Minitab ANOVA output for the study
on attracting cereal leaf beetles,
Example 30.7.
24
36
48
Pooled StDev = 5.672
Here are the details for the third contrast:
L3 11兾22 1␮B 2 11兾22 1␮G 2 11兾22 1␮W 2 11兾22 1␮Y 2
L̂3 11兾22 114.8332 11兾22 131.1672 11兾22 116.1672 11兾22 147.1672
23.667
11兾22 2
11兾22 2
11兾22 2
11兾22 2
B
6
6
6
6
15.6722 10.40822
2.3153
SE3 15.6722
If you use software, your answers may differ slightly due to roundoff error in the
hand calculations. A 95% confidence interval for L3 uses t* 2.086 from Table C
with df 20,
Lˆ t* SE 23.667 12.0862 12.31532
3
3
23.667 4.830
18.837 to 28.497
We are 95% confident that the average number of beetles attracted by green and
yellow boards exceeds the average for blue and white boards by between about 18.8
and 28.5 beetles per board.
30.4
Two-Way ANOVA: Conditions, Main Effects, and Interaction
There is very strong evidence that the population contrast L3 is greater than zero.
The t statistic is
t
L̂3
23.667
10.22
SE3
2.3153
with 20 degrees of freedom, and P 0.0005 from Table C. The other t tests conclude that ␮B and ␮W do not differ significantly but that there is a significant difference
between ␮G and ␮Y . ■
Our confidence intervals and tests for contrasts are individual procedures
for each contrast. If we do inference about three contrasts, such as those in
Examples 30.6 and 30.7, we face the problem of multiple comparisons again. That
is, we do not have an overall confidence level for all three intervals together. There
are more advanced multiple-comparisons methods that apply to contrasts just as
Tukey’s method applies to pairwise differences.
Apply Your Knowledge
30.7 Green versus Yellow. Using the Minitab output in Figure 30.4, verify
the values for the sample contrast L̂2 and its standard error given in
Example 30.7. Give a 95% confidence interval for the population contrast
L2. Carry out a test of the hypothesis H0: L2 0 against the two-sided alternative. Be sure to state your conclusions in the setting of the study.
30.8 Logging in the Rain Forest: Contrasts. Figure 30.2 gives basic ANOVA
output for the study of the effects of logging described in Example 30.1. We
might describe the overall effect of logging by comparing the mean species
count for unlogged plots (Group A) with the average of the mean counts for
the two groups of logged plots (Groups B and C).
(a) What population contrast L expresses this comparison?
(b) Starting from the output in Figure 30.2, give the sample contrast that
estimates L and its standard error.
(c) Is there good evidence that the mean species count in unlogged plots
is higher than the average for the two groups of logged plots? State
hypotheses in terms of the population contrast L, and carry out a test.
(d) How much higher is the mean count in unlogged plots than the
average for the two groups of unlogged plots? Give a 95% confidence
interval.
30.4 Two-way ANOVA: conditions,
main effects, and interaction
One-way analysis of variance compares the mean responses from any set of populations or experimental treatments when the responses satisfy the ANOVA conditions. Often, however, a sample or experiment has some design structure that leads
to more specific questions than those answered by the one-way ANOVA F test or by
Tukey pairwise comparisons. It is common, for example, to compare treatments that
are combinations of values of two explanatory variables, two factors in the language
of experimental design. Here is an example that we already met in Chapter 9.
factors
30-13
30-14
CHAP TER 3 0
■
More about Analysis of Variance
Effects of TV Advertising
EX AM PLE 30.8
What are the effects of repeated exposure to an advertising message? The answer
may depend both on the length of the ad and on how often it is repeated. To investigate this question, assign undergraduate students (the subjects) to view a 40-minute
television program that includes ads for a digital camera. Some subjects see a
30-second commercial; others, a 90-second version. The same commercial appears
either one, three, or five times during the program.
This experiment has two factors: length of the commercial, with two values; and
repetitions, with three values. The six combinations of one value of each factor form
six treatments. The subjects are divided at random into six groups, one for each
treatment:
Andrew Harrer/Bloomberg via Getty Images
Variable C Repetitions
Variable R
Length
30 seconds
90 seconds
1 Time
3 Times
5 Times
Group 1
Group 4
Group 2
Group 5
Group 3
Group 6
This is a two-way layout, with values of one factor forming rows and values of the
other forming columns. After watching the TV program, all the subjects fill out a
questionnaire that produces an “intention to buy” score with values between zero and
100. This is the response variable. ■
two-way ANOVA conditions
To analyze data from such a study, we impose some additional conditions. Here
are the two-way ANOVA conditions that will govern our work on this topic:
We have responses for all combinations of values of the two factors (all six cells
in Example 30.8). No combinations are missing in our data. In general, call
the two explanatory variables R and C (for Row and Column). Variable R has
r different values and variable C has c different values. The study compares all
rc combinations of these values. These are called crossed designs.
2. In an observational study, we have independent SRSs from each of the rc populations. If the study is an experiment, the available subjects are allocated at
random among all rc treatments. That is, we have a completely randomized
design. We also allow some randomized block designs in which we have an SRS
from each of r populations and the subjects in each SRS are then separately
allocated at random among the same c treatments.3 We met block designs in
Chapter 9 (see page 241).
3. The response variable has a Normal distribution in each population. The
population mean responses may differ, but all rc populations have a common
standard deviation ␴.
4. We have the same number of individuals n in each of the rc treatment groups or
samples. These are called balanced designs.
1.
crossed designs
balanced designs
As always, the design of the study is the most important condition for statistical
inference. Conditions 1 and 2 describe the designs covered by two-way ANOVA.
These designs are very common. Condition 3 lists the usual ANOVA conditions
on the distribution of the response variable. ANOVA inference is reasonably robust
against violations of these conditions.
When you design a study, you should try to satisfy the fourth condition; that
is, you should choose equal numbers of individuals for each treatment. Balanced
30.4
Two-Way ANOVA: Conditions, Main Effects, and Interaction
designs have several advantages in any ANOVA: F tests are most robust against violation of the “common standard deviation” condition when the subject counts are
equal or close to equal, and Tukey’s method then gives exact overall confidence or
significance levels. In the two-way layout, there is an even stronger reason to prefer
balanced designs. If the numbers of individuals differ among treatments (an unbalanced design), several alternative analyses of the data are possible. These analyses
answer different sets of questions, and you must decide which questions you want
to answer. All the sets of questions and all the analyses collapse to just one in the
balanced case. This makes interpreting your data much simpler.
To understand the questions that two-way ANOVA answers, return to the
advertising study in Example 30.8. In this section, we will assume that we know
the actual population mean responses for each treatment. That is, we deal with an
ideal situation in which we don’t have to worry about random variation in the mean
responses.
EX AM PLE 30.9
Main Effects with No Interaction
Here again is the two-way layout of the advertising study in Example 30.8. The
numbers in the six cells are now made-up values of the population mean responses
to the six treatments:
Variable C Repetitions
Variable R
Length
30 seconds
90 seconds
1 Time
3 Times
5 Times
30
40
45
55
50
60
The mean “intention to buy” scores increase with more repetitions of the camera
ad and also increase with the length of the ad. The means increase by the same
amount (10 points) when we move from 30 seconds to 90 seconds, no matter
how many times the ad is shown. Turning to the other variable, the effect of moving from one to three repetitions is the same (15 points) for both 30-second and
90-second ads, and the effect of moving from three to five repetitions is also the
same (5 points). Because the result of changing the value of one variable is the
same for all values of the other variable, we say that there is no interaction between
the two variables.
Now average the mean responses for 30 seconds and for 90 seconds. The average
for 30 seconds is
30 45 50
125
41.7
3
3
and the average for 90-second ads is
40 55 60
155
51.7
3
3
Because the averages for the two lengths differ, we say that there is a main effect for
length. Similarly average the mean responses for each number of repetitions. They
are 35 for one repetition, 50 for three, and 55 for five. So changing the number of
times the ad is shown has an “on the average” effect on the response. There is a
main effect for repetitions. ■
30-15
30-16
CHAP TER 3 0
■
More about Analysis of Variance
FIG U R E 3 0 . 5
70
90 seconds
60
Average
Mean response
Plot of the means from Example 30.9.
In addition to the means themselves,
the plot displays their average for each
number of repetitions as a dotted line.
The parallel lines show that there is no
interaction between the two factors.
30 seconds
50
40
30
ROCKED BY
ROIDS
How do baseball fans react to
accusations of steroid use by some of
the game’s most high profile players?
Rafael Palmerio, one of four players
with 3000 hits and 500 home runs,
was suspended for 10 days in 2005
for a steroid violation after insisting
at a Congressional hearing that he
never used steroids. When Barry
Bonds hit his 756th home run to
break Hank Aaron's record, many
fans thought his accomplishment was
tainted by allegations of steroid use.
Should an asterisk be placed behind
the names of these players in record
books? Public reaction depends on
geographic region, gender, age, race,
and many other factors. Any careful
study of what fans think will include
several factors.
E X A M P L E 3 0 .10
20
1 rep
3 reps
5 reps
Repetitions
Figure 30.5 plots the cell means from Example 30.9. The two solid lines joining
one, three, and five repetitions for 30-second and for 90-second ads are parallel.
This reflects the fact that the mean response always increases by 10 points when we
move from 30-second to 90-second ads, no matter how many times the ad is shown.
Parallel lines in a plot of means show that there is no interaction. It doesn’t matter which
variable you choose to place on the horizontal axis.
To see the main effect of repetitions, look at the average response for 30 and
90 seconds at each number of repetitions. This average is the dotted line in the
plot. It changes as we move from one to three to five repetitions. A variable has
a main effect when the average response differs for different values of the variable.
“Average” here means averaged over all the values of the other variable. A main
effect of repetitions is present in Figure 30.5 because the dotted “average” line
is not horizontal. A main effect for length is also present, but it can’t be seen
directly in the plot.
Interactions and Main Effects
Now look at this different set of mean responses for the advertising study:
Variable C Repetitions
Variable R
Length
30 seconds
90 seconds
1 Time
3 Times
5 Times
30
40
45
45
50
40
Figure 30.6 plots these means, with the means for 30-second ads connected by solid
black lines and those for 90-second ads by solid red lines.
These means reflect the case in which subjects grow annoyed when the ad is both
longer and repeated more often. The mean responses for the 30-second ad are the
same as in Example 30.9. But showing the 90-second ad three times increases the
mean response by only 5 points, and showing it five times drops the mean back to 40.
There is an interaction between length and repetitions: the difference between 30
30.4
Two-Way ANOVA: Conditions, Main Effects, and Interaction
30-17
FIG U RE 30.6
Plot of the means from Example 30.10,
along with the average for each number
of repetitions. The lines are not parallel,
so there is an interaction between length
and number of repetitions.
70
Mean response
60
50
30 seconds
Average
40
90 seconds
30
20
1 rep
3 reps
5 reps
Repetitions
and 90 seconds changes with the number of repetitions, so that the solid black and
red lines in Figure 30.6 are not parallel.
There is still a main effect of repetitions because the average response (dotted line
in Figure 30.6) changes as we move from one to three to five repetitions. What about
the effect of length? The average over all values of repetitions for 30-second ads is
30 45 50
125
41.7
3
3
For 90-second ads, this average is the same,
40 45 40
125
41.7
3
3
On the average over all numbers of repetitions, changing the length of the ad has no
effect. There is no main effect for length. ■
Interactions and Main Effects
An interaction is present between factors R and C in a two-way layout if the change
in mean response when we move between two values of R is different for different
values of C. (We can interchange the roles of R and C in this statement.)
A main effect for factor R is present if, when we average the responses for a fixed
value of R over all values of C, we do not get the same result for all values of R.
Main effects may have little meaning when interaction is present. After all,
interaction
says that the effect of changing one of the variables is different
CAUTION
for different values of the other variable. The main effect, as an “on the
average” effect, may not tell us much. In Example 30.10, there is no main effect
for the length of the camera ad. But length certainly matters—Figure 30.6 shows
that there is little point in paying for 90-second ads if the same ad will be shown
several times during the program. That’s the interaction of length with number of
repetitions.
!
■
E X A M P L E 3 0 .11
More about Analysis of Variance
Which Effect Is More Important?
There are no simple rules for interpreting results from two-way ANOVA when strong
interaction is present. You must look at plots of means and think. Figure 30.7 displays two different mean plots for a study of the effects of classroom conditions on
the performance of normal and hyperactive schoolchildren. The two conditions are
“quiet” and “noisy,” where the noisy condition is actually the usual environment in
elementary school classrooms.
Normal
Normal
Performance
CHAP TER 3 0
Performance
30-18
Hyperactive
Quiet
Noisy
Work condition
(a)
Hyperactive
Quiet
Noisy
Work condition
(b)
F I GU R E 30.7
Plots of two sets of means for a study comparing the performance of normal and hyperactive children under
two conditions, for Example 30.11.
There is an interaction: normal children perform a bit better under noisy conditions, but hyperactive children perform slightly less well under noisy conditions.
The interaction is exactly the same size in the two plots of Figure 30.7. To see
this, look at the slopes of the “Normal” lines in the two plots: they are the same.
The slopes of the two “Hyperactive” lines are also the same. So the size of the
gap between normal and hyperactive changes by the same amount when we
move from quiet to noisy in both plots even though the gap is much larger in
Figure 30.7(b).
In Figure 30.7(a), this interaction is the most important conclusion of the study.
Both main effects are small: normal children do a bit better than hyperactive children,
for example, but not a great deal better on the average.
In Figure 30.7(b), the main effect of “hyperactive or not” is the big story. Normal
children perform much better than hyperactive children in both environments. The
interaction is still there, but it is not very important in the face of the large difference
in average performance between hyperactive and normal children. ■
In this section, we pretended that we knew the population means so that we
could discuss patterns without needing statistical inference. In practice, we don’t
know the population means. However, plotting the sample means for all groups
is an essential part of data analysis for a two-way layout. Look for interaction and
main effects just as we did in this section. Of course, in real data you will almost
never find exactly parallel lines representing exactly no interaction. Two-way
ANOVA inference helps guide you because it assesses whether the interaction
in the data is statistically significant. We are now ready to introduce two-way
ANOVA inference.
30.4
Two-Way ANOVA: Conditions, Main Effects, and Interaction
Apply Your Knowledge
V2
V2
Average
Average
V1
H1
H2
H3
Yield
Yield
Figure 30.8 shows two different plots of means for a two-way study that compares
the yields of two varieties of soybeans (V1 and V2) when three different herbicides
(H1, H2, and H3) are applied to the fields. Exercises 30.9 and 30.10 ask you to interpret these plots.
V1
H1
H2
Herbicide
Herbicide
(a)
(b)
H3
F I GU R E 3 0 . 8
Plots of two sets of possible means for a study in which the two factors are soybean variety and
type of herbicide, for Exercises 30.9 and 30.10. The plots also show the average for each herbicide
(dotted line).
30.9
Recognizing Effects. Consider the mean responses plotted in Figure 30.8(a).
(a) Is there an interaction between soybean variety and herbicide type?
Why or why not?
(b) Is there a main effect of herbicide type? Why or why not?
(c) Is there a main effect of soybean variety? Why or why not?
30.10 Recognizing Effects. Consider the mean responses plotted in Figure 30.8(b).
(a) Is there an interaction between soybean variety and herbicide type?
Why or why not?
(b) Is there a main effect of herbicide type? Why or why not?
(c) Is there a main effect of soybean variety? Why or why not?
30.11 Angry Women, Sad Men. What are the relationships among the por-
trayal of anger or sadness, gender, and the degree of status conferred?
Sixty-eight subjects were randomly assigned to view a videotaped interview in which either a male or a female professional described feeling
either anger or sadness. The people being interviewed (we’ll call them the
“targets”) wore professional attire and were ostensibly being interviewed
for a job. The targets described an incident in which they and a colleague
lost an account and, when asked by the interviewer how it made them
feel, responded either that the incident made them feel angry or that it
made them feel sad. The subjects were divided into four groups; each
group evaluated one of the four types of interviews.4 After watching the
interview, subjects evaluated the target on a composite measure of status
conferral that included items assessing how much status, power, and independence the target deserved in his or her future job. The measure of
30-19
30-20
CHAP TER 3 0
■
More about Analysis of Variance
status ranged from 1 none to 11 a great deal. Here are the summary
statistics:
Treatment
n
x–
s
Males expressing anger
17
6.47
2.25
Females expressing anger
17
3.75
1.77
Males expressing sadness
17
4.05
1.61
Females expressing sadness
17
5.02
1.80
(a) Display the four treatment means in a two-way layout similar to those
given in Exercises 30.9 and 30.10.
(b) Plot the means and discuss the interaction and the two main effects.
30.5 Inference for two-way ANOVA
Inference for two-way ANOVA is in many ways similar to inference for one-way
ANOVA. Here is a brief outline:
1.
2.
3.
Find and plot the group sample means. Study the plot to understand the interaction and main effects. Do data analysis to check the conditions for ANOVA.
Use software for basic ANOVA inference. There are now three F tests with
three P-values, which answer the questions
Is the interaction statistically significant?
Is the main effect for variable R statistically significant?
Is the main effect for variable C statistically significant?
You may wish to carry out a follow-up analysis. For example, Tukey’s method
makes pairwise comparisons among the means of all rc treatment groups.
We will illustrate two-way ANOVA inference with several examples. In the first
example, the interaction is both small and insignificant, so that the message is in
the main effects.
E X A M P L E 3 0 .12
DATA
4step
Computer-Assisted Instruction
STATE: A study in education concerned computer-aided instruction in the use of
“Blissymbols” for communication. Blissymbols are pictographs (think of Egyptian hieroglyphs) sometimes used to help learning-disabled children. Typically abled schoolchildren
were randomly assigned to treatment groups. There are four groups in a two-way layout:
COMPINST
Placement
Before
During
Learning Style
Active
Passive
Group 1 Group 2
Group 4 Group 3
The two factors are the learning style (active or passive) of the lesson and the placement of the material to be learned (Blissymbols presented before compounds, or
Blissymbols and compounds shown during a joint presentation). The response variable is the number of symbols recognized correctly, out of 24, in a test taken after
the lesson. Table 30.3 displays the data.5
PLAN: Plot the sample means and discuss interaction and main effects. Check the
conditions for ANOVA inference. Use two-way ANOVA F tests to determine the
significance of interaction and main effects.
30.5
Inference for Two-Way ANOVA
30-21
T A B L E 3 0 . 3 TEST SCORES IN AN EDUCATION STUDY
GROUP 1
14
14
16
14
6
10
12
13
12
12
13
9
GROUP 2
4
4
9
8
15
8
9
13
13
12
12
10
GROUP 3
11
8
8
7
14
9
7
8
12
10
6
3
GROUP 4
12
22
9
14
20
15
9
10
11
11
15
6
SOLVE: This is a balanced completely randomized experimental design. Figure 30.9
displays side-by-side stemplots of the scores for the 12 subjects in each group.
Figure 30.10 shows two-way ANOVA output from Minitab. The stemplots show no
departures from Normality strong enough to rule out ANOVA. (One observation, the
6 in Group 1, is slightly outside the range allowed by the 1.5 IQR rule. This would
not be unusual among 48 Normally distributed observations.) You can check that the
group standard deviations satisfy our rule of thumb that the largest (4.648) is no
more than twice the smallest (2.678). We can proceed with ANOVA inference.
Figure 30.11 shows the plots of means produced by Minitab. The two plots display the same four sample means; Minitab helpfully gives a plot with each variable
marked on the horizontal axis. The plots are easy to interpret: the interaction and the
main effect of placement are both small, and the main effect of learning style is quite
large. The three F tests in the Minitab output substantiate what the plots of means
show: interaction (P 0.349) and placement (P 0.838) are not significant, but
learning style (P 0.002) is highly significant.
Group 1
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
0
0
0
000
00
000
0
Group 2
Group 3
Group 4
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
3
4
5
6
7
8
9
10
1 1
12
13
14
15
16
1 7
18
19
20
21
22
00
00
00
0
00
00
0
0
0
00
000
0
0
0
0
0
0
00
0
00
0
0
00
0
0
FIG U RE 30.9
Side-by-side stemplots comparing the
counts of correct answers for subjects
in the four treatment groups from
Example 30.12.
30-22
CHAP TER 3 0
■
FIG U R E 3 0. 1 0
Two-way ANOVA output from Minitab
for the computer-assisted instruction
study, Example 30.12.
More about Analysis of Variance
Minitab
Session
ANOVA: score versus learn, place
Factor
learn
place
Type
fixed
fixed
Levels
2
2
Values
Act, Pas
Bef, Dur
Analysis of Variance for score
Source
learn
place
learn*place
Error
Total
DF
1
1
1
44
47
S = 3.50892
SS
130.02
0.52
11.02
541.75
683.31
MS
130.02
0.52
11.02
12.31
R-Sq = 20.72%
F
10.56
0.04
0.90
P
0.002
0.838
0.349
R-Sq(adj) = 15.31%
Means
learn
Act
Pas
N
24
24
score
12.458
9.167
place
Bef
Dur
N
24
24
learn
Act
Act
Pas
Pas
place
Bef
Dur
Bef
Dur
score
10.917
10.708
N
12
12
12
12
score
12.083
12.833
9.750
8.583
Minitab
Interaction Plot (data means) for score
Interaction Plot (data means) for score
Act
Pas
13
12
11
place
place
Bef
Dur
10
9
13
FIG U R E 3 0. 1 1
Minitab plots of the group means
from the computer-assisted learning
study, Example 30.12. The two plots
use the same four means. They differ
only in the choice of which variable
to mark on the horizontal axis.
learn
Act
Pas
12
11
learn
10
9
Bef
Dur
CONCLUDE: Educators know that active learning almost always beats passive
learning. This is the only significant effect that appears in these data. In particular,
placement of material has very little effect under either style of learning. ■
30.5
Inference for Two-Way ANOVA
30-23
The second example illustrates the situation in which there is significant interaction, but main effects are larger and more important. Think of Figure 30.7(b).
Mycorrhizal Colonies and Plant Nutrition
E X A M P L E 3 0 .13
0 kg/ha
28 kg/ha
160 kg/ha
4step
Six plants of each type were assigned at random to each amount of fertilizer. The
response variables describe the levels of nutrients in a plant after 19 weeks, when
the tomatoes are fully ripe. We will look at one response, the percent of phosphorus
in the plant. Table 30.4 contains the data.6
T A B L E 3 0 . 4 PERCENT OF PHOSPHORUS IN TOMATO PLANTS
MUTANT
NORMAL
GROUP
FERT
PCTPHOS
GROUP
FERT
PCTPHOS
1
1
1
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
0
0
0
0
0
0
28
28
28
28
28
28
160
160
160
160
160
160
0.29
0.25
0.27
0.24
0.24
0.20
0.21
0.24
0.21
0.22
0.19
0.17
0.18
0.20
0.19
0.19
0.16
0.17
4
4
4
4
4
4
5
5
5
5
5
5
6
6
6
6
6
6
0
0
0
0
0
0
28
28
28
28
28
28
160
160
160
160
160
160
0.64
0.54
0.53
0.52
0.41
0.43
0.41
0.37
0.50
0.43
0.39
0.44
0.34
0.31
0.36
0.37
0.26
0.27
PLAN: Plot the sample means and discuss interaction and main effects. Check the
conditions for ANOVA inference. Use two-way ANOVA F tests to determine the
significance of interaction and main effects.
DATA
Nitrogen
Tomato Type
Mutant
Normal
Group 1 Group 4
Group 2 Group 5
Group 3 Group 6
Eye of Science/Science Source
STATE: Mycorrhizal fungi are present in the roots of many plants. This is a symbiotic
relationship, in which the plant supplies nutrition to the fungus and the fungus helps
the plant absorb nutrients from the soil. An experiment compared the effects of adding nitrogen fertilizer to two genetic types of tomato plants: a normal variety that is
susceptible to mycorrhizal colonies, and a mutant that is not. Nitrogen was added at
rates of 0, 28, or 160 kilograms per hectare (kg/ha). Here is the two-way layout for
the six treatment combinations:
TOMATOES
30-24
CHAP TER 3 0
■
More about Analysis of Variance
SOLVE: This is a randomized block design. We are willing to assume that each of
our two sets of tomato plants is an SRS from its genetic type. We consider the blocks
(genetic types) as one of the factors in analyzing the data because we are interested
in comparing the two genetic types. In addition to the two-way ANOVA, we might
consider separate one-way ANOVAs for mutant and normal types to draw separate
conclusions about the effect of fertilizer on each type.
Figure 30.12 displays Minitab’s plots of the six sample means. The lines are not
parallel, so interaction is present. The interaction is rather small compared with the
main effects. The main effect of type is expected: normal plants have higher phosphorus levels than the mutants at all levels of fertilization because they benefit from
symbiosis with the fungus. The main effect of fertilizer is a bit surprising: phosphorus
level goes down as the level of nitrogen fertilizer increases.
Minitab
Interaction Plot (data means) for phosphorus level
Interaction Plot (data means) for percent phosphorus
0
28
160
0.5
0.4
Genotype
Mutant
Normal
Plant type
0.3
0.2
0.5
0.4
Nitrogen amount
Nitrogen amount
0
28
160
0.3
FIG U R E 3 0. 1 2
0.2
Minitab plots of the group means
for the study of phosphorus levels
in tomatoes, Example 30.13.
Mutant
Normal
Examination of the data (we don’t show the details) finds no outliers or strong
skewness. But the largest sample standard deviation (0.08329 in Group 4) is much
larger than twice the smallest (0.01472 in Group 3). As the number of
treatment groups increases, even samples from populations with exactly
CAUTION
the same standard deviation are more likely to produce sample standard
deviations that violate our “twice as large” rule of thumb. (Think of comparing the
shortest and tallest person among more and more people.) So our rule of thumb
is often conservative for two-way ANOVA. Nonetheless, ANOVA inference may not
give correct P-values for these data. The P-values for the three two-way ANOVA
F tests are P 0.008 for interaction and P 0.001 for both main effects. These
agree with the mean plots and are so small that even if not accurate, they strongly
suggest significance.
!
CONCLUDE: Normal plants, with their mycorrhizal colonies, have higher phosphorus
levels than mutants that lack such colonies. Nitrogen fertilizer actually reduces phosphorus levels in both types of plants. The reduction is stronger for normal plants, but
this interaction is not very large in practical terms. ■
Finally, here is an example in which strong interaction makes one of the main
effects meaningless. Two-way ANOVA with strong interaction is often difficult to
interpret simply, as this example also illustrates.
30.5
30-25
Better Corn for Heavier Chicks?
STATE: Corn varieties with altered amino acid content can have advantages in feeding
animals. Here is an excerpt from a study that compared normal corn (“norm” in the
data file) with two altered varieties called opaque-2 (“opaq”) and floury-2 (“flou”).
Nine treatments were arranged in a 3 3 factorial experiment to compare
opaque-2, floury-2, and normal corn at dietary protein levels of 20, 16, and
12%. Corn–soybean meal diets containing either opaque-2, floury-2, or normal
corn were formulated so that, for a given protein level, an equivalent amount of
corn protein was supplied by each corn. Male broiler-type chicks were randomly
allotted to treatments at 1 day of age. Feed and water were provided ad libitum.
Chicks were weighed at weekly intervals until termination of the experiment at
21 days.7
There are 10 chicks in each group. The response variable is the weight in grams
after 21 days. Which combinations of corn type and protein content lead to fastest
growth?
PLAN: Plot the sample means and discuss interaction and main effects. Check the
conditions for ANOVA inference. Use two-way ANOVA F tests to determine the significance of interaction and main effects. If necessary, use Tukey pairwise comparisons
to identify significant differences among treatments.
SOLVE: This is a balanced completely randomized experiment with nine treatments.
Figure 30.13 shows Minitab’s plots of the sample means. The mean weight of the
chicks increases with the percent of protein in the diet, as expected. We are primarily interested in comparing the three types of corn. There are important interaction
effects. Normal corn does poorest of the three corn types at 12% protein and best at
20%. Floury is best at both 12% and 16%. Opaque is always inferior to floury and
beats normal corn only at 12% protein.
4step
DATA
E X A M P L E 3 0 .14
Inference for Two-Way ANOVA
CORN
A MOMENTARY
LAPSE
Beware of distracted drivers. A
recent study showed that putting
on makeup, chatting on a cell
phone, eating breakfast, reading,
and reaching for a moving object
significantly increase the risk of being
involved in an accident. About
80 percent of crashes occur
within three seconds of a driver’s
distraction. As you might guess,
the risk depends on the type of
distraction. As usual, other factors
also play a role: for example, young
drivers are four times more likely
to have crashes and near-crashes
as drivers over the age of 35.
Minitab
Interaction Plot (data means) for Weight
Interaction Plot (data means) for Weight
12
16
20
500
400
Corn
Corn
Flou
Norm
Opaq
300
200
500
400
Protein
Protein
12
16
20
300
FIG U RE 30.13
200
Flou
Norm
Opaq
Although we don’t show the details, ANOVA is justified. There are no outliers or
strong skewness. Although the largest sample standard deviation (61.16 for Group 4)
is a bit more than twice the smallest (25.99 for Group 6), this is common when we
have nine groups, even when all populations really have the same ␴.
Minitab plots of the mean weights of
21-day-old chicks fed nine different
diets, for Example 30.14.
30-26
CHAP TER 3 0
■
FIG U R E 3 0. 1 4
Two-way ANOVA output from
Minitab for the study of the effect of
diet on growth, Example 30.14.
More about Analysis of Variance
Minitab
Session
ANOVA: Weight versus Corn, Protein
Analysis of Variance for Weight
SS
31137
745621
60894
176951
1014602
Source
Corn
Protein
Corn*Protein
Error
Total
DF
2
2
4
81
89
S = 46.7395
R-Sq = 82.56%
MS
15568
372810
15223
2185
F
7.13
170.66
6.97
P
0.001
0.000
0.000
R-Sq(adj) = 80.84%
Means
Corn
flou
norm
opaq
N
30
30
30
Protein
12
16
20
Corn
flou
flou
flou
norm
norm
norm
opaq
opaq
opaq
Weight
390.27
356.63
346.84
N
30
30
30
Protein
12
16
20
12
16
20
12
16
20
Weight
243.49
387.33
462.93
N
10
10
10
10
10
10
10
10
10
Weight
271.39
434.22
465.21
195.34
389.03
485.53
263.73
338.75
438.05
Figure 30.14 contains Minitab’s ANOVA output. We included the means for the nine
groups, for the three types of corn, and for the three levels of protein. The main effects
can be seen in the different mean weights for the corn types and for the protein levels.
The three F tests are all highly significant. Although there is a substantial main effect for
corn (the means range from 346.84 grams for opaque to 390.27 grams for floury),
this has little meaning in the light of the interaction that we just described.
When strong interaction makes one or both main effects hard to interpret, it is often
useful to find the Tukey pairwise multiple comparisons for all the treatment groups. One
way to do this is to do a one-way ANOVA with just “group” as the explanatory variable.
There are 36 pairwise comparisons among nine groups. Minitab’s Tukey output is both
long and hard to understand in such cases. Here is a condensed version using an idea
that some software (though not Minitab) implements. Arrange the nine groups in the
order of their sample means, from smallest to largest. We have identified the groups
both by their group number in the data file and by the treatment. Connect with an
underline all pairs of treatments that do not differ significantly at the overall 5% level:
Treatment: N12
Group:
4
O12 F12
7
1
------
O16 N16 F16 O20 F20 N20
8
5
2
9
3
6
-------------------------------
Group 4 has significantly smaller mean weight than any other group. Groups 7 and 1
do not differ significantly but are higher than Group 4 and lower than all other groups.
30.5
Inference for Two-Way ANOVA
30-27
Group 8 is not significantly different from Group 5 but is higher than Groups 4, 7, and
1 and lower than Groups 2, 9, 3, and 6. And so on. The most interesting finding is
that, at the high end, Groups 2, 9, 3, and 6 do not have significantly different mean
weights. Three of these are the three 20% protein groups, but floury corn with 16%
protein belongs with these three.
CONCLUDE: More protein clearly helps chicks grow faster. The three types of corn
do not differ significantly when the diet has 20% protein. Floury corn is superior to
both opaque and normal corn at middle (16%) and low (12%) protein levels, though
not all differences at these levels are statistically significant. ■
Apply Your Knowledge
© ARCO/H. Reinhard/age fotostock
30.12 Hooded Rats: Social Play Times. How does social isolation during a
DATA
critical developmental period affect the behavior of hooded rats? Psychology
students assigned 24 young female rats at random to either isolated or group
housing, then similarly assigned 24 young male rats. This is a randomized
block design with the gender of the 48 rats as the blocking variable and housing type as the treatment. Later, the students observed the rats at play in a
group setting and recorded data on three types of behavior (object play, locomotor play, and social play).8 The data file records the time (in seconds) that
each rat devoted to social play during the observation period.
SCLTIME
DATA
(a) Make a plot of the four group means. Is there a large interaction
between gender and housing type? Which main effect appears to be
more important?
(b) Verify that the conditions for ANOVA inference are satisfied.
(c) Give the complete two-way ANOVA table. What are the F statistics and
P-values for interaction and the two main effects? Explain why the test
results confirm the tentative conclusions you drew from the plot of means.
30.13 Hooded Rats: Social Play Counts. The researchers who conducted the
study in Exercise 30.12 also recorded the number of times each of three
types of behavior (object play, locomotor play, and social play) occurred.
The data file contains the counts of social play episodes by each rat during the observation period. Use two-way ANOVA to analyze the effects of
gender and housing.
SCLCOUNT
30.14 Metabolic Rates in Caterpillars. Professor Harihuko Itagaki and his
DATA
students have been measuring metabolic rates in tobacco hornworm caterpillars (Manduca sexta) for years. The researchers do not want the metabolic
rates to depend on which analyzer they use to obtain the measurements.
They therefore make six repeated measurements on each of three caterpillars with each of three analyzers.9
CATRPLRS
(a) Use software to give the two-way ANOVA table. The researchers
used caterpillar as a blocking variable because metabolic rates vary
from individual to individual. These three caterpillars are not of
interest in themselves. They represent the larger population of caterpillars of this species. We should therefore avoid inference about the
main effect of caterpillars.
(b) Explain why the researchers will be unhappy if there is a significant
interaction between analyzer and caterpillar. What does your analysis
show about the interaction?
(c) Is there a significant effect for analyzer? Do you think the researchers
will be happy with this result?
4step
30-28
CHAP TER 3 0
■
More about Analysis of Variance
30.6 Some details of two-way ANOVA*
All ANOVA F statistics work on the same principle: compare the variation due
to the effect being tested with a benchmark level of variation that would be present
even if that effect were absent. The three F tests for two-way ANOVA use the same
benchmark as the one-way ANOVA F test, namely, the variation among individual
responses within the same treatment group.
In two-way ANOVA, we have two factors (explanatory variables) that form
treatments in a two-way layout. Factor R has r values and Factor C has c values, so
there are rc treatments. The same number n of subjects are assigned to each treatment. The two-way layout that results is as follows:
Row
Factor R
1
2
Column Factor C
p
1
2
c
n subjects n subjects p n subjects
n subjects n subjects p n subjects
o
r
o
n subjects
o
n subjects
p
o
n subjects
The number of treatments is I rc.
The total number of observations is N rcn.
Figure 30.15 presents both one-way and two-way ANOVA output for the same
data, from the computer-assisted education study in Example 30.12. The one-way
analysis just compares the means of rc treatments, ignoring the two-way layout. In
discussing one-way ANOVA, we called the number of treatments I. Now I rc.
Minitab
Session
One-way ANOVA: Score versus Groups
Source
Groups
Error
Total
DF
3
44
47
S = 3.509
SS
141.563
541.750
683.313
R-Sq = 20.72%
MS
47.2
12.3
F
3.83
P
0.016
R-Sq(adj) = 15.31%
Session
Two-way ANOVA: Score versus Learn, Place
FIG U R E 3 0 . 15
Compare the sums of squares in these
one-way and two-way ANOVA outputs
for the study in Example 30.12.
Source
Learn
Place
Interaction
Error
Total
S = 3.509
DF
1
1
1
44
47
SS
130.021
0.521
11.021
541.750
683.313
R-Sq = 20.72%
MS
130.021
0.521
11.021
12.312
F
10.56
0.04
0.90
P
0.002
0.838
0.349
R-Sq(adj) = 15.31%
*This optional material requires the background on some details of ANOVA, presented in Section 27.7,
pages 661–665.
30.6
Some Details of Two-Way ANOVA
The two-way analysis takes into account that each treatment is formed by combining a value of R with a value of C.
In both settings, analysis of variance breaks down the overall variation in the
observations into several pieces. The overall variation is expressed numerically by
the total sum of squares
total sum of squares
SSTO a 1individual observation mean of all observations2 2
where the sum runs over all N individual observations. If we divide SSTO by N 1,
we get the variance of the observations. So SSTO is closely related to a familiar
measure of variability. Because SSTO uses just the N individual observations, it is
the same for both one-way and two-way analyses. You can see in Figure 30.15 that
SSTO 683.313 for these data.
One-way ANOVA. We saw in Chapter 27 that the one-way ANOVA
F test compares the variation among the I treatment means with the variation among responses to the same treatment. If the means vary more than we
would expect based on the variation among subjects who receive the same
treatment, that’s evidence of a difference among the mean responses in the
I populations.
Let’s give a bit more detail. One-way ANOVA breaks down the total variation
into the sum of two parts:
total variation among responses variation among treatment means variation
among responses to the same treatment
total sum of squares sum of squares for groups
sum of squares for error
SSTO SSG SSE
Formulas for the sums of squares SSG and SSE appear on page 662 as the
numerators of the mean squares MSG and MSE, but we won’t concern ourselves with the algebra. Remember that “error” is the traditional term in
ANOVA for variation among observations. It doesn’t imply that a mistake has
been made. In the one-way output in Figure 30.15, you see that the breakdown
for these data is
SSTO SSG SSE
683.313 141.563 541.750
The one-way ANOVA F test is formed in two stages (page 662):
1.
Divide each sum of squares by its degrees of freedom to get the mean squares
MSG for groups and MSE for error,
MSG SSG
I1
MSE SSE
NI
2. The one-way ANOVA F statistic compares MSG to MSE,
MSG
MSE
Find the P-value from the F distribution with I 1 and N I degrees of
freedom.
F
The ANOVA table in the output reports sums of squares, their degrees of freedom,
mean squares, and the F statistic with its P-value.
ANOVA table
30-29
30-30
CHAP TER 3 0
■
More about Analysis of Variance
Two-way ANOVA. Now look at the two-way analysis of variance table in
Figure 30.15:
■
■
The total sum of squares and the error sum of squares are the same as in the
one-way analysis.
The sum of squares for groups in one-way is the sum of the three sums of
squares for main effects and interaction in two-way.
This is the heart of two-way analysis of variance: break down the variation
among the rc groups into three parts: variation due to the main effect of Factor R,
variation due to the main effect of Factor C, and variation due to interaction between the two factors. Each type of variation is measured by a sum of
squares. The formulas for the two main effects sums of squares are similar to
that for the one-way sum of squares for groups, but we will again ignore the
algebraic details. The interaction sum of squares is best thought of as what’s
left over: the variation among treatments that isn’t explained by the two main
effects. In symbols,
total sum of squares sum of squares for main effect of Factor R
sum of squares for main effect of Factor C
sum of squares for interaction between R and C
sum of squares for error
SSTO SSR SSC SSRC SSE
Each of these sums of squares has degrees of freedom, and these also break down
in the same way:
total df df for main effect of Factor R
df for main effect of Factor C
df for interaction between R and C
df for error
rcn 1 1r 12 1c 12 1r 12 1c 12 rc1n 12
You can check that the total degrees of freedom in the line above are N 1 and
the degrees of freedom for error are N I, the same as for one-way ANOVA. The
one-way degrees of freedom for groups are the sum of the degrees of freedom for
the three two-way effects.
E X A M P L E 3 0 .15
Comparing One-Way and Two-Way ANOVA
The data behind Figure 30.15 appear in Table 30.3. The two factors are
R placement of Blissymbol elements and C learning style. Factor R has r 2
values: Before and During. Factor C has c 2 values: Active and Passive. There
are I 4 treatments and n 12 subjects assigned to each treatment, resulting in
N 48 observations.
The total degrees of freedom are N 1 47. The degrees of freedom for error
are N I 48 4 44. In the one-way analysis, the degrees of freedom for
groups are I 1 3. The two-way analysis breaks this into degrees of freedom
r 1 1 for Factor R, c 1 1 for Factor C, and 1r 12 1c 12 1 1 1
for interaction.
Here are the two breakdowns of the total variation and the degrees of freedom
that appear in Figure 30.15:
30.6
One-Way
Sums of Squares
SSG
141.563
df
3
SSE
SSTO
44
47
541.750
683.313
Two-Way
Sums of Squares
SSR
130.021
SSC
0.521
SSRC
11.021
SSE
541.750
SSTO
683.313
df
1
1
1
44
47
Some Details of Two-Way ANOVA
■
The neat breakdown of SSG into three effects depends on the balance of the
two-way layout. It doesn’t hold when the counts of observations are not the same
for all treatments. That’s why two-way ANOVA for unbalanced data is more
complicated and harder to interpret than for balanced data.
Two-way ANOVA F tests. Finally, form three F statistics exactly as in the
one-way setting.
1.
Divide each sum of squares by its degrees of freedom to get the mean squares for
the three effects and for error:
MSR MSC SSC
c1
MSRC SSRC
1r 12 1c 12
MSE SSE
NI
The three F statistics compare the mean squares for the three effects with
MSE.
Tw o - Wa y A N OVA F Te s t s
The F statistics for the three types of treatment effects in two-way ANOVA are
MSR
For the main effect of Factor R, F with dfs r 1 and N I
MSE
MSC
For the main effect of Factor C, F with dfs c 1 and N I
MSE
MSRC
For the interaction of R and C, F with dfs 1r 12 1c 12 and N I
MSE
In all cases, large values of F are evidence against the null hypothesis that the effect
is not present in the populations.
Apply Your Knowledge
30.15 Hooded Rats: Social Play Times. In Exercise 30.12, you carried out
two-way ANOVA for a study of the effect of social isolation on hooded rats.
The response variable is the time (in seconds) that a rat devoted to social
play during an observation period. Start your work in this exercise with your
two-way ANOVA table from Exercise 30.12.
SCLTIME
DATA
2.
SSR
r1
(a) Explain how the sums of squares from the two-way ANOVA table
can be combined to obtain the one-way ANOVA sum of squares for
the four groups (SSG). What is the value of SSG?
(b) Give the degrees of freedom, mean square (MSG), and F statistic
for testing for the effect of groups.
30-31
CHAP TER 3 0
■
More about Analysis of Variance
(c) Is there a significant effect of group on the amount of time in play?
Give and interpret the P-value in the context of this experiment.
(d) Use software to carry out one-way ANOVA of time on group.
Verify that your results in parts (b), (c), and (d) agree with the
software output.
30.16 Hooded Rats: Social Play Counts. In Exercise 30.13, you carried out
two-way ANOVA for a study of the effect of social isolation on hooded rats.
The response variable is the count of social play episodes during an observation period. Start your work in this exercise with your two-way ANOVA
table from Exercise 30.13.
SCLCOUNT
DATA
30-32
(a) How many treatment groups are there in this experiment?
(b) Starting from the two-way ANOVA table, create a one-way
ANOVA table to examine the effect of the four groups.
(c) Is there a significant group effect? Give the appropriate hypotheses,
test statistic, P-value, and conclusion in the context of this
experiment.
CHAPTER 30 SUMMARY
Chapter Specifics
■
Two-way analysis of variance (ANOVA) compares the means of several populations
formed by combinations of two factors R and C in a two-way layout.
■
The conditions for ANOVA state that we have an independent SRS from each
population (or a completely randomized or randomized block experimental design);
that each population has a Normal distribution; and that all populations have the same
standard deviation. In this chapter, we consider only examples that satisfy the additional conditions that the design producing the data is crossed (all combinations of the
factors are present) and balanced (all factor combinations are represented by the same
number of individuals).
■
A factor has a main effect if the mean responses for each value of that factor, averaged
over all values of the other factor, are not the same. The two factors interact if the effect
of moving between two values of one factor is different for different values of the other
factor. Plot the treatment mean responses to examine main effects and interaction.
■
There are three ANOVA F tests: for the null hypotheses of no main effect for Factor
R, no main effect for Factor C, and no interaction between the two factors.
■
Follow-up analysis is often helpful in both one-way and two-way ANOVA settings.
Tukey pairwise multiple comparisons give confidence intervals for all differences
among treatment means with an overall level of confidence. That is, we can be (say)
95% confident that all the intervals simultaneously capture the true population differences between means.
STATISTICS IN SUMMARY
Here are the most important skills you should have acquired from reading this chapter.
A. Recognition
1. Recognize the two-way layout, in which we have a quantitative response to treatments formed by combinations of values of two factors.
Statistics in Summary
2. Recognize when comparing mean responses to the treatments in a two-way layout is
helpful in understanding data.
3. Recognize when you can use two-way ANOVA to compare means. Check the data
production, the presence of outliers, and the sample standard deviations for the
groups you want to compare. Look for data production designs that are crossed and
balanced.
B. Interpreting Two-Way ANOVA
1. Plot the sample means for the treatments. Based on your plot, describe the main
effects and interaction that appear to be present.
2. Decide which effects are most important in practice. Pay particular attention to
whether a large interaction makes one or both main effects less meaningful.
3. Use software to carry out two-way ANOVA inference. From the P-values of the
three F tests, learn which effects are statistically significant.
C. Follow-up Analysis
1. Decide when it is helpful to know which differences among treatment means are
significant.
2. Use software to carry out Tukey pairwise multiple comparisons among all the means
you want to compare.
3. Understand the meaning of the overall confidence level and the overall level of
significance provided by Tukey’s method for a set of confidence intervals or a set
of significance tests.
Link It
In a one-way ANOVA, rejection of the null hypothesis is evidence of a difference in the
means of the populations, but the result does not tell us which differences between the
means are statistically significant. Typically, we perform a more detailed follow-up analysis
to decide which of the means are different and to estimate how large these differences are.
This can be done using Tukey’s pairwise multiple comparisons. These tell us which populations are significantly different from each other and, through the associated confidence
intervals, how large these differences are. Sometimes we have more specific questions about
the population means that can be answered using further contrasts among the means.
The multiple regression models in Chapter 29 extend the simple linear regression
model, allowing us to study the effect of several predictors on the mean of a quantitative response. Similarly, the two-way ANOVA extends the one-way ANOVA described
in Chapter 27, allowing us to study the effect of two factors on the mean of a quantitative
response. In the two-way ANOVA, we have specific questions about the two factors: what
are the main effects of each factor, and do the two factors interact? These questions can
be answered by examining appropriate plots of the means and the F tests in the ANOVA
table. In the presence of strong interaction, the main effects may be of little interest. As
in the one-way ANOVA, a follow-up analysis may be necessary to obtain more detailed
information about the effects of the factors.
30-33
30-34
CHAP TER 3 0
■
More about Analysis of Variance
C H E C K YO U R S K I L L S
30.17 In an ANOVA that compares three treatments, how
many pairwise comparisons between two of these
treatments are there?
(a) two
(b) three
(c) four
30.18 After an ANOVA that compares four treatments,
you calculate a separate two-sample t 95% confidence
interval for each pairwise difference formed from the
four population means. Are you 95% confident that
these intervals capture all the true differences?
(a) No, because 95% confidence applies to each
interval alone, not to all the intervals together.
(b) No, because the ordinary two-sample t confidence
intervals cannot be used for a difference between
two population means when we have data from
other populations as well.
(c) Yes, because under the ANOVA conditions,
following the t procedure gives correct 95% confidence intervals.
30.19 As part of an ANOVA that compares three treatments, you carry out Tukey pairwise tests at the
overall 5% significance level. The Tukey tests find
that ␮1 is significantly different from ␮2 but that the
other two comparisons are not significant. You can
be 95% confident that
(a) ␮1 ⬆ ␮2 and ␮1 ␮3 and ␮2 ␮3 .
(b) just ␮1 ⬆ ␮2 ; there is not enough evidence to
draw conclusions about the other pairs of means.
(c) ␮1 ␮3 and ␮2 ␮3 and this implies that it
must also be true that ␮1 ␮2 .
30.20 The purpose of two-way ANOVA is to learn about
(a) the means of two populations.
(b) the variances of two populations.
(c) the combined effects of two factors on a
quantitative response.
30.21 A two-way ANOVA compares two exercise programs (A and B) and two exercise frequencies (twice
a week and four times a week). The response variable is weight lost after eight weeks of exercise. An
interaction is present in the data when
(a) the mean weight loss is not the same for
Program A and Program B.
(b) the mean weight loss under Program A is
different for twice-weekly and four-times-weekly
subjects.
(c) the difference between the mean weight losses
for Programs A and B is not the same for twiceweekly and four-times-weekly subjects.
30.22 When the F test for interaction in a two-way ANOVA
is significant,
(a) the main effects are never meaningful and
should be ignored.
(b) the main effects must be interpreted with
caution.
(c) interpret each main effect just as you would
in a one-way ANOVA.
A student project measured the increase in the heart rates of
fellow students when they stepped up and down for three minutes to the beat of a metronome. The explanatory variables are
step height (Lo 5.75 inches, Hi 11.5 inches) and metronome beat (Slow 14 steps/minute, Med 21 steps/minute,
Fast 28 steps/minute). The subject’s heart rate was measured
for 20 seconds before and after stepping. The response variable is
the increase in heart rate during exercise.10 Use the ANOVA table
in Figure 30.16 to answer the following questions.
30.23 How many treatment groups were there for this
experiment?
(a) 3
(b) 5
(c) 6
Minitab
Session
Two-way ANOVA: HRinc versus Freq, Height
Source
Freq
Height
Interaction
Error
Total
DF
2
1
2
24
29
SS
4048.8
1920.0
218.4
2700.0
8887.2
S = 10.61
R-Sq = 69.62%
MS
2024.4
1920.0
109.2
112.5
F
17.99
17.07
0.97
R-Sq(adj) = 63.29%
F I GU R E 3 0 . 1 6
Two-way ANOVA table for Exercises 30.23 to 30.27.
P
0.000
0.000
0.393
Exercises
30.24 What is the value of the test statistic for interaction
between frequency and height?
(a) 218.4
(b) 109.2
(c) 0.97
(d) 0.393
30.25 What is the estimate for the common standard
deviation ␴?
(a) 218.4
(b) 109.2
(c) 10.61
30.26 How many students were in the study?
(a) 29
(b) 30
(c) It cannot be determined from the ANOVA
table.
30-35
30.27 Which statement provides the best summary of the
interaction between frequency and height?
(a) The interaction is significant at the 1% level
but not at the 5% level.
(b) The interaction is significant at the 5% level
but not at the 1% level.
(c) The interaction is not significant at either the
1% or the 5% level.
CHAPTER 30 EXERCISES
6.752 to 9.021
for
␮bihai ⫺ ␮red
10.165 to 12.670
for
␮bihai ⫺ ␮yellow
DATA
2.375 to 4.688 for ␮ red ⫺ ␮ yellow
(a) How confident are you that all three of these
intervals capture the true differences between pairs
of population means?
(b) Write a short summary of the results of the
ANOVA, including the multiple comparisons.
30.29 (Optional topic) Comparing tropical flowers: a
contrast. The data in Table 27.1 contain lengths
for the bihai variety of Heliconia and for two forms
(red and yellow) of the caribaea variety. We wonder
(before looking at the data) whether the population
mean for bihai differs from the average of the means
for the two forms of caribaea.
FLOWER
(a) What contrast expresses this comparison?
(b) Give the null and alternative hypotheses for
this comparison, find the sample contrast, and
assess its statistical significance. What do you
conclude?
(c) Give a 90% confidence interval for the population contrast in part (a).
30.30 Does nature heal best? Our bodies have a natural
electrical field that helps wounds heal. Might higher
or lower levels speed healing? An experiment with
newts investigated this question. Newts were randomly assigned to five groups. In four of the groups,
an electrode applied to one hind limb (chosen at
DATA
FLOWER
random) changed the natural field, whereas the
other hind limb was not manipulated. Both limbs in
the fifth (control) group remained in their natural
state.11
Table 30.5 gives data from this experiment. The
“Group” variable shows the field applied as a multiple of the natural field for each newt. For example,
“0.5” is half the natural field, “1” is the natural level
(the control group), and “1.5” indicates a field 1.5
times natural. “Diff” is the response variable, the difference in the healing rate (in micrometers per hour)
of cuts made in the experimental and control limbs
of that newt. Negative values mean that the experimental limb healed more slowly. The investigators
conjectured that nature heals best, so that changing
the field from the natural state (the “1” group) will
slow healing.
Carry out a one-way ANOVA to compare the
mean healing rates for the five groups. Then perform Tukey multiple comparisons for the 10 pairs of
population means. Use the “underline” method illustrated in Example 30.14 to display the complicated
results. What do you conclude?
NEWTHEAL
30.31 (Optional topic) Does nature heal best? A contrast. The researchers who carried out the study
of healing in newts described in Exercise 30.30
conjectured that healing is fastest under natural
conditions. If this is true, the population mean
for the 1 group (natural field level) in Table 30.5
(page 30–36) should be larger (or less negative)
than the average of the population means for the
other four groups.
NEWTHEAL
DATA
DATA
30.28 Comparing tropical flowers. Example 27.1 (page
645) describes a study of the lengths of three varieties of the tropical flower Heliconia. Table 27.1 gives
the data. One-way ANOVA gives strong evidence
(F ⫽ 259.12, P ⬍ 0.0001) that the three population mean lengths are not all equal. Software gives
these Tukey 95% simultaneous confidence intervals:
(a) What population contrast expresses this comparison? What null and alternative hypotheses in
terms of this contrast should the researchers use to
test their conjecture?
(b) Carry out the test and state your conclusion.
30-36
CHAP TER 3 0
■
More about Analysis of Variance
T A B L E 3 0 . 5 EFFECT OF ELECTRICAL FIELD ON HEALING RATE IN NEWTS
GROUP
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
DIFF
1
10
3
3
31
4
12
3
7
10
22
4
1
3
GROUP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
DATA
(c) After you see the five sample means, it is
tempting to contrast the average for the 1 and 1.25
groups with the average for the other three groups.
Explain carefully why this is not legitimate.
30.32 Exercise and heart rate. A student project
measured the increase in the heart rates of fellow students when they stepped up and down for
three minutes to the beat of a metronome. The
explanatory variables are step height (Lo 5.75
inches, Hi 11.5 inches) and metronome beat
(Slow 14 steps/minute, Med 21 steps/minute, Fast 28 steps/minute). The subject’s heart
rate was measured for 20 seconds before and after
stepping. The response variable is the increase in
heart rate during exercise.12 The data file has resting and final heart rates as well as the increase
in heart rate for this study. If the randomization worked well, there should be no significant
differences among the six groups in mean resting heart rate (variable HRrest in the data file).
HEARTRTE
(a) How many pairwise comparisons are there
among the means of six populations?
(b) Use Tukey’s method to compare these means at
the overall 10% significance level.
30.33 Hooded rats: object play times. Exercise 30.12 describes an experiment to study the effects of social
isolation on the behavior of hooded rats. You have
analyzed the effects on social play. Now look at an-
DIFF
7
15
4
16
2
13
5
4
2
14
5
11
10
3
6
1
13
8
GROUP
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
1.25
DIFF
1
8
15
14
7
1
11
8
11
4
7
14
0
5
2
GROUP
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
DIFF
13
49
16
8
2
35
11
46
22
2
10
4
10
2
5
other response variable, the time that a rat spends in
object play during an observation period. The data
file records the time (in seconds) that each rat devoted to object play.
OBJTIME
DATA
DIFF
10
12
9
11
1
6
31
5
13
2
7
8
(a) Make a plot of the four group means. Is there
a large interaction between gender and housing
type? Which main effect appears to be more
important?
(b) Verify that the conditions for ANOVA
inference are satisfied.
(c) Give the complete two-way ANOVA table.
What are the F statistics and P-values for interaction and the two main effects? Explain why the test
results confirm the tentative conclusions you drew
from the plot of means.
30.34 Hooded rats: object play counts. The researchers who conducted the study in Exercise 30.33
4step
also recorded the number of times each of three
types of behavior (object play, locomotor play,
and social play) occurred. The data file contains
the counts of object play episodes for each rat
during the observation period. Carry out a complete analysis of the effects of gender and housing
type.
OBJCOUN
DATA
GROUP
0
0
0
0
0
0
0
0
0
0
0
0
30.35 Herbicide and corn hybrids. Genetic engineering has produced new corn hybrids that resist the
effects of herbicides. This allows more effective
control of weeds, because herbicides don’t damage
30-37
Exercises
DATA
(a) Construct a plot of means to examine the
effects of application rate and hybrid and their
interaction.
(b) Are the conditions for ANOVA inference satisfied? Explain.
30.36 Girls’ cross-country times. Ten runners each
completed 10 five-kilometer races for the Thomas
Worthington High School girls’ cross-country team
during the 2004 season. The data file contains each
race time in minutes.14
GIRLSRUN
(a) Prepare side-by-side boxplots to compare the
distributions of race times for the 10 runners.
Identify the best runner on the team.
(b) Use runner as a blocking variable to see if there
are significant differences in the overall mean race
times for the different meets. Identify the appropriate parameters, state the null and alternative
hypotheses, calculate the test statistic and P-value,
and state your conclusion in context.
(c) Should follow-up multiple comparisons be
used to identify any significant differences? If
so, summarize the appropriate procedure. If not,
explain why not.
30.37 (Optional topic) Girls’ cross-country times: a
contrast. Continue your analysis of the data from
Exercise 30.36 by comparing average race time during the first half of the season with the average for
the second half. Identify a population contrast L that
makes this comparison, estimate this contrast, and
state your conclusion.
30.38 Girls’ cross-country times: peak at the end?
Coaches like to see their athletes “peak” at the end
4step
of the season. Is there significant evidence that the
average finish time for the 10th meet is significantly
lower than the average finish time for the first nine
meets?
30.39 Comparisons among means for one factor in a
two-way analysis. We have illustrated Tukey pairwise comparisons among all treatment means in both
one-way and two-way settings. The method can also
compare the mean responses to just one of the two
factors in a two-way setting: do a one-way ANOVA
on the two-way data listing only one factor as an explanatory variable. This combines data for all levels
of the other factor, so it is useful only when interactions are small. Return to the data on phosphorus in
DATA
tomato plants, Table 30.4. Do a one-way ANOVA
that uses all 36 observations with fertilizer type as
the only factor. Ask for Tukey pairwise comparisons
among the three levels of fertilizer, with overall confidence level 95%.
TOMATOES
(a) How many observations per group does your
analysis use?
(b) What do you conclude from the F statistic and
its P-value?
(c) Use Tukey: which pairwise differences of means
for the three fertilizer levels are significant at the
overall 5% level?
(d) Do you think these pairwise comparisons are
useful for these data? (Hint: What population does
each of the three samples represent?)
30.40 Fourth-graders composing music. The Orff xylophone is often used in teaching music to children because it has removable bars that allow the teacher to
present different options to the students. An education researcher used the Orff xylophone to examine
the effect of tonality (pentatonic or harmonic minor)
and number of xylophone bars (five or 10) on the
ability of fourth-graders to compose melodies that
they could play repeatedly.15
© Arclight/Alamy Images
DATA
the corn. A study compared the effects of the herbicide glusofinate on a number of corn hybrids. The
percents of necrosis (leaf burn) 10 days after application of glusofinate for several application rates
(kilograms per hectare) and three corn hybrids,
two resistant and one not, are provided in the data
file.13
CORNHYB
(a) Twelve children were randomly assigned to each
combination of tonality and bar count. Give the
two-way layout for this experiment.
(b) Judges listened to tapes of the children’s work
and assigned scores for several aspects of the
melodies. One response variable measured the
extent to which children generated new musical
ideas in consecutive five-second intervals of their
melodies. Here is the two-way ANOVA table for
this variable (the publication does not give the
group means):
F
P
1
44.08
0.43
0.52
1
705.33
6.81
0.01
1
50.02
0.48
0.49
44
4556.54
DF
Tonality
Bar count
Error
More about Analysis of Variance
SUM OF SQUARES
SOURCE
Interaction
■
Comment on the significance of main effects and
interaction. Then make a recommendation for
teachers who want to use the Orff xylophone to
encourage children to generate melodies with new
musical ideas.
30.41 Cues for listening comprehension. In speaking,
we often signal a shift to the next step in the discus4step
sion by saying things like “Let’s talk about . . .” or
“Finally, . . .” These are “discourse-signaling cues.”
Do such cues help listeners better comprehend a
second language? One study of this question involved 80 Korean learners of English as a foreign
language.16 Half of the 80 learners listened to a lecture in English with discourse-signaling cues, and
the other half heard the same lecture without cues.
After the lecture, half of each group were asked to
summarize information from the lecture and the
other half simply to recall information. Here are the
group means and two-way ANOVA table for one
response variable, a measure of comprehension of
high-level information:
LECTURE
CUES
NO CUES
Summary
23
17
Recall
19
13
Task
SOURCE
DF
SUM OF SQUARES
F
P
Lecture
1
649.80
16.582
0.000
Task
1
281.25
7.177
0.009
1
0.20
0.005
0.943
76
2978.30
Interaction
Error
Use this information to comment on the impact of
discourse-signaling cues and the type of task performed. (Assume that the data satisfy the conditions
for ANOVA.)
30.42 Fertilization of bromeliads. Does the type of fertilization have a significant effect on leaf development for bromeliads? Researchers compared leaf
production and death over a seven-month period
for a control group (C), a group fertilized only with
nitrogen (N), a group fertilized only with phosphorus (P), and a group that received both nitrogen and
phosphorus (NP).17 In the nitrogen and phosphorus
columns in the data file, 0 indicates no fertilization
and 1 indicates fertilization. The “new” and “dead”
columns give the total number of new or dead leaves
produced over the seven months following fertilization. “Change” gives the difference “new” minus
“dead.”
BROMLIAD
(a) Figure 30.17 (page 30–39) displays the one-way
ANOVA table and Tukey pairwise comparisons for
the number of new leaves. Are there significant differences among the four group means? List the four
group means in order from smallest to largest and
give a summary of the Tukey procedure similar to
that in Example 30.14.
(b) Give the two-way ANOVA table for the number of new leaves. How do your conclusions from
this analysis differ from your conclusion in part (a)?
30.43 Fertilization of bromeliads, continued. Repeat
the analysis of Exercise 30.42 using number of dead
leaves as the response variable.
30.44 Fertilization of bromeliads, continued. Repeat
the analysis of Exercise 30.43 using change in the
number of leaves as the response variable.
30.45 (Optional topic) ANOVA with one observation
per treatment. If a two-way layout has just one
observation for each treatment, the mean square
for error (MSE) is 0 because there is no variation
within each treatment group. The usual two-way
ANOVA F tests can’t be done, because they have
MSE in their denominators. But if we are willing to
assume that there is no interaction between the two
factors, the interaction mean square can be used as
the denominator of F statistics for testing the two
main effects. Software usually allows you to choose
“no interaction.” Here is an example.
Does the addition of sodium chloride change
brick quality? One measure of quality is “resistance
anisotropy,” which combines the results of ultrasound and mechanical tests. Researchers tested
two types of bricks, one without sodium chloride
(MPW) and one with sodium chloride (MPS), at
three firing temperatures (850, 1000, and 1100
degrees Celsius). Here are the data for one brick per
treatment:18
BRICKS
DATA
CHAP TER 3 0
DATA
30-38
BRICK TYPE
TEMPERATURE (ºC)
RESISTANCE
MPW
850
3.29
MPW
1000
5.78
MPW
1100
4.84
MPS
850
2.57
MPS
1000
3.90
MPS
1100
2.57
Exercises
30-39
Minitab
Session
One-way ANOVA: New versus treatment
Source
treatment
Error
Total
S = 1.595
DF
3
28
31
SS
28.75
71.25
100.00
R-Sq = 28.75%
MS
9.58
2.54
F
3.77
P
0.002
R-Sq(adj) = 21.12%
Individual 95% CIs For Mean Based on
Poled StDev
Level
C
NP
P
N
8
8
8
8
Mean
13.250
15.625
14.625
13.500
StDev
1.909 (
1.685
1.302
(
1.414
*
)
(
+
(
*
13.2
*
)
)
)
14.4
15.6
16.8
Pooled StDev = 1.595
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of treatment
Individual confidence level = 98.92%
treatment = C subtracted from:
treatment Lower
N
0.198
NP
-0.802
P
-1.927
Center
2.375
1.375
0.250
Upper
4.552
3.552
2.427
*
(
*
(
*
(
-2.5
)
)
)
0.0
2.5
5.0
2.5
5.0
2.5
5.0
treatment = N subtracted from:
treatment Lower
NP
-3.177
P
-4.302
Center
-1.000
-2.125
Upper
1.177
0.052 (
*
(
*
)
)
-2.5
0.0
treatment = NP subtracted from:
treatment Lower
P
-3.302
Center
-1.125
Upper
1.052
(
-2.5
*
)
0.0
F I G U R E 3 0 . 17
One-way ANOVA table and Tukey pairwise multiple comparisons from Minitab for Exercise 30.42.
(a) Make a plot of the group means (which are the
same as the individual observations). Is the interaction between type of brick and firing temperature
small? If so, it may be reasonable to assume that in
the population there is no interaction.
(b) The researchers did fit a two-way ANOVA
model without interaction. Use statistical software
to obtain this two-way ANOVA table. What do you
conclude about the main effects?
30.46 Packaging and perceived healthiness. Can the
font and color used in a packaged product affect
4step
our perception of the product’s healthiness? It
was hypothesized that use of a natural font, which
looks more handwritten and tends to be more
slanted and curved, might lead to a different perception of product healthiness than an unnatural
font. Two fonts, Impact and Sketchflow Print,
were used. These fonts were shown in a previous
CHAP TER 3 0
■
More about Analysis of Variance
study to differ in their perceived naturalness, but
otherwise were rated similarly on factors such as
readability and likeability. The two package colors
used were red and green. Images of four identical
packages differing only in the font and color were
presented to the subjects. Participants read statements such as “This product is healthy” /natural/
wholesome/organic and they rated how much they
agreed with the statement on a seven-point scale
with 1 indicating “strongly agree” and 7 indicat-
ing “strongly disagree.” Each subject’s responses
were combined to create a perceived healthiness
score.19 The researcher had 252 students available to serve as subjects, and randomly assigned
63 to each treatment, or combination of color and
font. The data file records the perceived healthiness score for each subject. Carry out a complete
analysis of the effects of color and font on perceived healthiness, using Example 30.12 to guide
your work.
COOKIES
DATA
30-40
Exploring the Web
30.47 Confidence in the banking system. The General Social Survey (GSS) is a
sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The survey is conducted by the National Opinion Research Center of the University of Chicago, which interviews
face-to-face a randomly selected sample of adults (18 and older). SDA (Survey
Documentation and Analysis) is a set of programs that allows you to analyze
survey data and includes the GSS as part of its archive. In Exercises 27.44 and
27.45 (page 674), you used data from the GSS to study the relationship between
the average respondent age and confidence in the banking system. A one-way
ANOVA showed a statistically significant difference between the age of the respondent and his or her confidence in the banking system. Download the data file
following the directions in Exercise 27.45, and do Tukey’s multiple comparisons
at the 5% overall significance level to compare the mean age for the three levels
of confidence in the banking system. Which pairs of means are different?
30.48 Find a two-way ANOVA. Find an example of a two-way ANOVA on the
web. The Journal of the American Medical Association (jama.ama-assn.
org), Science Magazine (www.sciencemag.org), the Canadian Medical
Association Journal (www.cmaj.ca), and the Journal of Marketing Research
(www.atypon-link.com/AMA/loi/jmkr) are possible sources. To help
locate an article, look through the abstracts of articles. If you are having difficulty, consider study 1 in Psychological Science, 19 (2008), pp. 268–273. Once you
find a suitable article, read the article and then briefly describe the study (including the two factors and the number of levels of each factor) and its conclusions.
If the means are given, plot them and discuss the interaction. Whether or not
the study is balanced, the test for interaction is interpreted in the same way. If
P-values are reported, be sure to discuss them in your summary. Also, be sure
to give the reference (either the web link or the journal, issue, year, title of the
paper, authors, and page numbers).
NOTES AND DATA SOURCES
Chapter 30 Notes
1. We thank Charles Cannon of Duke University for providing the data. The study
report is C. H. Cannon, D. R. Peart, and M. Leighton, “Tree species diversity in commercially logged Bornean rainforest,” Science, 281 (1998), pp. 1366–1367.
Exploring the Web
2. Modified from M. C. Wilson and R. E. Shade, “Relative attractiveness of various
luminescent colors to the cereal leaf beetle and the meadow spittlebug,” Journal of
Economic Entomology, 60 (1967), pp. 578–580.
3. In some cases, we think of the blocks in a block design as a sample from some larger
population of potential blocks. For example, a clinical trial may compare several
medical therapies (the factor) using patients at several medical centers (the blocks).
We are not interested in these particular medical centers; they simply represent all
hospitals at which the therapies might be used. Such blocks are called “random.”
ANOVA with one or more random factors is an advanced topic. In other cases, the
specific blocks are of interest in their own right. For example, a clinical trial might
compare several therapies (the factor) on both female and male patients (the blocks).
We randomly assign therapies among the women and separately among the men, a
randomized block design. We are interested in the effect of gender on response to the
therapies. Such blocks are called “fixed.” In this chapter, we allow randomized block
designs with fixed blocks, analyzing the data by treating the blocks as the second
factor in two-way ANOVA.
4. Victoria L. Brescoll and Eric L. Uhlmann, “Can an angry woman get ahead? Status
conferral, gender and expression of emotion in the workplace,” Psychological Science,
19 (2008), pp. 268–273. The description and data are based on study 1 in this article.
5. Orit E. Hetzroni, “The effects of active versus passive computer-assisted instruction
on the acquisition, retention, and generalization of Blissymbols while using elements
for teaching compounds,” PhD thesis, Purdue University, 1995.
6. Data courtesy of David LeBauer, University of California, Irvine. Provided by Brigitte
Baldi.
7. Simulated data, based on data summaries in G. L. Cromwell et al., “A comparison
of the nutritive value of opaque-2, floury-2, and normal corn for the chick,” Poultry
Science, 57 (1968), pp. 840–847.
8. We thank Andy Niemiec and Robbie Molden for data from a summer science project
at Kenyon College. Provided by Brad Hartlaub.
9. We thank Harihuko Itagaki and Andrew Veerde for data from a summer science
project at Kenyon College. Provided by Brad Hartlaub.
10. Data from the EESEE story “Stepping Up Your Heart Rate.”
11. Data provided by Drina Iglesia, Purdue University. The data are part of a larger study
reported in D. D. S. Iglesia, E. J. Cragoe, Jr., and J. W. Vanable, “Electric field strength
and epithelization in the newt (Notophthalmus viridescens),” Journal of Experimental
Zoology, 274 (1996), pp. 56–62.
12. See Note 10.
13. Christopher Eric Mowen, “Use of glusofinate in glusofinate resistant corn hybrids,”
MS thesis, Purdue University, 1999.
14. We thank Coach Brian Luthy for the data. Provided by Brad Hartlaub.
15. John Kratus, “Effect of available tonality and pitch options on children’s compositional processes and products,” Journal of Research in Music Education, 49 (2001),
pp. 294–306.
16. Euen Hyuk Jung, “The role of discourse signaling cues in second language listening
comprehension,” Modern Language Journal, 87 (2003), pp. 562–577.
17. We thank Jacqueline Ngai for providing data from Jacqueline T. Ngai and Diane S.
Srivastava, “Predators accelerate nutrient cycling in a bromeliad ecosystem,” Science,
314 (2006), p. 963. Counts for the last control plant are simulated because a monkey enjoyed snacking on this plant before the researchers were able to obtain all the
counts.
18. G. Cultrone et al., “Ultrasound and mechanical tests combined with ANOVA to
evaluate brick quality,” Ceramics International, 27 (2001), pp. 401–406.
19. Alysha Fligner and Xiaoyan Deng, “The effect of font naturalness on perceived
healthiness of food products,” Senior thesis, The Ohio State University.
30-41