Applications of Group Representations
to Statistical Problems
Persi Diaconis
Department of Mathematics, Harvard University, Science Center, 1 Oxford Street
Cambridge, MA 02138 USA
Abstract
Many problems in routine statistical analysis can be interpreted as the decomposition of a representation into irreducible components and the computation and
interpretation of the projection of a given vector into these components. Examples
include the usual spectral analysis of time series and the statistical analysis of
variance. Recently, non-commutative representations have emerged as a practical
tool. A variety of approaches have come together to give a unified theory.
1. Introduction
The study of a function through the size of its coefficients in an orthogonal
expansion is a standard tool. This paper shows that expansions arising from the
action of a group on a set occur naturally in a variety of statistical problems.
Example (Time Series Analysis). Let f(0), f(1), ..., f(N - 1) be the observed values
of a series of events. For example, the f(k) might be the number of children born in
New York City on successive days. Data collected in time often exhibit periodic
behavior; New York City birth data looks like this:
411, 430, 418, 396, 401, 320, 322.
The pattern of 5 high values followed by two low values persists. This seems
surprising until one realizes that about 20% of all births are induced and physicians
don't like to work on weekends. Izenman and Zabell (1978) discuss these data.
To find and interpret such periodicities, the data f(k) are often transformed as

    f̂(j) = Σ_{k=0}^{N-1} e^{2πijk/N} f(k).
The data can be recovered by the inversion theorem

    f(k) = (1/N) Σ_{j=0}^{N-1} e^{-2πijk/N} f̂(j).
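As a quick check on the two formulas above, here is a direct numpy implementation of the transform and its inversion. This is only a sketch: the series below is a made-up exact period-7 pattern loosely modeled on the birth counts, not the actual data.

```python
import numpy as np

def dft(f):
    """Transform on the cyclic group of order N: fhat(j) = sum_k e^(2 pi i jk/N) f(k)."""
    N = len(f)
    k = np.arange(N)
    return np.array([np.sum(np.exp(2j * np.pi * j * k / N) * f) for j in range(N)])

def inverse_dft(fhat):
    """Inversion theorem: f(k) = (1/N) sum_j e^(-2 pi i jk/N) fhat(j)."""
    N = len(fhat)
    j = np.arange(N)
    return np.array([np.sum(np.exp(-2j * np.pi * j * k / N) * fhat) / N
                     for k in range(N)])

# Four weeks of a series repeating exactly with period 7 (illustrative numbers).
f = np.array([411, 430, 418, 396, 401, 320, 322] * 4, dtype=float)
fhat = dft(f)
# Because the series has exact period 7 and N = 28, the transform vanishes
# except at j = 0 and at multiples of N/7 = 4, exposing the weekly periodicity.
```

Keeping only the few large coefficients and inverting gives exactly the kind of low-dimensional description of f discussed next.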
Proceedings of the International Congress
of Mathematicians, Kyoto, Japan, 1990
© The Mathematical Society of Japan, 1991
If the transform f̂(j) is relatively large for only a few values of j, the inversion
theorem shows f is well approximated by these few periodic components. This gives
a simple description of f, and one can try to go back and understand why the few
components are large.
This "bump hunting" part of spectral analysis is fully explained in books by
Brillinger (1975) and Bloomfield (1976). It is only one part of the story (continuous
spectra are the other part; see Tukey (1961)), but it and its generalizations will
dominate the present treatment.
The generalizations presented here involve a finite group G acting on a finite set
X. Let f : X → ℝ be a given function. In the example above, G is a cyclic group of
order N acting on itself by translation. The function f(k) is the number of children
born on day k. In the example of the next section, G is the symmetric group acting
on itself and f(π) is the number of people in an election who ranked candidates in
the permutation π.
Let L(X) = {f : X → ℂ}. This is a vector space on which G acts by sf(x) =
f(s⁻¹x). Maschke's theorem implies that L(X) splits into a direct sum

    L(X) = V_0 ⊕ V_1 ⊕ ... ⊕ V_J

where each subspace is invariant under the group (so g ∈ V_i implies sg ∈ V_i) and the
pieces are irreducible, so no further splitting is possible. Clearly f ∈ L(X) can be
written as f = Σ_{i=0}^{J} f_i with f_i the projection into V_i.
The empirical finding, to be explored further, is that the subspaces often have
simple interpretations and the decomposition of f into its projections into the V_i
"makes sense".
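For the cyclic group the decomposition can be carried out explicitly: the invariant subspaces are one-dimensional, spanned by the exponentials, and the projections f_i are just Fourier components. A minimal numpy sketch (the data vector is illustrative, not from the paper):

```python
import numpy as np

def isotypic_projections(f):
    """For G = Z/N acting on itself, L(X) splits into N one-dimensional
    invariant subspaces V_j spanned by k -> e^(2 pi i jk/N).  Return the
    projection f_j of f onto each V_j; then f = sum_j f_j."""
    N = len(f)
    k = np.arange(N)
    pieces = []
    for j in range(N):
        chi = np.exp(2j * np.pi * j * k / N)      # basis vector of V_j
        coeff = np.vdot(chi, f) / N               # <chi, f> / <chi, chi>
        pieces.append(coeff * chi)
    return pieces

f = np.array([4.0, 1.0, 3.0, 2.0])
pieces = isotypic_projections(f)
reconstructed = sum(pieces)                       # equals f up to rounding
```

The j = 0 piece is the projection onto the constant functions (the mean), and the squared norms of the pieces add up to the squared norm of f, which is the Pythagorean bookkeeping used throughout the paper.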
Definition. Spectral analysis consists of the computation and interpretation of f̂ and
the approximation of f by as few pieces as do a reasonable job.
This necessarily vague definition encompasses a number of areas of classical
statistics. In the next section an example is presented in some detail. Section 3 gives
a group-theoretic version of the classical analysis of variance as spectral analysis.
Section 4 describes modern work on ANOVA of orthogonal designs as developed
by Bailey, Nelder, Speed, Tjur, and their co-workers. That these two approaches
lead to the same analysis in nice cases is an important recent result of Rosemary
Bailey, Chris Rowley, and their co-workers. This is developed in Section 5. The final
section gives pointers to the many topics which couldn't be covered in this brief
review.
Spectral analysis as outlined here is a data analytic variation of ideas suggested
earlier by Alan James and Ted Hannan. Hannan's (1965) monograph is filled with
innovative ideas and treats continuous problems as well. Peter Fortini's (1977) thesis
is also an important source of inspiration for the treatment presented here.
Only the rudiments of group representations are needed. The beginning of the
books of Ledermann (1987) or Serre (1977) are ample background. I have tried to
lay out the background in Diaconis (1988).
2. An Example
This section presents data on S_5, the symmetric group on 5 letters. The data arise
from an election of the American Psychological Association. This organization asks
its membership to rank order 5 candidates for president. Here G = S_5 and f(π) is
the number of voters choosing rank order π. For example,

    f( 1 2 3 4 5 ) = 29,
     ( 5 4 3 2 1 )

so 29 voters ranked candidate one 5th, candidate two 4th, and so on. The data are
shown in Table 1.
Let ρ : S_5 → GL_5(ℝ) be the usual 5-dimensional permutation representation.
Thus ρ(π) is a 5 × 5 matrix with (i, j) entry 1 if π(i) = j and zero otherwise. The
Fourier transform of f at ρ is the matrix
Table 1. American Psychological Association election data

Ranking  Votes   Ranking  Votes   Ranking  Votes   Ranking  Votes
54321     29     43521     91     32541     41     21543     36
54312     67     43512     84     32514     64     21534     42
54231     37     43251     30     32451     34     21453     24
54213     24     43215     35     32415     75     21435     26
54132     43     43152     38     32154     82     21354     30
54123     28     43125     35     32145     74     21345     40
53421     57     42531     58     31542     30     15432     40
53412     49     42513     66     31524     34     15423     35
53241     22     42351     24     31452     40     15342     36
53214     22     42315     51     31425     42     15324     17
53142     34     42153     52     31254     30     15243     70
53124     26     42135     40     31245     34     15234     50
52431     54     41532     50     25431     35     14532     52
52413     44     41523     45     25413     34     14523     48
52341     26     41352     31     25341     40     14352     51
52314     24     41325     23     25314     21     14325     24
52143     35     41253     22     25143    106     14253     70
52134     50     41235     16     25134     79     14235     45
51432     50     35421     71     24531     63     13542     35
51423     46     35412     61     24513     53     13524     28
51342     25     35241     41     24351     44     13452     37
51324     19     35214     27     24315     28     13425     35
51243     11     35142     45     24153    162     13254     95
51234     29     35124     36     24135     96     13245    102
45321     31     34521    107     23541     45     12543     34
45312     54     34512    133     23514     52     12534     35
45231     34     34251     62     23451     53     12453     29
45213     24     34215     28     23415     52     12435     27
45132     38     34152     87     23154    186     12354     28
45123     30     34125     35     23145    172     12345     30
    f̂(ρ) = Σ_{π∈S_5} ρ(π) f(π).
This has (i, j) entry the number of people ranking candidate i in position j. This
natural summary is shown in Table 2, where entries are divided by the total number
of voters to give proportions.
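In code, f̂(ρ) at the permutation representation reduces to a simple counting matrix. A sketch with two toy ballot types (hypothetical counts, not the APA data):

```python
def first_order_summary(votes, n=5):
    """fhat(rho) = sum_pi rho(pi) f(pi): entry (i, j) counts the voters who
    ranked candidate i+1 in position j+1.  `votes` maps a tuple pi, with
    pi[i] the position given to candidate i+1, to its vote count."""
    fhat = [[0] * n for _ in range(n)]
    for pi, count in votes.items():
        for i in range(n):
            fhat[i][pi[i] - 1] += count
    return fhat

# Two illustrative ballot types only:
votes = {(5, 4, 3, 2, 1): 29,   # candidate 1 ranked 5th, candidate 2 ranked 4th, ...
         (1, 2, 3, 4, 5): 30}
M = first_order_summary(votes)
```

Every row and every column of M sums to the total number of voters, since each voter gives each candidate exactly one position.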
Table 2. Percentage of voters ranking candidate i in position j

                     Rank
Candidate    1    2    3    4    5
    1       18   26   23   17   15
    2       14   19   25   24   18
    3       28   17   14   18   23
    4       20   17   19   20   23
    5       20   21   20   19   20
The largest number, 28, in the (3, 1) position shows that 28 percent of the voters ranked
candidate 3 first. Candidate 3 also had some "hate vote": 23 percent ranked 3 last.
This first order summary is the first thing anyone analyzing such data would try. It
is natural to ask if it captures the essence of the voting pattern or if there is more
to be learned.
The data are summarized by f ∈ L(S_5). This vector space splits into 7 invariant
subspaces in its isotypic decomposition, shown in Table 3.
Table 3. Decomposition of the regular representation

L(S_5) =  V_1 ⊕ V_2 ⊕ V_3 ⊕ V_4 ⊕ V_5 ⊕ V_6 ⊕ V_7
Dim 120:    1    16    25    36    25    16     1
SS/120:  2286   298   459    78    27     7     0
Table 2 amounts to looking at the projection of f into V_1 ⊕ V_2. If L(S_5) is treated
as an inner product space with ⟨f|g⟩ = Σ_π f(π)g(π), the function f decomposes into
the pieces of its orthogonal projection. The norm squared of f decomposes into the
norm squared of its projections by Pythagoras's theorem. These squared lengths are
shown in the last line of Table 3. As usual the largest contribution comes from the
projection onto the constant functions. There is also a large projection onto the
space V_3. This projection is not captured in the summary of Table 2.
The space V_3 is made up of "2nd order functions", a typical one being
π ↦ δ_{{j,j'}}{π(i), π(i')}, which is 1 if the unordered pair {π(i), π(i')} = {j, j'}
and 0 otherwise. The span of all 2nd order functions, orthogonal to V_1 ⊕ V_2,
makes up V_3. In group representations language, V_3 is the isotypic subspace
corresponding to the partition 3, 2. This V_3 is 25-dimensional. To understand
the projection of f into V_3 a device of Colin Mallows was used. The function f
corresponding to the data is projected onto V_3.
The inner product of this projection with the functions δ_{{j,j'}}{π(i), π(i')} is then reported.
The pairs {i, i'} and {j, j'} can be chosen in 10 ways each. The 100 inner products are
shown in Table 4.
Table 4. Second order, unordered effects

[The printed table is a 10 × 10 array of inner products, with rows indexed by the
ten unordered pairs of candidates {1, 2}, {1, 3}, ..., {4, 5} and columns by the ten
unordered pairs of ranks. The individual entries are too scrambled in this
transcription to be restored reliably; among the entries discussed in the text is
476 at row {1, 3}, column {1, 2}.]
To explain, consider the {1, 3} {1, 2} entry 476. This is the largest number in the
table. It means that there is a large positive effect for ranking candidates one and
three in the first two positions of the ballot. The last entry in row {1,3} shows that
these candidates also had a lot of hate vote.
A very similar picture occurs for the last row of the table. With these observations,
the table's pattern becomes apparent. The American Psychological Association
consists of two groups, academicians and clinicians, who are on uneasy terms.
Candidates {1, 3} are from one group, {4, 5} from the other. Very few voters cross
ranks, so there is a large negative effect for ranking, say {1, 4}, first and second (or
fourth and fifth). These observations are the main structure not revealed by the first
order analysis.
In studying data as we have above, it is natural to ask whether the same patterns
would arise if the data were collected again. I will not go into the details here, but a variety
of stochastic analyses suggest that the natural scale of variability in Table 4 is ±50,
so the patterns observed are believable.
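The second-order computation can be sketched directly, here on S_4 with synthetic data to keep the example small (S_5 works the same way but is five times larger). The projection onto V_1 ⊕ V_2 is done by least squares against the first-order indicator functions, and the reported numbers are the inner products of the residual with the second-order delta functions; all names and data below are illustrative, not from the paper.

```python
import numpy as np
from itertools import permutations, combinations

n = 4
perms = list(permutations(range(n)))              # pi[i] = position of candidate i

def indicator(cond):
    return np.array([1.0 if cond(pi) else 0.0 for pi in perms])

# First-order space V_1 + V_2: spanned by the functions 1[pi(i) = j].
first = np.column_stack([indicator(lambda pi, i=i, j=j: pi[i] == j)
                         for i in range(n) for j in range(n)])

rng = np.random.default_rng(0)
f = rng.normal(size=len(perms))                   # stand-in for vote counts

coef, *_ = np.linalg.lstsq(first, f, rcond=None)
resid = f - first @ coef                          # part of f orthogonal to V_1 + V_2

# Inner products with the second-order delta functions
# g(pi) = 1[{pi(i), pi(i')} = {j, j'}], as in Table 4.
pairs = list(combinations(range(n), 2))
effects = {(c, r): float(resid @ indicator(
               lambda pi, c=c, r=r: {pi[c[0]], pi[c[1]]} == set(r)))
           for c in pairs for r in pairs}
```

Since the deltas for a fixed candidate pair sum to the constant function, and the residual is orthogonal to constants, each row of the resulting table sums to zero; this is a useful internal check on the computation.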
Further details of this analysis are given in Diaconis (1989). Diaconis and Smith
(1989) give a different set of applications for these group-theoretic decompositions.
3. Analysis of Variance (ANOVA)
Consider data cross-classified by I levels of one variable and J levels of a
second variable. The observed data is then a function f(i, j) from
X = {(i, j) : 1 ≤ i ≤ I, 1 ≤ j ≤ J} into ℝ. The product S_I × S_J acts on X and L(X)
splits into
    L(X) =  V_0  ⊕  V_1  ⊕  V_2  ⊕  V_3
    dim:      1     I-1    J-1   (I-1)(J-1)
where V_0 is the space of constant functions, V_1 is the space of row functions
(f(i, j) = f(i, j') for all j, j'), V_2 is the space of column functions, and V_3 is the
space orthogonal to V_0 ⊕ V_1 ⊕ V_2. The projection of f onto V_0 ⊕ V_1 ⊕ V_2
can be interpreted as the least squares approximation to f of the form
f(i, j) = α + β_i + γ_j for constants α, β_i, γ_j.
This, and many more complex variants, are known collectively in the statistical
literature as the analysis of variance. The classical book by Scheffé (1959) is still the
best treatment of this widely used subject.
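The two-way decomposition just described is easy to carry out with averages; a minimal sketch (random data for illustration):

```python
import numpy as np

def two_way_decomposition(f):
    """Split an I x J table into orthogonal pieces: grand mean (V_0),
    row effects (V_1), column effects (V_2), and residual (V_3)."""
    grand = np.full_like(f, f.mean())
    rows = f.mean(axis=1, keepdims=True) - f.mean() + np.zeros_like(f)
    cols = f.mean(axis=0, keepdims=True) - f.mean() + np.zeros_like(f)
    resid = f - grand - rows - cols
    return grand, rows, cols, resid

rng = np.random.default_rng(2)
f = rng.normal(size=(3, 4))
grand, rows, cols, resid = two_way_decomposition(f)
```

The four pieces add back to f, are mutually orthogonal, and their squared norms give the usual ANOVA sums of squares.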
The group-theoretic treatment of ANOVA was pioneered by Alan James and
Ted Hannan, with important later work by Peter Fortini. Group theory is useful in
analyzing more complex designs where the appropriate decomposition is not so
easy to guess at. The dimensions and projections of the various subspaces can be
computed by character theory. Diaconis (1988) reviews these topics.
Even in the simple example given above, thinking group-theoretically has something
to offer: instead of S_I × S_J one can consider S_I × C_J or C_I × C_J, with C_I a
cyclic group of order I. These groups act transitively and their use would be
appropriate if the order of the corresponding rows or columns matters. For example,
if the rows of the table were birds, and the columns months of the year, with (i, j)
entry the number of birds of type i sighted in month j (all in a given location), the
decomposition by S_I × C_12 would be appropriate.
Carrying out the projections involves calculating the Fourier transform at many
different irreducible representations. In Diaconis and Rockmore (1990) a non-commutative
analog of the fast Fourier transform has been developed and used
to make these computations efficient. Similar work is being developed by Beth
(1984) and Clausen (1989a, b). Historically, the FFT on C_2^n was first developed by
Yates (1937) to analyze multiway tables.
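Yates' scheme can be sketched in a few lines: each pass replaces the data column by consecutive-pair sums followed by consecutive-pair differences, and n passes compute all 2^n factorial contrasts (this is the Walsh-Hadamard transform; the code is an illustrative reconstruction, not Yates' original tabular layout).

```python
def yates(y):
    """Yates' algorithm: the fast transform on C_2^n.  After n passes of
    pairwise sums and differences, the entries are the grand total and the
    factorial effect contrasts, for responses given in standard order."""
    y = list(y)
    m = len(y)
    assert m and m & (m - 1) == 0, "length must be a power of two"
    for _ in range(m.bit_length() - 1):
        y = ([y[2 * i] + y[2 * i + 1] for i in range(m // 2)] +
             [y[2 * i + 1] - y[2 * i] for i in range(m // 2)])
    return y

# Responses for a 2^2 experiment in standard order (1), a, b, ab:
effects = yates([1, 2, 3, 4])   # [total, A contrast, B contrast, AB contrast]
```

The cost is n passes of 2^(n-1) additions each, against 4^n operations for the naive matrix multiply, which is exactly the fast Fourier transform idea in its simplest commutative setting.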
4. Modern ANOVA
Analysis of variance has developed along non-group-theoretic lines. In this section
a survey of work by Rosemary Bailey, Chris Rowley, John Nelder, Tue Tjur, Terry
Speed, and their co-workers is given. The next section shows how the present
treatment and the group-theoretic treatment interact.
Begin with a finite set X. Let L(X) be the real-valued functions on X. A design
𝒟 is a set of partitions F of X. For example, in ANOVA with repeated observations
in a cell, X = {(i, j, k) : 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ n_ij}, the design might be taken
as 𝒟 = {U, R, C, R ∧ C, E}, where U is the universal partition with one block, R is
the row partition ((i, j, k) ~ (i', j', k') iff i = i'), C is the column partition, R ∧ C, the
minimum of R and C, has indices equivalent if they are in the same row and column,
and E is the partition into singletons.
For each partition F ∈ 𝒟, let L_F = {f ∈ L(X) : f is F-measurable}. The
projection P_F : L(X) → L_F is defined by the averaging matrix

    (P_F)_{xy} = 1/|b|   if x, y ∈ b with b a block of F,
               = 0       otherwise.
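The averaging matrix is straightforward to build; a sketch (encoding a partition as a list of blocks is an implementation choice, not notation from the paper):

```python
import numpy as np

def averaging_matrix(blocks, size):
    """(P_F)_{xy} = 1/|b| if x and y lie in a common block b of F, else 0.
    `blocks` is a list of lists partitioning range(size)."""
    P = np.zeros((size, size))
    for b in blocks:
        for x in b:
            for y in b:
                P[x, y] = 1.0 / len(b)
    return P

# Row partition R of a 2 x 3 layout with cells numbered 0..5 row by row:
P_R = averaging_matrix([[0, 1, 2], [3, 4, 5]], 6)
```

Applied to a data vector, P_R replaces each entry by the mean of its row, and P_R is a symmetric idempotent, i.e. an orthogonal projection.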
Two partitions F, G ∈ 𝒟 are called orthogonal if their subspaces are geometrically
orthogonal, or equivalently if P_F P_G = P_G P_F. For the ANOVA example, R and C are
orthogonal provided n_ij·|X| = n_{i+}·n_{+j} with n_{i+} = Σ_j n_ij. An orthogonal design has
all factors orthogonal.
In recent years, orthogonal designs with the set of factors closed under maxima
have come to be seen as a useful class with a unified theory. Adding maxima of
orthogonal factors preserves orthogonality, so a design can always be completed
in this way. Such designs are called Tjur designs because of the following basic result.
Theorem (Tjur (1984)). A Tjur design admits a unique decomposition

    L(X) = ⊕_{G ∈ 𝒟} V_G

with the property that

    L_F = ⊕_{G ≥ F} V_G.
The projections of a given f ∈ L(X) onto the various V_G constitute the analysis
of variance for a Tjur design. The point of the decomposition is this: it is easy to
compute the projection P_F of f onto L_F. The partially ordered set of factors then
allows the computation of the projections onto the V_G by subtraction, which amounts
to Möbius inversion in the poset.
In the basic ANOVA example, the factors can be diagrammed as:

        U
       / \
      R   C
       \ /
      R ∧ C
Given f ∈ L(X) one computes the projection onto the constant functions by P_U. The
projection onto the row effects space V_R is given by P_R - P_U. The projection onto
the column effects space V_C is given by P_C - P_U. The projection onto the residual
or interaction space V_{R∧C} is given by P_{R∧C} - P_R - P_C + P_U.
More generally, the projection onto V_G is given by Σ_{F ≥ G} μ(F, G) P_F with μ the
Möbius function of the partially ordered set of factors.
This is an easy algorithm which is used by many large computer programs (e.g.,
Genstat) to analyze designed experiments. A splendid treatment of this point of view
appears in Tjur (1984).
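A sketch of the subtraction algorithm for the balanced case with one observation per cell, so that P_{R∧C} is the identity matrix; the helper names here are hypothetical, not from the paper or from Genstat:

```python
import numpy as np

I, J = 2, 3
cells = [(i, j) for i in range(I) for j in range(J)]

def averaging(key):
    """Averaging matrix P_F for the partition grouping cells with equal key(cell)."""
    m = len(cells)
    P = np.zeros((m, m))
    for x, cx in enumerate(cells):
        block = [y for y, cy in enumerate(cells) if key(cy) == key(cx)]
        for y in block:
            P[x, y] = 1.0 / len(block)
    return P

P_U = averaging(lambda c: 0)       # universal partition: one block
P_R = averaging(lambda c: c[0])    # rows
P_C = averaging(lambda c: c[1])    # columns

# Mobius inversion in the factor poset: the interaction projection.
Q = np.eye(len(cells)) - P_R - P_C + P_U
```

One can check that Q is itself a projection, orthogonal to the row and column spaces, with trace (I-1)(J-1), the familiar interaction degrees of freedom.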
5. Groups and Modern ANOVA
Sections 3 and 4 above present two different approaches to the analysis of designed
experiments. In both, data are represented as f ∈ L(X). In the group case, a group
G is found acting on X and the analysis consists of decomposing L(X) and computing
projections. In the second case, a collection of partitions is produced and one
uses the splitting of L(X) into parts indexed by these partitions.
It is natural to ask about the relation between these approaches. This problem
has been solved for a large class of examples in recent work of Bailey, Praeger,
Rowley, and Speed (1983). Their work intertwines the two approaches. It is also
important group-theoretically in providing examples where the Fourier transform
can be computed using the simple averaging and difference algorithm outlined
before.
To describe their result, let X be a finite set and 𝒟 = {F} a design, or set of
partitions of X. Assume that the blocks of each F ∈ 𝒟 have uniform size, that all
partitions F, G in 𝒟 are orthogonal, that 𝒟 is closed under max and min, and finally
that 𝒟 forms a distributive lattice under max and min.
These assumptions include many complex classical cases. However, adding the
minimum of two partitions to an orthogonal design can destroy orthogonality.
An automorphism of a design 𝒟 is a 1-1 map π : X → X such that for each F ∈ 𝒟,
if x and y are in the same block of F then π(x) and π(y) are in the same block of F.
The set of all automorphisms of 𝒟 is called the automorphism group of 𝒟.
Bailey, Praeger, Rowley, and Speed (1983) did three main things:
I) They identified the automorphism group of 𝒟 as a generalized wreath
product of symmetric groups indexed by a partially ordered set. These generalized
wreath products have been extensively developed because of their role in the
algebraic theory of semigroups (Krohn-Rhodes theory). A marvelous introduction
to this theory appears in Wells (1976). The result also builds on previous work by
Silcock (1977).
II) They identified the characters of the automorphism group that appear in the
representation L(X).
III) They showed that the group-theoretic and partition-based analyses agree.
Much further work is not reported here. For example, they determine the
commuting algebra of L(X), give a natural language for describing the groups and
decompositions, and finally they make the link with the large body of statistical work
in a useful way.
Each approach has problems where it seems to be the superior mode of analysis.
The approach by partitions works for some designs without enough symmetry to
permit a useful group-theoretic analysis. For example, consider a 2-way array with
n_ij entries per cell, where the n_ij are as shown:
[array of cell counts n_ij not recoverable from this transcription]
Here it does not make sense to permute the rows or columns. However, the design
is orthogonal and permits a straightforward analysis.
In the other direction, block designs are a widely used class of designs which are
not orthogonal. As an example, consider an experiment in which v levels of vanilla
are to be compared to help decide how much to put into ice cream. If one asks
people to taste many ice cream cones, they all taste like colored sugar water. Thus
suppose people are asked to taste k < v flavors. A complete block design involves
(v choose k) people, each of whom tastes k levels of vanilla. Suppose the response is a
rating between 0 and 100. This yields k·(v choose k) responses in total. The underlying set
is X = {(i, s) : 1 ≤ i ≤ v, |s| = k, i ∈ s}; the responses give f : X → ℝ.
The two natural partitions are for treatments and blocks. Thus (i, s) ~_T (i', s') if
i = i', and (i, s) ~_B (i', s') if s = s'. These two partitions do not yield an orthogonal
design. Indeed, here T ∨ B = U, and the condition for orthogonality can be stated
as n_is·|X| = n_i·n_s, where
- n_i is the number of elements in X receiving treatment i (so n_i = (v-1 choose k-1)),
- n_s is the number of elements in X in block s (so n_s = k),
- n_is is the number of elements in X with x = (i, s) (so n_is = 1 if i ∈ s, and 0 if i ∉ s).
It follows that n_is·|X| ≠ n_i·n_s for i ∈ s.
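The failure of orthogonality can be checked by direct counting for a small instance; the values v = 4, k = 2 below are illustrative only.

```python
from itertools import combinations
from math import comb

v, k = 4, 2                                   # 4 vanilla levels, tasted 2 at a time
X = [(i, s) for s in combinations(range(v), k) for i in s]
assert len(X) == k * comb(v, k)               # k * (v choose k) responses

i0, s0 = X[0]
n_i  = sum(1 for (i, s) in X if i == i0)      # should equal (v-1 choose k-1)
n_s  = sum(1 for (i, s) in X if s == s0)      # should equal k
n_is = sum(1 for (i, s) in X if (i, s) == (i0, s0))   # should equal 1

# Orthogonality of T and B would need n_is * |X| == n_i * n_s; here it fails.
unequal = n_is * len(X) != n_i * n_s
```

For this instance n_is·|X| = 12 while n_i·n_s = 6, so the proportionality condition fails, as the general argument predicts.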
The group-theoretic analysis of this kind of data is straightforward but picks up
aspects not developed in earlier analyses. The automorphism group can be identified
with S_v. The representation L(X) decomposes into a treatment space and a block
space, but it also includes new pieces which may be interpreted as the effect of tasters
rating by comparison. Fortini (1977) or Diaconis (1988) give further details.
6. Other Topics
This section gives pointers to closely related research which cannot be adequately
covered due to space limitations.
6.1 Stochastic Models
The approach to spectral analysis outlined above begins with data and a group.
Almost all of the statistical literature begins with a probability model and presents
the projections as estimates of parameters in a model. For example, for two-way
analysis of variance with one observation per cell the model would be written as

    f(i, j) = μ + α_i + β_j + ε_ij

where μ, α_i, β_j are parameters to be estimated (and Σα_i = Σβ_j = 0 to yield
identifiable parameters). The ε_ij are errors, or disturbance terms, which are usually assumed
to be independent random variables with mean 0 and constant variance. The least
squares estimates of these parameters are the projections described earlier.
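The claim that the least squares estimates are the projections can be checked numerically: the additive table built from the grand mean and the centered row and column means leaves residuals with zero row and column means, which characterizes the least squares fit. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 3, 4
f = rng.normal(size=(I, J))

# Projection estimates: grand mean, centered row means, centered column means.
mu_hat = f.mean()
alpha_hat = f.mean(axis=1) - mu_hat           # satisfies sum(alpha_hat) = 0
beta_hat = f.mean(axis=0) - mu_hat            # satisfies sum(beta_hat) = 0

# The fitted additive table and its residuals.
fit = mu_hat + alpha_hat[:, None] + beta_hat[None, :]
resid = f - fit
```

The identifiability constraints hold automatically, and the residuals are orthogonal to the whole additive space, so no other choice of parameters can reduce the sum of squares.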
Assuming a model leads to well understood ways to quantify standard errors
for the estimates. It also allows analysis of data with no symmetry at all. Further,
if more careful specifications are made on the distribution of the error terms, a
variety of other estimates of the parameters become available.
One of the nice results of the past 10 years is a complete understanding of all
possible covariance structures for the error terms which lead to the original
projections being efficient estimators. This, and the closely related subject of general
balance, are treated by Speed (1987) or Bailey and Rowley (1990).
Rosemary Bailey has developed a more elaborate theory which allows incorporation
of the randomization aspect of many designed experiments. Her treatment
provides separate provision for treatment and design aspects. The theory makes
extensive use of group theory and is well connected to statistical practice. A recent
survey with extensive pointers to other work is Bailey (1990).
There has also been an extensive development which assumes that the errors
are Gaussian. The leading work here comes out of the Danish school. Andersson
and Perlman (1989) is an important paper with pointers to other work.
6.2 Bayesian Methods and Shrinkage Estimators
Once a model is specified, the Bayesian approach to statistics proceeds by putting
a prior distribution on the parameters and then using observations to get a posterior
distribution. There has been very little work on analyzing the kind of data discussed
above from a Bayesian perspective. Dawid (1988) presents some results as do Box
and Tiao (1973) and Diaconis, Eaton, and Lauritzen (1991) but much remains to
be done.
One of the exciting findings of recent statistical research has been the understanding
that when many parameters must be estimated, the classical projection
estimates can be uniformly improved. The improvement depends on the assumption
of a model. Our current understanding of a reasonable way to go after the improvement
involves a Bayesian (or empirical Bayesian) treatment of the problem. Again
there has not been much work on shrinkage estimates for designed experiments, but
Bock (1975) and George (1986) are good starts.
6.3 Messy Data
Real data often contain a few stray or wild values that will foul up the classical
linear estimates. There is a growing theory of robust statistics, surveyed in Huber
(1981) or Hoaglin, Mosteller and Tukey (1983, 1985). Much remains to be done in
specializing the results available for robust regression to the demands of a complex
designed experiment.
Tukey (1977) has developed robust analyses in much the same data analytic
spirit as presented here. He supplements these with an extensive residual analysis
for ferreting out non-linearities and wild values. He also gives techniques for fitting
non-linear models such as

    f(i, j) = α + β_i + γ_j + θβ_iγ_j + ε_ij.
There is active work on non-linear substitutes for classical least squares
estimates. Projection pursuit, as developed by Friedman, Stutzle, and Schroeder
(1984) or Huber (1985), is one of many varieties. Again, the adaptation to analysis of
designed experiments is largely open.
Finally, missing data is an annoying part of real statistical analysis. A neat design
can become a nightmare with symmetry destroyed. The EM algorithm is now a
standard tool for beginning to deal with this problem. Dempster, Laird, and Rubin
(1977) is a good reference and Little and Rubin (1987) is a comprehensive guide to
the state of the art.
6.4 Final Words
The problem with a model-based analysis is that usually the model is simply made
up out of whole cloth, from linearity through stochastic assumptions. While in
principle assumptions can be checked, in my experience they are wildly misused.
See Freedman (1986, 1987) for an extensive discussion.
Complex models have the further disadvantage that their parameters are not
simple-to-interpret averages but estimates of rather complex quantities whose
interpretation depends crucially on the correctness of the model. The spectral estimates
proposed in the first sections of this paper are relatively easy-to-understand averages.
Finally, as for block designs, group-theoretic considerations can lead to different
analyses and new models. There is clearly much to be done in combining the best
features of the various theories and then confronting them with reality.
References
Andersson, S., Perlman, M. (1988): Lattice models for conditional independence in a
multivariate normal distribution. Technical report 155, Department of Statistics, University
of Washington
Bailey, R. (1990): A unified approach to design of experiments. J. Roy. Statist. Soc. A 144,
214-223
Bailey, R., Praeger, C., Rowley, C., Speed, T. (1983): Generalized wreath products of
permutation groups. Proc. London Math. Soc. 47
Bailey, R., Rowley, C. (1990): General Balance and Treatment Permutations. Lin. Alg. Appi.
127, 183-225
Beth, T. (1984): Verfahren der schnellen Fourier-Transformation. Teubner, Stuttgart
Bloomfield, P. (1976): Fourier analysis of time series. An introduction. Wiley, New York
Bock, M.E. (1975): Minimax estimators of the mean of a multivariate normal distribution.
Ann. Statist. 3, 209-218
Box, G., Tiao, G. (1973): Bayesian inference in statistical analysis. Addison-Wesley, Reading,
Mass.
Brillinger, D. (1975): Time series, data analysis and theory. Holt, Rinehart, and Winston, New
York
Clausen, M. (1989a): Fast Fourier transforms for meta-Abelian groups. SIAM J. Comput.
18, 584-593
Clausen, M. (1989b): Fast generalized Fourier transforms. Theoret. Comput. Sci. 67, 55-63
Dawid, P. (1988): Symmetry models and hypotheses for structured data layouts. J. Roy.
Statist. Soc. B 50, 1-26
Dempster, A., Laird, N., Rubin, D. (1977): Maximum likelihood from incomplete data via
the EM algorithm. J. Roy. Statist. Soc. B 39, 1-38
Diaconis, P. (1988): Group representations in probability and statistics. Institute of Math.
Statistics, Hayward, CA
Diaconis, P. (1989): A generalization of spectral analysis with application to ranked data.
Ann. Statist. 17, 949-979
Diaconis, P., Smith, L. (1989): Residual analysis for discrete longitudinal data. Technical
report, Department of Statistics, Stanford University
Diaconis, P., Rockmore, D. (1990): Efficient computation of the Fourier Transform on finite
groups. J. Amer. Math. Soc. 3, 297-332
Diaconis, P., Eaton, M., Lauritzen, S. (1991): Finite de Finetti theorems in linear models and
multivariate analysis. Technical report Dept. of Mathematics, Aalborg University. To
appear in Scand. J. Statist.
Fortini, P. (1977): Representation of groups and the analysis of variance. Ph.D. Thesis,
Department of Statistics, Harvard University
Friedman, J.H., Schroeder, A., Stutzle, W. (1984): Projection pursuit density estimation. J.
Amer. Statist. Assoc. 79, 599-608
Freedman, D. (1987): As others see us: A case study in path analysis. J. Educ. Statist. 12,
101-206
Freedman, D., Navidi, W. (1986): Regression models for adjusting the 1980 census. Statist.
Sci. 1, 3-39
George, E. (1986): Minimax multiple shrinkage estimation. Ann. Statist. 14, 188-205
Hannan, E.J. (1965): Group representations and applied probability. Methuen, New York
Hoaglin, D., Mosteller, F., Tukey, J. (1983): Understanding robust and exploratory analysis.
Wiley, New York
Hoaglin, D., Mosteller, F., Tukey, J. (1985): Exploring data: Tables, trends, and shapes. Wiley,
New York
Huber, P. (1981): Robust statistics, Wiley, New York
Huber, P. (1985): Projection pursuit. Ann. Statist. 13, 435-525
Izenman, A., Zabell, S. (1978): Babies and the blackout: the genesis of a misconception.
Technical report 38, Dept. of Statistics, University of Chicago
James, A. (1957): The relationship algebra of an experimental design. Ann. Math. Statist. 27,
993-1002
James, A. (1982): Analysis of variance determined by symmetry and combinatorial properties
of zonal polynomials. In: G. Kallianpur et al. (eds.) Statistics and probability: essays in
honor of C.R. Rao. North-Holland, New York
Ledermann, W. (1987): Introduction to group characters (2nd ed.). Cambridge University
Press, Cambridge
Little, R., Rubin, D. (1987): Statistical analysis with missing data. Wiley, New York
Scheffé, H. (1959): Analysis of variance. Wiley, New York
Serre, J.P. (1977): Linear representations of finite groups. Springer, Berlin Heidelberg New
York
Silcock, H.L. (1977): Generalized wreath products and the lattice of normal subgroups of a
group. Algebra Universalis 7, 361-372
Speed, T. (1987): What is an analysis of variance? Ann. Statist. 15, 885-941
Tjur, T. (1984): Analysis of variance models in orthogonal designs. Internat. Statist. Rev.
52, 33-82
Tukey, J. (1961): Discussion, emphasizing the connection between analysis of variance and
spectrum analysis. Technometrics 3, 191-219
Tukey, J. (1977): Exploratory data analysis. Addison-Wesley, Reading, Mass
Wells, C. (1976): Some applications of the wreath product construction. Amer. Math.
Monthly 83, 317-338
Yates, F. (1937): The design and analysis of factorial experiments. Imperial Bureau of Soil
Science, Harpenden, England