Applications of Group Representations to Statistical Problems

Persi Diaconis
Department of Mathematics, Harvard University, Science Center, 1 Oxford Street, Cambridge, MA 02138, USA

Proceedings of the International Congress of Mathematicians, Kyoto, Japan, 1990
© The Mathematical Society of Japan, 1991

Abstract. Many problems in routine statistical analysis can be interpreted as the decomposition of a representation into irreducible components and the computation and interpretation of the projection of a given vector into these components. Examples include the usual spectral analysis of time series and the statistical analysis of variance. Recently, non-commutative representations have emerged as a practical tool. A variety of approaches have come together to give a unified theory.

1. Introduction

The study of a function through the size of its coefficients in an orthogonal expansion is a standard tool. This paper shows that expansions arising from the action of a group on a set occur naturally in a variety of statistical problems.

Example (Time Series Analysis). Let f(0), f(1), ..., f(N − 1) be the observed values of a series of events. For example, the f(k) might be the number of children born in New York City on successive days. Data collected in time often exhibit periodic behavior; New York City birth data look like this: 411, 430, 418, 396, 401, 320, 322, and the pattern of 5 high values followed by 2 low values persists. This seems surprising until one realizes that about 20% of all births are induced and physicians don't like to work on weekends. Izenman and Zabell (1978) discuss these data.

To find and interpret such periodicities, the data f(k) are often transformed as

    \hat{f}(j) = \sum_{k=0}^{N-1} e^{2\pi i jk/N} f(k).

The data can be recovered by the inversion theorem

    f(k) = \frac{1}{N} \sum_{j=0}^{N-1} e^{-2\pi i jk/N} \hat{f}(j).

If the transform \hat{f}(j) is relatively large for only a few values of j, the inversion theorem shows f is well approximated by these few periodic components.
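As a concrete check of this transform pair, here is a minimal sketch in Python with NumPy. The seven birth counts quoted above are used purely as illustration, and the variable names are mine, not the paper's.

```python
import numpy as np

# One week of the New York City birth counts quoted above (illustration only).
f = np.array([411, 430, 418, 396, 401, 320, 322], dtype=float)
N = len(f)
idx = np.arange(N)

# Transform:  f_hat(j) = sum_k exp(2 pi i j k / N) f(k)
f_hat = np.array([np.sum(np.exp(2j * np.pi * j * idx / N) * f) for j in range(N)])

# Inversion theorem:  f(k) = (1/N) sum_j exp(-2 pi i j k / N) f_hat(j)
f_rec = np.array([np.sum(np.exp(-2j * np.pi * idx * k / N) * f_hat)
                  for k in range(N)]) / N

assert np.allclose(f_rec.real, f)  # the original series is recovered exactly
```

Here \hat{f}(0) is just the total count; a relatively large |\hat{f}(j)| at some other j flags a periodic component, and "bump hunting" asks why that component is large.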
This gives a simple description of f, and one can try to go back and understand why the few components are large. This "bump hunting" part of spectral analysis is fully explained in books by Brillinger (1975) and Bloomfield (1976). It is only one part of the story (continuous spectra are the other part; see Tukey (1961)), but it, and its generalizations, will dominate the present treatment.

The generalizations presented here involve a finite group G acting on a finite set X. Let f : X → ℝ be a given function. In the example above, G is a cyclic group of order N acting on itself by translation. The function f(k) is the number of children born on day k. In the example of the next section, G is the symmetric group acting on itself, and f(π) is the number of people in an election who ranked the candidates in the permutation π.

Let L(X) = {f : X → ℂ}. This is a vector space on which G acts by sf(x) = f(s⁻¹x). Maschke's theorem implies that L(X) splits into a direct sum

    L(X) = V_0 ⊕ V_1 ⊕ ⋯ ⊕ V_k,

where each subspace is invariant under the group (so g ∈ V_i implies sg ∈ V_i) and the pieces are irreducible, so no further splitting is possible. Clearly f ∈ L(X) can be written as f = \sum_{i=0}^{k} f_i, with f_i the projection into V_i. The empirical finding, to be explored further, is that the subspaces often have simple interpretations and the decomposition of f into its projections into the V_i "makes sense".

Definition. Spectral analysis consists of the computation and interpretation of the projections f_i and the approximation of f by as few pieces as do a reasonable job.

This necessarily vague definition encompasses a number of areas of classical statistics. In the next section an example is presented in some detail. Section 3 gives a group-theoretic version of the classical analysis of variance as spectral analysis. Section 4 describes modern work on ANOVA of orthogonal designs as developed by Bailey, Nelder, Speed, Tjur, and their co-workers.
That these two approaches lead to the same analysis in nice cases is an important recent result of Rosemary Bailey, Chris Rowley, and their co-workers. This is developed in Section 5. The final section gives pointers to the many topics which couldn't be covered in this brief review.

Spectral analysis as outlined here is a data-analytic variation of ideas suggested earlier by Alan James and Ted Hannan. Hannan's (1965) monograph is filled with innovative ideas and treats continuous problems as well. Peter Fortini's (1977) thesis is also an important source of inspiration for the treatment presented here. Only the rudiments of group representations are needed. The beginnings of the books of Ledermann (1987) or Serre (1977) are ample background. I have tried to lay out the background in Diaconis (1988).

2. An Example

This section presents data on S_5, the symmetric group on 5 letters. The data arise from an election of the American Psychological Association. This organization asks its membership to rank order 5 candidates for president. Here G = S_5 and f(π) is the number of voters choosing rank order π. For example,

    f\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 4 & 3 & 2 & 1 \end{pmatrix} = 29,

so 29 voters ranked candidate one 5th, candidate two 4th, and so on. The data are shown in Table 1.

Let ρ : S_5 → GL_5(ℝ) be the usual 5-dimensional permutation representation. Thus ρ(π) is a 5 × 5 matrix with (i, j) entry 1 if π(i) = j and zero otherwise. The Fourier transform of f at ρ is the matrix

Table 1. American Psychological Association election data (ranking and number of votes cast of this type)
Ranking  Votes    Ranking  Votes    Ranking  Votes    Ranking  Votes
54321     29      43521     91      32541     41      21543     36
54312     67      43512     84      32514     64      21534     42
54231     37      43251     30      32451     34      21453     24
54213     24      43215     35      32415     75      21435     26
54132     43      43152     38      32154     82      21354     30
54123     28      43125     35      32145     74      21345     40
53421     57      42531     58      31542     30      15432     40
53412     49      42513     66      31524     34      15423     35
53241     22      42351     24      31452     40      15342     36
53214     22      42315     51      31425     42      15324     17
53142     34      42153     52      31254     30      15243     70
53124     26      42135     40      31245     34      15234     50
52431     54      41532     50      25431     35      14532     52
52413     44      41523     45      25413     34      14523     48
52341     26      41352     31      25341     40      14352     51
52314     24      41325     23      25314     21      14325     24
52143     35      41253     22      25143    106      14253     70
52134     50      41235     16      25134     79      14235     45
51432     50      35421     71      24531     63      13542     35
51423     46      35412     61      24513     53      13524     28
51342     25      35241     41      24351     44      13452     37
51324     19      35214     27      24315     28      13425     35
51243     11      35142     45      24153    162      13254     95
51234     29      35124     36      24135     96      13245    102
45321     31      34521    107      23541     45      12543     34
45312     54      34512    133      23514     52      12534     35
45231     34      34251     62      23451     53      12453     29
45213     24      34215     28      23415     52      12435     27
45132     38      34152     87      23154    186      12354     28
45123     30      34125     35      23145    172      12345     30

    \hat{f}(\rho) = \sum_{\pi} \rho(\pi) f(\pi).

This has (i, j) entry the number of people ranking candidate i in position j. This natural summary is shown in Table 2, where entries are divided by the total number of voters to give proportions.

Table 2. Percentage of voters ranking candidate i in position j

                  Rank
Candidate     1    2    3    4    5
    1        18   26   23   17   15
    2        14   19   25   24   18
    3        28   17   14   18   23
    4        20   17   19   20   23
    5        20   21   20   19   20

The largest number, 28, in the (3, 1) position shows that 28 percent of the voters ranked candidate 3 first. Candidate 3 also had some "hate vote": 23 percent ranked 3 last. This first order summary is the first thing anyone analyzing such data would try.
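The first order summary \hat{f}(ρ) can be computed directly from a vote profile. The sketch below, in Python, uses made-up counts on S_3 rather than the APA data; storing π as a tuple with π(i) the position of candidate i is a convention chosen here for illustration, not fixed by the paper.

```python
from itertools import permutations

# Sketch of the first order summary f_hat(rho) = sum_pi rho(pi) f(pi),
# where rho(pi) has (i, j) entry 1 iff pi(i) = j.  Entry (i, j) of the
# summary then counts voters who ranked candidate i in position j.
# The vote counts below are invented for S_3; they are not the APA data.
n = 3
votes = {pi: c for pi, c in zip(permutations(range(n)), [5, 3, 2, 4, 1, 6])}

summary = [[0] * n for _ in range(n)]
for pi, count in votes.items():
    for i in range(n):
        summary[i][pi[i]] += count  # candidate i sits in position pi[i]

# Each row and each column of the summary sums to the total number of voters.
total = sum(votes.values())
assert all(sum(row) == total for row in summary)
assert all(sum(summary[i][j] for i in range(n)) == total for j in range(n))
```

Dividing each entry by `total` gives the proportions of Table 2.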
It is natural to ask if it captures the essence of the voting pattern or if there is more to be learned. The data are summarized by f ∈ L(S_5). This vector space splits into 7 invariant subspaces in its isotypic decomposition, shown in Table 3.

Table 3. Decomposition of the regular representation

           V_1    V_2    V_3    V_4    V_5    V_6    V_7
Dim          1     16     25     36     25     16      1    (total 120)
SS/120    2286    298    459     78     27      7      0

Table 2 amounts to looking at the projection of f into V_1 ⊕ V_2. If L(S_5) is treated as an inner product space with ⟨f|g⟩ = Σ_π f(π)g(π), the function f decomposes into the pieces of its orthogonal projection. The norm square of f decomposes into the norm squares of its projections by Pythagoras's theorem. These squared lengths are shown in the last line of Table 3. As usual, the largest contribution comes from the projection onto the constant functions. There is also a large projection onto the space V_3. This projection is not captured in the summary of Table 2.

The space V_3 is made up of "2nd order functions", a typical one being π ↦ δ_{{j,j′}}({π(i), π(i′)}), which is 1 if the unordered pair {π(i), π(i′)} = {j, j′} and 0 otherwise. The span of all 2nd order functions, orthogonal to V_1 ⊕ V_2, makes up V_3. In group representations language, V_3 is the isotypic subspace corresponding to the partition 3, 2. This V_3 is 25-dimensional. To understand the projection of f into V_3, a device of Colin Mallows was used. The function f corresponding to the data is projected onto V_3. The inner product of this projection with the functions δ_{{j,j′}}({π(i), π(i′)}) is then reported. The pairs {i, i′} and {j, j′} can be chosen in 10 ways each. The 100 inner products are shown in Table 4.

Table 4.
Second order, unordered effects

                                Rank pair
Candidates   1,2   1,3   1,4   1,5   2,3   2,4   2,5   3,4   3,5   4,5
   1,2      -137   -24   111    19   199    69  -142     6   -97   -46
   1,3       -20   476    22   -43    70   140  -130   107   128   241
   1,4       -88  -189     4     7    -9    44    18   -65    23  -146
   1,5        51  -150  -179    72    43    99   140   -48   -53   -48
   2,3        57   -42  -209    88    30    56  -117   -76   -39    72
   2,4        84   157  -147    24   -93    82    39     8    38   112
   2,5       -20    22  -169    45    98   -56    78    62    99  -138
   3,4       -44  -265  -160   -61    49    25    -5    19   -52  -233
   3,5        -7  -169   113   -25   -16    85  -163   -51   -36   -80
   4,5        10   296    47    15   -76    47  -128    38    -9   267

To explain, consider the {1, 3}, {1, 2} entry, 476. This is the largest number in the table. It means that there is a large positive effect for ranking candidates one and three in the first two positions of the ballot. The last entry in row {1, 3} shows that these candidates also had a lot of hate vote. A very similar picture occurs for the last row of the table. With these observations, the table's pattern becomes apparent. The American Psychological Association consists of two groups, academicians and clinicians, who are on uneasy terms. Candidates {1, 3} are from one group, {4, 5} from the other. Very few voters cross ranks, so there is a large negative effect for ranking, say, {1, 4} first and second (or fourth and fifth). These observations are the main structure not revealed by the first order analysis.

In studying data as we have above, it is natural to ask whether, if the data were collected again, the same patterns would arise. I will not go into the details here, but a variety of stochastic analyses suggest that the natural scale of variability in Table 4 is ±50, so the patterns observed are believable. Further details of this analysis are given in Diaconis (1989). Diaconis and Smith (1989) give a different set of applications for these group-theoretic decompositions.

3. Analysis of Variance (ANOVA)

Consider data cross-classified by I levels of one variable and J levels of a second variable. The observed data are then a function f(i, j) from X = {(i, j) : 1 ≤ i ≤ I, 1 ≤ j ≤ J} into ℝ.
The product S_I × S_J acts on X, and L(X) splits into

    L(X) = V_0 ⊕ V_1  ⊕ V_2  ⊕ V_3
    dim:    1    I−1    J−1    (I−1)(J−1)

where V_0 is the space of constant functions, V_1 is the space of row functions (f(i, j) = f(i, j′)) orthogonal to V_0, V_2 is the space of column functions, and V_3 is the space orthogonal to V_0 ⊕ V_1 ⊕ V_2. The projection of f onto V_0 ⊕ V_1 ⊕ V_2 can be interpreted as the least squares approximation to f of the form f(i, j) = α + β_i + γ_j for constants α, β_i, γ_j. This, and many more complex variants, are known collectively in the statistical literature as the Analysis of Variance. The classical book by Scheffé (1959) is still the best treatment of this widely used subject.

The group-theoretic treatment of ANOVA was pioneered by Alan James and Ted Hannan, with important later work by Peter Fortini. Group theory is useful in analyzing more complex designs where the appropriate decomposition is not so easy to guess at. The dimensions and projections of the various subspaces can be computed by character theory. Diaconis (1988) reviews these topics. Even in the simple example given above, thinking group-theoretically has something to offer: instead of S_I × S_J one can consider S_I × C_J or C_I × C_J, with C_I a cyclic group of order I. These groups act transitively, and their use would be appropriate if the order of the corresponding rows or columns matters. For example, if the rows of the table were birds, and the columns months of the year, with (i, j) entry the number of birds of type i sighted in month j (all in a given location), the decomposition by S_I × C_12 would be appropriate.

Carrying out the projections involves calculating the Fourier transform at many different irreducible representations. In Diaconis and Rockmore (1990) a noncommutative analog of the fast Fourier transform has been developed and used to make these computations efficient. Similar work is being developed by Beth (1984) and Clausen (1989a, b).
Historically, the FFT on C_2^n was first developed by Yates (1937) to analyze multi-way tables.

4. Modern ANOVA

Analysis of variance has developed along non-group-theoretic lines. In this section a survey of work by Rosemary Bailey, Chris Rowley, John Nelder, Tue Tjur, Terry Speed, and their co-workers is given. The next section shows how the present treatment and the group-theoretic treatment interact.

Begin with a finite set X. Let L(X) be the real-valued functions on X. A design 𝒟 is a set of partitions F of X. For example, in ANOVA with repeated observations in a cell, X = {(i, j, k) : 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ n_{ij}}, and the design might be taken as 𝒟 = {U, R, C, R ∧ C, E}, where U is the universal partition with one block, R is the row partition ((i, j, k) ~ (i′, j′, k′) iff i = i′), C is the column partition, R ∧ C, the minimum of R and C, has indices equivalent if they are in the same row and column, and E is the partition into singletons.

For each partition F ∈ 𝒟, let L_F = {f ∈ L(X) which are F-measurable}. The projection P_F : L(X) → L_F is given by the averaging matrix

    (P_F)_{xy} = 1/|b|  if x, y ∈ b with b a block of F,  and 0 otherwise.

Two partitions F, G ∈ 𝒟 are called orthogonal if their subspaces are geometrically orthogonal, or equivalently if P_F P_G = P_G P_F. For the ANOVA example, R and C are orthogonal provided n_{ij} |X| = n_{i+} n_{+j}, with n_{i+} = Σ_j n_{ij}. An orthogonal design has all factors orthogonal. In recent years, orthogonal designs with the set of factors closed under maxima have come to be seen as a useful class with a unified theory. Adding maxima of orthogonal factors preserves orthogonality, so a design can always be completed in this way. Such designs are called Tjur designs because of the following basic result.

Theorem (Tjur (1984)). A Tjur design admits a unique decomposition

    L(X) = ⊕_{G ∈ 𝒟} V_G

with the property that

    L_F = ⊕_{G ≥ F} V_G.
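For the basic two-way example, the averaging projections P_U, P_R, P_C are just grand, row, and column means, and the V_G pieces come out by subtraction. Here is a sketch in Python with NumPy, on an illustrative 3 × 4 table with one observation per cell (so R ∧ C = E and P_{R∧C} f = f); the sizes and random data are assumptions made for the example.

```python
import numpy as np

# Illustrative complete I x J layout, one observation per cell.
I, J = 3, 4
rng = np.random.default_rng(0)
f = rng.normal(size=(I, J))

P_U = np.full_like(f, f.mean())                       # projection onto constants
P_R = np.tile(f.mean(axis=1, keepdims=True), (1, J))  # row averages
P_C = np.tile(f.mean(axis=0, keepdims=True), (I, 1))  # column averages

grand    = P_U                   # V_U piece
rows     = P_R - P_U             # V_R piece
cols     = P_C - P_U             # V_C piece
interact = f - P_R - P_C + P_U   # V_{R ^ C} piece, by subtraction

# The pieces are mutually orthogonal and sum back to f.
assert np.allclose(grand + rows + cols + interact, f)
assert abs(np.vdot(rows, cols)) < 1e-10
assert np.isclose(np.sum(f**2),
                  sum(np.sum(p**2) for p in (grand, rows, cols, interact)))
```

The final assertion is Pythagoras's theorem: the squared length of f splits across the pieces, which is exactly the content of an analysis-of-variance table.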
The projections of a given f ∈ L(X) onto the various V_G constitute the analysis of variance for a Tjur design. The point of the decomposition is this: it is easy to compute the projection P_F of f onto L_F. The partially ordered set of factors then allows the computation of the projections onto the V_G by subtraction, which amounts to Möbius inversion in the poset. In the basic ANOVA example, the factors can be diagrammed as:

        U
       / \
      R   C
       \ /
      R ∧ C

Given f ∈ L(X), one computes the projection onto the constant functions by P_U. The projection onto the row effects space V_R is given by P_R − P_U. The projection onto the column effects space V_C is given by P_C − P_U. The projection onto the residual or interaction space V_{R∧C} is given by P_{R∧C} − P_R − P_C + P_U. More generally, the projection onto V_G is given by Σ_{F ≥ G} μ(F, G) P_F, with μ the Möbius function of the partially ordered set of factors. This is an easy algorithm which is used by many large computer programs (e.g., Genstat) to analyze designed experiments. A splendid treatment of this point of view appears in Tjur (1984).

5. Groups and Modern ANOVA

Sections 3 and 4 above present two different approaches to the analysis of designed experiments. In both, data are represented as f ∈ L(X). In the group case, a group G is found acting on X, and the analysis consists of decomposing L(X) and computing projections. In the second case, a collection of partitions is produced, and one uses the splitting of L(X) into parts indexed by these partitions. It is natural to ask about the relation between these approaches. This problem has been solved for a large class of examples in recent work of Bailey, Praeger, Rowley, and Speed (1983). Their work intertwines the two approaches. It is also important group-theoretically in providing examples where the Fourier transform can be computed using the simple averaging and difference algorithm outlined before. To describe their result, let X be a finite set and 𝒟 = {F} a design, or set of partitions of X.
Assume that the blocks of each F ∈ 𝒟 have uniform size, that all partitions F, G in 𝒟 are orthogonal, that 𝒟 is closed under max and min, and finally that 𝒟 forms a distributive lattice under max and min. These assumptions include many complex classical cases. However, adding the minimum of two partitions to an orthogonal design can destroy orthogonality.

An automorphism of a design 𝒟 is a 1–1 map π : X → X such that, for each F ∈ 𝒟, if x and y are in the same block of F, then π(x) and π(y) are in the same block of F. The set of all automorphisms of 𝒟 is called the automorphism group of 𝒟. Bailey, Praeger, Rowley, and Speed (1983) did three main things:

I) They identified the automorphism group of 𝒟 as a generalized wreath product of symmetric groups indexed by a partially ordered set. These generalized wreath products have been extensively developed because of their role in the algebraic theory of semigroups (Krohn-Rhodes theory). A marvelous introduction to this theory appears in Wells (1976). The result also builds on previous work by Silcock (1977).

II) They identified the characters of the automorphism group that appear in the representation L(X).

III) They showed that the group-theoretic and partition-based analyses agree.

Much further work is not reported here. For example, they determine the commuting algebra of L(X), give a natural language for describing the groups and decompositions, and make the link with the large body of statistical work in a useful way.

Each approach has problems where it seems to be the superior mode of analysis. The approach by partitions works for some designs without enough symmetry to permit a useful group-theoretic analysis. For example, consider a 2-way array with n_{ij} entries per cell, where the cell counts n_{ij} are unequal but proportional, so that n_{ij}|X| = n_{i+} n_{+j}. Here it does not make sense to permute the rows or columns. However, the design is orthogonal and permits a straightforward analysis.
In the other direction, block designs are a widely used class of designs which are not orthogonal. As an example, consider an experiment in which v levels of vanilla are to be compared, to help decide how much to put into ice cream. If one asks people to taste many ice cream cones, they all taste like colored sugar water. Thus suppose people are asked to taste only k < v flavors. A complete block design involves \binom{v}{k} people, each of whom tastes k levels of vanilla. Suppose the response is a rating between 0 and 100. This yields k\binom{v}{k} responses in total. The underlying set is

    X = {(i, s) : 1 ≤ i ≤ v, |s| = k, i ∈ s},

and the responses give f : X → ℝ. The two natural partitions are for treatments and blocks. Thus (i, s) ~_T (i′, s′) if i = i′, and (i, s) ~_B (i′, s′) if s = s′. These two partitions do not yield an orthogonal design. Indeed, here T ∨ B = U, and the condition for orthogonality can be stated as n_{is}|X| = n_i n_s, where

- n_i is the number of elements in X receiving treatment i (so n_i = \binom{v−1}{k−1}),
- n_s is the number of elements in X in block s (so n_s = k),
- n_{is} is the number of elements in X with x = (i, s) (so n_{is} = 1 if i ∈ s, 0 if i ∉ s).

It follows that n_{is}|X| ≠ n_i · n_s for i ∈ s. The group-theoretic analysis of this kind of data is straightforward but picks up aspects not developed in earlier analyses. The automorphism group can be identified with S_v. The representation L(X) decomposes into a treatment space and a block space, but it also includes new pieces which may be interpreted as the effect of tasters' rating by comparison. Fortini (1977) or Diaconis (1988) give further detail.

6. Other Topics

This section gives pointers to closely related research which cannot be adequately covered due to space limitations.

6.1 Stochastic Models

The approach to spectral analysis outlined above begins with data and a group. Almost all of the statistical literature begins with a probability model and presents the projections as estimates of parameters in a model.
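The counting argument for non-orthogonality can be checked mechanically. A sketch in Python, with v = 5 and k = 3 chosen purely for illustration:

```python
from itertools import combinations

# Block design for tasting: v treatments, blocks = all k-subsets a taster tries.
v, k = 5, 3
X = [(i, s) for s in combinations(range(v), k) for i in s]

# n_i: elements of X receiving treatment i; n_s: elements of X in block s.
n_i = {i: sum(1 for (j, t) in X if j == i) for i in range(v)}
n_s = {s: sum(1 for (j, t) in X if t == s) for s in combinations(range(v), k)}

some_block = next(iter(n_s))
i = some_block[0]   # a treatment appearing in that block, so n_is = 1
n_is = 1

# Orthogonality of T and B would require n_is * |X| == n_i * n_s; it fails.
assert n_is * len(X) != n_i[i] * n_s[some_block]
```

Here |X| = k·\binom{v}{k} = 30 while n_i · n_s = \binom{v−1}{k−1} · k = 18, so the orthogonality condition fails, as claimed.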
For example, for two-way analysis of variance with one observation per cell, the model would be written as

    f(i, j) = μ + α_i + β_j + ε_{ij},

where μ, α_i, β_j are parameters to be estimated (and Σα_i = Σβ_j = 0, to yield identifiable parameters). The ε_{ij} are errors, or disturbance terms, which are usually assumed to be independent random variables with mean 0 and constant variance. The least squares estimates of these parameters are the projections described earlier.

Assuming a model leads to well-understood ways to quantify standard errors for the estimates. It also allows analysis of data with no symmetry at all. Further, if more careful specifications are made on the distribution of the error terms, a variety of other estimates of the parameters become available. One of the nice results of the past 10 years is a complete understanding of all possible covariance structures for the error terms which lead to the original projections being efficient estimators. This, and the closely related subject of general balance, are treated by Speed (1987) or Bailey and Rowley (1990).

Rosemary Bailey has developed a more elaborate theory which allows incorporation of the randomization aspect of many designed experiments. Her treatment provides separate provision for treatment and design aspects. The theory makes extensive use of group theory and is well related to statistical practice. A recent survey with extensive pointers to other work is Bailey (1990). There has also been an extensive development which assumes that the errors are Gaussian. The leading work here comes out of the Danish school. Andersson and Perlman (1988) is an important paper with pointers to other work.

6.2 Bayesian Methods and Shrinkage Estimators

Once a model is specified, the Bayesian approach to statistics proceeds by putting a prior distribution on the parameters and then using observations to get a posterior distribution.
There has been very little work on analyzing the kind of data discussed above from a Bayesian perspective. Dawid (1988) presents some results, as do Box and Tiao (1973) and Diaconis, Eaton, and Lauritzen (1991), but much remains to be done. One of the exciting findings of recent statistical research has been the understanding that when many parameters must be estimated, the classical projection estimates can be uniformly improved. The improvement depends on the assumption of a model. Our current understanding of a reasonable way to go after the improvement involves a Bayesian (or empirical Bayesian) treatment of the problem. Again, there has not been much work on shrinkage estimates for designed experiments, but Bock (1975) and George (1986) are good starts.

6.3 Messy Data

Real data often contain a few stray or wild values that will foul up the classical linear estimates. There is a growing theory of robust statistics, surveyed in Huber (1981) or Hoaglin, Mosteller and Tukey (1983, 1985). Much remains to be done in specializing the results available for robust regression to the demands of a complex designed experiment. Tukey (1977) has developed robust analyses in much the same data-analytic spirit as presented here. He supplements these with an extensive residual analysis for ferreting out non-linearities and wild values. He also gives techniques for fitting non-linear models such as

    f(i, j) = α + β_i + γ_j + θβ_iγ_j + ε_{ij}.

There is active work on non-linear substitutes for classical least squares estimates. Projection pursuit, as developed by Friedman, Schroeder, and Stutzle (1984) or Huber (1985), is one of many varieties. Again, the adaptation to the analysis of designed experiments is largely open. Finally, missing data is an annoying part of real statistical analysis. A neat design can become a nightmare with symmetry destroyed. The EM algorithm is now a standard tool for beginning to deal with this problem.
Dempster, Laird, and Rubin (1977) is a good reference, and Little and Rubin (1987) is a comprehensive guide to the state of the art.

6.4 Final Words

The problem with a model-based analysis is that usually the model is simply made up out of whole cloth, from linearity through the stochastic assumptions. While in principle assumptions can be checked, in my experience they are wildly misused. See Freedman (1986, 1987) for an extensive discussion. Complex models have the further disadvantage that their parameters are not simple-to-interpret averages but estimates of rather complex quantities whose interpretation depends crucially on the correctness of the model. The spectral estimates proposed in the first sections of this paper are relatively easy-to-understand averages. Finally, as for block designs, group-theoretic considerations can lead to different analyses and new models. There is clearly much to be done in combining the best features of the various theories and then confronting them with reality.

References

Andersson, S., Perlman, M. (1988): Lattice models for conditional independence in a multivariate normal distribution. Technical report 155, Department of Statistics, University of Washington
Bailey, R. (1990): A unified approach to design of experiments. J. Roy. Statist. Soc. A 144, 214-223
Bailey, R., Praeger, C., Rowley, C., Speed, T. (1983): Generalized wreath products of permutation groups. Proc. London Math. Soc. 47
Bailey, R., Rowley, C. (1990): General balance and treatment permutations. Lin. Alg. Appl. 127, 183-225
Beth, T. (1984): Verfahren der schnellen Fourier-Transformation. Teubner, Stuttgart
Bloomfield, P. (1976): Fourier analysis of time series. An introduction. Wiley, New York
Bock, M.E. (1975): Minimax estimators of the mean of a multivariate normal distribution. Ann. Statist. 3, 209-218
Box, G., Tiao, G. (1973): Bayesian inference in statistical analysis. Addison-Wesley, Reading, Mass.
Brillinger, D. (1975): Time series, data analysis and theory.
Holt, Rinehart and Winston, New York
Clausen, M. (1989a): Fast Fourier transforms for meta-abelian groups. SIAM J. Comput. 18, 584-593
Clausen, M. (1989b): Fast generalized Fourier transforms. Theoret. Comput. Sci. 67, 55-63
Dawid, P. (1988): Symmetry models and hypotheses for structured data layouts. J. Roy. Statist. Soc. B 50, 1-26
Dempster, A., Laird, N., Rubin, D. (1977): Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1-38
Diaconis, P. (1988): Group representations in probability and statistics. Institute of Mathematical Statistics, Hayward, CA
Diaconis, P. (1989): A generalization of spectral analysis with application to ranked data. Ann. Statist. 17, 949-979
Diaconis, P., Smith, L. (1989): Residual analysis for discrete longitudinal data. Technical report, Department of Statistics, Stanford University
Diaconis, P., Rockmore, D. (1990): Efficient computation of the Fourier transform on finite groups. J. Amer. Math. Soc. 3, 297-332
Diaconis, P., Eaton, M., Lauritzen, S. (1991): Finite de Finetti theorems in linear models and multivariate analysis. Technical report, Dept. of Mathematics, Aalborg University. To appear in Scand. J. Statist.
Fortini, P. (1977): Representation of groups and the analysis of variance. Ph.D. Thesis, Department of Statistics, Harvard University
Friedman, J.H., Schroeder, A., Stutzle, W. (1984): Projection pursuit density estimation. J. Amer. Statist. Assoc. 79, 599-608
Freedman, D. (1987): As others see us: A case study in path analysis. J. Educ. Statist. 12, 101-206
Freedman, D., Navidi, W. (1986): Regression models for adjusting the 1980 census. Statist. Sci. 1, 3-39
George, E. (1986): Minimax multiple shrinkage estimation. Ann. Statist. 14, 188-205
Hannan, E.J. (1965): Group representations and applied probability. Methuen, New York
Hoaglin, D., Mosteller, F., Tukey, J. (1983): Understanding robust and exploratory data analysis.
Wiley, New York
Hoaglin, D., Mosteller, F., Tukey, J. (1985): Exploring data tables, trends, and shapes. Wiley, New York
Huber, P. (1981): Robust statistics. Wiley, New York
Huber, P. (1985): Projection pursuit. Ann. Statist. 13, 435-525
Izenman, A., Zabell, S. (1978): Babies and the blackout: the genesis of a misconception. Technical report 38, Dept. of Statistics, University of Chicago
James, A. (1957): The relationship algebra of an experimental design. Ann. Math. Statist. 27, 993-1002
James, A. (1982): Analysis of variance determined by symmetry and combinatorial properties of zonal polynomials. In: G. Kallianpur et al. (eds.) Statistics and probability: essays in honor of C.R. Rao. North-Holland, New York
Ledermann, W. (1987): Introduction to group characters (2nd ed.). Cambridge University Press, Cambridge
Little, R., Rubin, D. (1987): Statistical analysis with missing data. Wiley, New York
Scheffé, H. (1959): The analysis of variance. Wiley, New York
Serre, J.P. (1977): Linear representations of finite groups. Springer, Berlin Heidelberg New York
Silcock, H.L. (1977): Generalized wreath products and the lattice of normal subgroups of a group. Algebra Universalis 7, 361-372
Speed, T. (1987): What is an analysis of variance? Ann. Statist. 15, 885-941
Tjur, T. (1984): Analysis of variance models in orthogonal designs. Internat. Statist. Rev. 52, 33-82
Tukey, J. (1961): Discussion, emphasizing the connection between analysis of variance and spectrum analysis. Technometrics 3, 191-219
Tukey, J. (1977): Exploratory data analysis. Addison-Wesley, Reading, Mass.
Yates, F. (1937): The design and analysis of factorial experiments. Imperial Bureau of Soil Science, Harpenden, England