International Conference on Mathematical and Statistical Modeling
in Honor of Enrique Castillo. June 28-30, 2006
Distance-based Multi-sample Tests for General
Multivariate Data
Carles M. Cuadras∗
Departament d'Estadística
Universitat de Barcelona
∗ Correspondence to: Carles M. Cuadras. Department of Statistics. University of Barcelona. Spain.
Abstract
Most multivariate tests are based on the hypothesis of multinormality. Often, however,
this hypothesis fails, or the variables are not quantitative, or the number of variables
is large. Defining probabilistic models with mixed data is not easy. However, it is
always possible to define a measure of distance between two observations. We prove
that the use of distances can provide alternative tests for comparing several populations
when the data are of general type. This approach is illustrated with three real data
examples. We also define and study a measure of association between two data sets
and extend the so-called distance-based discriminant rule.
Key Words: Statistical distances, multivariate association, discriminant analysis,
MANOVA, ANOQE, permutation test, large data sets.
1 Introduction
Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ be a finite set with $n$ individuals. Let $\delta_{ii'} =
\delta(\omega_i, \omega_{i'}) = \delta(\omega_{i'}, \omega_i) \ge \delta(\omega_i, \omega_i) = 0$ be a distance or dissimilarity function
defined on $\Omega$. We suppose that the $n \times n$ distance matrix $\Delta = (\delta_{ii'})$ is
Euclidean. Then there exists a configuration $x_1, \ldots, x_n \in \mathbb{R}^p$, with $x_i =
(x_{i1}, \ldots, x_{ip})'$, $i = 1, \ldots, n$, such that
$$\delta_{ii'}^2 = \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = (x_i - x_{i'})'(x_i - x_{i'}). \qquad (1.1)$$
These coordinates constitute an $n \times p$ matrix $X = (x_{ij})$ such that the distance between two rows $i$ and $i'$ equals $\delta_{ii'}$.
A way of obtaining $X$ from $\Delta$ is as follows. Write $A = -\frac{1}{2}\Delta^{(2)}$ and
$G = HAH$, where $\Delta^{(2)} = (\delta_{ii'}^2)$ and $H = I_n - n^{-1} 1_n 1_n'$ is the centering
matrix. Then $\Delta$ is Euclidean with dimension $p = \operatorname{rank}(G)$ if and only
if $G \ge 0$. The spectral decomposition $G = U\Lambda^2 U'$ gives $X = U\Lambda$. Thus
$G = XX'$ and the relation between $\Delta^{(2)}$ and $G$ is given by $\Delta^{(2)} = 1g' +
g1' - 2G$, where the $n \times 1$ vector $g$ contains the diagonal entries of $G$. Note
that if $S = (s_{ii'})$ is a similarity matrix and we define the squared distance
$\delta_{ii'}^2 = s_{ii} + s_{i'i'} - 2s_{ii'}$, then $G = HSH$.
Matrices X and U contain the principal and standard coordinates, respectively. This method is called classic multidimensional scaling or principal coordinate analysis (Cox and Cox, 1994; Gower, 1966; Mardia et al.,
1979).
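For illustration, here is a minimal Python sketch of principal coordinate analysis as just described (double-center $-\frac{1}{2}\Delta^{(2)}$, diagonalize $G$, keep the positive eigenvalues); numpy is assumed and the function name is illustrative, not prescribed by the paper.

import numpy as np

def principal_coordinates(delta, tol=1e-9):
    # delta: (n, n) symmetric matrix of distances delta_{ii'}
    n = delta.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
    A = -0.5 * delta**2                      # A = -1/2 Delta^(2)
    G = H @ A @ H                            # inner product matrix G = HAH
    eigval, eigvec = np.linalg.eigh(G)       # spectral decomposition G = U Lambda^2 U'
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol                      # Euclidean (positive) dimensions only
    U = eigvec[:, keep]                      # standard coordinates U
    lam = np.sqrt(eigval[keep])
    X = U * lam                              # principal coordinates X = U Lambda
    return X, U, lam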
Classical multivariate inference is mainly based on the hypothesis of normality. Often, however, this hypothesis fails, the variables are not quantitative, or the number of variables is very large.
The main aim of this paper is to present three distance-based methods,
based on general data (quantitative, qualitative, binary, nominal,
mixed), for comparing several populations. This distance-based approach
extends some results of Cuadras and Fortiana (2004) and is in line with
Cuadras (1989, 1992), Cuadras and Arenas (1990), Cuadras et al. (1996,
1997), Rao (1982) and Liu and Rao (1995).
First, let us comment on some distance-based aspects of multivariate
association and discrimination.
2 Multivariate association
Suppose that we have two data sets D1 and D2 on the same $\Omega$. The task of
associating D1 and D2 has been well studied when two quantitative data
matrices are available. Thus, if X and Y are two centered data matrices
of orders $n \times p$ and $n \times q$, Escoufier (1973) introduced the generalized
correlation
$$\mathrm{RV}(X, Y) = \operatorname{tr}(S_{12} S_{21}) \Big/ \sqrt{\operatorname{tr}(S_{11}^2)\operatorname{tr}(S_{22}^2)},$$
where $S_{11} = X'X$, $S_{22} = Y'Y$, $S_{12} = X'Y$, $S_{21} = Y'X$. This correlation is
closely related to the Procrustes statistic (Cox and Cox, 1994)
$$R^2 = 1 - \{\operatorname{tr}(X'YY'X)^{1/2}\}^2 / \{\operatorname{tr}(X'X)\operatorname{tr}(Y'Y)\}.$$
Yanai et al. (2006) employ determinants of rectangular matrices to
introduce a measure $\mathrm{Re}(X, Y)$ of association, where
$$\mathrm{Re}(X, Y)^2 = \det\begin{pmatrix} X'X & X'Y \\ Y'X & Y'Y \end{pmatrix} \Big/ [\det(X'X)\det(Y'Y)].$$
We have $\mathrm{RV}(X, Y) = 1$, $R^2 = \mathrm{Re}(X, Y)^2 = 0$ if $X = TY$ (T orthogonal)
and $\mathrm{RV}(X, Y) = 0$, $R^2 = \mathrm{Re}(X, Y)^2 = 1$ if $X'Y = 0$.
We show that we can use principal coordinates to define association
with general data when a distance function is available. In this case, we
may obtain X, Y by considering a distance between observations, which
provides the n×n distance matrices ∆x and ∆y . By spectral decomposition
of the corresponding inner product matrices $G_x = U\Lambda_x^2 U'$, $G_y = V\Lambda_y^2 V'$,
we can obtain the standard coordinates U, V and the principal coordinates
$X = U\Lambda_x$, $Y = V\Lambda_y$.
We define the association between D1 and D2 by
$$\eta^2(D_1, D_2) = \det(U'VV'U). \qquad (2.1)$$
For the sake of simplicity, we write $\eta^2(X, Y)$. This association coefficient
satisfies the following properties:
1) $0 \le \eta^2(X, Y) = \eta^2(Y, X) \le 1$.
2) $\eta^2(X, Y) = \det(X'YY'X)/[\det(X'X)\det(Y'Y)]$.
3) $\eta^2(X, Y)$ does not depend on the configuration matrices X, Y.
4) If y is a vector and X is a matrix, both quantitative, then $R^2(y, X) = \eta^2(y, X)$, where R is the multiple correlation coefficient.
5) If x, y are quantitative vectors, then $r^2(y, x) = \eta^2(y, x)$, where r is the ordinary correlation coefficient.
6) If $r_j$, $j = 1, \ldots, q$, are the canonical correlation coefficients between X and Y, then
$$\eta^2(X, Y) = \prod_{j=1}^{q} r_j^2.$$
We outline the proof. Write $W = U'V$, where the columns of U and V
are orthonormal. Then 1) follows from $0 \le \det(WW') = \det(W'W) \le 1$.
As $\Lambda_x$, $\Lambda_y$ are diagonal, 2) reduces to $\det(U'VV'U)$. Similarly, if S and T
are $p \times p$ nonsingular matrices, then $\eta^2(X, Y) = \eta^2(XS, YT)$. In particular
S and T can be orthogonal matrices and XS, YT define the same distance
matrices $\Delta_x$, $\Delta_y$. To prove 4), note that $\Lambda_x/\sqrt{n}$ contains the standard
deviations of the columns of $X = U\Lambda_x$. If $y = \sqrt{n}\, s_y v$ and $r = \Lambda_x^{-1} X'y\, s_y^{-1}$
is the vector of simple correlations between y and X, as $R_{xx} = I$, then
$$R^2(y, X) = r' R_{xx}^{-1} r = s_y^{-2}\, y' X \Lambda_x^{-1}\Lambda_x^{-1} X' y = v'UU'v.$$
Finally, the canonical correlations satisfy $\det(R_{xy} R_{yy}^{-1} R_{yx} - r_j^2 R_{xx}) = 0$,
with $R_{xx} = R_{yy} = I$ and $R_{xy} = R_{yx}' = U'V$, and 6) follows.
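As an illustration, a minimal sketch of computing (2.1) from two distance matrices on the same individuals; it reuses the principal_coordinates helper sketched in the Introduction (the names are illustrative).

import numpy as np

def eta_squared(delta_x, delta_y, tol=1e-9):
    # standard coordinates of each configuration
    _, U, _ = principal_coordinates(delta_x, tol)
    _, V, _ = principal_coordinates(delta_y, tol)
    W = U.T @ V
    return np.linalg.det(W @ W.T)            # eta^2 = det(U'V V'U)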
The association measure (2.1) is similar to the measure used in Arenas
and Cuadras (2004) for studying the agreement between two representations of the same data. This measure can be computed using only distances
and is given by
$$\theta(X, Y) = 2[1 - \operatorname{tr}(G_{xy})/\operatorname{tr}(G_x + G_y)],$$
where $G_{xy} = G_x + G_y - (G_x^{1/2} G_y^{1/2} + G_y^{1/2} G_x^{1/2})/2$. Normalizing $G_x$, $G_y$
so that $\operatorname{tr}(G_x) = \operatorname{tr}(G_y) = \operatorname{tr}(\Lambda_x^2) = \operatorname{tr}(\Lambda_y^2) = 1$, this measure reduces to
$$\theta(X, Y) = \operatorname{tr}(U\Lambda_x U' V\Lambda_y V').$$
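A hedged sketch of $\theta$ in its normalized form, again reusing the principal_coordinates helper sketched earlier; the scaling step enforces $\operatorname{tr}(\Lambda_x^2) = \operatorname{tr}(\Lambda_y^2) = 1$.

import numpy as np

def theta(delta_x, delta_y):
    _, U, lam_x = principal_coordinates(delta_x)
    _, V, lam_y = principal_coordinates(delta_y)
    lam_x = lam_x / np.sqrt((lam_x**2).sum())    # so that tr(G_x) = tr(Lambda_x^2) = 1
    lam_y = lam_y / np.sqrt((lam_y**2).sum())
    Gx_half = (U * lam_x) @ U.T                  # G_x^{1/2} = U Lambda_x U'
    Gy_half = (V * lam_y) @ V.T
    return np.trace(Gx_half @ Gy_half)           # theta = tr(U Lambda_x U' V Lambda_y V')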
Example 2.1. Is there a relation between trade and science? Table 1 is a
matrix reporting trade and scientific relations between 10 countries. The
lower triangle contains 1 if significant trade occurred between two countries,
0 otherwise. The diagonal and upper triangle contain, for every pair of
countries, the number of papers (mathematics and statistics, period 1996-2002)
published in collaboration. Thus, Spain published 8597 papers without
collaboration, 692 in collaboration with the USA, 473 with France, etc. The upper
matrix Q is standardized to $S = D^{-1}Q$, where $D = \operatorname{diag}(Q)$, so that S has
ones in the diagonal. The lower matrix is transformed to a matrix of similarities using the Jaccard coefficient, as explained in Cox and Cox (1994,
p. 73). Then, for each similarity matrix, we obtain $G = HSH$ and its spectral
decomposition, and considering all principal dimensions (i.e., nine), we
get $\eta^2(D_1, D_2) = 0.0816$. This coefficient reveals a weak association between
trade and science.
Table 1: Trade (1 if significant, 0 otherwise) and scientific relation (number of
papers of mathematics and statistics in collaboration during 1996-2002) among
ten countries.
          USA    Spa    Fra   U.K.   Ital    Ger    Can    Jap    Chi    Rus
USA     63446    692   2281   2507   1642   2812   2733   1039   1773    893
Spain       1   8597    473    347    352    278    163     69    104    177
France      1      1  17155    532    916    884    496    269    167    606
U.K.        1      1      1  12585    490    810    480    213    339    365
Italy       0      1      1      0  13197    677    290    169    120    512
Germany     1      1      1      1      1  16588    499    350    408    984
Canada      1      0      0      1      0      0   7927    228    601    204
Japan       1      0      0      0      0      0      1  20001    371    193
China       1      0      0      0      0      0      1      1  39140     64
Russia      0      0      0      0      0      0      0      0      1  18213
3 The proximity function
Let X be a random vector with pdf $f(x)$ with respect to a suitable measure
and support S. Since the results below can be generalized easily, we may
suppose the Lebesgue measure. If $\delta$ is a distance or dissimilarity function
between the observations of X, we define the geometric variability of X
with respect to $\delta$ as
$$V_\delta(X) = \frac{1}{2}\int_{S\times S} \delta^2(x, y)\, f(x) f(y)\, dx\, dy. \qquad (3.1)$$
The proximity function of an observation x to the population $\Pi$ represented by X is defined by
$$\phi_\delta^2(x, \Pi) = \int_{S} \delta^2(x, y) f(y)\, dy - V_\delta(X). \qquad (3.2)$$
Suppose that $\psi : S \to L$ is a representation of S in a Euclidean (or separable Hilbert) space L such that $\delta^2(x, y) = \|\psi(x) - \psi(y)\|^2$. The interest
of $V_\delta(X)$ and $\phi_\delta^2(x)$ comes from the following properties.
1. We can interpret $V_\delta(X)$ as a generalized variance and $\phi_\delta^2(x)$ as the
squared distance from x to an ideal mean of X:
$$V_\delta(X) = E\|\psi(X)\|^2 - \|E(\psi(X))\|^2, \qquad \phi_\delta^2(x, \Pi) = \|\psi(x) - E(\psi(X))\|^2. \qquad (3.3)$$
In fact, if $\delta$ is the ordinary Euclidean distance, then $V_\delta(X) = \operatorname{tr}(\Sigma)$.
2. If we transform the distance as $\tilde\delta^2 = a\delta^2 + b$, then $V_{\tilde\delta} = aV_\delta + b/2$ and
$\phi_{\tilde\delta}^2 = a\phi_\delta^2 + b/2$.
3. If $\delta^2 = \delta_1^2 + \delta_2^2$ then $\phi_\delta^2 = \phi_{\delta_1}^2 + \phi_{\delta_2}^2$.
4. By suitable choices of a, b we may transform $\delta$ and generate the probability density
$$f_\delta(x) = \exp(-\phi_\delta^2(x, \Pi)).$$
Then
$$I(f\,\|\,f_\delta) = V_\delta(X) - H(f) \ge 0,$$
where $I(f\,\|\,f_\delta)$ is the Kullback-Leibler divergence and $H(f)$ is the
Shannon entropy.
5. Given G populations $\Pi_1, \ldots, \Pi_G$, where X has pdf $f_g(x)$ when x
comes from $\Pi_g$, we can allocate an individual $\omega \in \Pi_1 \cup \cdots \cup \Pi_G$ by
using the rule:
allocate $\omega$ to $\Pi_i$ if $\phi_\delta^2(x, \Pi_i) = \min_{1\le g\le G}\{\phi_\delta^2(x, \Pi_g)\}$.
This is the distance-based (DB) discriminant rule (Cuadras et al., 1997).
In the next section we extend this rule.
4 The distance-based Bayes rule
Suppose that $\Pi_1, \ldots, \Pi_G$ have prior probabilities $P(\Pi_j) = q_j$, with
$\sum_j q_j = 1$. The DB discriminant rule is equivalent to the Bayes rule by
using the dissimilarity
$$\delta^2(x, y) = -\log f_j(x) f_j(y) - 2\log q_j \quad \text{if } x, y \text{ come from } \Pi_j.$$
Then $\phi_\delta^2(x, \Pi_j) = -\log f_j(x) - \log q_j$ and the Bayes rule
allocate $\omega$ to $\Pi_i$ if $q_i f_i(x) = \max_{1\le g\le G}\{q_g f_g(x)\}$,
is equivalent to the DB rule
allocate $\omega$ to $\Pi_i$ if $-\log f_i(x) - \log q_i = \min_{1\le g\le G}\{-\log f_g(x) - \log q_g\}$.
However, the DB rule is of interest when we can define a proper distance
between observations without using the pdf. For example, suppose that $\Pi_j$
is multivariate normal $N_p(\mu_j, \Sigma)$. The Mahalanobis distance $M^2(x, y) =
(x - y)'\Sigma^{-1}(x - y)$ between observations provides $V_M = p$ and $\phi_M^2 =
(x - \mu_j)'\Sigma^{-1}(x - \mu_j)$.
By adding an additive constant:
$$\widetilde{M}^2(x, y) = (x - y)'\Sigma^{-1}(x - y) - 4\log q_j \quad \text{if } x, y \text{ come from } \Pi_j,$$
then $\phi_{\widetilde{M}}^2(x, \Pi_j) = (x - \mu_j)'\Sigma^{-1}(x - \mu_j) - 2\log q_j$, and the DB rule
allocate $\omega$ to $\Pi_i$ if $d(x, \mu_i, q_i) = \min_{1\le g\le G}\{d(x, \mu_g, q_g)\}$,
where $d(x, \mu_j, q_j) = (x - \mu_j)'\Sigma^{-1}(x - \mu_j) - 2\log q_j$, is equivalent to the
Bayes rule.
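As a small illustration, a Python sketch of this allocation rule for the normal case, minimizing $d(x, \mu_g, q_g)$ above; the means, common covariance and priors are illustrative inputs, not data from the paper.

import numpy as np

def db_bayes_allocate(x, mus, Sigma, priors):
    # d(x, mu_g, q_g) = (x - mu_g)' Sigma^{-1} (x - mu_g) - 2 log q_g
    Sinv = np.linalg.inv(Sigma)
    scores = [(x - mu) @ Sinv @ (x - mu) - 2.0 * np.log(q)
              for mu, q in zip(mus, priors)]
    return int(np.argmin(scores))            # index of the allocated population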
5 Multivariate multiple-sample tests
The comparison of several populations can be approached under parametric
models. A general non-parametric method, which extends that of Cuadras
and Fortiana (2004), is proposed next.
Suppose that D1 , . . . , DG are G ≥ 2 independent data sets coming from
the populations Π1 , . . . , ΠG . These data can be general (quantitative, qualitative, nominal, mixed). We wish to test
H0 : Π1 = · · · = ΠG .
Under H0 all data come from the same underlying distribution.
First, we assume that, by means of a distance function between observations, we can obtain the intra-distance matrices $\Delta_{11}, \ldots, \Delta_{GG}$, and
the inter-distance matrices $\Delta_{12}, \ldots, \Delta_{G-1,G}$. Thus we have the $n \times n$ super-distance matrix
$$\Delta = \begin{bmatrix} \Delta_{11} & \cdots & \Delta_{1G} \\ \vdots & \ddots & \vdots \\ \Delta_{G1} & \cdots & \Delta_{GG} \end{bmatrix},$$
where $\Delta_{ij}$ is $n_i \times n_j$.
Next, we compute, via principal coordinate analysis, the matrices G
and X such that $G = XX'$. We write the full X as
$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_G \end{bmatrix}.$$
The Euclidean distances between the rows of $X_i$ and $X_{i'}$ give $\Delta_{ii'}$. Thus
the matrices $X_1, \ldots, X_G$ may represent the G quantitative data sets, which
can be compared for testing $H_0$.
5.1 Partitioning the geometric variability
The rows $x_1, \ldots, x_N$ of any $N \times p$ multivariate data matrix X satisfy
$$\sum_{i=1}^{N}\sum_{i'=1}^{N} (x_i - x_{i'})(x_i - x_{i'})' = 2N \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})', \qquad (5.1)$$
where $\bar{x} = N^{-1}\sum_{i=1}^{N} x_i$ is the mean vector.
Now suppose G data matrices $X_1, \ldots, X_G$, where each $X_g$ has order
$n_g \times p$. Recall the identity $T = B + W$, with
$$B = \sum_{g=1}^{G} n_g (\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})',$$
$$W = \sum_{g=1}^{G}\sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)(x_{gi} - \bar{x}_g)',$$
$$T = \sum_{g=1}^{G}\sum_{i=1}^{n_g} (x_{gi} - \bar{x})(x_{gi} - \bar{x})',$$
where $n = n_1 + \cdots + n_G$, $x_{gi}$ is a row of $X_g$ with mean $\bar{x}_g$, and $\bar{x}$ is the
overall mean. The matrices T, B, W are the total, between-samples and within-samples
matrices, respectively.
From (5.1) we obtain:
$$T = \sum_{g,h=1}^{G}\sum_{i=1}^{n_g}\sum_{i'=1}^{n_h} (x_{gi} - x_{hi'})(x_{gi} - x_{hi'})' = 2n \sum_{g=1}^{G}\sum_{i=1}^{n_g} (x_{gi} - \bar{x})(x_{gi} - \bar{x})',$$
$$B = \sum_{g,h=1}^{G} n_g n_h (\bar{x}_g - \bar{x}_h)(\bar{x}_g - \bar{x}_h)' = 2n \sum_{g=1}^{G} n_g (\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})',$$
$$W_g = \sum_{i,i'=1}^{n_g} (x_{gi} - x_{gi'})(x_{gi} - x_{gi'})' = 2n_g \sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)(x_{gi} - \bar{x}_g)',$$
which shows the following identity concerning matrices built with differences between observations and between means:
$$T = B + n\sum_{g=1}^{G} n_g^{-1} W_g, \qquad (5.2)$$
where T, B and $W_g$ are $p \times p$ matrices.
We can partition the variability in a similar way. The geometric variability of an $n \times n$ distance matrix $\Delta = (\delta_{ii'})$, with related inner product
matrix G, is defined by
$$V_\delta = \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{i'=1}^{n} \delta_{ii'}^2 = \operatorname{tr}(G)/n,$$
where $\delta_{ii'}^2 = (x_i - x_{i'})'(x_i - x_{i'})$. $V_\delta$ is the sampling version of (3.1).
By taking traces in (5.2) we obtain
$$\operatorname{tr}(T) = \operatorname{tr}(B) + n\sum_{g=1}^{G} n_g^{-1}\operatorname{tr}(W_g).$$
We write this identity as
$$V_\delta(\text{total}) = V_\delta(\text{between}) + n^{-1}\sum_{g=1}^{G} n_g V_\delta(\text{within } g). \qquad (5.3)$$
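For illustration, a minimal sketch that evaluates the decomposition (5.3) directly from a super-distance matrix and group labels; V(between) is recovered by difference, so only distances are needed (numpy assumed, names illustrative).

import numpy as np

def geometric_variability(delta):
    # V_delta = (1 / (2 n^2)) * sum of squared distances
    n = delta.shape[0]
    return (delta**2).sum() / (2.0 * n**2)

def variability_decomposition(delta, labels):
    labels = np.asarray(labels)
    n = delta.shape[0]
    V_total = geometric_variability(delta)
    V_within = 0.0
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        V_within += len(idx) * geometric_variability(delta[np.ix_(idx, idx)])
    V_within /= n                            # n^{-1} sum_g n_g V_delta(within g)
    V_between = V_total - V_within           # identity (5.3)
    return V_total, V_between, V_within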
5.2 Tests with principal coordinates
The above identities (5.2) and (5.3) can be used for comparing populations. Given G independent data sets D1 , · · · , DG , we may obtain the
super-distance matrix ∆ and the principal coordinates X1 , . . . , XG . Then
we can obtain B and T and compute two statistics for testing H0 :
a) $\gamma_1 = \det(T - B)/\det(T)$,
b) $\gamma_2 = V_\delta(\text{between})/V_\delta(\text{total})$.
Both statistics lie between 0 and 1. Small values of $\gamma_1$ and large values
of $\gamma_2$, respectively, give evidence against $H_0$. Note that
$\gamma_2 = \operatorname{tr}(B)/\operatorname{tr}(T)$ is a statistic based on quadratic entropy; it is used
in ANOQE (analysis of quadratic entropy), a generalization of ANOVA
(analysis of variance) proposed by C. R. Rao in several papers. Also note
that the distribution of $\gamma_1$ is Wilks' $\Lambda$ if the populations are multivariate
normal with the same covariance matrix and we choose the Euclidean distance.
Except for multinormal data and a few other distributions, the sampling
distribution of $\gamma_1$ and $\gamma_2$ is unknown. The asymptotic distribution involves
sequences of nuisance parameters, which have been found only for very specific distances and distributions (see Cuadras and Fortiana, 1995; Cuadras and
Lahlou, 2000; Cuadras et al., 2006). Liu and Rao (1995) derive the bootstrap
distribution of $V_\delta(\text{between})$. Indeed, the use of resampling methods,
as described in Flury (1997), may overcome this difficulty.
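A hedged sketch of this randomization approach: $\gamma_1$ and $\gamma_2$ are computed from the principal coordinates of the super-distance matrix, and group labels are permuted to approximate their null distributions. It reuses the principal_coordinates helper sketched in the Introduction; everything here is illustrative.

import numpy as np

def gamma_statistics(X, labels):
    labels = np.asarray(labels)
    xbar = X.mean(axis=0)
    T = (X - xbar).T @ (X - xbar)                    # total matrix
    B = np.zeros_like(T)
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = (Xg.mean(axis=0) - xbar)[:, None]
        B += Xg.shape[0] * (d @ d.T)                 # between-samples matrix
    g1 = np.linalg.det(T - B) / np.linalg.det(T)     # gamma_1 (Wilks-type)
    g2 = np.trace(B) / np.trace(T)                   # gamma_2 = V(between)/V(total)
    return g1, g2

def permutation_pvalues(delta, labels, n_perm=10000, seed=0):
    rng = np.random.default_rng(seed)
    X, _, _ = principal_coordinates(delta)           # helper sketched in Section 1
    labels = np.asarray(labels)
    g1, g2 = gamma_statistics(X, labels)
    count1 = count2 = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        p1, p2 = gamma_statistics(X, perm)
        count1 += p1 <= g1                           # small gamma_1 favours H1
        count2 += p2 >= g2                           # large gamma_2 favours H1
    return (count1 + 1) / (n_perm + 1), (count2 + 1) / (n_perm + 1)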
5.3 Tests with proximity functions
Another test, which avoids resampling procedures, can be derived by using
proximity functions and non-parametric statistics. First note that, with
quantitative data and Mahalanobis distances, the proximity functions are
$$\phi_\delta^2(x, \Pi_g) = (x - \mu_g)'\Sigma^{-1}(x - \mu_g), \qquad g = 1, \ldots, G.$$
These functions are equal under H0 : µ1 = · · · = µG .
Suppose in general that $x_{11}, \ldots, x_{1n_1}$ represent the $n_1$ observations coming from $\Pi_1$ and $\omega$ is a new individual with coordinates x. The sampling
counterpart of the proximity function (3.2) is
$$\phi_1^2(x) = \frac{1}{n_1}\sum_{i=1}^{n_1} \delta^2(x, x_{1i}) - V_\delta(\text{within } 1),$$
where $\delta(\omega, \omega_{1i}) = \delta(x, x_{1i})$. Note that we do not need to find the x vectors explicitly.
We similarly obtain $\phi_2^2(x)$, etc. However, under $H_0$ all the population
proximity functions are the same, see (3.3). Thus, we may work with the
full proximity function
$$\phi^2(x) = \frac{1}{n}\sum_{g=1}^{G}\sum_{i=1}^{n_g} \delta^2(x, x_{gi}) - V_\delta(\text{total}).$$
If $a_{gi} = \phi^2(x_{gi}) = \|x_{gi} - \bar{x}\|^2$, where $x_{gi}$ comes from $\Pi_g$, we obtain the
proximity values (computed using only distances)
$$\Pi_1: a_{11}, \ldots, a_{1n_1}; \quad \cdots; \quad \Pi_G: a_{G1}, \ldots, a_{Gn_G}.$$
Under $H_0$ the $a_{gi}$ follow (approximately) the same distribution. Then
a Kruskal-Wallis test can be performed to accept or reject $H_0$. This test is
based on
$$H = \frac{12}{n(n+1)}\sum_{g=1}^{G}\frac{R_g^2}{n_g} - 3(n+1),$$
where the n values $a_{gi}$ are ranked and $R_g$ is the sum of the ranks of the
values in $\Pi_g$. This statistic H is asymptotically chi-square with $G - 1$ d.f.
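A minimal sketch of this proximity-function test: the values $a_{gi}$ are obtained from the super-distance matrix alone, and scipy's Kruskal-Wallis test is applied to the G groups of values (scipy is an assumption; the paper does not prescribe an implementation).

import numpy as np
from scipy.stats import kruskal

def proximity_kruskal(delta, labels):
    n = delta.shape[0]
    V_total = (delta**2).sum() / (2.0 * n**2)    # V_delta(total)
    a = (delta**2).mean(axis=1) - V_total        # a_gi = phi^2(x_gi), distances only
    labels = np.asarray(labels)
    groups = [a[labels == g] for g in np.unique(labels)]
    return kruskal(*groups)                      # H statistic and chi-square P-value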
Example 5.1. We consider three data sets covering the quantitative, mixed
and nominal cases.
The first data set is the well-known Fisher Iris data with p = 4 quantitative variables, G = 3 species and ng = 50. We use the city-block distance.
The student mixed data is taken from Mardia et al. (1979, p. 294).
We only consider G = 3 groups with $n_1 = 25$, $n_2 = 117$, $n_3 = 26$. There is a
quantitative variable and a qualitative variable, and we use the distance $\delta_{ij} =
\sqrt{1 - s_{ij}}$, where $s_{ij}$ is Gower's similarity coefficient for mixed variables
(Gower and Legendre, 1986).
The DNA data, used in Cuadras et al. (1997), consists of sequences of
length 360 base pairs taken from a segment of mitochondrial DNA for a set
of 120 individuals belonging to G = 4 human groups, with n1 = 25, n2 =
41, n3 = 37, n4 = 17 individuals. Since the data are large strings of ACGT,
the standard methods fail miserably, whereas the DB approach provides a
solution.
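To make the distances in Table 2 concrete, here are hedged sketches of a simple Gower-type dissimilarity for one pair of mixed observations and of a matching distance for equal-length ACGT strings; the variable types, ranges and weights are illustrative assumptions, and the coefficients actually used follow Gower and Legendre (1986).

import numpy as np

def gower_dissimilarity(xi, xj, is_quantitative, ranges):
    # equally weighted Gower similarity, then delta_ij = sqrt(1 - s_ij)
    s = 0.0
    for a, b, quant, r in zip(xi, xj, is_quantitative, ranges):
        s += 1.0 - abs(a - b) / r if quant else float(a == b)
    s /= len(xi)
    return np.sqrt(1.0 - s)

def matching_distance(seq_a, seq_b):
    # proportion of positions at which two aligned sequences differ
    return sum(c1 != c2 for c1, c2 in zip(seq_a, seq_b)) / len(seq_a)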
Table 2: Features and sizes of data sets used to illustrate three distance-based
multisample tests.
Data       Type           Groups   Sizes                       Distance
Iris       Quantitative   3        50 + 50 + 50 = 150          City-block
Students   Mixed          3        25 + 117 + 26 = 168         Gower
DNA        Nominal        4        25 + 41 + 37 + 17 = 120     Matching
Table 3: Some results of three distance-based multisample tests on real data.
Data       H       d.f.   P-value   γ1       P-value   γ2       P-value
Iris       42.85   2      0.000     0.0056   0.000     0.8051   0.000
Students   4.498   2      0.105     0.5788   0.001     0.0424   0.003
DNA        27.49   3      0.000     0.0071   0.000     0.7800   0.000
We obtain the randomization distribution of γ1 and γ2 for N = 10000
partitions into subsets of sizes n1 , . . . , nG and estimate the P -values. In
contrast, note that the statistic H can be obtained without resampling.
Table 2 describes the data and distances used. Note that the Euclidean
distance for the Iris data gives $\gamma_1 = 0.0234$, which under normality and $H_0$ is
distributed as Wilks' $\Lambda(4, 147, 2)$. Table 3 reports the results obtained. There
are significant differences among the Iris species and among the DNA groups. The
statistic H suggests a non-significant difference among the student groups.
Finally, we test the performance of this method by comparing three artificial populations. Suppose the bivariate normal populations N2 (c1, Σ),
N2 (2c1, Σ), N2 (3c1, Σ). We simulate samples of sizes n1 = n2 = n3 for
c = 0 and c = 1, respectively. We then choose the Euclidean distance and
compute the Wilks statistic together with its exact P-value and its empirical P-value after
N = 10000 permutations. The results, summarized in Table 4, show that
the conclusions (accept $H_0$ for c = 0, reject $H_0$ for c = 1) at the
significance level $\alpha = 0.05$ are the same.
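A sketch of this simulation under illustrative choices ($\Sigma = I_2$ and equal group sizes), reusing the permutation_pvalues helper sketched in Section 5.2; the exact Wilks P-values would additionally require the $\Lambda$ distribution.

import numpy as np
from scipy.spatial.distance import cdist

def simulate_and_test(c, size, n_perm=10000, seed=1):
    rng = np.random.default_rng(seed)
    Sigma = np.eye(2)                              # illustrative common covariance
    samples = [rng.multivariate_normal(k * c * np.ones(2), Sigma, size)
               for k in (1, 2, 3)]                 # means c*1, 2c*1, 3c*1
    data = np.vstack(samples)
    labels = np.repeat([0, 1, 2], size)
    delta = cdist(data, data)                      # Euclidean super-distance matrix
    return permutation_pvalues(delta, labels, n_perm)   # empirical P-values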
Table 4: Distance-based comparison of three (simulated) bivariate normal populations using the Euclidean distance.
                c = 0                                   c = 1
Size   Wilks    Exact P-value   Empirical P-value   Wilks    Exact P-value   Empirical P-value
3      0.3364   0.203           0.221               0.2243   0.086           0.078
5      0.7609   0.538           0.527               0.1835   0.001           0.001
10     0.8735   0.465           0.470               0.4608   0.000           0.000
References
Cox, T.F. and Cox, M.A.A. (1994) Multidimensional Scaling. Chapman and Hall, London.
Cuadras, C.M. (1989) Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge
(Ed.), Statistical Data Analysis and Inference, pp. 459–473. Elsevier
Science Publishers B. V. (North–Holland), Amsterdam.
Cuadras, C.M. (1992) Some examples of distance based discrimination.
Biometrical Letters, 29, 3-20.
Cuadras, C.M. and Arenas, C. (1990) A distance based regression
model for prediction with mixed data. Communications in Statistics,
Theory and Methods, 19, 2261-2279.
Cuadras, C.M. and Fortiana, J. (1995) A continuous metric scaling
solution for a random variable. Journal of Multivariate Analysis, 52,
1-14.
Cuadras, C.M., Arenas, C. and Fortiana, J. (1996) Some computational aspects of a distance-based model for prediction. Communications in Statistics, Simulation and Computation, 25, 593-609.
Cuadras, C.M., Atkinson, R.A. and Fortiana, J. (1997) Probability densities from distances and discriminant analysis. Statistics and
Probability Letters, 33, 405-411.
Cuadras, C.M. and Fortiana, J. (2004) Distance-based multivariate
two sample tests. In: M. S. Nikulin, N. Balakrishnan, M. Mesbah,
14
C. M. Cuadras
N. Limnios, (Eds.), Parametric and Semiparametric Models with Applications to Reliability, Survival Analysis and Quality of Life, pp.
273-290. Birkhauser, Boston.
Cuadras, C.M., Fortiana, J. and Oliva, F. (1997) The proximity
of an individual to a population with applications in discriminant
analysis. Journal of Classification, 14, 117-136.
Cuadras, C.M., Cuadras, D. and Lahlou, Y. (2006) Principal
directions for the general Pareto distribution. Journal of Statistical
Planning and Inference, 136, 2572-2583.
Cuadras, C.M., and Lahlou, Y. (2000) Some orthogonal expansions
for the logistic distribution. Communications in Statistics, Theory
and Methods, 29, 2643-2663.
Escoufier, Y. (1973) Le traitement des variables vectorielles. Biometrics, 29, 751-760.
Flury, B. (1997) A First Course in Multivariate Statistics. Springer-Verlag, New York.
Gower, J.C. (1966) Some distance properties of latent roots and vector
methods in multivariate analysis. Biometrika, 53, 315-328.
Gower, J.C. and Legendre, P. (1986) Metric and Euclidean properties
of dissimilarity coefficients. Journal of Classification, 3, 5-48.
Liu, Z.J. and Rao, C.R. (1995) Asymptotic distribution of statistics
based on quadratic entropy and bootstrapping. Journal of Statistical
Planning and Inference, 43, 1-18.
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate
Analysis. Academic Press, London.
Rao, C.R. (1982) Diversity: its measurement, decomposition, apportionment and analysis. Sankhya A, 44, 1-21.
Yanai, H., Takane, Y. and Ishii, Y. (2006) Nonnegative determinant
of a rectangular matrix: its definition and applications to multivariate
analysis. Linear Algebra and Its Applications, in press.