PSYCHOMETRIKA, VOL. 46, NO. 4, DECEMBER 1981
NONMETRIC MAXIMUM LIKELIHOOD MULTIDIMENSIONAL
SCALING FROM DIRECTIONAL RANKINGS OF SIMILARITIES
YOSHIO TAKANE
MCGILL UNIVERSITY
and
J. DOUGLAS CARROLL
BELL LABORATORIES
A maximum likelihood estimation procedure is developed for multidimensional scaling when (dis)similarity measures are taken by ranking procedures such as the method of conditional rank orders or the method of triadic combinations. The central feature of these procedures may be termed directionality of ranking processes. That is, rank orderings are performed in a prescribed order by successive first choices. Such data have conventionally been analyzed by Shepard-Kruskal type nonmetric multidimensional scaling procedures. We propose, as a more appropriate alternative, a maximum likelihood method specifically designed for this type of data. A broader perspective on the present approach is given, which encompasses a wide variety of experimental methods for collecting dissimilarity data, including pair comparison methods (such as the method of tetrads) and the pick-M method of similarities. An example is given to illustrate various advantages of nonmetric maximum likelihood multidimensional scaling as a statistical method. At the moment the approach is limited to the case of one-mode two-way proximity data, but could be extended in a relatively straightforward way to two-mode two-way, two-mode three-way, or even three-mode three-way data, under the assumption of such models as INDSCAL or the two- or three-way unfolding models.
Key words: rank orders, model selection, AIC, face perception.
Introduction
In ranking procedures such as the method of triadic combinations [Richardson, 1938] or the method of conditional rank orders [Young, 1975] the rank order judgments may be characterized as directional. For example, in the method of triadic combinations stimuli are presented in triads. The subject is asked to decide which two of the three are most alike, and then which two are most different. Furthermore, a stimulus pair whose members are neither most alike nor most different is deduced from the previous two judgments to obtain a single ranking of three dissimilarities defined on a triad of stimuli. Thus, the ranking is conditional on the triad of stimuli, and it is directional since it is performed in a specific order.
In the method of conditional rank orders one of n stimuli is designated as a standard stimulus. The subject is asked to choose the most similar stimulus to the standard among the n - 1 stimuli, and then, after this stimulus is eliminated, to choose the one which is now
The first author's work was supported partly by the Natural Sciences and Engineering Research Council of Canada, grant number A6394. Portions of this research were done while the first author was at Bell Laboratories. MAXSCAL-4.1, a program to perform the computations described in this paper, can be obtained by writing to: Computing Information Service, Attention: Ms. Carole Scheiderman, Bell Laboratories, 600 Mountain Ave., Murray Hill, N.J. 07974. Thanks are due to Yukio Inukai, who generously let us use his stimuli in our experiment, and to Jim Ramsay for his helpful comments on an earlier draft of this paper. Confidence regions in Figures 2 and 3 were drawn by the program written by Jim Ramsay. We are also indebted to anonymous reviewers for their suggestions.
Requests for reprints should be sent to Yoshio Takane, Department of Psychology, McGill University, 1205 Docteur Penfield Ave., Montreal, Quebec H3A 1B1, Canada.
0033-3123/81/1200-3044500.75/0
© 1981 The Psychometric Society
most similar to the standard among the n - 2 remaining stimuli, proceeding in this way until all n - 1 stimuli are completely rank ordered relative to the standard. The whole process is repeated with a different stimulus used as a standard stimulus in turn until all n stimuli serve as a standard. Again the ranking process is directional (being performed from the smallest to the largest dissimilarity) and conditional upon the standard stimulus.
Dissimilarity data arising from ranking procedures have been analyzed either by Torgerson's [1952] classical multidimensional scaling (MDS) or by the type of nonmetric MDS originally introduced by Shepard and Kruskal [Shepard, 1962; Kruskal, 1964a, b; McGee, 1966; Guttman, 1968; Roskam, 1970; Young, 1975]. In this paper we propose, as a more appropriate alternative, a maximum likelihood estimation (MLE) method specifically designed for such directional rank order data. The principal advantage of the maximum likelihood method over other fitting procedures is that it permits various (asymptotically valid) statistical inferences, provided that the distributional assumptions are correct. For maximum likelihood MDS procedures for other types of dissimilarity data, see Ramsay [1977, 1978, 1980b] and Takane [1978a, b, 1981].
The Method
Our general approach in this paper may be called a "parametric approach" to nonmetric scaling. In this approach nonmetric data are viewed as incomplete data [Dempster, Laird & Rubin, 1977] conveying only ordinal information about distances. An unobserved metric process conveying complete information about distances is assumed to underlie the nonmetric data generation process, and a specific information reduction mechanism is postulated between the two kinds of processes. The likelihood function is specified for observed nonmetric data, which are related to distances based on some parametric assumptions about the underlying metric process.
The Likelihood Function
Let us assume that a set of n stimuli have an A-dimensional representation in euclidean space. That is, $d_{ij}$, the distance between stimuli i and j, is given by

$$d_{ij} = \left[ \sum_{a=1}^{A} (x_{ia} - x_{ja})^2 \right]^{1/2}, \qquad (1)$$

where $x_{ia}$ is the coordinate of stimulus i on dimension a. It is straightforward to extend the current approach to the INDSCAL model [Carroll & Chang, 1970]. However, in this paper we deliberately restrict our attention to the simple euclidean model defined above. Let us further assume that $d_{ij}$ as defined above is error-perturbed, and generates an error-perturbed metric process $\delta_{ijk}^{(t)}$ for replication k and at occasion t; i.e.,

$$\delta_{ijk}^{(t)} = \phi\left(d_{ij},\, e_{ijk}^{(t)}\right), \qquad (2)$$

where $e_{ijk}^{(t)}$ is an error random variable and where $\phi$ is a joint function (to be made more explicit) of the distance and the error component. We consider the following two error models in this paper:

$$\delta_{ijk}^{(t)} = d_{ij} + e_{ijk}^{(t)}, \quad \text{where } e_{ijk}^{(t)} \sim N(0, \sigma_k^2/2), \qquad \text{(Additive error model)} \quad (3)$$

and

$$\delta_{ijk}^{(t)} = d_{ij}\, e_{ijk}^{(t)}, \qquad \text{(Multiplicative error model)} \quad (4)$$

where $\ln e_{ijk}^{(t)} \sim N(0, \sigma_k^2/2)$. The replication may be taken over a sample of individuals (i.e., replication k is individual k). The $\sigma_k^2$ allows for possible individual differences in dispersion.
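As a computational aside (our own illustrative sketch, not part of the paper or of MAXSCAL-4.1), the distance model (1) and the two error models (3) and (4) can be coded directly; the function names and the toy three-stimulus configuration below are assumptions made for the example.

```python
import numpy as np

def distances(x):
    """Pairwise euclidean distances d_ij from an n-by-A coordinate matrix x, per (1)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def perturb(d, sigma_k, model="additive", rng=None):
    """One error-perturbed metric process delta_ijk^(t); the error variance for
    replication (subject) k is sigma_k**2 / 2, as in (3) and (4)."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.normal(0.0, np.sqrt(sigma_k ** 2 / 2.0), size=d.shape)
    if model == "additive":        # (3): delta = d + e
        return d + e
    return d * np.exp(e)           # (4): delta = d * e, with ln e normal

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # toy 3-stimulus configuration
d = distances(x)
```

A fresh draw of `perturb` would be made at every occasion t, reflecting the paper's assumption that a new metric process is generated for each successive first choice.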
We may optionally set $\sigma_k^2 = \sigma^2$ for all k. These are basically the principal error models considered by Ramsay [1977] in his maximum likelihood approach to metric MDS. We refer the reader to Ramsay's paper for a discussion of the psychological and empirical rationale for these particular models.
Our problem is to specify the likelihood of an observed set of ranked dissimilarities given a set of distances which are error-perturbed in a specific way. A model of psychological processes by which complete metric information is "collapsed" into incomplete ordinal information must be an intrinsic part of the likelihood function. The problem of parameter estimation is then to find a set of distances with the structure defined in (1) such that they maximize the likelihood of the particular observed data.
For explanatory purposes we only discuss the specification of the likelihood function for the method of conditional rank orders in some detail. (The extension to the method of triadic combinations is straightforward.) As noted earlier, the central feature of this method is the directionality of the ranking processes. Given that the ranking process is directional, a rank order may be regarded as resulting from successive first choices. The likelihood of a ranking is then defined by the product of the likelihoods of successive first choices, assuming that each successive first choice is made statistically independently of the others. (This assumption may seem unrealistically strong at first glance. However, as will be shown later, it follows from a much weaker assumption.) The joint likelihood of several rankings arising from different conditionalities may, in turn, be defined by the product of the likelihoods of those rankings. It is thus sufficient to specify the likelihood of a first choice.
Let $d_{i(mk)}$ represent the distance between standard stimulus i and whatever stimulus is judged mth most similar to the standard in replication k. (If that stimulus is j, then $d_{i(mk)} = d_{ij}$.) Let $\delta_{i(mk)}^{(t)}$ denote the random metric process corresponding to $d_{i(mk)}$ generated at occasion t. The occasion in this case refers to each successive first choice. We assume that the probability, $p_{ik}^{(1)}$, that the dissimilarity corresponding to $d_{i(1k)}$ is judged smallest among the n - 1 dissimilarities is equal to the probability that the random metric process $\delta_{i(1k)}^{(1)}$ corresponding to $d_{i(1k)}$ turns out to be the smallest of all n - 1 random processes at occasion 1. That is,

$$p_{ik}^{(1)} = \Pr\left(\delta_{i(1k)}^{(1)} < \delta_{i(2k)}^{(1)}, \;\ldots,\; \delta_{i(1k)}^{(1)} < \delta_{i(n-1,k)}^{(1)}\right). \qquad (5)$$
Define

$$B^{(1)} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 1 & 0 & 0 & \cdots & -1 \end{bmatrix} \quad \big((n-2) \times (n-1)\big),$$

and

$$\boldsymbol{\delta}_{ik}^{(1)} = \begin{bmatrix} \delta_{i(1k)}^{(1)} \\ \vdots \\ \delta_{i(n-1,k)}^{(1)} \end{bmatrix}, \qquad \mathbf{d}_{ik}^{(1)} = \begin{bmatrix} d_{i(1k)} \\ \vdots \\ d_{i(n-1,k)} \end{bmatrix}.$$

Then under the additive error assumption $p_{ik}^{(1)}$ can be more explicitly written as

$$p_{ik}^{(1)} = \int_{R^{(1)}} f\left(\mathbf{z}_{ik}^{(1)}\right)\, d\mathbf{z}_{ik}^{(1)}, \qquad (6)$$
where

$$\mathbf{z}_{ik}^{(1)} = B^{(1)} \boldsymbol{\delta}_{ik}^{(1)} \sim N\left(B^{(1)} \mathbf{d}_{ik}^{(1)},\, \Sigma_{ik}^{(1)}\right), \qquad (7)$$

and where $R^{(1)}$ is a multidimensional region such that $\mathbf{z}_{ik}^{(1)} < \mathbf{0}$. The $\Sigma_{ik}^{(1)}$ in (7) is the covariance matrix of $\mathbf{z}_{ik}^{(1)}$. In the case of the multiplicative error model (4), the elements of $\boldsymbol{\delta}_{ik}^{(1)}$ and $\mathbf{d}_{ik}^{(1)}$ should be redefined as $\ln \delta_{i(mk)}^{(1)}$ and $\ln d_{i(mk)}$, respectively.

If the elements of $\boldsymbol{\delta}_{ik}^{(1)}$ are independently normally distributed with variance $\sigma_k^2/2$; i.e.,

$$\Sigma_k^{(1)} = \frac{\sigma_k^2}{2}\, I, \qquad (8)$$

then the covariance of $\mathbf{z}_{ik}^{(1)}$ is simplified into

$$\Sigma_{ik}^{(1)} = B^{(1)} \Sigma_k^{(1)} B^{(1)\prime} = \frac{\sigma_k^2}{2}\, B^{(1)} B^{(1)\prime}. \qquad (9)$$

The important consequence of (9) is that $p_{ik}^{(1)}$ in (6) can be well approximated by the multivariate logistic distribution [Bock, 1975, pp. 521-522],

$$p_{ik}^{(1)} = \left[1 + \sum_{q=1}^{n-2} \exp\left(c_k \mathbf{b}_q^{(1)\prime} \mathbf{d}_{ik}^{(1)}\right)\right]^{-1}, \qquad (10)$$

where $\mathbf{b}_q^{(1)\prime}$ is the qth row vector of $B^{(1)}$ (the prime indicates a row vector), and where $c_k$ is a dispersion parameter which is approximately $\pi/(3^{1/2}\sigma_k)$. This distribution is much easier to evaluate than the multivariate normal integral required in (6). The $p_{ik}^{(1)}$ in (10) is interesting in its own right, since in the case of the multiplicative error model it is equivalent to Luce's model for the first choice [Luce, 1959]. (See Appendix.)
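Numerically, (10) is cheap to evaluate. The sketch below is our own illustrative code (the helper name `first_choice_prob` is an assumption, not from MAXSCAL-4.1): under the additive model the first-choice probability is a softmax over the n - 1 distances, and under the multiplicative model (logistic on log distances) it reduces to Luce's choice rule.

```python
import numpy as np

def first_choice_prob(d_row, c_k, multiplicative=False):
    """p_ik^(1) of (10): probability that the first-listed dissimilarity in
    d_row is judged smallest among the n - 1 dissimilarities.  Under the
    multiplicative model the logistic acts on log distances."""
    d = np.log(d_row) if multiplicative else np.asarray(d_row, dtype=float)
    contrasts = d[0] - d[1:]          # b_q^(1)' d: first choice minus each competitor
    return 1.0 / (1.0 + np.exp(c_k * contrasts).sum())
```

With $c_k > 0$, a clearly smallest first distance drives the probability toward one; the multiplicative variant equals $d_1^{-c_k} / \sum_j d_j^{-c_k}$, i.e., Luce's model.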
The independence assumption in (8), however, may be restrictive in practical situations. Fortunately, we can relax (8) into

$$\Sigma_k^{(1)} = \frac{\sigma_k^2}{2}\, I + \mathbf{1}\, \mathbf{u}_{ik}^{(1)\prime} + \mathbf{u}_{ik}^{(1)}\, \mathbf{1}^\prime, \qquad (11)$$

where $\mathbf{u}_{ik}^{(1)}$ is an arbitrary vector and $\mathbf{1}$ is a vector of ones [Takane, 1978a]. The same covariance matrix given in (9) can be derived from this weaker assumption as well. This can be easily seen by pointing out that $B^{(1)}\mathbf{1} = \mathbf{0}$, so that the second and the third terms on the right hand side of (11) vanish when $\Sigma_k^{(1)}$ is pre- and postmultiplied by $B^{(1)}$ and $B^{(1)\prime}$, respectively. The fact that (11) is sufficient to obtain (9) has considerable import, since the former is much weaker than the latter, and is consequently more realistic in practical situations. Note that (8) follows as a special case of (11) by setting $\mathbf{u}_{ik}^{(1)} = \mathbf{0}$. Note also that an equal covariance case also follows from (11) by setting $\mathbf{u}_{ik}^{(1)}$ to be a constant vector. Essentially the same condition as (11) has been postulated by Huynh and Feldt [1970] as the minimal condition to be satisfied in a univariate repeated measures analysis of variance. Indeed, all mutually orthogonal contrasts are statistically independent under this assumption.
In order to obtain $p_{ik}^{(2)}$ (the probability that the stimulus corresponding to $d_{i(2k)}$ is judged the next most similar stimulus to i), we define $B^{(2)}$ by eliminating the last row and the last column of $B^{(1)}$, $\mathbf{d}_{ik}^{(2)}$ by eliminating the first element of $\mathbf{d}_{ik}^{(1)}$, and $\boldsymbol{\delta}_{ik}^{(2)} = \left(\delta_{i(2k)}^{(2)}, \ldots, \delta_{i(n-1,k)}^{(2)}\right)^\prime$. We may then express $p_{ik}^{(2)}$ in a form analogous to (6) and (7). Note that it is
assumed that an entirely new set of random processes $\delta_{i(mk)}^{(2)}$ (m = 2, ..., n - 1) are generated for this judgment (as indicated by the new occasion index). In general, we have

$$p_{ik}^{(m)} = \left[1 + \sum_{q=1}^{n-m-1} \exp\left(c_k \mathbf{b}_q^{(m)\prime} \mathbf{d}_{ik}^{(m)}\right)\right]^{-1} \qquad (12)$$

for the mth first choice after the m - 1 most similar stimuli are eliminated from the comparison set. The $\mathbf{b}_q^{(m)\prime}$ is the qth row vector of $B^{(m)}$, which is obtained by deleting the last m - 1 rows and columns of $B^{(1)}$. For m = n - 1 we have $p_{ik}^{(n-1)} = 1$.

We may then state $p_{ik}$, the probability of a complete ranking of the n - 1 stimuli with respect to stimulus i, as

$$p_{ik} = \prod_{m=1}^{n-1} p_{ik}^{(m)}. \qquad (13)$$
This assumes the statistical independence of $p_{ik}^{(m)}$ and $p_{ik}^{(m^\prime)}$ ($m \neq m^\prime$). The statistical independence, of course, can be obtained if all elements in $\boldsymbol{\delta}_{ik}^{(m)}$ and those in $\boldsymbol{\delta}_{ik}^{(m^\prime)}$ are statistically independent. However, it can be deduced from a much weaker assumption, namely

$$\Sigma_k^{(m,m^\prime)} = \mathbf{1}\,\mathbf{u}^\prime + \mathbf{v}\,\mathbf{1}^\prime, \qquad (14)$$

where $\Sigma_k^{(m,m^\prime)}$ is the covariance matrix between $\boldsymbol{\delta}_{ik}^{(m)}$ and $\boldsymbol{\delta}_{ik}^{(m^\prime)}$, and $\mathbf{u}$ and $\mathbf{v}$ are arbitrary vectors, but with appropriate numbers of components. The $\mathbf{1}$ is a vector of unities, but the two $\mathbf{1}$'s in (14) are different in dimensionality. Under the assumption of (14), $B^{(m)} \Sigma_k^{(m,m^\prime)} B^{(m^\prime)\prime}$ vanishes, so that $\mathbf{z}_{ik}^{(m)} = B^{(m)} \boldsymbol{\delta}_{ik}^{(m)}$ and $\mathbf{z}_{ik}^{(m^\prime)} = B^{(m^\prime)} \boldsymbol{\delta}_{ik}^{(m^\prime)}$ are independent of each other, and consequently so are the $p_{ik}^{(m)}$ and $p_{ik}^{(m^\prime)}$.
The joint likelihood of several conditional rankings is now stated as

$$L_k = \prod_{i=1}^{n} p_{ik}, \qquad (15)$$

and the joint likelihood for the entire set of observations as

$$L = \prod_k L_k. \qquad (16)$$
Again the statistical independence between $p_{ik}^{(m)}$ and $p_{i^\prime k}^{(m^\prime)}$ ($i \neq i^\prime$) can be deduced, if the covariance matrix between $\boldsymbol{\delta}_{ik}^{(m)}$ and $\boldsymbol{\delta}_{i^\prime k}^{(m^\prime)}$ has a similar structure to (14). On the other hand, $\boldsymbol{\delta}_{ik}^{(m)}$ and $\boldsymbol{\delta}_{ik^\prime}^{(m^\prime)}$ ($k \neq k^\prime$) may be assumed independent, if replications are taken over different individuals.
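Combining (12), (13), (15) and (16), the complete log likelihood is a double sum of logistic terms. The following sketch is our own illustrative code (names and data layout are assumptions): `d_sorted` holds the distances d_i(1k), ..., d_i(n-1,k) arranged in the judged order for one standard stimulus.

```python
import numpy as np

def ranking_loglik(d_sorted, c_k):
    """ln p_ik of (13) for one conditional ranking: the sum over successive
    first choices (12), each a logistic term over the not-yet-chosen items."""
    d = np.asarray(d_sorted, dtype=float)
    ll = 0.0
    for m in range(len(d) - 1):            # the last remaining choice has probability 1
        contrasts = d[m] - d[m + 1:]       # b_q^(m)' d_ik^(m)
        ll -= np.log1p(np.exp(c_k * contrasts).sum())
    return ll

def total_loglik(d_rows_by_subject, c):
    """ln L of (16): sum over subjects k (15) and standard stimuli i (13)."""
    return sum(ranking_loglik(row, c[k])
               for k, rows in enumerate(d_rows_by_subject)
               for row in rows)
```

Maximizing this function over the stimulus coordinates (which determine the distances through (1)) and the dispersion parameters is exactly the estimation problem stated earlier.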
The (log of) L can be maximized by various numerical techniques. MAXSCAL-4.1, the computer program which performs the necessary computations, uses Fisher's scoring algorithm for maximization (see Rao, 1952, for example). A detailed description of this algorithm as applied to similar situations, as well as of its relation to the Gauss-Newton method for weighted least squares problems, can be found in Takane [1978a].
The necessary derivatives for this optimization are given as follows: For the additive error model we may write

$$p_{ik}^{(m)} = \frac{\exp\left(s_k d_{i(mk)}\right)}{u_{ik}^{(m)}}, \qquad (17)$$

where $s_k = -c_k$ and $u_{ik}^{(m)} = \sum_{j=m}^{n-1} \exp\left(s_k d_{i(jk)}\right)$. Then

$$\frac{\partial \ln p_{ik}^{(m)}}{\partial x_{qa}} = s_k \frac{\partial d_{i(mk)}}{\partial x_{qa}} - \frac{1}{u_{ik}^{(m)}} \sum_{j=m}^{n-1} \frac{\partial \exp\left(s_k d_{i(jk)}\right)}{\partial x_{qa}}, \qquad (18)$$
and

$$\frac{\partial \ln p_{ik}^{(m)}}{\partial s_k} = d_{i(mk)} - \frac{1}{u_{ik}^{(m)}} \sum_{j=m}^{n-1} d_{i(jk)} \exp\left(s_k d_{i(jk)}\right), \qquad (19)$$

where

$$\frac{\partial \exp\left(s_k d_{i(jk)}\right)}{\partial x_{qa}} = s_k \exp\left(s_k d_{i(jk)}\right) \frac{\partial d_{i(jk)}}{\partial x_{qa}}, \qquad (20)$$

and

$$\frac{\partial d_{i(jk)}}{\partial x_{qa}} = \frac{\left(\delta_{iq} - \delta_{(jk)q}\right)\left(x_{ia} - x_{(jk)a}\right)}{d_{i(jk)}}. \qquad (21)$$

In the above formula $\delta_{iq}$ is the Kronecker delta (i.e., $\delta_{iq} = 1$ when $i = q$, and $\delta_{iq} = 0$ when $i \neq q$). The parenthesized subscript (jk) in $\delta_{(jk)q}$ should be replaced by a single stimulus index. (If it is i, then $\delta_{(jk)q} = \delta_{iq}$, and $\delta_{iq}$ is as defined above.) For the multiplicative error model we have
$$p_{ik}^{(m)} = \frac{\left(d_{i(mk)}\right)^{s_k}}{v_{ik}^{(m)}}, \qquad (22)$$

where $v_{ik}^{(m)} = \sum_{j=m}^{n-1} \left(d_{i(jk)}\right)^{s_k}$. We obtain

$$\frac{\partial \ln p_{ik}^{(m)}}{\partial x_{qa}} = \frac{s_k}{d_{i(mk)}} \frac{\partial d_{i(mk)}}{\partial x_{qa}} - \frac{1}{v_{ik}^{(m)}} \sum_{j=m}^{n-1} \frac{\partial \left(d_{i(jk)}\right)^{s_k}}{\partial x_{qa}}, \qquad (23)$$

and

$$\frac{\partial \ln p_{ik}^{(m)}}{\partial s_k} = \ln d_{i(mk)} - \frac{1}{v_{ik}^{(m)}} \sum_{j=m}^{n-1} \left(\ln d_{i(jk)}\right)\left(d_{i(jk)}\right)^{s_k}, \qquad (24)$$

where

$$\frac{\partial \left(d_{i(jk)}\right)^{s_k}}{\partial x_{qa}} = s_k \left(d_{i(jk)}\right)^{s_k - 1} \frac{\partial d_{i(jk)}}{\partial x_{qa}}. \qquad (25)$$

The expression for $\partial d_{i(jk)}/\partial x_{qa}$ is given in (21).
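The additive-model gradient (18), built from (20) and (21), can be checked numerically. This is our own verification code (the helper names are assumptions, not MAXSCAL-4.1 routines):

```python
import numpy as np

def dist(x, i, j):
    """Euclidean distance between rows i and j of the configuration x, per (1)."""
    return np.sqrt(((x[i] - x[j]) ** 2).sum())

def ddist(x, i, j, q, a):
    """(21): partial derivative of d_ij with respect to coordinate x_qa."""
    return ((i == q) - (j == q)) * (x[i, a] - x[j, a]) / dist(x, i, j)

def dlogp(x, i, order, m, s_k, q, a):
    """(18): partial of ln p_ik^(m) with respect to x_qa, additive model.
    `order` lists the comparison stimuli in the judged order (0-based m)."""
    u = sum(np.exp(s_k * dist(x, i, j)) for j in order[m:])      # u_ik^(m)
    grad = s_k * ddist(x, i, order[m], q, a)
    grad -= sum(s_k * np.exp(s_k * dist(x, i, j)) * ddist(x, i, j, q, a)
                for j in order[m:]) / u                          # via (20)
    return grad
```

In MAXSCAL-4.1 these derivatives feed Fisher's scoring algorithm; here they can simply be compared against a finite-difference approximation of ln p on a small configuration.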
Little modification is necessary to extend the above formulation to the method of triadic combinations. In this method conditional rank orders are taken on dissimilarities defined on triads of stimuli rather than on dissimilarities with a common standard stimulus. Also, each conditional ranking is performed in the order of the smallest, then the largest (and then the intermediate) rather than from the smallest to the largest dissimilarity (as in the method of conditional rank orders). However, these differences are minor, and can be easily accommodated within the present framework.
Missing Data
In the method of conditional rank orders each conditional ranking need not be complete, and there are two possible cases. The major distinction lies in whether or not the experimenter has experimental control over the missing data elements.
In one case the experimenter presents all n - 1 comparison stimuli, but obtains only
n* first choices. In this case we may simply take the product of the likelihoods of n*
successive first choices, while the likelihood of each successive first choice remains the same
as before. That is,

$$p_{ik} = \prod_{m=1}^{n^*} p_{ik}^{(m)}, \qquad (26)$$

where $p_{ik}^{(m)}$ is given in (12).
On the other hand, the experimenter may restrict the comparison stimuli to a subset of the n - 1 stimuli, and obtain a rank ordering among only the restricted set of comparison stimuli. Let n* be the number of comparison stimuli. We define the probability of the mth first choice (m < n*) in a manner analogous to (12), but excluding those terms not related to the n* comparison stimuli. That is,

$$p_{ik}^{(m)} = \left[1 + \sum_{q=1}^{n^*-m} \exp\left(c_k \mathbf{b}_q^{(m)\prime} \mathbf{d}_{ik}^{(m)}\right)\right]^{-1}. \qquad (27)$$

The likelihood of a ranking is defined as in (26). Note that this is equivalent to redefining the conditionality of a ranking (the set of dissimilarities on which a ranking is obtained).
Tied Observations
Since a continuous distribution is assumed on distances, tied ranks should not occur theoretically, and it may be possible, depending on the experimental procedure we use, to completely avoid tied ranks. Nevertheless it is desirable for a nonmetric multidimensional scaling procedure to have some provisions for the treatment of tied observations, since they may occur frequently in practice.
There are two cases to be distinguished, one we call a weak tie and the other we call a strong tie. Two dissimilarities are said to be in a weak tie when they are not explicitly compared and consequently not rank ordered. For example, an experimental procedure may require the subject to choose the n* most similar stimuli to a standard, but not necessarily to obtain a rank ordering among the n* picked stimuli. Then the n* dissimilarities (corresponding to the n* picked stimuli) are "tied" in this sense. A strong tie, on the other hand, is characterized by empirical indistinguishability. That is, two dissimilarities are explicitly compared, but the subject is unable to rank order them.
The two kinds of ties should be treated differently, analogous to Kruskal's [1964b] primary and secondary approaches to ties. In the primary approach no constraints are imposed among distances corresponding to tied observations except that they should satisfy, as much as possible, the order restrictions imposed by the dissimilarities larger and smaller than themselves. This approach is suitable for the treatment of weak ties. In the secondary approach, on the other hand, distances corresponding to tied observations are required to be as close to each other as possible. This treatment is suitable for strong ties, since dissimilarities corresponding to strong ties are deemed empirically indistinguishable.
Suppose the dissimilarities corresponding to $d^{(m)}$ and $d^{(m+1)}$ are tied. Then the primary approach defines $p^{(m)}$ (the probability of the mth first choice in a certain ranking) excluding the term related to $d^{(m+1)}$, and $p^{(m+1)}$ excluding the term related to $d^{(m)}$. That is, $d^{(m+1)}$ does not have any effect on $p^{(m)}$, and $d^{(m)}$, in turn, does not have any effect on $p^{(m+1)}$. The secondary approach, on the other hand, defines both $p^{(m)}$ and $p^{(m+1)}$ including both terms related to $d^{(m)}$ and $d^{(m+1)}$. In this case $d^{(m)}$ and $d^{(m+1)}$ are bound to be close to each other, if $p^{(m)} \times p^{(m+1)}$ is to be maximized.
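In terms of the logistic probabilities, the two treatments differ only in which terms enter each choice probability. A minimal sketch (our own illustrative code for the additive model; the helper name and data layout are assumptions):

```python
import numpy as np

def choice_probs_tied(d, m, c_k, approach="primary"):
    """p^(m) and p^(m+1) when the mth and (m+1)th dissimilarities in d are
    tied.  Primary: each probability excludes the term for the other tied
    dissimilarity; secondary: both tied terms enter both probabilities."""
    d = np.asarray(d, dtype=float)
    rest = d[m + 2:]                      # dissimilarities ranked after the tied pair
    if approach == "primary":
        pm = 1.0 / (1.0 + np.exp(c_k * (d[m] - rest)).sum())
        pm1 = 1.0 / (1.0 + np.exp(c_k * (d[m + 1] - rest)).sum())
    else:                                 # secondary
        pm = 1.0 / (1.0 + np.exp(c_k * (d[m] - d[m + 1:])).sum())
        others = np.concatenate(([d[m]], rest))
        pm1 = 1.0 / (1.0 + np.exp(c_k * (d[m + 1] - others)).sum())
    return pm, pm1
```

For two exactly equal tied distances both approaches give $p^{(m)} = p^{(m+1)}$; the secondary probabilities are smaller because each includes the competing tied term, which is what pushes the corresponding distances together when the product is maximized.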
In general, conditionalities of rankings can be of any form. The method of conditional rank orders and the method of triadic combinations are but two special cases of such instances. Dissimilarities in different conditionalities are not explicitly compared, and thus are considered to be in weak tie relations. The primary approach to weak ties described above is consistent with our previous treatment of conditionalities.
Generalizations
We may generalize our basic results on directional ranking procedures to other experimental procedures, including such procedures as the method of tetrads and the pick-M method of similarities. In the method of tetrads four stimuli are presented in two pairs, and the subject is asked to judge which pair of stimuli is more similar (or dissimilar). In the pick-M method of similarities, the subject is required to pick out the M most similar stimuli to a standard stimulus from a well defined set of alternative stimuli. In what follows we show that these procedures are in fact two special cases of directional ranking procedures. We have kept our exposition as succinct as possible, so that nonmathematical readers may skip this entire section without loss of continuity.
Let S be the set of dissimilarities defined on a set of pairs of stimuli. We define three kinds of relations on S × S: the order relation, the strong tie relation and the weak tie relation. We assume that for each pair of dissimilarities one of the three relations holds; i.e., for a, b ∈ S,
i. a > b or a < b (order relation),
ii. a ≈ b (strong tie),
iii. a ~ b (weak tie).
Let $S_i$ and $S_j$ be subsets of S. We write $S_i > S_j$ ($S_i < S_j$), $S_i \approx S_j$ or $S_i \sim S_j$, when ∀a ∈ $S_i$ and ∀b ∈ $S_j$, a > b (a < b), a ≈ b or a ~ b, respectively. The $S_i$ and $S_j$ may not be disjoint.
Define the set of separate conditionalities Q, whose ith member is denoted by $S_i \subset S$; ∀$S_i$, ∀$S_j$ ∈ Q (i ≠ j), $S_i \sim S_j$. Let the number of elements in $S_i$ be L. The procedure which requires the subject to rank order M elements (M ≤ L) in $S_i$ partitions $S_i$ into subsets $S_i^{(M)}$ and $S_i^{(L-M)}$, where $S_i^{(L-M)}$ is possibly null (when M = L). The $S_i^{(M)}$ is the set of the M smallest dissimilarities in $S_i$, while $S_i^{(L-M)}$ is the set of the L - M largest dissimilarities in $S_i$. Obviously $S_i^{(M)} < S_i^{(L-M)}$. Either the order or strong tie relations are defined among the elements of $S_i^{(M)}$, whereas the weak tie relations are defined among the elements of $S_i^{(L-M)}$.
With an appropriate choice of the members of $S_i$ we have the method of tetrads (including the method of triads as a special case) when L = M = 2, the method of triadic combinations when L = M = 3, and the method of conditional rank orders when L = M = n - 1 (where n is the number of stimuli). The two incomplete (missing data) cases of the method of conditional rank orders can be obtained by setting M < L = n - 1 and L = M < n - 1. (Of course, we may set M < L < n - 1 to generate yet another variation of the method of conditional rank orders.)
With minor modification the above formulation can be extended to the pick-M method of similarities. This method also requires the subject to partition $S_i$ into two disjoint subsets, $S_i^{(M)}$ and $S_i^{(L-M)}$. However, this time $S_i^{(L-M)}$ cannot be null (i.e., M < L). We have $S_i^{(M)} < S_i^{(L-M)}$ as before. However, the weak tie relation is defined among the elements of $S_i^{(M)}$ as well as among the elements of $S_i^{(L-M)}$.
A whole variety of experimental procedures, other than those already mentioned, can be generated by specifying different Q's. MAXSCAL-4.1 is capable of performing MDS for any definition of Q, provided that each $S_i$ is at most matrix conditional (i.e., dissimilarities are not compared across different replications). We call this general case, in which Q can be arbitrary, the method of general directional rank orders.
Note, however, that the above discussion is exclusively based on the algebraic structures of data obtained by the various data collection methods. Little statistical consideration is taken into account, which may cause some problems in justifying the statistical assumptions underlying the methods. For example, we must assume the statistical independence of weak ties, which may or may not be valid empirically.
Results and Discussion
As has been alluded to earlier, maximum likelihood estimation offers various advantages over other fitting procedures. The asymptotic chi square goodness of fit statistic can be readily derived from the general principle of the likelihood ratio [Wilks, 1962]. In addition the AIC statistic [Akaike, 1974] can be used for identifying a best fitting model where the asymptotic chi square test is not feasible. Furthermore, the knowledge of the asymptotic behavior of maximum likelihood estimators allows us to draw asymptotic confidence regions (of a prescribed size) surrounding estimated stimulus points to indicate the degree of precision of the estimation. In this section we demonstrate these features of maximum likelihood estimation in the specific context of multidimensional scaling.
Data
A small experiment was conducted to collect relevant data. Seventeen schematic faces (Figure 1) were used in this experiment. Those stimuli were originally used by Inukai, Nakamura and Shinohara [Note 1] in their study to identify determinants of face perception. They were constructed by combining two of the most important determinants of facial expressions, namely the curvature of the lips and the curvature of the eyes. The stimuli, 4 cm in height and 3 cm in width, were each pasted on a 7.5 cm × 12.5 cm index card. Dissimilarity judgments were obtained from five subjects (two male and three female university students) by the method of conditional rank orders. The order in which each stimulus served as a standard stimulus was randomized over subjects. An average subject took 70 minutes to complete the task.
Identifying the Best Fitting Model
In order to choose the best fitting model the data were analyzed under all combinations of two error assumptions (the additive and the multiplicative error models), two conditions on the error variance (constant or variable over subjects) and two dimensionalities (two and three dimensions). In certain cases four and five dimensional solutions were also obtained. The results are summarized in Table 1. Each cell of the table contains three numbers; the top one is the log likelihood of the model, and the one in the middle is the value of the AIC statistic [Akaike, 1974], which is defined by

AIC = -2 ln L + 2 d.f.,

where the d.f. is the effective number of parameters in the model, given at the bottom of the cell (enclosed in parentheses). Note that the d.f. used here are the degrees of freedom of the model, not the error degrees of freedom. An important feature of AIC is that it explicitly takes into account the number of model parameters in evaluating the goodness of fit of a model used to describe the data. The AIC is a badness-of-fit index, so that a smaller value indicates a better fit. A relatively nontechnical discussion of this statistic may be found in Takane [1981].
The AIC statistic may be used when the models to be compared are not necessarily hierarchically structured. This feature is very convenient when we compare, for example, the goodness of fit of the two error models, since the usual asymptotic chi square test is not feasible for such comparisons. The d.f. of the model is calculated by nA - A(A + 1)/2 + r, where r is the number of estimated dispersion parameters. This r is zero when the dispersions are assumed constant over subjects, and is equal to N - 1 when they are assumed
FIGURE 1
The stimulus configuration in terms of the curvatures of lips and eyes (From Inukai et al., Note 1).
to vary over subjects. This is so because there is a trade-off between the size of the stimulus configuration and the dispersion parameter, and consequently we can either fix the size of the configuration or fix one of the dispersion parameters.
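The bookkeeping in Table 1 can be reproduced directly (our own check code, with hypothetical helper names): the effective number of parameters is nA - A(A + 1)/2 + r, and AIC = -2 ln L + 2 d.f. For instance, the additive, constant variance, three dimensional solution (n = 17, A = 3, r = 0) has 45 parameters and, with ln L = -1737.9, an AIC of 3565.8.

```python
def model_df(n, A, r):
    """Effective number of parameters: nA - A(A+1)/2 + r, where r is the
    number of estimated dispersion parameters (0 if constant over subjects,
    N - 1 if they vary over the N subjects)."""
    return n * A - A * (A + 1) // 2 + r

def aic(log_lik, df):
    """AIC = -2 ln L + 2 d.f.; a badness-of-fit index (smaller is better)."""
    return -2.0 * log_lik + 2.0 * df
```

The A(A + 1)/2 correction removes the rotational and translational indeterminacies of the euclidean configuration from the parameter count.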
The AIC is unfortunately based on asymptotic properties of maximum likelihood estimators (as is the asymptotic chi square). This means that, strictly speaking, it is only applicable to large-sample data. Although in all cases we shall deal with in this paper we have at least ten times as many observations as there are parameters to be estimated, we are not entirely sure whether this is sufficient. Furthermore, with models in which the number of parameters increases with the increasing number of replications (e.g., the model which allows for individual differences in dispersion), data are never sufficiently large to warrant the asymptotic properties of MLE. Consequently, we have to rely on some heuristic rule in the following discussion. Work [Takane & Carroll, Note 2] is in progress, however, to develop a precise modification rule for the basic formula of AIC when the asymptotic properties of MLE may not be exploited.
Table 1
Summary of the results of MAXSCAL-4.1 analyses of the face data

                              Unconstrained solutions              Constrained solutions
                              (dimensionality)                     (dimensionality)
                              5         4         3         2         4         2
Additive error
  Constant     ln(L)                          -1737.9   -1810.6
  variance     AIC                             3565.8    3683.1
               d.f.                              (45)      (31)
  Nonconstant  ln(L)      -1668.6   -1680.9   -1711.5   -1784.8   -1755.5   -1847.5
  variance     AIC         3485.2    3485.8    3521.1    3639.6    3554.9    3727.0
               d.f.          (74)      (62)      (49)      (35)      (22)      (16)
Multiplicative error
  Constant     ln(L)                          -1748.9   -1817.6
  variance     AIC                             3587.9    3697.2
               d.f.                              (45)      (31)
  Nonconstant  ln(L)                          -1720.1   -1797.6
  variance     AIC                             3538.2    3665.3
               d.f.                              (49)      (35)

Choosing a model with a minimum AIC value, when models are in fact hierarchically
structured, is equivalent to choosing a more restricted model whenever the asymptotic chi square is smaller than 2 × d.f. (where the d.f. is now the difference in the numbers of parameters between the two models being compared), and to choosing a less restricted model otherwise. Ramsay [1980a] has shown that the asymptotic chi square tends to be inflated for small samples, but that there is a remarkable regularity in the way it is inflated. Noting this fact he has developed a simple correction formula for the asymptotic chi square, which is obtained by just multiplying the chi square criterion value by some constant. This constant should represent how the asymptotic chi square tends to be inflated under specific circumstances. Interestingly, this constant never seems to exceed 3. Furthermore, it reduces to approximately 1.5 when there are 15 stimuli and five complete replications. We may exploit this fact to construct a tentative rule of thumb. That is, we favor a less restricted
model whenever the asymptotic chi square exceeds 3 × d.f. (3 = 1.5 × 2); otherwise we choose a more restricted model. This decision rule is equivalent to adding 3 × d.f. (rather than 2 × d.f.) to -2 ln L in the formula for AIC. (Note, however, that the values of AIC in Table 1 are evaluated according to the unmodified formula.)
With the above qualification in mind let us start with the comparison between the additive error model and the multiplicative error model. We see that in all four corresponding solutions (two conditions on the variance structure times two dimensionalities) obtained under these two error models, the AIC is consistently smaller for the additive error model. For these data the additive error model seems to fit better than the multiplicative error model. Our experience with other data sets also indicates that the additive error model is generally superior to the multiplicative error model when the type of judgment involved is a direct comparison of two or more dissimilarities [Takane, 1978b], though the converse is true for rating judgments [Takane, 1981].
Assuming that the additive error model is more appropriate, we may now compare the constant variance and the nonconstant variance assumptions. For both the two and three dimensional solutions the values of AIC are found to be smaller under the nonconstant variance assumption (χ² = 51.6 with 4 d.f. for the two dimensional solution, and χ² = 52.8 with 4 d.f. for the three dimensional solution). Since the values of the asymptotic chi square are both more than ten times larger than the corresponding d.f., it may be safely concluded that there are substantial individual differences in dispersions.
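The chi square values quoted here follow directly from the Table 1 log likelihoods via the likelihood ratio principle; a quick check (our own code):

```python
def lr_chisq(loglik_restricted, loglik_full):
    """Asymptotic likelihood ratio chi square: 2 (ln L_full - ln L_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)

# Additive error model, constant vs. nonconstant variance (4 d.f. each):
chi_2d = lr_chisq(-1810.6, -1784.8)   # two dimensional solutions
chi_3d = lr_chisq(-1737.9, -1711.5)   # three dimensional solutions
```

The same function reproduces the dimensionality comparisons discussed below, e.g., 2(1784.8 - 1711.5) = 146.6 for three versus two dimensions under the nonconstant variance additive model.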
We are now in a position to determine the appropriate dimensionality of the representation space. The comparison between the two and three dimensional solutions under the nonconstant variance additive error model reveals that the third dimension is clearly significant (χ² = 146.6 with 14 d.f.), implying that the appropriate dimensionality is at least three. It is possible that the dimensionality is even higher, so four and five dimensional solutions were also obtained under the equivalent conditions. The fourth dimension seems to be significant (χ² = 61.3 with 13 d.f.), while the fifth dimension is not. The value of the asymptotic chi square (= 24.6) representing the difference between the four and five dimensional solutions is only slightly larger than twice the corresponding d.f. (= 12). Note that the five dimensional solution still has the minimum AIC value according to the original formula, but the four dimensional solution would have been the minimum AIC solution if we had taken into account the correction factor for small samples (i.e., if we had added 3 × d.f. to −2 ln L instead of 2 × d.f.).
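The sequence of dimensionality decisions just described can be summarized as a short sketch. The chi square values and d.f. are those quoted in the text; the looping logic is our illustration of the 3 × d.f. criterion, not the program's actual code.

```python
# Successive nested comparisons for choosing dimensionality, using the
# statistics quoted in the text: (label, asymptotic chi square, d.f.).
comparisons = [
    ("2 vs 3 dimensions", 146.6, 14),
    ("3 vs 4 dimensions", 61.3, 13),
    ("4 vs 5 dimensions", 24.6, 12),
]

dimensionality = 2
for label, chi_sq, df in comparisons:
    # Rule of thumb from the text: retain the extra dimension only if the
    # chi square exceeds 3 x d.f. (the small-sample-corrected penalty).
    if chi_sq > 3 * df:
        dimensionality += 1
    else:
        break

print(dimensionality)  # -> 4: the fifth dimension fails the 3 x d.f. criterion
```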
The four dimensional solution is depicted in Figures 2 and 3 for selected pairs of dimensions, along with 95% asymptotic confidence regions for the estimated points. The first two dimensions roughly correspond with the two defining properties of the stimuli; dimension 1 represents the lips dimension, while dimension 2 represents the eyes dimension. Although the stimuli are physically two dimensional, the third and the fourth dimensions are clearly interpretable. While it is possible to interpret the unrotated third and fourth dimensions (designated as dimension 3 and dimension 4, respectively, in Figure 3), the interpretation is much more straightforward if they are rotated about 60° counterclockwise, as indicated in Figure 3. In the figure the new dimensions are designated as dimension 3' and dimension 4'. (The dimensions are ordered in terms of the amount of stimulus variability they can account for.) Dimension 3' represents "skew-symmetry" about the eyes dimension. It can be seen in Figure 3 that the stimuli which occupy symmetric locations with respect to the neutral eye curvature level (i.e., flat eyes) in Figure 1 take approximately equal coordinate values on dimension 3'. For example, stimuli 1 and 17 are very close on this dimension, since they have exactly opposite curvatures of eyes. Dimension 4' represents essentially the same thing for the lips dimension (i.e., it represents "skew-symmetry" about the lips dimension). Stimuli 3 and 15, for example, take approximately equal coordinate values on this dimension.
FIGURE 2. The two-dimensional plot of the best fitting four dimensional solution from MAXSCAL-4.1: Dimension 1 versus Dimension 2.

FIGURE 3. The two-dimensional plot of the best fitting four dimensional solution from MAXSCAL-4.1: Dimension 3 versus Dimension 4.
Curvatures equal in absolute value but opposite in sign are perceived to be more similar than would be predicted from linear dimensions of curvature. The linear dimensions are folded (or bent) at the neutral curvature levels in order to accommodate the perceived similarities between "skew-symmetric" curvatures. Figure 4 depicts how the two defining properties of the stimuli are folded (or bent) at the neutral curvature levels. (The top figure is the plot of dimension 1 against dimension 4'; the bottom one is the plot of dimension 2 against dimension 3'. In making these plots dimensions 1 and 2 were rotated slightly (about 10° clockwise in the plane spanned by these dimensions) so that they more closely match the two defining physical dimensions of the stimuli.) It may be observed in Figure 4 that there is a subtle difference in the ways in which the two dimensions are folded (or bent). For the lips dimension the bend is gradual, while for the eyes dimension it is rather sharp. At the moment we are not sure whether this difference has any psychological significance.
A "folded" dimension may be interpreted psychologically as representing "intensity"
analogous in some way to the social utility dimension discussed by Coombsand Kao
[1960]. On the other hand, it maybe just an artifact of the noneuclidean nature of the
perceptual space. For example, as pointed out by Changand Carroll [1980] in the context
of individual differences MDS
analysis of color normal and color deficient subjects, possible
idiosyncratic monotonic transformations of a commonstimulus dimension by different
subjects may well give rise to a "folded" dimension like the ones we have observed in our
study. At present, however, there is no definite evidence to favor one over the other. The
3@
15.
=
11
.2
8,,
,16
13
10= =5
17
9
DIMENSION 1
@
1
=17
11
= ~14
4
7,~=2
"8
z
uJ
"10
DIMENSION 2
The plots
of the "folded"
dimensions:
FIGURE4
Dimension 1 versus dimension 4’ (top)
(bottom).
and dimension 2 versus dimension
YOSHIO
TAKANE AND J.
DOUGLAS CARROLL
403
important point is that a "folded" dimension is a rule rather than an exception [Chang &
Carroll, 1980; Takanc, 1981].
Since the stimuli were constructed by incomplete factorial combinations of two physical properties, we may formally test whether those physical properties can account for the subjective dissimilarities among the stimuli. This hypothesis is geometrically very similar to the configuration depicted in Figure 1, except that equal intervals are not necessarily assumed between adjacent curvature levels. We may add the "skew-symmetric" hypothesis to obtain a four dimensional hypothesis. Dimensions 1 and 2 represent the curvature of lips and the curvature of eyes, respectively. The third dimension represents the "skew-symmetry" about the eyes dimension. On this dimension the coordinates are assumed equal for stimuli 3, 6, 9, 12 and 15; stimuli 5, 8, 10 and 13; stimuli 2, 4, 7, 11, 14 and 16; and stimuli 1 and 17, respectively. Similarly, the fourth dimension represents the "skew-symmetry" about the lips dimension. The coordinates for stimuli 1, 4, 9, 14 and 17; stimuli 5, 8, 10 and 13; stimuli 2, 6, 7, 11, 12 and 16; and stimuli 3 and 15 are assumed equal on this dimension. Coordinates which are assumed equal were treated as if they were single parameters in the estimation procedure.
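Equality constraints of this kind are commonly implemented by parameter tying: every constrained coordinate points to one shared free parameter. A minimal sketch follows; the stimulus groupings are those stated in the text for the "skew-symmetry" dimension about the eyes, while the coordinate values and the mapping mechanism are our own illustration, not the actual MAXSCAL-4.1 code.

```python
# Tying coordinates that are hypothesized equal to a single free parameter.
# Groups are the stimulus sets given in the text for the third dimension
# (skew-symmetry about the eyes dimension); stimulus numbers are 1-based.
groups_dim3 = [
    [3, 6, 9, 12, 15],
    [5, 8, 10, 13],
    [2, 4, 7, 11, 14, 16],
    [1, 17],
]

# One free parameter per group (hypothetical values, for illustration only).
free_params = [0.9, 0.3, -0.2, -0.8]

# Expand the free parameters into a full coordinate vector for 17 stimuli.
coords = [None] * 17
for value, group in zip(free_params, groups_dim3):
    for stim in group:
        coords[stim - 1] = value

# Stimuli within a group share one coordinate, so only 4 parameters are estimated.
assert coords[2] == coords[5] == coords[8]   # stimuli 3, 6, 9
assert coords[0] == coords[16]               # stimuli 1 and 17
assert len(set(coords)) == 4
```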
These hypotheses were fitted under the nonconstant variance additive error assumption. Of the two constrained solutions the four dimensional solution has the minimum AIC value of 3554.9 (χ² = 184.0 with 6 d.f.). However, this value is still substantially larger than that of the corresponding unconstrained solution. The asymptotic chi square representing the difference between the two solutions is 149.2, which is more than three times its corresponding d.f. (= 40).
Asymptotic Confidence Regions
The inverse of the information matrix evaluated at the maximum likelihood estimates of parameters is known to give asymptotic variance and covariance estimates of the estimated parameters. It has been shown by Ramsay [1978] that the regular inverse may be replaced by the Moore-Penrose inverse when the information matrix is singular due to nonuniqueness of parameters in multidimensional scaling. Exploiting this fact and the asymptotic normality of the maximum likelihood estimators, we may draw asymptotic confidence regions for any subsets of parameters. In MDS we are typically interested in confidence regions drawn separately for each estimated point. In the case of a four dimensional configuration, each confidence region forms a four dimensional hyperellipsoid, which can be orthogonally projected onto two dimensional subspaces, each formed by two of the four dimensions. The projection of a hyperellipsoid forms an ellipse like those depicted in Figures 2 and 3. In drawing those ellipses a small-sample correction factor, developed by Ramsay for his MULTISCALE program [Ramsay, 1980a], was applied. It generally has the effect of making them somewhat larger. Unfortunately, it is impossible to visualize a four dimensional hyperellipsoid based on these ellipses.
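The construction just described can be sketched numerically. The information matrix below is a made-up 2 × 2 example standing in for the block of the full information matrix belonging to one point's coordinates; 5.991 is the 95th percentile of the chi square distribution with 2 d.f. This is our illustration of the general recipe, not MAXSCAL-4.1's implementation, and it omits Ramsay's small-sample correction.

```python
import numpy as np

# Hypothetical 2x2 information block for one estimated point's coordinates;
# in practice it is evaluated at the maximum likelihood estimates.
info = np.array([[40.0, 8.0],
                 [8.0, 25.0]])

# Moore-Penrose inverse (reduces to the regular inverse when nonsingular)
# gives the asymptotic covariance matrix of the two coordinates.
cov = np.linalg.pinv(info)

# 95% region for 2 parameters: quadratic form <= chi-square(2 d.f.) quantile.
chi2_95_df2 = 5.991

# Ellipse axes from the eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
semi_axes = np.sqrt(eigvals * chi2_95_df2)  # semi-axis lengths of the ellipse

# Both semi-axes are positive; the larger eigenvalue gives the longer axis.
assert float(semi_axes.min()) > 0
```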
Discussion
We have seen an example of an analysis which can be performed by MAXSCAL-4.1. Although this example pertains to conditional rank order data, essentially the same type of analysis can be performed for other types of directional rank order data. As should be clear from the definition of the likelihood function [(12), (14) and (15)], the directionality of ranking processes is one of the most crucial ingredients of the current procedure. It may be that this procedure proves to be very robust against violation of this basic assumption, so that it may be safely used for nondirectional rank order data as well. At present, however, we have no hard evidence which indicates that this is the case, though a study is being undertaken which is aimed at systematically examining the robustness of the current procedure against nondirectional ranking processes [Takane & Carroll, Note 2].
Aside from the robustness question there is a good reason to favor directional ranking procedures over nondirectional ones; the former generally provide dissimilarity rank orders which are more consistent with the true orderings of the underlying distances. To see this, a number of dissimilarity rankings were generated from assumed stimulus configurations by both the directional and the nondirectional ranking processes. Correlations (Kendall's tau) were calculated between the true rankings of distances and the rankings obtained by the two processes. For the two levels of error examined [σ = .2 and .4 for tr(X′X) = n, which is set to 10] the correlations were uniformly higher for the directional ranking processes. This implies that directional ranking procedures can provide more reliable rankings of dissimilarities than their nondirectional counterparts. Note that in the latter, error processes are generated once for all per ranking, whereas in the former they are assumed to be generated anew for each successive first choice.
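A simulation of the kind described can be sketched as follows. This is our minimal reconstruction, not the authors' code: the "true" values, error level, and number of replications are illustrative, and the two ranking processes differ only in whether the error is drawn once per ranking or anew at every successive first choice.

```python
import itertools
import random

def kendall_tau(a, b):
    """Kendall's tau between two equal-length rank sequences (no ties assumed)."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs

def directional_ranking(values, sigma, rng):
    """Rank by successive first choices; fresh error drawn at every choice."""
    remaining = list(range(len(values)))
    order = []
    while remaining:
        noisy = {i: values[i] + rng.gauss(0.0, sigma) for i in remaining}
        first = min(remaining, key=lambda i: noisy[i])
        order.append(first)
        remaining.remove(first)
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def nondirectional_ranking(values, sigma, rng):
    """Error generated once for all per ranking; rank the noisy values directly."""
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    order = sorted(range(len(values)), key=lambda i: noisy[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

rng = random.Random(0)
true_values = [float(v) for v in range(10)]  # stand-in "true" dissimilarities
true_ranks = list(range(10))

taus_dir, taus_non = [], []
for _ in range(500):
    taus_dir.append(kendall_tau(true_ranks, directional_ranking(true_values, 2.0, rng)))
    taus_non.append(kendall_tau(true_ranks, nondirectional_ranking(true_values, 2.0, rng)))

print(sum(taus_dir) / 500, sum(taus_non) / 500)
```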
The empirical results indicating the superior validity of directional rankings can be rationalized in terms of the following argument. For the sake of simplicity let us consider the case of ranking only three objects (although the argument for n > 3 should follow straightforwardly). For n = 3, the whole difference between directional and nondirectional rankings arises from the probability that the object whose "true" rank is 2 is in fact given an actual rank of 2. (The process for choosing the first ranked object is identical for both processes, while the third ranked item is completely specified once the first two ranks are specified.) Denoting this probability as Pr(r = 2 | R = 2), where R denotes the "true" rank and r the observed rank, we can see that the conditional probability is just the joint probability that object 2 is not maximal in the first choice and is maximal in the second. In the nondirectional case the first and second events must be negatively correlated, since a small value associated with object 2 will tend to make that object nonmaximal on both the first and second first choices. Thus the fact that object 2 was not maximal in the first ranking tends to imply that the value associated with it was in fact relatively small, and thus decreases the probability that that object is (correctly) maximal in the second first choice. In the directional case, on the other hand, the two events are independent, so that the nonmaximality of object 2 in the first place has no implied effect on its position in the second first choice. Consequently, Pr(r = 2 | R = 2) for directional rankings must be greater than Pr(r = 2 | R = 2) for nondirectional rankings. Since in this particular case this conditional probability completely determines the relative validity of the two processes, it follows that directional rankings (at least for n = 3) are more valid than nondirectional ones. A similar line of argument may be applied to the case of n > 3.
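The three-object argument lends itself to a quick Monte Carlo check. The sketch below is our own illustration (not from the paper): three "true" values with additive Gaussian error, the first choice taken as the minimum, and the error either redrawn for the second first choice (directional) or reused (nondirectional).

```python
import random

# Monte Carlo check of the three-object argument above (our illustration).
# True values 1 < 2 < 3; the object with true rank 2 is the middle one.
def prob_middle_correct(directional, sigma=1.0, trials=200_000, seed=1):
    rng = random.Random(seed)
    true_vals = [1.0, 2.0, 3.0]
    hits = 0
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, sigma) for v in true_vals]
        first = min(range(3), key=lambda i: noisy[i])
        remaining = [i for i in range(3) if i != first]
        if directional:
            # fresh errors are drawn for the second first choice
            noisy = [v + rng.gauss(0.0, sigma) for v in true_vals]
        second = min(remaining, key=lambda i: noisy[i])
        if second == 1:  # the object whose true rank is 2 received rank 2
            hits += 1
    return hits / trials

p_dir = prob_middle_correct(directional=True)
p_non = prob_middle_correct(directional=False)
# Per the argument in the text, p_dir should exceed p_non.
```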
REFERENCE NOTES
1. Inukai, Y., Nakamura, K., & Shinohara, M. A multidimensional scaling analysis of judged dissimilarities among schematic facial expressions. Bulletin of Industrial Product Research Institute, 1977, 81, 21-30 (in Japanese).
2. Takane, Y., & Carroll, J. D. On the robustness of AIC in the context of maximum likelihood multidimensional scaling. Manuscript in preparation.

REFERENCES
Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19, 716-723.
Bock, R. D. Multivariate statistical methods in behavioral research. New York: McGraw-Hill, 1975.
Carroll, J. D., & Chang, J. J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 1970, 35, 283-319.
Chang, J. J., & Carroll, J. D. Three are not enough: An INDSCAL analysis suggesting that color space has seven (±1) dimensions. Color Research and Applications, 1980, 5, 193-206.
Coombs, C. H., & Kao, R. C. On a connection between factor analysis and multidimensional unfolding. Psychometrika, 1960, 25, 219-231.
Dempster, A. P., Laird, N. M., & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39, 1-38.
Guttman, L. A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika, 1968, 33, 469-506.
Huynh, H., & Feldt, L. S. Conditions under which mean square ratios in repeated measurements designs have exact F-distributions. Journal of the American Statistical Association, 1970, 65, 1582-1589.
Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964(a), 29, 1-27.
Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 1964(b), 29, 115-129.
Luce, R. D. Individual choice behavior: A theoretical analysis. New York: Wiley, 1959.
McGee, V. The multidimensional scaling of "elastic" distances. The British Journal of Mathematical and Statistical Psychology, 1966, 19, 181-196.
Ramsay, J. O. Maximum likelihood estimation in multidimensional scaling. Psychometrika, 1977, 42, 241-266.
Ramsay, J. O. Confidence regions for multidimensional scaling analysis. Psychometrika, 1978, 43, 145-160.
Ramsay, J. O. Some small sample results for maximum likelihood estimation in multidimensional scaling. Psychometrika, 1980(a), 45, 139-144.
Ramsay, J. O. Joint analysis of direct ratings, pairwise preferences and dissimilarities. Psychometrika, 1980(b), 45, 149-165.
Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952.
Richardson, M. W. Multidimensional psychophysics. Psychological Bulletin, 1938, 35, 659-660.
Roskam, E. E. The method of triads for multidimensional scaling. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 1970, 25, 404-417.
Shepard, R. N. The analysis of proximities: Multidimensional scaling with an unknown distance function, I & II. Psychometrika, 1962, 27, 125-140 & 219-246.
Takane, Y. A maximum likelihood method for nonmetric multidimensional scaling: I. The case in which all empirical pairwise orderings are independent--theory. Japanese Psychological Research, 1978(a), 20, 7-17.
Takane, Y. A maximum likelihood method for nonmetric multidimensional scaling: I. The case in which all empirical pairwise orderings are independent--evaluations. Japanese Psychological Research, 1978(b), 20, 105-114.
Takane, Y. Multidimensional successive categories scaling: A maximum likelihood method. Psychometrika, 1981, 46, 9-28.
Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika, 1952, 17, 401-419.
Wilks, S. S. Mathematical statistics. New York: Wiley, 1962.
Young, F. W. Scaling replicated conditional rank-order data. In D. R. Heise (Ed.), Sociological Methodology. San Francisco: Jossey-Bass, 1975.
Manuscript received 11/24/80
Final version received 6/18/81
APPENDIX

We show the equivalence of (10) to Luce's model for the first choice [Luce, 1959]. Assuming the multiplicative error model we may write (10) as

p(1,k) = 1 / [1 + Σ_l exp{c_k(ln d(1,k) − ln d(l+1,k))}]
       = 1 / [1 + Σ_l exp{ln (d(l+1,k)/d(1,k))^(−c_k)}]
       = 1 / [1 + Σ_l (d(l+1,k)/d(1,k))^(−c_k)],

which is Luce's model. The last equation can be derived by multiplying both the numerator and the denominator by (d(1,k))^(−c_k) and by replacing −c_k by s_k (< 0).