PSYCHOMETRIKA--VOL. 46, NO. 4, DECEMBER 1981

NONMETRIC MAXIMUM LIKELIHOOD MULTIDIMENSIONAL SCALING FROM DIRECTIONAL RANKINGS OF SIMILARITIES

YOSHIO TAKANE, MCGILL UNIVERSITY
and
J. DOUGLAS CARROLL, BELL LABORATORIES

A maximum likelihood estimation procedure is developed for multidimensional scaling when (dis)similarity measures are taken by ranking procedures such as the method of conditional rank orders or the method of triadic combinations. The central feature of these procedures may be termed directionality of ranking processes. That is, rank orderings are performed in a prescribed order by successive first choices. Those data have conventionally been analyzed by Shepard-Kruskal type nonmetric multidimensional scaling procedures. We propose, as a more appropriate alternative, a maximum likelihood method specifically designed for this type of data. A broader perspective on the present approach is given, which encompasses a wide variety of experimental methods for collecting dissimilarity data, including pair comparison methods (such as the method of tetrads) and the pick-M method of similarities. An example is given to illustrate various advantages of nonmetric maximum likelihood multidimensional scaling as a statistical method. At the moment the approach is limited to the case of one-mode two-way proximity data, but could be extended in a relatively straightforward way to two-mode two-way, two-mode three-way or even three-mode three-way data, under the assumption of such models as INDSCAL or the two- or three-way unfolding models.

Key words: rank orders, model selection, AIC, face perception.

Introduction

In ranking procedures such as the method of triadic combinations [Richardson, 1938] or the method of conditional rank orders [Young, 1975] the rank order judgments may be characterized as directional. For example, in the method of triadic combinations stimuli are presented in triads.
The subject is asked to decide which two of the three are most alike, and then which two are most different. Furthermore, a stimulus pair whose members are neither most alike nor most different is deduced from the previous two judgments to obtain a single ranking of three dissimilarities defined on a triad of stimuli. Thus, the ranking is conditional on the triad of stimuli, and it is directional since it is performed in a specific order. In the method of conditional rank orders one of n stimuli is designated as a standard stimulus. The subject is asked to choose the most similar stimulus to the standard among the n - 1 stimuli, and then, after this stimulus is eliminated, to choose the one which is now most similar to the standard among the n - 2 remaining stimuli, proceeding in this way until all n - 1 stimuli are completely rank ordered relative to the standard.

The first author's work was supported partly by the Natural Sciences and Engineering Research Council of Canada, grant number A6394. Portions of this research were done while the first author was at Bell Laboratories. MAXSCAL-4.1, a program to perform the computations described in this paper, can be obtained by writing to: Computing Information Service, Attention: Ms. Carole Scheiderman, Bell Laboratories, 600 Mountain Ave., Murray Hill, N.J. 07974. Thanks are due to Yukio Inukai, who generously let us use his stimuli in our experiment, and to Jim Ramsay for his helpful comments on an earlier draft of this paper. Confidence regions in Figures 2 and 3 were drawn by the program written by Jim Ramsay. We are also indebted to anonymous reviewers for their suggestions. Requests for reprints should be sent to Yoshio Takane, Department of Psychology, McGill University, 1205 Docteur Penfield Ave., Montreal, Quebec H3A 1B1, Canada.

0033-3123/81/1200-3044500.75/0 © 1981 The Psychometric Society
The whole process is repeated with a different stimulus used as a standard stimulus in turn until all n stimuli serve as a standard. Again the ranking process is directional (being performed from the smallest to the largest dissimilarity) and conditional upon the standard stimulus.

Dissimilarity data arising from ranking procedures have been analyzed either by Torgerson's [1952] classical multidimensional scaling (MDS) or by the type of nonmetric MDS originally introduced by Shepard and Kruskal [Shepard, 1962; Kruskal, 1964a, b; McGee, 1966; Guttman, 1968; Roskam, 1970; Young, 1975]. In this paper we propose, as a more appropriate alternative, a maximum likelihood estimation (MLE) method specifically designed for such directional rank order data. The principal advantage of the maximum likelihood method over other fitting procedures is that it permits various (asymptotically valid) statistical inferences, provided that the distributional assumptions are correct. For maximum likelihood MDS procedures for other types of dissimilarity data, see Ramsay [1977, 1978, 1980b] and Takane [1978a, b, 1981].

The Method

Our general approach in this paper may be called a "parametric approach" to nonmetric scaling. In this approach nonmetric data are viewed as incomplete data [Dempster, Laird & Rubin, 1977] conveying only ordinal information about distances. An unobserved metric process conveying complete information about distances is assumed to underlie the nonmetric data generation process, and a specific information reduction mechanism is postulated between the two kinds of processes. The likelihood function is specified for observed nonmetric data, which are related to distances based on some parametric assumptions about the underlying metric process.

The Likelihood Function

Let us assume that a set of n stimuli has an A-dimensional representation in a euclidean space.
That is, d_{ij}, the distance between stimuli i and j, is given by

    d_{ij} = \left[ \sum_{a=1}^{A} (x_{ia} - x_{ja})^2 \right]^{1/2},    (1)

where x_{ia} is the coordinate of stimulus i on dimension a. It is straightforward to extend the current approach to the INDSCAL model [Carroll & Chang, 1970]. However, in this paper we deliberately restrict our attention to the simple euclidean model defined above.

Let us further assume that d_{ij} as defined above is error-perturbed, and generates an error-perturbed metric process \tilde{\delta}^{(t)}_{ijk} for replication k and occasion t; i.e.,

    \tilde{\delta}^{(t)}_{ijk} = \phi(d_{ij}, e^{(t)}_{ijk}),    (2)

where e^{(t)}_{ijk} is an error random variable and where \phi is a joint function (to be made more explicit) of the distance and the error component. We consider the following two error models in this paper:

    \tilde{\delta}^{(t)}_{ijk} = d_{ij} + e^{(t)}_{ijk},  where e^{(t)}_{ijk} \sim N(0, \sigma_k^2/2),  (Additive error model)    (3)

and

    \tilde{\delta}^{(t)}_{ijk} = d_{ij} e^{(t)}_{ijk},  where \ln e^{(t)}_{ijk} \sim N(0, \sigma_k^2/2).  (Multiplicative error model)    (4)

The replication may be taken over a sample of individuals (i.e., replication k is individual k). The \sigma_k^2 allows for possible individual differences in dispersion. We may optionally set \sigma_k^2 = \sigma^2 for all k. These are basically the principal error models considered by Ramsay [1977] in his maximum likelihood approach to metric MDS. We refer the reader to Ramsay's paper for a discussion of the psychological and empirical rationale for these particular models.

Our problem is to specify the likelihood of an observed set of ranked dissimilarities given a set of distances which are error-perturbed in a specific way. A model of psychological processes by which complete metric information is "collapsed" into incomplete ordinal information must be an intrinsic part of the likelihood function. The problem of parameter estimation is then to find a set of distances with the structure defined in (1) such that they maximize the likelihood of the particular observed data.
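The distance model (1) and the two error models (3) and (4) can be sketched numerically. The following is a minimal illustration only, not the MAXSCAL-4.1 implementation; the stimulus configuration, seed, and function names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def distances(X):
    """Euclidean distances (1) between all stimulus pairs; X is (n, A)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def perturb(d, sigma_k, model="additive", rng=rng):
    """One draw of the error-perturbed metric process (2) under (3) or (4)."""
    e = rng.normal(0.0, np.sqrt(sigma_k ** 2 / 2.0), size=d.shape)
    if model == "additive":
        return d + e                 # (3): delta = d + e
    return d * np.exp(e)             # (4): delta = d * e, with ln e ~ N(0, sigma^2/2)

X = rng.normal(size=(5, 2))          # 5 hypothetical stimuli in A = 2 dimensions
D = distances(X)
delta = perturb(D, sigma_k=0.3, model="multiplicative")
```

Note that under the multiplicative model the perturbed dissimilarities stay positive, which is one empirical rationale for it.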
For expository purposes we discuss the specification of the likelihood function only for the method of conditional rank orders in some detail. (The extension to the method of triadic combinations is straightforward.) As noted earlier, the central feature of this method is the directionality of the ranking processes. Given that the ranking process is directional, a rank order may be regarded as resulting from successive first choices. The likelihood of a ranking is then defined by the product of the likelihoods of successive first choices, assuming that each successive first choice is made statistically independently of the others. (This assumption may seem unrealistically strong at first glance. However, as will be shown later, it follows from a much weaker assumption.) The joint likelihood of several rankings arising from different conditionalities may, in turn, be defined by the product of the likelihoods of those rankings. It is thus sufficient to specify the likelihood of a first choice.

Let d_{i(mk)} represent the distance between standard stimulus i and whatever stimulus is judged mth most similar to the standard in replication k. (If that stimulus is j, then d_{i(mk)} = d_{ij}.) Let \tilde{\delta}^{(t)}_{i(mk)} denote the random metric process corresponding to d_{i(mk)} generated at occasion t. The occasion in this case refers to each successive first choice. We assume that the probability p^{(1)}_{ik} that the dissimilarity corresponding to d_{i(1k)} is judged smallest among the n - 1 dissimilarities is equal to the probability that the random metric process \tilde{\delta}^{(1)}_{i(1k)} corresponding to d_{i(1k)} turns out to be the smallest of all n - 1 random processes at occasion 1. That is,

    p^{(1)}_{ik} = \Pr(\tilde{\delta}^{(1)}_{i(1k)} < \tilde{\delta}^{(1)}_{i(2k)}, \ldots, \tilde{\delta}^{(1)}_{i(1k)} < \tilde{\delta}^{(1)}_{i(n-1,k)}).    (5)

Define the (n - 2) x (n - 1) contrast matrix

    B^{(1)} = [ 1  -1   0  ...   0 ]
              [ 1   0  -1  ...   0 ]
              [ ...                ]
              [ 1   0   0  ...  -1 ],

and the vectors

    \tilde{\delta}^{(1)}_{ik} = (\tilde{\delta}^{(1)}_{i(1k)}, \ldots, \tilde{\delta}^{(1)}_{i(n-1,k)})'   and   d^{(1)}_{ik} = (d_{i(1k)}, \ldots, d_{i(n-1,k)})'.

Then under the additive error assumption p^{(1)}_{ik} can be more explicitly written as

    p^{(1)}_{ik} = \int_{R^{(1)}} f(z^{(1)}_{ik}) \, dz^{(1)}_{ik},    (6)

where

    z^{(1)}_{ik} = B^{(1)} \tilde{\delta}^{(1)}_{ik} \sim N(B^{(1)} d^{(1)}_{ik}, \Sigma^{(1)}_{ik}),    (7)

and where R^{(1)} is a multidimensional region such that z^{(1)}_{ik} < 0. The \Sigma^{(1)}_{ik} in (7) is the covariance of z^{(1)}_{ik}. In the case of the multiplicative error model (4), the elements of \tilde{\delta}^{(1)}_{ik} and d^{(1)}_{ik} should be redefined as \ln \tilde{\delta}^{(1)}_{i(mk)} and \ln d_{i(mk)}, respectively. If the elements of \tilde{\delta}^{(1)}_{ik} are independently normally distributed with variance \sigma_k^2/2, i.e.,

    \tilde{\Sigma}^{(1)}_{ik} = (\sigma_k^2/2) I,    (8)

then the covariance of z^{(1)}_{ik} simplifies to

    \Sigma^{(1)}_{ik} = B^{(1)} \tilde{\Sigma}^{(1)}_{ik} B^{(1)\prime} = (\sigma_k^2/2) B^{(1)} B^{(1)\prime}.    (9)

The important consequence of (9) is that p^{(1)}_{ik} in (6) can be well approximated by the multivariate logistic distribution [Bock, 1975, pp. 521-522],

    p^{(1)}_{ik} = \left[ 1 + \sum_{q=1}^{n-2} \exp(c_k b^{(1)\prime}_q d^{(1)}_{ik}) \right]^{-1},    (10)

where b^{(1)\prime}_q is the qth row vector of B^{(1)} (the prime indicates a row vector), and where c_k is a dispersion parameter which is approximately \pi/(3^{1/2} \sigma_k). This distribution is much easier to evaluate than the multivariate normal integral required in (6). The p^{(1)}_{ik} in (10) is interesting in its own sake, since in the case of the multiplicative error model it is equivalent to Luce's model for the first choice [Luce, 1959]. (See Appendix.)

The independence assumption in (8), however, may be restrictive in practical situations. Fortunately, we can relax (8) into

    \tilde{\Sigma}^{(1)}_{ik} = (\sigma_k^2/2) I + u^{(1)}_{ik} 1' + 1 u^{(1)\prime}_{ik},    (11)

where u^{(1)}_{ik} is an arbitrary vector and 1 is a vector of ones [Takane, 1978a]. The same covariance matrix given in (9) can be derived from this weaker assumption as well. This can easily be seen by noting that B^{(1)} 1 = 0, so that the second and the third terms on the right hand side of (11) vanish when \tilde{\Sigma}^{(1)}_{ik} is pre- and postmultiplied by B^{(1)} and B^{(1)\prime}, respectively. The fact that (11) is sufficient to obtain (9) has considerable import, since the former is much weaker than the latter, and is consequently more realistic in practical situations.
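To make the relation between the normal first-choice probability (5)-(6) and its logistic approximation (10) concrete, the sketch below evaluates the normal version by Monte Carlo under the additive model and compares it with the closed-form logistic version. The distances, dispersion value, and function names are illustrative assumptions, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_first_choice_logistic(d, c_k):
    """(10): P(dissimilarity 1 judged smallest); row q of B gives d[0] - d[q]."""
    return 1.0 / (1.0 + np.exp(c_k * (d[0] - d[1:])).sum())

def p_first_choice_normal(d, sigma_k, draws=200_000):
    """Monte Carlo version of (5)-(6) under the additive error model (3)."""
    delta = d + rng.normal(0.0, np.sqrt(sigma_k ** 2 / 2.0), size=(draws, len(d)))
    return (delta.argmin(axis=1) == 0).mean()

d = np.array([0.8, 1.0, 1.3])            # distances to the standard, nearest first
sigma = 0.4
c = np.pi / (np.sqrt(3.0) * sigma)       # c_k ~ pi / (3^{1/2} sigma_k)
p_log = p_first_choice_logistic(d, c)
p_mc = p_first_choice_normal(d, sigma)   # the two should agree closely
```

With these values the two probabilities differ by well under two percentage points, which is the sense in which (10) "well approximates" (6).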
Note that (8) follows as a special case of (11) by setting u^{(1)}_{ik} = 0. Note also that an equal covariance case also follows from (11) by setting u^{(1)}_{ik} to be a constant vector. Essentially the same condition as (11) has been postulated by Huynh and Feldt [1970] as the minimal condition to be satisfied in a univariate repeated measures analysis of variance. Indeed, all mutually orthogonal contrasts are statistically independent under this assumption.

In order to obtain p^{(2)}_{ik} (the probability that the stimulus corresponding to d_{i(2k)} is judged the next most similar stimulus to i), we define B^{(2)} by eliminating the last row and the last column of B^{(1)}, d^{(2)}_{ik} by eliminating the first element of d^{(1)}_{ik}, and

    \tilde{\delta}^{(2)}_{ik} = (\tilde{\delta}^{(2)}_{i(2k)}, \ldots, \tilde{\delta}^{(2)}_{i(n-1,k)})'.

We may then express p^{(2)}_{ik} in a form analogous to (6) and (7). Note that it is assumed that an entirely new set of random processes \tilde{\delta}^{(2)}_{i(mk)} (m = 2, \ldots, n - 1) is generated for this judgment (as indicated by the new occasion index). In general, we have

    p^{(m)}_{ik} = \left[ 1 + \sum_{q=1}^{n-m-1} \exp(c_k b^{(m)\prime}_q d^{(m)}_{ik}) \right]^{-1}    (12)

for the mth first choice after the m - 1 most similar stimuli are eliminated from the comparison set. The b^{(m)\prime}_q is the qth row vector of B^{(m)}, which is obtained by deleting the last m - 1 rows and columns of B^{(1)}. For m = n - 1 we have p^{(n-1)}_{ik} = 1. We may then state p_{ik}, the probability of a complete ranking of n - 1 stimuli with respect to stimulus i, as

    p_{ik} = \prod_{m=1}^{n-1} p^{(m)}_{ik}.    (13)

This assumes the statistical independence of p^{(m)}_{ik} and p^{(m')}_{ik} (m \neq m'). The statistical independence, of course, can be obtained if all elements in \tilde{\delta}^{(m)}_{ik} and those in \tilde{\delta}^{(m')}_{ik} are statistically independent. However, it can be deduced from a much weaker assumption, namely

    \tilde{\Sigma}^{(m,m')}_{ik} = 1u' + v1',    (14)

where \tilde{\Sigma}^{(m,m')}_{ik} is the covariance matrix between \tilde{\delta}^{(m)}_{ik} and \tilde{\delta}^{(m')}_{ik}, and u and v are arbitrary vectors, but with appropriate numbers of components. The 1 is a vector of unities, but the two 1's in (14) are different in dimensionality.

Under the assumption of (14), B^{(m)} \tilde{\Sigma}^{(m,m')}_{ik} B^{(m')\prime} vanishes, so that z^{(m)}_{ik} = B^{(m)} \tilde{\delta}^{(m)}_{ik} and z^{(m')}_{ik} = B^{(m')} \tilde{\delta}^{(m')}_{ik} are independent of each other, and consequently so are p^{(m)}_{ik} and p^{(m')}_{ik}. The joint likelihood of several conditional rankings is now stated as

    L_k = \prod_{i=1}^{n} p_{ik},    (15)

and the joint likelihood for the entire set of observations as

    L = \prod_{k} L_k.    (16)

Again the statistical independence between p^{(m)}_{ik} and p^{(m')}_{i'k} (i \neq i') can be deduced if the covariance matrix between \tilde{\delta}^{(m)}_{ik} and \tilde{\delta}^{(m')}_{i'k} has a structure similar to (14). On the other hand, \tilde{\delta}^{(m)}_{ik} and \tilde{\delta}^{(m')}_{ik'} (k \neq k') may be assumed independent, if replications are taken over different individuals.

The (log of) L can be maximized by various numerical techniques. MAXSCAL-4.1, the computer program which performs the necessary computations, uses Fisher's scoring algorithm for maximization (see Rao, 1952, for example). A detailed description of this algorithm as applied to similar situations, as well as of its relation to the Gauss-Newton method for weighted least squares problems, can be found in Takane [1978a]. The necessary derivatives for this optimization are given as follows. For the additive error model we may write

    p^{(m)}_{ik} = \frac{\exp(s_k d_{i(mk)})}{u^{(m)}_{ik}},    (17)

where s_k = -c_k and u^{(m)}_{ik} = \sum_{j=m}^{n-1} \exp(s_k d_{i(jk)}). Then

    \frac{\partial \ln p^{(m)}_{ik}}{\partial x_{qa}} = s_k \frac{\partial d_{i(mk)}}{\partial x_{qa}} - \frac{1}{u^{(m)}_{ik}} \sum_{j=m}^{n-1} \frac{\partial \exp(s_k d_{i(jk)})}{\partial x_{qa}},    (18)

and

    \frac{\partial \ln p^{(m)}_{ik}}{\partial s_k} = d_{i(mk)} - \frac{1}{u^{(m)}_{ik}} \sum_{j=m}^{n-1} d_{i(jk)} \exp(s_k d_{i(jk)}),    (19)

where

    \frac{\partial \exp(s_k d_{i(jk)})}{\partial x_{qa}} = s_k \exp(s_k d_{i(jk)}) \frac{\partial d_{i(jk)}}{\partial x_{qa}},    (20)

and

    \frac{\partial d_{i(jk)}}{\partial x_{qa}} = \frac{(\delta_{iq} - \delta_{(jk)q})(x_{ia} - x_{(jk)a})}{d_{i(jk)}}.    (21)

In the above formula \delta_{iq} is the Kronecker delta (i.e., \delta_{iq} = 1 when i = q, and \delta_{iq} = 0 when i \neq q). The parenthesized subscript (jk) in \delta_{(jk)q} should be replaced by a single stimulus index; if that stimulus is j, then \delta_{(jk)q} = \delta_{jq}, and \delta_{jq} is as defined above.
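Equations (12), (13) and (17) say that, under the logistic approximation, a complete conditional ranking is a chain of softmax-type first choices over the shrinking comparison set. A hedged sketch of that chain follows (our own function name and toy distances; s_k = -c_k as in (17)):

```python
import numpy as np

def ranking_log_likelihood(d_ranked, c_k):
    """
    Log likelihood (13) of one conditional ranking under (12)/(17).
    d_ranked holds d_{i(1k)}, ..., d_{i(n-1,k)}: the distances to the standard,
    ordered as the subject ranked them (most similar first).
    """
    s_k = -c_k                           # (17): s_k = -c_k
    logp = 0.0
    for m in range(len(d_ranked) - 1):   # p^{(n-1)}_{ik} = 1, so stop one early
        u = np.exp(s_k * d_ranked[m:])   # remaining comparison set, eq. (17)
        logp += np.log(u[0] / u.sum())   # mth successive first choice
    return logp
```

With c_k = 0 every ordering is equally likely; as c_k grows, rankings that agree with the distances receive nearly all the probability mass.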
For the multiplicative error model we have

    p^{(m)}_{ik} = \frac{(d_{i(mk)})^{s_k}}{v^{(m)}_{ik}},    (22)

where v^{(m)}_{ik} = \sum_{j=m}^{n-1} (d_{i(jk)})^{s_k}. We obtain

    \frac{\partial \ln p^{(m)}_{ik}}{\partial x_{qa}} = \frac{s_k}{d_{i(mk)}} \frac{\partial d_{i(mk)}}{\partial x_{qa}} - \frac{1}{v^{(m)}_{ik}} \sum_{j=m}^{n-1} \frac{\partial (d_{i(jk)})^{s_k}}{\partial x_{qa}},    (23)

and

    \frac{\partial \ln p^{(m)}_{ik}}{\partial s_k} = \ln d_{i(mk)} - \frac{1}{v^{(m)}_{ik}} \sum_{j=m}^{n-1} (\ln d_{i(jk)})(d_{i(jk)})^{s_k},    (24)

where

    \frac{\partial (d_{i(jk)})^{s_k}}{\partial x_{qa}} = s_k (d_{i(jk)})^{s_k - 1} \frac{\partial d_{i(jk)}}{\partial x_{qa}}.    (25)

The expression for \partial d_{i(jk)}/\partial x_{qa} is given in (21).

Little modification is necessary to extend the above formulation to the method of triadic combinations. In this method conditional rank orders are taken on dissimilarities defined on triads of stimuli rather than on dissimilarities with a common standard stimulus. Also, each conditional ranking is performed in the order of the smallest, then the largest (and then the intermediate) dissimilarity, rather than from the smallest to the largest dissimilarity (as in the method of conditional rank orders). However, these differences are minor, and can be easily accommodated within the present framework.

Missing Data

Each conditional ranking does not have to be complete in the method of conditional rank orders, and there are two possible cases. The major distinction lies in whether or not the experimenter has experimental control over the missing data elements.

In one case the experimenter presents all n - 1 comparison stimuli, but obtains only n* first choices. In this case we may simply take the product of the likelihoods of the n* successive first choices, while the likelihood of each successive first choice remains the same as before. That is,

    p_{ik} = \prod_{m=1}^{n^*} p^{(m)}_{ik},    (26)

where p^{(m)}_{ik} is given in (12).

On the other hand, the experimenter may restrict the comparison stimuli to a subset of the n - 1 stimuli, and obtain a rank ordering among only the restricted set of comparison stimuli. Let n* be the number of comparison stimuli. We define the probability of the mth first choice (m < n*) in a manner analogous to (12), but excluding those terms not related to the n* comparison stimuli. That is,

    p^{(m)}_{ik} = \left[ 1 + \sum_{q=1}^{n^* - m} \exp(c_k b^{(m)\prime}_q d^{(m)}_{ik}) \right]^{-1}.    (27)

The likelihood of a ranking is defined as in (26). Note that this is equivalent to redefining the conditionality of a ranking (the set of dissimilarities on which a ranking is obtained).

Tied Observations

Since a continuous distribution is assumed on distances, tied ranks should not occur theoretically, and it may be possible, depending on the experimental procedure we use, to completely avoid tied ranks. Nevertheless it is desirable for a nonmetric multidimensional scaling procedure to have some provisions for the treatment of tied observations, since they may occur frequently in practice. There are two cases to be distinguished: one we call a weak tie and the other we call a strong tie. Two dissimilarities are said to be in a weak tie when they are not explicitly compared and consequently not rank ordered. For example, an experimental procedure may require the subject to choose the n* most similar stimuli to a standard, but not necessarily to obtain a rank ordering among the n* picked stimuli. Then the n* dissimilarities (corresponding to the n* picked stimuli) are "tied" in this sense. A strong tie, on the other hand, is characterized by empirical indistinguishability. That is, two dissimilarities are explicitly compared, but the subject is unable to rank order them.

The two kinds of ties should be treated differently, analogously to Kruskal's [1964b] primary and secondary approaches to ties. In the primary approach no constraints are imposed among distances corresponding to tied observations except that they should satisfy, as much as possible, the order restrictions imposed by the dissimilarities larger and smaller than themselves. This approach is suitable for the treatment of weak ties. In the secondary approach, on the other hand, distances corresponding to tied observations are required to be as close to each other as possible.
This treatment of ties is suitable for strong ties, since dissimilarities corresponding to strong ties are deemed empirically indistinguishable.

Suppose the dissimilarities corresponding to d^{(m)} and d^{(m+1)} are tied. Then the primary approach defines p^{(m)} (the probability of the mth first choice in a certain ranking) excluding the term related to d^{(m+1)}, and p^{(m+1)} excluding the term related to d^{(m)}. That is, d^{(m+1)} does not have any effect on p^{(m)}, and d^{(m)}, in turn, does not have any effect on p^{(m+1)}. The secondary approach, on the other hand, defines both p^{(m)} and p^{(m+1)} including both terms related to d^{(m)} and d^{(m+1)}. In this case d^{(m)} and d^{(m+1)} are bound to be close to each other, if p^{(m)} x p^{(m+1)} is to be maximized.

In general, conditionalities of rankings can be of any form. The method of conditional rank orders and the method of triadic combinations are but two special cases of such instances. Dissimilarities in different conditionalities are not explicitly compared, and thus are considered to be in weak tie relations. The primary approach to weak ties described above is consistent with our previous treatment of conditionalities.

Generalizations

We may generalize our basic results on directional ranking procedures to other experimental procedures, including such procedures as the method of tetrads and the pick-M method of similarities. In the method of tetrads four stimuli are presented in two pairs, and the subject is asked to judge which pair of stimuli is more similar (or dissimilar). In the pick-M method of similarities, the subject is required to pick out the M most similar stimuli to a standard stimulus from a well defined set of alternative stimuli. In what follows we show that these procedures are in fact two special cases of directional ranking procedures. We have kept our exposition as succinct as possible, so that nonmathematical readers may skip this entire section without loss of continuity.
Let S be the set of dissimilarities defined on a set of pairs of stimuli. We define three kinds of relations on S x S: the order relation, the strong tie relation and the weak tie relation. We assume that for each pair of dissimilarities one of the three relations holds; i.e., for a, b in S,

i. a > b or a < b (order relation),
ii. a ≈ b (strong tie),
iii. a ~ b (weak tie).

Let S_i and S_j be subsets of S. We write S_i > S_j (S_i < S_j), S_i ≈ S_j or S_i ~ S_j when, for all a in S_i and all b in S_j, a > b (a < b), a ≈ b or a ~ b, respectively. The S_i and S_j may not be disjoint. Define the set of separate conditionalities Q, whose ith member is denoted by S_i ⊂ S; for all S_i, S_j in Q (i ≠ j), S_i ~ S_j. Let the number of elements in S_i be L. The procedure which requires the subject to rank order M elements (M ≤ L) in S_i partitions S_i into subsets S_i^{(M)} and S_i^{(L-M)}, where S_i^{(L-M)} is possibly null (when M = L). The S_i^{(M)} is the set of the M smallest dissimilarities in S_i, while S_i^{(L-M)} is the set of the L - M largest dissimilarities in S_i. Obviously S_i^{(M)} < S_i^{(L-M)}. Either the order or strong tie relations are defined among the elements of S_i^{(M)}, whereas the weak tie relations are defined among the elements of S_i^{(L-M)}.

With appropriate choice of the members of S_i we have the method of tetrads (including the method of triads as a special case) when L = M = 2, the method of triadic combinations when L = M = 3, and the method of conditional rank orders when L = M = n - 1 (where n is the number of stimuli). The two incomplete (missing data) cases of the method of conditional rank orders can be obtained by setting M < L = n - 1 and M = L < n - 1. (Of course, we may set M < L < n - 1 to generate yet another variation of the method of conditional rank orders.)

With minor modification the above formulation can be extended to the pick-M method of similarities. This method also requires the subject to partition S_i into two disjoint subsets, S_i^{(M)} and S_i^{(L-M)}. However, this time S_i^{(L-M)} cannot be null (i.e., M < L). We have S_i^{(M)} < S_i^{(L-M)} as before.
However, the weak tie relation is defined among the elements of S_i^{(M)} as well as among the elements of S_i^{(L-M)}.

A whole variety of experimental procedures, other than those already mentioned, can be generated by specifying different Q's. MAXSCAL-4.1 is capable of performing MDS for any definition of Q, provided that each S_i is at most matrix conditional (i.e., dissimilarities are not compared across different replications). We call this general case, where Q can be arbitrary, the method of general directional rank orders.

Note, however, that the above discussion is exclusively based on the algebraic structures of data obtained by the various data collection methods. Little statistical consideration is taken into account, which may cause some problems in justifying the statistical assumptions underlying the methods. For example, we must assume statistical independence of weak ties, which may or may not be valid empirically.

Results and Discussion

As has been alluded to earlier, maximum likelihood estimation offers various advantages over other fitting procedures. The asymptotic chi square goodness of fit statistic can be readily derived from the general principle of likelihood ratio [Wilks, 1962]. In addition the AIC statistic [Akaike, 1974] can be used for identifying a best fitting model where the asymptotic chi square test is not feasible. Furthermore, knowledge of the asymptotic behavior of maximum likelihood estimators allows us to draw asymptotic confidence regions (of a prescribed size) surrounding estimated stimulus points to indicate the degree of precision of the estimation. In this section we demonstrate these features of maximum likelihood estimation in the specific context of multidimensional scaling.

Data

A small experiment was conducted to collect relevant data. Seventeen schematic faces (Figure 1) were used in this experiment.
Those stimuli were originally used by Inukai, Nakamura and Shinohara [Note 1] in their study to identify determinants of face perception. They were constructed by combining two of the most important determinants of facial expressions, namely the curvature of the lips and the curvature of the eyes. The stimuli, 4 cm in height and 3 cm in width, were each pasted on a 7.5 cm x 12.5 cm index card. Dissimilarity judgments were obtained from five subjects (two male and three female university students) by the method of conditional rank orders. The order in which each stimulus served as a standard stimulus was randomized over subjects. An average subject took 70 minutes to complete the task.

Identifying the Best Fitting Model

In order to choose the best fitting model the data were analyzed under all combinations of two error assumptions (the additive and the multiplicative error models), two conditions on the error variance (constant or variable over subjects) and two dimensionalities (two and three dimensions). In certain cases four and five dimensional solutions were also obtained. The results are summarized in Table 1. Each cell of the table contains three numbers: the top one is the log likelihood of the model, and the one in the middle is the value of the AIC statistic [Akaike, 1974], which is defined by

    AIC = -2 ln L + 2 d.f.,

where the d.f. is the effective number of parameters in the model, given at the bottom of the cell (enclosed in parentheses). Note that the d.f. used here are the degrees of freedom of the model, not the error degrees of freedom. An important feature of AIC is that it explicitly takes into account the number of model parameters in evaluating the goodness of fit of a model used to describe the data. The AIC is a badness-of-fit index, so that a smaller value indicates a better fit. A relatively nontechnical discussion of this statistic may be found in Takane [1981].
The AIC statistic may be used when the models to be compared are not necessarily hierarchically structured. This feature is very convenient when we compare, for example, the goodness of fit of the two error models, since the usual asymptotic chi square test is not feasible for such comparisons. The d.f. of the model is calculated by nA - A(A + 1)/2 + r, where r is the number of estimated dispersion parameters. This r is zero when the dispersions are assumed constant over subjects, and is equal to N - 1 when they are assumed to vary over subjects. This is so because there is a trade-off between the size of the stimulus configuration and the dispersion parameters, and consequently we can either fix the size of the configuration or fix one of the dispersion parameters.

FIGURE 1. The stimulus configuration in terms of the curvatures of lips and eyes (from Inukai et al., Note 1).

The AIC is unfortunately based on asymptotic properties of maximum likelihood estimators (as is the asymptotic chi square). This means that, strictly speaking, it is only applicable to large-sample data. Although in all cases we shall deal with in this paper we have at least ten times as many observations as there are parameters to be estimated, we are not entirely sure whether this is sufficient. Furthermore, with models in which the number of parameters increases with the number of replications (e.g., the model which allows for individual differences in dispersion), data are never sufficiently large to warrant the asymptotic properties of MLE. Consequently, we have to rely on some heuristic rule in the following discussion. Work [Takane & Carroll, Note 2] is in progress, however, to develop a precise modification rule for the basic formula of AIC when the asymptotic properties of MLE may not be exploited.

Table 1
Summary of the results of MAXSCAL-4.1 analyses of the face data
(each cell: ln(L), AIC, (d.f.))

                           Unconstrained solutions                 Constrained solutions
                        dim 5     dim 4     dim 3     dim 2        dim 4     dim 2
Additive error
  Constant variance       --        --     -1737.9   -1810.6        --        --
                                            3565.8    3683.1
                                             (45)      (31)
  Nonconstant variance -1668.6   -1680.9   -1711.5   -1784.8     -1755.5   -1847.5
                        3485.2    3485.8    3521.1    3639.6      3554.9    3727.0
                         (74)      (62)      (49)      (35)        (22)      (16)
Multiplicative error
  Constant variance       --        --     -1748.9   -1817.6        --        --
                                            3587.9    3697.2
                                             (45)      (31)
  Nonconstant variance    --        --     -1720.1   -1797.6        --        --
                                            3538.2    3665.3
                                             (49)      (35)

Choosing a model with a minimum AIC value, when models are in fact hierarchically structured, is equivalent to choosing a more restricted model whenever the asymptotic chi square is smaller than 2 x d.f. (where the d.f. is now the difference in the numbers of parameters between the two models being compared), and to choosing a less restricted model otherwise. Ramsay [1980a] has shown that the asymptotic chi square tends to be inflated for small samples, but that there is a remarkable regularity in the way it is inflated. Noting this fact he has developed a simple correction formula for the asymptotic chi square, which is obtained by simply multiplying the chi square criterion value by some constant. This constant should represent how the asymptotic chi square tends to be inflated under specific circumstances. Interestingly, this constant never seems to exceed 3. Furthermore, it reduces to approximately 1.5 when there are 15 stimuli and five complete replications. We may exploit this fact to construct a tentative rule of thumb. That is, we favor a less restricted model whenever the asymptotic chi square exceeds 3 x d.f. (3 = 1.5 x 2). Otherwise we choose a more restricted model. This decision rule is equivalent to adding 3 x d.f. (rather than 2 x d.f.) to -2 ln L in the formula of AIC. (Note, however, that the values of AIC in Table 1 are evaluated according to the unmodified formula.)
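The AIC comparison can be reproduced directly from the tabled ln(L) and d.f. values. The sketch below uses the additive-error, nonconstant-variance, unconstrained entries of Table 1 and applies both the unmodified formula and the 3 x d.f. rule of thumb just described:

```python
# ln(L) and d.f. transcribed from Table 1 (additive error, nonconstant
# variance, unconstrained solutions); AIC = -2 ln L + 2 d.f. as in the text.
models = {5: (-1668.6, 74), 4: (-1680.9, 62), 3: (-1711.5, 49), 2: (-1784.8, 35)}

def aic(lnL, df, penalty=2.0):
    """Unmodified AIC with penalty=2.0; the small-sample rule of thumb uses 3.0."""
    return -2.0 * lnL + penalty * df

best = min(models, key=lambda A: aic(*models[A]))                    # dim 5
best_adj = min(models, key=lambda A: aic(*models[A], penalty=3.0))   # dim 4
```

Under the unmodified formula the five dimensional solution attains the minimum AIC (3485.2), while the 3 x d.f. penalty shifts the minimum to the four dimensional solution.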
With the above qualification in mind, let us start with the comparison between the additive error model and the multiplicative error model. We see that in all four corresponding solutions (two conditions on the variance structure times two dimensionalities) obtained under these two error models, the AIC is consistently smaller for the additive error model. For these data the additive error model seems to fit better than the multiplicative error model. Our experience with other data sets also indicates that the additive error model is generally superior to the multiplicative error model when the type of judgment involved is a direct comparison of two or more dissimilarities [Takane, 1978b], though the converse is true for rating judgments [Takane, 1981].

Assuming that the additive error model is more appropriate, we may now compare the constant variance and the nonconstant variance assumptions. For both the two and three dimensional solutions the values of AIC are smaller under the nonconstant variance assumption (\chi^2 = 51.6 with 4 d.f. for the two dimensional solution, and \chi^2 = 52.8 with 4 d.f. for the three dimensional solution). Since the values of the asymptotic chi square are both more than ten times larger than the corresponding d.f., it may safely be concluded that there are substantial individual differences in dispersions.

We are now in a position to determine the appropriate dimensionality of the representation space. The comparison between the two and three dimensional solutions in the nonconstant variance additive error model reveals that the third dimension is clearly significant (\chi^2 = 146.6 with 14 d.f.), implying that the appropriate dimensionality is at least three. It is possible that the dimensionality is even higher, so four and five dimensional solutions were also obtained under the equivalent conditions. The fourth dimension seems to be significant (\chi^2 = 61.3 with 13 d.f.), while the fifth dimension is not.
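The chi square values quoted in this comparison are likelihood ratio statistics computed from the ln(L) entries of Table 1, \chi^2 = 2(ln L_full - ln L_restricted), with d.f. equal to the difference in parameter counts. A sketch:

```python
# Likelihood ratio statistics from the ln(L) values in Table 1.
def lr_chi2(lnL_full, lnL_restricted):
    return 2.0 * (lnL_full - lnL_restricted)

# Nonconstant vs constant variance (additive error):
chi2_2d = lr_chi2(-1784.8, -1810.6)       # 51.6 on 35 - 31 = 4 d.f.
chi2_3d = lr_chi2(-1711.5, -1737.9)       # 52.8 on 49 - 45 = 4 d.f.
# Dimensionality (additive error, nonconstant variance):
chi2_3_vs_2 = lr_chi2(-1711.5, -1784.8)   # 146.6 on 49 - 35 = 14 d.f.
chi2_4_vs_3 = lr_chi2(-1680.9, -1711.5)   # 61.2 from the rounded table entries
                                          # (quoted as 61.3 in the text), 13 d.f.
chi2_5_vs_4 = lr_chi2(-1668.6, -1680.9)   # 24.6 on 62 - 74 -> 12 d.f.
```

Only chi2_5_vs_4 falls below the 3 x d.f. threshold of the rule of thumb, which is why the fifth dimension is judged nonsignificant.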
The value of the asymptotic chi square (= 24.6) representing the difference between the four and five dimensional solutions is only slightly larger than twice the corresponding d.f. (= 12). Note that the five dimensional solution still has the minimum AIC value according to the original formula, but the four dimensional solution would have been the minimum AIC solution if we had taken into account the correction factor for small samples (i.e., if we had added 3 x d.f. to -2 ln L instead of 2 x d.f.). The four dimensional solution is depicted in Figures 2 and 3 for selected pairs of dimensions, along with 95% asymptotic confidence regions for the estimated points. The first two dimensions roughly correspond with the two defining properties of the stimuli; dimension 1 represents the lips dimension, while dimension 2 represents the eyes dimension. Although the stimuli are physically two dimensional, the third and the fourth dimensions are clearly interpretable. While it is possible to interpret the unrotated third and fourth dimensions (designated as dimension 3 and dimension 4, respectively, in Figure 3), the interpretation would be much more straightforward if they were rotated about 60 degrees counterclockwise, as indicated in Figure 3. In the figure the new dimensions are designated as dimension 3' and dimension 4'. (The dimensions are ordered in terms of the amount of stimulus variability they can account for.) Dimension 3' represents "skew-symmetry" about the eyes dimension. It can be seen in Figure 3 that the stimuli which occupy symmetric locations with respect to the neutral eye curvature level (i.e., flat eyes) in Figure 1 take approximately equal coordinate values on dimension 3'. For example, stimuli 1 and 17 are very close on this dimension, since they have exactly opposite curvatures of eyes. Dimension 4' represents essentially the same thing for the lips dimension (i.e., it represents "skew-symmetry" about the lips dimension).
Stimuli 3 and 15, for example, take approximately equal coordinate values on this dimension.

FIGURE 2. The two-dimensional plot of the best fitting four dimensional solution from MAXSCAL-4.1: Dimension 1 versus Dimension 2.

FIGURE 3. The two-dimensional plot of the best fitting four dimensional solution from MAXSCAL-4.1: Dimension 3 versus Dimension 4.

Curvatures equal in absolute value but opposite in sign are perceived to be more similar than would be predicted from linear dimensions of curvature. The linear dimensions are folded (or bent) at the neutral curvature levels in order to accommodate the perceived similarities between "skew-symmetric" curvatures. Figure 4 depicts how the two defining properties of the stimuli are folded (or bent) at the neutral curvature levels. (The top figure is the plot of dimension 1 against dimension 4'; the bottom one is the plot of dimension 2 against dimension 3'. In making these plots, dimensions 1 and 2 were rotated slightly, about 10 degrees clockwise in the plane spanned by these dimensions, so that they more closely match the two defining physical dimensions of the stimuli.) It may be observed in Figure 4 that there is a subtle difference in the ways in which the two dimensions are folded (or bent). For the lips dimension the bend is gradual, while for the eyes dimension it is rather sharp. At the moment we are not sure whether this difference has any psychological significance. A "folded" dimension may be interpreted psychologically as representing "intensity," analogous in some way to the social utility dimension discussed by Coombs and Kao [1960]. On the other hand, it may be just an artifact of the non-Euclidean nature of the perceptual space.
For example, as pointed out by Chang and Carroll [1980] in the context of an individual differences MDS analysis of color normal and color deficient subjects, possible idiosyncratic monotonic transformations of a common stimulus dimension by different subjects may well give rise to a "folded" dimension like the ones we have observed in our study. At present, however, there is no definite evidence to favor one over the other.

FIGURE 4. The plots of the "folded" dimensions: Dimension 1 versus dimension 4' (top) and dimension 2 versus dimension 3' (bottom).

The important point is that a "folded" dimension is a rule rather than an exception [Chang & Carroll, 1980; Takane, 1981]. Since the stimuli were constructed by incomplete factorial combinations of two physical properties, we may formally test whether those physical properties can account for the subjective dissimilarities among the stimuli. This hypothesis is geometrically very similar to the configuration depicted in Figure 1, except that equal intervals are not necessarily assumed between adjacent curvature levels. We may add the "skew-symmetric" hypothesis to obtain a four dimensional hypothesis. Dimensions 1 and 2 represent the curvature of lips and the curvature of eyes, respectively. The third dimension represents the "skew-symmetry" about the eyes dimension. On this dimension the coordinates are assumed equal for stimuli 3, 6, 9, 12 and 15; stimuli 5, 8, 10 and 13; stimuli 2, 4, 7, 11, 14 and 16; and stimuli 1 and 17, respectively. Similarly, the fourth dimension represents the "skew-symmetry" about the lips dimension. The coordinates for stimuli 1, 4, 9, 14 and 17; stimuli 5, 8, 10 and 13; stimuli 2, 6, 7, 11, 12 and 16; and stimuli 3 and 15 are assumed equal on this dimension.
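The equality constraints just stated can be encoded by mapping each stimulus to a shared parameter. The sketch below is our own illustration of that encoding, not the MAXSCAL-4.1 implementation; the groupings are those stated above, and the parameter values are arbitrary.

```python
# A sketch of how the "skew-symmetry" equality constraints can be encoded:
# each stimulus is mapped to a shared parameter index, so coordinates that are
# hypothesized equal are represented by a single free parameter.

dim3_groups = [[3, 6, 9, 12, 15], [5, 8, 10, 13], [2, 4, 7, 11, 14, 16], [1, 17]]
dim4_groups = [[1, 4, 9, 14, 17], [5, 8, 10, 13], [2, 6, 7, 11, 12, 16], [3, 15]]

def group_index(groups):
    """Map each stimulus number to the index of the shared parameter it uses."""
    return {s: g for g, members in enumerate(groups) for s in members}

def coordinates(groups, params):
    """Expand one free parameter per group into per-stimulus coordinates."""
    idx = group_index(groups)
    return {s: params[g] for s, g in sorted(idx.items())}

# Four free parameters generate all 17 coordinates on dimension 3:
coords = coordinates(dim3_groups, [0.6, 0.2, -0.1, 1.0])   # illustrative values
print(coords[1] == coords[17])   # stimuli 1 and 17 share a coordinate -> True
```

With this encoding, the four dimensional constrained hypothesis uses only four free coordinates per "skew-symmetry" dimension instead of seventeen.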
Coordinates which are assumed equal were treated as if they were single parameters in the estimation procedure. These hypotheses were fitted under the nonconstant variance additive error assumption. Of the two constrained solutions, the four dimensional solution has the minimum AIC value of 3554.9 (chi square = 184.0 with 6 d.f.). However, this value is still substantially larger than that of the corresponding unconstrained solution. The asymptotic chi square representing the difference between the two solutions is 149.2, which is more than three times its corresponding d.f. (= 40).

Asymptotic Confidence Regions

The inverse of the information matrix evaluated at the maximum likelihood estimates of parameters is known to give asymptotic variance and covariance estimates of the estimated parameters. It has been shown by Ramsay [1978] that the regular inverse may be replaced by the Moore-Penrose inverse when the information matrix is singular due to nonuniqueness of parameters in multidimensional scaling. Exploiting this fact and the asymptotic normality of the maximum likelihood estimators, we may draw asymptotic confidence regions for any subsets of parameters. In MDS we are typically interested in confidence regions drawn separately for each estimated point. In the case of a four dimensional configuration, each confidence region forms a four dimensional hyperellipsoid, which can be orthogonally projected onto two dimensional subspaces, each formed by two of the four dimensions. The projection of a hyperellipsoid forms an ellipse like those depicted in Figures 2 and 3. In drawing those ellipses a small-sample correction factor, developed by Ramsay for his MULTISCALE [Ramsay, 1980a], was applied. It generally has the effect of making them somewhat larger. Unfortunately, it is impossible to visualize the four dimensional hyperellipsoid based on these ellipses.

Discussion

We have seen an example of the analyses which can be performed by MAXSCAL-4.1.
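Before turning to broader issues, the confidence-region construction of the preceding section can be sketched in a few lines. The information matrix below is a small illustrative one (not taken from our analysis), and the code shows only the step from a point's 2 x 2 covariance block to the axes of its projected 95% ellipse.

```python
# A sketch of the confidence-region computation: take the Moore-Penrose inverse
# of the information matrix as the asymptotic covariance, extract the 2x2 block
# for one point's pair of coordinates, and turn it into ellipse axes.
# The information matrix here is illustrative, not from the paper.
import numpy as np

def ellipse_from_block(cov_block, chi_sq_crit=5.99):
    """Semi-axis lengths and directions of an asymptotic 95% confidence ellipse.

    chi_sq_crit is (approximately) the 95th percentile of chi square, 2 d.f.
    """
    eigvals, eigvecs = np.linalg.eigh(cov_block)
    semi_axes = np.sqrt(chi_sq_crit * eigvals)   # lengths of the two semi-axes
    return semi_axes, eigvecs                    # eigvecs columns = axis directions

info = np.array([[120.0, 30.0],     # illustrative 2x2 information matrix
                 [30.0, 180.0]])
cov = np.linalg.pinv(info)          # Moore-Penrose inverse (here = regular inverse)
axes, dirs = ellipse_from_block(cov)
print(axes)                         # the two semi-axis lengths of the ellipse
```

In an actual analysis the covariance block would be a submatrix of the pseudo-inverse of the full information matrix, and Ramsay's small-sample correction would be applied to the critical value before the square root is taken.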
Although this example pertains to conditional rank order data, essentially the same type of analyses can be performed for other types of directional rank order data. As should be clear from the definition of the likelihood function [(12), (14) and (15)], the directionality of ranking processes is one of the most crucial ingredients of the current procedure. It may be that this procedure proves to be very robust against violation of this basic assumption, so that it may safely be used for nondirectional rank order data as well. At present, however, we have no hard evidence which indicates that this is the case, though a study is being undertaken which is aimed at systematically examining the robustness of the current procedure against nondirectional ranking processes [Takane & Carroll, Note 2]. Aside from the robustness question there is a good reason to favor directional ranking procedures over nondirectional ones; the former generally provide dissimilarity rank orders which are more consistent with the true orderings of the underlying distances. To see this, a number of dissimilarity rankings were generated from assumed stimulus configurations by both the directional and the nondirectional ranking processes. Correlations (Kendall's tau) were calculated between the true rankings of distances and the rankings obtained by the two processes. For the two levels of error examined [sigma = .2 and .4 for tr(X'X) = n, which is set to 10], the correlations were uniformly higher for the directional ranking processes. This implies that directional ranking procedures can provide more reliable rankings of dissimilarities than their nondirectional counterparts. Note that in the latter the error processes are generated once and for all per ranking, whereas in the former they are assumed to be generated anew for each successive first choice. The empirical results indicating superior validity of directional rankings can be rationalized in terms of the following argument.
For the sake of simplicity let us consider the case of ranking only three objects (although the argument for n > 3 should follow straightforwardly). For n = 3, the whole difference between directional and nondirectional rankings arises from the probability that the object whose "true" rank is 2 is in fact given an actual rank of 2. (The process for choosing the first ranked object is identical for both processes, while the third ranked item is completely specified once the first two ranks are specified.) Denoting this probability as Pr(r = 2 | R = 2), where R denotes the "true" rank and r the observed rank, we can see that this conditional probability is just the joint probability that object 2 is not maximal in the first choice and is maximal in the second. In the nondirectional case the first and second events must be negatively correlated, since a small value associated with object 2 will tend to make that object nonmaximal on both the first and second first choices. Thus the fact that object 2 was not maximal in the first choice tends to imply that the value associated with it was in fact relatively small, and thus decreases the probability that that object is (correctly) maximal in the second first choice. In the directional case, on the other hand, the two events are independent, so that the nonmaximality of object 2 in the first choice has no implied effect on its position in the second first choice. Consequently, Pr(r = 2 | R = 2) for directional rankings must be greater than Pr(r = 2 | R = 2) for nondirectional rankings. Since in this particular case this conditional probability completely determines the relative validity of the two processes, it follows that directional rankings (at least for n = 3) are more valid than nondirectional ones. A similar line of argument may be applied to the case of n > 3.

REFERENCE NOTES

1. Inukai, Y., Nakamura, K., & Shinohara, M. A multidimensional scaling analysis of judged dissimilarities among schematic facial expressions.
Bulletin of Industrial Product Research Institute, 1977, 81, 21-30 (in Japanese).
2. Takane, Y., & Carroll, J. D. On the robustness of AIC in the context of maximum likelihood multidimensional scaling. Manuscript in preparation.

REFERENCES

Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19, 716-723.
Bock, R. D. Multivariate statistical methods in behavioral research. New York: McGraw-Hill, 1975.
Carroll, J. D., & Chang, J. J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 1970, 35, 283-319.
Chang, J. J., & Carroll, J. D. Three are not enough: An INDSCAL analysis suggesting that color space has seven (+-1) dimensions. Color Research and Application, 1980, 5, 193-206.
Coombs, C. H., & Kao, R. C. On a connection between factor analysis and multidimensional unfolding. Psychometrika, 1960, 25, 219-231.
Dempster, A. P., Laird, N. M., & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39, 1-38.
Guttman, L. A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika, 1968, 33, 469-506.
Huynh, H., & Feldt, L. S. Conditions under which mean square ratios in repeated measurements designs have exact F-distributions. Journal of the American Statistical Association, 1970, 65, 1582-1589.
Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964(a), 29, 1-27.
Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 1964(b), 29, 115-129.
Luce, R. D. Individual choice behavior: A theoretical analysis. New York: Wiley, 1959.
McGee, V. The multidimensional scaling of "elastic" distances. The British Journal of Mathematical and Statistical Psychology, 1966, 19, 181-196.
Ramsay, J. O.
Maximum likelihood estimation in multidimensional scaling. Psychometrika, 1977, 42, 241-266.
Ramsay, J. O. Confidence regions for multidimensional scaling analysis. Psychometrika, 1978, 43, 145-160.
Ramsay, J. O. Some small sample results for maximum likelihood estimation in multidimensional scaling. Psychometrika, 1980(a), 45, 139-144.
Ramsay, J. O. Joint analysis of direct ratings, pairwise preferences and dissimilarities. Psychometrika, 1980(b), 45, 149-165.
Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952.
Richardson, M. W. Multidimensional psychophysics. Psychological Bulletin, 1938, 35, 659-660.
Roskam, E. E. The method of triads for multidimensional scaling. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 1970, 25, 404-417.
Shepard, R. N. The analysis of proximities: Multidimensional scaling with an unknown distance function, I & II. Psychometrika, 1962, 27, 125-140 & 219-246.
Takane, Y. A maximum likelihood method for nonmetric multidimensional scaling: I. The case in which all empirical pairwise orderings are independent--theory. Japanese Psychological Research, 1978(a), 20, 7-17.
Takane, Y. A maximum likelihood method for nonmetric multidimensional scaling: I. The case in which all empirical pairwise orderings are independent--evaluations. Japanese Psychological Research, 1978(b), 20, 105-114.
Takane, Y. Multidimensional successive categories scaling: A maximum likelihood method. Psychometrika, 1981, 46, 9-28.
Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika, 1952, 17, 401-419.
Wilks, S. S. Mathematical statistics. New York: Wiley, 1962.
Young, F. W. Scaling replicated conditional rank-order data. In D. R. Heise (Ed.), Sociological Methodology. San Francisco: Jossey-Bass, 1975.

Manuscript received 11/24/80
Final version received 6/18/81

APPENDIX

We show the equivalence of (10) to Luce's model for the first choice [Luce, 1959]. Assuming the multiplicative error model we may write (10) as

\[
\Bigl[1 + \sum_{l} \exp\{c_k(\ln d_{i(1),k} - \ln d_{i(l+1),k})\}\Bigr]^{-1}
= \Bigl[1 + \sum_{l} \exp\bigl\{\ln (d_{i(l+1),k}/d_{i(1),k})^{-c_k}\bigr\}\Bigr]^{-1}
= \frac{1}{1 + \sum_{l} (d_{i(l+1),k}/d_{i(1),k})^{-c_k}},
\]

which is Luce's model. The last equation can be derived by multiplying both the numerator and the denominator by $(d_{i(1),k})^{-c_k}$ and by replacing $-c_k$ by $s_k$ (< 0).
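The identity above can be checked numerically. The sketch below (our own illustration) compares the exponential-of-log-differences form with Luce's ratio form, using response strengths $d^{-c}$ and arbitrary illustrative dissimilarity values.

```python
# Numeric check of the Appendix identity: the first-choice probability written
# with exponentials of log-differences equals Luce's choice rule with response
# strengths d ** (-c). The dissimilarity values are arbitrary illustrations.
import math

def first_choice_prob(d, c):
    """[1 + sum_l exp{c (ln d_1 - ln d_l)}]^(-1), with d[0] the chosen object."""
    return 1.0 / (1.0 + sum(math.exp(c * (math.log(d[0]) - math.log(dl)))
                            for dl in d[1:]))

def luce_prob(d, c):
    """Luce's model: v_1 / sum_l v_l, with strengths v_l = d_l ** (-c)."""
    v = [dl ** (-c) for dl in d]
    return v[0] / sum(v)

d = [0.7, 1.2, 2.5, 3.1]   # dissimilarities to the standard (illustrative)
for c in (0.5, 1.0, 2.0):
    assert abs(first_choice_prob(d, c) - luce_prob(d, c)) < 1e-12
print("the two forms agree")
```

Since smaller dissimilarities should yield larger choice probabilities, the exponent $-c$ (that is, $s_k < 0$ on the strengths) makes the most similar stimulus the most likely first choice.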