PSYCHOMETRIKA--VOL. 68, NO. 3, 453-471, SEPTEMBER 2003

USING THE CONDITIONAL GRADE-OF-MEMBERSHIP MODEL TO ASSESS JUDGMENT ACCURACY

BRUCE COOIL
OWEN GRADUATE SCHOOL OF MANAGEMENT, VANDERBILT UNIVERSITY

SAJEEV VARKI
COLLEGE OF BUSINESS ADMINISTRATION, UNIVERSITY OF RHODE ISLAND

Consider the case where J instruments are used to classify each of I objects relative to K nominal categories. The conditional grade-of-membership (GoM) model provides a method of estimating the classification probabilities of each instrument (or "judge") when the objects being classified consist of both pure types that lie exclusively in one of K nominal categories, and mixtures that lie in more than one category. Classification probabilities are identifiable whenever the sample of GoM vectors includes pure types from each category. When additional, relatively mild, assumptions are made about judgment accuracy, the identifiable correct classification probabilities are the greatest lower bounds among all solutions that might correspond to the observed multinomial process, even when the unobserved GoM vectors do not include pure types from each category. Estimation using the conditional GoM model is illustrated on a simulated data set. Further simulations show that the estimates of the classification probabilities are relatively accurate, even when the sample contains only a small percentage of approximately pure objects.

Key words: nominal classification, incidental parameters, extreme profiles, mixtures.

The authors thank Max A. Woodbury, Kenneth G. Manton and H. Dennis Tolley for their help, and four anonymous Psychometrika reviewers (including an associate editor) for their beneficial expository and technical suggestions. This work was supported by the Dean's Fund for Summer Research, Owen Graduate School of Management, Vanderbilt University. Requests for reprints should be sent to Bruce Cooil, OGSM, Vanderbilt University, 401 21st Avenue South, Nashville, TN 37203. E-Mail: [email protected]

© 2003 The Psychometric Society

1. Introduction

Imagine any general setting where J instruments are used to classify each of I objects relative to K nominal categories. The instruments themselves may be doctors making diagnoses, coders classifying open-ended survey responses into nominal categories, questions on a psychological test or survey where a subject selects one of several preselected responses, or any other instrument used to classify objects into categories. For simplicity we will refer to these instruments as judges. In these cases, the objects would be the patients seeking a diagnosis, the individual responses to open-ended survey questions that are being coded into categories, or the individuals taking a test or survey, respectively.

Judgment-based classification has been used widely in psychology to study personality, vocational interests, psychiatric diagnoses, and even interpersonal interactions (e.g., Chavez & Buriel, 1988), but the applications also extend across the domain of social sciences. For example, it has been used in education to study teaching styles (e.g., Tsai & Denton, 1993) and in marketing to study the content of advertisements (e.g., Yale & Gilly, 1988). This classification framework has also been used to develop general models and estimators for the reliability and structure of qualitative data (Batchelder & Romney, 1988; Cohen, 1960, 1968; Cooil & Rust, 1995; Dillon & Mulani, 1984; Klauer & Batchelder, 1996; Perreault & Leigh, 1989). General latent class models allow one to use the data provided by the J judges to determine the probabilities with which they make correct and incorrect classifications, as well as the prior and posterior probabilities that a given object belongs to one of the K categories (Batchelder & Romney, 1988, 1989; Dillon & Mulani, 1984). Manton, Woodbury, and Tolley (1994) have developed a general grade-of-membership (GoM) model that extends the latent class framework to include the identification of objects as mixtures of an arbitrary number of latent nominal categories. This general GoM model has been used in a wide range of empirical applications to determine latent mixtures from high dimensional discrete multivariate data (e.g., Berkman, Singer, & Manton, 1989; Blazer et al., 1989; Vertrees & Manton, 1986; Woodbury & Manton, 1982).

We consider an adaptation of the general GoM model to the classification framework. As in the more general GoM framework, each object is potentially a mixture of the K categories (or extreme profiles), and the degree to which object i belongs to the k-th category is denoted by the grade-of-membership score $g_{ik}$, which varies between 0 and 1, where

$$\sum_{k=1}^{K} g_{ik} = 1, \qquad 0 \le g_{ik} \le 1, \qquad i = 1, \ldots, I. \tag{1}$$

When $g_{ik} = 1$, object i has exclusive (or "crisp") membership in category k and is referred to as a "pure" type, while $g_{ik} = 0$ indicates that object i has no membership in category k. Fractional grade-of-membership values provide a representation of each object's heterogeneity that is not possible in many classification models. In contrast to the more general GoM framework, we assume the K categories are known beforehand (not latent), and the judges, who do not directly observe the $g_{ik}$ values, classify each object to one of the K categories. The only data available from the classification process are the actual classifications of each object by each judge.

Consider a specific example where a psychological test is administered to determine a person's vocational interests (e.g., Holland's six vocational dimensions; Holland, 1985), and each test item provides alternative responses that relate directly to the K manifest categories. In this case $g_{ik}$, $1 \le k \le K$, represents respondent i's degree of interest on vocational dimension k, and the GoM model studied here provides estimates of the probabilities with which each test item (judge) correctly or incorrectly classifies each basic type (i.e., an individual with a given "pure" vocational interest, at the extreme of one of the possible dimensions) into each one of the possible basic types. These probability estimates directly measure test item reliability and also provide a way of determining the overall reliability of any group of test items, or even of the test itself (Cooil & Rust, 1995).

We present sufficient conditions under which the GoM model provides an identifiable matrix of the probabilities with which judges make classifications. Given that they are identifiable, Tolley and Manton (1992) have shown that the estimates taken from the conditional version of this model are consistent and asymptotically normal. In this nominal classification framework, each judge makes an explicit classification to one of the K categories, and only the grade-of-membership values ($g_{ik}$) in (1), which designate the actual origin of each object, remain unobserved. Manton et al.
(1994) consider this issue in the more general framework where data are available in the form of closed-ended responses to survey questions. In that case the objects are assumed to belong to latent categories, and the classification probabilities are not uniquely determined by the observed responses (Manton et al., 1994, pp. 24-28, 53-66), but can be further constrained so that they are identifiable (Manton et al., 1994, pp. 69-70). We show that by considering the simpler but still quite general classification problem, where expert judgments are actually available on how objects should be classified to K prespecified nominal categories, the estimands are unique, under relatively mild conditions, as long as the sample includes the classification of objects that lie exclusively in each of the nominal categories. These results also suggest important ways of quantifying the various aspects of judgment accuracy and provide a guide as to what is ultimately possible with more general forms of the GoM model.

We focus on the conditional form of the GoM model because it provides a practical way to find estimates of classification probabilities, especially when J is considerably larger than K. The conditional model therefore extends the range of practical applications to settings where it is necessary or natural to consider a large number of judges. Such applications include: (a) screening a large number of judges to determine a subgroup of true experts, and (b) cases where a large number of categories and judges are necessary to accommodate the complexity of the objects being studied. Specific examples include the use of psychological test items, test-takers, patients, or customers to evaluate individual psychological profiles, test procedures, patient care, or the quality of products or services, respectively.

The conditional GoM likelihood is a multinomial distribution, conditional on fixed GoM vectors $(g_{i1}, \ldots, g_{iK})$, $i = 1, \ldots, I$, where the parameters of primary interest are the unknown classification probabilities. Following Manton et al. (1994), the classification probabilities are estimated directly from this likelihood along with the GoM vectors. The estimates of these vectors are consistent only in the sense that their empirical J-th order moments converge to those of the $g_{ik}$-distribution, $F_g$, from which the row vectors $(g_{i1}, \ldots, g_{iK})$, $i = 1, \ldots, I$, are drawn. Nevertheless, the estimated classification probabilities are unconditionally consistent (Tolley & Manton, 1992, Theorem 4.1, p. 92), and the joint asymptotic distribution of the estimated classification probabilities and moments of $F_g$ allows approximate chi-square tests based on likelihood ratio statistics (Manton et al., 1994, pp. 75-82).

In section 2, we discuss how the GoM model can be applied to the nominal classification problem, present sufficient conditions for identifiable classification probabilities, and study additional conditions that ensure that the identifiable correct classification probabilities are the greatest lower bounds among all solutions that might correspond to the observed multinomial process. Proofs are provided in Appendix A. The asymptotic distribution of these estimators then follows from Tolley and Manton (1992); these results are also briefly summarized in section 2. Section 3 illustrates the application of the model on a simulated data set.
A procedure for selecting starting values and a summary of the estimation algorithm are given in Appendix B. In section 4, we study the accuracy of the estimation method using simulated data that consist entirely of mixtures, although some are nearly crisp. Our results and conclusions are summarized in section 5.

2. A Model for the Classification of Mixtures

Assume that each of I objects is assigned to one of K nominal categories by each of J judges. Although each object is potentially a mixture, each judge is asked only to classify it to the most appropriate single category. Thus, a strength of the model is that it does not require data that are any different from what is available when latent class models are used. Let $y_{ijk}$ represent the indicator variable for whether or not judge j classifies object i to category k:

$$y_{ijk} = \begin{cases} 1, & \text{if judge } j \text{ classifies object } i \text{ to category } k, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

The model assumes that the $y_{ijk}$ are observed realizations of a random variable $Y_{ijk}$, and that judge j, $j = 1, \ldots, J$, classifies object i, $1 \le i \le I$, to category k, $k = 1, \ldots, K$, with a probability $p_{ijk}$, where

$$p_{ijk} = P[Y_{ijk} = 1] = \sum_{\ell=1}^{K} g_{i\ell}\lambda_{\ell jk}, \tag{3}$$

and $\lambda_{\ell jk}$ represents the probability that judge j classifies an object that lies exclusively in category $\ell$ (i.e., a "pure" type) to category k. The $\lambda_{\ell jk}$ are the classification probabilities and are subject to the constraints

$$\sum_{k=1}^{K} \lambda_{\ell jk} = 1, \quad \lambda_{\ell jk} \ge 0, \quad \text{for all } \ell \text{ and } j, \quad \ell = 1, \ldots, K, \; j = 1, \ldots, J. \tag{4}$$

Equation (3) provides the operational definition of the GoM values ($g_{ik}$) in terms of their relationship to the actual classification probabilities for mixtures ($p_{ijk}$), given the underlying probabilities governing the classification of pure types ($\lambda_{\ell jk}$) of (4). If judges make independent classifications, then conditional on the $g_{ik}$, the multinomial random variables $Y_{ijk}$ are independent for different values of i (objects) and j (judges). In what is referred to as the "unconditional likelihood function" for the GoM model, an expectation $E\{\cdot\}$ is taken with respect to the K-dimensional distribution of the unobserved GoM vectors $(g_{i1}, \ldots, g_{iK})$, $1 \le i \le I$, which has support on the $(K-1)$-dimensional simplex defined in (1) (Manton et al., 1994, p. 23; Varki, Cooil, & Rust, 2000, pp. 483, 488-489):

$$L_{GoM} = E\left\{\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K}(p_{ijk})^{y_{ijk}}\right\} = E\left\{\prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K}\left(\sum_{\ell=1}^{K} g_{i\ell}\lambda_{\ell jk}\right)^{y_{ijk}}\right\} = \prod_{i=1}^{I}\left\{\sum_{k_1=1}^{K}\sum_{k_2=1}^{K}\cdots\sum_{k_J=1}^{K}\left(\prod_{j=1}^{J}\lambda_{k_j j x_{i,j}}\right)E\left\{\prod_{j=1}^{J}g_{ik_j}\right\}\right\}, \tag{5}$$

where the penultimate expression follows from (3) and, in the last expression (5), the $x_{i,j}$ denote the actual category to which judge j assigns object i. The likelihood in (5) is an integrated, or marginal, likelihood with respect to the $g_{ik}$-distribution. Although the $g_{ik}$ that correspond to a specific object i are not parameters per se, (5) is still "unconditional" in the sense that it requires the joint estimation of all J-th order factorial moments of the $g_{ik}$-distribution, or $(J+K-1)!/[J!(K-1)!] - 1$ parameters, in addition to the $\lambda_{\ell jk}$, which are $JK(K-1)$ additional parameters (given the JK constraints of (4)). This is a formidable estimation task, even when the $\lambda_{\ell jk}$ are constrained to be mathematically identifiable, because the number of factorial moments becomes very large when there are more judges (J) than categories (K), and $K \ge 4$ (e.g., if K = 4 and J = 8, there are 165 additional moment parameters; if K = 5 and J = 10, there are 1001).
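For concreteness, the data-generating process of (2)-(3) can be sketched in a few lines of code. The following is a minimal illustration (our own, with hypothetical names; it is not the authors' implementation), which assumes the classification probabilities are stored as an array `Lambda[l, j, k]` $= \lambda_{\ell jk}$:

```python
import numpy as np

def simulate_classifications(G, Lambda, rng):
    """Generate judge classifications from the GoM model of (2)-(3).

    G      : (I, K) array of grade-of-membership vectors (rows sum to 1).
    Lambda : (K, J, K) array; Lambda[l, j, k] is the probability that
             judge j assigns a pure type from category l to category k.
    Returns an (I, J) integer array x, where x[i, j] is the category
    to which judge j classifies object i.
    """
    I, K = G.shape
    J = Lambda.shape[1]
    x = np.empty((I, J), dtype=int)
    for j in range(J):
        # p[i, k] = sum_l g_il * lambda_ljk, exactly as in equation (3)
        p = G @ Lambda[:, j, :]
        for i in range(I):
            x[i, j] = rng.choice(K, p=p[i])  # one multinomial draw per judge
    return x
```

Each row of `p` sums to 1 because each row of G and each row of $\Lambda_j$ does, so the multinomial draw implementing (2) is well defined.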
Thus, even with large samples it is typically difficult to maximize (5) so that the estimates are locally identifiable (i.e., the estimated information matrix is typically singular). Manton et al. (1994, p. 71) discuss other numerical difficulties with maximizing (5). Perhaps for these reasons, we have not seen a published application that uses either (5) or the counterpart available for latent categories (Manton et al., 1994, pp. 23, 67-69). One way of reducing the parameter space is to posit a specific parametric form for the $g_{ik}$-distribution (e.g., Varki et al., 2000). A more general alternative approach uses the conditional likelihood, which is the multinomial distribution of the $Y_{ijk}$ with respect to the $p_{ijk}$ probabilities in (3), conditional on the actual $g_{ik}$ values,

$$L_{CGoM} = \prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} p_{ijk}^{y_{ijk}} \tag{6}$$

$$= \prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K}\left(\sum_{\ell=1}^{K} g_{i\ell}\lambda_{\ell jk}\right)^{y_{ijk}}, \tag{7}$$

where (7) follows from (6) and (3). This conditional likelihood allows the investigator to focus on the estimation of the classification probabilities $\lambda_{\ell jk}$, which consist of $JK(K-1)$ parameters, given constraint (4). The $g_{ik}$ values are treated as missing data from an unspecified distribution. Given constraint (1), the $g_{ik}$ have $I(K-1)$ degrees of freedom. Following Manton et al. (1994, p. 68), the $g_{ik}$ are estimated in the penultimate step of repeated iterations of an algorithm designed to maximize profiles of the conditional likelihood (7), first with respect to the $g_{ik}$, and then with respect to the $\lambda_{\ell jk}$ (see Appendix B). The final estimation of the $\lambda_{\ell jk}$, and the associated significance testing, require only that the estimated conditional information matrix for the $\lambda_{\ell jk}$ of (7), conditional on estimates of the $g_{ik}$, be positive definite.

The maximization of (7) provides consistent estimates of the $\lambda_{\ell jk}$, along with values of the GoM vectors $(g_{i1}, \ldots, g_{iK})$, $i = 1, \ldots, I$, whose empirical J-th order moments are consistent estimates of the J-th order moments of the $g_{ik}$-distribution, $F_g$ (Tolley & Manton, 1992, Theorem 4.1, p. 92). Consequently, we are effectively using (7) to estimate the parameters of the marginal distribution in (5). Thus, estimability will still require that the degrees of freedom of model (7) exceed the number of parameters in model (5), that is,

$$K^J > JK(K-1) + \frac{(J+K-1)!}{J!(K-1)!} - 1. \tag{8}$$

Since there are K categories and J judges, there are $K^J$ ways in which the judges can classify a given object, and this must exceed the number of parameters that would be estimated in model (5); that is, the $JK(K-1)$ classification probabilities $\lambda_{\ell jk}$ and the $(J+K-1)!/[J!(K-1)!] - 1$ factorial moments of $F_g$. Note that the conditional likelihood (7) contains a potentially larger number, $I(K-1)$, of nuisance parameters (the $g_{ik}$ subject to constraint (1)) than the likelihood in (5) (Tolley & Manton, 1992, pp. 91-92). Nevertheless, assuming (8) and a sufficiently large sample, the distribution of the conditional likelihood (7) is anchored by the sample moments of the $g_{ik}$-distribution, so that under general conditions it should be possible to obtain a reasonably accurate estimate of the $\lambda_{\ell jk}$ (Manton et al., 1994, p. 67). General estimation problems of this type are also considered by Kiefer and Wolfowitz (1956), who would refer to the classification probabilities as "structural" parameters, in contrast to the "incidental" $g_{ik}$ parameters.
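As a quick arithmetic check of (8), the parameter counts can be computed directly; the helper below is our own illustration, not part of the authors' procedure:

```python
from math import comb

def conditional_gom_estimable(J: int, K: int) -> bool:
    """Check condition (8): the K**J possible response patterns must exceed
    the JK(K-1) classification probabilities plus the
    (J+K-1)!/[J!(K-1)!] - 1 factorial moments of Fg."""
    n_lambda = J * K * (K - 1)
    n_moments = comb(J + K - 1, K - 1) - 1
    return K**J > n_lambda + n_moments

# For the illustration of section 3 (J = 8, K = 4): K**J = 65536,
# n_lambda = 96, n_moments = 164, so (8) is satisfied comfortably.
print(conditional_gom_estimable(8, 4))  # True
```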
2.1. Sufficient Conditions for an Estimable Model

Let P be the $I \times JK$ (K nested within J) matrix of $\{p_{ijk}\}$ defined in (3): $P = [P_1, P_2, \ldots, P_J]$, where $P_j$ refers to the classification probabilities for judge j,

$$P_j = \begin{bmatrix} p_{1j1} & \cdots & p_{1jK} \\ \vdots & & \vdots \\ p_{Ij1} & \cdots & p_{IjK} \end{bmatrix}, \quad 1 \le j \le J. \tag{9}$$

Given the classification probabilities for each object, $P = \{p_{ijk}\}$, (3) can be written in matrix form as (Manton et al., 1994, p. 24)

$$P = G\Lambda, \tag{10}$$

where G is the $I \times K$ matrix of $\{g_{ik}\}$, and $\Lambda$ is the $K \times JK$ matrix of $\{\lambda_{\ell jk}\}$: $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_J]$,

$$\Lambda_j = \begin{bmatrix} \lambda_{1j1} & \cdots & \lambda_{1jK} \\ \vdots & & \vdots \\ \lambda_{Kj1} & \cdots & \lambda_{KjK} \end{bmatrix}, \quad 1 \le j \le J. \tag{11}$$

If we begin only with the definition that $p_{ijk} \equiv P[Y_{ijk} = 1]$, it can be shown that the representation in (10) is always possible (Woodbury, Manton, & Tolley, 1994, pp. 153-154; also Manton et al., 1994, pp. 25-27) and that $\Lambda$ can be defined uniquely so that its columns are extreme profiles of a convex hull defined within the probability space generated by the profile vectors P (Woodbury et al., 1994, p. 154). When $\Lambda$ is unique, model (5) is estimable (and identifiable) whenever (a) J > 2K, and (b) there is a nonsingular submatrix of $E[P'P]$ of dimension $K \times K$ that contains no diagonal elements of $E[P'P]$ (Woodbury et al., 1994, pp. 152-166, especially Theorem 3, p. 160; also Manton et al., 1994, pp. 53-63). Condition (a) is generally regarded as more restrictive than (b), and in this case the number of factorial moments in (5) is particularly large, so that frequently the conditional model (7) is the only practical alternative. Note that condition (b) is satisfied whenever one of the $K \times K$ off-diagonal submatrices of the form $E(P_{j_1}'P_{j_2})$, $j_1 \ne j_2$, is nonsingular, where $P_j$ is defined in (9). The $\ell$-th column of this submatrix is proportional to the vector of K conditional probabilities $\{P[Y_{ij_1 1} = 1 \mid Y_{ij_2 \ell} = 1], \ldots, P[Y_{ij_1 K} = 1 \mid Y_{ij_2 \ell} = 1]\}$ for the $Y_{ijk}$ of (3). If we take a Bayesian perspective and imagine that each such column is drawn independently from a distribution with support on the $(K-1)$-dimensional simplex, then $E(P_{j_1}'P_{j_2})$ would be of full rank with probability 1, since otherwise one of the columns represents a point in the $(K-2)$-dimensional subspace of the original simplex defined by linear combinations of the other columns, and this can only happen with probability zero. (Similarly, $E(P_{j_1}'P_{j_2})$ would be nonsingular with probability 1 if it is randomly selected from a distribution that has continuous support on the $K^2 - 1$ dimensional simplex.)

2.2. Identifiability of $\Lambda$

Manton et al. (1994, pp. 24-25, 69-70) show that G and $\Lambda$ are not unique if there is a nonsingular matrix A (different from the identity) such that the elements of $G^* = GA^{-1}$ satisfy (1), and such that the elements of $\Lambda^* \equiv A\Lambda$ satisfy (4). If such a nonsingular matrix A exists, then P of (6) can also be written as (Manton et al., 1994, p. 25)

$$P = G\Lambda = (GA^{-1})(A\Lambda) = G^*\Lambda^*, \tag{12}$$

so that $G^*$ and $\Lambda^*$ would provide alternative grade-of-membership values ($g^*_{ik}$) and classification probabilities ($\lambda^*_{\ell jk}$). Constraints (1) and (4) require that $G^*$ and the J submatrices of $\Lambda^* = [\Lambda_1^*, \ldots, \Lambda_J^*]$ be stochastic matrices (i.e., each of these matrices must have nonnegative elements, and the elements of each row must add to 1). To summarize these conditions in matrix form, A must be a nonsingular $K \times K$ matrix such that $G^* = GA^{-1}$ is a stochastic matrix, that is,

$$G^*\underline{1} = \underline{1}, \quad \text{and} \quad G^* = \{g^*_{ik} : g^*_{ik} \ge 0, \; 1 \le i \le I, \; 1 \le k \le K\} \tag{13}$$

(where $\underline{1}$ represents the appropriately dimensioned column vector of ones), and such that the submatrices of $\Lambda^* \equiv [\Lambda_1^*, \ldots, \Lambda_J^*] \equiv [A\Lambda_1, \ldots, A\Lambda_J]$ are each stochastic matrices,
$$\Lambda_j^*\underline{1} = \underline{1}, \quad \text{and} \quad \Lambda_j^* = \{\lambda^*_{\ell jk} : \lambda^*_{\ell jk} \ge 0, \; 1 \le \ell \le K, \; 1 \le k \le K\}, \quad 1 \le j \le J. \tag{14}$$

Thus, even if we knew the object classification probabilities P, there could be many possible matrices $\Lambda^*$ and corresponding matrices $G^*$ that would generate the corresponding object classification probabilities P via (10) and still satisfy the constraints imposed by (13) and (14).

2.3. Sufficient Conditions for the Identifiability of $\Lambda$

Theorem 1. If G includes all K pure types, there is a unique $\Lambda$ such that $P = G\Lambda$. Also, if there is a $\hat{P} = \hat{G}\hat{\Lambda}$ that maximizes (6), conditional on $\hat{G}$, then $\hat{\Lambda}$ is uniquely determined by $\hat{P}$ and $\hat{G}$, whenever $\hat{G}$ includes all K pure types. (This is proven in Appendix A.)

G will satisfy this condition whenever a sufficiently large sample is drawn from a population of objects that includes all K pure types. On the other hand, suppose we also want to consider classification probabilities $\Lambda^*$ that satisfy $P = G^*\Lambda^*$, for grade-of-membership matrices $G^*$ that may not include all K pure types. In this case, the unique estimand $\Lambda$ that satisfies the condition of Theorem 1 is still important, because it provides the greatest lower bounds of all possible correct classification probabilities $\{\lambda^*_{kjk} : k = 1, \ldots, K, \; j = 1, \ldots, J\}$ (i.e., the diagonal elements of the $\Lambda_j^*$, $1 \le j \le J$) that satisfy (10), subject to the constraints (13) and (14), whenever we are willing to assume a certain minimal level of classification accuracy. This result also provides a method for estimating the lower bound of the overall reliability of the I classifications. The following condition on classification accuracy for a given category is sufficient in this case.

C1. For some category k, $k = 1, \ldots, K$, there is at least one judge who correctly classifies a pure type with a probability greater than that judge's largest probability of misclassifying a pure type to the same category; that is, for some category k, $1 \le k \le K$, there exists at least one judge $j(k)$, $j(k) = 1, \ldots, J$, such that

$$\lambda_{kj(k)k} > \max\{\lambda_{\ell j(k)k} : \ell = 1, \ldots, K, \; \ell \ne k\}. \tag{15}$$

Condition C1 can be interpreted to mean that for category k, at least one judge's classifications are reliable enough to ensure that if a crisp (or pure) object has been classified to that category, then it is more likely to be from that category than from any other single category. We have the following theorem.

Theorem 2. Assume that the object classification probabilities P can be represented as $P = G\Lambda$, subject to the constraints of (1) and (4), where G includes all K pure types. Also assume $\Lambda$ satisfies condition C1 for some category k, $k = 1, \ldots, K$, and let $j(k)$, $j(k) = 1, \ldots, J$, represent any judge who fulfills the accuracy constraint of (15) for category k. Under these conditions, the diagonal element $\lambda_{kj(k)k}$ of $\Lambda$ is the greatest lower bound of the correct classification probabilities $\lambda^*_{kj(k)k}$ from all $\Lambda^*$ that satisfy C1 and for which $P = G^*\Lambda^*$, even when the corresponding $G^*$ does not include all K pure types. Also, if C1 is satisfied for each category k, $1 \le k \le K$, then there is a unique $\Lambda$ with diagonal elements $\lambda_{1j(1)1}, \lambda_{2j(2)2}, \ldots, \lambda_{Kj(K)K}$. (This is proven in Appendix A.)

The following corollary is an immediate consequence of this theorem and can be used in most applications where the judges are experts.

Corollary 1. If, in addition to the conditions of Theorem 2, the constraint (15) is satisfied by $\Lambda$ for all judges j ($j = 1, \ldots, J$)
and categories k ($k = 1, \ldots, K$), then each diagonal element $\lambda_{kjk}$ of $\Lambda$ is the greatest lower bound among all possible correct classification probabilities $\lambda^*_{kjk}$ that come from matrices $\Lambda^*$ that satisfy (15) for all judges and categories, and for which $P = G^*\Lambda^*$, even when $G^*$ does not include all K pure types.

Theorems 1 and 2 do implicitly assume that model (5) is estimable, and the uniqueness provided by these theorems is meaningful because the $\hat{G}$ and $\hat{\Lambda}$ that maximize (7) are consistent under very general conditions (Tolley & Manton, 1992, Theorem 4.1, p. 92) whenever $\Lambda$ has been sufficiently constrained so that it is identifiable from P; $\hat{G}$ is consistent in the sense that its empirical J-th order moments converge to those of the $g_{ik}$-distribution, $F_g$, from which the row vectors $(g_{i1}, \ldots, g_{iK})$, $i = 1, \ldots, I$, of G are drawn. Also, the asymptotic joint distribution of $\hat{\Lambda}$ and the estimated moments of $F_g$ allows approximate chi-square tests based on likelihood ratio statistics (Manton et al., 1994, pp. 75-82). For these results, $F_g$ need not be continuous, but must have at most a countable number of points with positive probability.

Theorems 1 and 2 are especially useful in applications because it is possible to check directly whether the estimates of G and $\Lambda$ meet the conditions. If the sample includes all K pure types, the two basic assumptions of Theorem 2 are easily met whenever there is at least one expert judge for each category. On the other hand, suppose we believe that all pure types are represented in the sample, but we obtain an estimate $\hat{G}^{(0)}$ of G that does not include all pure types (and assume $\hat{\Lambda}^{(0)}$ represents the corresponding estimate of $\Lambda$). Then a natural transformation would be to postmultiply $\hat{G}^{(0)}$ by a matrix $B^{-1}$, so that $\hat{G} = \hat{G}^{(0)}B^{-1}$ does include all pure types, and to define $\hat{\Lambda}$ as $\hat{\Lambda} = B\hat{\Lambda}^{(0)}$ (B is then a specific stochastic version of the matrix A of (12)). A transformation of this type is not always possible when K > 2. When K = 2, B is the matrix formed from the two rows of $\hat{G}^{(0)}$ that have the largest $\hat{g}^{(0)}_{ik}$ values for each category k,

$$B = \begin{bmatrix} \underline{\hat{g}}^{(0)}_1 \\ \underline{\hat{g}}^{(0)}_2 \end{bmatrix}, \tag{16}$$

where

$$\underline{\hat{g}}^{(0)}_1 = \left[\max\{\hat{g}^{(0)}_{i1} : 1 \le i \le I\}, \; 1 - \max\{\hat{g}^{(0)}_{i1} : 1 \le i \le I\}\right] \tag{17}$$

$$\underline{\hat{g}}^{(0)}_2 = \left[1 - \max\{\hat{g}^{(0)}_{i2} : 1 \le i \le I\}, \; \max\{\hat{g}^{(0)}_{i2} : 1 \le i \le I\}\right].$$

In this case the rows of B are simply the rows of $\hat{G}^{(0)}$ that are closest to the pure types.

2.4. An Illustration of the Lower Bound Property Under Condition C1

For a simple illustration of the lower bound property of the correct classification estimates in Theorem 2, note that the matrix B in (16) is generally of the form

$$B = \begin{bmatrix} p & 1-p \\ 1-q & q \end{bmatrix}, \quad 0 < p \le 1, \; 0 < q \le 1, \tag{16'}$$

so that if either p or q is not 1, the transformed matrix $\hat{\Lambda} = B\hat{\Lambda}^{(0)}$ has elements

$$\hat{\lambda}_{1j1} = p\,\hat{\lambda}^{(0)}_{1j1} + (1-p)\,\hat{\lambda}^{(0)}_{2j1}, \qquad \hat{\lambda}_{2j2} = (1-q)\,\hat{\lambda}^{(0)}_{1j2} + q\,\hat{\lambda}^{(0)}_{2j2}. \tag{18}$$

Thus, the initial correct classification estimates $\hat{\lambda}^{(0)}_{kjk}$, $k = 1, 2$, on the right side of (18), correspond to grade-of-membership values $\hat{g}^{(0)}_{ik}$ such that the "closest to pure" types are the two rows of B referred to in (17) (where p and q are presumably close to 1). By transforming these initial estimates to the estimates $\hat{\lambda}_{kjk}$, $k = 1, 2$, on the left side of (18), we always obtain correct classification probability estimates that are less than or equal to the initial estimates $\hat{\lambda}^{(0)}_{kjk}$, because the right side of each equation in (18) is a weighted average of the initial correct classification probability estimate $\hat{\lambda}^{(0)}_{kjk}$ with an error probability estimate that must be smaller than $\hat{\lambda}^{(0)}_{kjk}$ under condition C1. The new estimates $\hat{\lambda}_{kjk}$ (on the left side of (18)) correspond to a $\hat{G}$ matrix that includes pure types in both categories. If $\hat{G}^{(0)}\hat{\Lambda}^{(0)} = \hat{G}\hat{\Lambda}$ maximizes (7), the $\hat{\lambda}_{kjk}$ are estimates of the greatest lower bounds of the correct classification probabilities, and the corresponding matrix $\hat{\Lambda}$ is an estimate of the unique estimand $\Lambda$ of Theorem 2.
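The K = 2 transformation of (16)-(18) is easy to express in code. The sketch below is our own illustration of the rescaling (hypothetical names; it is not the authors' GAUSS program):

```python
import numpy as np

def rescale_to_pure_types(G0, L0):
    """Rescale initial estimates per (16)-(18), so that the transformed G
    includes (approximately) pure types in both categories (K = 2).

    G0 : (I, 2) initial GoM estimates.
    L0 : (2, J, 2) initial classification probabilities lambda^(0)_{ljk}.
    """
    p = G0[:, 0].max()             # largest category-1 membership, as in (17)
    q = G0[:, 1].max()             # largest category-2 membership
    B = np.array([[p, 1.0 - p],
                  [1.0 - q, q]])   # the matrix B of (16) and (16')
    G = G0 @ np.linalg.inv(B)      # G = G0 B^{-1}
    L = np.einsum('kl,ljm->kjm', B, L0)  # Lambda_j = B Lambda_j^(0), as in (18)
    return G, L
```

Because each new diagonal element is the weighted average in (18), the rescaled correct classification estimates can only stay the same or decrease, which is exactly the lower bound behavior described above.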
2.5. Interpreting Condition C1

The classification accuracy condition C1 has a simple interpretation if we adopt terminology that is typically used in diagnostic testing. Define a judge's "sensitivity" for category k as the probability of correct classification ($\lambda_{kjk}$), and define the minimum "specificity" of judge j for category k as the minimum probability, across all categories $\ell$, $\ell \ne k$, that the judge will classify an object from $\ell$ to some other category besides k (which is one minus the maximum probability of misclassifying an object from another category $\ell$ to category k, $1 - \max\{\lambda_{\ell jk} : \ell = 1, \ldots, K, \; \ell \ne k\}$). Then condition C1 is equivalent to requiring that, for a given category k, there is a judge with a minimum specificity that exceeds one minus that judge's sensitivity for category k. In summary, if C1* represents the assumption that C1 is satisfied for all categories and all judges, then C1* may also be described as:

C1*. For all j and k ($j = 1, \ldots, J$, $k = 1, \ldots, K$),

{judge j's minimum specificity for category k} $= 1 - \max\{\lambda_{\ell jk} : \ell = 1, \ldots, K, \; \ell \ne k\} > 1 - \lambda_{kjk} =$ 1 - {judge j's sensitivity for category k}.

Consequently, C1* is satisfied whenever all judge classifications are sufficiently sensitive or specific for each category. This is another way of stating the condition of the Corollary. This condition differs from the following condition C2, which is typically assumed when reliability estimates are calculated using classification probabilities (see Cooil & Rust, 1995, p. 202).

C2. Each judge correctly classifies each pure type more often than he or she incorrectly classifies it to any other single category: for each j and k ($1 \le j \le J$, $1 \le k \le K$),

$$\lambda_{kjk} > \max\{\lambda_{kj\ell} : \ell = 1, \ldots, K, \; \ell \ne k\}.$$

In contrast to condition C1*, condition C2 requires that judges have sufficient sensitivity for each category. Condition C2 is not assumed in the Corollary to Theorem 2 (although it is also a reasonable minimal qualifying condition when selecting judges) but is equivalent to condition C1* when there are only two categories (K = 2). Technically, when K > 2, C1* is not equivalent to C2, nor does either condition imply the other. But if either C1* or C2 is satisfied, and not both, it is because judges tend to make specific types of misclassifications more frequently than others. For example, C1* and C2 are equivalent if we make the additional assumption that all types of misclassification errors occur less frequently than they would if all judges were classifying each object randomly; that is, C1* and C2 are equivalent whenever

$$\lambda_{\ell jk} < \frac{1}{K} \quad \text{for all } j, \; j = 1, \ldots, J, \text{ and for all } k \text{ and } \ell, \; \ell \ne k.$$

3. Simulated Illustration

We consider a simulated data set where each of 8 judges classifies 800 objects into 4 categories (I = 800, J = 8, K = 4 in (7)). Half (400) of these objects have GoM vectors, $[g_{i1}, g_{i2}, g_{i3}, g_{i4}]$, that are drawn randomly from the flat Dirichlet distribution with density

$$f_{G\text{-MIX}}(x_1, x_2, x_3, x_4) = \frac{\Gamma\left(\sum_{k=1}^{4}\alpha_k\right)}{\prod_{k=1}^{4}\Gamma(\alpha_k)}\prod_{k=1}^{4} x_k^{\alpha_k - 1} \tag{19}$$

$$= 3!, \quad \text{where } \alpha_k = 1, \; 1 \le k \le 4, \tag{20}$$

for $0 \le x_k \le 1$, $1 \le k \le 4$, such that $\sum_{k=1}^{4} x_k = 1$.
The other 400 GoM values are drawn from a symmetric "bowl-shaped" Dirichlet that puts most of its mass near the four pure types,

$$f_{NC}(x_1, x_2, x_3, x_4) = \frac{\Gamma(0.2)}{[\Gamma(0.05)]^4}\prod_{k=1}^{4} x_k^{-0.95}. \tag{21}$$

Given this equal mixture of distributions (20) and (21), the expectation is that, for each of the 4 categories, 9% of the 800 objects will be "nearly crisp" in the sense that $g_{ik} \ge 0.90$ for that category k, $1 \le k \le 4$.

3.1. The Judges

To make the estimation of classification probabilities more interesting, we used three types of judges: (a) "experts" ($\lambda_{kjk} \ge 0.7$ for all k, $1 \le k \le 4$; see the first 4 judges in Table 1); (b) "specialists" who only classify pure types from some categories with high accuracy (see Judges 5 and 6 in Table 1, where $\lambda_{kjk}$ is either 0.5 or 0.9); and (c) judges who do not reliably classify any pure type ($\lambda_{kjk} < 0.7$ for all k, $1 \le k \le 4$; see Judges 7 and 8 in Table 1, for whom $\lambda_{kjk} = 0.65$, $1 \le k \le 4$). In each case, judges also make classification errors randomly, so that $\lambda_{kj\ell} = (1 - \lambda_{kjk})/(K-1)$, $\ell \ne k$. Thus, all judges satisfy the conditions C1 (for all categories) and C2. The data, the $y_{ijk}$ of (2), are generated as multinomial realizations using the $p_{ijk}$ of (3), which are calculated from the $\lambda_{kj\ell}$ and the randomly generated $g_{ik}$.

TABLE 1.
The Actual and Estimated Classification Probabilities Based on 800 Objects.^1

                 Correct Classification Probabilities (%)^2, 100% x:           Selected Misclassification Probabilities (%):
Judge            $\lambda_{1j1}$   $\lambda_{2j2}$   $\lambda_{3j3}$   $\lambda_{4j4}$      $\lambda_{1j2}$      $\lambda_{3j1}$

Experts
Judge 1          90 (89, 92)    90 (88, 93)    90 (89, 93)    90 (90, 93)    3.3 (2.2, 4.1)    3.3 (2.3, 4.8)
Judge 2          70 (70, 77)    70 (74, 80)    70 (71, 78)    70 (68, 78)    10 (6.0, 10)      10 (7.8, 12)
Judge 3          90 (87, 91)    90 (88, 91)    70 (72, 79)    70 (73, 76)    3.3 (4.6, 6.6)    10 (6.3, 11)
Judge 4          70 (75, 80)    70 (74, 81)    90 (89, 94)    90 (86, 91)    10 (3.7, 8.6)     3.3 (1.5, 3.1)

Specialists
Judge 5          90 (88, 91)    90 (87, 91)    50 (46, 58)    50 (48, 57)    3.3 (2.6, 5.9)    17 (16, 21)
Judge 6          50 (49, 58)    50 (50, 59)    90 (87, 91)    90 (88, 92)    17 (11, 18)       3.3 (2.0, 4.8)

Not Reliable
Judge 7          65 (67, 71)    65 (64, 69)    65 (63, 68)    65 (66, 71)    12 (8.9, 13)      12 (9.9, 14)
Judge 8          65 (64, 72)    65 (66, 77)    65 (70, 74)    65 (65, 71)    12 (10, 14)       12 (6.7, 12)

^1 Half of this sample is drawn from the flat Dirichlet of (20), and the other half from the "bowl-shaped" Dirichlet of (21). For each category, the expectation is that 9% of the sample is composed of objects that are at least 90% from that category ($g_{ik} \ge 0.9$).
^2 The parenthetical values are the 1st and 3rd quartiles of the distribution of estimates from 10 replications.

3.2. Starting Values and Estimation

Procedures for selecting plausible starting values are of particular importance because there are frequently local maxima at the boundaries of the parameter space for $\Lambda$. To find global estimates of $\Lambda$ in the interior of the parameter space, it often helps to identify those objects that are closest to pure types when selecting starting values. A simple, data-driven procedure for estimating $\Lambda$ (including the selection of starting values) is outlined in Appendix B.

Table 1 summarizes the 1st and 3rd quartiles of the distribution of estimates from 10 replications, where in each replication the judges classify a new sample of 800 objects (400 of which are drawn from each of the two Dirichlet distributions of (20) and (21)). The quartiles illustrate the typical accuracy of the estimates.
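For concreteness, one replication of this design might be generated as follows. This sketch is ours (it reuses `simulate_classifications` from the sketch in section 2, and numpy's Dirichlet sampler stands in for (20) and (21)); the judge accuracies follow Table 1:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 800, 8, 4

# GoM vectors: 400 flat-Dirichlet mixtures per (20) and 400 "bowl-shaped"
# nearly crisp draws per (21), i.e., all alpha_k = 0.05.
G = np.vstack([rng.dirichlet(np.ones(K), size=400),
               rng.dirichlet(np.full(K, 0.05), size=400)])

# Correct classification probabilities lambda_kjk by judge (rows) and
# category (columns), as in Table 1.
correct = np.array([
    [.90, .90, .90, .90], [.70, .70, .70, .70],   # experts (Judges 1-4)
    [.90, .90, .70, .70], [.70, .70, .90, .90],
    [.90, .90, .50, .50], [.50, .50, .90, .90],   # specialists (Judges 5-6)
    [.65, .65, .65, .65], [.65, .65, .65, .65],   # not reliable (Judges 7-8)
])

# Random errors: lambda_kjl = (1 - lambda_kjk) / (K - 1) for l != k.
Lambda = np.empty((K, J, K))
for j in range(J):
    for k in range(K):
        Lambda[k, j, :] = (1.0 - correct[j, k]) / (K - 1)
        Lambda[k, j, k] = correct[j, k]

x = simulate_classifications(G, Lambda, rng)   # (I, J) judge assignments
```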
Here the average correct classification probability in each category is nearly 74%, and certainly, as this average decreases, a larger sample would be needed to achieve the same estimation accuracy. The median absolute relative error of the correct classification estimates (across replications) increases as the estimand, $\lambda_{kjk}$, decreases toward 0.5, and increases as the error level, $\sum_{\ell \ne k}\lambda_{\ell jk}$, increases. In this small study, the mean absolute relative error of the estimates went from a high of 12% when $\lambda_{kjk} = 0.5$ and $\sum_{\ell \ne k}\lambda_{\ell jk} = 0.23$, to a low of 1.7% when $\lambda_{kjk} = 0.9$ and $\sum_{\ell \ne k}\lambda_{\ell jk} = 0.1$ (the median absolute relative error ranged from 11 to 1.7%). Nevertheless, a sample size of 800 is relatively small for an estimation problem of this size: there are 96 ($= JK(K-1)$) classification probabilities to estimate, only 36% of these objects are crisp enough so that $g_{ik} \ge 0.9$ for some category k, $k = 1, \ldots, 4$, and the estimates themselves are based entirely on the relatively crude Bernoulli classifications of (2) that are made by the 8 judges.

The inter-quartile ranges of Table 1 also illustrate that estimates are positively biased when the estimand is greater than 0.5 and negatively biased when it is less than 0.5. This attenuation makes Judges 7 and 8 appear more reliable than they really are. Still, the estimates do generally reveal the categories in which the "specialists" and "experts" are most accurate, and they also correctly identify the most reliable judges. Note that (15) was not imposed as a constraint in the estimation procedure. The attenuation toward 0 and 1 is typical of latent class estimates of classification probabilities and is studied more thoroughly in section 4, where we consider varying numbers of categories and judges, and a much larger number of replications.

4. Simulation Study

The purpose of these simulations is to study the relative accuracy of estimators of the correct classification probabilities, $\lambda_{kjk}$, under a variety of conditions, where in every case the $g_{ik}$ are drawn from distributions where true crisp values never actually occur, so that even under the most favorable mixture conditions described below, a fixed sample G can only approximately satisfy the constraint in the theorems of section 2. We consider cases where there are 2, 3, or 4 categories, varying numbers of judges, different proportions of nearly crisp objects, and different correct classification probability levels. We assume that the judges make errors randomly (so that $\lambda_{kj\ell} = \lambda_{kj\ell'}$ whenever $\ell \ne k$ and $\ell' \ne k$), but still consider the relatively general case where judges make classifications with accuracy that may change by judge and category.

4.1. Simulation Design

Each sample is constructed so that either 50, 70, or 90% of the I objects have GoM vectors, $[g_{i1}, \ldots, g_{iK}]$, that are drawn from the symmetric and slightly peaked Dirichlet,

$$f_{\text{mixture}}(x_1, \ldots, x_K) = \Gamma(2K)\prod_{k=1}^{K} x_k, \tag{22}$$

and the remaining portion of GoM vectors (50, 30, or 10%, respectively) are drawn from the nearly crisp Dirichlet,

$$f_{\text{nearly crisp}}(x_1, \ldots, x_K) = \frac{\Gamma(10(K-1))}{\Gamma(9(K-1))}\, x_\ell^{9K-10}, \tag{23}$$

where $\alpha_\ell = 9(K-1)$ and $\alpha_k = 1$ when $k \ne \ell$. In (23), $\ell$ takes on the value of each category ($1, \ldots, K$) for an equal number of objects, so that $1/K$ of this nearly crisp portion of the sample is drawn from each category.
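A sketch of this sampling scheme for one simulated sample (our own illustration; `mix_level` is the proportion drawn from (22), and integer division ignores remainders):

```python
import numpy as np

def draw_gom_sample(I, K, mix_level, rng):
    """Draw I GoM vectors: a fraction mix_level from the slightly peaked
    Dirichlet (22) (all alpha_k = 2) and the rest from the nearly crisp
    Dirichlets (23) (alpha_l = 9(K-1), other alpha_k = 1), with the
    nearly crisp portion split equally across the K categories."""
    n_mix = int(round(mix_level * I))
    parts = [rng.dirichlet(np.full(K, 2.0), size=n_mix)]
    n_crisp = I - n_mix
    for l in range(K):
        alpha = np.ones(K)
        alpha[l] = 9 * (K - 1)
        parts.append(rng.dirichlet(alpha, size=n_crisp // K))
    return np.vstack(parts)

rng = np.random.default_rng(2)
G = draw_gom_sample(800, 4, 0.70, rng)   # K = 4, I = 200K, 70% mixtures
```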
Even in the case where 50% of the sample comes from the nearly crisp distributions of (23), on average only from 28% (when K = 4) to 31% (when K = 2) of the entire sample will consist of objects that are more than 90% from one category.

As in section 3, the data (the $y_{ijk}$ of (2)) are generated as multinomial realizations of the $p_{ijk}$ of (3), which are calculated from selected values of $\lambda_{kj\ell}$ and the randomly generated values of $g_{ik}$ (as described above). To ensure independent assessments of accuracy, we evaluate the estimates of the $\lambda_{kjk}$ for only one category (k) in each sample of the judge classifications for I objects.

4.2. Simulation Results

Figure 1 shows the absolute relative error (ARE $= \{|\hat{\lambda}_{kjk} - \lambda_{kjk}|/\lambda_{kjk}\} \times 100\%$) of estimates for the correct classification probabilities in the worst cases, where the samples are either 70 or 90% true mixtures from (22) (i.e., the remaining 30 or 10%, respectively, were nearly crisp from (23)), and the corresponding correct classification probability, $100\% \times \lambda_{kjk}$, is either 60% or 90%. At these high mixture levels, the AREs are highest when the correct classification level is 60% and lowest at 90%. The distribution of ARE, represented by each box plot, is based on 192 replications, where the total random error level, $\sum_{\ell \ne k}\lambda_{\ell jk} \times 100\%$, is uniformly distributed across the values 10, 20, 30, and 40% (48 replications each). Each replication is based on the classification of 200K objects (i.e., the sample size, I, is 400, 600, or 800 for K = 2, 3, or 4, respectively), where the overall average correct classification probability (across judges), $\sum_j \lambda_{kjk}/J$, is 0.75 in each category k, $k = 1, \ldots, K$. By design, the expected number of objects per category is constant (i.e., $E[\sum_i g_{ik}/I]$ is the same for each k, $k = 1, \ldots, K$). This is analogous to the typical design for empirical studies of classification processes where there are a constant number of objects per category; in this case, however, the objects are not crisp.

Typically, one should only use the conditional model when the number of judges exceeds the number of categories, J > K (ideally, J > 1.5K), but we have included cases where J = 4 when K = 4 for illustration, and in these cases the median ARE ranges from 3.4 to 35%. Otherwise median ARE is always less than 20%, and usually less than 10%. The cases in Figure 1 represent the least auspicious circumstances for accurate estimation of the $\lambda_{kjk}$. Median AREs tend to be substantially smaller when the mixture level is 50%. In this case, median ARE is uniformly less than 10% when J > K, except when K = 2, J = 4, and the correct classification level is 0.6 or 0.7 (here the median AREs are 18 and 14%, respectively).

FIGURE 1.
Each box plot represents the distribution of absolute relative error (ARE) in the estimation of the correct classification probability $\lambda_{kjk}$ for the indicated number of categories (2, 3, or 4), judges (4, 6, or 8), actual value of $\lambda_{kjk}$, and mixture level (percent of sample drawn from (22); the remaining proportion is drawn from (23)). The 95% confidence interval for median ARE is indicated within each box. Each empirical distribution is based on 192 replications, where the total error level ($100\% \times \sum_{\ell \ne k}\lambda_{\ell jk}$) is distributed uniformly across the values 0.1, 0.2, 0.3, and 0.4 (48 replications each).
FIGURE 2.
For each combination of mixture level, number of categories, and number of judges, the plot connects the median relative error when the correct probability levels are 0.6, 0.7, 0.8, and 0.9, respectively. Each point is the median of 192 observations (48 at each total error level, $\sum_{\ell \ne k}\lambda_{\ell jk}$ = 0.1, 0.2, 0.3, and 0.4).

Figure 2 shows how relative error ($= \{[\hat{\lambda}_{kjk} - \lambda_{kjk}]/\lambda_{kjk}\} \times 100\%$) varies by data quality and the dimensions of the problem. Median relative error (MRE) provides a way of studying the typical bias of these estimates. At each combination of mixture level, number of categories, and number of judges, the four points plotted in Figure 2 are the MRE when $\lambda_{kjk}$ is 0.6, 0.7, 0.8, and 0.9, respectively (going from left to right). Within each combination of mixture level and number of categories, these plots show that the MRE generally decreases as the level of the estimand ($\lambda_{kjk}$) increases and as the number of judges increases. MRE also generally increases as the total error level ($\sum_{\ell \ne k}\lambda_{\ell jk}$) increases (not shown). Overall, the MRE runs from 7.5%, when $\lambda_{kjk} = 0.6$, to -3.6%, when $\lambda_{kjk} = 0.9$. This positive bias in the estimates of the lower correct classification probabilities was also noted in the empirical example of section 3.

4.3. The Effect of Average Correct Classification Rate

In the simulations summarized above, the average correct classification probability within each category was held constant at 0.75. To study the effect that decreasing this average would have on estimation accuracy, we ran additional simulations with the samples consisting of 50% mixtures (i.e., 50% of each sample was drawn from the symmetric Dirichlet of (22), and an equal proportion of the remaining 50% was drawn from each of the nearly crisp distributions of (23), for $\ell = 1, \ldots, K$), where the average correct classification rate was set at 0.65. We then studied the change in estimation accuracy at the common levels of $\lambda_{kjk}$ and $\sum_{\ell \ne k}\lambda_{\ell jk}$ that were used in both sets of simulations ($\lambda_{kjk}$ = 0.6, 0.7, 0.8 and $\sum_{\ell \ne k}\lambda_{\ell jk}$ = 0.2, 0.3, 0.4). The change in ARE depended primarily on the values of J and K. In the 3-category, 8-judge case, median ARE actually decreased by 1.6% in absolute terms, from 6.0% when the average $\lambda_{kjk}$ was 0.75 to 4.4% when the average $\lambda_{kjk}$ was 0.65. Otherwise, median ARE increased from 2.6 to 12.3% in absolute terms, depending on J and K, as the average correct classification level fell from 0.75 to 0.65, and at the lower average correct classification level (0.65), the largest median ARE was 17.0% (this occurred in the 4-category, 8-judge case).

5. Conclusion

The conditional GoM model provides a practical method of estimating classification probabilities when sample objects consist of both mixtures and pure types. Theorem 1 shows that if G includes all possible pure types, the classification matrix $\Lambda$ is unique. Furthermore, according to Theorem 2, if we are also willing to assume that at least one judge satisfies a minimum level of classification accuracy for a particular category k (condition C1), the estimate of the probability that this judge will correctly classify an object from category k is also the greatest lower bound among all possible $\Lambda$ matrices that satisfy C1 (including those for which G does not include all K pure types).
Whenever the conditions of the Corollary of Theorem 2 are met, lower bound measures of data reliability (or test reliability when the "judges" are a group of test items) can be calculated directly from the estimates of $\Lambda$ (e.g., see Cooil & Rust, 1995: estimates of lower bounds for reliability follow immediately from expressions (22)-(23), p. 204, using "random error model 1," as shown in Table 2, p. 214).

In some applications, the "judges" may actually be identifying latent categories that do not directly correspond to the classification categories. For example, consider a psychological testing framework, where test items play the role of judges that "classify" subjects (or patients) in terms of several preselected categories. What if the test items really only identify latent categories? The classification model (7) is directly applicable when a unique subgroup of latent categories is mapped into each of the K classification categories (so that the L latent categories are in mutually exclusive subgroups that are each subsets of a different classification category). On the other hand, whenever a subgroup of more than one classification category corresponds to a single latent category, it would be necessary to redefine that subgroup as a single classification category (if it can be identified beforehand) to obtain the crisp objects needed in Theorems 1 and 2. If the latent dimensions and classification categories are not directly related in one of these ways, model (7) is at best an approximation of a more complicated process, and a more general GoM model that explicitly accommodates latent categories may be needed (Manton et al., 1994).

In section 3, the conditional GoM model was used to estimate the classification probabilities of 8 judges across 4 categories. The unconditional model would require the joint estimation of $\Lambda$ and 164 additional parameters in this case, which may not be practical without a considerably larger sample. Another approach would be to assume a specific parametric family for the $g_{ik}$-distribution, but this would also become impractical as we try to anticipate mixture distributions that could be significantly more complicated than those considered in section 4. Thus, the conditional GoM model makes it possible to consider a wider range of applications than is possible with the unconditional model (or even specific parametric alternatives), including those in which a relatively large number of judges and/or categories are used. These applications include cases where it is necessary to screen a large number of judges to determine a subgroup of experts, and situations where a large number of categories and judges are necessary to gauge the complexity of the objects. The "judges" may be test items that are used to evaluate psychological or psychiatric subjects, or the "judges" may even be test-takers, patients, or customers, who are asked to evaluate test procedures, patient care, or the quality of products or services, respectively.

The simulations in section 4 indicate that one can generally expect relatively accurate estimates of the classification probabilities (or, under condition C1 of section 2, accurate estimates of the greatest lower bounds of the correct classification probabilities, across all possible grade-of-membership values, G) when there are on average at least 200 objects per category, even when there are very few crisp objects.
When the average correct classification level was 0.75, the median ARE was usually well below 10%, which indicates that the median ARE would generally be well below 5% if there were as many as a thousand objects per category (on average), even if only 10% of those objects were nearly crisp (and among these nearly crisp objects, nearly half could be less than 90% from one category). To attain the same accuracy, larger samples will typically be required when the average correct classification level is lower, but even smaller samples will suffice as the proportion of nearly crisp objects increases.

Appendix A
Proofs of the Theorems

Let A be a nonsingular matrix such that $G^* = GA^{-1}$ satisfies (13) and such that $\Lambda^* = A\Lambda$ satisfies (14). (P and G must have K-dimensional support, so it suffices to consider nonsingular $K \times K$ matrices A.) First we show that both A and $A^{-1}$ must have eigenvector $\underline{1}$, with corresponding eigenvalue 1. Note that for any j, $j = 1, \ldots, J$,

$$A\Lambda_j\underline{1} = \underline{1}, \tag{A1}$$

because $\Lambda_j^*$ satisfies (14) and $\Lambda_j^* = A\Lambda_j$. But since

$$\Lambda_j\underline{1} = \underline{1} \tag{A2}$$

(i.e., the elements of each row of $\Lambda_j$ must add to 1, since it is a stochastic matrix by definition),

$$A\underline{1} = \underline{1}, \tag{A3}$$

by substitution of (A2) into (A1). Equation (A3) implies

$$A^{-1}\underline{1} = \underline{1}. \tag{A4}$$

The proofs of both theorems will use the following lemma.

Lemma 1. If G includes all pure types, then $A^{-1}$ is a stochastic matrix.

Remark 1. Recall that a stochastic matrix is defined as any matrix with only nonnegative elements such that the elements in each row add to 1. The fact that A and $A^{-1}$ must have row elements that sum to 1 is implied by (A3) and (A4), respectively.

Proof. It remains to show that the elements of $A^{-1}$ are all nonnegative. Let $a^{k\ell}$ denote the $(k\ell)$-th element of $A^{-1}$, and assume it is negative (i.e., $a^{k\ell} < 0$). Since G is assumed to include all pure types, there exists a row i such that

$$g_{ik} = 1, \quad g_{i\ell} = 0, \; \ell \ne k. \tag{A5}$$

Now consider the $(i\ell)$-th element $g^*_{i\ell}$ of $G^*$, where $G^* = GA^{-1}$, so that

$$g^*_{i\ell} = g_{ik}a^{k\ell} + \sum_{r \ne k} g_{ir}a^{r\ell} = a^{k\ell}, \tag{A6}$$

where the last equality follows from (A5). But since $a^{k\ell} < 0$ by assumption, we have that $g^*_{i\ell} < 0$ by (A6). This contradicts the assumption that A is such that $G^* = GA^{-1}$ satisfies (13) and establishes that all elements of $A^{-1}$ must be nonnegative.

Proof of Theorem 1. By assumption, G includes all K pure types. Suppose there is another estimate $G^*$, $G^* = GA^{-1}$, that also includes all K pure types. Then, since $G = G^*A$, the matrix A must also be stochastic by the lemma (following the same proof using A in place of $A^{-1}$). But the fact that A and $A^{-1}$ are both stochastic implies that A is the identity matrix. Thus, $\Lambda$ is unique. Finally, by exactly the same argument (with G and $\Lambda$ replaced by $\hat{G}$ and $\hat{\Lambda}$, respectively), if there is a $\hat{P} = \hat{G}A^{-1}A\hat{\Lambda}$ that maximizes (6), conditional on $\hat{G}$, then A is the identity, and $\hat{\Lambda}$ is uniquely determined by $\hat{P}$ and $\hat{G}$, whenever $\hat{G}$ includes all K pure types.

Proof of Theorem 2. Suppose there is another $\Lambda^* = A\Lambda$ and corresponding $G^* = GA^{-1}$, such that $\Lambda^*$ also satisfies condition C1 and $P = G^*\Lambda^*$. By assumption, G includes all K pure types, so $A^{-1}$ must be a stochastic matrix by the lemma. Let $a^{k\ell}$ denote the $(k\ell)$-th element of $A^{-1}$. Then $\Lambda = A^{-1}\Lambda^*$, and for each category k, $k = 1, \ldots, K$, there is a judge $j(k)$ such that
$$\lambda_{kj(k)k} = \sum_{\ell=1}^{K} a^{k\ell}\lambda^*_{\ell j(k)k} \le \left[\max\{\lambda^*_{\ell j(k)k} : \ell = 1, \ldots, K\}\right] \times \left[\sum_{\ell=1}^{K} a^{k\ell}\right] = \lambda^*_{kj(k)k}, \tag{A7}$$

where we use the fact that $A^{-1}$ is stochastic, so that $\sum_{\ell=1}^{K} a^{k\ell} = 1$, $a^{k\ell} \ge 0$, and the assumption that $\Lambda^*$ satisfies condition C1 for the same judge j and category k, so that $\max\{\lambda^*_{\ell j(k)k} : \ell = 1, \ldots, K\} = \lambda^*_{kj(k)k}$. It follows that the diagonal element $\lambda_{kj(k)k}$ is the greatest lower bound among all possible $\Lambda^*$ that satisfy condition C1. Furthermore, if C1 is satisfied for each category k, $k = 1, \ldots, K$, then $\Lambda$ is the unique estimate with the corresponding diagonal elements $\lambda_{1j(1)1}, \lambda_{2j(2)2}, \ldots, \lambda_{Kj(K)K}$, if we can show that a strict inequality must occur in (A7) for at least one category k whenever A is not the identity matrix. But a strict inequality does occur in (A7), because $A^{-1}$ is stochastic and $\Lambda^*$ satisfies condition C1, where a strict inequality is assumed in (15): that is, if A is not the identity matrix, there must be at least one category k such that $a^{kk} \ne 1$, so that

$$\lambda_{kj(k)k} = \sum_{\ell=1}^{K} a^{k\ell}\lambda^*_{\ell j(k)k} < \lambda^*_{kj(k)k} \times \left[\sum_{\ell=1}^{K} a^{k\ell}\right] = \lambda^*_{kj(k)k},$$

since $\lambda^*_{kj(k)k} > \max\{\lambda^*_{\ell j(k)k} : \ell \ne k\}$ and $\sum_{\ell=1}^{K} a^{k\ell} = 1$, $a^{k\ell} \ge 0$. Otherwise $a^{kk} = 1$ for all k, $k = 1, \ldots, K$, which would imply that $A^{-1}$ is the identity (and therefore that $\Lambda = \Lambda^*$).

Appendix B
Selecting Starting Values and the Profile Likelihood Algorithm

Outline of the Estimation Procedure Used in Sections 3 and 4

This procedure consists of: (a) the selection of starting values, (b) estimation of G and $\Lambda$ by an iterative profile likelihood (PL) algorithm, and (c) final estimation of $\Lambda$ by maximizing (7), conditional on an estimate $\hat{G}$ taken from the penultimate step of the PL algorithm. For step (c), we used the GAUSS constrained maximum likelihood module (Aptech Systems, 1995), which also provided estimates of the covariances of the $\lambda_{kj\ell}$ estimates. A GAUSS program for the entire procedure is available from the first author.

Starting Values

Initially, when the classification probabilities, $\Lambda$, are unknown, crisp objects (i.e., pure types) are indicated when a relatively large plurality of judges choose a single category and relatively few judges choose any other single category. If $M_{i1}(k)$ is the largest number of judges that agree that object i should be classified to some category k, and if $M_{i2}(k)$ is the second largest number of judges that agree on any category other than k, $k = 1, \ldots, K$, then the most probable crisp values are characterized by large values of $M_{i1}(k) - M_{i2}(k)$. To derive preliminary benchmarks for those values of $M_{i1}(k) - M_{i2}(k)$ that might indicate crisp objects, we considered a rather extreme scenario of low judgment accuracy, where (a) judges make errors randomly, (b) the correct classification probability $\lambda_{kjk}$ is at least as large as the misclassification probability $\lambda_{kj\ell}$, and (c) $\lambda_{kjk}$ is at most twice the misclassification probability when K = 2 or 3, and when K > 3, it is at most 0.5; this implies $\lambda_{kjk} \le \max\{2/(K+1), \; 0.5\}$, and that for a crisp object i from category k,

$$E[M_{i1}(k) - M_{i2}(k)] \le \text{CEIL}\left[\max\{J/(K+1), \; J(K-2)/(2(K-1))\}\right], \tag{B1}$$

where CEIL[x] represents the smallest integer greater than or equal to x. Consequently, if the observed value of $M_{i1}(k) - M_{i2}(k)$ is at or above the bound on the right side of (B1), this would indicate that object i is crisp even if judge accuracy were relatively low (and if judges are more accurate, even larger values of $M_{i1}(k) - M_{i2}(k)$ would be expected for crisp objects).
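For illustration, the agreement margins and the benchmark of (B1) can be computed as follows (a sketch with our own naming, used in Step 1 of the procedure below):

```python
import math
import numpy as np

def crisp_candidates(x, K):
    """Flag objects whose agreement margin Mi1(k) - Mi2(k) meets the
    bound (B1); x[i, j] is the category judge j assigns to object i."""
    I, J = x.shape
    bound = math.ceil(max(J / (K + 1), J * (K - 2) / (2 * (K - 1))))
    flags = np.full(I, -1)   # -1: not flagged; otherwise the crisp category k
    for i in range(I):
        counts = np.bincount(x[i], minlength=K)
        order = np.argsort(counts)[::-1]     # categories by judge agreement
        if counts[order[0]] - counts[order[1]] >= bound:
            flags[i] = order[0]
    return flags
```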
Here is a relatively simple procedure for selecting preliminary starting values for the classification probabilities, $\Lambda$, and grade-of-membership values, G.

1. Object i is initially identified as crisp from category k, $k = 1, \ldots, K$, whenever $M_{i1}(k) - M_{i2}(k)$ is greater than, or equal to, the bound in (B1):

$$M_{i1}(k) - M_{i2}(k) \ge \text{CEIL}\left[\max\{J/(K+1), \; J(K-2)/(2(K-1))\}\right],$$

so that for these objects we initially set $g_{ik} = 1$ and $g_{i\ell} = 0$, $\ell \ne k$.

2. Use the observed classifications to estimate the matrix of classification probabilities, $\Lambda$, assuming the objects selected in Step 1 are crisp and that modal selections are the correct categories.

3. For the objects that are not initially identified as crisp, set the preliminary value of $g_{ik}$ equal to the number of judges that classified object i to category k, divided by J.

4. Use the preliminary value of $\Lambda$, from Step 2, and the preliminary estimates of the $g_{ik}$, from Steps 1 and 3, as starting values for the PL algorithm (described below).

The bound in Step 1 is used to avoid the identification of mixture objects as pure types. Still, some of the objects that are identified as crisp will be mixtures that have large values of $M_{i1}(k) - M_{i2}(k)$, and some pure types will not be identified initially. Step 1 should be modified to incorporate prior information about (a) the minimum classification accuracy of judges, and (b) the minimum number of nearly crisp objects per category.

PL Algorithm

When the successive estimates of $g_{ik}$ and $\lambda_{kj\ell}$ converge, the following formulas maximize the conditional likelihood of (7), first with respect to G for fixed $\Lambda$ (this is the G-step in (B2)) and then with respect to $\Lambda$ for fixed G (the $\Lambda$-step of (B3)):

G-step:
$$\hat{g}_{ik} = \frac{1}{J}\sum_{j=1}^{J} g[i, k \mid j], \tag{B2}$$

where

$$g[i, k \mid j] = \frac{\hat{g}^{(0)}_{ik}\,\hat{\lambda}^{(0)}_{kjx(i,j)}}{\sum_{\ell=1}^{K}\hat{g}^{(0)}_{i\ell}\,\hat{\lambda}^{(0)}_{\ell jx(i,j)}}$$

($\hat{g}^{(0)}_{ik}$ and $\hat{\lambda}^{(0)}_{kjx(i,j)}$ represent estimates from the previous iteration) and $x(i, j)$ is the category to which judge j classifies object i;

$\Lambda$-step:
$$\hat{\lambda}_{kj\ell} = \frac{\sum_{i=1}^{I} y_{ij\ell}\, g[i, k \mid j]}{\sum_{\ell'=1}^{K}\sum_{i=1}^{I} y_{ij\ell'}\, g[i, k \mid j]}, \tag{B3}$$

where $y_{ijk}$ is the indicator for whether judge j classifies object i to category k, defined in (2). Varki (1996, pp. 136-139) shows how (B2) and (B3) each maximize the corresponding profiles of the conditional likelihood (7), when they converge. This is an iterative approximation to an approach justified by Richards (1961), although we are relying entirely on the conditional likelihood (7). Manton et al. (1994, p. 68) provide a similar approach to the conditional GoM model that is designed for latent categories (based on Woodbury & Clive, 1974), and Tolley and Manton (1992) show that when $\Lambda$ and G are selected to maximize the conditional likelihood (7), $\hat{\Lambda}$ will be consistent (whenever P is estimable, and $\Lambda$ and G are identifiable). Additional information about the joint asymptotic distribution of $\hat{\Lambda}$ and the moments of $F_g$ is provided by Manton et al. (1994, pp. 75-76).
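A compact sketch of one PL iteration, implementing (B2) and (B3) directly (our own illustration, using the same array conventions as the earlier sketches; convergence checks and the final constrained maximization of step (c) are omitted):

```python
import numpy as np

def pl_iteration(G0, L0, x):
    """One profile-likelihood iteration: the G-step (B2), then the
    Lambda-step (B3).

    G0 : (I, K) current GoM estimates.
    L0 : (K, J, K) current estimates of lambda_{ljk}.
    x  : (I, J) observed classifications x(i, j) in {0, ..., K-1}.
    """
    I, K = G0.shape
    J = x.shape[1]
    # w[i, j, k] = g[i, k | j] of (B2), the posterior weight of category k
    # for object i given judge j's classification x(i, j).
    w = np.empty((I, J, K))
    for j in range(J):
        num = G0 * L0[:, j, x[:, j]].T    # (I, K): g_ik * lambda_{kj x(i,j)}
        w[:, j, :] = num / num.sum(axis=1, keepdims=True)
    G1 = w.mean(axis=1)                   # G-step (B2): average over judges
    # Lambda-step (B3): reweight the observed classifications.
    Y = np.eye(K)[x]                      # (I, J, K) indicators y_ijk of (2)
    L1 = np.empty_like(L0)
    for j in range(J):
        num = w[:, j, :].T @ Y[:, j, :]   # (K, K): sum_i g[i,k|j] * y_ijl
        L1[:, j, :] = num / num.sum(axis=1, keepdims=True)
    return G1, L1
```

Iterating `pl_iteration` from the Step 1-4 starting values until G and $\Lambda$ stabilize corresponds to steps (a) and (b) of the outline above.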
References

Aptech Systems. (1995). Constrained maximum likelihood. Kent, WA: Author.
Batchelder, W.H., & Romney, A.K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103-112). Greenwich, CT: JAI Press.
Batchelder, W.H., & Romney, A.K. (1988). Test theory without an answer key. Psychometrika, 53, 193-224.
Batchelder, W.H., & Romney, A.K. (1989). New results in test theory without an answer key. In Edward E. Roskam (Ed.), Mathematical psychology in progress (pp. 229-248). Berlin, Heidelberg, New York: Springer-Verlag.
Berkman, L., Singer, B., & Manton, K.G. (1989). Black/white differences in health status and mortality among the elderly. Demography, 26, 661-678.
Blazer, D., Woodbury, M.A., Hughes, D., George, L.K., Manton, K.G., Bachar, J.R., & Fowler, N. (1989). A statistical analysis of the classification of depression in a mixed community and clinical sample. Journal of Affective Disorders, 16, 11-20.
Chavez, J.M., & Buriel, R. (1988). Mother-child interactions involving a child with epilepsy: A comparison of immigrant and native-born Mexican Americans. Journal of Pediatric Psychology, 13, 349-361.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
Cooil, B., & Rust, R.T. (1995). General estimators for the reliability of qualitative data. Psychometrika, 60, 199-220.
Dillon, W.R., & Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438-458.
Holland, J.L. (1985). Making vocational choices: A theory of vocational personalities and work environments. Englewood Cliffs, NJ: Prentice-Hall.
Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27, 887-906.
Klauer, C.K., & Batchelder, W.H. (1996). Structural analysis of subjective categorical data. Psychometrika, 61, 199-240.
Manton, K.G., Woodbury, M.A., & Tolley, H.D. (1994). Statistical applications using fuzzy sets. New York, NY: John Wiley & Sons.
Perreault, W.D., Jr., & Leigh, L.E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26, 135-148.
Richards, F.S.G. (1961). A method of maximum-likelihood estimation. Journal of the Royal Statistical Society, Series B, 23, 469-476.
Tolley, H.D., & Manton, K.G. (1992). Large sample properties of estimates of a discrete grade of membership model. Annals of the Institute of Statistical Mathematics, 44, 85-95.
Tsai, C.Y., & Denton, J.J. (1993). Reliability assessment of a classroom observation system. Journal of Classroom Interaction, 28, 23-32.
Varki, S. (1996). New strategies and methodologies in customer satisfaction. Unpublished doctoral dissertation, Vanderbilt University, Nashville, TN.
Varki, S., Cooil, B., & Rust, R.T. (2000). Modeling fuzzy data in qualitative marketing research. Journal of Marketing Research, 37, 480-489.
Vertrees, J., & Manton, K.G. (1986). A multivariate approach for classifying hospitals and computing blended payment rates. Medical Care, 24, 283-300.
Woodbury, M.A., & Clive, J. (1974). Clinical pure types as a fuzzy partition. Journal of Cybernetics, 4, 111-121.
Woodbury, M.A., & Manton, K.G. (1982). A new procedure for analysis of medical classification. Methods of Information in Medicine, 21, 210-220.
Woodbury, M.A., Manton, K.G., & Tolley, H.D. (1994). A general model for statistical analysis using fuzzy sets: Sufficient conditions for identifiability and statistical properties. Information Sciences, 1, 149-180.
Yale, L., & Gilly, M.C. (1988). Trends in advertising research: A look at the content of marketing-oriented journals from 1967 to 1985. Journal of Advertising, 17, 12-22.
Manuscript received 7 JUN 1999 Final version received 24 NOV 2002