Preprint; please don’t quote

Presentation Order Effects on Category Learning and Category Generalization

Fabien Mathy, Université de Franche-Comté
Jacob Feldman, Rutgers University – New Brunswick

We investigate the effect of presentation order on category learning in order to differentiate the predictions of several classes of learning models. In Exp. 1, we examined the effect of blocked orders (i.e., constant across blocks) on category learning, with the negative examples interspersed in the blocks. In Exp. 2 and Exp. 3, constant fully-blocked orders were administered (i.e., with all the negative examples clearly separated from the positive examples in each block). In comparison to a similarity-based presentation order (previously found more advantageous than a random order, Medin & Bettger, 1994), a rule-based presentation order systematically facilitated learning (in agreement with Mathy & Feldman, 2009), especially in the fully-blocked conditions. Applying the fully-blocked orders to the 5-4 category structure in Exp. 3, we also observed different generalization patterns depending on whether orders were similarity- or rule-based. Existing models cannot account for these results without modification. We propose an extension of the Generalized Context Model (GCM), which allows for a simple computation of the temporal proximity between the stimuli in order to account for the distortion of the stimulus space generated by order manipulation. Although this extension leads to a better explanation of the variance when compared to a more restricted GCM model, our results also suggest that categorization learning involves a process of rule-based abstraction that we here simulate using SUSTAIN.
This research was supported by the Agence Nationale pour la Recherche (ANR) Grant # ANR-09-JCJC-0131-01 to Fabien Mathy. We are grateful to Azizedine Elmahdi and Nicolas Heller for assistance in data analysis. Correspondence concerning this article should be addressed to Fabien Mathy, 30-32 rue Mégevand, 25030 Besançon Cedex, France, or by e-mail at fabien.mathy@univ-fcomte.fr.

This study extends previous research examining whether supervised category learning is influenced by the order in which examples are presented (Busemeyer & Myung, 1988; Crawford & Duffy, 2010; Elio & Anderson, 1981, 1984; Jones & Sieck, 2003; Jones, Love, & Maddox, 2006; Medin & Bettger, 1994; Nosofsky, Kruschke, & McKinley, 1992; Stewart, Brown, & Chater, 2002; Skorstad, Gentner, & Medin, 1988). Medin & Bettger, for instance, reported that maximizing the similarity between successive examples leads to more efficient learning. Similar efficiency has been observed in studies manipulating the alternation of the contrasting categories (Clapper & Bower, 1994; Goldstone, 1996). The manipulation of presentation orders can be especially beneficial to the testing of categorization models that emphasize incremental learning (Sakamoto, Jones, & Love, 2008; Love, Medin, & Gureckis, 2004) or category contrast effects (Stewart et al., 2002).

We addressed the question of whether similar items tend to be stored in close proximity to one another. This first requires clarification of the structure on which the proximity notion is based. Indeed, it is unclear whether individuals use rules, exemplars or both to mentally represent category objects (Allen & Brooks, 1991; Ashby & Ell, 2001; Goldstone, 1994; Hahn & Chater, 1998; Homa, Rhoads, & Chambliss, 1979; Komatsu, 1992; Pothos, 2005; Rips, 1989; Rosch & Mervis, 1975; E. E. Smith, Patalano, & Jonides, 1998; E. E. Smith & Sloman, 1994; Thibaut, Dupont, & Anselme, 2002; Thibaut & Gelaes, 2006). Mathy and Feldman (2009) recently showed that a rule-based presentation order (based on a small number of rule-learning assumptions) yields superior learning compared to the similarity-based order (maximizing the adjacency of the training examples) previously found to be most advantageous in artificial classification tasks (Elio & Anderson, 1981, 1984; Medin & Bettger, 1994). Each type of presentation order could be expected to favor the corresponding type of mental representation (a rule-based presentation might favor the abstraction of rules over the memorization of exemplars), but we would argue that any class of models (rule-based, exemplar-based) should be able to give a satisfying account of performance in all conditions. For example, a suitable rule-based model might account for the inferior performance in the similarity-based condition based on the idea that an erratic presentation order promotes the induction of invalid hypotheses that eventually delay learning.

In the rule-based orders constructed in our previous work (Mathy & Feldman, 2009), the stimuli were ordered from the biggest clusters to the smallest ones (following a putative rule + exception structure), and presentation within clusters was random in accordance with a rule-abstraction process that supposedly impedes stimulus singularity. Because the current experiments are closely related, in the Methods section of Exp. 1 we briefly review the procedure in those studies, as well as the specific concept that was learned by their subjects.
Figure 2A shows the proportion of correct responses as a function of block number observed by Mathy & Feldman. The top curve relates to the group who learned a concept in a rule-based order; the bottom curve corresponds to the learning of the same concept in a similarity-based order. In that paper we argued that the nature of the rule-based order provided a learning advantage beyond that provided by inter-item similarity, especially because inter-item similarity was indeed higher in the similarity-based order. However, in several respects the design probably complicated the modeling of presentation-order effects, and possibly flattened the effects of each presentation type. First, presentation orders varied across blocks: each new block was newly randomized (within the constraints of the respective desired order) in order to avoid the repetition of identical sequences of category responses across blocks. Second, the negative examples were randomly interleaved with the positive examples, which somewhat confused the intended order. Third, the negative examples were not ordered, despite their possible effect on category formation.

In the present study, which builds directly on this earlier work, the training blocks were arranged so as to remove as many random variations as possible from the manipulated orders. The participants in the first experiment were each administered a fixed (blocked) order across blocks (the orders were constant for each subject, but varied between subjects), with the negative examples interspersed (e.g., + + - + - + + + - - + - - - + -). A second experiment used fully-blocked orders across blocks (the negative examples were all presented after the positive examples, + + + + + + + + - - - - - - - -). We hypothesized that constant orders would enhance presentation order effects, especially when fully blocked.
In our third experiment a 5-4 category structure (Medin & Schaffer, 1978) was administered using fully-blocked orders, with a view to studying how individual subjects generalize their acquired category knowledge to the to-be-classified transfer items. We also hypothesized that subjects in the rule-based condition would exhibit generalization patterns consistent with rule-based retrieval.

To model the subjects’ performances, we developed an exemplar model which capitalized on the temporal dimension to assess the distinctiveness between stimuli. The model aims to account for contiguity effects in the same way as serial position effects can be modeled as discrimination problems in serial recall tasks (Bjork & Whitten, 1974; Brown, Neath, & Chater, 2007; Murdock, 1960). Accordingly, presentation orders can considerably modify the perceived similarity between exemplars and subsequently change their classification. Although this extension leads to a satisfactory explanation of the variance, our results also suggest that categorization learning can involve a process of rule-based abstraction, which we simulate here by running SUSTAIN (Love & Medin, 1998), a clustering model that detects sequencing effects with no modification to its implementation. The three experiments are presented successively, followed by a single modeling section that includes the three data sets and some of Mathy & Feldman’s data.

Experiment 1

Method

This experiment is identical to that of Mathy and Feldman (2009), except that we used constant presentation orders for each subject, manipulated the presentation order of the negative examples, and only administered one concept to subjects instead of two. Our aim was to simplify their procedure to obtain more stable data. The procedure in Mathy and Feldman (2009) is presented in parallel to Experiment 1 to avoid duplication. Their earlier data are reanalysed and compared to those of the first experiment in the Results section.

Participants.
The subjects were 22 Freshman or Sophomore students from the University of Franche-Comté (France), who received course credits in exchange for their participation. The subjects were randomly assigned to the experimental conditions.

Procedure. Tasks were computer-driven. Participants were individually tested during a single one-hour session (including briefing and debriefing). The participants sat approximately 60 cm from the computer display and were given a tutorial before the task began. Each participant was then asked to learn a single concept and was administered one constant presentation order across blocks until the learning criterion was met. There was no warm-up session (such as learning a simple one-dimensional concept) in order to prevent participants from believing that the task would consist of searching for simplistic rules. Stimulus objects were presented one at a time on the top half of the computer screen. After each response, feedback indicating a correct or incorrect classification appeared at the bottom of the screen for two seconds. The positive and negative categories corresponded respectively to the up and down keys, and to two category pictures on the right-hand side of the screen. In the category frame, a school bag was displayed at the top, and a trash can at the bottom (to match the response keys). Each time a response key was pressed, the corresponding picture appeared for two seconds along with the feedback, while the opposite picture disappeared for two seconds. The two category pictures reappeared whenever a new stimulus was presented. To encourage learning, each correct response scored the subjects one point on a progress bar. To regulate the learning process, each response had to be given in less than eight seconds (making a maximum of 10 seconds between two stimuli, at which point a “Too late” message appeared for two seconds). If the response was too late, participants lost three points on the progress bar.
The number of empty boxes in the progress bar was 4 × 2^D (D = number of dimensions, which was equal to four in our study). One empty box was filled whenever a correct response was given, but the progress bar was reset if an incorrect response was given. This criterion was identical to the one used by Shepard, Hovland, and Jenkins (1961) in their first experiment and by Mathy and Feldman (2009). Consequently, subjects had to correctly classify stimuli in four consecutive blocks of 2^D stimuli to complete the experiment. This setting required them to correctly classify all the stimuli, including those considered as exceptions (in accordance with the rules terminology employed below). This intentionally limited the number of partial strategies that could provide partial solutions, such as being able to classify stimuli on the basis of a limited number of features with less than 100% accuracy.

Choice of concepts studied. Participants were each administered a single concept defined over four Boolean dimensions. We chose to restrict our experiment to the most complex concept studied by Mathy and Feldman (2009), for which the presentation orders provided a substantial benefit. Following the classification of Feldman (2003), this concept is called 12_4[8] (Fig. 1) to indicate that it is the 12th of a set of 4-dimensional concepts made of 8 positive examples. The choice of a four-dimensional concept is justified by the fact that the number of objects to be classified (2^4 = 16) is large enough to identify any presentation order effects, but small enough to be learnable. As described below, this concept presents an interesting category structure. The top of Fig. 1 shows a four-dimensional hypercube made of 2^4 = 16 stimuli encoded from 0000 (standing for a'b'c'd') to 1111 (standing for abcd). The 12_4[8] concept is shown in an arbitrary rotation in the second hypercube of Fig. 1, in which the eight positive examples are indicated by black circles.¹

The reasons why this concept has interesting properties are detailed in Mathy and Feldman (2009). They can be summarized as 1) the concept is moderately complex and 2) the concept has a substructure made of several well-defined clusters. Cluster 1 represents six of the concept’s eight members. These six objects can collectively be represented by a verbal expression such as “all a' except bc”. These objects are circled in red in Fig. 1. In contrast, Clusters 2 and 3 both consist of only one object, each requiring four literals in order to be identified (abc'd' and ab'cd' respectively). Thus Cluster 1 plays the role of a salient “rule”, whereas Clusters 2 and 3 play the role of “exceptions”. Concerning the negative examples, the second cluster was defined by the negation of (bc)' on the a' dimension (i.e., the negation of the first cluster of the positive examples), that is, a'((bc)')' or simply a'bc, comprising the examples 0110 and 0111. The first cluster was defined by the negation of d'(bc' + b'c) on the a dimension (i.e., the negation of the second and third clusters of the positive examples), that is, a(d'(bc' + b'c))', comprising the rest of the negative examples. The negative clusters were therefore simply regarded as an inversion of the positive clusters.

Stimuli. Stimulus objects varied across four Boolean dimensions (Shape, Color, Size, and Filling texture). Rotation and permutation were randomized in our experiment for each subject, meaning that dimension a could be any one of the four dimensions, and that features within dimensions were randomly drawn and permuted (for instance, a' = blue and a = red, or a' = red and a = blue, or a' = green and a = red, etc.). Two values for each feature were chosen randomly (shape = triangle, square, or circle; color = blue, pink, red, or green; filling = hatched or grilled; size = small or big).
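The per-subject randomization described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FEATURE_POOLS`, `draw_mapping`, and `decode` are hypothetical, the seed is arbitrary, and the actual experimental software is not described in the text.

```python
import random

# Feature pools as listed in the Stimuli paragraph.
FEATURE_POOLS = {
    "shape": ["triangle", "square", "circle"],
    "color": ["blue", "pink", "red", "green"],
    "filling": ["hatched", "grilled"],
    "size": ["small", "big"],
}

def draw_mapping(rng):
    """Assign the abstract dimensions a, b, c, d to physical dimensions and
    draw, for each one, the feature standing for 0 and the feature for 1."""
    physical = list(FEATURE_POOLS)
    rng.shuffle(physical)  # rotation: which physical dimension plays a, b, c, d
    mapping = []
    for name in physical:
        # sample() returns two distinct values in random order, covering both
        # the random choice of features and their permutation.
        v0, v1 = rng.sample(FEATURE_POOLS[name], 2)
        mapping.append((name, v0, v1))
    return mapping

def decode(code, mapping):
    """Turn a 4-bit code such as '0110' into a concrete object description."""
    return {name: (v1 if bit == "1" else v0)
            for bit, (name, v0, v1) in zip(code, mapping)}

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
mapping = draw_mapping(rng)
codes = [format(i, "04b") for i in range(16)]
objects = [decode(code, mapping) for code in codes]
```

Because each subject gets a fresh mapping, the same abstract code (say 0110) corresponds to different physical objects for different subjects, while the logical structure of the concept is unchanged.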
The combination of these four separable dimensions (Garner, 1974) overall formed 16 single unified objects (e.g., a small hatched red square, a big grilled blue circle, etc.).

Ordering of stimuli. We retained the two presentation orders that best facilitated learning from the study by Mathy & Feldman: a rule-based order and a similarity-based order (their dissimilarity-based order was left out to limit the number of experimental conditions). Presentation order was a between-subject manipulation. One presentation order was randomly chosen for a given subject beforehand and then constantly applied across the blocks until the learning criterion was met.

In the rule-based order, positive objects were randomly drawn from Cluster 1 until all six had been presented. This was followed by the positive objects in Cluster 2, then Cluster 3. Thus in the rule-based order, all members of the biggest cluster per category were presented first, in a random order but separated from exceptional members, in order to encourage subjects to abstract the simplest rules. The presentation within clusters was random in accordance with a rule-abstraction process that is supposed to impede stimulus singularity. The ordering procedure was similar for the negative examples, starting with the six Cluster 1 examples, followed by the two Cluster 2 examples. Following Mathy and Feldman (2009), we hypothesized that the division of 12_4[8] into these particular clusters would be beneficial to learning. Table 1 shows a possible rule-based presentation order for Exp. 1.

In the similarity-based order, the first object in a given category was chosen at random, and subsequent objects of the same category were chosen randomly from those with maximal similarity to the previous object, and so forth until the set of examples had been exhausted. Ties were resolved randomly.
Similarity was computed on a trial-by-trial basis so as to maximize inter-item similarity locally. This method did not guarantee maximized inter-item similarity over an entire block, but it did offer a greater number of possible orders. The same procedure was applied to the negative examples. Dissimilarity between two stimuli i and j was computed by the Minkowski metric

d_{ij} = \left[ \sum_{a=1}^{n} |x_{ia} - x_{ja}|^{r} \right]^{1/r}    (1)

where x_{ia} is the value of stimulus i along dimension a. We used a city-block metric appropriate to the separable dimensions used in this study (r = 1). The similarity was simply computed using s_{ij} = n − d_{ij}. The most important aspect of this procedure was that the ordering did not necessarily respect the cluster boundaries targeted in the rule-based order, as similarity steps can cross in and out of clusters. For instance, the stimulus 0100 can be followed by stimulus 1100.

[Figure 1: two four-dimensional hypercubes. The first shows Concept 12_4[8], with vertices labeled by stimulus number and code (1-0000 to 16-1111) along dimensions a, b, c, d. The second shows Concept 5−4, with vertices labeled A1 to A5, B6 to B9, and T10 to T16, and with possible representations of Cluster 2.]

Figure 1. Concept 12_4[8] and Concept 5−4. Note. The positive examples of a given concept are indicated by black circles in the hypercube, whereas negative examples are represented by empty vertices. The examples are also all listed in Table 1. For both concepts, the red set indicates the positive objects belonging to the biggest cluster, the blue set indicates the positive objects from the second cluster, and the green those from the third. Among the positive examples, there are three clusters in concept 12_4[8] and two clusters in 5−4. The negative category clusters are not represented in order to lighten the presentation. The stimulus coding order is ABCD, each of the uppercase letters representing one dimension. The code 0000 stands for a'b'c'd', 1111 stands for abcd, each of the lowercase letters representing a dimension value (i.e., a feature). The number preceding the code (1, ..., 16) is a simpler identification number. In the 12_4[8] notation, the [8] extension means that there are 8 positive examples in the concept, 4 means that the concept is four-dimensional, and 12 is an arbitrary label which identifies this concept from the others available within the 4[8] set (Feldman, 2003). The 5−4 notation is older and simpler; it only refers to the presence of 5 positive examples vs. 4 negative examples in the category structure (Medin & Schaffer, 1978; Smith & Minda, 2000). For this concept, following Smith & Minda, 2000 (p. 4), the objects have been numbered by indexing each of the examples using letters (A, for the examples belonging to the positive category called A; B, for the negative category; and T for the examples that were presented during the transfer phase, which the subjects could freely classify as As or Bs). T10, ..., T16 in Smith & Minda (2000) are T1, ..., T7 respectively in Johansen & Palmeri (2002). The numbers following the letters A, B, and T are arbitrary numbers that distinguish the individual items. The first cluster and second cluster are circled by discontinuous red and blue curves respectively in the 5−4/7 concept to illustrate their fuzzy aspect. Indeed, the representation of the clusters can be more or less specific in the subject’s mind since 7 examples are not associated with clear categories.

¹ These hypercubes, also known as Hasse diagrams, are extremely useful for quickly looking at the conceptual structures, which do not easily appear in the corresponding truth tables.
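The two ordering procedures and the distance of Eq. 1 can be sketched as follows for the positive examples of the concept. This is a minimal illustration, not the software used in the experiments: all function names are hypothetical, the seed is arbitrary, and only the positive category is ordered here (the text applies the same procedure to the negative examples).

```python
import random

def minkowski(x, y, r=1):
    """Eq. 1: distance between two stimuli coded as bit strings."""
    return sum(abs(int(a) - int(b)) ** r for a, b in zip(x, y)) ** (1 / r)

def similarity(x, y):
    """s_ij = n - d_ij with the city-block metric (r = 1)."""
    return len(x) - minkowski(x, y, r=1)

# Positive examples grouped by cluster, as described in the text.
CLUSTER_1 = ["0000", "0100", "0010", "0001", "0101", "0011"]  # a' except bc
CLUSTER_2 = ["1100"]  # abc'd'
CLUSTER_3 = ["1010"]  # ab'cd'

def rule_based_order(clusters, rng):
    """Present clusters from largest to smallest, shuffling within each."""
    order = []
    for cluster in clusters:
        members = list(cluster)
        rng.shuffle(members)
        order.extend(members)
    return order

def similarity_based_order(items, rng):
    """Greedy chain: random start, then a random pick among the remaining
    items that are most similar to the previous one (ties broken randomly)."""
    remaining = list(items)
    current = rng.choice(remaining)
    remaining.remove(current)
    order = [current]
    while remaining:
        best = max(similarity(current, s) for s in remaining)
        ties = [s for s in remaining if similarity(current, s) == best]
        current = rng.choice(ties)
        remaining.remove(current)
        order.append(current)
    return order

def mean_inter_item_similarity(order):
    """Average similarity between successive items of an order."""
    pairs = list(zip(order, order[1:]))
    return sum(similarity(x, y) for x, y in pairs) / len(pairs)

rng = random.Random(1)  # fixed seed only to make the sketch reproducible
positives = CLUSTER_1 + CLUSTER_2 + CLUSTER_3
rb = rule_based_order([CLUSTER_1, CLUSTER_2, CLUSTER_3], rng)
sb = similarity_based_order(positives, rng)
```

For the pair mentioned in the text, `minkowski("0100", "1100")` is 1 and `similarity("0100", "1100")` is 3, so a similarity-based chain may step from a Cluster 1 item to the Cluster 2 item, which a rule-based order never does before Cluster 1 is exhausted.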
For both order types, the main difference with the study by Mathy & Feldman (in which each new block was newly randomized, although constrained to a given order type) was the choice of a constant presentation order. In our experiment, the negative examples were also randomly intermingled with the positives in order to avoid long uninterrupted sequences of positives and negatives. The random position of the negative and positive examples for a given order was computed differently for each subject before the task began, and then applied constantly to all the blocks until the learning criterion was reached. The present blocked order procedure could have made it less difficult for the subjects to learn the categories for two reasons. Ideally, the subjects would benefit from the constancy of the presentation to abstract rules or to reinforce exemplar memory traces. However, the subjects could also have progressively noticed one or several short sequences of category responses (for instance + − −− from a given example), but longer sequences were also potentially learnable, up to the length of an entire block. Because these sequences were an obvious extraneous variable, we ran this study with a limited number of participants. Still, given that the subjects may not have realized early on that the presentation order was repetitive, and given that the results can be compared with the other experiments presented in this study, we thought that this procedure deserved to be tested first. Justification for the types of presentation order chosen. These two types of presentation order match two extreme ways of learning that will be targeted in the subsequent simulations: a complex inductive process based on abstraction and an elementary process with underlying associative mechanisms. 1. 
The rule-based condition uses a set of clusters which are presented to subjects in an order depending on their magnitude (since in many domains, exceptions are learned last), with no distinction within clusters (since the abstraction process supposedly cancels out any effects of the non-diagnostic features on learning). Because the objects are supposed to involve common abstract properties within clusters, they are drawn randomly.
2. The similarity-based condition follows a simpler hypothetical associative process that uses the temporal contiguity of the stimuli to reinforce the memory traces locally.

Results

The learning curves in Figure 2B show the influence of presentation order on learning. In line with our prediction, learning was faster in the rule-based order. A paired t-test between the two curves in Figure 2B, which consisted of comparing the mean proportions of correct responses by pairing the proportions along the blocks, was significant, t(58) = 14.07, p < .001 (this simple method is equivalent to taking the blocks as the covariate). The difference between the mean numbers of blocks to criterion was significant, F(1, 20) = 7.72, p = .01, η² = .28 (24.2 and 37.1 blocks for the rule-based and similarity-based orders respectively). The average inter-item similarity within the rule-based order and the similarity-based order was 1.86 and 2.25 respectively, when inter-item similarity was computed for both positive and negative examples. The distribution of inter-item similarity is shown in Figure 3B. The difference between the two mean inter-item similarity values was significant, F(1, 672) = 483, p < .001, with a lower value for the rule-based order. A follow-up questionnaire at the end of the experiment indicated that 17 participants noticed that the presentation order was manipulated (but were still unsure about the exact regularity/circularity of the sequences).
Five other subjects declared that they believed the order to be completely constant (3 in the similarity-based condition, and 2 in the rule-based condition), and only one did not notice any manipulation of presentation order (in the rule-based condition).

Comparison with Mathy & Feldman’s 2009 data. For the sake of comparison, we now give a brief analysis of Mathy & Feldman’s 2009 data which matches the preceding analysis verbatim. We restricted the 2009 data to concept 12_4[8] and to the similarity-based and rule-based presentation orders (those experiments included four other conditions). The learning curves in Figure 2A show the influence of presentation order on learning. Learning was faster in the rule-based order (N = 25) than in the similarity-based order (N = 23). A paired t-test between the two curves, which consisted of comparing the mean proportions of correct responses by pairing the proportions on the basis of blocks, was significant, t(77) = 6, p < .001. In addition, the difference between the mean numbers of blocks to criterion was significant, F(1, 46) = 6.34, p = .01, η² = .12 (22.6 vs. 28.7 blocks to criterion respectively). The average inter-item similarity within the rule-based order and the similarity-based order was 1.9 and 2.24 respectively (the distribution of inter-item similarity according to presentation order is shown in Figure 3). The difference between the two mean inter-item similarity values was significant, F(1, 1222) = 613, p < .001.

Conclusion. To our surprise, despite the use of constant presentation orders, we did not observe better performance in Experiment 1 than in Mathy & Feldman’s 2009 data (either by comparing the proportion of correct responses or the number of blocks to criterion). In fact, faster learning was observed in the 2009 data set. We had expected faster learning in the present experiment, given that constant presentation orders may consolidate memory formation.
This absence of a stronger contrast between the two studies can, however, be explained by two factors: (1) The 2009 data only correspond to the subjects who were able to reach the learning criterion within the hour allocated to the entire experiment (a total of 27 subjects dropped out because they did not meet the criterion within the one-hour schedule). The 2009 data would therefore have led to much higher estimates of the number of blocks to criterion had all the subjects been kept in the analyses. In the present experiment, only two subjects did not reach the learning criterion in the allocated time (for these two subjects, who were administered a similarity-based condition, the high number of blocks reached before they dropped out was still used to compute the mean number of blocks to criterion); (2) all of the subjects in the study by Mathy & Feldman benefited from a warm-up condition, and because the subjects learned two different concepts (permuted between subjects), half of them also benefited from learning a one-dimensional concept before learning the 12_4[8] concept. The subjects were therefore more familiar with the task overall. (In such tasks, the degree to which a concept is learned is known to be correlated with better performance, Mathy & Bradmetz, 2004.)

Experiment 2

Experiment 2 was the same as Experiment 1, except that, in order to avoid the repetition of identical sequences of category responses across blocks, we deconfounded the training and categorization aspects by alternating learning blocks (not requiring any response) and categorization blocks (requiring a response). The presentation order was set to constant in the learning block, but was random in the categorization block. This allowed a clear separation of the order chosen for the positive examples from the one chosen for the negative examples, by separating the presentation of all the positive examples from all the negative examples, in a so-called “fully-blocked” order (Clapper & Bower, 1994, 2002).
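The alternation of constant, fully-blocked learning blocks with random categorization blocks can be sketched as below. This is an illustrative reconstruction only: the function names and the number of cycles are hypothetical, the seed is arbitrary, and the training order is the RBO-Exp2 sample listed in Table 1.

```python
import random

# One fixed training order (all positives, then all negatives); the codes
# follow the RBO-Exp2 sample of Table 1.
TRAINING_ORDER = [
    ("0100", "+"), ("0011", "+"), ("0000", "+"), ("0010", "+"),
    ("0101", "+"), ("0001", "+"), ("1100", "+"), ("1010", "+"),
    ("1101", "-"), ("1011", "-"), ("1110", "-"), ("1000", "-"),
    ("1111", "-"), ("1001", "-"), ("0110", "-"), ("0111", "-"),
]

def is_fully_blocked(block):
    """True when no positive example appears after a negative one."""
    labels = "".join(label for _, label in block)
    return "-+" not in labels

def session(training_order, n_cycles, rng):
    """Alternate a constant learning block with a shuffled categorization block."""
    blocks = []
    for _ in range(n_cycles):
        blocks.append(("learning", list(training_order)))
        test_block = list(training_order)
        rng.shuffle(test_block)  # categorization order is random in each cycle
        blocks.append(("categorization", test_block))
    return blocks

# n_cycles = 3 is arbitrary; in the experiment the alternation continued
# until the learning criterion was met.
blocks = session(TRAINING_ORDER, 3, random.Random(0))
```

Keeping the learning block identical across cycles preserves the manipulated order, while reshuffling the categorization block prevents subjects from memorizing a fixed sequence of responses.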
Table 1
Encoded study items of Concept 12_4[8] and Concept 5−4 presented in Fig. 1, and three presentation order samples for Concept 12_4[8]

12_4[8]       5−4           Presentation order samples in 12_4[8]
#    Cat A    #    Cat A    SBO-2009    RBO-Exp1    RBO-Exp2
1    0000     A1   1010     3  0100     3  0100     3  0100
3    0100     A2   0001     13 0011     13 0011     13 0011
4    1100     A3   1001     ?  Cat B    12 1101     1  0000
5    0010     A4   0011     1  0000     1  0000     5  0010
6    1010     A5   1011     ?  Cat B    14 1011     11 0101
9    0001                   5  0010     5  0010     9  0001
11   0101                   11 0101     11 0101     4  1100
13   0011                   9  0001     9  0001     6  1010
                            ?  Cat B    8  1110     12 1101
#    Cat B    #    Cat B    ?  Cat B    4  1100     14 1011
2    1000     B1   0100     4  1100     2  1000     8  1110
7    0110     B2   1100     6  1010     16 1111     2  1000
8    1110     B3   0010     ?  Cat B    10 1001     16 1111
10   1001     B4   0111     ?  Cat B    6  1010     10 1001
12   1101                   ?  Cat B    7  0110     7  0110
14   1011                   ?  Cat B    15 0111     15 0111
15   0111
16   1111

Note. Cat A, positive objects of the concept; Cat B, negative objects. The 12_4[8] and 5−4 concepts are shown in Fig. 1; SBO-2009, Similarity-based order in Mathy & Feldman’s 2009 study (this one-block presentation order was sampled from the many different orders that could be instantiated; because the category B stimuli were randomly drawn and interspersed within blocks, the “?” cells facing the “Cat B” cells could be replaced by any example of the negative category; the “?” cells therefore indicate that the order was not manipulated); RBO-Exp1, Rule-based order in Exp. 1 (the presentation order was constant across blocks, the negative stimuli were interspersed but the presentation order of the negative examples was also manipulated and set constant across blocks); RBO-Exp2, Rule-based order in Exp. 2 (the presentation order was constant across blocks during the learning blocks, but presentation was random during the categorization blocks; the positive and negative stimuli were fully blocked, with all the positive stimuli presented before all the negative stimuli). Stimulus numbers are indicated in Fig. 1.

Method

Participants.
The subjects were 46 Freshman or Sophomore students from the University of Franche-Comté, who received course credits in exchange for their participation. The subjects were randomly assigned to the experimental conditions (rule-based or similarity-based).

Figure 2. Proportion correct as a function of block number. Note. A) Mathy & Feldman’s 2009 data set, B) Exp. 1, C) Exp. 2, D) Exp. 3. Blue curve, rule-based order; Red curve, similarity-based order.

Procedure and Stimuli. The procedure and the concept chosen were identical to those used in Exp. 1, except that the fully-blocked training blocks alternated with the random categorization blocks until the learning criterion was met. A presentation order was randomly drawn for each participant and applied to all training blocks, starting with all positive examples, followed by all negative examples. During the training blocks, each stimulus was displayed for one second. While the stimulus was presented, the category was labeled below the stimulus (i.e., “school-bag” or “trash-can”) and the corresponding category picture was also displayed for one second (for instance, for a positive category, the school bag was shown for one second, while the trash can was hidden). This was followed by a confirmation phase during which the subject had to press the response key corresponding to the category picture that had just appeared. After the key was pressed, feedback indicating a correct or incorrect classification appeared at the bottom of the screen for two seconds, during which the stimulus remained on display. Our pretests showed that in this condition, none of the instructed responses could be missed by the participants. The subjects were therefore expected to receive 100% positive feedback during this phase. This method was employed to make sure the subjects were actively following the learning phase and that they did not miss any of the instructed categories.
The progress bar was hidden during the learning phase. Following a 5-second pause, each learning phase was followed by a categorization phase in which all the stimuli were randomly drawn. The number of points accumulated on the progress bar in a given categorization block was reset whenever a new categorization phase started.

Results

The following analysis is based on the categorization blocks. The learning curves in Figure 2C show the influence of presentation order on learning (the abscissa reports the number of categorization blocks to criterion). More efficient learning was observed in this experiment than in the first experiment, F(1, 184) = 4.3, p < .05, η² = .02, when comparing the proportion of correct responses in Figure 2. We observed no Experiment × Presentation order interaction. Learning was faster in the rule-based order. A paired t-test between the two curves (pairing the proportions on the basis of blocks) was significant, t(34) = 7.6, p < .001. In addition, the difference between the mean numbers of blocks to criterion was significant, F(1, 44) = 6.71, p = .01, η² = .13 (11.7 and 16.1 blocks to criterion for the rule-based and similarity-based orders, respectively). The average inter-item similarity within the rule-based order and the similarity-based order was 1.90 and 2.24 respectively (the distribution of similarity is shown in Figure 3). The difference between the two mean inter-item similarity values was significant, F(1, 686) = 4.57, p < .03.

Figure 3. Mean inter-item similarity per block, averaged across blocks. Note. A) Mathy & Feldman’s 2009 data set, B) Exp. 1, C) Exp. 2, D) Exp. 3. Left boxplot, rule-based order; Right boxplot, similarity-based order.

Conclusion. Although learning appeared to be quicker in Experiment 2 than in Experiment 1, the difference would be less obvious if both learning blocks and categorization blocks had been cumulated.
(In the rule-based order, about 100% correct would have actually been reached after about 2 × 15 blocks instead of 15 categorization blocks.) Indeed, the subjects were also able to learn the categories during the random categorization blocks. Hence, a count of 30 blocks comes close to the results of Experiment 1. The basic effect of the frequency at which the stimuli are shown to subjects therefore tends to reduce the possible beneficial effect of the fully-blocked learning phase.

Experiment 3

Experiment 3 was the same as Experiment 2, except that the concept given to the subjects was the Medin and Schaffer (1978) 5-4 category set (see Fig. 1 and Table 1). The 5-4 category structure was first studied by Medin and Schaffer (1978), then in many subsequent studies that have been re-analyzed by J. D. Smith and Minda (2000), among others (Cohen & Nosofsky, 2003; Johansen & Kruschke, 2005; Johansen & Palmeri, 2002; Lamberts, 2000; Lafond, Lacouture, & Mineau, 2007; Minda & Smith, 2002; Rehder & Hoffman, 2005; Zaki, Nosofsky, Stanton, & Cohen, 2003). This structure makes it possible to study how 7 unclassified items (out of 16) are categorized during a transfer phase. Our objective was to find out if different generalization patterns can be observed depending on whether orders are similarity- or rule-based. A second objective was to generalize the effect of the presentation order types observed for concept 124[8] in the previous experiments.

Method

Participants. The subjects were 44 freshman or sophomore students from the University of Franche-Comté, who received course credits in exchange for their participation.

Procedure. For this experiment, rather than giving concepts in an arbitrary rotation and permutation of features (see Fig. 1), each logical dimension was instantiated by the same physical dimension for all subjects in order to facilitate learning. A color dimension differentiated the objects at the top of the hypercube from those at the bottom (green vs. red respectively); a shape dimension differentiated the objects at the front from those at the back (square vs. circle); a size dimension distinguished the objects in the left cube from those in the right cube (small vs. large); and finally, the left and right objects within the cubes were hatched vs. plain. Consistent with Exp. 1 and Exp. 2, presentation order (rule-based or similarity-based) was a between-subject manipulation during the training phase. The subjects were randomly assigned to these two conditions. Once the subjects reached the learning criterion (the progress bar was this time equal to 7 × (5 + 4)), we conducted a transfer phase during which both the training and transfer stimuli were presented (each once in a block). The transfer phase was composed of 5 blocks of 16 stimuli.

Clusters. The 5−4 notation only refers to the presence of 5 positive examples vs. 4 negative examples in the category structure (Medin & Schaffer, 1978; J. D. Smith & Minda, 2000). For this concept, the objects have been numbered by indexing each of the training stimuli using A (the positive category), B (the negative category), and T (the transfer items). This notation refers to previous research (e.g., Smith & Minda, 2000, p. 4). Fig. 1 depicts a solution that can be adopted by subjects to cluster the positive items during the transfer phase. Note that the choice of stimuli in the first cluster is hypothetical (we refer here to the clusters that may result from the subjects’ conceptualization during the training phase, and are then applied to the transfer phase). For this reason, the red and blue sets are printed in dashed curves. For instance, Cluster 1 could instead be represented by the set (A5, T15, A1, A2, T12, and A3). The same applies to the negative examples. To order the items in a rule-based fashion during the training phase, we chose to rely on a simple one-dimensional rule plus exceptions. For the positive category, we chose to group the objects into two mutually exclusive clusters, that is, Cluster 1 = (A5, A1, A2, A3), a subset of the red set in Fig. 1, and Cluster 2 = A4, representing the rule “All green, except the large plain red square, and the large hatched green circle”. Similarly, the negative objects B8, B9, and B6 preceded the presentation of B7 (i.e., “All red, except the large hatched green circle, and the large plain red square”).

Results

Learning Phase. Learning was faster in the rule-based order (Figure 2D), as confirmed by a significant paired t-test between the two curves, t(27) = 7.5, p < .001, and the difference between the mean numbers of blocks to criterion was significant, F(1, 42) = 5.15, p = .03, η² = .11 (6.4 vs. 9.9 blocks to criterion). The difference between the two mean values for inter-item similarity shown in Figure 3 was also significant, F(1, 356) = 72, p < .001 (2.20 and 2.60 respectively).

Transfer Phase. One subject who was not able to meet the learning criterion in the time allowed did not complete the transfer phase. The following analysis of the transfer phase follows that of Johansen and Palmeri (2002) (pp. 495-...). Johansen and Palmeri (2002) developed a complex analysis of the patterns reflecting rule-based category representations, which we here take for granted. These patterns only vary for a subset of stimuli labeled the critical stimuli. Figure 4 only shows a subset of these average categorization probabilities corresponding to the critical transfer stimuli (those that are diagnostic of a rule-based or an exemplar-based generalization pattern: T10, T11, T13, T14, T15).
The graph does not include the categorization probabilities for the 5 positive and 4 negative examples encountered by the subjects in the training phases, as these stimuli were globally categorized as expected in the transfer phase. Because the subjects were trained to categorize the stimuli labeled A as belonging to the A category during the learning phase, the proportion p(A) was very high during the transfer blocks for those five stimuli, regardless of presentation type. The opposite was logically found for the four stimuli labeled B. In Figure 4, the categorization probability p(A) is the observed proportion of stimuli categorized as A (i.e., as positive) during the transfer phase across the five blocks. A stimulus categorized five times out of five as A simply corresponds to p(A) = 1. When selecting the non-critical stimuli only (all A’s, all B’s, along with T12 and T16), a Stimulus Type × Presentation Order ANOVA on the proportion of A responses indicated only a slight but significant effect of Stimulus Type (note that the T12 stimuli were mostly categorized as A’s, whereas the T16 stimuli were categorized as B’s). However, a similar analysis restricted to the critical stimuli showed a significant interaction between Stimulus Type and Presentation Order, F(4, 205) = 2.78, p = .028, η² = .05. Figure 4 indicates that following the rule-based presentation order, the average pattern (across subjects) is BBABA (a typical rule-based generalization pattern), as opposed to ABABA for the similarity-based presentation order (a non-typical pattern, although similar to ABBBA, a typical exemplar-based pattern). We now focus on the distribution of the generalization patterns at the individual subject level (N = 43). In theory, a prominent rule-based generalization pattern is BBABA, which corresponds to a one-dimensional rule (plus exceptions) based on the Color dimension, i.e.
“The positive are all red objects except B7, and the negative are all green objects except A4”. The results show that the BBABA pattern was less common in the similarity-based presentation order than in the rule-based order (7 vs. 14 subjects respectively). Note that the AABBB pattern (another prominent rule-based pattern, based on a one-dimensional rule using Size instead of Color as the main dimension) was produced by two subjects who were given a similarity-based order and by one subject in the rule-based presentation order condition. We conclude that the subjects tended to use Color to separate the categories. A total of 24 subjects eventually categorized the transfer objects in a way that suggests that they applied a rule-based strategy². Overall, our results clearly indicate a distortion in the generalization patterns according to presentation order, and this distortion is mainly visible in the frequency associated with the BBABA pattern.

Figure 4. Average categorization probabilities of the critical transfer items (T10, T11, T13, T14, T15) in the 5-4 category structure during the transfer phase (amounting to 5 blocks). Note. p(A) is the observed proportion of trials on which each of the stimuli labeled under the abscissa was categorized as A (i.e., as positive) during the transfer phase. The proportions are broken down by presentation order conditions (rule- vs. similarity-based). The graph does not include the categorization probabilities for the 5 positive examples and the 4 negative examples that the subjects encountered in the training phases. Error bars show +/- one s.e.

Temporal-GCM fit to the Exp. 1 and Exp. 2 data sets

We argue that exemplar models in their present form are not fully able to predict subjects’ category representations as a function of presentation order.
In this category of models, there is sometimes no specific mechanism for modulating the strength of the memory traces based on the order in which the stimuli are presented (Nosofsky, 1984, 1986; Nosofsky, Gluck, Palmeri, McKinley, & Gauthier, 1994). We mentioned above that exemplar models that use a decay function of lag of presentation can handle recency effects (Nosofsky et al., 1992), but they do not seem to be appropriate for handling primacy effects (Busemeyer & Myung, 1988). For instance, Nosofsky et al. (1992) suggest that the similarity of a stimulus i to an exemplar x is modulated by the memory strength M_x associated with exemplar x, with M_x = exp(−lag), signifying that the greater the number of intervening items between the presentation of i and x, the lesser the memory strength. The model that we present in this paper follows the same principle, except that we use temporal contiguity (Burgess & Hitch, 1999; Howard & Kahana, 2002) computed by block to account for primacy effects. Nosofsky et al. (1992) focused on the memorization of the stimuli, whereas our model focuses more on the discriminability of the stimuli. For instance, according to Nosofsky et al. (1992), given a block of three stimuli a, b, and c, the lag between stimulus a of the second block and stimulus c of the first block is only one. The greater memory of c makes the subject perceive a as very close to c. However, if we restrict the computation of the lags within a block, a is simply maximally isolated from c. In the short-term memory literature, this isolation is known to produce a primacy effect. It follows that the memory strength for a might be stronger as a result of this primacy effect, rather than faded due to the effect of time.

² Twenty-four (7 + 14 + 2 + 1 = 24) is quite a high value in relation to the fairly erratic distribution of patterns that has previously been observed (see Johansen & Palmeri, 2002, p. 491).
Because our constant presentation orders produced a circularity between blocks, especially when the training blocks were associated with categorization blocks, we target here a model that accounts for the discriminability of the stimuli within blocks. TGCM (a temporal version of GCM) is a simple extension of the standard model that incorporates the temporal dimension into the usual computation of the similarity between pairs of stimuli. In GCM, the distance function presented in Equation (1) can be used with r = 1 (a city-block metric suitable for separable dimensions), n the number of physical dimensions (here, n = 4), and $x_{ia}$ the value of stimulus i in dimension a. The distance function can be augmented with six free parameters: a scale parameter c reflecting discriminability in the psychological space and n attention weight parameters of dimensions with $0 \le w_a \le 1$ and $\sum_a w_a = 1$ (n − 1 = 4 − 1 were free to vary).

$$d_{ij} = c \left[ \sum_{a=1}^{n} w_a \, |x_{ia} - x_{ja}|^{r} \right]^{1/r} \qquad (2)$$

The following exponential decay function can be used to relate stimulus similarity to psychological distance (Nosofsky, 1986; Shepard, 1987):

$$\eta_{ij} = e^{-d_{ij}} \qquad (3)$$

Given the total similarity of a stimulus i to all exemplars in categories X and Y, the probability of responding with category X is generally computed by Luce’s choice rule:

$$P(X/i) = \frac{\left( \sum_{x \in X} \eta_{ix} \right)^{\gamma}}{\left( \sum_{x \in X} \eta_{ix} \right)^{\gamma} + \left( \sum_{y \in Y} \eta_{iy} \right)^{\gamma}} \qquad (4)$$

The choice rule can be augmented with γ, which is a response-scaling parameter that governs the extent to which responding is probabilistic versus deterministic (Ashby & Maddox, 1993; McKinley & Nosofsky, 1995; Navarro, 2007; Nosofsky & Zaki, 2002). This parameter is used to fit the data across the blocks, and to better fit the data when performance may be close to the chance level (at the beginning of an experiment) or when performance is errorless (sometimes by the end of an experiment).
Values of γ less than 1 reflect greater levels of guessing, whereas values above 1 make the predicted probabilities more deterministic (close to 0 or 1). This parameter can be avoided when fitting the data by epochs. The parameter representing the bias for making category responses and the one controlling the frequency of the stimuli were also considered minor in our study (respectively, because the number of positive and negative examples is balanced in 124[8] and almost balanced in 5−4, and because the stimuli were presented in each block with equal frequency).

In TGCM, the distance function presented in Equation (1) uses an extra attention weight for the temporal dimension. To simplify, and following Brown et al. (2007, p. 544), r was set to 1 for both physical and temporal distances. We now detail why presentation orders can affect performance in this model. According to TGCM, the categorization process is influenced by the temporal contiguity between stimuli. Upon presentation of a stimulus, the psychological distance between the stimulus and the exemplars depends both on the physical dimensions and the temporal dimension. TGCM can be primarily used to account for the distortion of the memory traces based on presentation order. Figure 5 illustrates how presentation orders can affect the pattern of probabilities when some attention is attributed to the temporal dimension (for instance by setting the attention weight vector to w = [.33 .33 .33], with the first two values corresponding to the two physical dimensions and the last value corresponding to the temporal dimension). When w = [.5 .5 0] (left square of Figure 5), the probability of classifying each of the positive examples as positive is p = .62, vs. p = .38 for the negative examples (regardless of presentation order because the attention weight for the temporal dimension is set to zero, and with c and γ both set to 1).
When the presentation order is fully-blocked (middle square of Figure 5) and when w = [.33 .33 .33], the first presented positive exemplar becomes more distinct from the negative examples, whereas the second does so to a lesser extent (p = .73 for the first, vs. p = .67 for the second). The effect is reversed for the negative exemplars, for which the first presented exemplar has a higher probability of being categorized as positive (because it is presented in the middle of a set of positive examples), in contrast to the last presented negative exemplar (p = .27) that is isolated. Therefore, the first and last stimuli are better discriminated from the opposite category and better associated with their correct category (p = .73 and p = .27 respectively). Overall, all examples are expected to be better categorized in this fully-blocked presentation order using w = [.33 .33 .33], compared to the prediction that uses w = [.5 .5 0]: the two probabilities for the positive examples (p = .73 and p = .67) effectively exceed p = .62; likewise for the negative exemplars, for which the two probabilities (p = .33 and p = .27) are both lower than p = .38. When the presentation is more disorganized (see the right square in Figure 5), the change in the predicted probabilities is more dramatic, with a higher probability of the first negative example being misclassified (p = .39), and a lower probability of the second positive example (p = .61) being correctly classified. We believe that this simple integration of the temporal dimension in the computation of the similarity matrices for each subject and each epoch can make more accurate predictions than the standard GCM.

To test TGCM, the fit values were first computed for each experiment, presentation order, subject, and epoch of five blocks, for a total of five epochs. The temporal distances were computed for each block, under the simplifying assumption that the temporal distances were restricted to the block boundaries.
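The Figure 5 illustration can be checked numerically. The sketch below is not the authors' code; it rests on several assumptions that the text does not state explicitly: a 2 × 2 stimulus layout in which the first physical dimension separates the categories, raw (unscaled) serial positions as the temporal coordinate, a stimulus counted among its own category's exemplars, and weights of exactly 1/3 for w = [.33 .33 .33]. The names `tgcm_prob_a`, `coords`, and `labels` are hypothetical. Under these assumptions the computation returns the probabilities quoted for the left and middle squares.

```python
import math

def tgcm_prob_a(order, coords, labels, weights, c=1.0, gamma=1.0):
    """Return P(A) for each stimulus of one block under a temporal GCM.

    weights[:-1] are attention weights for the physical dimensions and
    weights[-1] is the weight of the temporal dimension (city-block metric, r = 1).
    """
    n_phys = len(weights) - 1
    probs = {}
    for i in order:
        summed = {"A": 0.0, "B": 0.0}
        for j in order:  # assumption: the stimulus itself is one of the exemplars
            phys = sum(weights[a] * abs(coords[i][a] - coords[j][a])
                       for a in range(n_phys))
            temp = weights[-1] * abs(order.index(i) - order.index(j))
            summed[labels[j]] += math.exp(-c * (phys + temp))  # Equations (2)-(3)
        a, b = summed["A"] ** gamma, summed["B"] ** gamma
        probs[i] = a / (a + b)  # Luce's choice rule, Equation (4)
    return probs

# Hypothetical 2 x 2 layout: dimension 0 separates positives from negatives.
coords = {"p1": (0, 0), "p2": (0, 1), "n1": (1, 0), "n2": (1, 1)}
labels = {"p1": "A", "p2": "A", "n1": "B", "n2": "B"}
blocked = ["p1", "p2", "n1", "n2"]  # fully-blocked order: positives first

no_time = tgcm_prob_a(blocked, coords, labels, [0.5, 0.5, 0.0])
with_time = tgcm_prob_a(blocked, coords, labels, [1 / 3, 1 / 3, 1 / 3])

print({k: round(v, 2) for k, v in no_time.items()})
# {'p1': 0.62, 'p2': 0.62, 'n1': 0.38, 'n2': 0.38}
print({k: round(v, 2) for k, v in with_time.items()})
# {'p1': 0.73, 'p2': 0.67, 'n1': 0.33, 'n2': 0.27}
```

The interspersed order of the right square is not reproduced here because the exact sequence used in Figure 5 cannot be recovered from the text.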
Because the distances were computed between one object and the preceding objects only within a block, the distance between every pair of objects was symmetrical. For instance, for three objects presented in a block, the distances between the first and second objects, the first and third objects, and the second and third objects would be 1, 2, and 1 respectively. We therefore hypothesized a participant’s reliance on temporal associations to guide learning, with associations being stronger in the forward than in the backward direction (Kahana, 1996; Kahana, Howard, & Polyn, 2008). This assumption is relevant because each presentation-order-manipulated block was separated by a categorization phase in Experiments 2 and 3 and because of the primacy effect in Experiment 1 (the fact that the first block began with a given set of objects that reappeared in a loop), two different clues that could help participants identify the temporal structures relative to the start and end of the ordered blocks. The assumption is more open to criticism for the 2009 data set, where the subjects had no idea where the blocks began, but we still wanted to systematically run the model in a similar fashion for all data sets. We therefore simply hypothesized that the temporal distances computed for each virtual block in the 2009 data set would still help detect existing temporal patterns. For instance, the fact that exceptions were always distant from the more regular positive objects was such a regularity (even though these two clusters of objects were close in the reverse order when a new block began, the greater distance in the direct order can still produce a clear demarcation between the clusters). Another regularity was that some pairs of stimuli were more often contiguous than chance in a similarity-based presentation order. To scale the temporal distances, we divided each of them by 3.75. 
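As a minimal sketch of the within-block temporal distances just described, the snippet below computes the absolute difference in serial position for every pair of stimuli in a block and divides it by the scaling constant 3.75 given in the text. The function name `temporal_distances` and the use of zero-based positions are illustrative choices, not part of the original model code.

```python
def temporal_distances(block_length, scale=3.75):
    """Scaled within-block temporal distances |i - j| / scale between serial positions."""
    return [[abs(i - j) / scale for j in range(block_length)]
            for i in range(block_length)]

# Three-object block, unscaled: (1st, 2nd), (1st, 3rd), (2nd, 3rd)
raw3 = temporal_distances(3, scale=1.0)
print(raw3[0][1], raw3[0][2], raw3[1][2])  # 1.0 2.0 1.0

# Sixteen-object block with the 3.75 divisor: the largest distance is 15 / 3.75
scaled16 = temporal_distances(16)
print(max(max(row) for row in scaled16))  # 4.0
```

With 16 stimuli per block, the largest scaled temporal distance equals 4, the largest city-block distance between two four-feature binary stimuli.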
Because the maximal temporal distance between two stimuli within a block was 15 (15 being the difference between 1, the position of the first presented stimulus within a block, and 16, the position of the last stimulus within a block), a division by 3.75 allowed for the greatest temporal distances to match the greatest physical distances (i.e., 15/3.75 = 4). This scaling method could be further amended by the attention weighting process described above. Note that because the temporal dimension is fully diagnostic of the categories in a fully-blocked presentation, we computed a mean temporal distance based on the distances in the presentation phase and those in the random categorization phase in the last two experiments.

Log-likelihood was used as a measure of goodness-of-fit. For each experiment, subject, epoch, and parameter setting, the likelihood of the data was computed using the binomial distribution:

$$L = \prod_{i} \binom{F_i}{f_i} \, p_i^{f_i} (1 - p_i)^{F_i - f_i}, \qquad (5)$$

with i the stimulus number, $f_i$ the number of positive category responses for stimulus i, $F_i$ the number of blocks in an epoch (i.e., the maximal number of positive category responses for stimulus i), and $p_i$ the probability given by the model of a positive category response for stimulus i. Computing the log likelihood simply equates to replacing a computation based on a product (to compute the joint likelihood across subjects and across epochs for all models and all sets of parameters) by a computation based on a sum (for an introduction, see Lamberts, 1994). Indeed, it is more convenient and ultimately more accurate to compute the logarithm of the likelihood to avoid undesirable underflow, which can result from constantly multiplying values between 0 and 1.
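The point about underflow can be made concrete with a short sketch. The function below computes the logarithm of the binomial likelihood in Equation (5) as a sum over stimuli; the name `binomial_log_likelihood`, the clipping constant `eps` (added so that predicted probabilities of exactly 0 or 1 do not produce log(0)), and the response counts in the example are all assumptions made for illustration, not values from the experiments.

```python
import math

def binomial_log_likelihood(f, F, p, eps=1e-10):
    """Sum over stimuli of log[C(F, f_i) * p_i**f_i * (1 - p_i)**(F - f_i)]."""
    total = 0.0
    for f_i, p_i in zip(f, p):
        p_i = min(max(p_i, eps), 1 - eps)  # assumption: clip to keep the logs finite
        total += (math.log(math.comb(F, f_i))
                  + f_i * math.log(p_i)
                  + (F - f_i) * math.log(1 - p_i))
    return total

# Hypothetical epoch of F = 5 blocks and four stimuli.
f = [5, 4, 1, 0]              # made-up counts of positive responses
p = [0.73, 0.67, 0.33, 0.27]  # model probabilities (values from the Figure 5 example)
ll = binomial_log_likelihood(f, 5, p)

# Same quantity computed as a product, then logged (fine for a handful of terms).
direct = math.prod(math.comb(5, f_i) * p_i ** f_i * (1 - p_i) ** (5 - f_i)
                   for f_i, p_i in zip(f, p))
print(abs(ll - math.log(direct)) < 1e-9)  # True

# With many terms, the raw product underflows to zero while the log sum stays finite.
tiny = [1e-5] * 100
print(math.prod(tiny))                  # 0.0
print(sum(math.log(v) for v in tiny))   # about -1151.3
```

Summing such log-likelihoods across subjects and epochs gives joint values of the kind reported for each model, without ever forming the vanishingly small product.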
Because GCM is a restricted version of TGCM, a test of the difference in goodness-of-fit between the models was computed using the log-likelihood-ratio statistic:

$$\chi^2(df) = -2\,[\ln L(\text{restricted}) - \ln L(\text{general})] \qquad (6)$$

The degrees of freedom are the number of parameters that are removed in the restricted model, compared to the general version. Here, df = 1 because the restricted GCM model does not integrate the temporal dimension present in the more general TGCM model.

First, both GCM and TGCM were tested using a fixed sensitivity value and a fixed gamma value (both equal to 1). The objective was to show the role of the temporal dimension for the simplest version of the models. The parameter of interest in the present investigation is w. We focused on searching for the best weight parameters from the 126 possible weight patterns that can be generated using .2 steps (e.g., [1 0 0 0 0], [.8 .2 0 0 0], ..., [.6 .2 .2 0 0], etc.; using .1 steps, the set of 1001 different weight combinations significantly increased the search time of the model-fitting process). All 126 possible weight combinations were tested for each block and each subject. We computed which weight pattern sequence gave the best joint likelihood across the blocks for both models (for a given subject, the best weight patterns could therefore be different between epochs). The sum log-likelihood (across subjects) for TGCM was -6600, -3231, -3444, and -1333 respectively for the four experiments ordered chronologically (i.e., Exp. 2009, Exp. 1, Exp. 2, and Exp. 3), based on the best attentional weights found for each subject and each epoch. In contrast, the values for GCM were -6989, -3339, -4792, and -1580 respectively. In TGCM, the mean temporal weights averaged across subjects and blocks were .199, .136, .461, and .319 respectively for the four experiments. We obtained a significant log-likelihood-ratio for each experiment (e.g., χ² = −2[−3339 − (−3231)] = 216 for Exp. 1, a value significantly higher than the critical 3.84 value). The proportion correct values predicted by TGCM are plotted in Figure 6 for the four conditions studied in Exp. 1 and Exp. 2 (rule-based blocked, similarity-based blocked, rule-based unblocked, similarity-based unblocked respectively).

Figure 5. An example of how the temporal dimension affects the categorization probabilities in a temporal version of GCM. Note. W is a vector of dimension-salience weights. The third value in the vector is the weight given to the temporal dimension; the first two values are the weights for the physical dimensions. The left square shows the probabilities of classifying a stimulus in the positive category (i.e., p(A)) when equivalent attention is allocated to both physical dimensions. The central square indicates the probabilities of classifying a stimulus in the positive category when the temporal dimension is given a .33 value, and when stimuli are presented by clusters in a fully-blocked manner (for every square, the value “1” represents the first displayed item, the value “2” represents the second displayed item, and so on). The right square shows the probabilities of classifying each of the stimuli in the positive category when the temporal dimension is given a .33 value, but when stimuli are presented in a more unstructured way (with the negative examples interspersed).

SUSTAIN fit to the Exp. 1 and Exp. 2 data sets

Our goal here is to show that the temporal exemplar model that we simulated in the preceding section, although quite powerful, faces competing models that can handle presentation orders quite naturally in their present development. For instance, SUSTAIN (Supervised and Unsupervised STratified Adaptive Incremental Network) (Love & Medin, 1998; Love et al., 2004) is a model of category learning that clusters similar stimuli together. The clusters are activated by stimuli and serve as abstractions.
If simple solutions prove inadequate, SUSTAIN progressively recruits additional clusters to represent the stimuli. We chose this model because it is known to be susceptible to sequencing effects. We ran this model for Exp. 1 and Exp. 2 using the set of suitable parameters for “All studies”, i.e., Attentional focus r = 2.844642, Cluster competition β = 2.386305, Decision consistency d = 12.0, Learning rate η = 0.09361126 (Love et al., 2004, p. 313). The proportion correct values predicted by SUSTAIN are plotted in Figure 7. In its present development and given the standard implementation we chose, the model cannot compete with TGCM as the variance explained is lower, but still, all the manipulated presentation orders in Exp. 1 and Exp. 2 (rule-based blocked, similarity-based blocked, rule-based unblocked, similarity-based unblocked respectively) led to better performance than our simulated random presentations (estimated on the basis of 1000 samples of random-based presentation orders, and five epochs of five blocks each). This result confirms the preceding observations that similarity-based presentations lead to better performance than random presentations. Furthermore, the model demonstrated its ability to predict the pattern we observed in our data (rule-based blocked < similarity-based blocked < rule-based unblocked < similarity-based unblocked) with substantial variations between the four conditions, although the curves were closer than those observed in our data. Therefore, this simple simulation shows that SUSTAIN is able to predict the benefit of rule- and similarity-based presentation order types, without even modifying a fixed set of parameters known to function for other studies. Table 2 shows that with a more complex search of parameters, SUSTAIN still predicts a smaller portion of variance than TGCM, and still produces greater AIC and BIC values (two computations often used in model selection that measure the relative goodness of fit of a statistical model while penalizing the number of parameters; higher values denote a lack-of-fit of the chosen model). Hypothetical lower AIC and BIC values for SUSTAIN (with a similar portion of explained variance) would have meant that TGCM tended to over-fit our data and that SUSTAIN was a more concise model and a better candidate for applying the model to future data.

Figure 6. (a) Proportion correct predicted by TGCM using a limited set of integer values for the sensitivity parameter ranging from 1 to 15 and 126 different weight patterns generated by considering all possible combinations of attentional weights that could be built using .2 increments (e.g., [1 0 0 0 0], [.8 .2 0 0 0], ..., [.6 .2 .2 0 0], based on the presence of four physical dimensions and one temporal dimension). The proportion correct was averaged every five blocks (1-5, 6-10, 11-15, 16-20, 21-25). (b) Proportion correct observed in the data. The continuous curves correspond to the blocked conditions of Exp. 2. The discontinuous curves correspond to the unblocked conditions of Exp. 1.

Figure 7. Proportion correct predicted by SUSTAIN using the set of suitable parameters for “All studies”: Attentional focus r = 2.844642, Cluster competition β = 2.386305, Decision consistency d = 12.0, Learning rate η = 0.09361126 (cf. Love et al., 2004, p. 313).

General Discussion

Previous studies of inductive learning have not primarily focused on the order in which examples are actually encountered (Komatsu, 1992; Kruschke, 2005; Murphy, 2002). Our research addressed the question of whether the manipulation of presentation orders based on rule vs. exemplar learning assumptions can be beneficial to category learning.
This issue was first addressed by Mathy and Feldman (2009), who reported an experiment in which a rule-based order better facilitated category learning in comparison to simply maximizing inter-item similarity (Elio & Anderson, 1981, 1984; Medin & Bettger, 1994). In order to enhance presentation order effects in the present study, further manipulation was introduced by maintaining constant presentation orders across blocks for every subject (Exp. 1, Exp. 2, and Exp. 3), and by making use of fully-blocked presentation orders to separate the positive examples from the negative examples during the training phases (Exp. 2 and Exp. 3). We also hypothesized that subjects in the rule-based condition would exhibit generalization patterns consistent with rule-based retrieval in our third experiment, in which the subjects were trained with the 5-4 category structure and tested in a subsequent transfer phase with all the old stimuli and seven new stimuli. Our main results are a systematic positive effect of the rule-based presentation order over the similarity-based presentation in our three experiments, a positive effect of fully-blocked orders (when contrasting the results of Exp. 1 with those of Exp. 2, although we noted a potential confound due to the greater frequency of stimuli in Exp. 2), and patterns of generalization induced by presentation orders (Exp. 3). Lastly, the effect of maintaining constant presentation orders (versus mixing orders of one kind across blocks, e.g., Mathy & Feldman, 2009) was inconclusive. Overall, our results indicate that presentation-order types affect how categories are represented (Ashby & Ell, 2001; Sloman, 1996), which does not tend to validate a unitary view of how classification functions (Pothos, 2005). Our results support an intricate relationship between time and categorization, which adds to the studies on restructuring or initial training (Lewandowsky, Kalish, & Griffiths, 2000; Lee et al., 1988; Spiering & Ashby, 2008).
Table 2
Maximum likelihood estimation, Akaike Information Criterion, and Bayesian Information Criterion, computed for Exp. 1 and Exp. 2 combined.

                  TGCM     GCM   SUSTAIN
k                    5       4         4
θ̂mle             -4026   -4299     -8427
AIC               8063    8605     16863
BIC (N = 5440)    8096    8631     16890
R2                86.8    86.1      64.7

Note. Each model was fit to 5440 data points corresponding to the proportion p(A) that a stimulus was categorized as a positive member by the 68 subjects across 5 epochs of 5 blocks (16 × 68 × 5 = 5440) of Exp. 1 and Exp. 2; k, number of parameters free to vary; θ̂mle, maximum likelihood estimation; AIC, Akaike Information Criterion; BIC, Bayesian Information Criterion; R2, percent of variance explained. Probabilities p(A) predicted by TGCM used a limited set of integer sensitivity values ranging from 1 to 15 and 126 different weight patterns generated by considering all possible combinations of weights that could be built using .2 increments (e.g., [1 0 0 0 0], [.8 .2 0 0 0], ..., [.6 .2 .2 0 0]). Probabilities p(A) predicted by GCM used the same combinations of parameters except that the weight patterns for which attention to the temporal dimension was above zero were removed from the computation. Maximum likelihood was estimated in TGCM and GCM for each subject and each epoch using the best combination of parameters. SUSTAIN was run with integer values for the attentional focus r ranging from 1 to 10, with integer values for the Cluster competition β parameter ranging from 1 to 10, with Decision consistency d values taken from [1, 5, 10, 15, 20], and with Learning rate η values taken from [.10, .15, .20, .25, .30, .35, .40, .45]. Maximum likelihood was estimated in SUSTAIN for each subject across epochs using the best combination of parameters.

An extension of the Generalized Context Model (Nosofsky, 1984, 1986; Nosofsky, Gluck, et al., 1994) called Temporal-GCM was described and fit to our data in order to account for both primacy and recency effects within blocks. This model allows for a simple computation of the temporal proximity between stimuli, in order to account for the distortion of the psychological space generated by presentation orders. This extension leads to a better explanation of the variance when compared to the more restricted GCM model. Our results also suggest that categorization performance is driven by a process of rule-based abstraction that can be accounted for by a model such as SUSTAIN, which is incremental by nature (Love et al., 2004).

The main limitation of this study is that we did not test prototype (Ashby & Maddox, 1993; Minda & Smith, 2001, 2002; Nosofsky & Zaki, 2002; Osherson & Smith, 1981; J. D. Smith & Minda, 1998; Zaki et al., 2003) and hybrid models (Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Anderson & Betz, 2001; Erickson & Kruschke, 1998; Goodman, Tenenbaum, Feldman, & Griffiths, 2008; Nosofsky, Palmeri, & McKinley, 1994; Rosseel, 2002; E. E. Smith & Sloman, 1994; Vandierendonck, 1995; Vanpaemel & Storms, 2008), which are also extensively referenced in the categorization literature. Another limitation is that we did not compare TGCM with other exemplar models known to handle trial-by-trial changes in category representations (Kruschke, 1992, 1996; Nosofsky et al., 1992), a comparison which we believe merits a fully-developed explanation of how the models differ in their structure.

We conclude with several speculations on the mechanisms involved in subjects’ performance. Firstly, the detrimental effect of similarity-based orders might be attributed to the overly specific hypotheses they tend to induce in the subjects’ mind, a mechanism in the formation of abstraction that can be accounted for by SUSTAIN. A second possibility is that examples are associated on the basis of both time and features (a mechanism which is more consistent with TGCM).
Still, our data also suggest that the effect of the rule-based presentation may be solely due to the isolation of the exceptions in the sequence orders (an effect known as the von Restorff effect, Restorff, 1933, which might apply to categorization). A third possible explanation for the easier rule-based condition is that the random aspect of sampling within clusters helps in the abstraction of the diagnostic features.

Our results confirm certain observations that have been made on the relationship between temporal distinctiveness and memory retrieval. For instance, Kahana (1996) reanalyzed a number of classic free-recall studies and showed that temporal contiguity clearly determines some associative mechanisms, leading to a process of episodic clustering. Items studied in neighboring list positions tend to be reported successively and more rapidly during the recall period, regardless of their degree of semantic association. Accordingly, if rule learning functions as a discrimination problem, abstraction is easier whenever within-cluster stimuli are not easily discriminable due to their temporal contiguity, and whenever between-cluster stimuli are easily discriminable within the time scale. Brown et al. (2007) have also shown such a dependency between time and discriminability in their model of memory retrieval, which involves temporal discrimination as a core principle to account for the fact that forgetting is due to reduced local distinctiveness.

Our study also follows up the series of research on category discovery that took place in the 1950s, particularly those studies focusing on the effect of the informativeness of order, with some instances eliminating more possible concepts than others (Bruner, Goodnow, & Austin, 1956; Hovland & Weiss, 1953). Our results suggest that in blocked presentations, subjects have difficulty learning concepts from negative instances even though each category transmits the same amount of information (Hovland & Weiss, 1953).
Also, the effect of blocked orders (or simply the effect of massing the presentation of positive examples) has been found in previous research on the learning of paired-associate lists (Gagné, 1950), on supervised concept learning (Kurtz & Hovland, 1956; Goldstone, 1996), unsupervised concept learning (Clapper & Bower, 1994, 2002; Zeithamova & Maddox, 2009), incidental concept learning (Wattenmaker, 1993), and clustering (Gmeindl, Walsh, & Courtney, in press). The fact that blocked categories benefit learning does not, however, fit with the quasi-ubiquitous finding that there are benefits to distributed practice compared to massed practice (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006), even though massing apparently creates a sense of fluent learning (Kornell & Bjork, 2008; Kornell, Castel, Eich, & Bjork, 2010; Wahlheim, Dunlosky, & Jacoby, 2011 in press).

Conclusion

Previous results on the influence of presentation orders have suggested that spacing facilitates induction (Kornell & Bjork, 2008), but that optimal training procedures also depend on the nature of the categories being learned (Spiering & Ashby, 2008). Other research has revealed a more specific influence of order types (e.g., rule-based, similarity-based) on categorization learning (Elio & Anderson, 1981; Mathy & Feldman, 2009; Medin & Bettger, 1994). Whether presentation orders are useful for investigating categorization processes, and whether spacing aids or impairs induction, are both complex issues that we believe could be further addressed by studying the effects of presentation order on different concept types in more detail.

References

Allen, S. W., & Brooks, L. R. (1991). Specializing the operation of an explicit rule. Journal of Experimental Psychology: General, 120, 3-19.
Anderson, J. R., & Betz, J. (2001). A hybrid model of categorization. Psychonomic Bulletin & Review, 8(4), 629-647.
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442-481.
Ashby, F. G., & Ell, S. W. (2001). The neurobiology of human category learning. Trends in Cognitive Sciences, 5(5), 204-210.
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372-400.
Bjork, R. A., & Whitten, W. B. (1974). Recency-sensitive retrieval processes in long-term free recall. Cognitive Psychology, 6, 173-189.
Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio model of memory. Psychological Review, 114, 539-576.
Bruner, J., Goodnow, J., & Austin, G. (1956). A study of thinking. New York: Wiley.
Burgess, N., & Hitch, G. (1999). Memory for serial order: A network model of the phonological loop and its timing. Psychological Review, 106, 551-581.
Busemeyer, J. R., & Myung, I. J. (1988). A new method for investigating prototype learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 3-11.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354-380.
Clapper, J. P., & Bower, G. H. (1994). Category invention in unsupervised learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 443-460.
Clapper, J. P., & Bower, G. H. (2002). Adaptive categorization in unsupervised learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 908-923.
Cohen, A. L., & Nosofsky, R. M. (2003). An extension of the exemplar-based random-walk model to separable-dimension stimuli. Journal of Mathematical Psychology, 47, 150-165.
Crawford, L. E., & Duffy, S. (2010). Sequence effects in estimating spatial location. Psychonomic Bulletin & Review, 17, 725-730.
Elio, R., & Anderson, J. R. (1981). Effects of category generalizations and instance similarity on schema abstraction. Journal of Experimental Psychology: Human Learning and Memory, 7, 397-417.
Elio, R., & Anderson, J. R. (1984). The effects of information order and learning mode on schema abstraction. Memory & Cognition, 12, 20-30.
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107-140.
Feldman, J. (2003). A catalog of Boolean concepts. Journal of Mathematical Psychology, 47, 75-89.
Gagné, R. M. (1950). The effect of sequence of presentation of similar items on the learning of paired associates. Journal of Experimental Psychology, 40, 61-73.
Garner, W. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Gmeindl, L., Walsh, M., & Courtney, S. M. (in press). Binding serial order to representations in working memory: A spatial/verbal dissociation. Memory & Cognition.
Goldstone, R. L. (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178-200.
Goldstone, R. L. (1996). Isolated and interrelated concepts. Memory & Cognition, 24, 608-628.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L. (2008). A rational analysis of rule-based concept learning. Cognitive Science, 32, 108-154.
Hahn, U., & Chater, N. (1998). Similarity and rules: Distinct? Exhaustive? Empirically distinguishable? Cognition, 65, 197-230.
Homa, D., Rhoads, D., & Chambliss, D. (1979). Evolution of conceptual structure. Journal of Experimental Psychology: Human Learning and Memory, 5, 11-23.
Hovland, C. I., & Weiss, W. (1953). Transmission of information concerning concepts through positive and negative instances. Journal of Experimental Psychology, 45, 175-182.
Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46, 269-299.
Johansen, M. K., & Kruschke, J. K. (2005). Category representation for classification and feature inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1433-1458.
Johansen, M. K., & Palmeri, T. J. (2002). Are there representational shifts during category learning? Cognitive Psychology, 45, 482-553.
Jones, M., Love, B. C., & Maddox, W. T. (2006). Recency effects as a window to generalization: Separating decisional and perceptual sequential effects in category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 316-332.
Jones, M., & Sieck, W. R. (2003). Learning myopia: An adaptive recency effect in category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 626-640.
Kahana, M. J. (1996). Associative retrieval processes in free recall. Memory & Cognition, 24, 103-109.
Kahana, M. J., Howard, M. W., & Polyn, S. M. (2008). Associative retrieval processes in episodic memory. In H. L. Roediger (Ed.), Cognitive psychology of memory. Vol. 2 of Learning and memory: A comprehensive reference, 4 vols. (J. Byrne, Editor-in-chief). Oxford: Elsevier.
Komatsu, L. K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19, 585-592.
Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010). Spacing as the friend of both memory and induction in young and older adults. Psychology and Aging, 25, 498-503.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Kruschke, J. K. (1996). Dimensional relevance shifts in category learning. Connection Science, 8, 225-247.
Kruschke, J. K. (2005). Category learning. In K. Lamberts & R. L. Goldstone (Eds.), The handbook of cognition, ch. 7 (pp. 183-201). London: Sage.
Kurtz, K. H., & Hovland, C. I. (1956). Concept learning with differing sequences of instances. Journal of Experimental Psychology, 51, 239-243.
Lafond, D., Lacouture, Y., & Mineau, G. (2007). Complexity minimization in rule-based category learning: Revising the catalog of Boolean concepts and evidence for non-minimal rules. Journal of Mathematical Psychology, 51, 57-74.
Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1003-1021.
Lamberts, K. (2000). Information-accumulation theory of speeded categorization. Psychological Review, 107, 227-260.
Lee, E. S., MacGregor, J. N., Bavelas, A., Mirlin, L., Lam, N., & Morrison, I. (1988). The effects of error transformations on classification performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 66-74.
Lewandowsky, S., Kalish, M., & Griffiths, T. L. (2000). Competing strategies in categorization: Expediency and resistance to knowledge restructuring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1666-1684.
Love, B. C., & Medin, D. (1998). SUSTAIN: A model of human category learning. In C. Rich & J. Mostow (Eds.), Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 671-676). Cambridge, MA: MIT Press.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309-332.
Mathy, F., & Bradmetz, J. (2004). A theory of the graceful complexification of concepts and their learnability. Current Psychology of Cognition, 22, 41-82.
Mathy, F., & Feldman, J. (2009). A rule-based presentation order facilitates category learning. Psychonomic Bulletin & Review, 16, 1050-1057.
McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception and Performance, 21, 128-148.
Medin, D. L., & Bettger, J. G. (1994). Presentation order and recognition of categorically related examples. Psychonomic Bulletin & Review, 1, 250-254.
Medin, D. L., & Schaffer, M. (1978). A context theory of classification learning. Psychological Review, 85, 207-238.
Minda, J. P., & Smith, J. D. (2001). Prototypes in category learning: The effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(3), 775-799.
Minda, J. P., & Smith, J. D. (2002). Comparing prototype-based and exemplar-based accounts of category learning and attentional allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 275-292.
Murdock, B. B. (1960). The distinctiveness of stimuli. Psychological Review, 67, 16-31.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Navarro, D. J. (2007). On the interaction between exemplar-based concepts and a response scaling process. Journal of Mathematical Psychology, 51, 85-98.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(1), 104-114.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Nosofsky, R. M., Gluck, M. A., Palmeri, T. J., McKinley, S. C., & Gauthier, P. (1994). Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory & Cognition, 22, 352-369.
Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211-233.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.
Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 924-940.
Osherson, D. N., & Smith, E. E. (1981). On the adequacy of prototype theory as a theory of concepts. Cognition, 9, 35-58.
Pothos, E. M. (2005). The rules versus similarity distinction. Behavioral and Brain Sciences, 28(1), 1-14.
Rehder, B., & Hoffman, A. B. (2005). Thirty-something categorization results explained: Selective attention, eyetracking, and models of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 811-829.
Restorff, H. von. (1933). Über die Wirkung von Bereichsbildungen im Spurenfeld [On the influence of the segregation in the trace field]. Psychologische Forschung, 18, 299-34.
Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 21-59). Cambridge, MA: Cambridge University Press.
Rosch, E., & Mervis, C. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Rosseel, Y. (2002). Mixture models of categorization. Journal of Mathematical Psychology, 46, 178-210.
Sakamoto, Y., Jones, M., & Love, B. C. (2008). Putting the psychology back into psychological models: Mechanistic versus rational approaches. Memory & Cognition, 36, 1057-1065.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75, 13, whole No. 517.
Skorstad, J., Gentner, D., & Medin, D. L. (1988). Abstraction processes during concept learning: A structural view. In Proceedings of the 10th Annual Conference of the Cognitive Science Society (pp. 419-425). Hillsdale, NJ: Lawrence Erlbaum Associates.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3-22.
Smith, E. E., Patalano, A. L., & Jonides, J. (1998). Alternative strategies of categorization. Cognition, 65, 167-196.
Smith, E. E., & Sloman, S. A. (1994). Similarity- vs. rule-based categorization. Memory & Cognition, 22(4), 377-386.
Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1411-1436.
Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 3-27.
Spiering, B. J., & Ashby, F. G. (2008). Initial training with difficult items facilitates information integration, but not rule-based category learning. Psychological Science, 19, 1169-1177.
Stewart, N., Brown, G. D. A., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 3-11.
Thibaut, J.-P., Dupont, M., & Anselme, P. (2002). Dissociations between categorization and similarity judgments as a result of learning feature distributions. Memory & Cognition, 30, 647-656.
Thibaut, J.-P., & Gelaes, S. (2006). Exemplar effects in the context of a categorization rule: Featural and holistic influences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1403-1415.
Vandierendonck, A. (1995). A parallel rule activation and rule synthesis model for generalization in category learning. Psychonomic Bulletin & Review, 2, 442-459.
Vanpaemel, W., & Storms, G. (2008). In search of abstraction: The varying abstraction model of categorization. Psychonomic Bulletin & Review, 15, 732-749.
Wahlheim, C. N., Dunlosky, J., & Jacoby, L. L. (2011 in press). Spacing enhances the learning of natural concepts: An investigation of mechanisms, metacognition, and aging. Memory & Cognition.
Wattenmaker, W. D. (1993). Incidental concept learning, feature frequency, and correlated properties. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 203-222.
Zaki, S. R., Nosofsky, R. M., Stanton, R. D., & Cohen, A. L. (2003). Prototype and exemplar accounts of category learning and attentional allocation: A reassessment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1160-1173.
Zeithamova, D., & Maddox, W. T. (2009). Learning mode and exemplar sequencing in unsupervised category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 731-741.