Preprint; please don’t quote
Presentation Order Effects on Category Learning and Category Generalization
Fabien Mathy, Université de Franche-Comté
Jacob Feldman, Rutgers University – New Brunswick
We investigate the effect of presentation order on category learning in order to differentiate the
predictions of several classes of learning models. In Exp. 1, we examined the effect of blocked
orders (i.e., constant across blocks) on category learning, with the negative examples interspersed in the blocks. In Exp. 2 and Exp. 3, constant fully-blocked orders were administered (i.e., with all the negative examples clearly separated from the positive examples in
each block). In comparison to a similarity-based presentation order (previously found more
advantageous than a random order, Medin & Bettger, 1994), a rule-based presentation order
systematically facilitated learning (in agreement with Mathy & Feldman, 2009), especially in
the fully-blocked conditions. Applying the fully-blocked orders to the 5-4 category structure in
Exp. 3, we also observed different generalization patterns depending on whether orders were
similarity- or rule-based. Existing models cannot account for these results without modification. We propose an extension of the Generalized Context Model (GCM), which allows for
a simple computation of the temporal proximity between the stimuli in order to account for
the distortion of the stimulus space generated by order manipulation. Although this extension
leads to a better explanation of the variance when compared to a more restricted GCM model,
our results also suggest that categorization learning involves a process of rule-based abstraction
that we here simulate using SUSTAIN.
This research was supported by the Agence Nationale pour la Recherche (ANR) Grant # ANR-09-JCJC-0131-01
to Fabien Mathy. We are grateful to Azizedine Elmahdi and Nicolas Heller for assistance in data analysis.
Correspondence concerning this article should be addressed to Fabien Mathy, 30-32 rue Mégevand, 25030
Besançon Cedex, France, or by e-mail at fabien.mathy@univ-fcomte.fr.
This study extends previous research examining whether supervised category learning is influenced by the
order in which examples are presented (Busemeyer & Myung, 1988; Crawford & Duffy, 2010; Elio & Anderson,
1981, 1984; Jones & Sieck, 2003; Jones, Love, & Maddox, 2006; Medin & Bettger, 1994; Nosofsky, Kruschke, &
McKinley, 1992; Stewart, Brown, & Chater, 2002; Skorstad, Gentner, & Medin, 1988). Medin & Bettger, for
instance, reported that maximizing the similarity between successive examples leads to more efficient
learning. Similar efficiency has been observed in studies manipulating the alternation of the contrasting
categories (Clapper & Bower, 1994; Goldstone, 1996). The manipulation of presentation orders can be
especially beneficial to the testing of categorization models that emphasize incremental learning
(Sakamoto, Jones, & Love, 2008; Love, Medin, & Gureckis, 2004) or category contrast effects (Stewart et
al., 2002).
We addressed the question of whether similar items tend to be stored in close proximity to one another.
This first requires clarification of the structure on which the proximity notion is based. Indeed, it is
unclear whether individuals use rules, exemplars or both to mentally represent category objects (Allen &
Brooks, 1991; Ashby & Ell, 2001; Goldstone, 1994; Hahn & Chater, 1998; Homa, Rhoads, & Chambliss, 1979;
Komatsu, 1992; Pothos, 2005; Rips, 1989; Rosch & Mervis, 1975; E. E. Smith, Patalano, & Jonides, 1998;
E. E. Smith & Sloman, 1994; Thibaut, Dupont, & Anselme, 2002; Thibaut & Gelaes, 2006). Mathy and Feldman
(2009) recently showed that a rule-based presentation order (based on a small number of rule-learning
assumptions) yields superior learning compared to the similarity-based order (maximizing the adjacency of
the training examples) previously found to be most advantageous in artificial classification tasks (Elio
& Anderson, 1981, 1984; Medin & Bettger, 1994). Each type of presentation order could be expected to
favor the corresponding type of mental representation (a rule-based presentation might favor the
abstraction of rules over the memorization of exemplars), but we would argue that any class of models
(rule-based, exemplar-based) should be able to give a satisfying account of performance in all
conditions. For example, a suitable rule-based model might account for the inferior performance in the
similarity-based condition based on the idea that an erratic presentation order promotes the induction of
invalid hypotheses that eventually delay learning.
In the rule-based orders constructed in our previous work (Mathy & Feldman, 2009), the stimuli were
ordered from the biggest clusters to the smallest ones (following a putative rule + exception structure),
and presentation within clusters was random in accordance with a rule-abstraction process that supposedly
impedes stimulus singularity. Because the current experiments are closely related, in the Methods section
of Exp. 1 we briefly review the procedure in those studies, as well as the specific concept that was
learned by
their subjects. Figure 2A shows the proportion of correct
responses as a function of block number observed by Mathy
& Feldman. The top curve relates to the group who learned a
concept in a rule-based order; the bottom curve corresponds
to the learning of the same concept in a similarity-based order. In that paper we argued that the nature of the rule-based order provided a learning advantage beyond that provided by
inter-item similarity, especially because inter-item similarity
was indeed higher in the similarity-based order. However, in
several respects the design probably complicated the modeling of presentation-order effects, and possibly flattened the
effects of each presentation type. First, presentation orders
varied across blocks: each new block was newly randomized (within the constraints of the respective desired order)
in order to avoid the repetition of identical sequences of category responses across blocks. Second, the negative examples
were randomly interleaved with the positive, which somewhat confused the intended order. Third, the negative examples were not ordered, despite their possible effect on category formation.
In the present study, which builds directly from this earlier
work, the training blocks were arranged in order to remove
a maximum number of random variations in the manipulated
orders. The participants in the first experiment were each administered a series of fixed/blocked orders across blocks (the
orders were constant for each subject, but varied between
subjects), with the negative examples interspersed (e.g., + +
- + - + + + - - + - - - + -). A second experiment used fully-blocked orders across blocks (the negative examples were all
presented after the positive examples, e.g., + + + + + + + + - - - - - - - -). We
hypothesized that constant orders would enhance presentation order effects, especially when blocked. In our third experiment a 5-4 category structure (Medin & Schaffer, 1978)
was administered using fully-blocked orders, with a view to
studying how individual subjects generalize their acquired
category knowledge to the to-be-classified transfer items. We
also hypothesized that subjects in the rule-based condition
would exhibit generalization patterns consistent with rule-based retrieval.
To model the subjects’ performances, we developed an
exemplar model which capitalized on the temporal dimension to assess the distinctiveness between stimuli. The model
aims to account for contiguity effects in the same way as serial position effects can be modeled as discrimination problems in serial recall tasks (Bjork & Whitten, 1974; Brown,
Neath, & Chater, 2007; Murdock, 1960). Accordingly, presentation orders can considerably modify the perceived similarity between exemplars and subsequently change their classification. Although this extension leads to a satisfactory explanation of the variance, our results also suggest that categorization learning can involve a process of rule-based abstraction, which we simulate here by running SUSTAIN (Love
& Medin, 1998), a clustering model that detects sequencing
effects with no modification to its implementation.
The three experiments are presented successively, followed by a single modeling section that includes the three
data sets and some of Mathy & Feldman’s data.
Experiment 1
Method
This experiment is identical to that of Mathy and Feldman
(2009), except that we used constant presentation orders for
each subject, manipulated the presentation order of the negative examples, and only administered one concept to subjects
instead of two. Our aim was to simplify their procedure to
obtain more stable data. The procedure in Mathy and Feldman (2009) is presented in parallel to Experiment 1 to avoid
duplication. Their earlier data are reanalysed and compared
to those of the first experiment in the Results section.
Participants.
The subjects were 22 Freshman or
Sophomore students from the University of Franche-Comté
(France), who received course credits in exchange for their
participation. The subjects were randomly assigned to the
experimental conditions.
Procedure. Tasks were computer-driven. Participants
were individually tested during a single one-hour session (including briefing and debriefing). The participants sat approximately 60 cm from the computer display and were given
a tutorial before the task began. Each participant was then
asked to learn a single concept and was administered one
constant presentation order across blocks until the learning
criterion was met. There was no warmup session (such as
learning a simple one-dimensional concept) in order to prevent participants from believing that the task would consist
of searching for simplistic rules.
Stimulus objects were presented one at a time on the top
half of the computer screen. After each response, feedback
indicating a correct or incorrect classification appeared on
the bottom of the screen for two seconds. The positive and
negative categories corresponded respectively to the up and
down keys, and to two category pictures on the right-hand
side of the screen. In the category frame, a school bag was
displayed at the top, and a trash can at the bottom (to match
the response keys). Each time a response key was pressed,
the corresponding picture appeared for two seconds along
with the feedback, while the opposite picture disappeared for
two seconds. The two category pictures reappeared whenever a new stimulus was presented.
To encourage learning, each correct response scored the
subjects one point on a progress bar. To regulate the learning
process, each response had to be given in less than eight seconds (making a maximum of 10 seconds between two stimuli, at which point a “Too late” message appeared for two
seconds). If the response was too late, participants lost three
points on the progress bar. The number of empty boxes in the
progress bar was 4 × 2^D (D = number of dimensions, which
was equal to four in our study). One empty box was filled
whenever a correct response was given, but the progress bar
was reset if an incorrect response was given. This criterion
was identical to the one used by Shepard, Hovland, and Jenkins (1961) in their first experiment and by Mathy and Feldman (2009). Consequently, subjects had to correctly classify
stimuli in four consecutive blocks of 2^D stimuli to complete
the experiment. This setting required them to correctly classify all the stimuli, including those considered as exceptions
(in accordance with the rules terminology employed below).
This intentionally limited the number of partial strategies that
could provide partial solutions such as being able to classify
stimuli on the basis of a limited number of features with less
than 100% accuracy.
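The progress-bar rule described above can be sketched as follows. This is only an illustration, not the software used in the experiment: the function name, the outcome labels, and the floor at zero points after a late response are assumptions.

```python
def update_progress(points, outcome, n_dims=4):
    """Return (new_points, criterion_met) after one response.

    outcome is one of 'correct', 'incorrect', 'late' (hypothetical labels).
    """
    target = 4 * 2 ** n_dims            # 4 x 2^D boxes, i.e. 64 when D = 4
    if outcome == "correct":
        points += 1                     # one box is filled per correct response
    elif outcome == "incorrect":
        points = 0                      # the bar is reset after an error
    elif outcome == "late":
        points = max(0, points - 3)     # three points lost; the floor at 0 is an assumption
    else:
        raise ValueError(f"unknown outcome: {outcome}")
    return points, points >= target


# Sixty-four consecutive correct responses (four error-free blocks of 16) meet the criterion.
points, done = 0, False
for _ in range(64):
    points, done = update_progress(points, "correct")
```

Because any error empties the bar, the criterion can only be met by a run of 4 × 2^D consecutive correct responses, which is why partial strategies cannot satisfy it.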
Choice of concepts studied. Participants were each administered a single concept defined over four Boolean dimensions. We chose to restrict our experiment to the most
complex concept studied by Mathy and Feldman (2009), for
which the presentation orders provided a substantial benefit. Following the classification of Feldman (2003), this concept is called 124[8] (Fig. 1) to indicate that it is the 12th of a
set of 4-dimensional concepts made of 8 positive examples.
The choice of a four-dimensional concept can easily be justified by the fact that the number of objects to be classified
(2^4 = 16) is large enough to identify any presentation order
effects, but small enough to be learnable. As described below, this concept presents an interesting category structure.
The top of Fig. 1 shows a four-dimensional hypercube
made of 2^4 = 16 stimuli encoded from 0000 (standing for
a'b'c'd') to 1111 (standing for abcd). The 124[8] concept is
shown in an arbitrary rotation in the second hypercube of
Fig. 1, in which the eight positive examples are indicated by
black circles.¹ The reasons why this concept has interesting
properties are detailed in Mathy and Feldman (2009). They
can be summarized as 1) the concept is moderately complex and 2) the concept has a substructure made of several
well-defined clusters. Cluster 1 represents six of the concept’s eight members. These six objects can collectively
be represented by a verbal expression such as “all a' except bc”. These objects are circled in red in Fig. 1. In
contrast, Clusters 2 and 3 both consist of only one object,
each requiring four literals in order to be identified (abc'd'
and ab'cd' respectively). Thus Cluster 1 plays the role of
a salient “rule”, whereas Clusters 2 and 3 play the role of
“exceptions”. Concerning the negative examples, the second
cluster was defined by the negation of (bc)' on the a' dimension (i.e., the negation of the first cluster of the positive examples), i.e., a'((bc)')' or simply a'bc, comprising the examples
0110 and 0111. The first cluster was defined by the negation
of d'(bc' + b'c) on the a dimension (i.e., the negation of the
second and third clusters of the positive examples), that is,
a(d'(bc' + b'c))', comprising the rest of the negative examples. The negative clusters were therefore simply regarded
as an inversion of the positive clusters.
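The cluster definitions above can be checked by enumerating the 16 stimuli. The following is a minimal sketch; the function and variable names are ours, and stimuli are written as four-character strings in ABCD order, as in Fig. 1.

```python
from itertools import product


def is_positive(a, b, c, d):
    """Membership in the positive category, one Boolean term per positive cluster."""
    cluster1 = (not a) and not (b and c)            # "all a' except bc"
    cluster2 = a and b and (not c) and (not d)      # abc'd'
    cluster3 = a and (not b) and c and (not d)      # ab'cd'
    return bool(cluster1 or cluster2 or cluster3)


stimuli = ["".join(map(str, bits)) for bits in product((0, 1), repeat=4)]
positives = [s for s in stimuli if is_positive(*map(int, s))]
negatives = [s for s in stimuli if s not in positives]

# Negative clusters: a'bc (two examples) and the six remaining negatives.
neg_cluster2 = [s for s in negatives if s.startswith("011")]
neg_cluster1 = [s for s in negatives if s not in neg_cluster2]
```

The enumeration yields eight positive and eight negative examples, with the negative examples splitting into a cluster of six and a cluster of two.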
Stimuli. Stimulus objects varied across four Boolean dimensions (Shape, Color, Size, and Filling texture). Rotation
and permutation were randomized in our experiment for each
subject, meaning that dimension a could be any one of the
four dimensions, and that features within dimensions were
randomly drawn and permuted (for instance, a' = blue and
a = red, or a' = red and a = blue, or a' = green and
a = red, etc.). Two values for each feature were chosen
randomly (shape = triangle, square, or circle; color = blue,
pink, red, or green; filling = hatched or grilled; size = small
or big). The combination of these four separable dimensions
(Garner, 1974) overall formed 16 single unified objects (e.g.,
a small hatched red square, a big grilled blue circle, etc.).
Ordering of stimuli. We retained the two presentation orders that best facilitated learning from the study by Mathy
& Feldman: a rule-based order and a similarity-based order
(their dissimilarity-based order was left out to limit the number of experimental conditions). Presentation order was a
between-subject manipulation. One presentation order was
randomly chosen for a given subject beforehand and then
constantly applied across the blocks until the learning criterion was met.
In the rule-based order, positive objects were randomly
drawn from Cluster 1 until all six had been presented. This
was followed by the positive objects in Cluster 2, then Cluster 3. Thus in the rule-based order, all members of the biggest
cluster per category were presented first, in a random order
but separated from exceptional members, in order to encourage subjects to abstract the simplest rules. The presentation within clusters was random in accordance with a rule-abstraction process that is supposed to impede stimulus singularity. The ordering procedure was similar for the negative
examples, starting with the six Cluster 1 examples, followed
by the two Cluster 2 examples. Following Mathy and Feldman (2009), we hypothesized that the division of 124[8] into
these particular clusters would be beneficial to learning. Table 1 shows a possible rule-based presentation order for Exp.
1.
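As an illustration of the rule-based ordering just described, a one-block order could be drawn as in the sketch below. The cluster lists are taken from the concept definition; the function name and the use of a seeded generator are assumptions made for the example.

```python
import random

# Clusters of concept 124[8], from the biggest to the smallest.
POS_CLUSTERS = [["0000", "0100", "0010", "0001", "0101", "0011"], ["1100"], ["1010"]]
NEG_CLUSTERS = [["1000", "1110", "1001", "1101", "1011", "1111"], ["0110", "0111"]]


def rule_based_order(clusters, rng):
    """Present clusters in the given order, shuffling items within each cluster."""
    order = []
    for cluster in clusters:
        members = list(cluster)
        rng.shuffle(members)        # random presentation within a cluster
        order.extend(members)
    return order


rng = random.Random(0)              # seeded only to make the example reproducible
pos_order = rule_based_order(POS_CLUSTERS, rng)
neg_order = rule_based_order(NEG_CLUSTERS, rng)
```

The six Cluster 1 items always precede the exceptions, but their relative order differs from one draw to the next.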
¹ These hypercubes, also known as Hasse diagrams, are extremely useful for quickly looking at the conceptual structures,
which do not easily appear in the corresponding truth tables.
[Figure 1: hypercube diagrams of Concept 124[8] (vertices labeled 1-0000 to 16-1111 along dimensions a, b, c, d) and of Concept 5−4 (vertices labeled A1 to A5, B6 to B9, T10 to T16), with an inset titled “Possible representations of Cluster 2”.]
Figure 1. Concept 124[8] and Concept 5−4. Note. The positive examples of a given
concept are indicated by black circles in the hypercube, whereas negative examples are
represented by empty vertices. The examples are also all listed in Table 1. For both
concepts, the red set indicates the positive objects belonging to the biggest cluster, the
blue set indicates the positive objects from the second cluster, and the green those from
the third. Among the positive examples, there are three clusters in concept 124[8] and
two clusters in 5−4. The negative category clusters are not represented in order to
lighten the presentation. The stimulus coding order is ABCD, each of the uppercase
letters representing one dimension. The code 0000 stands for a'b'c'd', 1111 stands for
abcd, each of the lower case letters representing a dimension value (i.e., features). The
number preceding the code (1, ..., 16) is a simpler identification number. In the 124[8]
notation, the [8] extension means that there are 8 positive examples in the concept, 4
means that the concept is four-dimensional, and 12 is an arbitrary label which identifies
this concept from the others available within the 4[8] set (Feldman, 2003). The 5−4
notation is older and simpler; it only refers to the presence of 5 positive examples vs. 4
negative examples in the category structure (Medin & Schaffer, 1978; Smith & Minda,
2000). For this concept, following Smith & Minda, 2000 (p. 4), the objects have
been numbered by indexing each of the examples using letters (A, for the examples
belonging to the positive category called A; B, for the negative category; and T for the
examples that were presented during the transfer phase, which the subjects could freely
classify as As or Bs). T10, ..., T16 in Smith & Minda (2000) are T1, ..., T7 respectively
in Johansen & Palmeri (2002). The numbers following the letters A, B, and T are
arbitrary numbers that distinguish the individual items. The first cluster and second
cluster are circled by discontinuous red and blue curves respectively in the 5−4/7
concept to illustrate their fuzzy aspect. Indeed, the representation of the clusters can
be more or less specific in the subject’s mind since 7 examples are not associated with
clear categories.
In the similarity-based order, the first object in a given category was chosen at random, and subsequent objects of the
same category were chosen randomly from those with maximal similarity to the previous object, and so forth until the
set of examples had been exhausted. Ties were resolved randomly. Similarity was computed on a trial-by-trial basis so
as to maximize inter-item similarity locally. This method did
not guarantee maximized inter-item similarity over an entire
block, but it did offer a greater number of possible orders.
The same procedure was applied to the negative examples.
Dissimilarity between two stimuli i and j was computed by
the Minkowski metric
$$ d_{ij} = \left[ \sum_{a=1}^{n} |x_{ia} - x_{ja}|^{r} \right]^{1/r} \tag{1} $$
where $x_{ia}$ is the value of stimulus i along dimension a. We
used a city-block metric appropriate to the separable dimensions used in this study (r = 1). The similarity was simply
computed using $s_{ij} = n - d_{ij}$. The most important aspect of
this procedure was that the ordering did not necessarily respect the cluster boundaries targeted in the rule-based order,
as similarity steps can cross in and out of clusters. For instance, the stimulus 0100 can be followed by stimulus 1100.
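The similarity-based ordering can be sketched as a greedy procedure built on Equation 1. This is a minimal illustration under our own naming; stimuli are binary feature strings, and the seeded generator is only there to make the example reproducible.

```python
import random


def distance(x, y, r=1):
    """Minkowski distance between two feature strings (r = 1 gives city-block)."""
    return sum(abs(int(p) - int(q)) ** r for p, q in zip(x, y)) ** (1 / r)


def similarity(x, y):
    """s_ij = n - d_ij, with n the number of dimensions."""
    return len(x) - distance(x, y)


def similarity_based_order(items, rng):
    """Start at random, then repeatedly move to a maximally similar remaining item."""
    remaining = list(items)
    current = remaining.pop(rng.randrange(len(remaining)))
    order = [current]
    while remaining:
        best = max(similarity(current, s) for s in remaining)
        candidates = [s for s in remaining if similarity(current, s) == best]
        current = rng.choice(candidates)    # ties are resolved randomly
        remaining.remove(current)
        order.append(current)
    return order


positives = ["0000", "0100", "0010", "0001", "0101", "0011", "1100", "1010"]
order = similarity_based_order(positives, random.Random(1))
```

For example, stimuli 0100 and 1100 differ on one dimension only, so d = 1 and s = 4 − 1 = 3, which is why a similarity step can leave Cluster 1 for an exception. Because the choice is made one trial at a time, the procedure maximizes similarity locally and not over the whole block.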
For both order types, the main difference with the study
by Mathy & Feldman (in which each new block was newly
randomized, although constrained to a given order type) was
the choice of a constant presentation order. In our experiment, the negative examples were also randomly intermingled with the positives in order to avoid long uninterrupted
sequences of positives and negatives. The random position
of the negative and positive examples for a given order was
computed differently for each subject before the task began,
and then applied constantly to all the blocks until the learning
criterion was reached.
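One way to intersperse the two categories while preserving the order chosen within each category is sketched below. The function name is hypothetical, and the plain random shuffle of category positions is an assumption: it does not by itself rule out long runs of one category.

```python
import random


def intersperse(pos_order, neg_order, rng):
    """Randomly mix category positions while keeping each category's internal order."""
    labels = ["+"] * len(pos_order) + ["-"] * len(neg_order)
    rng.shuffle(labels)                         # random positions of + and - in the block
    pos_iter, neg_iter = iter(pos_order), iter(neg_order)
    return [(next(pos_iter), "+") if label == "+" else (next(neg_iter), "-")
            for label in labels]


pos_order = ["0100", "0011", "0000", "0010", "0101", "0001", "1100", "1010"]
neg_order = ["1101", "1011", "1110", "1000", "1111", "1001", "0110", "0111"]

# The block is computed once per subject and then repeated unchanged.
block = intersperse(pos_order, neg_order, random.Random(3))
blocks = [block for _ in range(5)]
```

Repeating the same block is what makes the order constant across blocks for a given subject.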
The present blocked order procedure could have made it
less difficult for the subjects to learn the categories for two
reasons. Ideally, the subjects would benefit from the constancy of the presentation to abstract rules or to reinforce
exemplar memory traces. However, the subjects could also
have progressively noticed one or several short sequences of
category responses (for instance + − − − from a given example), but longer sequences were also potentially learnable,
up to the length of an entire block. Because these sequences
were an obvious extraneous variable, we ran this study with
a limited number of participants. Still, given that the subjects may not have realized early on that the presentation
order was repetitive, and given that the results can be compared with the other experiments presented in this study, we
thought that this procedure deserved to be tested first.
Justification for the types of presentation order chosen.
These two types of presentation order match two extreme
ways of learning that will be targeted in the subsequent simulations: a complex inductive process based on abstraction
and an elementary process with underlying associative mechanisms.
1. The rule-based condition uses a set of clusters which
are presented to subjects in an order depending on their magnitude (since in many domains, exceptions are learned last),
with no distinction within clusters (since the abstraction process supposedly cancels out any effects of the non-diagnostic
features on learning). Because the objects are supposed to
involve common abstract properties within clusters, they are
drawn randomly.
2. The similarity-based condition follows a simpler
hypothetical associative process that uses the temporal contiguity of the stimuli to reinforce the memory traces locally.
Results
The learning curves in Figure 2B show the influence of
presentation order on learning. In line with our prediction,
learning was faster in the rule-based order. A paired t-test between the two curves in Figure 2B, which consisted
of comparing the mean proportions of correct responses by
pairing the proportions along the blocks, was significant,
t(58) = 14.07, p < .001 (this simple method is equivalent
to taking the blocks as the covariate). The difference between the mean numbers of blocks to criterion was significant, F(1, 20) = 7.72, p = .01, η² = .28 (24.2 and 37.1 blocks
for the rule-based and similarity-based orders respectively).
The average inter-item similarity within the rule-based order and the similarity-based order was 1.86 and 2.25 respectively, when inter-item similarity was computed for both positive and negative examples. The distribution of inter-item
similarity is shown in Figure 3B. The difference between
the two mean inter-item similarity values was significant,
F(1, 672) = 483, p < .001, with a lower value for the rule-based order.
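The inter-item similarity of a block can be computed as the mean similarity between successive stimuli. The sketch below assumes that all successive pairs in a block are averaged, including the pair that straddles the two categories; the exact averaging used for the reported values is not specified here, so this is an illustration only.

```python
def similarity(x, y):
    """City-block similarity s_ij = n - d_ij for binary feature strings."""
    return len(x) - sum(abs(int(p) - int(q)) for p, q in zip(x, y))


def mean_inter_item_similarity(order):
    """Average similarity over all pairs of successive stimuli in one block."""
    pairs = list(zip(order, order[1:]))
    return sum(similarity(x, y) for x, y in pairs) / len(pairs)


# Rule-based sample order for Exp. 2 listed in Table 1 (positives, then negatives).
rbo_exp2 = ["0100", "0011", "0000", "0010", "0101", "0001", "1100", "1010",
            "1101", "1011", "1110", "1000", "1111", "1001", "0110", "0111"]
value = mean_inter_item_similarity(rbo_exp2)
```

Applying the same function to a similarity-based block gives a directly comparable value for a single block.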
A follow-up questionnaire at the end of the experiment
indicated that 17 participants noticed that the presentation
order was manipulated (but were still unsure about the exact
regularity/circularity of the sequences). Five other subjects
declared that they believed the order to be completely constant (3 in the similarity-based condition, and 2 in the rule-based condition), and only one did not notice any manipulation of presentation order (in the rule-based condition).
Comparison with Mathy & Feldman’s 2009 data. For
the sake of comparison, we now give a brief analysis of
Mathy & Feldman’s 2009 data which matches the preceding analysis verbatim. We restricted the 2009 data to concept 124[8] and to the similarity-based and rule-based presentation orders (those experiments included four other conditions). The learning curves in Figure 2A show the influence of presentation order on learning. Learning was faster
in the rule-based order (N = 25) than in the similarity-based order (N = 23). A paired t-test between the two
curves, which consisted of comparing the mean proportions
of correct responses by pairing the proportions on the basis
of blocks, was significant, t(77) = 6, p < .001. In addition, the difference between the mean numbers of blocks to criterion was significant,
F(1, 46) = 6.34, p = .01, η² = .12 (22.6 vs. 28.7 blocks
to criterion respectively). The average inter-item similarity
within the rule-based order and the similarity-based order
was 1.9 and 2.24 respectively (the distribution of inter-item
similarity according to presentation order is shown in Figure 3). The difference between the two mean inter-item similarity values was significant, F(1, 1222) = 613, p < .001.
Conclusion. To our surprise, despite the use of constant
presentation orders, we did not observe better performance
in Experiment 1 than in Mathy & Feldman’s 2009 data (either by comparing the proportion of correct responses or the
number of blocks to criterion). In fact, faster learning was
observed in the 2009 data set. We had expected faster learning in the present experiment, given that constant presentation orders may consolidate memory formation. This absence of a stronger contrast between the two studies can,
however, be explained by two factors: (1) The 2009 data
only correspond to the subjects who were able to reach the
learning criterion within the hour allocated to the entire experiment (a total of 27 subjects dropped out because they did
not meet the criterion within the one-hour schedule). The
2009 data would therefore have led to much higher estimates
had all the subjects been kept in the analyses. In the present
experiment, only two subjects did not reach the learning criterion in the allocated time (for these two subjects who were
administered a similarity-based condition, the high number
of blocks reached before they dropped out was still used to
compute the mean number of blocks to criterion); (2) all of
the subjects in the study by Mathy & Feldman benefited from
a warm-up condition, and because the subjects learned two
different concepts (permuted between subjects), half of them
also benefited from learning a one-dimensional concept before learning the 124[8] concept. The subjects were therefore
more familiar with the task overall. (In such tasks, the degree
to which a concept is learned is known to be correlated with
better performance, Mathy & Bradmetz, 2004.)
Experiment 2
In order to avoid identical sequence repetition of category
responses across blocks, Experiment 2 was the same as Experiment 1, except that we deconfounded the training and
categorization aspects by alternating learning blocks (not requiring any response) and categorization blocks (requiring a
response). The presentation order was set to constant in the
learning block, but was random in the categorization block.
This allowed a clear separation of the order chosen for the
positive examples from the one chosen for the negative examples, by separating the presentation of all the positive examples from all the negative examples, in a so-called “fully-blocked” order (Clapper & Bower, 1994, 2002).
Table 1
Encoded study items of Concept 124[8] and Concept 5−4 presented in Fig. 1, and three presentation order samples for Concept 124[8]
124[8]       5−4          Presentation order samples in 124[8]
#    Cat A   #    Cat A   SBO-2009     RBO-Exp1     RBO-Exp2
1    0000    A1   1010    3    0100    3    0100    3    0100
3    0100    A2   0001    13   0011    13   0011    13   0011
4    1100    A3   1001    ?    Cat B   12   1101    1    0000
5    0010    A4   0011    1    0000    1    0000    5    0010
6    1010    A5   1011    ?    Cat B   14   1011    11   0101
9    0001                 5    0010    5    0010    9    0001
11   0101                 11   0101    11   0101    4    1100
13   0011                 9    0001    9    0001    6    1010
                          ?    Cat B   8    1110    12   1101
#    Cat B   #    Cat B   ?    Cat B   4    1100    14   1011
2    1000    B1   0100    4    1100    2    1000    8    1110
7    0110    B2   1100    6    1010    16   1111    2    1000
8    1110    B3   0010    ?    Cat B   10   1001    16   1111
10   1001    B4   0111    ?    Cat B   6    1010    10   1001
12   1101                 ?    Cat B   7    0110    7    0110
14   1011                 ?    Cat B   15   0111    15   0111
15   0111
16   1111
Note. Cat A, positive objects of the concept; Cat B, negative objects. The 124[8] and 5−4 concepts are shown in Fig. 1; SBO-2009,
Similarity-based order in Mathy & Feldman’s 2009 study (this one-block presentation order was sampled from the many different orders
that could be instantiated; because the category B stimuli were randomly drawn and interspersed within blocks, the “?” cells facing the
“Cat B” cells could be replaced by any example of the negative category; the “?” cells therefore indicate that the order was not
manipulated); RBO-Exp1, Rule-based order in Exp. 1 (the presentation order was constant across blocks, the negative stimuli were
interspersed but the presentation order of the negative examples was also manipulated and set constant across blocks); RBO-Exp2,
Rule-based order in Exp. 2 (the presentation order was constant across blocks during the learning blocks, but presentation was random
during the categorization blocks; the positive and negative stimuli were fully blocked, with all the positive stimuli presented before all the
negative stimuli); Stimulus numbers are indicated in Fig. 1.
Method
Participants. The subjects were 46 Freshman or Sophomore students from the University of Franche-Comté, who
received course credits in exchange for their participation.
The subjects were randomly assigned to the experimental
conditions (rule-based or similarity-based).
Procedure and Stimuli. The procedure and the concept
chosen were identical to those used in Exp. 1, except that
the fully-blocked training blocks alternated with the random
categorization blocks until the learning criterion was met. A
presentation order was randomly drawn for each participant
and applied to all training blocks, starting with all positive
examples, followed by all negative examples. During the
training blocks, each stimulus was displayed for one second.
While the stimulus was presented, the category was labeled
below the stimulus (i.e., “school-bag” or “trash-can”) and the
corresponding category picture was also displayed for one
second (for instance, for a positive category, the school bag
was shown for one second, while the trash can was hidden).
This was followed by a confirmation phase during which
the subject had to press the response key corresponding to
the category picture that had just appeared. After the key
was pressed, feedback indicating a correct or incorrect classification appeared at the bottom of the screen for two seconds, during which the stimulus remained on display. Our
pretests showed that in this condition, none of the instructed
responses could be missed by the participants. The subjects
were therefore expected to receive 100% positive feedback
during this phase. This method was employed to make sure
the subjects were actively following the learning phase and
that they did not miss any of the instructed categories. The
progress bar was hidden during the learning phase.
Figure 2. Proportion correct as a function of block number. Note. A) Mathy & Feldman’s 2009 data set, B) Exp. 1, C) Exp. 2, D) Exp. 3.
Blue curve, rule-based order; Red curve, similarity-based order.
After a 5-second pause, each learning phase was followed by a categorization phase in which all the stimuli were
randomly drawn. The number of points accumulated on the
progress bar in a given categorization block was reset whenever a new categorization phase started.
Results
The following analysis is based on the categorization
blocks. The learning curves in Figure 2C show the influence of presentation order on learning (the abscissa reports
the number of categorization blocks to criterion). More efficient learning was observed in this experiment than in the
first experiment, F(1, 184) = 4.3, p < .05, η2 = .02, when
comparing the proportion of correct responses in Figure 2.
We observed no Experiment × Presentation order interaction.
Learning was faster in the rule-based order. A paired t-test
between the two curves (pairing the proportions on the basis
of blocks) was significant, t(34) = 7.6, p < .001. In addition, the difference between the mean numbers of blocks to
Figure 3. Mean inter-item similarity per block, averaged across blocks. Note. A) Mathy & Feldman’s 2009 data set, B) Exp. 1, C) Exp. 2,
D) Exp. 3. Left boxplot, rule-based order; Right boxplot, similarity-based order.
criterion was significant, F(1, 44) = 6.71, p = .01, η2 = .13
(11.7 and 16.1 blocks to criterion respectively). The average inter-item similarity within the rule-based order and the
similarity-based order was 1.90 and 2.24 respectively (the
distribution of similarity is shown in Figure 3). The difference between the two mean inter-item similarity values was
significant, F(1, 686) = 4.57, p < .03.
Conclusion. Although learning appeared to be quicker in
Experiment 2 than in Experiment 1, the difference would
be less obvious if both learning blocks and categorization
blocks had been cumulated. (In the rule-based order, about
100% correct would have actually been reached for about
2 × 15 blocks instead of 15 categorization blocks.) Indeed,
the subjects were also allowed to learn the categories during
the random categorization blocks. Hence, a total of about 30 blocks
comes close to the results of Experiment 1. The basic effect
of the frequency at which the stimuli are shown to subjects
therefore tends to reduce the possible beneficial effect of the
fully-blocked learning phase.
Experiment 3
Experiment 3 was the same as Experiment 2, except that
the concept given to the subjects was the Medin and Schaffer
(1978) 5-4 category set (see Fig. 1 and Table 1). The 5-4
category structure was first studied by Medin and Schaffer
(1978), then in many subsequent studies that have been reanalyzed by J. D. Smith and Minda (2000), among others
(Cohen & Nosofsky, 2003; Johansen & Kruschke, 2005; Johansen & Palmeri, 2002; Lamberts, 2000; Lafond, Lacouture, & Mineau, 2007; Minda & Smith, 2002; Rehder &
Hoffman, 2005; Zaki, Nosofsky, Stanton, & Cohen, 2003).
This structure allows the way in which 7 unclassified items
(out of 16) are categorized during a transfer phase of learning
to be studied. Our objective was to find out if different generalization patterns can be observed depending on whether orders are similarity- or rule-based. A second objective was to
generalize the effect of the presentation order types observed
for concept 124[8] in the previous experiments.
For instance, Cluster 1 could instead be represented by the
set (A5, T15, A1, A2, T12, and A3). The same applies to the
negative examples.
To order the items in a rule-based fashion during the training phase, we chose to rely on a simple one-dimensional rule
plus exceptions. For the positive category, we chose to group
the objects into two mutually exclusive clusters, that is, Cluster 1 = (A5, A1, A2, A3), a subset of the red set in Fig. 1, and
Cluster 2 = A4, representing the rule “All green, except the
large plain red square, and the large hatched green circle”.
Similarly, the negative objects B8, B9, and B6 preceded the
presentation of B7 (i.e., “All red, except the large hatched
green circle, and the large plain red square”).
Method
Participants. The subjects were 44 Freshman or Sophomore students from the University of Franche-Comté, who
received course credits in exchange for their participation.
Procedure. For this experiment, rather than presenting the concept in an arbitrary rotation and permutation of features (see
Fig. 1), each logical dimension
was instantiated by the same physical dimension for all subjects in order to facilitate learning. A color dimension differentiated the objects at the top
of the hypercube from those at the bottom (green vs. red
respectively); a shape dimension differentiated the objects at
the front from those at the back (square vs. circle); a size
dimension distinguished the objects in the left cube from
those in the right cube (small vs. large); and finally, the left
and right objects within the cubes were hatched vs. plain.
Consistent with Exp. 1 and Exp. 2, presentation order (rule-based or similarity-based) was a between-subject manipulation during the training phase. The subjects were randomly
assigned to these two conditions. Once the subjects reached
the learning criterion (the progress bar was this time equal to
7 × (5 + 4)), we conducted a transfer phase during which both
the training and transfer stimuli were presented (each once in
a block). The transfer phase was composed of 5 blocks of 16
stimuli.
Clusters. The 5-4 notation only refers to the presence of
5 positive examples vs. 4 negative examples in the category
structure (Medin & Schaffer, 1978; J. D. Smith & Minda,
2000). For this concept, the objects have been numbered
by indexing each of the training stimuli using A (the positive category), B (the negative category), and T (the transfer
items). This notation refers to previous research (e.g., Smith
& Minda, 2000, p. 4). Fig. 1 depicts a solution that can be
adopted by subjects to cluster the positive items during the
transfer phase. Note that the choice of stimuli in the first
cluster is hypothetical (we refer here to the clusters that may
result from the subjects’ conceptualization during the training phase, and are then applied to the transfer phase). For this
reason, the red and blue sets are printed in dashed curves.
Results
Learning Phase. Learning was faster in the rule-based order (Figure 2D), as confirmed by a significant paired t-test
between the two curves, t(27) = 7.5, p < .001, and the difference between the mean numbers of blocks to criterion was
significant, F(1, 42) = 5.15, p = .03, η2 = .11 (6.4 vs. 9.9
blocks to criterion). The difference between the two mean
values for inter-item similarity shown in Figure 3 was also
significant, F(1, 356) = 72, p < .001 (2.20 and 2.60 respectively).
Transfer Phase. One subject who was not able to meet the
learning criterion in the time allowed did not complete the
transfer phase. The following analysis of the transfer phase
follows that of Johansen and Palmeri (2002) (pp. 495-...).
Johansen and Palmeri (2002) developed a complex analysis
of the patterns reflecting rule-based category representations,
which we here take for granted. These patterns only vary
for a subset of stimuli labeled the critical stimuli. Figure 4
only shows a subset of these average categorization probabilities corresponding to the critical transfer stimuli (those
that are diagnostic of a rule-based or an exemplar-based generalization pattern: T10, T11, T13, T14, T15). The graph does
not include the categorization probabilities for the 5 positive and 4 negative examples encountered by the subjects in
the training phases, as these stimuli were globally categorized as expected in the transfer phase. Because the subjects
were trained to categorize the stimuli labeled A as belonging
to the A category during the learning phase, the proportion
p(A) was very high during the transfer blocks for those five
stimuli, regardless of presentation type. The opposite was
logically found for the four stimuli labeled B. In Figure 4,
the categorization probability p(A) is the observed proportion of stimuli categorized as A (i.e., as positive) during the
transfer phase across the five blocks. A stimulus categorized
five times out of five as A simply corresponds to p(A) = 1.
When selecting the non-critical stimuli only (all A’s, all
B’s, along with T12 and T16), a Stimulus Type × Presentation Order ANOVA on the proportion of A responses indicated only a small but significant effect of Stimulus Type (note
that the T12 stimuli were mostly categorized as A’s, whereas
the T16 stimuli were categorized as B’s). However, a similar analysis restricted to the critical stimuli showed a significant
interaction between Stimulus Type and Presentation Order,
F(4, 205) = 2.78, p = .028, η2 = .05. Figure 4 indicates that following the rule-based presentation order, the
average pattern (across subjects) is BBABA (a typical rule-based generalization pattern), as opposed to ABABA for the
similarity-based presentation order (a non-typical pattern, although similar to ABBBA, a typical exemplar-based pattern).
We now focus on the distribution of the generalization
patterns at the individual subject level (N = 43). In theory, a prominent rule-based generalization pattern is BBABA,
which corresponds to a one-dimensional rule (plus exceptions) based on the Color dimension, i.e., “The positive are
all red objects except B7, and the negative are all green objects except A4”. The results show that the BBABA pattern
was less common in the similarity-based presentation order
than in the rule-based order (7 vs. 14 subjects respectively).
Note that the AABBB pattern (another prominent rule-based
pattern, based on a one-dimensional rule using Size instead
of Color as the main dimension) is represented twice for subjects who were given a similarity-based order and once by a
subject in the rule-based presentation order condition. We
conclude that the subjects tended to use Color to separate
the categories. A total of 24 subjects eventually categorized
the transfer objects in a way that suggests that they applied
a rule-based strategy (see Footnote 2). Overall, our results clearly indicate
a distortion in the generalization patterns according to presentation order, and this distortion is mainly visible in the
frequency associated with the BBABA pattern.
Figure 4. Average categorization probabilities of the critical transfer items (T10, T11, T13, T14, T15) in the 5-4 category structure during
the transfer phase (amounting to 5 blocks). Note. p(A) is the observed proportion that each of the stimuli labeled under the abscissa
was categorized as A (i.e., as positive) during the transfer phase.
The proportions are broken down by presentation order conditions
(rule- vs similarity-based). The graph does not include the categorization probabilities for the 5 positive examples and the 4 negative
examples that the subjects encountered in the training phases. Error
bars show +/- one s.e.
Temporal-GCM fit to the Exp. 1 and Exp. 2 data sets
We argue that exemplar models in their present form are
not totally able to predict subjects’ category representations
as a function of presentation order. In this category of models, there is sometimes no specific mechanism for modulating the strength of the memory traces based on the order
in which the stimuli are presented (Nosofsky, 1984, 1986;
Nosofsky, Gluck, Palmeri, McKinley, & Gauthier, 1994).
We mentioned above that exemplar-models that use a decay
function of lag of presentation can handle recency effects
(Nosofsky et al., 1992), but they do not seem to be appropriate for handling primacy effects (Busemeyer & Myung,
1988). For instance, Nosofsky et al. (1992) suggest that the
similarity of a stimulus i to an exemplar x is modulated by
the memory strength M_x associated with exemplar x, with
M_x = exp(−lag), signifying that the greater the number of
intervening items between the presentation of i and x, the
weaker the memory strength. The model that we present in
this paper follows the same principle, except that we use
temporal contiguity (Burgess & Hitch, 1999; Howard & Kahana, 2002) computed by block to account for primacy effects. Nosofsky et al. (1992) focused on the memorization of
the stimuli, whereas our model focuses more on the discriminability of the stimuli. For instance, according to Nosofsky
et al. (1992), given a block of three stimuli a, b, and c, the
lag between stimulus a of the second block and stimulus c of
the first block is only one. The greater memory of c makes
the subject perceive a as very close to c. However, if we restrict the computation of the lags within a block, a is simply
maximally isolated from c. In the short-term memory literature, this isolation is known to produce a primacy effect.
It follows that the memory strength for a might be stronger
as a result of this primacy effect, rather than faded due to
the effect of time. Because our constant presentation orders
produced a circularity between blocks, especially when the
training blocks were associated with categorization blocks,
we target here a model that accounts for the discriminability
of the stimuli within blocks.
Footnote 2. Twenty-four (7 + 14 + 2 + 1 = 24) is quite a high value in relation
to the fairly erratic distribution of patterns that has previously been
observed (see Johansen & Palmeri, 2002, p. 491).
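The contrast between a lag computed over the whole sequence and a lag restricted to a block can be illustrated with a minimal Python sketch. It assumes that lag is the difference in serial position between two presentations and reuses the exp(−lag) relation attributed above to Nosofsky et al. (1992); the function and variable names are illustrative only and do not come from the original models.

```python
import math

def memory_strength(lag):
    """Memory strength as an exponential decay of lag: M_x = exp(-lag)."""
    return math.exp(-lag)

block = ["a", "b", "c"]
sequence = block + block  # two successive presentations of the same constant order

# Lag computed over the whole sequence: c of block 1 (index 2), a of block 2 (index 3).
lag_across_blocks = 3 - sequence.index("c")

# Lag restricted to a block: a and c occupy the first and last positions.
lag_within_block = abs(block.index("a") - block.index("c"))

print(lag_across_blocks, round(memory_strength(lag_across_blocks), 3))
print(lag_within_block, round(memory_strength(lag_within_block), 3))
```

Under the across-block computation, a and c are adjacent; under the within-block computation, they are separated by the maximal distance available in a block of three, which is the isolation that the primacy argument relies on.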
TGCM (a temporal version of GCM) is a simple extension of the standard model that incorporates the temporal dimension into the usual computation of the similarity between
pairs of stimuli.
In GCM, the distance function presented in Equation (1)
can be used with r = 1 (a city-block metric suitable for
separable dimensions), n the number of physical dimensions
(here, n = 4), and x_ia the value of stimulus i on dimension a.
The distance function can be augmented with six free parameters: a scale parameter c reflecting discriminability in
the psychological space and n attention weight parameters of
dimensions with 0 ≤ w_a ≤ 1 and Σ_a w_a = 1 (n − 1 = 4 − 1
were free to vary).
$$ d_{ij} = c \left[ \sum_{a=1}^{n} w_a \, |x_{ia} - x_{ja}|^{r} \right]^{1/r} \tag{2} $$
The following exponential decay function can be used
to relate stimulus similarity to psychological distance
(Nosofsky, 1986; Shepard, 1987):
$$ \eta_{ij} = e^{-d_{ij}} \tag{3} $$
Given the total similarity of a stimulus i to all exemplars
in categories X and Y, the probability of responding with category X is generally computed by Luce’s choice rule:
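$$ P(X/i) = \frac{\left( \sum_{x \in X} \eta_{ix} \right)^{\gamma}}{\left( \sum_{x \in X} \eta_{ix} \right)^{\gamma} + \left( \sum_{y \in Y} \eta_{iy} \right)^{\gamma}} \tag{4} $$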
The choice rule can be augmented with γ, which is a
response-scaling parameter that governs the extent to which
responding is probabilistic versus deterministic (Ashby &
Maddox, 1993; McKinley & Nosofsky, 1995; Navarro, 2007;
Nosofsky & Zaki, 2002). This parameter is used to fit the
data across the blocks, and to better fit the data when performance may be close to the chance level (at the beginning
of an experiment) or when performance is errorless (sometimes by the end of an experiment). Values of γ less than
1 reflect greater levels of guessing, whereas values above 1
make the predicted probabilities more deterministic (close to
0 or 1). This parameter can be avoided when fitting the data
by epochs. The parameter representing the bias for making
category responses and the one controlling the frequency of
the stimuli were also considered minor in our study (respectively, because the number of positive and negative examples
is balanced in 124[8] and almost balanced in 5-4, and because the stimuli were presented in each block with equal
frequency).
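The role of the response-scaling parameter γ in the choice rule can be made concrete with a short Python sketch. The summed similarities used here are hypothetical values chosen for illustration, and the function name is ours; only the form of the rule follows Equation (4).

```python
def luce_choice(sum_sim_x, sum_sim_y, gamma=1.0):
    """Probability of responding X given summed similarities to X and Y (Equation 4)."""
    sx = sum_sim_x ** gamma
    sy = sum_sim_y ** gamma
    return sx / (sx + sy)

# Hypothetical summed similarities: the stimulus is more similar to category X.
s_x, s_y = 1.6, 1.0

for gamma in (0.5, 1.0, 4.0):
    print(f"gamma = {gamma}: P(X/i) = {luce_choice(s_x, s_y, gamma):.3f}")
```

With the same similarities, a value of γ below 1 pulls the predicted probability toward .5, whereas a value above 1 pushes it toward 1, which is the guessing versus deterministic contrast described in the text.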
In TGCM, the distance function presented in Equation (1)
assigns an extra attention weight to the temporal dimension. To
simplify, and following Brown et al. (2007, p. 544), r was
set to 1 for both physical and temporal distances. We now
detail why presentation orders can affect performance in this
model. According to TGCM, the categorization process is
influenced by the temporal contiguity between stimuli. Upon
presentation of a stimulus, the psychological distance between the stimulus and the exemplars depends both on the
physical dimensions and the temporal dimension. TGCM
can be primarily used to account for the distortion of the
memory traces based on presentation order. Figure 5 illustrates how presentation orders can affect the pattern of probabilities when some attention is attributed to the temporal dimension (for instance by setting the attention weight vector
to w = [.33 .33 .33], with the two first values corresponding to the two physical dimensions and the last value corresponding to the temporal dimension). When w = [.5 .5 0]
(left square of Figure 5), the probability of classifying each
of the positive examples as positive is p = .62, vs. p = .38
for the negative examples (regardless of presentation order
because the attention weight for the temporal dimension is
set to zero, and with c and γ both set to 1). When the presentation order is fully-blocked (middle square of Figure 5)
and when w = [.33 .33 .33], the first presented positive exemplar increases its distinctiveness, contrary to the second
with regard to the negative examples (p = .73 for the first,
vs. p = .67 for the second). The effect is reversed for
the negative exemplars for which the first presented exemplar has a higher probability of being categorized as positive
(because it is presented in the middle of a set of positive examples), in contrast to the last presented negative exemplar
(p = .27) that is isolated. Therefore, the first and last stimuli are better discriminated from the opposite category and
better associated with their correct category (p = .73 and
p = .27 respectively). Overall, all examples are expected to
be better categorized in this fully-blocked presentation order
using w = [.33 .33 .33], compared to the prediction that uses
w = [.5 .5 0]: the two probabilities for the positive examples
(p = .73 and p = .67) effectively exceed p = .62; likewise
for the negative exemplars for which the two probabilities
(p = .33 and p = .27) are both lower than p = .38. When
the presentation is more disorganized (see the right square
in Figure 5), the change in the predicted probabilities is more
dramatic, with a higher probability of the first negative example being misclassified (p = .39), and a lower probability of
the second positive example (p = .61) being correctly classified. We believe that this simple integration of the temporal
dimension in the computation of the similarity matrices for
each subject and each epoch can make more accurate predictions than the standard GCM.
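The worked example of Figure 5 can be approximated with the following Python sketch. It is a reconstruction under stated assumptions, not the authors' implementation: we assume four stimuli at the corners of a unit square with the categories split on the first physical dimension, raw rank differences within the block as temporal distances, each stimulus counted among its own stored exemplars, and c = γ = 1. All names (tgcm_prob_positive, STIMULI, P1, N1, etc.) are hypothetical. Under these assumptions the sketch yields probabilities that round to the values quoted in the text.

```python
import math

def tgcm_prob_positive(stimuli, order, weights, c=1.0, gamma=1.0):
    """Return p(A) for each stimulus under a GCM with an added temporal dimension.

    stimuli: dict name -> (physical coordinates, category 'A' or 'B')
    order:   list of names in presentation order within one block
    weights: attention weights, physical dimensions first, temporal dimension last
    """
    position = {name: rank for rank, name in enumerate(order, start=1)}
    *w_phys, w_time = weights

    def distance(i, j):
        xi, xj = stimuli[i][0], stimuli[j][0]
        physical = sum(w * abs(a - b) for w, a, b in zip(w_phys, xi, xj))
        temporal = w_time * abs(position[i] - position[j])
        return c * (physical + temporal)  # city-block metric, r = 1

    probs = {}
    for i in stimuli:
        # Assumption: the summed similarity includes the stimulus itself.
        sim_a = sum(math.exp(-distance(i, j)) for j in stimuli if stimuli[j][1] == "A")
        sim_b = sum(math.exp(-distance(i, j)) for j in stimuli if stimuli[j][1] == "B")
        probs[i] = sim_a ** gamma / (sim_a ** gamma + sim_b ** gamma)
    return probs

# Assumed layout: corners of a unit square, categories split on the first dimension.
STIMULI = {
    "P1": ((0, 0), "A"),
    "P2": ((0, 1), "A"),
    "N1": ((1, 0), "B"),
    "N2": ((1, 1), "B"),
}
third = 1 / 3  # the text reports the weights as .33

no_time = tgcm_prob_positive(STIMULI, ["P1", "P2", "N1", "N2"], [0.5, 0.5, 0.0])
blocked = tgcm_prob_positive(STIMULI, ["P1", "P2", "N1", "N2"], [third, third, third])
mixed = tgcm_prob_positive(STIMULI, ["P1", "N1", "P2", "N2"], [third, third, third])

for label, result in (("no temporal weight", no_time),
                      ("fully blocked", blocked),
                      ("interspersed", mixed)):
    print(label, {name: round(p, 2) for name, p in result.items()})
```

The only difference between the second and third calls is the presentation order, so any change in p(A) between them is due to the temporal dimension alone.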
To test TGCM, the fit values were first computed for
each experiment, presentation order, subject, and epoch of
five blocks, for a total of five epochs. The temporal distances were computed for each block, under the simplifying assumption that the temporal distances were restricted
to the block boundaries. Because the distances were computed between one object and the preceding objects only
within a block, the distance between every pair of objects
was symmetrical. For instance, for three objects presented
in a block, the distances between the first and second objects, the first and third objects, and the second and third
objects would be 1, 2, and 1 respectively. We therefore hypothesized a participant’s reliance on temporal associations
to guide learning, with associations being stronger in the forward than in the backward direction (Kahana, 1996; Kahana,
Howard, & Polyn, 2008). This assumption is relevant because each presentation-order-manipulated block was separated by a categorization phase in Experiments 2 and 3 and
because of the primacy effect in Experiment 1 (the fact that
the first block began with a given set of objects that reappeared in a loop), two different clues that could help participants identify the temporal structures relative to the start and
end of the ordered blocks. The assumption is more open to
criticism for the 2009 data set, where the subjects had no idea
where the blocks began, but we still wanted to systematically
run the model in a similar fashion for all data sets. We therefore simply hypothesized that the temporal distances computed for each virtual block in the 2009 data set would still
help detect existing temporal patterns. For instance, the fact
that exceptions were always distant from the more regular
positive objects was such a regularity (even though these two
clusters of objects were close in the reverse order when a new
block began, the greater distance in the direct order can still
produce a clear demarcation between the clusters). Another
regularity was that some pairs of stimuli were more often
contiguous than chance in a similarity-based presentation order.
To scale the temporal distances, we divided each of them
by 3.75. Because the maximal temporal distance between
two stimuli within a block was 15 (15 being the difference
between 1 –the first presented stimulus within a block– and
16 –the last stimulus within a block–), a division by 3.75 allowed for the greatest temporal distances to match the greatest physical distances (i.e., 15/3.75 = 4). This scaling
method could be further amended by the attention weighting process described above. Note that because the temporal dimension is fully diagnostic of the categories in a fully-blocked presentation, we computed a mean temporal distance based on the distances in the presentation phase and
those in the random categorization phase in the last two experiments.
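The within-block temporal distances and their scaling can be sketched in a few lines of Python. The function name is illustrative; the sketch only assumes that the temporal distance between two stimuli is the absolute difference of their ranks within a block, divided by a scaling constant.

```python
def temporal_distances(n_items, scale=1.0):
    """Symmetric matrix of within-block temporal distances |rank_i - rank_j| / scale."""
    return [[abs(i - j) / scale for j in range(n_items)] for i in range(n_items)]

# Three objects in a block: distances first-second, first-third, second-third.
three = temporal_distances(3)
print(three[0][1], three[0][2], three[1][2])

# Sixteen objects in a block, scaled by 3.75 as in the text.
sixteen = temporal_distances(16, scale=3.75)
max_temporal = max(max(row) for row in sixteen)
print(max_temporal)
```

The largest scaled temporal distance (15/3.75 = 4) then equals the largest unweighted city-block distance between two stimuli that differ on all four binary physical dimensions.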
Log-likelihood was used as a measure of goodness-of-fit.
For each experiment, subject, epoch, and parameter setting,
the likelihood of the data was computed using the binomial
distribution:
$$ L = \prod_{i} \binom{F_i}{f_i} \, p_i^{f_i} (1 - p_i)^{F_i - f_i}, \tag{5} $$
with i the stimulus number, f_i the number of positive category responses for stimulus i, F_i the number of blocks in
an epoch (i.e., the maximal number of positive category responses for stimulus i), and p_i the probability given by the
model of a positive category response for stimulus i. Computing the log likelihood simply equates to replacing a computation based on a product (to compute the joint likelihood
across subjects and across epochs for all models and all sets
of parameters) by a computation based on a sum (for an introduction, see Lamberts, 1994). Indeed, it is more convenient
and ultimately more accurate to compute the logarithm of the
likelihood to avoid undesirable underflow, which can result
from constantly multiplying values between 0 and 1.
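A minimal Python sketch of the log-likelihood computation in Equation (5) is given below. The response counts and model probabilities are hypothetical, and the clipping of probabilities away from 0 and 1 is an added safeguard of ours that is not described in the text.

```python
import math

def binomial_log_likelihood(f, F, p, eps=1e-10):
    """Sum over stimuli of log[C(F_i, f_i) * p_i**f_i * (1 - p_i)**(F_i - f_i)]."""
    total = 0.0
    for f_i, F_i, p_i in zip(f, F, p):
        p_i = min(max(p_i, eps), 1 - eps)  # added safeguard against log(0)
        total += (math.log(math.comb(F_i, f_i))
                  + f_i * math.log(p_i)
                  + (F_i - f_i) * math.log(1 - p_i))
    return total

# Hypothetical epoch of five blocks with three stimuli.
f = [5, 4, 1]          # observed positive responses per stimulus
F = [5, 5, 5]          # blocks per epoch
p = [0.9, 0.7, 0.2]    # model-predicted p(A) per stimulus

log_l = binomial_log_likelihood(f, F, p)
print(round(log_l, 4))
```

Summing logarithms in this way gives the same quantity as taking the logarithm of the product in Equation (5), while avoiding the underflow that repeated multiplication of small probabilities can cause.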
Because GCM is a restricted version of TGCM, a test
of the difference in goodness-of-fit between the models was
computed using the log-likelihood-ratio statistic:
$$ \chi^2(df) = -2\,[\ln L(\text{restricted}) - \ln L(\text{general})] \tag{6} $$
The degrees of freedom are the number of parameters that are
removed in the restricted model, compared to the general version. Here, df = 1 because the restricted GCM model does
not integrate the temporal dimension present in the more general TGCM model.
First, both GCM and TGCM were tested using a fixed sensitivity value and a fixed gamma value (both equal to 1). The
objective was to show the role of the temporal dimension for
the simplest version of the models. The parameter of interest
in the present investigation is w. We focused on searching
for the best weight parameters from the 126 possible weight
patterns that can be generated using .2 steps (e.g., [1 0 0 0 0],
[.8 .2 0 0 0], ..., [.6 .2 .2 0 0], etc.; using .1 steps, the set of
1001 different weight combinations significantly increased
the search time of the model-fitting process). All 126 possible weight combinations were tested for each block and each
subject. We computed which weight pattern sequence gave
the best joint likelihood across the blocks for both models
(for a given subject, the best weight patterns could therefore
be different between epochs).
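The size of the weight grid can be checked with a short enumeration. The sketch below assumes five dimensions (four physical and one temporal) and weights that are multiples of a fixed step and sum to 1; the function name is illustrative.

```python
from itertools import product

def weight_patterns(n_dims, steps):
    """All weight vectors whose entries are multiples of 1/steps and sum to 1."""
    patterns = []
    for combo in product(range(steps + 1), repeat=n_dims):
        if sum(combo) == steps:
            patterns.append(tuple(k / steps for k in combo))
    return patterns

coarse = weight_patterns(n_dims=5, steps=5)    # .2 increments
fine = weight_patterns(n_dims=5, steps=10)     # .1 increments
print(len(coarse), len(fine))
```

The counts correspond to the number of ways of distributing 5 (or 10) increments over 5 dimensions, that is, C(9, 4) = 126 and C(14, 4) = 1001, which matches the figures given in the text.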
The sum log-likelihood (across subjects) for TGCM was
-6600, -3231, -3444, and -1333 respectively for the four experiments ordered chronologically (i.e. Exp. 2009, Exp.
1, Exp. 2 and Exp. 3.), based on the best attentional
weights found for each subject and each epoch. In contrast, the values for GCM were -6989, -3339, -4792, and
-1580 respectively. In TGCM, the mean temporal weights
averaged across subjects and blocks were .199, .136, .461,
and .319 respectively for the four experiments. We obtained
a significant log-likelihood-ratio for each experiment (e.g.,
Figure 5. An example of how the temporal dimension affects the
categorization probabilities in a temporal version of GCM. Note.
W is a vector of dimension-salience weights. The third value in
the vector is the weight given to the temporal dimension, the two
first values are the weights for the physical dimensions. The left
square shows the probabilities of classifying a stimulus in the positive category (i.e., p(A)), when equivalent attention is allocated to
both physical dimensions. The central square indicates
the probabilities of classifying a stimulus in the positive category,
when the temporal dimension is given a .33 value, and when stimuli are presented by clusters in a fully-blocked manner (for every
square, the value “1” represents the first displayed item, the value
“2” represents the second displayed item, and so on). The right
square shows the probabilities of classifying each of the stimuli in
the positive category when the temporal dimension is given a .33
value, but when stimuli are presented in a more unstructured way
(with the negative examples interspersed).
χ2 = −2[−3339 − (−3231)] = 216 for Exp. 1, a value significantly higher than the critical 3.84 value).
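The log-likelihood-ratio statistic of Equation (6) can be recomputed from the summed log-likelihoods reported above, pairing the TGCM and GCM values in the order in which they are listed (Exp. 2009, Exp. 1, Exp. 2, Exp. 3). The Python sketch below is illustrative; the helper names are ours, and the upper-tail probability relies on the standard identity for a chi-square variable with one degree of freedom, P(χ2 > x) = erfc(√(x/2)).

```python
import math

# Summed log-likelihoods reported in the text, in chronological order.
LOG_L = {
    "Exp. 2009": {"TGCM": -6600, "GCM": -6989},
    "Exp. 1": {"TGCM": -3231, "GCM": -3339},
    "Exp. 2": {"TGCM": -3444, "GCM": -4792},
    "Exp. 3": {"TGCM": -1333, "GCM": -1580},
}

def likelihood_ratio(log_l_restricted, log_l_general):
    """Equation (6): -2 [ln L(restricted) - ln L(general)]."""
    return -2 * (log_l_restricted - log_l_general)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

stats = {exp: likelihood_ratio(v["GCM"], v["TGCM"]) for exp, v in LOG_L.items()}
for exp, chi2 in stats.items():
    print(exp, chi2, chi2 > 3.84)
```

Every statistic exceeds the critical value of 3.84 for df = 1 by a wide margin, in line with the significant ratios reported for each data set.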
The proportion correct values predicted by TGCM are
plotted in Figure 6 for the four conditions studied in Exp.
1 and Exp. 2 (rule-based blocked, similarity-based blocked,
rule-based unblocked, similarity-based unblocked respectively).
SUSTAIN fit to the Exp. 1 and Exp. 2 data sets
Our goal here is to show that the temporal exemplar model
that we simulated in the preceding section, although quite
powerful, faces competing models that can handle presentation orders quite naturally in their current form. For
instance, SUSTAIN (Supervised and Unsupervised STratified Adaptive Incremental Network) (Love & Medin, 1998;
Love et al., 2004) is a model of category learning by clustering similar stimuli together. The clusters are activated by
stimuli and serve as abstractions. If simple solutions prove
inadequate, SUSTAIN progressively recruits additional clusters to represent the stimuli. We chose this model because it
is known to be susceptible to sequencing effects. We ran this
model for Exp. 1 and Exp. 2 using the set of suitable parameters for “All studies”, i.e. Attentional focus r = 2.844642,
Cluster competition β = 2.386305, Decision consistency d
= 12.0, Learning rate η = 0.09361126 (Love et al., 2004,
Figure 6. (a) Proportion correct predicted by TGCM using a limited set of integer values for the sensitivity parameter ranging from
1 to 15 and 126 different weight patterns generated by considering
all possible combinations of attentional weights that could be built
using .2 increments (e.g., [1 0 0 0 0], [.8 .2 0 0 0], ..., [.6 .2 .2 0
0], based on the presence of four physical dimensions and one temporal dimension). The proportion correct was averaged every five
blocks (1-5, 6-10, 11-15, 16-20, 21-25). (b) Proportion correct observed in the data. The continuous curves correspond to the blocked
conditions of Exp. 2. The discontinuous curves correspond to the
unblocked conditions of Exp. 1.
p. 313). The proportion correct values predicted by SUSTAIN are plotted in Figure 7. In its present development and
given the standard implementation we chose, the model cannot compete with TGCM as the variance explained is lower,
but still, all the manipulated presentation orders in Exp. 1
and Exp. 2 (rule-based blocked, similarity-based blocked,
rule-based unblocked, similarity-based unblocked respectively) led to better performance than our simulated random presentations (estimated on the basis of 1000 samples
Figure 7. Proportion correct predicted by SUSTAIN using the
set of suitable parameters for “All studies”: Attentional focus r =
2.844642, Cluster competition β = 2.386305, Decision consistency
d = 12.0, Learning rate η = 0.09361126 (Cf. Love et al. (2004) p.
313).
of random-based presentation orders, and five epochs of five
blocks each). This result confirms the preceding observations
that similarity-based presentations lead to better performance
than random presentations. Furthermore, the model demonstrated its ability to predict the pattern we observed in our
data (rule-based blocked < similarity-based blocked < rule-based unblocked < similarity-based unblocked) with substantial variations between the four conditions, although the
curves were closer than those observed in our data. Therefore, this simple simulation shows that SUSTAIN is able to
predict the benefit of rule- and similarity-based presentation
order types, without even modifying a fixed set of parameters
known to function for other studies. Table 2 shows that with a
more complex search of parameters, SUSTAIN still predicts
a lesser portion of variance than TGCM, and still produces
greater AIC and BIC values (two computations often used in
model selection that measure the relative goodness of fit of a
statistical model while penalizing the number of parameters;
higher values denote a lack-of-fit of the chosen model). Hypothetical lower AIC and BIC values for SUSTAIN (with a
similar portion of explained variance) would have meant that
TGCM tended to over-fit our data and that SUSTAIN was a
more concise model and a better candidate for applying the
model to future data.
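The model-selection criteria mentioned above can be recomputed from the values of k and of the maximized log-likelihood listed in Table 2. The Python sketch below assumes the standard definitions AIC = 2k − 2 ln L and BIC = k ln N − 2 ln L, which the text does not spell out; the function names are ours.

```python
import math

def aic(log_l, k):
    """Akaike Information Criterion (standard definition, assumed here)."""
    return 2 * k - 2 * log_l

def bic(log_l, k, n):
    """Bayesian Information Criterion (standard definition, assumed here)."""
    return k * math.log(n) - 2 * log_l

N = 5440  # data points: 16 stimuli x 68 subjects x 5 epochs
MODELS = {"TGCM": (5, -4026), "GCM": (4, -4299), "SUSTAIN": (4, -8427)}

for name, (k, log_l) in MODELS.items():
    print(name, round(aic(log_l, k)), round(bic(log_l, k, N)))
```

Because the log-likelihoods in Table 2 are rounded to integers, the recomputed criteria agree with the tabled AIC and BIC values only to within about two units; the ordering of the three models is unaffected.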
General Discussion
Previous studies of inductive learning have not primarily
focused on the order in which examples are actually encountered (Komatsu, 1992; Kruschke, 2005; Murphy, 2002). Our
research addressed the question of whether the manipulation
of presentation orders based on rule vs. exemplar learning
assumptions can be beneficial to category learning. This issue was first addressed by Mathy and Feldman (2009) who
reported an experiment in which a rule-based order better
facilitated category learning in comparison to simply maximizing inter-item similarity (Elio & Anderson, 1981, 1984;
Medin & Bettger, 1994). In order to enhance presentation order effects in the present study, further manipulation was introduced by maintaining constant presentation orders across
blocks for every subject (Exp. 1, Exp. 2, and Exp. 3), and by
making use of fully-blocked presentation orders to separate
the positive examples from the negative examples during the
training phases (Exp. 2, and Exp. 3). We also hypothesized that subjects in the rule-based condition would exhibit
generalization patterns consistent with rule-based retrieval in
our third experiment, in which the subjects were trained with
the 5-4 category structure and tested in a subsequent transfer
phase with all the old stimuli and seven new stimuli.
Our main results are a systematic positive effect of the
rule-based presentation order over the similarity-based presentation in our three experiments, a positive effect of fully-blocked orders (when contrasting the results of Exp. 1 with
those of Exp. 2, although we signaled a potential confound due to the greater frequency of stimuli in Exp. 2),
and patterns of generalization induced by presentation orders (Exp. 3). Lastly, the effect of maintaining constant
presentation orders (versus mixing orders of one kind across
blocks, e.g., Mathy & Feldman, 2009) was inconclusive.
Overall, our results indicate that presentation-order types affect how categories are represented (Ashby & Ell, 2001; Sloman, 1996), which does not tend to validate a unitary view of
how classification functions (Pothos, 2005). Our results support an intricate relationship between time and categorization, which adds to the studies on restructuring or initial
training (Lewandowsky, Kalish, & Griffiths, 2000; Lee et al.,
1988; Spiering & Ashby, 2008).
An extension of the Generalized Context Model
(Nosofsky, 1984, 1986; Nosofsky, Gluck, et al., 1994) called
Temporal-GCM was described and fit to our data in order to
account for both primacy and recency effects within blocks.
This model allows for a simple computation of the temporal
proximity between stimuli, in order to account for the distortion of the psychological space generated by presentation
orders. This extension leads to a better explanation of the
variance when compared to the more restricted GCM model.
Our results also suggest that categorization performance is
driven by a process of rule-based abstraction that can be accounted for by a model such as SUSTAIN, which is incremental by nature (Love et al., 2004). The main limitation of
this study is that we did not test prototype models (Ashby & Maddox, 1993; Minda & Smith, 2001, 2002; Nosofsky & Zaki,
2002; Osherson & Smith, 1981; J. D. Smith & Minda, 1998;
Zaki et al., 2003) or hybrid models (Ashby, Alfonso-Reese,
Turken, & Waldron, 1998; Anderson & Betz, 2001; Erickson
& Kruschke, 1998; Goodman, Tenenbaum, Feldman, & Griffiths, 2008; Nosofsky, Palmeri, & McKinley, 1994; Rosseel,
2002; E. E. Smith & Sloman, 1994; Vandierendonck, 1995;
Vanpaemel & Storms, 2008), both of which are extensively referenced in the categorization literature. Another limitation is
that we did not compare TGCM with other exemplar models known to handle trial-by-trial changes in category representations (Kruschke, 1992, 1996; Nosofsky et al., 1992), a
comparison that we believe would require a fully developed explanation of how the models differ in their structure.
Table 2
Maximum likelihood estimation, Akaike Information Criterion, and Bayesian Information Criterion, computed for Exp. 1 and
Exp. 2 combined.
                  TGCM     GCM      SUSTAIN
k                 5        4        4
θ̂mle              -4026    -4299    -8427
AIC               8063     8605     16863
BIC (N = 5440)    8096     8631     16890
R2                86.8     86.1     64.7
Note. Each model was fit to 5440 data points corresponding to the proportion p(A) that a stimulus was categorized as a positive member
by the 68 subjects across 5 epochs of 5 blocks (16 × 68 × 5 = 5440) of Exp. 1 and Exp. 2; k, number of parameters free to vary;
θ̂mle, maximum likelihood estimation; AIC, Akaike Information Criterion; BIC, Bayesian Information Criterion; R2, percent of variance
explained. Probabilities p(A) predicted by TGCM used a limited set of integer sensitivity values ranging from 1 to 15 and 126 different
weight patterns generated by considering all possible combinations of weights that could be built using .2 increments (e.g., [1 0 0 0 0], [.8
.2 0 0 0], ..., [.6 .2 .2 0 0]). Probabilities p(A) predicted by GCM used the same combinations of parameters, except that the weight patterns
for which attention to the temporal dimension was above zero were removed from the computation. Maximum likelihood was estimated in
TGCM and GCM for each subject and each epoch using the best combination of parameters. SUSTAIN was run with integer values for the
attentional focus r ranging from 1 to 10, with integer values for the cluster competition parameter β ranging from 1 to 10, with decision
consistency d values taken from [1, 5, 10, 15, 20], and with learning rate η values taken from [.10, .15, .20, .25, .30, .35, .40, .45]. Maximum
likelihood was estimated in SUSTAIN for each subject across epochs using the best combination of parameters.
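The parameter grid and the information criteria described in the note to Table 2 can be checked with a short sketch. It assumes that the 126 weight patterns are all five-dimensional weight vectors built from .2 increments that sum to 1, that the temporal dimension is the last coordinate (an arbitrary choice made here), and that the θ̂mle row holds maximized log-likelihoods, so that the standard definitions AIC = 2k − 2 ln L and BIC = k ln N − 2 ln L apply. The function names are illustrative and do not come from the original analysis.

```python
import math
from itertools import product


def weight_patterns(n_dims=5, n_steps=5):
    """All weight vectors on a grid of 1/n_steps whose entries sum to 1."""
    patterns = []
    for units in product(range(n_steps + 1), repeat=n_dims):
        if sum(units) == n_steps:
            patterns.append(tuple(u / n_steps for u in units))
    return patterns


def aic(log_lik, k):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion from a maximized log-likelihood."""
    return k * math.log(n) - 2 * log_lik


tgcm_weights = weight_patterns()
# Assumption: the temporal dimension is the last coordinate; GCM keeps
# only the patterns that put zero weight on it.
gcm_weights = [w for w in tgcm_weights if w[-1] == 0]

print(len(tgcm_weights))   # number of TGCM weight patterns
print(len(gcm_weights))    # number of patterns left for GCM
print(len(tgcm_weights) * 15)  # TGCM grid size with sensitivity 1..15
print(10 * 10 * 5 * 8)     # SUSTAIN grid size (r, beta, d, eta)

# Log-likelihoods and k taken from Table 2; N = 5440 data points.
for name, log_lik, k in [("TGCM", -4026, 5), ("GCM", -4299, 4), ("SUSTAIN", -8427, 4)]:
    print(name, round(aic(log_lik, k)), round(bic(log_lik, k, 5440)))
```

Under these assumptions the enumeration yields the 126 patterns mentioned in the note, and the recomputed AIC and BIC values fall within about two units of those in Table 2, a gap that is plausibly due to rounding of the reported log-likelihoods.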
We conclude with several speculations on the mechanisms
involved in subjects’ performance. First, the detrimental
effect of similarity-based orders might be attributed to the
overly specific hypotheses they tend to induce in the subjects’
minds, a mechanism in the formation of abstraction that can
be accounted for by SUSTAIN. A second possibility is that
examples are associated on the basis of both time and features (a mechanism which is more consistent with TGCM).
Still, our data also suggest that the effect of the rule-based
presentation may be solely due to the isolation of the exceptions in the sequence orders (an effect known as the von
Restorff effect, Restorff, 1933, which might apply to categorization). A third possible explanation for the easier rule-based condition is that the random aspect of sampling within
clusters helps in the abstraction of the diagnostic features.
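To make the second possibility concrete, the sketch below shows one way in which similarity could combine features and time in a GCM-style exemplar model. It is an illustrative assumption, not the TGCM specification: each exemplar is coded here by four binary features plus a hypothetical fifth coordinate giving its normalized presentation position within a block, similarity decays exponentially with a weighted city-block distance, and the response probability is the ratio of summed similarities. The exemplars, the probe, the weights, and the sensitivity value c are all invented for the example.

```python
import math


def similarity(x, y, weights, c):
    """Exponential-decay similarity over a weighted city-block distance."""
    distance = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return math.exp(-c * distance)


def p_positive(probe, positives, negatives, weights, c):
    """Summed similarity to positive exemplars over summed similarity to all."""
    s_pos = sum(similarity(probe, e, weights, c) for e in positives)
    s_neg = sum(similarity(probe, e, weights, c) for e in negatives)
    return s_pos / (s_pos + s_neg)


# Hypothetical exemplars: four binary features, then an assumed temporal
# coordinate (normalized presentation position within the block).
positives = [(0, 0, 0, 0, 0.0), (0, 0, 0, 1, 0.1)]
negatives = [(1, 1, 1, 1, 0.9), (1, 1, 1, 0, 1.0)]

# The probe is featurally closer to the negatives but was presented
# close in time to the positives.
probe = (0, 1, 1, 0, 0.05)

features_only = (0.25, 0.25, 0.25, 0.25, 0.0)  # no attention to time
with_time = (0.2, 0.2, 0.2, 0.2, 0.2)          # some attention to time

p_gcm = p_positive(probe, positives, negatives, features_only, c=4)
p_temporal = p_positive(probe, positives, negatives, with_time, c=4)
print(p_gcm, p_temporal)
```

In this toy case, giving weight to the temporal coordinate raises the predicted probability of a positive response for the probe, because temporal contiguity with the positive examples partly offsets its featural similarity to the negative ones. Setting the temporal weight to zero recovers a purely feature-based exemplar model.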
Our results confirm certain observations that have been
made on the relationship between temporal distinctiveness
and memory retrieval. For instance, Kahana (1996) reanalyzed a number of classic free-recall studies and showed
that temporal contiguity clearly determines some associative
mechanisms, leading to a process of episodic clustering. Items
studied in neighboring list positions tend to be recalled
successively and more rapidly during the recall period, regardless of their degree of semantic association. Accordingly, if rule learning functions as a discrimination problem,
abstraction is easier whenever within-cluster stimuli are not
easily discriminable due to their temporal contiguity, and
whenever between-cluster stimuli are easily discriminable
within the time scale. Brown et al. (2007) have also shown
such a dependency between time and discriminability in their
model of memory retrieval, which uses temporal discrimination as a core principle and attributes forgetting to reduced local distinctiveness.
Our study also follows up on the body of research on category discovery conducted in the 1950s, particularly
studies focusing on the effect of the informativeness of order, with some instances eliminating more possible concepts
than others (Bruner, Goodnow, & Austin, 1956; Hovland
& Weiss, 1953). Our results suggest that in blocked presentations, subjects have difficulty learning concepts from
negative instances even though each category transmits the
same amount of information (Hovland & Weiss, 1953). Also,
the effect of blocked orders (or simply the effect of massing the presentation of positive examples) has been found
in previous research on the learning of paired-associate lists
(Gagné, 1950), on supervised concept learning (Kurtz &
Hovland, 1956; Goldstone, 1996), unsupervised concept
learning (Clapper & Bower, 1994, 2002; Zeithamova &
Maddox, 2009), incidental concept learning (Wattenmaker,
1993), and clustering (Gmeindl, Walsh, & Courtney, in
press). The fact that blocked categories benefit learning does
not, however, fit with the quasi-ubiquitous finding that there
are benefits to distributed practice compared to massed practice (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006), even
though massing apparently creates a sense of fluent learning (Kornell & Bjork, 2008; Kornell, Castel, Eich, & Bjork,
2010; Wahlheim, Dunlosky, & Jacoby, 2011 in press).
Conclusion
Previous results on the influence of presentation orders
have suggested that spacing facilitates induction (Kornell &
Bjork, 2008), but that optimal training procedures also depend on the nature of the categories being learned (Spiering
& Ashby, 2008). Other research has revealed a more specific
influence of order types (e.g., rule-based, similarity-based)
on categorization learning (Elio & Anderson, 1981; Mathy
& Feldman, 2009; Medin & Bettger, 1994). Whether presentation orders are useful for investigating categorization
processes, and whether spacing aids or impairs induction,
are both complex issues that we believe could be further addressed by studying the effects of presentation order on different concept types in more detail.
References
Allen, S. W., & Brooks, L. R. (1991). Specializing the operation of
an explicit rule. Journal of Experimental Psychology: General,
120, 3-19.
Anderson, J. R., & Betz, J. (2001). A hybrid model of categorization.
Psychonomic Bulletin & Review, 8(4), 629–647.
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron,
E. M. (1998). A neuropsychological theory of multiple systems
in category learning. Psychological Review, 105, 442-481.
Ashby, F. G., & Ell, S. W. (2001). The neurobiology of human
category learning. Trends in Cognitive Sciences, 5(5), 204-210.
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization.
Journal of Mathematical Psychology, 37, 372-400.
Bjork, R. A., & Whitten, W. B. (1974). Recency-sensitive retrieval
processes in long-term free recall. Cognitive Psychology, 6, 173-189.
Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio
model of memory. Psychological Review, 114, 539-576.
Bruner, J., Goodnow, J., & Austin, G. (1956). A study of thinking.
New York: Wiley.
Burgess, N., & Hitch, G. (1999). Memory for serial order: A
network model of the phonological loop and its timing. Psychological Review, 106, 551-581.
Busemeyer, J. R., & Myung, I. J. (1988). A new method for investigating prototype learning. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 14, 3-11.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D.
(2006). Distributed practice in verbal recall tasks: A review and
quantitative synthesis. Psychological Bulletin, 132, 354-380.
Clapper, J. P., & Bower, G. H. (1994). Category invention in unsupervised learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 443-460.
Clapper, J. P., & Bower, G. H. (2002). Adaptive categorization
in unsupervised learning. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 28, 908-923.
Cohen, A. L., & Nosofsky, R. M. (2003). An extension of
the exemplar-based random-walk model to separable-dimension
stimuli. Journal of Mathematical Psychology, 47, 150-165.
Crawford, L. E., & Duffy, S. (2010). Sequence effects in estimating
spatial location. Psychonomic Bulletin & Review, 17, 725-730.
Elio, R., & Anderson, J. R. (1981). Effects of category generalizations and instance similarity on schema abstraction. Journal
of Experimental Psychology: Human Learning and Memory, 7,
397-417.
Elio, R., & Anderson, J. R. (1984). The effects of information order
and learning mode on schema abstraction. Memory & Cognition,
12, 20–30.
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in
category learning. Journal of Experimental Psychology: General, 127, 107-140.
Feldman, J. (2003). A catalog of Boolean concepts. Journal of
Mathematical Psychology, 47, 75-89.
Gagné, R. M. (1950). The effect of sequence of presentation of
similar items on the learning of paired associates. Journal of
Experimental Psychology, 40, 61-73.
Garner, W. (1974). The processing of information and structure.
Potomac, MD: Erlbaum.
Gmeindl, L., Walsh, M., & Courtney, S. M. (in press). Binding serial order to representations in working memory: a spatial/verbal
dissociation. Memory & Cognition.
Goldstone, R. L. (1994). Influences of categorization on perceptual
discrimination. Journal of Experimental Psychology: General,
123, 178-200.
Goldstone, R. L. (1996). Isolated and interrelated concepts. Memory & Cognition, 24, 608-628.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L.
(2008). A rational analysis of rule-based concept learning. Cognitive Science, 32, 108-154.
Hahn, U., & Chater, N. (1998). Similarity and rules: distinct? exhaustive? empirically distinguishable? Cognition, 65, 197-230.
Homa, D., Rhoads, D., & Chambliss, D. (1979). Evolution of conceptual structure. Journal of Experimental Psychology: Human
Learning and Memory, 5, 11–23.
Hovland, C. I., & Weiss, W. (1953). Transmission of information
concerning concepts through positive and negative instances.
Journal of Experimental Psychology, 45, 175-182.
Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology,
46, 269-299.
Johansen, M. K., & Kruschke, J. K. (2005). Category representation
for classification and feature inference. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 31, 1433-1458.
Johansen, M. K., & Palmeri, T. J. (2002). Are there representational
shifts during category learning? Cognitive Psychology, 45, 482-553.
Jones, M., Love, B. C., & Maddox, W. T. (2006). Recency effects as a window to generalization: separating decisional and
perceptual sequential effects in category learning. Journal of
Experimental Psychology: Learning, Memory, and Cognition,
32, 316-332.
Jones, M., & Sieck, W. R. (2003). Learning myopia: An adaptive
recency effect in category learning. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 29, 626-640.
Kahana, M. J. (1996). Associative retrieval processes in free recall.
Memory & Cognition, 24, 103-109.
Kahana, M. J., Howard, M. W., & Polyn, S. M. (2008). Associative retrieval processes in episodic memory. In H. L. Roediger
(Ed.), Cognitive psychology of memory. Vol. 2 of Learning and
memory: A comprehensive reference, 4 vols. (J. Byrne, Editor-in-chief). Oxford: Elsevier.
Komatsu, L. K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19,
585-592.
Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010).
Spacing as the friend of both memory and induction in young
and older adults. Psychology and Aging, 25, 498-503.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist
model of category learning. Psychological Review, 99, 22-44.
Kruschke, J. K. (1996). Dimensional relevance shifts in category
learning. Connection Science, 8, 225-247.
Kruschke, J. K. (2005). Category learning. In K. Lamberts
& R. L. Goldstone (Eds.), The handbook of cognition, ch. 7
(p. 183-201). London: Sage.
Kurtz, K. H., & Hovland, C. I. (1956). Concept learning with
differing sequences of instances. Journal of Experimental Psychology, 51, 239-243.
Lafond, D., Lacouture, Y., & Mineau, G. (2007). Complexity minimization in rule-based category learning: Revising the catalog of
boolean concepts and evidence for non-minimal rules. Journal
of Mathematical Psychology, 51, 57-74.
Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 20, 1003-1021.
Lamberts, K. (2000). Information-accumulation theory of speeded
categorization. Psychological Review, 107, 227-260.
Lee, E. S., MacGregor, J. N., Bavelas, A., Mirlin, L., Lam, N., &
Morrison, I. (1988). The effects of error transformations on
classification performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 66-74.
Lewandowsky, S., Kalish, M., & Griffiths, T. L. (2000). Competing strategies in categorization: Expediency and resistance to
knowledge restructuring. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 26, 1666-1684.
Love, B. C., & Medin, D. (1998). SUSTAIN: A model of human
category learning. In C. Rich & J. Mostow (Eds.), Proceedings
of the Fifteenth National Conference on Artificial Intelligence
(p. 671-676). Cambridge, MA: MIT Press.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN:
A network model of category learning. Psychological Review,
111, 309-332.
Mathy, F., & Bradmetz, J. (2004). A theory of the graceful complexification of concepts and their learnability. Current Psychology
of Cognition, 22, 41-82.
Mathy, F., & Feldman, J. (2009). A rule-based presentation order
facilitates category learning. Psychonomic Bulletin & Review,
16, 1050-1057.
McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of
exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human
Perception and Performance, 21, 128-148.
Medin, D. L., & Bettger, J. G. (1994). Presentation order and recognition of categorically related examples. Psychonomic Bulletin
& Review, 1, 250-254.
Medin, D. L., & Schaffer, M. (1978). A context theory of classification learning. Psychological Review, 85, 207-238.
Minda, J. P., & Smith, J. D. (2001). Prototypes in category learning: The effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(3),
775–799.
Minda, J. P., & Smith, J. D. (2002). Comparing prototype-based and
exemplar-based accounts of category learning and attentional allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 275-292.
Murdock, B. B. (1960). The distinctiveness of stimuli. Psychological Review, 67, 16-31.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA:
MIT Press.
Navarro, D. J. (2007). On the interaction between exemplar-based
concepts and a response scaling process. Journal of Mathematical Psychology, 51, 85-98.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory
of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(1), 104-114.
Nosofsky, R. M. (1986). Attention, similarity, and the
identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Nosofsky, R. M., Gluck, M. A., Palmeri, T. J., McKinley, S. C., &
Glauthier, P. (1994). Comparing modes of rule-based classification learning: A replication and extension of Shepard, Hovland,
and Jenkins (1961). Memory & Cognition, 22, 352-369.
Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211-233.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological
Review, 101, 53-79.
Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype
models revisited: Response strategies, selective attention, and
stimulus generalization. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 28, 924-940.
Osherson, D. N., & Smith, E. E. (1981). On the adequacy of prototype theory as a theory of concepts. Cognition, 9, 35-58.
Pothos, E. M. (2005). The rules versus similarity distinction. Behavioral and Brain Sciences, 28(1), 1-14.
Rehder, B., & Hoffman, A. B. (2005). Thirty-something categorization results explained: selective attention, eyetracking, and
models of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 811-829.
Restorff, H. von. (1933). Über die Wirkung von Bereichsbildungen
im Spurenfeld [On the influence of the segregation in the trace
field]. Psychologische Forschung, 18, 299-342.
Rips, L. J. (1989). Similarity, typicality, and categorization. In
S. Vosniadou & A. Ortony (Eds.), Similarity and analogical
reasoning (pp. 21-59). Cambridge, MA: Cambridge University
Press.
Rosch, E., & Mervis, C. (1975). Family resemblances: Studies
in the internal structure of categories. Cognitive Psychology, 7,
573-605.
Rosseel, Y. (2002). Mixture models of categorization. Journal of
Mathematical Psychology, 46, 178-210.
Sakamoto, Y., Jones, M., & Love, B. C. (2008). Putting the psychology back into psychological models: Mechanistic versus rational approaches. Memory & Cognition, 36, 1057-1065.
Shepard, R. N. (1987). Toward a universal law of generalization for
psychological science. Science, 237, 1317-1323.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75(13, Whole No. 517).
Skorstad, J., Gentner, D., & Medin, D. L. (1988). Abstraction processes during concept learning: A structural view. In Proceedings of the 10th Annual Conference of the Cognitive Science Society (p. 419-425). Hillsdale, NJ: Lawrence Erlbaum Associates.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3-22.
Smith, E. E., Patalano, A. L., & Jonides, J. (1998). Alternative
strategies of categorization. Cognition, 65, 167-196.
Smith, E. E., & Sloman, S. A. (1994). Similarity- vs. rule-based
categorization. Memory & Cognition, 22(4), 377–386.
Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The
early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1411–1436.
Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in
search of a model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 3-27.
Spiering, B. J., & Ashby, F. G. (2008). Initial training with difficult items facilitates information integration, but not rule-based
category learning. Psychological Science, 19, 1169-1177.
Stewart, N., Brown, G. D. A., & Chater, N. (2002). Sequence
effects in categorization of simple perceptual stimuli. Journal of
Experimental Psychology: Learning, Memory, and Cognition,
28, 3-11.
Thibaut, J. P., Dupont, M., & Anselme, P. (2002). Dissociations
between categorization and similarity judgments as a result of
learning feature distributions. Memory & Cognition, 30, 647-656.
Thibaut, J.-P., & Gelaes, S. (2006). Exemplar effects in the context
of a categorization rule: Featural and holistic influences. Journal
of Experimental Psychology: Learning, Memory, and Cognition,
32, 1403-1415.
Vandierendonck, A. (1995). A parallel rule activation and rule
synthesis model for generalization in category learning. Psychonomic
Bulletin & Review, 2, 442-459.
Vanpaemel, W., & Storms, G. (2008). In search of abstraction:
the varying abstraction model of categorization. Psychonomic
Bulletin & Review, 15, 732-749.
Wahlheim, C. N., Dunlosky, J., & Jacoby, L. L. (2011 in press).
Spacing enhances the learning of natural concepts: an investigation of mechanisms, metacognition, and aging. Memory &
Cognition.
Wattenmaker, W. D. (1993). Incidental concept learning, feature
frequency, and correlated properties. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 19, 203-222.
Zaki, S. R., Nosofsky, R. M., Stanton, R. D., & Cohen, A. L. (2003).
Prototype and exemplar accounts of category learning and attentional allocation: a reassessment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1160-1173.
Zeithamova, D., & Maddox, W. T. (2009). Learning mode and exemplar sequencing in unsupervised category learning. Journal
of Experimental Psychology: Learning, Memory, and Cognition,
35, 731-741.
