The dark side of incremental learning: A model of cumulative

Transcription

The dark side of incremental learning: A model of cumulative
ARTICLE IN PRESS
Cognition xxx (2009) xxx–xxx
Contents lists available at ScienceDirect
Cognition
journal homepage: www.elsevier.com/locate/COGNIT
The dark side of incremental learning: A model of cumulative
semantic interference during lexical access in speech production
Gary M. Oppenheim a,*, Gary S. Dell a, Myrna F. Schwartz b
a
b
Beckman Institute, University of Illinois, 405 North Mathews Avenue, Urbana, IL 61801, USA
Moss Rehabilitation Research Institute, 1200 West Tabor Road, Philadelphia, PA 19141, USA
a r t i c l e
i n f o
Article history:
Received 2 December 2008
Revised 26 May 2009
Accepted 18 September 2009
Available online xxxx
Keywords:
Cumulative semantic interference
Lexical access
Speech production
Competitive lexical selection
Incremental learning
Neural network model
a b s t r a c t
Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition
priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the
way that words are activated and selected for production, and some have claimed that the
findings are only understandable by positing a competitive mechanism for lexical selection. We present a simple model of lexical retrieval in speech production that applies
error-driven learning to its lexical activation network. This model naturally produces repetition priming and semantic interference effects. It predicts the major findings from several published experiments, demonstrating that these effects may arise from incremental
learning. Furthermore, analysis of the model suggests that competition during lexical
selection is not necessary for semantic interference if the learning process is itself
competitive.
Ó 2009 Elsevier B.V. All rights reserved.
1. Introduction
Retrieving a word from memory has consequences for
later retrieval. This is particularly true when retrieval occurs in a semantic memory task such as picture naming.
It is well known that the second presentation of a picture
to be named speeds the naming response and diminishes
the chance of error. This phenomenon, known as repetition
priming, can be explained by the fact that each retrieval
event is also a learning event, and so the second retrieval
benefits from the learning that occurred the first time
(e.g. Mitchell & Brown, 1988). Somewhat less well known
is the fact that repetition priming has a ‘‘dark side”.
Retrieving a word has negative consequences for the subsequent retrieval of other words from the same semantic category (e.g. Abdel Rahman & Melinger, 2007; Belke, 2008;
Belke, Meyer, & Damian, 2005; Blaxton & Neely, 1983;
Brown, 1981; Damian & Als, 2005; Damian, Vigliocco, &
* Corresponding author. Tel.: +1 815 341 9557; fax: +1 217 244 5876.
E-mail address: [email protected] (G.M. Oppenheim).
Levelt, 2001; Howard, Nickels, Coltheart, & Cole-Virtue,
2006; Hsiao, Schwartz, Schnur, & Dell, 2009; Kroll &
Stewart, 1994; Schnur, Schwartz, Brecher, & Hodgson,
2006; Vigliocco, Vinson, Damian, & Levelt, 2002; Wheeldon
& Monsell, 1994). Following Oppenheim, Dell, and Schwartz (2007), we refer to these negative consequences as
cumulative semantic interference. In this paper, we explain
the mechanisms behind cumulative semantic interference
in the domain of picture naming. This explanation takes
the form of a computational model of lexical access in
speech production that simulates the major phenomena
in this domain. The model addresses meaning-based lexical retrieval in general, whether this is elicited by picturenaming, naming-to-definition, or spontaneous production.
Our focus, however, is on persistent changes to lexical processing that result from the natural retrieval of a single
word. The central theoretical point that the model implements is that repetition priming and cumulative semantic
interference are two sides of the same coin. They both result from an error-based implicit learning process that
tunes the language production system to recent experience.
0010-0277/$ - see front matter Ó 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.cognition.2009.09.007
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
2
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Although our model is formally developed only for lexical access in speech production, our theoretical goals are
more general. Cumulative semantic interference is a manifestation in speech production of a set of phenomena
known in the memory literature as retrieval-induced forgetting or RIF. Retrieval-induced forgetting studies demonstrate that the episodic memory for a word or association
can be impaired by the previous retrieval of a related
memory (e.g. Anderson, Bjork, & Bjork, 1994; but see also
Anderson & Neely, 1996, for a discussion of retrieval-induced forgetting in semantic memory). Currently, the
explanation for such impairment is debated, with some
claiming it results from suppressing previous competitors
(often termed inhibition or unlearning; e.g. Anderson
et al., 1994; Melton & Irwin, 1940; Norman, Newman, &
Detre, 2007; Postman, Stark, & Fraser, 1968) while others
claim it stems from strengthening previous targets (occlusion or ‘blocking’1; e.g. MacLeod, Dodd, Sheard, Wilson, &
Bibi, 2003; McGeoch, 1932; Mensink & Raaijmakers, 1988).
Our analysis of cumulative semantic interference in speech
production will, we claim, speak to this debate. More generally, our model reflects a recent trend in cognition to link
psycholinguistics with theories of learning and memory by
developing accounts of how experience changes language
processing (e.g. Chang, Dell, & Bock, 2006; Goldinger,
1998; Kraljic & Samuel, 2005).
Much of the theoretical importance of cumulative
semantic interference hinges on an alleged property of
requiring a competitive mechanism for lexical selection
(e.g. Howard et al., 2006). The most prominent theories of
lexical access (e.g. Levelt, Roelofs, & Meyer, 1999) assume
competitive lexical selection. Empirical support for this
assumption has often come from picture-word interference studies (e.g. Schriefers, Meyer, & Levelt, 1990), in
which speakers name pictures as they are presented at
short offsets from distractor words. However, since Mahon,
Costa, Peterson, Vargas, and Caramazza (2007) presented
an analysis demonstrating that picture-word interference
studies have not reliably supported the claims of competitive lexical selection, the search for empirical support
has turned to a simpler task: picture naming, specifically
with regards to cumulative semantic interference.
Two serial picture-naming paradigms have been particularly common in studies of cumulative semantic interference. First is the blocked-cyclic naming paradigm (e.g.
Damian et al., 2001). In each block, subjects repeatedly cycle through naming a small set of pictures (e.g. one block
might consist of four cycles through a set of six pictures).
In the homogeneous condition, all the pictures in the block
represent the same semantic category (e.g. farm animals),
and in the mixed condition each picture represents a different semantic category. Cumulative semantic interference is indexed by greater difficulty naming pictures in
the homogeneous condition relative to the mixed condition (the semantic blocking effect). Typically, the semantic
1
The term ‘blocking’ carries a quite different meaning in the retrievalinduced forgetting literature, where it refers to a hypothesis of competitorbased interference, than in the cumulative semantic interference literature,
where it tends to refer to the structure of an experimental design (i.e.
pictures may be presented in semantically homogeneous blocks).
blocking effect is not present in the first cycle and grows
over subsequent cycles (e.g. Belke et al., 2005). The second
important serial picture-naming paradigm, used by Brown
(1981, Experiment 4) and Howard et al. (2006), can be
called the continuous paradigm. In this method, pictures
drawn from several categories (e.g. animals, vehicles) are
named without repeating any item, but with multiple
exemplars from each category. Here, cumulative semantic
interference is demonstrated by naming times that increase linearly as a function of the number of previously
named pictures in that category. Importantly, the number
of interspersed pictures between each category exemplar
is irrelevant to the effect (Howard et al., 2006). For example, in the sequence GOAT, CAR, TOMATO, TRUCK, HORSE,
the naming time for HORSE would be slower than that
for GOAT, and would be unaffected by the number of unrelated intervening items.
1.1. The nature of cumulative semantic interference: Howard
et al.’s principles
Howard et al. (2006) argued that three specific properties of the lexical retrieval process must interact to produce
cumulative semantic interference in naming latencies:
shared activation, competitive selection, and priming. The
idea is that each time a target word is activated, semantically related competitors are also activated (shared activation), and strongly activated competitors slow down the
selection of target words (competitive selection). Retrieving
a word once primes its future retrieval (priming), making it
a stronger competitor when related words are retrieved in
subsequent trials, thereby causing those subsequent target
words to be retrieved more slowly. We will use these three
properties to structure our review of the phenomenon and
its implications for lexical retrieval.
1.1.1. Shared activation
When a target word such as DOG is activated during its
attempted retrieval, its semantic relatives such as GOAT
are also activated, thereby setting the scene for lexical
competition. This principle of shared activation for semantically related words is what makes cumulative semantic
interference specifically semantic in nature.
While the idea of shared activation is compatible with
most current theories of semantic representation, it arises
naturally from the use of distributed (or feature-based)
semantic representations such as those commonly employed in connectionist models (see McClelland & Rogers,
2003, for a review). Distributed mechanisms would predict
graded effects of semantic similarity, and indeed blockedcyclic picture naming studies have demonstrated that
more closely related items generate stronger interference
effects than those more distant (Vigliocco et al., 2002).
So, for the purpose of understanding cumulative semantic
interference, it may be useful to think of shared activation
arising from shared semantic features rather than all-ornone category membership. That is how shared activation
is implemented in our model.
As noted by Howard et al. (2006), however, shared
semantic activation does not require distributed representations. It may occur with non-decomposed (localist) lexi-
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
cal concepts (e.g. Roelofs, 1992) provided that related concepts connect either directly (e.g. Collins & Loftus, 1975) or
indirectly through shared category or property nodes (e.g.
Collins & Quillian, 1969), and each activated concept sends
activation to neighboring concepts. Moreover, any graded
effects can be attributed to gradations in the number or
strength of such connections. Thus, the finding of graded
cumulative semantic interference does not allow us to distinguish between distributed and localist semantic
representations.
1.1.2. Competitive selection
The second property of lexical retrieval that is required
for cumulative semantic interference, according to Howard
et al. (2006), is that lexical selection be competitive. That is,
increasing the activation of non-target words should decrease the speed and accuracy with which a target word
is selected. In a competitive selection process, words compete in the manner of two athletic teams during ‘‘suddendeath overtime”: the competition continues until a single
winner emerges. This might be implemented via either a
differential threshold (e.g. Levelt et al., 1999) or lateral
inhibition (e.g. Howard et al., 2006), but the key is that having multiple strong competitors makes it harder to select a
winner (Wheeldon & Monsell, 1994). A non-competitive
selection process (e.g. Mahon et al., 2007), in contrast, is
more like a horse race that ends when the first contestant
crosses a pre-determined absolute threshold. To illustrate
this difference, let us imagine selecting a target word,
DOG, when a competitor, GOAT, is also activated. According to a sudden-death competition method, the two compete until one clearly wins, so selecting DOG should be
slower and less accurate when GOAT is more active. Thus,
cumulative semantic interference in response time would
occur if the semantic manipulations raise the activation
of competitors. With a horse-race selection method, the
speed of DOG’s selection is entirely a function of DOG’s
own activation. The activation of GOAT does not enter into
the equation, so this non-competitive selection offers no
obvious way to account for cumulative semantic
interference.
Locating the competition within the word production
process is difficult, but several studies constrain it to a
point after semantic access and before phonological access.
Two findings argue for a post-semantic locus. First, performing non-verbal semantic judgments on pictures in
the blocked-cyclic paradigm has proven insufficient to elicit a semantic blocking effect (Damian et al., 2001). So any
competition that occurs during stages before lexical access
does not appear sufficient to drive the cumulative semantic interference effect. Second, bilingual continuous paradigm experiments indicate that cumulative semantic
interference accumulates independently for each language,
suggesting that the competitive selection process is language-specific and hence post-semantic (Castro, Strijkers,
Costa, & Alario, 2008, Experiments 3 and 4).
Some evidence suggests that the competition may instead characterize the selection of abstract, pre-phonological word-forms, or lemmas. Blocked-cyclic word naming
(reading words aloud) appears to produce semantic facilitation rather than interference, suggesting that the compe-
3
tition that results when naming pictures must arise before
retrieving phonological word-forms (Damian et al., 2001,
Experiment 2a). Retrieving gender-marked determiners
during word naming may bring back the semantic blocking
effect, suggesting that the competition affects the postsemantic, pre-phonological retrieval of abstract lexical
concepts (Damian et al., 2001, Experiment 2b).
Together, the principles of shared lexical activation and
competitive lexical selection are sufficient to produce the
sort of semantic interference that might be seen within a
single trial (e.g. as in the picture-word interference effect,
e.g. Schriefers et al., 1990). Shared activation causes
semantically related competitors to become active, and a
competitive lexical selection mechanism allows these
competitors to hinder selection of a target word. Making
semantic interference cumulative, however, requires some
mechanism by which processes during one trial can affect
subsequent trials. This is the function of priming.
1.1.3. Priming
Priming is Howard et al.’s final necessary property for
cumulative semantic interference. Retrieving a word once
should facilitate its future retrieval by either making the
word itself more accessible or making its competitors less
accessible.
While priming can be implemented in a number of
ways, its effects can be characterized as either temporary
or persistent. Temporary effects occur when priming is ascribed to changes in activation levels – either positive (e.g.
Crowther, Martin, & Biegler, 2008; Howard et al., 2006;
Wheeldon & Monsell, 1994) or negative (inhibitory)
changes (e.g. Brown, 1981; McCarthy & Kartsounis, 2000)
that are carried over from previous trials. For example,
selecting DOG may require the temporary suppression of
the urge to say CAT, which might make it more difficult
to access CAT for a short time. Persistent accounts (e.g.
Damian & Als, 2005; Howard et al., 2006; Schnur et al.,
2006) instead describe priming as a consequence of relatively permanent changes to the way words are accessed,
such as incremental learning. Inspired in part by neural
network models in which incremental learning is attributed to changes in connection weights rather than activation levels, persistent priming is an example of the learning
that continually adjusts the cognitive system to suit its
environment (e.g. Gupta & Cohen, 2002).
A critical property of the priming mechanism that
underlies cumulative semantic interference is that the
interference accumulates incrementally as a function of
relevant experience (such as naming semantically related
pictures), and is unaffected by irrelevant experience (such
as naming unrelated pictures). In Howard et al.’s (2006)
continuous paradigm study, naming pictures from a single
category, such as DOG and then GOAT, produced the same
linear accumulation of semantic interference whether the
relevant pictures were separated by two, four, six, or eight
unrelated items, suggesting that only the relevant experience matters. Moreover, in a variant of Howard et al.’s
(2006) continuous paradigm, Navarrete, Mahon, and
Caramazza (2008) showed that repeating an item, such
as DOG, GOAT, DOG produced the same cumulative interference as accessing an additional novel exemplar from
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
4
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
the category, thus demonstrating that each act of retrieval
contributes to the effect.
Further evidence of robustness to irrelevant experience
comes from the blocked-cyclic paradigm. Damian and Als
(2005) showed that performing nonlinguistic tasks (Experiment 1) and naming unrelated items (Experiments 2 and
3) in between naming DOG and GOAT failed to disrupt
the semantic blocking effect. Thus, the priming from each
relevant experience contributes separately and robustly
to the cumulative semantic interference effect, and irrelevant experience affects neither its accumulation nor
diminution.
Filler trials, as used in the Howard et al. (2006) and
Damian and Als (2005) studies required additional time
to present and process, increasing the chronological time
between the retrieval of related words. So these studies
speak to residual activation (or inhibition) accounts of
priming. Priming by residual activation should be strongly
affected by the time between prime(s) and target. As argued by Bock and Griffin (2000), the activation levels that
control language production must decay quickly in order
for production – the rapid sequential activation of linguistic units – to succeed. For example, computer simulations
of multi-word production have required that activation
levels decay with time constants such that activated linguistic units lose nearly all of their activation within a second or two (e.g. Dell, 1986). Similarly, the effects of
inhibitory processes in production are also time-bound.
For example, many production theories assume that selection of a linguistic unit entails a reduction to a zero or negative activation value for the selected unit (e.g. Dell, Burger,
& Svec, 1997; Houghton, 1990). However, the effects of this
inhibition are quite temporary and are designed to prevent
an immediate perseveratory error. The fact that the filler
trials in Howard et al. (2006), and Damian and Als
(2005), failed to affect cumulative semantic interference
suggests that the priming that underlies this effect is reasonably persistent. Hence, cumulative semantic interference therefore is likely not largely based on the positive
or negative changes in activation levels that arise solely
through the spreading activation mechanisms of the production system.
A more direct demonstration of the temporal insensitivity of priming comes from Experiment 1 of Schnur et al.
(2006), who compared naming latencies in a blocked-cyclic naming paradigm in which pictures were presented
either 1-s or 5-s after the previous response (i.e. a 1-s or
5-s response-stimulus interval, or RSI). Any time-based decay of residual activation predicts a statistical interaction
between presentation rate and semantic blocking condition, specifically less effect of the blocking with the long
RSI (e.g., Wilshire & McCarthy, 2002). Both presentation
rates produced reliable cumulative semantic interference
effects, with no interaction between RSI and semantic
blocking condition, demonstrating that the priming is
insensitive to the passage of time, at least at these
intervals.
To summarize, the priming that causes cumulative
semantic interference is temporally persistent, it accumulates with relevant experience, and it is insensitive to irrelevant experience. These properties offer an awkward fit for
mechanisms based on residual activation or inhibition of
linguistic units, both of whose effects should be expected
to decay rather quickly. Instead, we follow Damian and
Als (2005), Schnur et al. (2006), and Howard et al. (2006)
by suggesting that the priming that underlies cumulative
semantic interference emerges from small, persistent,
experience-driven, post-selection adjustments to the mapping from semantics to words, i.e. incremental learning
within the production system.
1.2. Speech errors and cumulative semantic interference
It appears that cumulative semantic interference does
more than slow lexical retrieval; it also causes lexical
selection errors. In a blocked-cyclic naming task with
healthy older controls, Schnur et al. (2006, Experiment 1)
found that naming latencies were higher and errors more
frequent in the homogeneous condition, relative to the
mixed condition. They also tested aphasic patients (Schnur
et al., 2006, Experiment 2), who made many more errors,
and reported two important findings. First, patients made
more semantic errors (e.g. naming DOG as CAT) and omissions in the homogeneous than mixed condition. Second,
these semantic blocking effects increased across cycles,
while other types of errors (e.g. phonological) showed the
opposite pattern. The patients’ increasing semantic blocking effects for semantic and omission errors thus resembled healthy adults’ increasing blocking effects for
naming latencies, suggesting that they might stem from
the same underlying causes.
The link between the blocking effects on errors in patients and on latencies in unimpaired speakers also has
some support from studies that attempt to associate these
effects with brain regions. Neuroimaging of healthy subjects demonstrated that activation in the left inferior frontal gyrus (LIFG) correlates with increases in naming
latencies due to semantic blocking and related manipulations (Moss et al., 2005; Schnur et al., 2009). The LIFG, for
reference, corresponds to Brodmann’s areas (BA) 44, 45,
and 47, the posterior part of which (BA 44/45) is Broca’s
area. And lesion analyses of patients from the Schnur
et al. (2006) study revealed an association between LIFG
damage and the increase in errors across blocking cycles
(Schnur, Lee, Coslett, Schwartz, & Thompson-Schill, 2005;
Schnur et al., 2009).
A second important finding from these patient studies is
that patients’ error effects are also robust to timing manipulations. Schnur et al. (2006) found that the blocking effect
on errors – like the blocking effect on naming latency with
unimpaired speakers – was not influenced by whether pictures were named with a 1-s or a 5-s RSI. Further examining these patients’ naming errors, Hsiao et al. (2009) found
that their within-set perseverations tended to match the
words that they had used most recently. For instance, if a
patient named pictures of a dog, a pig, and a goat correctly
(i.e. saying PIG DOG GOAT) before incorrectly naming a picture of a horse, then she was more likely to name the horse
as DOG than as PIG. Crucially, the key measure of recency
was not time, but the number of intervening items (henceforth item-lag). Specifically, the chance-corrected perseveration lag functions were the same regardless of whether
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
5
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
pictures were presented 1-s or 5-s after a patient’s most recent response. Together with the temporal insensitivity of
unimpaired speakers’ response time effects, these results
support our previous suggestion that relevant intervening
experience, not timing, matters for the build-up or dissipation of cumulative semantic interference.
1.3. Modeling cumulative semantic interference
Howard et al. (2006) presented an elegant model of the
effect of cumulative semantic interference on response
time. In their model, shared activation is implemented by
assuming that words receive continuous (integrated over
computationally discrete timesteps) activation from
semantic nodes, and each time one semantic node is activated, similar semantic nodes are also activated to a lesser
degree. Lexical competition is implemented by inhibitory
connections running from each word to every other word
(i.e. lateral inhibition). A word is only selected upon reaching an absolute selection threshold, but since activated
words inhibit each other, strong competitors can slow
down the selection of a target word. Finally, each time a
word is selected, its connection from its semantic node
grows stronger, implementing a priming function.
The Howard et al. model is noteworthy because it
instantiates the principles of shared activation, competitive selection, and priming and because it attributes the
interference to processes that are insensitive to time and
to unrelated interference. Our goal is to extend this approach. We do so in three respects. First, we identify the
priming mechanism with error-driven connectionist learning. This learning mechanism has the natural property that
each act of retrieval in a certain context strengthens the
target of retrieval (repetition priming) while at the same
time making it less likely that similar memories are retrieved instead in that context (similarity-sensitive interference). We will show how this mechanism is consistent
with both the perseveratory gradient in the production of
word errors and the insensitivity of cumulative semantic
interference to unrelated items or the passage of time. By
attributing these effects to error-driven learning, we link
up with the many cognitive models that are based on such
learning (e.g. Chang et al., 2006; Gupta & Cohen, 2002;
Plaut, McClelland, Seidenberg, & Patterson, 1996) and, as
Semantic
features
(inputs)
mammalian
horticultural
vehicular
we later demonstrate, this attribution addresses the question of whether retrieval-induced forgetting is caused by
‘‘inhibition”. Second, we develop the model in conjunction
with theories of word production so that it can account for
errors as well as response times. This requires a decision
process that allows for lexical competition to play out in
time, and for errors of commission and errors of omission.
This proposed decision process may underlie the correlation between cumulative semantic interference and activation of Broca’s area. Finally and most importantly, we offer
the hypothesis that cumulative semantic interference does
not, in fact, require a competitive mechanism for lexical
selection. Specifically, we demonstrate that competition
in the lexical selection process is unnecessary when combined with error-driven learning. The resulting non-competitive model, we claim, can explain the major findings
concerned with cumulative semantic interference.
1.4. The model
1.4.1. Overview
The key components of our model concern lexical activation, lexical selection, and learning. As in many models
of lexical retrieval in production, retrieving a word begins
with activating a set of semantic features (e.g. Dell et al.,
1997; Gordon & Dell, 2003; Rapp & Goldrick, 2000). These
semantic features each connect to a number of words, and
thus activate those words in proportion to the strength and
number of these connections (lexical activation, Fig. 1).
Thus, multiple words are activated, requiring some kind
of decision. For the model, we assume that the most active
word is chosen. However, when more than one word is
activated, it is assumed to be difficult to identify the most
active one (i.e. if the difference in activations is slight, the
winner is hard to ‘‘see”), so a ‘booster’ mechanism kicks
in to tease the activations apart. This booster repeatedly
amplifies each word’s activation until a winner can be selected (lexical selection), or until this boosting process
times out. Response time is assumed to be correlated with
the number of boosts needed for the winner to emerge. Errors of commission, such as semantic errors, occur when
the wrong word is chosen, and errors of omission occur if
the booster times out. Finally, after lexical selection has
concluded, an error-driven learning process adjusts the
terrestrial
aquatic
aerial
Trainable semantic-to-lexical
connections
Lexical
units
(outputs)
dog
whale
bat
tree
water lily
orchid
car
boat
airplane
Fig. 1. Words are activated by distributed semantic representations, or features. The strength of a connection from a semantic feature to a word is
established by an initial period of error-based learning, and this learning continues throughout the simulations.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
6
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
semantic-to-lexical connections so as to facilitate future
retrieval of the target word (learning). In the following sections, we describe the details of model’s architecture, and
its lexical activation, lexical selection, and learning
mechanisms.
1.4.2. Model architecture
The model is a feedforward two-layer network. Semantic feature nodes (e.g. FURRY or AQUATIC) form the input
layer of the network. Each feature node connects directly
to each of the word nodes (such as DOG or BOAT) in the
output layer. Connection weights are initialized at zero
and are continually adjusted through an error-driven driven learning process, as detailed later in this description.
There are no lateral connections between semantic feature
nodes or between word nodes, or reverse connections from
words to features.
1.4.3. Algorithms
Lexical activation. When semantic features are activated
(as we assume happens when a picture is presented), these
features in turn activate words. The net input, neti , to any
lexical node i, sums the activation, aj , of each semantic feature, j, times the weight of its connection to the lexical
node, wij (Eq. (1))
neti ¼
X
wij aj
ð1Þ
j
This net input, neti , is then converted to an activation, ai ,
via a logistic function (Eq. (2)).
1
ai ¼
1 þ eneti
ð2Þ
Thus, the activations range from zero to one. We assume
that lexical activation is imprecise and therefore add a
small amount of normally-distributed noise, m (with a
mean of 0 and a standard deviation of h), to the net input,
neti, yielding Eq. (3)
ai ¼
1
1 þ eðneti þmÞ
ð3Þ
Lexical selection. The next stage applies a competitive
winner-take-all process to the lexical activations, linking
increased lexical competition to increased naming latencies. A booster mechanism floods the network with additional activation that combines nonlinearly with the
existing lexical activation until either one word grows discernibly more active than the rest or the boosting process
times out. Notice that this booster process is ‘‘dumb” in the
sense that it does not know which word is the target. It
repeatedly boosts all words. But because it boosts them
in a multiplicative manner, the most active one gradually
increases its lead on the other words.
The booster is engaged only to the extent necessary to
select a single word (that is, it operates more when selection is difficult), recalling Schnur et al.’s (2009) reports of
greater LIFG activity as a function of increased lexical competition. Therefore we tentatively identify this booster
with the competition-biasing mechanisms that are
hypothesized to be a function of the LIFG (e.g. Kan &
Thompson-Schill, 2004; Thompson-Schill, D’Esposito,
Aguirre & Farah, 1997), but we acknowledge that our
implemented booster has arbitrary properties that lack
neural motivation. That is, we commit to the functions of
the booster (aiding selection when competition is present)
and its possible association to the LIFG (Broca’s area),
rather than to its implemented details.
The boosting process plays out over time. To determine
whether a winner has emerged, at each timestep, tn , we
compare the difference between the activation of each
word node, ai tn , and the mean activation of the other word
nodes, aothers tn , to a threshold value, s (Eq. (4))
s > ðai tn aothers tn Þ
ð4Þ
If no word’s activation difference exceeds the difference
threshold (i.e. Eq. (4) is false for all i), then the booster multiplies each word’s current activation level, ai tn , by a constant boosting factor, b > 1.0. The result becomes its new
activation level, ai tnþ1 (Eq. (5)). Then this testing and boosting process repeats
ai tnþ1 ¼ ai tn b
ð5Þ
A word is selected, that is, the boosting stops, if and
when its activation advantage over other words (per Eq.
(4)) is great enough. The timestep at which this selection
occurs, t selection , is treated as an index of the duration of
the lexical selection process, which should correlate with
naming latency. If, for the sake of simplicity, we assume
no variation in the repeated boosting, this iterative process
becomes computationally equivalent to Eq. (6)
tselection ¼ logb
s
ai t1 aothers t1
ð6Þ
However, if no node reaches the difference threshold
within a certain number of boosts, X, then no word is selected and the trial is an omission. This corresponds to a
simple ‘‘wait and give up” theory of omissions. So we do
not consider an omission as a special state that may be
achieved, but rather a lack of sufficient evidence for any
particular word, making it difficult to select a word quickly
enough.
Note that while the implemented boosting process may
be deterministic, based on the initial activations, the target
word will not necessarily be selected. The combination of a
discernible-difference threshold and a selection deadline
may preclude selecting any word if the difference in lexical
activations is too small. Adding noise to the lexical activations, as we have done, increases this chance and further
opens the possibility that a competitor will be selected instead. Furthermore, although we have not done so here,
one could assume that the boosting process is subject to
noise either in its normal operation, or in pathological
cases (e.g. LIFG damage), by allowing for boosts to randomly fail for particular words at particular time steps. A
noisy booster would then have properties in common with
sequential stochastic decision mechanisms such as a random-walk process.
We do not implement any residual activation or inhibition in this model. When the trial ends, either by selecting
a word or by failing to select a word before the deadline, all
activations return to zero.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Learning. At the end of each trial, semantic-to-lexical
connection weights are adjusted according to Eq. (7),
which is the Widrow–Hoff or delta rule tailored for the logistic activation function (Rumelhart, McClelland, & the
PDP Research Group, 1986; Widrow & Hoff, 1960): Dwij is
the weight change for the connection to node i from node
j, g is the learning rate, and di is the desired activation of
node i
Dwij ¼ gðai ð1 ai Þðdi ai ÞÞaj
ð7Þ
Since this equation will prove crucial for understanding
the behavior of the model, we should unpack it a bit more.
We have said that learning is error-driven. This means that
connections are adjusted according to (di ai ), the discrepancy between the desired activation of output node i, di ,
and its actual activation, ai (that is, its activation before
boosting). So the error in a receiving node’s activation affects both the degree and the direction of the weight
change. Hence, when the error for an output node
(di ai ) is strongly positive, connections feeding it will be
greatly strengthened. When the error for an output node
is strongly negative, the connections feeding it will be
greatly weakened. Notice that because the logistic activation function precludes activations that are actually 0 or
1, every word unit will experience at least some error on
all trials, either positive or negative, if the desired activations are 0 or 1. Next, including the ai ð1 ai Þ component
scales weight adjustments to ai , such that weight changes
are greatest at ai ¼ 0:5 , and decrease as ai approaches 0 or
1. Thus, weight changes are strongest for connections that
contribute to moderate activations. Adding the aj specifies
that connections from input j should only be modified to
the extent that j is activated. And finally, the learning rate,
g, is simply an arbitrary global parameter, used to adjust
how rapidly weight changes occur.
Thus the learning algorithm increases the connection
weights from active semantic features to the target word,
and decreases weights from those features to all other
words, to the extent that those words were active before
boosting. Since this learning is based on the deviation between di and ai , it occurs regardless of whether the target
was ultimately selected. So if the network encountered a
dog (activating semantic features MAMMAL and TERRESTRIAL), then the connections from MAMMAL and TERRESTRIAL to DOG would strengthen, and the connections
from MAMMAL and TERRESTRIAL to any other activated
words (e.g. BAT) would weaken. The next time the network
encounters a dog, those same semantic features will activate DOG more efficiently (i.e. activating DOG more and
competitors less), increasing the speed and likelihood of
its selection.
2. Simulations
Our lexical learning model integrates several features
common to theories of lexical access: lexical retrieval begins when distributed semantic features activate words
(e.g. Dell et al., 1997; Rapp & Goldrick, 2000); lexical selection uses a differential threshold (e.g. Levelt et al., 1999);
and semantic-to-lexical connections are adjusted through
experience (e.g. Gordon & Dell, 2003; Howard et al., 2006).
7
Can this model account for the major behavioral manifestations of cumulative semantic interference? To answer
this question, we simulate several of the experiments described in the Introduction. First, Simulation 1 compares
predicted selection latencies in a continuous paradigm simulation to Howard et al. (2006)’s behavioral data, and establishes one additional prediction from the model. In
Simulation 2 we compare predicted latency patterns from
blocked-cyclic presentation to data from Schnur et al.
(2006, Experiment 1), Damian and Als (2005), and Belke
(2008). Simulation 3 tests whether the simulated semantic
blocking effect generalizes to new items, as reported by
Belke et al. (2005). In Simulation 4, we turn to the aphasic
patient data. Repeating the procedure from Simulation 2
with noisier lexical activations, we compare the resulting
error patterns with those reported by Schnur et al. (2006,
Experiment 2) and Hsiao et al. (2009). Simulation 5 explores
the mechanisms behind the model’s effects by distinguishing the influence of weight increases (facilitatory learning)
and weight decreases (inhibitory learning) on response
times, analogous to the ‘occlusion’ versus ‘inhibition’ debate
in the retrieval-induced forgetting literature. Finally, Simulation 6 examines the role of competitive lexical selection in
creating cumulative semantic interference, demonstrating
that competitive lexical selection is not, in fact, necessary
for any of the effects seen in Simulations 1–5.
Our goal in these simulations is to explore how a simple, omnipresent process – incremental learning – can lead
to a surprising range of behavioral effects depending on the
nature and ordering of stimuli during an experiment. Toward that end, we hold all aspects of the simulations constant throughout this paper, except where we have
motivated reasons to change them.
We implement a standardized vocabulary structure for
these simulations (Fig. 2). Equal numbers of words share
each semantic feature, and each word is uniquely specified
by the conjunction of two semantic features. For example,
WHALE, BOAT, and WATERLILY share the feature AQUATIC,
whereas BAT, PLANE, and ORCHID share AERIAL. And the
features AQUATIC and VEHICULAR specify the word BOAT,
while AQUATIC and MAMMALIAN specify WHALE.
When simulating picture naming in the model, we assume that the target picture is correctly recognized and
that its semantic features are properly activated. Features
that should be active get activations of 1, and those that
should not be active get activations of 0. So we do not simulate errors in pre-lexical processes, though we concede
that they can occur (e.g. Rogers et al, 2004).
Each simulation consists of two phases: training and
testing. In training, we simulate the acquisition of subjects’
pre-experimental lexical-semantic knowledge. Then, in
testing, we simulate the learning that occurs during the
experiment. We present trials that mimic the experimental
conditions being simulated and continue to adjust the connections according to the same learning algorithm and
parameter values that were used during training. So the
only difference between the training and testing phases
is that the testing phases focus on particular subsets of
the vocabulary.
When simulating multiple testing conditions, such as
homogeneous versus mixed blocks, we want to compare
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
8
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Semantic
inputs
mammalian
horticultural
vehicular
terrestrial
aquatic
aerial
Inhibitory connection
(mean weight =-2.7)
Excitatory connection
(mean weight =0.47)
Lexical
outputs
dog
whale
bat
tree
water lily
orchid
car
boat
airplane
Fig. 2. A scaled-down illustration of a trained network’s vocabulary. Here, the feature MAMMALIAN excites three words (DOG, WHALE, and BAT) and
inhibits the rest. The feature AQUATIC works the same way. When these features are activated together, they activate all of the words that are either
mammalian or aquatic. But they activate WHALE most strongly because it is both mammalian and aquatic.
these directly. So we start the testing phase for each condition with the same trained weights. Thus the only difference between the conditions is the semantic relationship
of the items cued in their testing phases.
To simulate testing multiple subjects with pictures
from many categories, we repeat each simulation 10,000
times. Individual differences in experience are simulated
by beginning each replication with a fresh network in tabula rasa state, and then training it with 100 randomly ordered sweeps through the vocabulary to represent the
experience that this model subject brings to the experiment. Variation in the networks’ performance thus comes
from activation noise (present in both training and testing)
and differences in the order in which words are cued.
2.1. Simulation 1 – continuous paradigm
Howard et al. (2006) reported that picture-naming
latencies increased by a consistent amount for each
same-category item that was named. The magnitude of
the incremental increase was unaffected by the number
of intervening items from different categories. They
claimed that shared activation, competitive selection, and
priming were necessary for any model to account for these
findings. Because our model implements all three of these
properties, it should exhibit a lag-insensitive incremental
increase in selection times.
In Simulation 1a, we approximate Howard et al.’s protocol by cueing for production of five items from a single
semantic category, like ‘‘farm animals”, one at a time. A
variable number of unrelated fillers (two, four, six, or
eight) are cued between each critical item and the next,
allowing examination of how the accumulation of semantic interference is affected by the number of intervening filler items.
2.1.1. Method
Parameters for this simulation are given in Table 1. Constraints on these parameters are discussed in the Results
section.
Training. Networks were trained, as described above, on
a vocabulary of 20 semantic features and 50 words.
Testing. A list of 25 pictures comprised the test phase.
Five of the items on this list were critical items and came
Table 1
Model parameters for Simulations 1, 2, 3, 5, and 6.
Parameter
Value
Learning rate (g)
Activation noise (h)
Boosting rate (b)
Threshold (s)
Deadline (X)
0.75
0.5
1.01
1
100
from the same category. Each shared one of its two semantic features with every other critical item. Following Howard et al. (2006), a lag of 2, 4, 6, or 8 fillers separated each
critical item and the next, making a total of 20 fillers. These
fillers were randomly selected with the constraint that
they shared no semantic features with the critical items;
they were, however, free to share features with other fillers. Each lag occurred once in each list, yielding 4! = 24
possible lag sequences. Each of 10,000 networks was tested
with each of these 24 sequences.
Analyses. Following Howard et al. (2006), mean lexical
selection times were calculated for each lag and ordinal
position. In our simulations, these selection times are the
mean numbers of ‘boosts’ needed before one output node
could be selected, as described in Eq. (6). Following the
standard practice in picture naming studies, only the selection times for correct responses were included in these calculations. Errors of omission and commission occurred in
less than 1% of the trials and were excluded from the
analyses.
2.1.2. Results and discussion
As expected, each critical item took longer to select than
the previous one, with each item contributing an equal
increment to the selection time. There was no systematic
variation in this effect over the different lags (Fig. 3a).
These results are entirely consistent with Howard et al.’s
human data (Fig. 3b).
The proper behavior of the model requires some reasonable limits on its parameters. For instance, the boosting
factor (b) must be greater than 1.00. Otherwise, the booster
isn’t a booster. The threshold (s) and deadline (X) should
be such that, given the value of the boosting factor, words
are often selected before the deadline. Finally, the learning
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
9
(b)
Fig. 3. In Simulation 1a, the continuous paradigm, each critical item takes a bit longer to select than the previous one, regardless of the number of
intervening unrelated items. (a) Model-predicted mean correct selection times in each of the four lag conditions show identical increases with ordinal
position. We present only an overall mean selection time for ordinal position 1 because the lag for the first item is obviously impossible to calculate. (b)
Human subjects’ mean naming latencies show the same increase. Reprinted from Howard et al., 2006.
rate (g) must be sufficiently large that its effect is not obscured by activation noise (h).
The learning rate matters because incremental learning
underlies the accumulating interference in this model.
Each time a word is retrieved, connections from the activated semantic features to the activated words are adjusted. Connections supporting the target word
strengthen and those supporting competitors weaken. This
learning event promotes repetition priming during the
subsequent retrieval of the same target from these semantic features. But if one of these features is instead used to
cue a different target word, this same learning means that
the new target will be weaker and the previous target –
now a competitor – will be stronger. So the new target is
retrieved more slowly. When sequentially retrieving several words that all share a semantic feature, the learning
that follows each retrieval event makes the retrieved word
a strong competitor and further weakens those competitors that have not yet been named. Each newly named
word from the category therefore takes a bit longer to retrieve than the previous one. Thus the incremental increase in selection times arises from the incremental
adjustments to the semantic-to-lexical connections.
Priming by error-based connection adjustments also
makes the model’s semantic interference effect insensitive to both time- and item-based lag manipulations.
Time-lag insensitivity comes from the fact that all priming in the model is persistent, at least in the sense that
the weight changes do not decay with time. Furthermore,
there is no residual activation to decay while unrelated
items are named. Robustness to long item-lags with intervening unrelated fillers comes from the learning algorithm. The adjustments that follow each trial only affect
connections from the specific semantic features that were
active during that trial. DOG, for instance, relies on connections from TERRESTRIAL and MAMMAL. Thus, the only
thing that would make it more difficult to retrieve DOG
via these two features would be changes to connections
from them. So as long as TERRESTRIAL and MAMMAL
are not activated en route to lexical selection in any filler
trials, no number of filler trials will ever affect DOG’s
intentional retrieval.
In the introduction, we asserted that incremental learning has both a dark side (cumulative semantic interference) and a light side (repetition priming). However, the
first simulation only addressed the dark side. To see how
these light and dark sides interact, we turn to a related
experiment by Navarrete et al. (2008). They added a repetition component to the continuous paradigm. Participants
named related pictures separated by fillers, but the third or
fourth ordinal position was filled by either a novel related
picture or the same picture that appeared in the first or
second ordinal position, respectively. The question here
was how this repetition would affect naming latencies
for subsequent novel related items. Recall that Howard
et al. (2006) showed that naming latencies increase linearly each time a related item is named. Now if cumulative
semantic interference is type-based, meaning that what
matters is the number of unique related pictures a person
has named, then repeating one of the items should not increase RTs at all. However, if cumulative semantic interference is token-based, meaning that what matters is the
number of times a person has named related pictures (unique or not), then there should be no difference between
the interference that accrues from repeating an item and
that which accrues from accessing another novel item.
They found the latter: Each related retrieval, whether
introducing a novel item or repeating an item previously
accessed, contributed separately to the cumulative semantic interference.
But what of the model? Is its cumulative semantic
interference effect type-based or token-based? To address
this question, we now simulate Navarrete et al.’s (2008)
repetition experiment as Simulation 1b.
2.1.3. Method
Parameters, vocabulary, and training for this simulation
were identical to those of Simulation 1a. The testing phase
differed slightly, following Navarrete et al. (2008).
Testing. The testing phase was identical to that of Simulation 1a, except in two respects. First, we included filler
items between the critical items, but did not manipulate
item-lag. Second, either the third or fourth ordinal position
represented a second cueing of one of the earlier critical
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
10
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
(b)
Fig. 4. Repeating an item creates cumulative semantic interference similar to that from accessing a novel item. (a) Model-predicted mean selection times
from Simulation 1b. (b) Empirical results from Navarrete et al. (2008).
items. In one set of lists, the item in the first ordinal position was repeated in the third position. In the other set of
lists, the item in the second position was repeated in the
fourth position. So a five-position critical item sequence
might go DOG BAT DOG WHALE MOLE, or DOG BAT
WHALE BAT MOLE.
Analyses. Response times from multiple model subjects
were generated as in Simulation 1a. We compared the response times to name items in sequences with a repeated
item in ordinal position three to those in sequences with a
repeated item in position four.
2.1.4. Results and discussion
Repeating one of the critical items produced additional
interference, just like accessing another novel item from
the same semantic category (Fig. 4). Specifically, we see
that the linear increase in response times as a function of
ordinal position, apparent in Fig. 3, grows normally when
one of the ordinal positions is filled by a repeated item.
This simulated finding mirrors Navarrete et al.’s empirical
data. Thus, the model’s interference effect, like that in the
human data, appears to be token-based. In the model, this
happens because error-based learning creates additional
interference each time a related item is accessed.
It is also worth noting that the model approximates the
relative sizes of benefit due to repeating an item and the
cost that each similar item imparts, the former being several times larger than the latter. However, we do not want
to emphasize exact quantitative properties of the model, as
its representations and vocabulary are quite simplified.
2.2. Simulation 2 – blocked-cyclic paradigm
In the continuous paradigm, each subject named a large
number of pictures just once. The blocked-cyclic naming
paradigm, in contrast, requires subjects to repeatedly
name a small number of pictures. Here we gauge semantic
interference effects by comparing blocks of pictures from
the same semantic category, the homogeneous condition,
to blocks of unrelated pictures, the mixed condition.
There are two major findings from this paradigm. First,
when pictures are presented in the homogeneous condition, they take longer to name than the same pictures presented in the mixed condition (e.g. Damian et al., 2001).
Second, the magnitude of this semantic blocking effect increases with each cycle (Belke, 2008, Experiment 1; Belke
et al., 2005, Experiment 1; Damian & Als, 2005, Experiment
4; Schnur et al., 2006, Experiment 1). So the blocked-cyclic
paradigm’s semantic interference effect grows with each
cycle, similar to the way that the continuous paradigm’s
semantic interference effect grew with each related item.
Our second simulation simulates this blocked cyclic
procedure.
2.2.1. Method
Model parameters were identical to Simulation 1, and
are given in Table 1.
Vocabulary. The vocabulary for this and the remaining
simulations consisted of 12 semantic features mapped
onto 36 words. Each feature cued exactly six words, and
each word was cued by the intersection of exactly two features. Training followed the format described in the introduction to the simulations.
Testing. Two parallel testing phases simulated the
homogeneous and mixed conditions of a blocked-cyclic
naming experiment. In each condition, a set of six words
was repeatedly cued for four cycles, for a total of 24 trials,
with words ordered randomly within each cycle. The
homogeneous condition used six words from a single category, so they all shared one of their two semantic features. Words in the mixed condition each represented a
different category, so none shared a feature with any other
word in the set.
Both testing phases began with the same trained connection weights. But learning then continued separately
in each condition. This learning proceeded at the same rate
(g) as during the shared training phase. So each replication
tests the same trained network in both conditions, with no
carryover from one condition to the next.
Analyses. We report selection times in terms of the
mean number of ‘boosts’ for each item position in each
condition. As before, errors of omission and commission
occurred in less than 1% of the trials and were excluded
from the analyses.2
2
Errors were rare, but we note that they were relatively more frequent
in the homogeneous condition than the mixed, and in both conditions, the
errors were predominantly omissions. Given the infrequency of errors in
this simulation, however, we do not discuss them further here.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
2.2.2. Results and discussion
The trial-by-trial predictions (Fig. 5a) show both repetition priming and cumulative semantic interference. Repetition priming occurs in both the homogeneous and
mixed conditions; each time a word is cued, it is selected
more quickly. This is because the incremental learning that
follows each retrieval facilitates future retrievals of the target word by strengthening its connections and weakening
connections from its features to competitor words. Cumulative semantic interference only occurs in the homogeneous condition, creating an incremental increase in
selection times within each cycle. This is the same pattern
that we saw in Simulation 1, because it arises from the
same process. However, in this simulation, the relation between repetition priming and cumulative semantic interference becomes more apparent. Each time one word
gets stronger, its competitors get weaker. When that word
is retrieved again, it is primed. When one of the competitors is retrieved, though, it is subject to interference. Because repetition priming and cumulative semantic
interference are both at work in the blocked-cyclic paradigm we get a saw-toothed function for selection times
in the homogeneous condition (Fig. 5a), and a small decrease in the per-cycle mean selection times (Fig. 5b). A
smaller per-cycle decrease is precisely what we see in human data for the same task (e.g. Fig. 5c, from Schnur et al.,
2006, Experiment 1). So it appears that the model successfully extends to blocked-cyclic naming.
(a) 45
11
Our model predicts some attenuation of the repetition
priming across cycles, because of the learning algorithm.
Each retrieval is a learning event, thus decreasing error
for the next retrieval. And since the magnitude of each
weight change depends on the magnitude of the erroneous
activation for the relevant output node, this error-reduction reduces the weight change (and hence repetition
priming) that will result from the next repetition. We
acknowledge, though, that with our current parameter values for the learning and decision processes, the non-linearity is smaller than it is in the human data.
There is one feature of the data that this simulation
does not exhibit. While our main effect of blocking begins
in the first cycle, Belke et al. (2005) observed that in human
data it tends not to appear until the second cycle. In fact,
human data sometimes shows a brief semantic facilitation
effect, for example in the first cycle depicted in Fig. 5c.
However, there is evidence that this early facilitation represents a conscious strategic process rather than an integral part of lexical retrieval, and is therefore beyond the
scope of a model of lexical access. Three findings support
this conclusion. First, Wheeldon and Monsell (1994) reported a similar brief semantic facilitation in a continuous
paradigm experiment, and argued that it followed a different time course than the longer-lasting interference effect,
suggesting a separate process. Extrapolating to the
blocked-cyclic paradigm, a brief semantic facilitation could
delay the appearance of an interference effect until the
(b)
40
35
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
(c)
Fig. 5. Simulated lexical selection times from Simulation 2 mirror human subjects’ naming latency patterns for the blocked-cyclic paradigm. (a) Mean
simulated lexical selection times across four cycles through sets of six words, for 10,000 networks. (b) Predicted per-cycle means, derived from (a), for
comparison with the human data presented in (c). (c) Subjects’ per-cycle mean naming latencies across four cycles, from Schnur et al. (2006).
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
12
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
second cycle. Second, Damian and Als (2005, Experiment 4)
demonstrated that the semantic interference appears in
the first cycle, and grows thereafter, if homogeneous and
mixed sets are interwoven a single block. This suggests
that subjects’ expectations may play a role in the early
facilitation. Finally, and most conclusively, Belke (2008,
Experiment 1) showed that adding a working memory load
to the blocked-cyclic naming paradigm led to semantic
interference in the first cycle, with no sign of early facilitation. Since the dual-task disrupts semantic facilitation
while leaving the growing semantic interference effect intact, we can conclude that the facilitation represents a resource-demanding process that is distinct from whatever
underlies the cumulative semantic interference effect and
is not integral to the process of lexical retrieval.
ruption, to naming an entirely new set of birds. Belke et al.
(2005) argued that this carryover reflected a refractory
behavior where related words become temporarily inaccessible due to residual activation and/or inhibition.
In order to test whether a learning model could account
for Belke et al.’s finding without representing residual activation or inhibition, we now test the model using their
procedure. They compiled sets of pictures from one category, and had subjects name half of these pictures for four
cycles before switching over to name the other half for four
cycles. For example, given a set of birds {CROW FINCH
GULL JAY ROBIN SPARROW} subjects might name CROW,
FINCH, GULL, FINCH, CROW, GULL, FINCH, GULL, CROW,
GULL, FINCH, CROW, and then immediately switch over
to JAY, ROBIN, SPARROW, ROBIN, SPARROW, JAY, ROBIN,
JAY, SPARROW, JAY, SPARROW, ROBIN. The crucial question is whether the naming latency difference between
homogenous and mixed conditions that develops while cycling through the first subset will continue in the first cycle
of the new subset.
2.3. Simulation 3 – generalization of interference to new
pictures
An interesting empirical property of the semantic
blocking effect is that it extends to naming new pictures
from the same category (Belke et al., 2005, Experiment
3). For instance, repeatedly naming a small set of birds,
such as CROW, FINCH, and GULL, creates a substantial
semantic blocking effect that will carry over, without inter-
(a)
2.3.1. Methods
We followed the methods of Simulation 2, with just one
change in the testing procedure. Instead of cuing one set of
six words for four cycles during testing, we now cued one
(b)
45
Selection time (in boosts)
Homog
Mixed
40
35
30
1
2
3
4
5
6
7
8
Cycle
(c)
Homog
Selection time (in boosts)
800
Mixed
700
600
1
2
3
4
5
6
7
8
Cycle
Fig. 6. Results from Simulation 3. (a) Lexical selection times for items in homogeneous and mixed sets presented in three-trial cycles. The second subset
replaced the first after the twelfth trial. (b) Per-cycle mean selection latencies, derived from (a), for comparison with (c). (c) Human subjects’ per-cycle mean
naming latencies, from Belke et al. (2005).
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
13
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
set of three words for four cycles, and then a set of three
different words for four additional cycles. In the homogeneous condition, all six words shared a single semantic feature. None of the six words in the mixed condition shared
any semantic features.
Model parameters were identical to Simulation 1 and 2,
and are given in Table 1.
2.3.2. Results and discussion
The simulated blocking effect carried over from the first
to the second subset without interruption (Fig. 6a). Cumulative semantic interference built up while naming the first
subset in the first four cycles, resembling the findings from
Simulation 2. Upon switching to the second subset, in cycles five through eight, the interference effect continued
with the entirely new items. The blocking effect in the fifth
cycle exceeded that in the first cycle, demonstrating that
the accumulated semantic interference transferred to
new items from the same category.
In the model, interference generalizes to new items
from the same category because the incremental weight
changes that follow each selection affect all competitors.
Indeed, this is required for the model to simulate the results of Howard et al.’s (2006) continuous paradigm in
which same-category items do not repeat. Importantly,
the fact that our simulated data (Fig. 6b) mirrors Belke
et al.’s (2005) human data (Fig. 6c) demonstrates that the
human results do not depend on the residual activation
or inhibition that are normally associated with refractory
behaviors (e.g. Forde & Humphreys, 1997; McCarthy & Kartsounis, 2000). Rather, this behavior can arise from persistent incremental learning.
2.4. Simulation 4 – aphasic errors
Semantic blocking manipulations elicit longer picture
naming latencies from healthy subjects, and they also affect lexical selection errors made by individuals with aphasia (Schnur et al., 2006). Semantic errors and omissions
become increasingly likely in the homogeneous condition,
compared to the mixed baseline (Schnur et al., 2006,
Experiment 2). Other types of errors (e.g. phonological or
unrelated errors) do not show such effects. Furthermore,
when patients name pictures incorrectly, their within-set
substitutions tend to match the words that they have used
more recently, and there was no difference between this
perseverative recency effect at 1-s and 5-s inter-stimulus
intervals (Hsiao et al., 2009).
To simulate the patient error effects, we need a theory
of the differences between aphasic and nonaphasic lexical
access in this task. Here we follow the approach of several
researchers (e.g. Dell et al., 1997; Rapp & Goldrick, 2000)
and attribute impaired performance to the alteration of
one or more model parameters, but without changing
any of the model’s processes. Specifically, we increase the
activation noise parameter, with the result that errors become much more likely. Rapp and Goldrick specifically
simulated aphasia by activation-noise lesions, and Dell
et al. demonstrated that noise lesions mimic the decay lesions that they used to simulate some patients.
Schnur et al. (2006) compared unimpaired (Experiment
1) and aphasic (Experiment 2) performance in the blockedcyclic paradigm by running both groups in essentially the
same experiment. We do likewise, repeating Simulation
2, which we had based on their Experiment 1 procedure,
with noisier lexical activations. Our analyses follow Schnur
et al. (2006, Experiment 2) and Hsiao et al. (2009), so that
we can compare the model’s predictions directly to the
data. So, if successful, the predictions should show Schnur
et al.’s semantic blocking effects for semantic errors and
omissions, and Hsiao et al.’s recency gradient for perseveration errors.
2.4.1. Methods
The vocabulary, training, and testing followed Simulation 2, only increasing the amount of noise in the lexical
activations from 0.5 to 1.0. The new parameters are given
in Table 2.
The analyses followed those that Schnur et al. (2006)
and Hsiao et al. (2009) used for their human data. Schnur
et al.’s (2006, Experiment 2) major findings concerned
omissions and semantic errors, so our analyses focused
on these as well. As described in the Model Description
section, omissions occurred when the activations of lexical
nodes were so similar that no winner could be selected before the deadline. Incorrect selections were classified as
either semantic errors or other errors, according to whether
the target and error shared a semantic feature. Following
Schnur et al., we calculate per-cycle means for each error
type.
Perseveration error analyses followed Hsiao et al.
(2009). These errors occurred when an erroneously selected word matched another target in the block whose
name had been selected previously. These errors were rare
in the mixed condition, and so perseveration error analyses
were restricted to the homogeneous condition. For each
such error, we recorded the number of trials back that
the erroneously selected word was most recently selected.
So if a dog was named as DOG, a bat was named as
BAT, and a whale was then named as DOG, then the
whale ? DOG error would be counted as a perseveration
at lag-2 because DOG was last produced two items back.
Actual target-error pairs from the testing phase were then
randomly reshuffled within each cycle, with the results
coded as above, in order to estimate the probability that
such a lag distribution might occur by chance. Following
Hsiao et al. (2009, pp. 136–137) and Cohen and Dahaene
(1998), p. 1643) before them, we thus derived a chancecorrected estimates of perseveration frequencies at each
lag.
Table 2
Model parameters for Simulation 4. Except for activation noise, these
parameters are identical to those given in Table 1.
Parameter
Value
Learning rate (g)
Activation noise (h)
Boosting rate (b)
Threshold (s)
Deadline (X)
0.75
1
1.01
1
100
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
14
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Hsiao et al. (2009) reported perseveration functions for
each of two ISIs. Our model assumes no time-based decay,
and should therefore predict the same effects for any ISI.
However, we give some measure of the variability of our
predicted perseveratory effect by presenting data from
four 10,000-block replications.
2.4.2. Results and discussion
Doubling the activation noise substantially increased
the predicted error rates, producing errors in 12.4% of
homogeneous trials and 10.9% of mixed trials.3 Errors of
commission were largely semantic, with less than 0.01%
unrelated errors occurring in either condition.4 The homogeneous condition elicited more semantic errors (Fig. 7a) and
omissions (Fig. 7b) than the mixed condition. These effects
increased across cycles, recalling Schnur et al.’s (2006,
Experiment 2) aphasic patients’ error patterns (Fig. 7c).
Within-set substitution errors accounted for 51% of the
semantic errors in the homogeneous condition, in agreement with Schnur et al. who reported 53%. In our simulated data, most (82.2%) of these within-set substitutions
were identified as perseverations. As illustrated in Fig. 8a,
the tendency to perseverate a word exceeds chance at
short lags, but falls to near-chance levels thereafter. This
pattern recalls Hsiao et al.’s (2009) report that aphasic patients’ were more likely to perseverate words that they had
used more recently (Fig. 8b).
One apparent difference between our predicted lag
function (Fig. 8a) and Hsiao et al.’s (2009) mean lag function (Fig. 8b) is that our model is most likely to perseverate
words at lag-1, whereas their patients were not. However,
Hsiao et al. noted that this was not actually a reliable feature of their patient data, and actually excluded lag-1 from
their statistical analyses. Roughly half of their patients
showed this lag-1 dip, while half did not. Given this lack
of consistency across patients, we follow one of Hsiao
et al.’s proposed explanations, positing that their lag-1
dip may merely reflect a repetition-avoidance strategy that
some patients chose to employ. Such a strategy has some
precedence in non-patient work. For instance, Anderson
and Neely (1996) cite two early naming-to-definition
experiments demonstrating that preceding a trial with a
semantically related prime produced facilitation if the
prime was never the correct answer, but interference if
the prime was sometimes the correct answer (Brown,
1979; Roediger, Neely, & Blaxton, 1983). This change suggests that a brief facilitation or anti-perseveration effect
may merely index participants’ sensitivity to the sequen3
These totals included 2.7% semantic errors and 8.9% omissions overall,
but these relative proportions depend on the values of the selection
deadline and activation noise parameters. That is, a greater value for the
deadline would decrease the number of omissions and increase the number
of semantic errors, so the fact that there are more omissions than semantic
errors is merely an accident of parameters. Therefore we focus on these
error types individually, but avoid comparing them.
4
Schnur et al. (2006) also reported that ‘‘Other errors” (e.g. phonological
slips and unrelated word errors) become increasingly likely in the Mixed
condition, relative to their frequency in the Homogeneous condition, as
shown in Fig. 7c. Although our model does not currently generate such
errors, adding some noise to semantic-input activations increases the
likelihood of unrelated word errors, and these do in fact show such reverse
blocking effects.
tial-statistical properties of the testing paradigm (i.e. participants learn to expect that no picture will be
immediately repeated, allowing them to constrain their
predictions for the next item). Consistent with applying
such an interpretation to this patient data, Hsiao et al.
noted that targets only immediately repeated in 2.2% of
their trials, so a repetition-avoidance strategy would have
helped performance on the vast majority of trials.
While the precise omission-to-commission error ratio
depends on the selection deadline, all of the error predictions that we have presented here hold true for a wide
range of parameters. As long as there is sufficient noise
to produce errors, but not so much that selection becomes
completely random, the semantic error and perseveration
effects emerge. And as long as the selection deadline
parameter cuts off some, but not all, lexical selections,
the omission effects also show up. Thus, the effects are
not just a matter of fitting the model to the data.
So where do these error effects come from? Since activations return to baseline at the end of each trial, any errors necessarily result from the interaction of activation
noise and persistent changes in connection weights. Recall
that omissions occur when target and competitor activations are too similar for a winner to be resolved before
the selection process times out. Semantic errors happen
when a non-target word from the target category becomes
substantially more active than its competitors, including
the target. The conditions underlying these errors are rare,
under normal circumstances. But adding noise to the model increases the likelihood that errors will arise from target
and competitor net inputs that are merely ‘‘somewhat similar”. And these somewhat similar net inputs drove the
selection latency effects in Simulations 1–3. Thus, the error
effects derive from the same basic process that led to latency effects in the previous simulations: incremental
learning.
Incremental learning (Eq. (7)) also drives the recency effect for perseverations. Here it may be helpful to work
through an example. Let us imagine a patient sequentially
naming three pictures, DOG, BAT, and WHALE, that all
share a MAMMAL feature. When he first names DOG, the
MAMMAL feature will support each of these words moreor-less equally. After naming DOG, however, the link from
MAMMAL to DOG is strengthened and the links from
MAMMAL to BAT and to WHALE are weakened. Thus, when
the patient encounters the picture BAT, the MAMMAL feature will activate DOG more strongly than BAT or WHALE,
making him more likely to name the picture as DOG than
as WHALE. This is the primary basis of the recency effect:
words tend to form stronger competitors when their connections from shared features have been strengthened
more recently, and weakened less recently.
We should note, however, that other learning models
could explain a recency effect, provided that they can explain errors in addition to response times. In Howard
et al.’s (2006) priming model, for instance, a semantic-tolexical connection is strengthened by a fixed increment
each time its associated word is selected. Thus, the DOGBAT-WHALE explanation for recency outlined above is also
consistent with models that learn just by strengthening
each item by a fixed amount.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
15
(b)
(c)
Fig. 7. Predicted semantic error and omission patterns from Simulation 4, compared to patient error data for the same task. (a) Simulated semantic errors
become increasingly more frequent in the Homogeneous condition, compared to the Mixed condition. (b) The differences in the omission error rates
between the Homogeneous and Mixed conditions also increase across cycles. (c) Patient error data from Schnur et al. (2006).
Chance-corrected perseveration frequency
(a) 100
(b)
80
60
Replication 1
40
Replication 2
Replication 3
Replication 4
20
Mean
0
−20
−40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Lag
Fig. 8. Chance-corrected perseveration frequencies in the homogeneous condition, plotted as a function of item-lag. (a) In Simulation 4, perseveration
errors at short lags are more frequent than would be expected by chance. (b) Patients’ perseverations, from Hsiao et al. (2009), are also more frequent at
short lags.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
16
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Does our error-based learning algorithm offer any
advantage in explaining the recency gradient for perseverations? To investigate this question, we repeat the perseveration simulation with three different learning rules
applied at testing. First, we present results from a model
using our standard delta-rule learning algorithm. To gauge
the contribution of error-based learning over other approaches, we compare these results to those from a version
of the model that uses a non-error-based connectionstrengthening algorithm. Finally, to assess how much of
the perseveration effect might reflect random variation,
structural peculiarities of the blocked-cyclic paradigm, or
persistent biases that cannot be attributed to either learning process, we present results from a version that does not
learn at all during the testing phase.
Networks were trained as normal, using a delta-rule
learning algorithm and the parameters listed in Table 2.
Perseveration simulations followed, as described above,
but with two changes.
First, after training each network as normal, we applied
one of three learning algorithms at the time of testing. The
delta-rule simulation implemented the same error- and
activation-proportional learning algorithm (Eq. (7)) that
we have used elsewhere in this paper. The priming simulation instead implemented a fixed-increment unsupervised
learning algorithm. Each time a word was selected, connections to it from the active features were strengthened
by a fixed increment.5 Finally, the non-learning simulation
did not strengthen or weaken any connections during the
test phase.
Second, to preclude the possibility that omission errors
might obscure differences in the perseveration effects, we
removed the timeout parameter (X) during the test phase.
So these simulations do not allow for omission errors,
meaning that they will not be directly comparable to our
previous simulations. However, removing the omission
behavior allows us to keep all parameters constant across
simulations, without further concern for matching error
and omission rates.
The delta rule and priming simulations produced equivalent rates of semantic errors (5.79% and 5.80%, respectively). This means that we can compare the strength of
these recency effects more or less directly. The non-learning simulation produced slightly more semantic errors
(6.60%), owing to the fact that its mappings never
improved after training. As in the previous simulation,
non-semantic errors were exceedingly rare, so we describe
perseveration effects only for the homogeneous condition.
In the delta-rule simulation, perseverations were again
much more likely to match recent responses (Fig. 9). The
priming simulation also produced some recency effect for
perseverations, though this was much smaller. And the
non-learning simulation demonstrates that both effects
exceed what we might expect to emerge from any persistent biases in model weights. It is thus clear that incremental learning is crucial for the recency effect.
5
We set this increment equal to the average weight increase for the first
target in the test phase of the delta rule simulation (i.e.
Dwij ¼ gðai ð1 ai Þðdi ai ÞÞaj ¼ 0:75 ð0:72 ð1 0:72Þð1 0:72ÞÞ 1 ¼
0:042).
The delta rule creates the stronger recency gradient by
causing connection weakening as well as connection
strengthening. As in the priming account, the more recent
a potential competitor is, the more likely that it will have
been strengthened more times by the delta rule in the
blocked-cyclic paradigm. However, the delta rule amplifies
the gradient because the more recent a competitor is, the
fewer times it will have been weakened by the action of
the rule, since its last strengthening. So, in the sequence
BAT, DOG, WHALE, when WHALE is the target, both BAT
and DOG have been strengthened, and so both are potential perseverates. But DOG, the more recent item, is a more
likely error for WHALE because BAT was weakened during
the DOG trial. BAT was especially active at that time (because it had just been strengthened), and hence the delta
rule will have weakened it in proportion to its activation.
Thus, DOG will be a more powerful competitor than BAT
during the WHALE trial.
2.5. Simulation 5 – facilitation versus inhibition
Our learning algorithm specifies changes in connection
weights whenever there is some degree of error and it does
not distinguish between the learning that strengthens connections and the learning that weakens them. For example,
we saw that, in the analysis of the model’s perseverations,
both strengthening and weakening of connections contribute to the effect of recency on these kinds of errors. But
both processes arise from the same weight-change
equation.
Several explanations of cumulative semantic interference, however, attribute it either to a process that strengthens competitors or a process that inhibits targets, but not
both, and thus there is a debate about whether this effect
is, at core, one of facilitation or inhibition. This is analogous
to the occlusion versus inhibition debate in the retrieval-induced forgetting literature. The central idea in the facilitation (or occlusion) explanation is that retrieving an item
facilitates its future retrieval, making it a stronger competitor when related words are cued. Thus, lexical competition
grows more intense when retrieving words in a homogeneous context because each new target faces a growing
number of prepotent competitors. This facilitatory account
dominates much of the cumulative semantic interference
literature. Damian et al. (2001) and Belke et al. (2005) both
suggested temporary facilitation mechanisms, while
Wheeldon and Monsell (1994), Damian and Als (2005), Schnur et al. (2006), and Howard et al. (2006) argued for more
persistent facilitation. Some memory researchers have also
offered facilitation-based explanations of retrieval-induced
forgetting (e.g. Macleod et al., 2003; McGeoch, 1932; Mensink & Raaijmakers, 1988).
However, inhibition has played a more prominent role
in discussions of retrieval-induced forgetting (e.g. Anderson, 2003; Anderson et al., 1994; Norman et al., 2007; Postman et al., 1968). The basic idea here is that retrieving a
target necessarily harms its competitors, whether by inhibiting concepts directly (e.g. Postman et al., 1968) or merely
making them less accessible via retrieval cues (e.g. Melton
& Irwin, 1940). For cumulative semantic interference, an
inhibition account would entail the active suppression of
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
17
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
100
Chance-corrected perseveration frequency
80
60
Delta rule learning
40
Persistent priming
No learning at test
20
0
−20
−40
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Lag
Fig. 9. Chance-corrected perseveration frequencies as a function of item-lag, using three different learning rules during the testing phase: (1) Error-based
delta-rule learning, (2) non-error based fixed-increment strengthening (priming), and (3) no learning. As in Fig. 8a, each spline represents a mean of four
10,000-block replications, though here we have omitted the individual data points in favor of visual clarity.
lexical competitors during each naming attempt, with the
lasting consequence of making them less accessible. Each
new target in a homogenous set then becomes increasingly
difficult to retrieve by virtue of being repeatedly suppressed each time one of its competitors is retrieved instead. Though pure inhibitory explanations are relatively
rare in discussions of cumulative semantic interference,
we note one example in McCarthy and Kartsounis’ (2000)
report of patients’ omission errors.
So in all these cases we see semantic interference attributed to priming that either strengthens recent targets or
weakens recent competitors, but not both. And, as Schnur
et al. (2006) pointed out, these two mechanisms generate
different predictions. We can use our model to separate
the strengthening and weakening processes to see what
each contributes to predicted cumulative semantic interference effects. In this simulation, we selectively apply
either connection strengthening or connection weakening
while disrupting the other. If the model realizes the notion
that cumulative semantic interference arises from
strengthening targets, then a version of the model that
only allows for connection strengthening should exhibit
the interference effect. If cumulative semantic interference
arises from weakening competitors, then it should be seen
in a model version that only allows for connection
weakening.
2.5.1. Methods
Model parameters were identical to Simulations 1–3,
and are given in Table 1.
Methods for these simulations followed Simulation 2
exactly, but with one additional manipulation during the
testing phase. Each replication began with the standard
training phase, where learning was applied according to
the delta rule. So the training was identical to Simulation
2. During the testing phase, however, the learning algorithm was modified to only apply either weight increases
or decreases. In Simulation 5a, these weight changes only
increased connection weights (Eq. (8)) and in Simulation
5b, they only decreased the weights (Eq. (9))
Dwij ¼
Dwij ¼
Dwij
0
if Dwij P 0
if Dwij < 0
Dwij
0
if Dwij 6 0
if Dwij > 0
ð8Þ
ð9Þ
2.5.2. Results and discussion
The simulations that only increased or only decreased
weights had distinct effects on simulated lexical selection
times (Fig. 10). Recall that, in Simulation 2, selection times
in the homogeneous condition increased within a cycle
and decreased across cycles, while the blocking effect
(homogeneous-mixed difference) increased both within
and across cycles (Fig. 10a). This pattern represents the
combined effects of increased and decreased weights. In
Simulation 5a, weight increases alone produced a step-like
decrease for selection latencies in both homogeneous and
mixed conditions (Fig. 10b). This is what we would expect
from a repetition priming effect. However, as shown by the
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
18
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
(b)
As one would expect from Fig. 10, this blocking effect
was stronger when learning via weight decreases than
when learning via increases. Moreover, learning via weight
decreases produced a blocking effect for omission errors
that increased across cycles, thus tracking the interference
effects in response times seen in Fig. 10c.
From these findings, we can characterize the model’s
repetition priming as a facilitatory (strengthening of
weights) effect, and its cumulative semantic interference
as an inhibitory (weakening of weights) effect. Thus, the
model’s account clearly differs from a popular characterization of cumulative semantic interference as a facilitation-based effect, adopting instead an inhibition
explanation that better resembles recent accounts of retrieval-induced forgetting (e.g. Norman et al., 2007).
2.6. Simulation 6 – competition in selection and learning
(c)
Fig. 10. Strengthening connections to target words produces a repetition
priming effect, while weakening connections to competitors produces a
semantic interference effect. (a) Applying both weight increases and
decreases (Simulation 2) generates selection latencies resembling a sawtooth pattern. (b) Increasing connection weights to target words (Simulation 5a) decreases selection times in each cycle, creating a step-like
pattern. (c) Decreasing weights to competitors (Simulation 5b) continually increases selection times in the homogeneous condition.
close overlap of curves, the semantic blocking effect was
minimal, suggesting that weight increases play a much
stronger role in repetition priming than cumulative
semantic interference. Learning through weight decreases
alone (Simulation 5b) generated a robust semantic blocking effect with only a very weak repetition priming effect
(Fig. 10c). Selection times steadily increased in the homogeneous condition and decreased only slightly in the
mixed condition.
Errors were rare, occurring in less than 1% of the trials
in either simulation. But, as in Simulation 2, these errors
were predominantly omissions and were relatively more
likely in the homogeneous conditions than in the mixed.
In the previous simulation we explored the contributions of weight strengthening and weakening to repetition
priming and cumulative semantic interference. Crucially,
we found that the weight-strengthening version of the
model failed to produce cumulative semantic interference.
This finding is unexpected because this restricted version
still implements Howard et al.’s (2006) necessary and sufficient principles of shared activation, priming, and competitive selection. Shared activation was clearly present
in the distributed semantic features. The experience-based
weight increases facilitated access to recent targets, fulfilling Howard et al.’s priming function. And the lexical selection algorithm was competitive by virtue of implementing
a differential threshold. So why didn’t it work?
The answer may lie in the notion of competitive selection that our model implements. In a nutshell, our version
of competitive selection may not have been competitive
enough. Recall that our selection rule (Eq. (6)) compares
each word’s activation to the mean activation of all its
competitors. Facilitation-based explanations of cumulative
semantic interference often focus on the role of individual
prepotent competitors in prolonging lexical selection
times. But, with a large vocabulary that includes lots of
unrelated words, using an average activation for the differential threshold could minimize the impact of these prepotent competitors. That is, having a competition between
the target and the mean activation of other words is quite
close to having a competition between the target and some
absolute standard or threshold, which is not technically a
competition.
If this analysis is correct and lexical selection in our
model truly functioned as a non-competitive process, then
by Howard et al.’s logic the full version of the model with
both strengthening and weakening of weights should not
have been able to generate cumulative semantic interference in any of the previous five simulations. But it did. In
the following simulations we explore this puzzle and offer
a solution, a solution that forces us to change our conceptions of what constitutes competition. Specifically, we first
verify our intuition that the competition involving the
mean of the competitors is functionally like that of using
a non-competitive absolute-threshold decision rule by
repeating Simulation 5 with an absolute threshold. From
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
this, we then develop a new hypothesis about the nature of
competition and test one of its ramifications by repeating
Simulation 5 with a clearly competitive selection rule in
which the target is compared against only its most potent
competitor.
19
(a)
2.6.1. Methods
We repeated Simulation 5 with an absolute (i.e. non-differential) threshold, replacing Eq. (6) (reprinted below as
Eq. (10)) with Eq. (11), below. Thus, lexical selection becomes a non-competitive horse race. Competitors’ activations do not affect selection times, and the first word to
reach the threshold wins
t selection ¼ logb
t selection ¼ logb
s
ai t1 aothers t1
s
ai t1
(b)
ð10Þ
ð11Þ
Otherwise, this simulation replicates Simulation 5
exactly.
2.6.2. Results and discussion
The results from the absolute threshold model (Fig. 11)
were virtually indistinguishable from those from the differential-threshold model that we used in the first five
simulations (Fig. 10). The full-learning condition produced
the familiar saw-toothed function (Fig. 11a), indicating
repetition priming and cumulative semantic interference.
Weight increases carried a repetition priming effect
(Fig. 11b), while decreases carried the interference effect
(Fig. 11c).
In fact, the absolute threshold version of the model is
not just indistinguishable from the differential-threshold
model with regard to the saw-tooth function. It can simulate every one of the phenomena that the differentialthreshold model did. Fig. 12 shows the results of new simulations that repeat Simulations 1–4 while replacing the
competitive selection rule with a non-competitive absolute
threshold.
These results are important for two reasons. First, they
fit our previous results to a tee. This consistency means
that not only was competitive selection unimportant for
our previous findings, but it did not even contribute to
them. Second, demonstrating that we can get cumulative
semantic interference effects without a competitive selection mechanism proves that cumulative semantic interference does not require competitive lexical selection.
So how does the weight-reduction version of the model
(Fig. 8c) produce cumulative semantic interference without competitive selection? We claim that the interference
effect requires competition in a broader sense, not necessarily limited to the selection process. If the essence of
competitive selection is to cause the facilitation of one
word to slow down retrieval of other words, then our model already does that through its learning algorithm. The learning process naturally involves weakening connections to
competitors while strengthening connections to targets.
Weakening connections slows down the subsequent retrieval of competitor, thereby creating cumulative semantic
(c)
Fig. 11. Predictions from the non-competitive selection model (Eq. (11))
mirror those from our competitive model in Simulation 5. (a) Implementing both weight increases and decreases produces the familiar sawtoothed pattern. (b) Weight increases alone produce the step function
associated with repetition priming. (c) Weight decreases still lead to a
cumulative semantic interference effect.
interference. So, in this context, error-based learning is
competitive.
Howard et al.’s (2006) model required competitive lexical selection because it offered a wholly facilitation-based
account of cumulative semantic interference. Their priming mechanism strengthened connections to targets without weakening connections to competitors. So their
model implemented something like our facilitatory learning condition. And within those confines, they needed
competitive selection to convert repetition priming into a
competitor-inhibition effect and achieve semantic interference (e.g. Mensink & Raaijmakers, 1988).
We can test this analysis. Having established our original mean-based selection rule as essentially non-competitive, we can try making it more competitive. Instead of
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
20
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
(b)
(c)
(d)
40
40
Homog
Selection time (in boosts)
Selection time (in boosts)
Homog
Mixed
35
30
25
Mixed
35
30
25
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1
Trial
2
3
4
5
6
7
8
Cycle
(e)
(f)
100
Replication 1
Chance-corrected
perseveration frequency
80
Replication 2
Replication 3
60
Replication 4
40
Mean
20
0
−20
−40
1 2 3 4 5 6 7 8 9 1011 12 1314 15 1617 1819 20 2122 23
Lag
Fig. 12. Replicating Simulations 1–4 with a non-competitive selection mechanism gives results identical to those already presented. (a) Lag-insensitive
increases in selection times as a function of ordinal position (Simulation 1a replication). (b) Repeated items also contribute to the interference effect
(Simulation 1b replication). (c) Selection times increase within a cycle and decrease across cycles (Simulation 2 replication). (d) Accumulated interference
generalizes to other same-category items (Simulation 3 replication). (e) Error effects increase across cycles (Simulation 4 replication). (f) More recently used
words are more likely to be perseverated (Simulation 4 replication).
comparing each word’s activation to the mean activation
all its competitors, we might hone in on just the strongest.
The idea here is to amplify the effect of the strongest prepotent competitor, so that it has a greater impact on selection times. Thus, we replace Eq. (6) with Eq. (12), below:
tselection ¼ logb
s
ai t1 astrongest competitor t1
ð12Þ
Now we repeat the previous batch of simulations. If this
tweak of the competitive selection rule is sufficient to turn
repetition priming into cumulative semantic interference,
then we should see it in the facilitatory learning condition.
And we do (Fig. 13). In the facilitatory learning condition, focusing competitive selection on a single close competitor translates the repetition priming into cumulative
semantic interference (Fig. 13b). Interestingly, the selec-
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
(a)
21
competitive selection. Therefore, the phenomenon of
cumulative semantic interference cannot be claimed to uniquely support the existence of a competitive mechanism
for lexical selection in speech production. We expand on
this important conclusion in the general discussion.
3. General discussion
3.1. Summary and implications of findings
(b)
(c)
Fig. 13. A more competitive selection rule (Eq. (12)) compares each
word’s activation with just its strongest competitor. (a) Applying both
weight increases and decreases generates selection latencies resembling
the saw-tooth pattern from Simulation 2. (b) Increasing connection
weights to target words produces a less dramatic version of this sawtooth. (c) Decreasing weights to competitors continually increases
selection times in the homogeneous condition while decreasing times
in the mixed condition.
tion time pattern in this facilitatory learning condition
resembles that of the simulation including both weight increases and decreases (Fig. 13a). Thus, competitive selection allows facilitatory changes (as in Howard et al.’s
model) to account for both repetition priming and cumulative semantic interference effects. We should note, however, that inhibitory learning (Fig. 13c) still plays a major
role in creating the combined interference effect, demonstrating its continued relevance to explanations of cumulative semantic interference.
The results of Simulations 5 and 6 suggest that cumulative semantic interference requires competition, but not
necessarily during lexical selection. Incremental learning
can also be competitive, and competitive learning is sufficient to produce cumulative semantic interference without
Lexical retrieval leads to lexical learning. The light side
of learning is well known. Retrieving the same word again
becomes faster and more accurate. But learning also has a
dark, competitive, side that hinders the subsequent retrieval of semantically related words. In our theoretical framework, this dark side of learning leads to the behaviors
identified with cumulative semantic interference.
Remarkably, this framework does not require competitive lexical selection. Competitive learning obviates the
need for competition in the selection process. Thus, we
can modify Howard et al.’s (2006) explanation to identify
four properties that must be true of lexical retrieval in order to account for cumulative semantic interference:
shared activation, activation-dependent selection time,
persistent priming, and competition. First, activating one
word must also activate similar words. Though, as Howard
et al. (2006) pointed out, this shared activation can be
accomplished in many ways, it is an inherent property of
models such as ours that rely on distributed semantic representations. Second, more activated words should be selected more quickly than less activated words. Such
activation-dependent selection time is common to most theories of lexical selection, both competitive (e.g. Roelofs,
1992) and non-competitive (e.g. Mahon et al., 2007). Third,
retrieving a word must persistently prime its future retrieval. Such persistent priming is most readily understood as
incremental learning (e.g. Damian & Als, 2005). Finally,
facilitating retrieval of one word must have negative consequences for related words. This competition could be
implemented in many ways, including competitive lexical
selection, but we note that it follows naturally from implementing priming as error-based learning.
Given these principles, our learning model offers a parsimonious account of the empirical aspects of cumulative
semantic interference. It incorporates distributed semantic
representations, error-based learning, and a booster mechanism that produces activation-dependent lexical selection
times and errors.
3.2. Response time effects
The model’s selection times reflect all the responsetime hallmarks of cumulative semantic interference. Laginvariant incremental increases in response times, in the
continuous paradigm (Simulation 1a), showed that the
model can account for Brown’s (1981) and Howard
et al.’s (2006) main findings. Moreover, the model simulated the dual effects of repeating an item (Navarrete
et al., 2008): that item is named much more quickly (repetition priming), but repeating it creates additional
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
22
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
semantic interference, just about as much interference as
two different related items would (Simulation 1b). Applied
to blocked-cyclic naming (Simulation 2), the model’s selection times showed both repetition priming and cumulative
semantic interference effects (e.g. Damian et al., 2001; Schnur et al., 2006, Experiment 1). And the accumulated interference transferred to novel items from the same semantic
category (Simulation 3), recalling Belke et al.’s (2005) report. Thus, we have demonstrated that the model accounts
for the major reaction time manifestations of cumulative
semantic interference.
3.3. Error effects
The learning model can also account for aphasic patients’ error patterns (Simulation 4). One way of understanding aphasic brain damage is to assume that
processes work normally, but are more prone to error.
The model was made more error-prone by adding a small
amount of noise to each word’s net input. This noise led
to blocking effects for semantic errors and omissions that
increased with each cycle, reminiscent of findings from
Schnur et al.’s (2006, Experiment 2) patient work. And
the model’s perseveration errors recalled Hsiao et al.’s
(2009) lag-based recency gradient, suggesting a further
match to the empirical patient data.
Although we are heartened by the degree to which the
model simulates the major response time and error data
effects in the literature, we acknowledge the model’s limitations. We do not, indeed we can not, ‘fit’ the data quantitatively. This is because the model’s lexical base is small
and its treatment of semantics is rudimentary. Most humans, for example, know more than 36 words, and would
describe a dog as something more than merely a terrestrial
mammal. Moreover, we have constrained the scope of the
model to lexical activation and selection. We have not represented any processes leading up to the activation of
semantic features, nor any processes that follow lexical
selection. And, although we postulate that the experimental paradigms that are modeled may induce strategies, we
do not simulate these. All of these factors compromise the
model’s ability to fit the actual numbers. What is not compromised by the model’s simplifications, we would argue,
is its ability to make theoretical issues more transparent.
This was a strength of Howard et al.’s (2006) model and
we hope to achieve the same with our model, particularly
with regard to the issue of the role of competition, to
which we turn next.
3.4. Competition and the occlusion versus inhibition debate
We provided an analysis of the model’s account of cumulative semantic interference in Simulations 5 and 6. In contrast to several previous accounts, facilitatory processes
contributed almost nothing to the model’s interference effect (Simulation 5). Learning-based weakening of semantic-to-lexical connections, however, reliably produced
cumulative semantic interference effects (Simulation 5),
even without competitive lexical selection (Simulation 6).
Thus, competitive selection was not needed for cumulative
semantic interference when weight decreases occurred
during learning. In fact, Simulation 6 demonstrated that,
although the selection mechanism employed in the first five
simulations was functionally non-competitive, the model
still exhibited cumulative semantic interference.
Competitive lexical selection did, however, allow the
model to generate cumulative semantic interference via
weight strengthening alone (Simulation 6), as in Howard
et al. (2006), by effectively converting an occlusion process
into an inhibitory one (cf. Mensink & Raaijmakers, 1988).
Thus cumulative semantic interference can derive from
either competitive selection, where strong competitors
interfere with target retrieval, or competitive learning,
where strengthening a target involves weakening competitors. Therefore, we can derive a more general principle
that lexical competition affects lexical selection, without
constraining the point at which this competition comes
into play. And error-based learning provides sufficient
competition to explain cumulative semantic interference.
Our conclusion that competitive selection is not required to explain cumulative semantic interference has
ramifications for the current debate about the necessity
of such a selection process. It is fair to say that a majority
of production researchers hold that lexical selection is
competitive; that is, activated competitors retard target retrieval (e.g. Levelt et al., 1999). Competitive selection was
thought to have been demonstrated in the picture-word
interference paradigm, in which a seen or heard distractor
item slows the naming of a picture presented at about the
same time (Schriefers et al., 1990). Mahon et al. (2007),
however, presented findings suggesting that the influence
of external distractors on naming response times occurs
post-lexically, and so the relevance of semantic interference from this paradigm for competitive lexical selection
can be questioned (see, e.g. Abdel Rahman & Melinger,
2009; Janssen, Schirm, Mahon, & Caramazza, 2008; Mahon
& Caramazza, 2009, for recent discussion). Thus, cumulative semantic interference, in which interference is generated from previous naming trials rather than external
stimuli, might be seen as better evidence for competitive
lexical selection (Dell, Oppenheim, & Kittredge, 2008;
Howard et al., 2006; Navarrete et al., 2008; Schnur et al.,
2006). Here is where our modeling exercise matters.
Although the model implicates a role for competition in
explaining semantic interference, it does not require a
competitive selection process to simulate the data. On
the one hand, the model reinforces the conclusion of
Howard et al. (2006), that some kind of competition is required to explain cumulative semantic interference. On the
other hand, the model offers the novel hypothesis that this
competition arises through learning, rather than through a
lexical selection mechanism that is slowed by any activated competitor. If this hypothesis is true (as well as concerns about the relevance of the picture-word paradigm),
then we are back at square one on the question of competitive lexical selection.
3.5. Incremental learning as an account of cumulative
semantic interference
The model uses the delta rule, an error-based learning
algorithm, to explain cumulative semantic interference.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
The important aspect of this algorithm is that it creates
both strengthening of the connections to the target, and
the weakening of the connections to competitors. Connection strengthening is clearly required for repetition priming, and connection weakening appears to explain at
least a component of semantic interference and the
strength of the perseveratory lag effect. Moreover, connection weakening is necessary for semantic interference if it
is assumed that lexical selection is not competitive. Thus,
the delta rule motivates an account of the data that combines the two principal hypothesized explanations for
semantic interference and retrieval-induced forgetting in
general – the occlusion and the inhibition hypotheses.
The fact that the weakening and strengthening is directly
proportional to error, is not, as far as we can tell, directly
relevant to the model’s account of the data. But this aspect
of the delta rule is motivated by research on Pavlovian conditioning (e.g. Rescorla & Wagner, 1972), episodic memory
encoding (e.g. McClelland, McNaughton, & O’Reilly, 1995),
frequency sensitivity in priming (e.g. Chang et al., 2006),
greater repetition priming in more error-prone conditions
(e.g. Anderson, 2008), and division of labor effects in multi-component computational systems (e.g. Harm & Seidenberg, 2004).
3.6. Extending and testing the model
Though not formally simulated, the model is also compatible with several other empirically-established effects
of cumulative semantic interference on lexical retrieval
times:
1. Interference is robust to timing manipulations (e.g.
intervening non-verbal fillers: Damian & Als, 2005,
Experiment 1; RSI manipulations: Hsiao et al., 2009;
Schnur et al., 2006, Experiments 1 and 2; cf. simultaneous presentation: Belke et al., 2005, Experiment 2).
In our model, cumulative semantic interference derives
from incremental learning and therefore follows the
same time course as the learning process itself. That
is, it persists without regard to time.
2. Interference is robust to filler material in other paradigms (e.g. naming pictures from other categories in
the blocked-cyclic paradigm: Damian & Als, 2005,
Experiments 2–4). In line with Simulation 1, this
robustness comes from the fact that only relevant experience leads to relevant learning, and hence priming or
interference. As long as filler material is sufficiently
orthogonal to the critical items, it should never affect
the build-up or resolution of semantic interference.
3. Interference effects are graded as a function of
semantic similarity (Vigliocco et al., 2002). In other
words, more similar sets of items produce more interference than less similar sets. Such graded effects naturally emerge from the use of distributed semantic
representations, where similarity is a function of feature overlap instead of discrete category membership.
4. Interference is task-dependent (e.g. Damian et al.,
2001). It requires mapping from shared semantic representations to separate lexical representations. Therefore, tasks that engage either type of representation,
23
but involve no such mapping, should not elicit interference. For instance, non-verbally categorizing pictures
according to visual or semantic features (e.g. Damian
et al., 2001) should not involve the semantic-to-lexical
mapping, and should therefore not show our cumulative semantic interference. Similarly, orthographically
cued word naming (e.g. Damian et al., 2001, Experiment
2; Kroll & Stewart, 1994, Experiment 1) should only elicit semantic interference to the extent that utterance
planning requires semantic access.
5. Interference is cue-independent. For example, Wheeldon and Monsell (1994) used naming-to-definition to
prime picture-naming, suggesting that the prime
affected mappings from amodal semantic representations to lexical items. Our model is consistent with this
cue-independence because semantic interference
effects are carried in the semantic-to-lexical connections. Any process that uses these connections should
therefore show cumulative semantic interference,
regardless of the instigating stimulus.
These five response time effects derive from the fact
that the model attributes semantic interference to incremental learning during lexical access, as opposed to some
kind of time-dependent facilitory or inhibitory priming. Indeed, these findings could also be readily explained by the
model of Howard et al. (2006), which was the first implemented account of semantic interference in production
based on persistent changes. We consider our model to
be a descendant of that model. However, our model differs
from its ancestor in three respects. First, it links up with
speech-error based models of production and hence accounts for errors, including perseverations and omissions.
This allows the model to simulate aphasic error data, and
to offer a mechanism for how the brain chooses among active lexical candidates. Second, it ascribes an important
role to connection weakening in explaining semantic interference effects on RT’s and errors, and specifically the perseveration lag function. Connection weakening is a natural
consequence of the delta rule, an algorithm for incremental
learning. And third, the model represents the possibility
that the competition needed to explain interference may
not occur during lexical selection; it can arise from learning. As a result, non-competitive lexical selection becomes
a viable account for data in this paradigm, and for production in general.
3.7. Additional predictions of our model
Our model offers additional predictions, mostly stemming from the idea that cumulative semantic interference
reflects incremental learning. First, because the model’s
learning is error-based and not dependent on actual selection, cumulative semantic interference should accrue even
when a name is not correctly retrieved, as in the case of errors and omissions. Although this property has not been
tested for cumulative semantic interference, it appears to
hold true for retrieval-induced forgetting. Using a partset cueing paradigm, Storm, Bjork, Bjork and Nestojko
(2006) demonstrated that providing misleading practice
cues (i.e. they either elicited novel associations or offered
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
24
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
no valid response) created retrieval-induced forgetting
equivalent to that from valid cues. Therefore we expect,
and our model predicts, that semantic interference should
even accrue from naming trials that elicit omission errors.
Also, because cumulative semantic interference is
attributed to persistent changes in connection weights,
the effects should not spontaneously dissipate. Several picture-naming studies (e.g. Howard et al., 2006; Nickels,
Howard, Dodd, & Coltheart, 2008) have suggested that
cumulative semantic interference persists at relatively
long item-lags (minutes). But longer periods of persistence
have not been examined in the picture-naming paradigm.
One episodic memory study (Anderson & Spellman, 1995)
concluded that retrieval-induced forgetting persisted over
lags of twenty minutes and another (Storm et al., 2006)
even reported that retrieval-induced forgetting effects
may remain detectable after a 1-week lag (but see
MacLeod & Macrae, 2001, and Postman et al., 1968, for
some evidence to the contrary). So, with the caveat that
relevant experience is easier to define in a model than in
the real world, we should find that cumulative semantic
interference dissipates largely as a function of relevant
experience, rather than as a function of time.
Finally, since our model derives from domain-general
principles, it should formally extend to non-linguistic processes. Any system that involves shared activation, activation-dependent selection, and competitive learning should
exhibit similar effects, and may be examined through similar experimental paradigms. Throughout this paper, we
have referred to retrieval-induced forgetting (RIF), a wellknown episodic memory effect, where the process of
retrieving one association leads to impaired recall of competing associations. Several prominent accounts of RIF (e.g.
Anderson, 2003; Norman et al., 2007) suggest that it arises
from the process of new information overwriting the old.
In other words, they ascribe the memory effect to the process of learning. Such explanations have also surfaced for
effects in visual object recognition (e.g. Marsolek, Schnyer,
Deason, Ritchey, & Verfaellie, 2006). Thus, we can identify
cumulative semantic interference with a more general
theme in the way the mind operates, independent of the
particular types of representations in use.
It may also be possible to extend the model’s links with
brain mechanisms and regions. The successful resolution of
semantic interference may depend on processes subserved
by the LIFG and/or left temporal lobe (LT) and thus may be
impaired by damage to these regions. Evidence for this
localization comes from both neuroimaging studies of
healthy subjects (Maess, Friederici, Damian, Meyer, & Levelt, 2002; Moss et al., 2005, Schnur et al., 2009; see also
Hocking, McMahon, & de Zubicaray, 2008) and lesion-mapping studies in patients (Schnur et al., 2005, 2009). The
involvement of frontal processes is further supported by
single-case and group studies that associate exaggerated
blocking effects with nonfluency and other symptoms of
anterior aphasia (Biegler, Crowther, & Martin, 2008; McCarthy & Kartsounis, 2000; Schnur et al., 2006; Wilshire &
McCarthy, 2002). Consistent with the LT localization, Simulation 4 showed that adding noise to lexical activations succeeded in reproducing the aphasic blocking pattern. In a
follow-up test (not reported in this article), we simulated
LIFG damage by instead adding noise to the booster mechanism and found that this gave the same result as the simulated lexical damage. While these simulations of brain
damage are encouraging, we note that they fail to capture
some subtle but potentially important distinctions. For
one thing, it seems that while lesions to either the LIFG or
LT exaggerate the blocking effect, only LIFG lesions produce
the pattern of increased errors across cycles (Schnur et al.,
2005, 2006); in our simulations, noisy lexical activations
and a noisy booster both had this effect. Also, whereas
our simulations of aphasic damage focused on errors (in
keeping with the data from Schnur et al., 2006), recent evidence suggests that some frontal aphasics manifest exaggerated blocking interference in naming latencies rather
than errors (Biegler et al., 2008). To resolve these issues, it
may be useful to measure the time course of LIFG and LT
activation with imaging methods with good temporal resolution (e.g. the EROS optical imaging technique, Tse et al.,
2007). If interference in, say, the blocking paradigm is present in lexical-semantic areas (LT), but resolved in the LIFG,
then manipulations of degree of interference should manifest first in the former region, and then the latter.
4. Conclusion
The model instantiates a dynamic view of lexical knowledge. Shared semantic representations put competing
words in a dynamic equilibrium where no semantic feature
connects too strongly to any one word. Each act of lexical
retrieval produces persistent, competitive, learning that
perturbs this balance. It facilitates repeating the same word
and impairs access to competing words. But retrieving a
competitor shifts the balance back again. So not only are
we capable of learning new words every day, but we are
constantly adjusting which words, of the ones we know,
are more or less available for use in speaking.
We believe that our model offers a parsimonious account of the lexical production data dealing with persistent
effects, and it has implications for other issues in production, learning, and memory. Similarity-based interference
that results from learning is not unique to lexical retrieval
(e.g. Anderson, 2003; Marsolek et al., 2006; Norman et al.,
2007). But, when words are retrieved in a semanticallymanipulated context, the resulting impairment is what
we know as cumulative semantic interference.
Acknowledgments
This research was supported by National Institutes of
Health Grants DC000191, HD44458, and MH1819990. We
thank Aaron Benjamin, Kara Federmeier, Matt Goldrick, John
Hummel, Brad Mahon, Randi Martin, Sharon ThompsonSchill, Tatiana Schnur, and the Bock and Dell joint lab meeting for comments on this work and other contributions.
References
Abdel Rahman, R., & Melinger, A. (2007). When bees hamper the
production of honey: Lexical interference from associates in speech
production. Journal of Experimental Psychology: Learning, Memory and
Cognition, 33(3), 604–614.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Abdel Rahman, R., & Melinger, A. (2009). Semantic context effects in
language production: A swinging lexical network proposal and a
review. Language and Cognitive Processes, 24(5), 713.
Anderson, M. C. (2003). Rethinking interference theory: Executive control
and the mechanisms of forgetting. Journal of Memory and Language,
49(4), 415–445.
Anderson, J. D. (2008). Age of acquisition and repetition priming effects on
picture naming of children who do and do not stutter. Journal of
Fluency Disorders, 33(2), 135–155.
Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause
forgetting: Retrieval dynamics in long-term memory. Journal of
Experimental Psychology: Learning, Memory and Cognition, 20(5),
1063–1087.
Anderson, M. C., & Neely, J. H. (1996). Interference and inhibition in
memory retrieval. In E. L. Bjork & R. A. Bjork (Eds.), Memory
(pp. 237–313). San Diego, CA: Academic Press.
Anderson, M. C., & Spellman, B. A. (1995). On the status of inhibitory
mechanisms in cognition: Memory retrieval as a model case.
Psychological Review, 102(1), 68–100.
Belke, E. (2008). Effects of working memory load on lexical-semantic
encoding in language production. Psychonomic Bulletin and Review,
15(2), 357–363.
Belke, E., Meyer, A. S., & Damian, M. F. (2005). Refractory effects in picture
naming as assessed in a semantic blocking paradigm. Quarterly
Journal of Experimental Psychology Section A – Human Experimental
Psychology, 58(4), 667–692.
Biegler, K. A., Crowther, J. E., & Martin, R. C. (2008). Consequences of an
inhibition deficit for word production and comprehension: Evidence
from the semantic blocking paradigm. Cognitive Neuropsychology, 25,
493–527.
Blaxton, T. A., & Neely, J. H. (1983). Inhibition from semantically related
primes: Evidence of a category-specific inhibition. Memory and
Cognition, 11(5), 500–510.
Bock, K., & Griffin, Z. M. (2000). The persistence of structural priming:
Transient activation or implicit learning? Journal of Experimental
Psychology: General, 129(2), 177–192.
Brown, A. S. (1979). Priming effects in semantic memory retrieval
processes. Journal of Experimental Psychology: Human Learning and
Memory, 5(2), 65–77.
Brown, A. S. (1981). Inhibition in cued retrieval. Journal of Experimental
Psychology: Human Learning and Memory, 7(3), 204–215.
Castro, Y. G., Strijkers, K., Costa, A., & Alario, X. (2008). When cat competes
with dog but not with perro: Evidence from the semantic competitor
paradigm. Poster presented at the 5th international workshop on
language production, Annapolis, MD, July.
Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological
Review, 113(2), 234–272.
Cohen, L., & Dehaene, S. (1998). Competition between past and present:
Assessment and interpretation of verbal perseverations. Brain, 121,
1641–1659.
Collins, A. M., & Loftus, E. F. (1975). A spreading activation theory of
semantic memory. Psychological Review, 82, 407–428.
Collins, A. M., & Quillian, E. R. (1969). Retrieval time from semantic
memory. Journal of Verbal Learning and Verbal Memory, 8, 240–247.
Crowther, J. E., Martin, R. C., & Biegler, K. A. (2008). The semantic blocking
effect in naming: A computational model of inhibition failure. Poster
presented at the 46th annual meeting of the academy of aphasia, Turku,
Finland, October.
Damian, M. F., & Als, L. C. (2005). Long-lasting semantic context effects in
the spoken production of object names. Journal of Experimental
Psychology: Learning, Memory and Cognition, 31(6), 1372–1384.
Damian, M. F., Vigliocco, G., & Levelt, W. J. (2001). Effects of semantic
context in the naming of pictures and words. Cognition, 81(3), B77–86.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence
production. Psychological Review, 93(3), 283–321.
Dell, G. S., Burger, L. K., & Svec, W. R. (1997). Language production and
serial order: A functional analysis and a model. Psychological Review,
104(1), 123–147.
Dell, G. S., Oppenheim, G. M., & Kittredge, A. K. (2008). Saying the right
word at the right time: Syntagmatic and paradigmatic interference in
sentence production. Language and Cognitive Processes, 23, 583–608.
Forde, E. M. E., & Humphreys, G. W. (1997). A semantic locus for refractory
behaviour: Implications for access-storage distinctions and the nature
of semantic memory. Cognitive Neuropsychology, 14(3), 367–402.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical
access. Psychological Review, 105(2), 251–279.
Gordon, J. K., & Dell, G. S. (2003). Learning to divide the labor: An account
of deficits in light and heavy verb production. Cognitive Science, 27,
1–40.
25
Gupta, P., & Cohen, N. J. (2002). Theoretical and computational analysis of
skill learning, repetition priming, and procedural memory.
Psychological Review, 109(2), 401–448.
Harm, M. W., & Seidenberg, M. S. (2004). Computing the meanings of
words in reading: Cooperative division of labor between visual and
phonological processes. Psychological Review, 111(3), 662–720.
Hocking, J., McMahon, K. L., & de Zubicaray, G. I. (2008). Semantic context
and visual feature effects in object naming: An fMRI study using
arterial spin labeling. Journal of Cognitive Neuroscience.
Houghton, G. (1990). The problem of serial order: A neural network
model of sequence learning and recall. In R. Dale, C. Mellish, & M. Zock
(Eds.), Current research in natural language generation (pp. 287–319).
London: Academic Press.
Howard, D., Nickels, L., Coltheart, M., & Cole-Virtue, J. (2006). Cumulative
semantic inhibition in picture naming: Experimental and
computational studies. Cognition, 100(3), 464–482.
Hsiao, E. Y., Schwartz, M. F., Schnur, T. T., & Dell, G. S. (2009). Temporal
characteristics of semantic perseverations induced by blocked-cyclic
picture naming. Brain and Language, 108, 133–144.
Janssen, N., Schirm, W., Mahon, B. Z., & Caramazza, A. (2008). Semantic
interference in a delayed naming task: Evidence for the response
exclusion hypothesis. Journal of Experimental Psychology: Learning,
Memory and Cognition, 34(1), 249–256.
Kan, I. P., & Thompson-Schill, S. L. (2004). Effect of name agreement on
prefrontal activity during overt and covert picture naming. Cognitive
Affective and Behavioral Neuroscience, 4(1), 43–57.
Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there
a return to normal? Cognitive Psychology, 51(2), 141–178.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and
picture naming: Evidence for asymmetric connections between
bilingual memory representations. Journal of Memory and Language,
33, 149–174.
Levelt, W. J., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in
speech production. Behavioral and Brain Sciences, 22(1), 1–38.
MacLeod, C. M., Dodd, M. D., Sheard, E. D., Wilson, D. E., & Bibi, U. (2003).
In opposition to inhibition. In B. H. Ross (Ed.), The psychology of
learning and motivation (pp. 163–214). San Diego, CA: Academic Press.
MacLeod, M. D., & Macrae, C. N. (2001). Gone but not forgotten: The
transient nature of retrieval-induced forgetting. Psychological Science:
A Journal of the American Psychological Society/APS, 12(2), 148–152.
Maess, B., Friederici, A. D., Damian, M., Meyer, A. S., & Levelt, W. J. M.
(2002). Semantic category interference in overt picture naming:
Sharpening current density localization by PCA. Journal of Cognitive
Neuroscience, 14(3), 455–462.
Mahon, B. Z., & Caramazza, A. (2009). Why does lexical selection have to
be so hard? Comment on Abdel Rahman and Melinger’s swinging
lexical network proposal. Language and Cognitive Processes, 24(5), 735.
Mahon, B. Z., Costa, A., Peterson, R., Vargas, K. A., & Caramazza, A. (2007).
Lexical selection is not by competition: A reinterpretation of semantic
interference and facilitation effects in the picture-word interference
paradigm. Journal of Experimental Psychology: Learning, Memory and
Cognition, 33(3), 503–535.
Marsolek, C. J., Schnyer, D. M., Deason, R. G., Ritchey, M., & Verfaellie, M.
(2006). Visual antipriming: Evidence for ongoing adjustments of
superimposed visual object representations. Cognitive Affective and
Behavioral Neuroscience, 6(3), 163–174.
McCarthy, R. A., & Kartsounis, L. D. (2000). Wobbly words: Refractory
anomia with preserved semantics. Neurocase, 6, 487–497.
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are
complementary learning systems in the hippocampus and neocortex:
Insights from the successes and failures of connectionist models of
learning and memory. Psychological Review, 102(3), 419–457.
McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing
approach to semantic cognition. Nature Reviews: Neuroscience, 4(4),
310–322.
McGeoch, J. A. (1932). Forgetting and the law of disuse. Psychological
Review, 39(4), 352–370.
Melton, A. W., & Irwin, J. M. (1940). The influence of degree of interpolated
learning on retroactive inhibition and the overt transfer of specific
responses. The American Journal of Psychology, 100(3–4), 610–641.
Mensink, G., & Raaijmakers, J. G. (1988). A model for interference and
forgetting. Psychological Review, 95(4), 434–455.
Mitchell, D. B., & Brown, A. S. (1988). Persistent repetition priming in
picture naming and its dissociation from recognition memory. Journal
of Experimental Psychology: Learning, Memory and Cognition, 14(2),
213–222.
Moss, H. E., Abdallah, S., Fletcher, P., Bright, P., Pilgrim, L., Acres, K., et al.
(2005). Selecting among competing alternatives: Selection and retrieval in the left inferior frontal gyrus. Cerebral Cortex, 15, 1723–1735.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007
ARTICLE IN PRESS
26
G.M. Oppenheim et al. / Cognition xxx (2009) xxx–xxx
Navarrete, E., Mahon, B. Z., & Caramazza, A. (2008). The cumulative
semantic cost in picture naming (Il Costo Cumulativo Intracategoriale
in compiti di denominazio-ne di figure). Paper presented at the XIV
Congresso Nazionale Associazione Italiana di Psicologia, Padova, Italia,
September.
Nickels, L., Howard, D., Dodd, H., & Coltheart, M. (2008). Further
investigations into cumulative semantic inhibition in picture
naming: Effects of long lag & repeated members of a category.
Poster presented at the 5th international workshop on language
production, Annapolis, MD, July.
Norman, K. A., Newman, E. L., & Detre, G. (2007). A neural network model
of retrieval-induced forgetting. Psychological Review, 114(4), 887–953.
Oppenheim, G. M., Dell, G. S., & Schwartz, M. F. (2007). Cumulative
semantic interference as learning. Brain and Language, 103, 175–176.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996).
Understanding normal and impaired word reading: Computational
principles in quasi-regular domains. Psychological Review, 103(1),
56–115.
Postman, L., Stark, K., & Fraser, J. (1968). Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior, 7, 672–694.
Rapp, B., & Goldrick, M. (2000). Discreteness and interactivity in spoken
word production. Psychological Review, 107(3), 460–499.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning:
Variations in the effectiveness of reinforcement and nonreinforcement.
In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current
research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Roediger, H. L., Neely, J. H., & Blaxton, T. A. (1983). Inhibition from related
primes in semantic memory retrieval: A reappraisal of Brown’s (1979)
paradigm. Journal of Experimental Psychology: Learning, Memory and
Cognition, 9(3), 478–485.
Roelofs, A. (1992). A spreading-activation theory of lemma retrieval in
speaking. Cognition, 42(1–3), 107–142.
Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L.,
Hodges, J. R., et al. (2004). Structure and deterioration of semantic
memory: A neuropsychological and computational investigation.
Psychological Review, 111(1), 205–235.
Rumelhart, D. E., & McClelland, J. L.and the PDP Research Group. (1986).
Parallel distributed processing: Explorations in the microstructure of
cognition (Vol. I). Cambridge, MA: MIT Press.
Schnur, T. T., Schwartz, M. F., Kimberg, D. Y., Hirshorn, E., Coslett, H. B., &
Thompson-Schill, S. L. (2009). Localizing interference during naming:
Convergent neuroimaging and neuropsychological evidence for the
function of broca’s area. Proceedings of the National Academy of
Sciences, 106(1), 322–327.
Schnur, T. T., Lee, E., Coslett, H. B., Schwartz, M. F., & Thompson-Schill, S. L.
(2005). When lexical selection gets tough, the LIFG gets going: A
lesion analysis study of interference during word production. Brain
and Language, 95(1), 12–13.
Schnur, T. T., Schwartz, M. F., Brecher, A., & Hodgson, C. (2006). Semantic
interference during blocked-cyclic naming: Evidence from aphasia.
Journal of Memory and Language, 54(2), 199–227.
Schriefers, H., Meyer, A. S., & Levelt, W. J. M. (1990). Exploring the time
course of lexical access in language production: Picture-word
interference studies. Journal of Memory and Language, 29, 86–102.
Storm, B. C., Bjork, E. L., Bjork, R. A., & Nestojko, J. F. (2006). Is retrieval
success a necessary condition for retrieval-induced forgetting?
Psychonomic Bulletin & Review, 13(6), 1023–1027.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., & Farah, M. J. (1997).
Role of left inferior prefrontal cortex in retrieval of semantic
knowledge: A reevaluation. Proceedings of the National Academy of
Sciences of the United States of America, 94(26), 14792–14797.
Tse, C. Y., Lee, C. L., Sullivan, J., Garnsey, S. M., Dell, G. S., Fabiani, M., et al.
(2007). Imaging cortical dynamics of language processing with the
event-related optical signal. Proceedings of the National Academy of
Sciences of the United States of America, 104(43), 17157–17162.
Vigliocco, G., Vinson, D. P., Damian, M. F., & Levelt, W. J. M. (2002).
Semantic distance effects on object and action naming. Cognition,
85(3), B61–B69.
Wheeldon, L. R., & Monsell, S. (1994). Inhibition of spoken word
production by priming a semantic competitor. Journal of Memory
and Language, 33, 332–356.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON
convention record. New York: IRE (Reprinted in Anderson, J. A., &
Rosenfeld, E. (Eds.) (1988). Neurocomputing: Foundations of research
(pp. 123–134). Cambridge, MA: MIT Press.).
Wilshire, C. E., & McCarthy, R. A. (2002). Evidence for a context-sensitive
word retrieval disorder in a case of nonfluent aphasia. Cognitive
Neuropsychology, 19, 165–186.
Please cite this article in press as: Oppenheim, G. M., et al. The dark side of incremental learning: A model of cumulative semantic interference during lexical access in speech production. Cognition (2009), doi:10.1016/j.cognition.2009.09.007