Dynamic Temporal Processing of Multisensory Information

Dr. Zhuanghua Shi
Habilitation
at the Faculty of Psychology and Educational Sciences
of Ludwig-Maximilians-Universität
München
submitted by
Dr. Zhuanghua Shi
Munich, 15 January 2013
to my family ...
Table of Contents

1 Synopsis
1.1 Introduction
1.1.1 Multisensory spatial integration
1.1.2 Multisensory temporal integration
1.1.3 Multisensory time perception
1.1.4 Sensorimotor recalibration and delay perception
1.1.5 Multisensory enhancement and perceptual learning in visual search
1.1.6 Multimodal feedback delay in human-machine interaction (HMI)
1.2 Cumulative research work
1.2.1 Multisensory temporal integration and motion perception
1.2.2 Multisensory and sensorimotor time perception
1.2.3 Multisensory enhancement, context learning and search performance
1.2.4 Delays in multimodal feedback and user experience
1.3 Summary and outlook
1.4 References

2 Scientific Publications
2.1 List of publications (2009-2013)
2.2 Part I: Multimodal temporal integration and motion perception
2.2.1 Audiovisual Ternus apparent motion
2.2.2 Perceptual grouping and crossmodal apparent motion
2.2.3 Auditory capture on tactile apparent motion
2.3 Part II: Multimodal time perception
2.3.1 Auditory reproduction
2.3.2 Feedback delay and duration reproduction
2.3.3 Emotional modulation of tactile duration
2.3.4 Emotional modulation of audiotactile TOJ
2.3.5 Simultaneity in schizophrenia patients
2.4 Part III: Multimodal enhancement, perceptual learning and search performance
2.4.1 Eye movements and the pip-and-pop effect
2.4.2 Contextual cueing in multiconjunction search
2.4.3 Transfer of contextual cueing in full-icon display remapping
2.5 Part IV: Delays in multimodal processing and user experience
2.5.1 Neural latencies and motion extrapolation in the central fovea
2.5.2 Delay in haptic telepresence systems
2.5.3 Effects of packet loss and latency on visual-haptic TOJs
2.5.4 Temporal perception of visual-haptic events
2.5.5 Delay perception in different haptic environments
2.5.6 Optimization for haptic delayed telepresence systems

Acknowledgements

Curriculum Vitae

1 Synopsis
1.1 Introduction

1.1.1 Multisensory spatial integration
Signals from the natural environment are highly redundant, since we perceive the external world via multiple senses. For example, when knocking on a door we not only hear a sound and see a hand movement, but also perceive a touch on the knocking hand. The multisensory nature of the world is highly advantageous, because it increases perceptual reliability and saliency, and, as a result, it enhances object discrimination and identification and facilitates reactions to the external world (Vroomen & Keetels, 2010). However, the multisensory nature of the world also raises complex integration and segregation problems. For instance, how does our brain sort through relevant and irrelevant signals to form a coherent multisensory perception? Imagine you are chatting with your friends in a coffee bar: you hear multiple voices and see lip movements simultaneously. You usually identify and combine this information correctly without any difficulty, but sometimes you may fail to integrate, or may mis-combine, different faces and voices. This happens, too, when you are watching a movie in a cinema. You believe that the voices are coming from the actors' lips, although they are delivered by the loudspeakers on the side walls. This is known as the ventriloquist effect. Such an audiovisual speech illusion, however, is only one example of multisensory integration; there are many others. To take another example, when a single flash is accompanied by two beeps, the single flash is often perceived as two flashes (Shams, Kamitani, & Shimojo, 2000).
Over the past few decades, much progress has been made in understanding multisensory perception, particularly spatial integration. The most common account of multisensory integration is the modality-appropriateness or modality-precision hypothesis (Welch & Warren, 1980). The hypothesis states that the sensory modality with the highest acuity outweighs the others in multisensory integration. For example, vision, with its high spatial resolution, may dominate over audition in spatial perception, which explains why the position of an auditory stimulus is often captured by a simultaneous visual stimulus. In recent years, probabilistic models, such as maximum likelihood estimation (MLE) (Alais & Burr, 2004; Ernst & Banks, 2002; Ernst & Bülthoff, 2004), have been developed to provide quantitative accounts of multisensory integration. The MLE model assumes that different sensory inputs are assigned differential weights, with each weight set in proportion to the reliability of the corresponding sensory estimate. Using this approach, the final multisensory estimate has minimal variance (in other words, maximal reliability). Consider a bimodal source (e.g. an audiovisual signal) that produces two sensory cues (e.g. positional cues), estimated by the auditory and visual systems as (Ŝa, Ŝv). The MLE model predicts that the final audiovisual estimate Ŝ is:

Ŝ = wa·Ŝa + wv·Ŝv,    (1.1)

where wa = (1/σa²)/(1/σa² + 1/σv²), wv = 1 − wa, and σa², σv² are the variances of the auditory and visual sensory estimates (see Figure 1.1).
Figure 1.1: A normative MLE model of audiovisual integration. The audiovisual estimate Ŝ is the linear combination of Ŝa and Ŝv, with each weight set in proportion to its reliability.
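As a concrete illustration of Equation 1.1, the following minimal sketch (in Python, with hypothetical values rather than data from any of the cited studies) computes the reliability-weighted audiovisual estimate and its variance:

```python
def mle_combine(s_a, var_a, s_v, var_v):
    """Reliability-weighted combination of two unimodal estimates (Eq. 1.1).

    Weights are inversely proportional to the variances, so the fused
    estimate has the minimal achievable variance.
    """
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    w_v = 1 - w_a
    s_hat = w_a * s_a + w_v * s_v
    var_hat = 1 / (1 / var_a + 1 / var_v)  # always <= min(var_a, var_v)
    return s_hat, var_hat

# Hypothetical example: vision is more reliable for position than audition,
# so the fused estimate is pulled toward the visual cue (spatial ventriloquism).
s_hat, var_hat = mle_combine(s_a=5.0, var_a=4.0, s_v=0.0, var_v=1.0)
print(s_hat, var_hat)  # 1.0, 0.8
```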
The reliability-based integration models have successfully predicted multisensory integration in many situations, such as visual-haptic size estimation and audio-visual localization (for a review, see Alais, Newell, & Mamassian, 2010). The MLE model has recently been extended to a more general Bayesian framework, in which prior knowledge about the multisensory information is incorporated (Ernst & Di Luca, 2011; Körding & Wolpert, 2004; Roach, Heron, & McGraw, 2006). Using priors allows Bayesian models to predict both multisensory integration and multisensory segregation (Ernst & Di Luca, 2011; Roach et al., 2006).
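To illustrate how a prior enters such a scheme, here is a minimal sketch under Gaussian assumptions; the prior mean and variance below are hypothetical placeholders, and this linear fusion is a simplification that covers only the integration case, not the segregation predictions of the cited models:

```python
def bayes_combine(estimates, variances, prior_mean, prior_var):
    """Precision-weighted fusion of several cues with a Gaussian prior.

    With Gaussian likelihoods and a Gaussian prior, the posterior mean is
    the precision-weighted average of all cues and the prior mean.
    """
    precisions = [1 / v for v in variances] + [1 / prior_var]
    values = list(estimates) + [prior_mean]
    posterior_mean = sum(p * x for p, x in zip(precisions, values)) / sum(precisions)
    posterior_var = 1 / sum(precisions)
    return posterior_mean, posterior_var

# Hypothetical audiovisual example with a prior centered on straight ahead (0).
print(bayes_combine([5.0, 0.5], [4.0, 1.0], prior_mean=0.0, prior_var=10.0))
```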
Note that the quantitative models mentioned above were derived from studies of multisensory spatial integration (Alais & Burr, 2004; Ernst & Banks, 2002). The evidence is rather mixed, however, as to whether these quantitative models also apply to multisensory temporal integration (see Section 1.1.2).
1.1.2 Multisensory temporal integration
A prior assumption underlying multisensory integration is the assumption of unity. The assumption of unity suggests that multisensory integration makes sense only when the perceptual system has evidence that the multiple signals (events) originate from a common source (Welch, 1999; Welch & Warren, 1980). Without doubt, the most important factors for perceiving a common source are spatial proximity and temporal coincidence. Multisensory integration occurs when the sensory signals originate from proximal locations and reach the brain at around the same time. Otherwise, the sensory signals are likely to be perceived as separate events.
Similar to multisensory spatial integration, multisensory events with temporal coincidence may interact and integrate, forming a coherent temporal percept. Studies on audiovisual interaction in temporal order judgments (Morein-Zamir, Soto-Faraco, & Kingstone, 2003; Scheier, Nijhawan, & Shimojo, 1999) found that the temporal discrimination threshold for visual events can be altered by adding two auditory clicks. When the first click occurs slightly before the first flash and the second click shortly after the second flash, visual temporal resolution is enhanced, as if the two clicks pull the two flashes apart. This has been termed the temporal ventriloquist effect, analogous to the spatial ventriloquist effect. Various types of the temporal ventriloquist effect have since been found using different paradigms (Fendrich & Corballis, 2001; Getzmann, 2007; Keetels, Stekelenburg, & Vroomen, 2007). For example, Getzmann (2007) used classical apparent motion to investigate how brief beeps alter visual apparent motion. He found that, similar to the temporal ventriloquist effect, beeps presented before the first and after the second visual flash, as well as simultaneously presented sounds, reduced the motion impression, whereas sounds intervening between the two visual flashes facilitated apparent motion relative to the baseline (visual flashes without sounds). The common explanation for multisensory temporal integration is similar to the account of multisensory spatial integration (i.e., the traditional modality precision hypothesis, Welch & Warren, 1980): the auditory modality has higher temporal resolution than the other modalities and, as a result, auditory information dominates the final temporal percept.
Note that temporal coincidence can be influenced by physical temporal discrepancies (e.g., sound and light travel at different speeds) and by differential neural processing times among the sensory modalities. Scientists are well aware of neural latency differences. For example, auditory stimuli are often perceived faster than visual stimuli (Levitin, MacLean, Mathews, & Chu, 2000; Stone et al., 2001), whereas the latency of touch depends on where on the body the stimulation originates, because the travel time is longer from the toes to the brain than from the forehead (Vroomen & Keetels, 2010). Although latencies differ across modalities, the perceptual system still promotes a temporally coherent and unified perception to a certain degree. It is thus essential for researchers to investigate the compensation mechanisms of the perceptual system. The ability of the perceptual system to compensate for different latencies has been referred to as the temporal window of multisensory integration. To date, it has been revealed that the temporal integration window depends on many aspects, such as modality, training, attentional bias, etc. For example, Fujisaki et al. have shown that training and adaptation can alter the crossmodal simultaneity window (Fujisaki, Shimojo, Kashino, & Nishida, 2004). Spence and colleagues have demonstrated that attention can also shift the integration window (Spence, Nicholls, & Driver, 2001). However, most of the aforementioned studies have mainly examined crossmodal integration of the audio-visual modalities. Research on touch-related multisensory temporal integration is relatively scarce, and the mechanisms of temporal integration with touch still need to be investigated further.
Besides spatial proximity and temporal coincidence, other Gestalt grouping principles, such as common fate or common feature, may also lead to a coherent percept. More recently, it has been shown that perceptual grouping in general can be an influential factor in multisensory perception (Spence, Sanabria, & Soto-Faraco, 2007). For example, unimodal auditory grouping and segregation (i.e. pop-out pips) can enhance the discrimination of concurrent visual events (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Vroomen & de Gelder, 2000) or temporal order judgments (Keetels et al., 2007). In a study of audiovisual interaction in visual apparent motion, Bruns and Getzmann (2008) revealed that either a continuous sound filling the gap between two flashes or a short sound intervening between the two flashes promotes the crossmodal grouping of movement, which enhances perceived visual motion continuity. However, little is known about how unimodal and crossmodal grouping interact and modulate multisensory temporal integration.
1.1.3 Multisensory time perception
The perception of time, in particular time across multiple senses, is not straightforward, since there is no dedicated sensory organ for time perception. Nevertheless, the traditional centralized, amodal internal-clock model has dominated the field of time perception over the last 30 years (Bueti, 2011). The model consists of a pacemaker emitting pulses at a certain rate and a switch that can open and close to permit an accumulator to collect the emitted pulses. More recently, new evidence has accumulated that challenges the one-centralized-clock, amodal model. For instance, the amodal account cannot explain differential modality-specific pacemaker rates (Droit-Volet, Meck, & Penney, 2007; Penney, Gibson, & Meck, 2000; Wearden, Edwards, Fakhri, & Percival, 1998). Neurophysiological evidence, on the other hand, suggests that separate brain regions are devoted to visual and auditory duration processing (Bueti, Bahrami, & Walsh, 2008; Ghose & Maunsell, 2002). The amodal model also has difficulty explaining, for example, why temporal discrimination is better for audition than for vision (Grondin, 1993), and why an auditory duration is judged longer than the same physical visual duration (Wearden et al., 1998). This recent evidence suggests that time perception is rather distributed across brain areas and sensory modalities (Bueti, 2011; Ivry & Richardson, 2002; Matell & Meck, 2004).
Since time processing is distributed across modalities, studies on crossmodal time judgments have revealed rather complex and inconclusive results. For instance, it has been shown that the duration of auditory events was lengthened or shortened by the presence of conflicting looming or receding visual information, while the perceived duration of visual events was unaffected by auditory looming or receding stimuli (van Wassenhove, Buonomano, Shimojo, & Shams, 2008). Other studies, using static stimuli or implicit measures, have reported the opposite result, namely that perceived visual duration was affected by a concurrent auditory duration (e.g., Y.-C. Chen & Yeh, 2009).
Unlike spatial perception, time perception can be distorted dramatically by emotional states. For instance, when involved in an accident, such as a car crash, people often report that they felt the world slow down. Research suggests that high-arousal stimuli, such as threatening pictures, are often perceived as longer in duration than neutral stimuli (Droit-Volet et al., 2007). The lengthening effect induced by emotion has been confirmed in the visual (Angrilli, Cherubini, Pavese, & Manfredini, 1997; Droit-Volet, Brunot, & Niedenthal, 2004) and auditory (Noulhiane, Mella, Samson, Ragot, & Pouthas, 2007) modalities. Although there is now ample evidence of how emotion distorts duration perception, most studies have focused only on unisensory modulation. Given that time perception relies on distributed processing, there is still only scant understanding of how emotion induced via one sensory modality influences time perception in another modality.
1.1.4 Sensorimotor recalibration and delay perception
Time perception can be influenced by action, too (Bueti & Walsh, 2010; Cunningham, Billock, & Tsou, 2001; Stetson, Cui, Montague, & Eagleman, 2006). Stetson et al. (2006) demonstrated that, following brief exposure to delayed visual feedback of a voluntary action, the onset of the action-feedback signal is perceived as earlier than the action itself when the delay is removed. The effect has been attributed to a dynamic shift of the perceived feedback event toward the onset of the action, in order to maintain an appropriate perception of causality. Other related studies have confirmed that a delayed sensory effect is perceived as having appeared slightly earlier in time if it follows a voluntary action - a phenomenon referred to as intentional binding. Intentional binding also attracts a voluntary action toward its sensory effect, so that the action is perceived as having occurred slightly later in time as well, and the feedback delay is perceived as shorter than the actual delay (Engbert, Wohlschläger, & Haggard, 2008; Engbert, Wohlschläger, Thomas, & Haggard, 2007; Haggard, Clark, & Kalogeras, 2002). The shortening effect has been attributed to a transient slowdown of an internal clock after a voluntary action, as a result of which fewer ticks are accumulated (Wearden, 2008). This shortening effect might be reinforced by everyday experience, which leads us to assume sensorimotor synchrony between the start of a motor action and its sensory consequence (Heron, Hanson, & Whitaker, 2009). However, whether sensorimotor temporal recalibration is due to timing changes in the motor system or in the perceptual system is still controversial. Some researchers have suggested that sensorimotor temporal recalibration is induced mainly by a temporal shift in the motor system (Sugano, Keetels, & Vroomen, 2010), whereas others have attributed it to purely perceptual learning (Kennedy, Buehner, & Rushton, 2009).
1.1.5 Multisensory enhancement and perceptual learning in visual search
Temporally coincident multisensory events, such as synchronous audiovisual signals, can easily be picked out by our brain amongst other objects or events in the environment. For example, a car collision with a loud bang easily attracts our attention to the accident spot. Such enhancement may come about as a result of redundant target coding and multisensory perceptual saliency. Multisensory enhancement and facilitation have been shown in various search paradigms in which a visual target was accompanied by a sound signal (Bolia, D'Angelo, & Richard, 1999; Doyle & Snowden, 1998; Van der Burg, Cass, Olivers, Theeuwes, & Alais, 2010). For example, Doyle and Snowden (1998) found that a simultaneous, spatially congruent sound facilitated covert orienting to non-salient visual targets in a conjunction search paradigm. Interestingly, multisensory enhancement of visual perception and search performance has been found not only with spatially informative, but also with merely temporally informative auditory (Van der Burg et al., 2010; 2008; Vroomen & de Gelder, 2000) or tactile signals (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2009). For instance, Vroomen and de Gelder (2000) investigated crossmodal influences from the auditory onto the visual modality at an early level of perceptual processing. In their study, a visual target was embedded in a rapidly changing sequence of visual distractors. They found that a high tone embedded in a sequence of low tones improved the detection of a synchronously presented visual target, while this enhancement was reduced or abolished when the high tone was presented asynchronously to the visual target or became part of a melody. Using a dynamic visual search paradigm, Van der Burg et al. (2008) demonstrated that irrelevant beeps could guide visual attention toward the location of a synchronized visual target which, if presented without such synchronous beeps, was extremely hard to find. With the aid of synchronous beeps, search performance improved substantially (in fact, on the order of seconds). Van der Burg et al. referred to this facilitation as the pip-and-pop effect. However, when the synchronized tones were not transient but smooth (e.g. shaped by a sine-wave envelope), the pip-and-pop effect vanished, suggesting that the transient nature of the auditory signals is important (Van der Burg et al., 2010). To date, the true underlying mechanisms of, and the linkage between, multisensory enhancement and search performance are still not well understood and deserve further investigation.
1.1.6 Multimodal feedback delay in human-machine interaction (HMI)
Multisensory time processing has critical implications for human-machine interaction, particularly in multimodal virtual reality systems. Multimodal virtual reality systems have been adopted in a variety of applications, such as remote virtual conferencing, telesurgery, and teleoperation in space and under water. In a typical multimodal telepresence system, multimodal information flows bilaterally between the local and remote sites. Users not only receive information from the remote site, but also send multimodal commands (e.g. audiovisual streams as well as haptic actions). However, owing to the communication distance, data encoding, and control scheme, communication delays between the local and remote sites are inevitable. These delays can vary from dozens of milliseconds to seconds. For instance, the feedback latency of an intercontinental teleoperation via the Internet is on average 300 ms, while the latency can be up to 5-10 seconds for teleoperation tasks in space. In addition, delays may vary among the different modalities. Thus, remote multimodal synchronous events, such as a visual-haptic collision, may turn into local asynchronous incidents, and a normally immediate action-effect relation turns into an action-delayed-effect relation as well.

The effect of time delay on simple task performance has been investigated in several studies. For example, examining the effect of visual-feedback delay on users' task completion time, MacKenzie and Ware found that performance was affected by delays exceeding 75 ms, with completion time thereafter increasing linearly with time delay (> 75 ms) and task difficulty (MacKenzie & Ware, 1993). Similar effects have been confirmed for various modalities, such as delays in visual feedback (Kim, Zimmerman, Wade, & Weiss, 2005), haptic feedback (Ferrell, 1966), and visual-haptic feedback (Jay, Glencross, & Hubbold, 2007). While many studies of time delays have examined issues related to task performance, there are relatively few studies on delay perception per se in multimodal virtual reality systems. Arguably, knowing humans' capability of perceiving delays is useful for providing system designers with guidelines for the development of multimodal communication protocols, as well as for human-centered evaluations of existing applications with respect to system fidelity and user experience.
1.2 Cumulative research work
As alluded to above, there are several open key issues in multimodal temporal processing. During my habilitation period, I have focused on the following four research topics:

1. Multisensory temporal integration and motion perception: Using various apparent motion paradigms, the studies in this research topic extended previous work on multisensory temporal integration at points in time to multisensory interval (duration) integration, and revealed that quantitative models, such as MLE, can predict multisensory interval estimation very well. In addition, the influence of crossmodal grouping principles on multisensory integration has been extensively investigated.

2. Multisensory time perception: In this topic, various studies have been conducted on multisensory duration perception, particularly on issues of sensorimotor duration perception and crossmodal emotional modulation of time perception.

3. Multisensory enhancement, context learning and search performance: In the third line of research, the studies focused on how audiovisual synchronous events and contextual cueing boost visual search performance. Eye tracking was applied in these studies to reveal how synchronous audiovisual events influence oculomotor behavior. In addition, context learning in general has been examined.

4. Multimodal feedback delay and user experience: Feedback delay is ubiquitous in applied multimodal systems involving large data transmissions, such as telepresence. The influence of delay on multisensory perception and user experience is the main focus of this last research agenda. Here, various studies have been conducted to identify the impacts of delays in visual-haptic environments on the perception of multisensory simultaneity and on users' operation performance. Based on the fundamental findings, performance optimization methods have been proposed.
1.2.1 Multisensory temporal integration and motion perception
Most studies on multisensory temporal integration follow the traditional approach of multisensory spatial integration (such as the spatial ventriloquist effect), focusing on crossmodal temporal capture at a point in time (e.g. the temporal ventriloquist effect). The common finding is that the onset time of a visual event is perceived to be aligned with the onset of an auditory event which appears temporally near the visual event (Burr, Banks, & Morrone, 2009; Freeman & Driver, 2008; Getzmann, 2007; Morein-Zamir et al., 2003; Scheier et al., 1999). However, the temporal ventriloquist effect is manifested only with paired audiovisual stimuli. Several studies have shown that a single sound leaves visual temporal-order judgment (TOJ) uninfluenced (Morein-Zamir et al., 2003; Scheier et al., 1999). This has been taken to suggest that two sounds are required for the audiovisual stimuli to be perceived as unitary events. Arguably, however, two beeps clearly define an auditory interval, which - in contrast to a point in time - is another feature of time perception. Moreover, paired stimuli can easily form a perceptual group, which may further influence multisensory temporal integration.

To investigate the influence of a sound interval on audiovisual temporal integration,¹ we adopted a Ternus apparent motion paradigm (Shi, Chen, & Müller, 2010). Ternus apparent motion is produced by presenting two sequential visual frames; each frame consists of two horizontal dots, and the two frames, when overlaid, share one common dot at the center. Observers typically report two distinct percepts depending on the inter-stimulus onset interval (ISOI): element motion and group motion. Short ISOIs usually give rise to the percept of element motion, that is, the outer dots are perceived as moving while the center dot appears to remain static or flashing. By contrast, long ISOIs give rise to the perception of group motion: the two dots are perceived to move together as a group (see Figure 1.2). The transition threshold between element motion and group motion, measured at the chance level of two-alternative forced choices (2AFC), is relatively stable when the spatial configuration is fixed.

¹In most studies I have collaborated with my colleagues and doctoral students; thus, I prefer the word we to I in this report. At other times, I use the words we and you to refer to a generic third person. It should be clear from the context.
Figure 1.2: Schematic representation of the Ternus apparent motion. (a) Element motion percept. (b) Group motion percept.

Using Ternus apparent motion, we could implicitly measure audiovisual integration by observing shifts of the transition threshold.
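To make the threshold measurement concrete: the transition threshold (the point of subjective equality, PSE) can be estimated by fitting a psychometric function to the proportion of 'group motion' responses across ISOIs, and a temporal ventriloquist effect then appears as a PSE shift relative to the synchronous baseline (cf. Figures 1.3 and 1.4). The following sketch, with simulated responses and a logistic function, is only an illustration, not the original analysis code:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, pse, slope):
    """Psychometric function: P('group motion') as a function of ISOI (ms)."""
    return 1.0 / (1.0 + np.exp(-(x - pse) / slope))

# Simulated 2AFC data: proportion of 'group motion' responses per ISOI.
isoi = np.array([50, 100, 150, 200, 250, 300, 350], dtype=float)
p_group = np.array([0.05, 0.15, 0.40, 0.60, 0.80, 0.92, 0.97])

# Fit; the PSE is the ISOI yielding 50% 'group motion' responses.
(pse, slope), _ = curve_fit(logistic, isoi, p_group, p0=[175.0, 30.0])
print(f"PSE = {pse:.1f} ms, slope = {slope:.1f} ms")

# A temporal ventriloquist effect (TVE) would be quantified as the PSE shift
# of a sound condition relative to the synchronous-sounds baseline, e.g.
# tve = pse_baseline - pse_outer_sounds
```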
In this study (Shi et al., 2010), we systematically investigated the influences of paired beeps and of a single beep, using three different audiovisual temporal configurations. In the paired-beeps conditions, the auditory gap intervals were clearly defined. Similarly to previous temporal ventriloquist studies (Morein-Zamir et al., 2003; Scheier et al., 1999), we found audiovisual interval capture effects. When the first sound preceded the first visual frame and the second sound trailed the second visual frame by 30 ms, more group motion responses were observed compared to the baseline condition, in which the two sounds were presented synchronously with the visual frames. The opposite effect was found when the two sounds were presented in between the two visual frames (see Figure 1.3). However, such audiovisual capture effects were almost gone when one beep was removed (either the first or the second, see Figure 1.4), which strongly suggests that the auditory interval is a critical factor in audiovisual temporal integration. Further experiments quantified this audiovisual interval integration using direct audiovisual interval comparisons. Auditory intervals were typically perceived as longer than visual intervals of the same physical length. The perceived audiovisual interval was predicted by the MLE model, indicating that auditory and visual intervals are integrated in a way that is optimal in terms of variability (Equation 1.1).

Figure 1.3: Psychometric curves fitted for the paired-beeps conditions. The solid curve and circles represent the baseline 'synchronous-sounds' condition, the dashed curve and crosses the 'outer-sounds' condition, and the dash-dotted curve and pluses the 'inner-sounds' condition.

In another study (Chen, Shi, & Müller, 2010), we examined how perceptual grouping in general influences crossmodal temporal processing, using the same Ternus apparent motion paradigm. Instead of the audiovisual modalities, in this study we used visual and tactile Ternus apparent motion, given that we intended to examine bidirectional interactions and Ternus apparent motion can only be constructed in the visual or tactile modality. The tactile Ternus apparent motion was created by three tactile solenoids, which tapped three fingers to induce indentation taps. The visual apparent motion was constructed with three LEDs placed near the three solenoids. In the study, we introduced intra- and cross-modal temporal grouping of the middle element (either tactile or visual) by presenting the middle element twice prior to the Ternus display.
Figure 1.4: Psychometric curves fitted for the single-beep conditions. The solid curve and circles represent the baseline 'synchronous-sound' condition, the dashed curve and crosses the 'preceding-sound' condition (audiovisual interval 30 ms), and the dash-dotted curve and pluses the 'trailing-sound' condition (audiovisual interval -30 ms). The magnitude of the temporal ventriloquist effects (TVEs), calculated against the baseline, is presented in a subplot for the 'preceding-sound' (30 ms) and 'trailing-sound' (-30 ms) conditions.

We found that intramodal grouping of the middle element biased Ternus apparent motion toward element motion for rhythmic or short precue intervals, whereas there was no effect of crossmodal grouping on Ternus apparent motion with the same temporal settings. This indicates that intramodal temporal grouping promotes the saliency of the middle element, which leads to more element motion percepts in the responses. The effect, however, was too weak to be manifested in the crossmodal temporal grouping conditions.

Along this line of research, we further investigated the influences of crossmodal timing and event structure on intra- and cross-modal perceptual grouping (Chen, Shi, & Müller, 2011). In this study we used bi-stable two-tap tactile apparent motion streams. Since the two tactile taps were repeatedly presented with the same inter-stimulus interval, the leftward and rightward motion percepts were bi-stable, that is, two mutually exclusive perceptual states that switched equally often and unpredictably. During the 90-second tactile motion stream, mono beeps were added and paired with the tactile taps using various temporal asynchronies. When each tactile tap was paired with one beep, we found a typical temporal ventriloquist effect, as in our earlier study (Shi et al., 2010): auditory intervals captured the paired tactile intervals. As a result, two taps with a short audiotactile interval were grouped together, forming a dominant tactile motion percept. However, when only half of the taps (e.g. the odd-numbered taps) were paired with beeps, the modulation by audiotactile temporal asynchronies was diminished. Instead of a temporal capture effect, a dominant motion percept from the audiotactile side to the tactile-only side was observed, independently of the crossmodal asynchrony variation. This was mainly due to a strong attentional bias toward the side of the crossmodal grouping, giving rise to apparent tactile motion from the side of the audiotactile grouping to the other side.
Taken together, these studies provide a clear view of how crossmodal intervals and perceptual grouping influence multisensory temporal integration. The temporal ventriloquist effect has been manifested repeatedly for fully paired crossmodal stimuli. Convergent evidence suggests that crossmodal interval/duration integration is one important factor underlying the temporal ventriloquist effect. On the other hand, when the crossmodal stimuli are unequally paired, perceptual grouping (either intra- or cross-modal) may be processed first, which leads to dynamic attention shifts and biases the motion percept.
1.2.2 Multisensory and sensorimotor time perception
Although distributed models of time perception have gradually become accepted in multisensory time research (Bueti, 2011; Buhusi & Meck, 2005), it is still controversial how the distributed (or modality-specific) timing is integrated. Distributed timing processes may cause differences between action time and perception time, which have been only sparsely mentioned in the literature. For example, Walker and Scott found that motor reproduction relying only on kinesthetic information (i.e. action timing) overestimated an auditory standard duration by about 12 percent (Walker & Scott, 1981). In a recent study (Bueti & Walsh, 2010), an action task, in which participants reproduced an auditory or visual duration by pressing a button, was compared to a perceptual task, in which participants stopped the comparison signal when its perceived duration reached the same amount of time as the standard duration. Action timing was strongly overestimated for short durations and underestimated for long durations. Other studies have also demonstrated that the perceived duration of a second presented immediately after a saccade or arm movement is often longer than that of subsequent seconds (but see Binda, Cicchini, Burr, & Morrone, 2009; Park, Schlag-Rey, & Schlag, 2003; Yarrow, Haggard, Heal, Brown, & Rothwell, 2001). Given that action and perceived time are far from veridical and time estimates can easily be biased by various factors, our brain faces the challenge of integrating various sources of temporal information to enable accurate timing of a multisensory or sensorimotor event.
In a recent study (Shi, Ganzenmüller, & Müller, 2013), we investigated this issue using three different duration estimation tasks: auditory duration comparison, motor reproduction, and auditory reproduction. The auditory duration comparison and motor reproduction tasks aimed to measure perceptual and action time processing, respectively, whereas the auditory reproduction task was a bimodal (i.e. perceptual and motor) task, which aimed to uncover how perceptual and action durations are integrated. We measured estimation variability in all three tasks. In the spatial domain, reliability-based optimal integration models, such as MLE (Equation 1.1), have successfully predicted the effects of multimodal integration in various cases, such as visual-haptic size estimation and audiovisual localization (for a review, see Ernst & Di Luca, 2011). In one of our previous studies using an implicit measure (Shi et al., 2010), we likewise found that the MLE model predicts audiovisual duration integration well. We further tested the reliability-based integration model for sensorimotor temporal integration (Shi et al., 2013), in particular for auditory reproduction. In contrast to the previous approach of implicitly assuming unbiased estimates,² we explicitly introduced biases into the quantitative model.

²For Bayesian integration models, disregarding biases allows one to focus on minimizing variance as an optimality criterion. In some studies (e.g. Burr et al., 2009), biases are assumed to be constant across all conditions.
Suppose there is a standard auditory duration Ds. An auditory estimate D̂a, derived from a duration comparison task, may contain a bias εa. A pure motor reproduction, on the other hand, may lead to a different estimate D̂r, containing a different bias εr. That is,

E(D̂a) = Ds + E(εa),    (1.2)
E(D̂r) = Ds + E(εr),    (1.3)

where E(·) is the expectation function. In auditory reproduction, both perceptual auditory comparison and motor reproduction are present. Supposing that the perceptual and motor estimates are independent of each other, the maximum likelihood prediction of the auditory reproduction is given by:

E(D̂ar) = Ds + wa·E(εa) + wr·E(εr),    (1.4)

where wa and wr are the weights of the perceptual and motor estimates. According to MLE, the optimal weights should be inversely proportional to the corresponding variances:

wa = (1/σa²)/(1/σa² + 1/σr²),    (1.5)
wr = 1 − wa.    (1.6)

If the optimal weighting rule is followed, the variance of the auditory reproduction, σar², should also be lower than the variances of the pure perceptual and motor estimates, σa² and σr².
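As a numerical sketch of Equations 1.2-1.6 (all variances and biases below are hypothetical placeholders, not the study's estimates):

```python
import numpy as np

def predict_reproduction(d_s, bias_a, var_a, bias_r, var_r):
    """MLE prediction of auditory reproduction with explicit biases (Eqs. 1.2-1.6).

    d_s            standard duration (ms)
    bias_a, var_a  mean bias and variance of the perceptual (comparison) estimate
    bias_r, var_r  mean bias and variance of the motor (reproduction) estimate
    """
    w_a = (1 / var_a) / (1 / var_a + 1 / var_r)   # Eq. 1.5
    w_r = 1 - w_a                                  # Eq. 1.6
    mean_ar = d_s + w_a * bias_a + w_r * bias_r    # Eq. 1.4
    var_ar = 1 / (1 / var_a + 1 / var_r)           # optimal (minimal) variance
    return mean_ar, var_ar

# Hypothetical example: a precise, nearly unbiased comparison estimate vs. a
# motor reproduction that overestimates a 1000-ms standard with high variance.
mean_ar, var_ar = predict_reproduction(1000, bias_a=20, var_a=150**2,
                                       bias_r=400, var_r=250**2)
print(mean_ar, np.sqrt(var_ar))  # prediction is pulled toward the reliable cue
```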
Using one-second auditory intervals as the standard stimuli in the three different duration tasks, we confirmed the previous finding of overestimation in motor reproduction (Walker & Scott, 1981). In our case, motor reproduction yielded about 40% overestimation, whereas the auditory comparison task provided relatively precise estimates (Figure 1.5). We further compared the reliability-based MLE predictions with the observed behavioral results, and found that the MLE model's predictions correlated relatively highly with the observed auditory reproduction (r = 0.62) and its variability (r = 0.68).

Similar conclusions were confirmed by a subsequent experiment with varied standard durations and varied signal-to-noise ratios (SNRs) in the compared/reproduced tones (Figure 1.6; r = 0.81). The MLE prediction of sensorimotor duration reproduction proved to be far better than either a motor or a perceptual dominance model. However, turning to the variability of the bimodal condition, the MLE model turned out to be suboptimal, that is, it did not show the theoretical improvement. Interestingly, though, this confirmed our previous findings (Shi et al., 2010) and other recent studies (Burr et al., 2009; Hartcher-O'Brien & Alais, 2011; Tomassini, Gori, Burr, Sandini, & Morrone, 2011).
Figure 1.5: Mean estimation biases and standard deviations (SDs), with ±1 standard error bars, for 1-second duration estimation in the three different tasks (motor reproduction, comparison, and auditory reproduction).
That is, the variability in crossmodal temporal integration is often found to be suboptimal. The reason for this suboptimal integration is not clear at present. It has been suggested that the assumption of Gaussian noise might not be appropriate for timing tasks (Burr et al., 2009). Alternatively, additional decision noise may be introduced in the bimodal (or sensorimotor) task, owing to the multiple information sources and the increased task difficulty. It is also possible that time estimates from the different sensory (motor) modalities are not independently distributed but partially dependent, as hinted at by the literature on the amodal internal clock model. When sensory estimates are correlated, it has been shown that the true optimal weights and reliability can deviate dramatically from those of independent optimal integration (Oruç, Maloney, & Landy, 2003).
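To illustrate the last point, the sketch below computes the minimum-variance weight for two correlated estimates (a standard result consistent with the framework of Oruç et al., 2003; the numbers are hypothetical):

```python
def optimal_weight_correlated(sd1, sd2, rho):
    """Minimum-variance weight on cue 1 when the two cues correlate with rho.

    For rho = 0 this reduces to the independent MLE weight of Equation 1.5.
    """
    return (sd2**2 - rho * sd1 * sd2) / (sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)

# Hypothetical example: as the correlation increases, the weights (and the
# achievable variance reduction) deviate from the independent-cue prediction.
for rho in (0.0, 0.4, 0.8):
    print(rho, optimal_weight_correlated(sd1=150.0, sd2=250.0, rho=rho))
```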
Figure 1.6: A. Mean estimation biases (with ±1 standard error bars) as a function of standard duration and SNR. H and L denote high and low SNRs; 800 and 1200 denote the short and long standards in ms. B. Observed reproductions plotted against predicted reproductions. The solid red line is a linear regression of the data (y = 45 + 1.029x). The dot-dashed line indicates ideal optimal cue integration based on the MLE. The green and blue crosses represent data from the high and low SNR conditions, respectively.
In addition to the feedback information, feedback delay itself can influence the duration
reproduction, too. In another recent study (Ganzenmüller, Shi, & Müller, 2012), we investigated this issue by injecting an onset- or offset-delay into the sensory feedback signal of a duration reproduction task. We found that the reproduced duration was lengthened, and the lengthening effect was observed immediately, on the first trial with the onset-delay. In contrast, a shortening effect was found with an offset-delay of the feedback signal, though this effect was weaker and manifested only partially, in the auditory but not in the visual reproduction. The offset of the reproduction relied largely on the action-stop signal. These findings suggest that the reproduction task with feedback integrates both perceptual and action time, but relies differentially on the onset of the feedback signals and on the motor-stop signals. Such differential binding may well relate to the memory-mixing model (Gu & Meck, 2011): owing to the limited capacity of working memory and the cause-effect relationship, the motor timing and the caused feedback timing may share the same representation, which pulls both onsets closer together. In this study (Ganzenmüller et al., 2012), we further confirmed the strong overestimation in auditory reproduction shown in other studies (Bueti & Walsh, 2010; Shi et al., 2013; Walker & Scott, 1981).
Overestimation of duration can also be induced by emotional states. For example, threatening pictures (Droit-Volet et al., 2004) or angry faces (Droit-Volet et al., 2007) are often judged as longer than neutral stimuli. However, most evidence on the emotional distortion of time perception has been gained with unisensory modulation only. Given that time processing is distributed, there is no guarantee that emotional stimuli introduced in one modality will influence the time perception of stimuli in another modality. On the other hand, emotional states may increase the general arousal level and/or bias the crossmodal linkage and perception-action associations, which may in turn influence duration judgments in other modalities. Recently, we investigated this issue using a visual-tactile approach (Shi, Jia, & Müller, 2012). We compared the modulation induced by three types of emotional pictures (threat, disgust, and neutral) on the subsequent judgment of vibrotactile duration. The results revealed that the processing of threatening pictures lengthened, relative to the neutral baseline, subsequent judgments of tactile duration. However, there was no evidence of a lengthening effect with disgust pictures. This clearly rejected the hypothesis of general arousal as the determining factor. We further examined how visual threat influences tactile time processing. If only the pacemaker of the tactile 'clock' were sped up, we should observe a slope effect for short- versus long-range intervals (Wearden, 2008), that is, a larger difference between the threat and neutral conditions for long intervals than for short intervals. However, this was not the case. Further experiments revealed that emotional activation is followed by emotional regulation. When participants were exposed to threatening pictures, attentional resources were first rapidly directed to the defensive system, including the somatosensory system, in preparation for a reaction. As a result, tactile time processing was dilated. While the same would initially apply to the long-interval condition, participants eventually realized that the tactile stimulus was not a threatening event. Accordingly, attentional resources would be increasingly redirected to processes of emotional regulation. As a consequence, the lengthening effect disappeared.

A high-arousal emotional state not only dilates duration perception, but also prioritizes crossmodal temporal processing, as shown in one of our recent studies (Jia, Shi, Zang, &
Müller, 2013). In this study, participants were asked to make temporal order judgments (TOJs) on a pair of audiotactile stimuli while gazing at a concurrently presented emotional picture. When the audiotactile stimuli were presented separately on the left and right sides, a significant temporal bias toward the tactile modality was found when the picture had a negative meaning (e.g. threat). This finding confirmed our previous conclusion (Shi et al., 2012) that the visual-tactile linkage in emotional associations is more likely to direct attention toward the tactile than toward the auditory modality. Interestingly, when the audiotactile stimuli originated from the same location, there was no such emotional modulation of modality-oriented attention. This suggests that the unity assumption in crossmodal integration (Welch & Warren, 1980) - that is, that multisensory stimuli coming from the same origin are more likely to be integrated into a single multisensory object than treated as two distal signals - can counteract the otherwise ensuing modality-oriented attentional bias.
1.2.3 Multisensory enhancement, context learning and search performance
It is known that the detection of a spatio-temporally coincident multisensory signal is faster than that of each of the corresponding signals presented separately. Recent studies by Van der Burg and colleagues revealed an interesting phenomenon, the 'pip-and-pop' effect, showing that spatially uninformative but temporally informative beeps can facilitate search performance (Van der Burg et al., 2010; 2008). In their paradigm, participants had to search for a horizontal or vertical bar among oblique distractors. Both the target and the distractors were either green or red and changed their colors randomly; thus, the search task was extremely difficult (see Figure 1.7). When the color changes of the target were accompanied by synchronous beeps, however, search performance was boosted on the order of seconds. Van der Burg and colleagues argued that the enhanced performance was due to bottom-up audiovisual integration and saliency-boosting. In contrast, other literature (Colonius & Arndt, 2001; Doyle & Snowden, 1998) showed that performance enhancement by audiovisual integration is typically around 100 ms, far less than the reported pip-and-pop effect.
To further examine the effects of spatially uninformative sound on visual search and
the underlying mechanisms, we recently adopted the pip-and-pop paradigm (Van der Burg
et al., 2008) and measured eye movements (Zou, Müller, & Shi, 2012). In addition to the
auditory synchronous cues, we introduced an informative spatial (central arrow) cue as
top-down attentional guidance, and a target-absent condition in a separate experiment. If
the pip-and-pop effect were purely bottom-up crossmodal enhancement, we should observe no
interaction with the top-down precue manipulation, and no facilitation in the target-absent
condition, given that no crossmodal integration would happen there. Our study replicated
the pip-and-pop effect. More interestingly, the effect was not purely bottom-up, as we found
an interaction between the top-down precue and sound presence (Figure 1.8, left). In addition,
detection was also facilitated by the presence of the beeps when the target was absent
(Figure 1.8, right). These behavioral results indicated that some top-down strategies must
have been adopted by participants.
Abbildung 1.7: An example search display used in the ‘pip-and-pop’ search paradigm. Displays contained multiple bars of different orientations, and observers had to detect the target orientation (or the target presence, in one of our experiments). There was a repeating alteration of the display items’ colors, occurring at random time intervals. The onsets of the color changes were accompanied by mono beeps.

Abbildung 1.8: Left: Mean reaction time (±SE) in seconds as a function of cue validity and sound presence; stars (solid line) and squares (dotted line) represent the sound-present and sound-absent conditions, respectively. Right: Mean reaction time (±SE) in seconds as a function of target presence, for the sound-present (stars) and sound-absent (squares) conditions, respectively.

Further eye-movement data showed that the mean fixation duration was longer in the sound-present than in the sound-absent condition (see Figure 1.9). In particular, the fixation duration was extended when a beep occurred during the fixation, and the amplitude of the immediately following saccade was increased. The eye-movement patterns revealed that participants tended to fixate longer when the additional sounds were presented, permitting temporally and spatially expanded information sampling and improving the registration of singleton color changes, thus guiding the next saccade more precisely and efficiently to the target. The study demonstrated that temporally coincident audiovisual events not only produce perceptual enhancement, but also influence oculomotor behavior and boost search performance.

Abbildung 1.9: (a) Mean fixation duration (±SE) in milliseconds as a function of target presence (present, absent), for the sound-present (stars) and sound-absent (squares) conditions, respectively. (b) Mean fixation duration (±SE) in milliseconds as a function of target presence (present, absent), separately for fixations on sound-absent trials (squares), and for fixations with (stars) and, respectively, without a beep (diamonds) on sound-present trials. (c) Mean number of fixations (±SE) as a function of target presence (present, absent), for the sound-present (stars) and sound-absent (squares) conditions, respectively. (d) Mean saccade amplitude (±SE) in degrees of visual angle as a function of target presence (present, absent), for the sound-present (stars) and sound-absent (squares) conditions, respectively.

Besides multisensory enhancement, learning of spatial context can also facilitate search performance. In one of our recent studies (Geyer, Shi, & Müller, 2010), a contextual cueing paradigm with multiconjunction visual search was used. We confirmed a robust contextual cueing effect, that is, target presence was discerned more rapidly when the target was embedded in a predictive compared to a non-predictive configuration. Further, contextual cueing was larger when only the subset of the configuration containing the target, compared to the other (oblique-bar) configurations, was predictive. In addition, contextual cueing was larger when a predictive display was shown repeatedly across two successive trials. These findings reveal the importance of spatial contextual learning for the guidance of visual search.

In another recent study (Shi, Zang, Jia, Geyer, & Müller, 2013), we applied a similar contextual cueing paradigm to a mobile user interface, examining icon re-configurations during display mode switches in touch-based mobile devices. In most current devices, icons are shuffled in positional order when the display mode is changed (e.g., from portrait to landscape mode). Such remapping disrupts the spatial relationships among the icons (see Figure 1.10). We tested several novel display remapping methods: ‘position-order invariant’ (a traditional icon-shuffle method), ‘global rotation’ (rotating the whole display), ‘local invariant’ (preserving local regions), and ‘central invariant’ (preserving the central maximal square region); a schematic sketch of two of these schemes follows the figure caption below. We found that when the local-invariant or central-invariant remapping methods were used, contextual cueing was preserved after the display change, indicating performance benefits in the icon localization task. The global-rotation method is intuitive for users; however, in the present study, which used a desktop monitor to simulate the mobile device, it might have introduced additional mental rotation detrimental to search performance. The findings thus provide new guidelines for the design of icon rearrangement in mobile devices.

Abbildung 1.10: Mockup displays for a mobile device. When the display mode is changed, icons are shuffled and the spatial relationships among the icons are partially destroyed.
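For illustration, the following minimal sketch (in Python, with a hypothetical 4x3 icon grid; not the implementation used in the study) contrasts the traditional position-order-invariant shuffle with the global-rotation scheme:

# Two of the remapping schemes above, applied to an abstract icon grid
# (a list of rows); grid contents and size are hypothetical.

def position_order_invariant(grid, new_cols):
    """Refill a grid of new width in left-to-right, top-to-bottom order
    (the traditional shuffle; spatial neighbor relations are broken)."""
    icons = [icon for row in grid for icon in row]
    return [icons[i:i + new_cols] for i in range(0, len(icons), new_cols)]

def global_rotation(grid):
    """Rotate the whole display 90 degrees clockwise
    (inter-icon spatial relations are preserved, but rotated)."""
    n_rows = len(grid)
    return [[grid[n_rows - 1 - r][c] for r in range(n_rows)]
            for c in range(len(grid[0]))]

portrait = [["A", "B", "C"],
            ["D", "E", "F"],
            ["G", "H", "I"],
            ["J", "K", "L"]]                   # 4x3 portrait layout
print(position_order_invariant(portrait, 4))   # 3x4 landscape, reading order kept
print(global_rotation(portrait))               # 3x4 landscape, rotated as a whole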
1.2.4 Delays in multimodal feedback and user experience

Delay is ubiquitous in signal transmission and processing. Neural signals, for example, take some time to convey sensory information to the brain: transmission from the human retina to the visual cortex requires about 70 to 100 ms (Schmolesky et al., 1998), and other modalities have similar neural latencies. Given that such delays are not negligible, one challenge faced by our visual system in the everyday environment is the veridical spatiotemporal representation of moving objects. A fast-moving object would introduce a spatial lag if the latency (about 100 ms) were not compensated; for example, an object moving at 10°/s would be mislocalized by 1° of visual angle after 100 ms. A typical visual illusion induced by the neural transmission delay is the flash-lag effect, that is, a moving object appears to be ahead of a spatially aligned flashed object. The initial hypothesis proposed by Nijhawan (1994) suggested that the position of the moving object is extrapolated forward to compensate for neural delays in the visual pathway, so that the object’s perceived position is closer to the object’s true instantaneous location. Since then, other alternative accounts have been proposed, such as differential latency, attention shift, and postdiction accounts
(Baldo & Klein, 1995; Eagleman, 2000; Whitney & Murakami, 1998), to explain the flash-lag effect. The major difference between the extrapolation account (Nijhawan, 1994) and the others is that the other hypotheses simply deny a low-level compensation mechanism, since such low-level extrapolation is hard to observe directly. In a recent study (Shi & Nijhawan, 2012), we directly tested the extrapolation hypothesis using a novel approach, namely, exploiting the nature of two foveal scotomas (i.e., the scotoma to dim light and the scotoma to blue light) to trigger motion extrapolation. In the central fovea there is a rod-free area of about 0.3° diameter, where low-intensity objects fail to yield a visual percept (Hecht, 2002). If the motion percept faithfully followed the retinotopic map, one should observe a discontinuous movement at the boundary of the fovea when a dim object moves across it (see Figure 1.11, left). However, forward shifts should be observed if there is a motion extrapolation mechanism, owing to compensation in the visual pathway, even though there is no physical response in the central fovea (see Figure 1.11, right). Indeed, our behavioral experiments provided solid evidence supporting the original motion extrapolation account (see Figure 1.12).

Perceiving time delays and crossmodal asynchrony also occurs for externally synchronous multisensory events. Sound, for example, travels through air much more slowly than light; thus, we hear thunder several seconds after we see the flash. Even if a light stimulates the retina and a sound pushes the eardrum at the same time, brain activation occurs roughly 30-50 ms earlier for the auditory signal (Fujisaki et al., 2004). To have a coherent perception of the external world and precise sensorimotor interaction with the environment, our brain must compensate for these latencies and adjust multisensory temporal perception accordingly.

In a number of recent studies (Rank, Shi, Müller, & Hirche, 2010, 2010; Shi, Zou, & Müller, 2010; Shi et al., 2010), we have investigated various forms of delayed multimodal feedback in multimodal telepresence systems and gained a better understanding of how multimodal delay perception influences the user’s performance.
Abbildung 1.11: Left: A dim object moves across the fovea. If there is no extrapolation mechanism in the visual pathway, the motion percept should follow the retinotopic map faithfully. Right: A dim object moves across the fovea. Owing to the extrapolation mechanism in the visual pathway, the moving object is still perceived within the rod-free fovea, and reappears further away from the fovea due to the neural transmission delay.

Abbildung 1.12: Results of Experiment 1 from Shi & Nijhawan (2012). Left: Individual thresholds of participants for three conditions. The left arrows denote the perceived vanishing positions in the motion-terminated condition; the right arrows denote the perceived initial positions in the motion-initiated condition; the gray bars denote the thresholds (50%) of motion visibility at 0.028 cd/m². Right: Mean forward shifts in the motion-initiated and motion-terminated conditions (±SE, n = 6). The vertical dot-dashed line denotes the mean radius of the relatively insensitive fovea centralis.
In particular, delay perception in the haptic modality heavily depends on how the user issues action commands and what information is fed
back. In a study by Rank et al. (2010), we systematically investigated the impact of the
frequency and amplitude of active movements on delay detection in a spring-type force-field
environment. We found that the detection thresholds for time delay in force feedback
were negatively correlated with movement frequency and movement amplitude; movement
amplitude and frequency influenced delay detection independently. Within a comfortable force range, the magnitude of the feedback force did not affect the discrimination of the
haptic delay. This force-invariant property of haptic delay perception provides a useful
guideline for system design, for example in micro-manipulation systems. In such an application
area, forces arising in the micro-scale environment are typically very small, below the human
detection threshold, so forces must be scaled up and augmented to provide operators with a
comfortable haptic impression. Our findings indicate that such a scale-up design does not
change multimodal, in particular haptic, temporal processing. In another study (Rank et
al., 2010), we further examined how haptic delay is perceived under different haptic environmental characteristics, such as spring, damper, and inertia environments. We found that delay
detection was easiest in a damping environment, owing to an additional directional conflict
of the force, induced by the time delay, at movement turning points. All these findings
could be very useful for future user-centered multimodal system design.
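The direction-conflict intuition can be made concrete with a damper’s force law (a schematic illustration with arbitrary units, not the parameters of the actual experiments): a damper renders F(t) = -b·v(t - Td), so shortly after a movement turning point the delayed force briefly points in the same direction as the new movement instead of opposing it:

import math

# Schematic illustration of the direction conflict in a delayed damper
# (arbitrary units; not the parameters of the actual experiments).
b, delay = 1.0, 0.1  # damping coefficient and time delay (s)

def velocity(t):
    """Sinusoidal hand-movement velocity (turning point at t = 0.25 s)."""
    return math.cos(2 * math.pi * t)

for t in (0.24, 0.26, 0.28):  # around the turning point at t = 0.25 s
    v_now = velocity(t)
    f_delayed = -b * velocity(t - delay)
    # Without delay the force always opposes the movement; with delay, force
    # and velocity briefly share the same sign just after the turn.
    print(f"t={t:.2f}s  v={v_now:+.2f}  F_delayed={f_delayed:+.2f}")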
Besides inevitable delays in multimodal network communication, information itself
can be lost over large geographical distances via the Internet (termed packet loss). In
a recent study, we investigated how packet loss and communication delays affect crossmodal temporal perception (Shi et al., 2010). We simulated packet-loss patterns using the
Gilbert-Elliott model. When a burst of packet loss occurs in the visual feedback, the moving visual scene stagnates; thus, packet loss may induce a general impression of delayed
feedback. Experimental results confirmed that both the point of subjective simultaneity
(PSS) and the just noticeable difference (JND) increased as a function of packet-loss rate
(Figure 1.13). This suggests that the perception of the stagnant visual information biased
judgments of the temporal order of the visual-haptic event and increased the
task difficulty, even though the visual-haptic event itself was intact and undelayed.
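For readers unfamiliar with the Gilbert-Elliott model: it is a two-state Markov channel in which losses occur in bursts while the channel is in its ‘bad’ state. A minimal simulation sketch (in Python, with illustrative transition and loss parameters, not those used in our experiments):

import random

# Minimal sketch of Gilbert-Elliott packet-loss simulation. The channel
# alternates between a 'good' and a 'bad' state; losses cluster into bursts
# while the channel is 'bad'. All parameters are illustrative.

def gilbert_elliott(n_packets, p_gb=0.02, p_bg=0.25,
                    loss_good=0.001, loss_bad=0.8, seed=1):
    random.seed(seed)
    state_bad = False
    losses = []
    for _ in range(n_packets):
        # Markov transition between the two channel states
        if state_bad:
            if random.random() < p_bg:
                state_bad = False
        elif random.random() < p_gb:
            state_bad = True
        loss_prob = loss_bad if state_bad else loss_good
        losses.append(random.random() < loss_prob)
    return losses

losses = gilbert_elliott(10000)
print("overall loss rate:", sum(losses) / len(losses))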
To further identify how packet loss in the past impacts forthcoming crossmodal
temporal processing, in a follow-up study (Shi et al., 2010) we conducted a new experiment on visual-haptic temporal discrimination with packet loss in the visual feedback.
This time, we switched off the packet loss before the critical visual-haptic collision, with various switch-off intervals (Figure 1.14). The study revealed that the PSS decreased with
increasing switch-off distance, approaching the level achieved in the no-packet-loss condition at a switch-off interval of 172 ms (60-mm distance) (Figure 1.15). This suggests that
past information does not immediately impact crossmodal temporal processing;
rather, internal crossmodal temporal processing requires some time (on the order of
a hundred or so milliseconds) to incorporate prior information. On this ground, it
would be advisable to use this interval range for the implementation of assistive functions
in the design of telepresence systems.
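The PSS and JND are summary statistics of the psychometric function (such as those in Figure 1.13 below): the PSS is the SOA at which both temporal orders are reported equally often, and the JND is commonly defined as half the 25%-75% interquartile range. A minimal fitting sketch, assuming a cumulative-Gaussian shape and made-up response proportions (not our data):

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical visual-haptic SOAs (ms) and proportions of
# "visual collision first" responses; illustrative numbers only.
soa = np.array([0, 20, 40, 60, 80, 100, 120])
p_visual_first = np.array([0.08, 0.18, 0.35, 0.55, 0.72, 0.88, 0.95])

def psychometric(x, mu, sigma):
    """Cumulative Gaussian model of the TOJ psychometric function."""
    return norm.cdf(x, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(psychometric, soa, p_visual_first, p0=[60, 30])

pss = mu              # SOA at 50% responses: point of subjective simultaneity
jnd = 0.6745 * sigma  # half the 25%-75% interquartile range
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")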
Abbildung 1.13: Psychometric functions (proportion of ‘visual collision first’ responses as a function of visual-haptic SOA) for the four packet-loss rates (no loss, 10%, 20%, and 30%) in visual-haptic temporal order judgments (TOJs).

Abbildung 1.14: Schematic illustration of a trial sequence. The movement trajectory is denoted by the long red arrow. The dashed part of the trajectory denotes visual feedback with packet loss, and the solid part visual feedback without packet loss. The packet-loss switch-off distance is denoted by d.

Abbildung 1.15: PSS as a function of the switch-off time interval. The switch-off time intervals were estimated from the movement velocity. Error bars indicate 95% confidence intervals, estimated by 1000-sample bootstrapping.

As we have shown previously (Shi et al., 2010), packet loss and feedback delay are two
important factors for visual-haptic temporal perception. Moreover, the user’s own action
characteristics (e.g., movement amplitude and phase) contribute to delay detection,
too (Rank et al., 2010). To further unravel the impact of packet loss and delay on task
performance, we developed a performance-optimal control scheme for delayed network
feedback and compared task performance, measured by the number of collisions and the task
completion time, with and without an active communication control algorithm
(Rank, Shi, Hermann, & Hirche, 2013). The quality control method is based on
an online predictive stochastic reachability analysis, that is, analyzing the collision probability of the user’s movements online and dynamically adjusting the network quality of service
(QoS). We used a 2D labyrinth with visual-haptic feedback and asked participants to ‘walk’
through the labyrinth as fast as possible while avoiding collisions with the walls. Experimental results
demonstrated that dynamic QoS control could effectively reduce the collision probability,
while completion time was not significantly affected (Figure 1.16). This is the first step
of a new approach using sensorimotor interaction and predictive coding to
improve task performance in delayed-feedback systems, and it points out our future research direction: using predictive human control strategies for delayed-feedback telepresence
systems.
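The control idea can be sketched with a deliberately simplified threshold rule (the actual controller in Rank et al. (2013) selects the delay level by minimizing a cost function derived from the stochastic reachability analysis; the collision-probability input here is a hypothetical stand-in):

# Schematic sketch of predictive QoS control: trade communication quality
# (here: time delay) against the predicted collision risk. Illustrative only.

T_DELAY_LOW = 0.05   # seconds; hypothetical low-latency QoS level (costly)
T_DELAY_HIGH = 0.20  # seconds; hypothetical high-latency QoS level (cheap)
P_THRESHOLD = 0.1    # switch to low latency when collision risk exceeds this

def select_delay(p_collision):
    """Pick the communication delay level from the predicted collision risk."""
    return T_DELAY_LOW if p_collision > P_THRESHOLD else T_DELAY_HIGH

# In the real control loop, p_collision would be computed online by predicting
# the operator's movement over a short horizon against the obstacle positions.
for p_collision in (0.01, 0.08, 0.25):
    print(f"risk {p_collision:.2f} -> delay {select_delay(p_collision)} s")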
1.3 Summary and outlook

The focus of this cumulative habilitation thesis is on dynamic temporal processing of multisensory information, which includes multiple aspects: multisensory temporal integration, time perception, and multisensory performance enhancement, as well as potential applications in technical multimodal systems.
Abbildung 1.16: While completion time was not significantly affected by communication quality control, the number of collisions with the walls was lower when dynamic QoS control was applied.

In the first line of research, two important factors in multisensory temporal integration, crossmodal interval integration and perceptual grouping, have been identified. Several studies (Chen et al., 2011; Shi et al., 2010) brought convergent evidence that crossmodal interval (duration) integration determines the temporal ventriloquist effect. Asymmetric crossmodal or intramodal perceptual grouping, on the other hand, may abolish the temporal ventriloquist effect. In addition, interval (duration) integration plays a critical role in sensorimotor timing, too. The reproduced duration, for example, is a combination of motor and perceptual time, where the weights of perceptual and motor time depend on the variabilities of the corresponding estimates. Moreover, when feedback delay is introduced, the reproduced duration relies heavily on the onset of the feedback, as well as on the offset of the motor action.

Using quantitative measures and Bayesian approaches, crossmodal temporal integration has been shown to follow the MLE model with some modifications. The main modification is that biases are explicitly acknowledged in sensory time estimates and in motor reproduction. Incorporating biases explicitly into the model yields high predictive power of the MLE account for crossmodal perceptual integration (Shi et al., 2010) and sensorimotor duration reproduction (Shi et al., 2013).
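A generic sketch of such reliability-weighted integration with explicit bias terms (illustrative numbers; not the exact model or parameters of the studies above):

import numpy as np

# Generic sketch of reliability-weighted (MLE-style) integration of two
# duration estimates with explicit bias terms; numbers are illustrative.
def integrate(d1, var1, bias1, d2, var2, bias2):
    """Combine two biased duration estimates by inverse-variance weighting."""
    w1 = (1 / var1) / (1 / var1 + 1 / var2)
    w2 = 1 - w1
    d_hat = w1 * (d1 - bias1) + w2 * (d2 - bias2)
    var_hat = 1 / (1 / var1 + 1 / var2)  # integrated variance <= each input
    return d_hat, var_hat

# E.g., a perceptual estimate (800 ms, sd 100 ms, unbiased) and a motor
# estimate (900 ms, sd 150 ms, with an assumed +50 ms reproduction bias):
print(integrate(800, 100**2, 0, 900, 150**2, 50))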
Temporally coinciding multisensory signals may boost perceptual reliability and saliency and, as a result, facilitate object discrimination and identification. Using an eye-tracking method, the study of Zou et al. (2012) revealed that audiovisual synchronous events can alter oculomotor behavior so as to boost search performance. An auditory tone automatically ‘freezes’ eye movements, extending the fixation duration. Such long fixations may allow participants to use more effective information sampling and saccade planning; as a result, search performance is greatly improved. In addition, the influences of contextual learning on search performance have been investigated in various studies.

The final research strand focuses on the perception of multimodal feedback delays in technical systems, such as multimodal telepresence systems. On the aspect of crossmodal simultaneity, the influences of packet loss and feedback delays on visual-haptic synchrony have been examined; the results indicate that packet loss induces an impression of stagnation, biasing crossmodal temporal perception. On the aspect of delay perception, several
studies (Rank et al., 2010) have been devoted to examining how the user’s action and the
haptic environment influence the temporal discrimination sensitivity for haptic feedback
delay. The findings indicated that delay perception is neither isolated within the user’s internal
representation nor determined solely by environmental characteristics: delay perception is formed during dynamic interaction with the environment and is influenced by both the user’s
own action and the external environment. Using predictive coding methods, a dynamic
optimal quality control scheme for a delayed-feedback visual-haptic telepresence system has
been developed. Follow-up behavioral studies indicated that such a user-oriented quality control
scheme could significantly reduce hazardous collisions during operation while keeping the
performance time at the same level.
The results of the cumulative research work also raise various further research questions. One challenging issue in multisensory temporal integration is biased temporal estimates.
It is common knowledge that time perception can easily be distorted by a variety of factors. Given that time processing is distributed, differential biases in different sensory time
estimates may cause an internal conflict of time representations. Our brain must continuously calibrate the related sensory estimates to keep internal consistency. The problem of which
modality should be calibrated arises when the sensory system has only noisy temporal estimates and their inconsistencies: it is generally impossible to determine which modality is
biased without additional information. How multisensory systems calibrate their temporal
biases would be an interesting future research issue.
Bayesian models have been used extensively for multisensory spatial integration and
unimodal dynamic adjustment. To date, some quantitative models, such as MLE, have
been applied to multisensory temporal integration, including in some of the studies reported here.
The MLE model, however, focuses only on sensory likelihoods and unbiased estimates,
without considering any prior knowledge. Recent studies of unimodal temporal perception
(Jazayeri & Shadlen, 2010) showed that prior knowledge actually heavily influences time
estimation. Given that time estimates are often biased, future research should also focus
on how prior knowledge influences multisensory duration integration.
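As a minimal illustration of how a prior could enter such models (a generic Gaussian prior-likelihood sketch, not a model from the studies above): with a Gaussian prior over durations and a Gaussian sensory likelihood, the posterior mean shifts the estimate toward the prior, reproducing the central-tendency bias reported by Jazayeri & Shadlen (2010):

# Generic Gaussian prior-likelihood combination for a duration estimate
# (illustrative numbers only).
mu_prior, var_prior = 750.0, 200.0**2   # prior over durations (ms)
d_sensed, var_sense = 1000.0, 150.0**2  # noisy sensory measurement (ms)

w = var_prior / (var_prior + var_sense)  # weight on the sensory evidence
d_posterior = w * d_sensed + (1 - w) * mu_prior
print(f"posterior duration estimate: {d_posterior:.0f} ms")  # pulled toward 750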
1.4 References
Alais, D., & Burr, D. C. (2004). The ventriloquist effect results from near-optimal bimodal
integration. Current biology, 14, 257–62.
Alais, D., Newell, F. N., & Mamassian, P. (2010). Multisensory processing in review:
from physiology to behaviour. Seeing and perceiving, 23, 3–38.
Angrilli, A., Cherubini, P., Pavese, A., & Manfredini, S. (1997). The influence of affective
factors on time perception. Attention, Perception, & Psychophysics, 59, 972–982.
Baldo, M. V., & Klein, S. A. (1995). Extrapolation or attention shift?. Nature, 378,
565–566.
Binda, P., Cicchini, G. M., Burr, D. C., & Morrone, M. C. (2009). Spatiotemporal
distortions of visual perception at the time of saccades. The Journal of Neuroscience, 29,
13147–57.
Bolia, R. S., D’Angelo, W. R., & Richard, L. (1999). Aurally Aided Visual Search
in Three-Dimensional Space. Human Factors: The Journal of the Human Factors and
Ergonomics Society, 41, 664–669.
Bruns, P., & Getzmann, S. (2008). Audiovisual influences on the perception of visual
apparent motion: exploring the effect of a single sound. Acta Psychol, 129, 273–283.
Bueti, D. (2011). The Sensory Representation of Time. Frontiers in Integrative Neuroscience, 5, 1–3.
Bueti, D., & Walsh, V. (2010). Memory for time distinguishes between perception and
action. Perception, 39, 81–90.
Bueti, D., Bahrami, B., & Walsh, V. (2008). Sensory and association cortex in time
perception. Journal of cognitive neuroscience, 20, 1054–62.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural
mechanisms of interval timing. Nature reviews. Neuroscience, 6, 755–65.
Burr, D. C., Banks, M. S., & Morrone, M. C. (2009). Auditory dominance over vision in the perception of interval duration. Experimental brain research. Experimentelle
Hirnforschung. Expérimentation cérébrale, 198, 49–57.
Chen, L., Shi, Z., & Müller, H. J. (2010). Influences of intra-and crossmodal grouping
on visual and tactile Ternus apparent motion. Brain research, 1354, 152–162.
Chen, L., Shi, Z., & Müller, H. J. (2011). Interaction of Perceptual Grouping and
Crossmodal Temporal Capture in Tactile Apparent-Motion. PLoS ONE, 6, 17130.
Chen, Y.-C., & Yeh, S.-L. (2009). Catch the moment: multisensory enhancement of
rapid visual events by sound. Experimental brain research. Experimentelle Hirnforschung.
Expérimentation cérébrale, 198, 209–19.
Colonius, H., & Arndt, P. (2001). A two-stage model for visual-auditory interaction in
saccadic latencies. Percept Psychophys, 63, 126–147.
Cunningham, D. W., Billock, V. A., & Tsou, B. H. (2001). Sensorimotor adaptation to
violations of temporal contiguity. Psychological Science, 12, 532–5.
Doyle, M. C., & Snowden, R. J. (1998). Facilitation of visual conjunctive search by
auditory spatial information. Perception, Supplementary, 27, 134.
Droit-Volet, S., Brunot, S., & Niedenthal, P. M. (2004). Perception of the duration of
emotional events. Cognition and Emotion, 18, 849–858.
Droit-Volet, S., Meck, W. H., & Penney, T. B. (2007). Sensory modality and time
perception in children and adults. Behavioural processes, 74, 244–50.
Eagleman, D. M. (2000). Motion Integration and Postdiction in Visual Awareness.
Science, 287, 2036–2038.
Engbert, K., Wohlschläger, A., & Haggard, P. (2008). Who is causing what? The sense
of agency is relational and efferent-triggered. Cognition, 107, 693–704.
Engbert, K., Wohlschläger, A., Thomas, R., & Haggard, P. (2007). Agency, subjective time, and other minds. Journal of Experimental Psychology: Human Perception and
Performance, 33, 1261–1268.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information
in a statistically optimal fashion. Nature, 415, 429–433.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends
in cognitive sciences, 8, 162–9.
Ernst, M. O., & Di Luca, M. (2011). Multisensory Perception: From Integration to
Remapping. In J. Trommershäuser, K. P. Körding, & M. S. Landy (Eds. & Trans.), Sensory
Cue Integration (pp. 224–250). New York: Oxford University Press.
Fendrich, R., & Corballis, P. M. (2001). The temporal cross-capture of audition and
vision. Perception & Psychophysics, 63, 719–25.
Ferrell, W. R. (1966). Delayed force feedback. Human factors, 8, 449–455.
Freeman, E., & Driver, J. (2008). Direction of Visual apparent motion driven solely by
timing of a static sound. Current biology, 18, 1262–1266.
Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual simultaneity. Nat Neurosci, 7, 773–8.
Ganzenmüller, S., Shi, Z., & Müller, H. J. (2012). Duration reproduction with sensory feedback delay: differential involvement of perception and action time. Frontiers in
Integrative Neuroscience, 6, 1–11.
Getzmann, S. (2007). The effect of brief auditory stimuli on visual apparent motion.
Perception, 36, 1089–103.
Geyer, T., Shi, Z., & Müller, H. J. (2010). Contextual cueing in multi-conjunction visual
search is dependent on color- and configuration-based intertrial contingencies. Journal of
Experimental Psychology: Human Perception and Performance.
Ghose, G. M., & Maunsell, J. H. R. (2002). Attentional modulation in visual cortex
depends on task timing. Nature, 419, 616–620.
Grondin, S. (1993). Duration discrimination of empty and filled intervals marked by
auditory and visual signals. Attention, Perception, & Psychophysics, 54, 383–394.
Gu, B. M., & Meck, W. H. (2011). New perspectives on Vierordt’s law: memory-mixing
in ordinal temporal comparison tasks. Multidisciplinary Aspects of Time and Time Perception, 6789, 67–78.
Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature neuroscience, 5, 382–5.
Hartcher-O’Brien, J., & Alais, D. (2011). Temporal ventriloquism in a purely temporal
context. Journal of experimental psychology. Human perception and performance, 37, 1383–
1395.
Hecht, E. (2002). Optics. Addison-Wesley.
Heron, J., Hanson, J. V. M., & Whitaker, D. (2009). Effect before cause: supramodal
recalibration of sensorimotor timing. PLoS ONE, 4, 7681.
Ivry, R. B., & Richardson, T. C. (2002). Temporal control and coordination: the multiple
timer model. Brain and cognition, 48, 117–32.
Jay, C., Glencross, M., & Hubbold, R. (2007). Modeling the effects of delayed haptic
and visual feedback in a collaborative virtual environment. ACM Trans. Comput.-Hum.
Interact., 14.
Jazayeri, M., & Shadlen, M. N. (2010). Temporal context calibrates interval timing.
Nature neuroscience, 13, 1020–6.
Jia, L., Shi, Z., Zang, X., & Müller, H. J. (2013). Concurrent emotional pictures modulate spatial-separated audiotactile temporal order judgments. Brain Research.
Keetels, M., Stekelenburg, J., & Vroomen, J. (2007). Auditory grouping occurs prior to
intersensory pairing: evidence from temporal ventriloquism. Experimental Brain Research,
180, 449–456.
Kennedy, J. S., Buehner, M. J., & Rushton, S. K. (2009). Adaptation to sensory-motor
temporal misalignment: instrumental or perceptual learning?. Quarterly journal of experimental psychology (2006), 62, 453–69.
Kim, T., Zimmerman, P. M., Wade, M. J., & Weiss, C. A. (2005). The effect of delayed
visual feedback on telerobotic surgery. Surgical endoscopy, 19, 683–686.
Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning.
Nature, 427, 244–247.
Levitin, D. J., MacLean, K., Mathews, M., & Chu, L. (2000). The perception of crossmodal simultaneity. International Journal of Computing and Anticipatory Systems, 323–
329.
MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in
interactive systems. Conference on Human Factors in Computing Systems, 488.
Matell, M., & Meck, W. H. (2004). Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Cognitive Brain Research, 21, 139–170.
Morein-Zamir, S., Soto-Faraco, S., & Kingstone, A. (2003). Auditory capture of vision:
examining temporal ventriloquism. Cognitive Brain Research, 17, 154–163.
Nijhawan, R. (1994). Motion extrapolation in catching. Nature, 370, 256–7.
Noulhiane, M., Mella, N., Samson, S., Ragot, R., & Pouthas, V. (2007). How Emotional
Auditory Stimuli Modulate Time Perception. Emotion, 7, 697–704.
Oruç, İ., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with
possibly correlated error. Vision Research, 43, 2451–2468.
Park, J., Schlag-Rey, M., & Schlag, J. (2003). Voluntary action expands perceived
duration of its sensory consequence. Experimental Brain Research, 149, 527–529.
Penney, T. B., Gibson, J. J., & Meck, W. H. (2000). Differential effects of auditory and
visual signals on clock speed and temporal memory. Journal of Experimental Psychology:
Human Perception and Performance, 26, 1770–1787.
Rank, M., Shi, Z., Hermann, M., & Hirche, S. (2013). Performance-optimal communication quality control for haptic telepresence systems. Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.
Rank, M., Shi, Z., Müller, H. J., & Hirche, S. (2010). Perception of delay in haptic
telepresence systems. Presence, 19.
Rank, M., Shi, Z., Müller, H. J., & Hirche, S. (2010). The influence of different haptic
environments on time delay discrimination in force feedback. Lecture Notes in Computer
Science, 6191, 205–212.
Roach, N. W., Heron, J., & McGraw, P. V. (2006). Resolving multisensory conflict:
a strategy for balancing the costs and benefits of audio-visual integration. Proceedings.
Biological sciences / The Royal Society, 273, 2159–68.
Scheier, C., Nijhawan, R., & Shimojo, S. (1999). Sound alters visual temporal resolution.
Investigative Ophthalmology & Visual Science, 40.
Schmolesky, M. T., Wang, Y., Hanes, D. P., Thompson, K. G., Leutgeb, S., Schall,
J. D., & Leventhal, A. G. (1998). Signal Timing Across the Macaque Visual System. J
Neurophysiol, 79, 3272–3278.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you
hear. Nature, 408, 788.
Shi, Z., & Nijhawan, R. (2012). Motion extrapolation in the central fovea. PloS one, 7,
33651.
Shi, Z., Chen, L., & Müller, H. J. (2010). Auditory temporal modulation of the visual
Ternus effect: the influence of time interval. Experimental Brain Research, 203, 723–35.
Shi, Z., Ganzenmüller, S., & Müller, H. J. (2013). Reducing bias in the duration reproduction by integrating reproduced signal. PloS one.
Shi, Z., Jia, L., & Müller, H. J. (2012). Modulation of tactile duration judgments by
emotional pictures. Frontiers in Integrative Neuroscience, 6, 1–9.
Shi, Z., Zang, X., Jia, L., Geyer, T., & Müller, H. J. (2013). Transfer of contextual
cueing in full-icon display remapping. Journal of Vision.
Shi, Z., Zou, H., & Müller, H. J. (2010). Temporal perception of visual-haptic events
in multimodal telepresence system. In M. H. Zadeh (Ed. & Trans.), Advances in Haptics
(pp. 437–450). InTech.
Shi, Z., Zou, H., Rank, M., Chen, L., Hirche, S., & Müller, H. J. (2010). Effects of
packet loss and latency on the temporal discrimination of visual-haptic events. Haptics,
IEEE Transactions on, 3, 28–36.
Spence, C., Nicholls, M. E., & Driver, J. (2001). The cost of expecting events in the
wrong sensory modality. Percept Psychophys, 63, 330–336.
Spence, C., Sanabria, D., & Soto-Faraco, S. (2007). Intersensory Gestalten and crossmodal scene perception. In K. Noguchi (Ed. & Trans.), Psychology of beauty and Kansei:
New horizons of Gestalt perception (pp. 519–579). Tokyo: Fuzanbo International.
Stetson, C., Cui, X., Montague, P. R., & Eagleman, D. M. (2006). Motor-sensory recalibration leads to an illusory reversal of action and sensation. Neuron, 51, 651–9.
Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., . . . Porter,
N. R. (2001). When is now? Perception of simultaneity. Proc Biol Sci, 268, 31–8.
Sugano, Y., Keetels, M., & Vroomen, J. (2010). Adaptation to motor-visual and motor-auditory temporal lags transfer across modalities. Experimental brain research. Experimentelle Hirnforschung. Expérimentation cérébrale, 201, 393–9.
Tomassini, A., Gori, M., Burr, D. C., Sandini, G., & Morrone, M. C. (2011). Perceived
duration of Visual and Tactile Stimuli Depends on Perceived Speed. Frontiers in integrative
neuroscience, 5, 51.
Van der Burg, E., Cass, J., Olivers, C. N. L., Theeuwes, J., & Alais, D. (2010). Efficient
visual search from synchronized auditory signals requires transient audiovisual events. PloS
one, 5, 10664.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2008). Pip
and pop: nonspatial auditory signals improve spatial visual search. Journal of experimental
psychology. Human perception and performance, 34, 1053–65.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2009). Poke
and pop: tactile-visual synchrony increases visual saliency. Neuroscience letters, 450, 60–4.
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: a tutorial
review. Attention, perception, & psychophysics, 72, 871–84.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: cross-modal
effects of auditory organization on vision. Journal of experimental psychology. Human perception and performance, 26, 1583–1590.
Walker, J. T., & Scott, K. J. (1981). Auditory–visual conflicts in the perceived duration
of lights, tones, and gaps. Journal of Experimental Psychology: Human Perception and
Performance, 7, 1327–1339.
Wearden, J. H. (2008). Slowing down an internal clock: implications for accounts of
performance on four timing tasks. Quarterly journal of experimental psychology (2006),
61, 263–74.
Wearden, J. H., Edwards, H., Fakhri, M., & Percival, A. (1998). Why “sounds are judged
longer than lights:” application of a model of the internal clock in humans. Q J Exp Psychol
B, 51, 97–120.
Welch, R. B. (1999). Meaning, attention, and the ’unity assumption’ in the intersensory
bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Müsseler
(Eds. & Trans.), Presence (Vol. 8, pp. 371–387). Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory
discrepancy. Psychological Bulletin, 88, 638–67.
Whitney, D., & Murakami, I. (1998). Latency difference, not spatial extrapolation.
Nature Neuroscience, 1, 656–657.
Yarrow, K., Haggard, P., Heal, R., Brown, P., & Rothwell, J. C. (2001). Illusory perceptions of space and time preserve cross-saccadic perceptual continuity. Nature, 414, 302–5.
Zou, H., Müller, H. J., & Shi, Z. (2012). Non-spatial sounds regulate eye movements
and enhance visual search. Journal of Vision, 12, 1–18.
van Wassenhove, V., Buonomano, D. V., Shimojo, S., & Shams, L. (2008). Distortions
of Subjective Time Perception Within and Across Senses. PLoS ONE, 3, 1437.