Dynamic Temporal Processing of Multisensory Information
Dynamic Temporal Processing of Multisensory Information

Habilitation submitted to the Fakultät für Psychologie und Pädagogik of the Ludwig-Maximilians-Universität München by Dr. Zhuanghua Shi. München, 15 Jan. 2013.

To my family ...

Contents

1 Synopsis
1.1 Introduction
1.1.1 Multisensory spatial integration
1.1.2 Multisensory temporal integration
1.1.3 Multisensory time perception
1.1.4 Sensorimotor recalibration and delay perception
1.1.5 Multisensory enhancement and perceptual learning in visual search
1.1.6 Multimodal feedback delay in human-machine interaction (HMI)
1.2 Cumulative research work
1.2.1 Multisensory temporal integration and motion perception
1.2.2 Multisensory and sensorimotor time perception
1.2.3 Multisensory enhancement, context learning and search performance
1.2.4 Delays in multimodal feedback and user experience
1.3 Summary and outlook
1.4 References

2 Scientific Publications (Wissenschaftliche Veröffentlichungen)
2.1 List of publications (2009-2013)
2.2 Part I: Multimodal temporal integration and motion perception
2.2.1 Audiovisual Ternus apparent motion
2.2.2 Perceptual grouping and crossmodal apparent motion
2.2.3 Auditory capture of tactile apparent motion
2.3 Part II: Multimodal time perception
2.3.1 Auditory reproduction
2.3.2 Feedback delay and duration reproduction
2.3.3 Emotional modulation of tactile duration
2.3.4 Emotional modulation of audiotactile TOJ
2.3.5 Simultaneity in schizophrenia patients
2.4 Part III: Multimodal enhancement, perceptual learning and search performance
2.4.1 Eye movements and the pip-and-pop effect
2.4.2 Contextual cueing in multiconjunction search
2.4.3 Transfer of contextual cueing in full-icon display remapping
2.5 Part IV: Delays in multimodal processing and user experience
2.5.1 Neural latencies and motion extrapolation in the central fovea
2.5.2 Delay in haptic telepresence systems
2.5.3 Effects of packet loss and latency on visual-haptic TOJs
2.5.4 Temporal perception of visual-haptic events
2.5.5 Delay perception in different haptic environments
2.5.6 Optimization for haptic delayed telepresence systems

Acknowledgements
Curriculum Vitae (Lebenslauf)

1 Synopsis

1.1 Introduction

1.1.1 Multisensory spatial integration

Signals from the natural environment are highly redundant, since we perceive the external world via multiple senses.
For example, when knocking on a door we not only hear a sound and see a hand movement, but also perceive touch from the knocking hand. The multisensory nature of the world is highly advantageous: it increases perceptual reliability and saliency, and, as a result, enhances object discrimination and identification and facilitates reactions to the external world (Vroomen & Keetels, 2010). However, the multisensory nature of the world also raises complex integration and segregation problems. For instance, how does our brain sort through relevant and irrelevant signals to form a coherent multisensory perception? Imagine you are chatting with your friends in a coffee bar: you hear multiple voices and see lip movements simultaneously. You usually identify and combine this information correctly without any difficulty, but sometimes you may fail to integrate, or may mis-combine, different faces and voices. The same happens when you are watching a movie in a cinema: you believe that the voices are coming from the actors' lips, although they are delivered by the loudspeakers on the sidewalls. This is known as the ventriloquist effect. Such an audiovisual speech illusion is only one example of multisensory integration, and there are many others. To take another example, when a single flash is accompanied by two beeps, the single flash is often perceived as two flashes (Shams, Kamitani, & Shimojo, 2000).

Over the past few decades, much progress has been made in research on multisensory perception, particularly in spatial integration. The most common account of multisensory integration is the modality appropriateness or modality precision hypothesis (Welch & Warren, 1980). The hypothesis states that the sensory modality with the highest acuity outweighs the others in multisensory integration. For example, vision with its high spatial resolution may dominate over audition in spatial perception, which explains why the position of an auditory stimulus is often captured by a simultaneous visual stimulus. In recent years, probabilistic models, such as maximum likelihood estimation (MLE) (Alais & Burr, 2004; Ernst & Banks, 2002; Ernst & Bülthoff, 2004), have been developed to provide quantitative accounts of multisensory integration. The MLE model assumes that different sensory inputs are assigned differential weights, with each weight set in proportion to the reliability of the corresponding sensory estimate. Using this approach, the final multisensory estimate has minimal variance (in other words, maximal reliability). Consider a bimodal source (e.g. an audiovisual signal) that produces two sensory cues (e.g. positional cues), estimated by the auditory and visual systems as $(\hat{S}_a, \hat{S}_v)$. The MLE model predicts that the final audiovisual estimate $\hat{S}$ is:

$\hat{S} = w_a \hat{S}_a + w_v \hat{S}_v$,   (1.1)

where $w_a = \frac{1/\sigma_a^2}{1/\sigma_a^2 + 1/\sigma_v^2}$, $w_v = 1 - w_a$, and $\sigma_a^2$, $\sigma_v^2$ are the variances of the auditory and visual sensory estimates (see Figure 1.1).

Figure 1.1: A normative MLE model of audiovisual integration. The audiovisual estimate $\hat{S}$ is the linear combination of $\hat{S}_a$ and $\hat{S}_v$, with each weight set in proportion to its reliability.

The reliability-based integration models have successfully predicted multisensory integration in many situations, such as visual-haptic size estimation and audio-visual localization (for a review, see Alais, Newell, & Mamassian, 2010).
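To make the weighting in Equation (1.1) concrete, here is a minimal Python sketch (illustrative only; the function and variable names are mine, not from the cited studies) that fuses two unimodal estimates and returns the predicted variance of the fused estimate:

    def mle_integrate(s_a, s_v, var_a, var_v):
        # Reliability-weighted (MLE) fusion of two unimodal estimates,
        # Eq. (1.1): each weight is proportional to 1/variance.
        w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
        w_v = 1 - w_a
        s_hat = w_a * s_a + w_v * s_v
        # The fused variance is never larger than either input variance.
        var_hat = 1 / (1 / var_a + 1 / var_v)
        return s_hat, var_hat

    # Example: a reliable visual position estimate dominates a noisy
    # auditory one, as in the spatial ventriloquist effect.
    print(mle_integrate(s_a=10.0, s_v=2.0, var_a=16.0, var_v=1.0))
    # -> (approx. 2.47, approx. 0.94)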
The MLE model has recently been extended to a more general Bayesian framework, in which prior knowledge about the multisensory information is incorporated (Ernst & Di Luca, 2011; Körding & Wolpert, 2004; Roach, Heron, & McGraw, 2006). Using priors allows Bayesian models to predict both multisensory integration and multisensory segregation (Ernst & Di Luca, 2011; Roach et al., 2006). Note that the quantitative models mentioned above were derived from studies of multisensory spatial integration (Alais & Burr, 2004; Ernst & Banks, 2002); the evidence is rather mixed as to whether they also apply to multisensory temporal integration (see the next section, 1.1.2).

1.1.2 Multisensory temporal integration

A precondition for multisensory integration is the assumption of unity, which holds that multisensory integration makes sense only when the perceptual system has evidence that the multiple signals (events) originate from a common source (Welch, 1999; Welch & Warren, 1980). Without doubt, the most important factors for perceiving a common source are spatial proximity and temporal coincidence: multisensory integration occurs when the sensory signals originate from proximal locations and reach the brain at around the same time. Otherwise, the sensory signals are likely to be perceived as separate events.

Similar to multisensory spatial integration, multisensory events with temporal coincidence may interact and integrate, forming a coherent temporal percept. Studies on audiovisual interaction in temporal order judgments (Morein-Zamir, Soto-Faraco, & Kingstone, 2003; Scheier, Nijhawan, & Shimojo, 1999) found that the temporal discrimination threshold for visual events can be altered by adding two auditory clicks: when the first click occurs slightly before the first flash and the second click shortly after the second flash, visual temporal resolution is enhanced, as if the two clicks pull the two flashes apart. This has been termed the temporal ventriloquist effect, analogous to the spatial ventriloquist effect. Various forms of the temporal ventriloquist effect have since been found using different paradigms (Fendrich & Corballis, 2001; Getzmann, 2007; Keetels, Stekelenburg, & Vroomen, 2007). For example, Getzmann (2007) used classical apparent motion to investigate how brief beeps alter visual apparent motion. He found that, similar to the temporal ventriloquist effect, beeps presented before the first and after the second visual flash, as well as simultaneously presented sounds, reduced the motion impression, whereas sounds intervening between the two visual flashes facilitated apparent motion relative to the baseline (visual flashes without sounds). The common explanation for multisensory temporal integration parallels the account of multisensory spatial integration (i.e., the traditional modality precision hypothesis, Welch & Warren, 1980): the auditory modality has higher temporal resolution than the other modalities, and as a result auditory information dominates the final temporal percept.

Note that temporal coincidence can be influenced by physical temporal discrepancies (e.g., sound and light travel at different speeds) and by differential neural processing times among the sensory modalities. Scientists are well aware of such neural latency differences.
For example, auditory stimuli are often perceived faster than visual stimuli (Levitin, MacLean, Mathews, & Chu, 2000; Stone et al., 2001), whereas for touch the site of stimulation has to be taken into account, because the travel time to the brain is longer from the toes than from the forehead (Vroomen & Keetels, 2010). Although latencies differ across modalities, the perceptual system still promotes a temporally coherent and unified perception to a certain degree. It is thus essential to investigate the compensation mechanisms of the perceptual system. This ability of the perceptual system to compensate for different latencies has been referred to as the temporal window of multisensory integration. To date, it has been shown that the temporal integration window depends on many factors, such as modality, training, and attentional bias. For example, Fujisaki et al. have shown that training and adaptation can alter the crossmodal simultaneity window (Fujisaki, Shimojo, Kashino, & Nishida, 2004), and Spence and colleagues have demonstrated that attention can also shift the integration window (Spence, Nicholls, & Driver, 2001). However, most of the aforementioned studies have examined crossmodal integration of the audio-visual modalities; research on touch-related multisensory temporal integration is relatively scarce, and the temporal integration mechanisms involving touch still need to be investigated further.

Besides spatial proximity and temporal coincidence, other Gestalt grouping principles, such as common fate or common feature, may also lead to a coherent percept. More recently, it has been shown that perceptual grouping in general can be an influential factor in multisensory perception (Spence, Sanabria, & Soto-Faraco, 2007). For example, unimodal auditory grouping and segregation (i.e., pop-out pips) can enhance the discrimination of concurrent visual events (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Vroomen & de Gelder, 2000) or temporal order judgments (Keetels et al., 2007). In a study of audiovisual interaction in visual apparent motion, Bruns and Getzmann revealed that either a continuous sound filling the gap between two flashes or a short sound intervening between the two flashes promotes crossmodal grouping of movement, which enhances perceived visual motion continuity (Bruns & Getzmann, 2008). However, little is known about how unimodal and crossmodal grouping interact to modulate multisensory temporal integration.

1.1.3 Multisensory time perception

The perception of time, in particular time across multiple senses, is not straightforward, since there is no dedicated sensory organ for time. Nevertheless, the traditional centralized and amodal internal clock model has dominated the field of time perception over the last 30 years (Bueti, 2011). The model consists of a pacemaker emitting pulses at a certain rate and a switch that can open and close to permit an accumulator to collect the emitted pulses. More recently, evidence has accumulated that challenges this one-centralized-clock amodal model. For instance, the amodal account cannot explain differential modality-specific pacemaker rates (Droit-Volet, Meck, & Penney, 2007; Penney, Gibson, & Meck, 2000; Wearden, Edwards, Fakhri, & Percival, 1998). Neurophysiological evidence, on the other hand, suggests that separate brain regions are devoted to visual and auditory duration processing (Bueti, Bahrami, & Walsh, 2008; Ghose & Maunsell, 2002).
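To illustrate the pacemaker-accumulator idea, and how modality-specific pacemaker rates would translate into modality-specific duration judgments, here is a toy Python simulation (the pulse rates are arbitrary illustrative values, not estimates from the literature):

    import numpy as np

    rng = np.random.default_rng(seed=1)

    def accumulated_ticks(duration_s, rate_hz, n_trials=10000):
        # While the switch is closed, the accumulator collects pulses
        # emitted by a Poisson pacemaker running at rate_hz.
        return rng.poisson(rate_hz * duration_s, size=n_trials)

    # A hypothetically faster 'auditory' pacemaker accumulates more ticks
    # than a slower 'visual' one for the same 1-s physical duration, so
    # the auditory duration would be judged as longer.
    auditory = accumulated_ticks(1.0, rate_hz=25.0)
    visual = accumulated_ticks(1.0, rate_hz=20.0)
    print(auditory.mean(), visual.mean())  # approx. 25 vs. 20 ticks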
The amodal model also has difficulty explaining, for example, why temporal discrimination is better for audition than for vision (Grondin, 1993), and why an auditory duration is judged as longer than a physically identical visual duration (Wearden et al., 1998). This recent evidence suggests that time perception is instead distributed across brain areas and sensory modalities (Bueti, 2011; Ivry & Richardson, 2002; Matell & Meck, 2004).

Since time processing is distributed across modalities, studies on crossmodal time judgments have revealed rather complex and inconclusive results. For instance, it has been shown that the duration of auditory events was lengthened or shortened by the presence of conflicting looming or receding visual information, while the perceived duration of visual events was unaffected by auditory looming or receding stimuli (van Wassenhove, Buonomano, Shimojo, & Shams, 2008). Other studies, using static stimuli or implicit measures, have reported the opposite result, namely that perceived visual duration is affected by a concurrent auditory duration (e.g., Y.-C. Chen & Yeh, 2009).

Unlike spatial perception, time perception can be distorted dramatically by emotional states. For instance, when involved in an accident, such as a car crash, people often report that they felt the world slow down. Research suggests that high-arousal stimuli, such as threatening pictures, are often perceived as longer in duration than neutral stimuli (Droit-Volet et al., 2007). This emotion-induced lengthening effect has been confirmed in the visual (Angrilli, Cherubini, Pavese, & Manfredini, 1997; Droit-Volet, Brunot, & Niedenthal, 2004) and auditory (Noulhiane, Mella, Samson, Ragot, & Pouthas, 2007) modalities. Although there is now ample evidence of how emotion distorts duration perception, most studies have focused only on unisensory modulation. Given that time perception is a distributed process, there is still only scant understanding of how emotion induced in one sensory modality influences time perception in another modality.

1.1.4 Sensorimotor recalibration and delay perception

Time perception can be influenced by action, too (Bueti & Walsh, 2010; Cunningham, Billock, & Tsou, 2001; Stetson, Cui, Montague, & Eagleman, 2006). Stetson et al. (2006) demonstrated that, following brief exposure to delayed visual feedback of a voluntary action, the onset of the action-feedback signal is perceived as occurring earlier than the action itself once the delay is removed. The effect has been attributed to a dynamic shift of the feedback event toward the onset of the action, in order to maintain an appropriate perception of causality. Related studies have confirmed that a delayed sensory effect is perceived as having appeared slightly earlier in time if it follows a voluntary action - a phenomenon referred to as intentional binding. Intentional binding also attracts the voluntary action toward its sensory effect, so that the action is perceived as having occurred slightly later in time, and the feedback delay is perceived as shorter than the actual delay (Engbert, Wohlschläger, & Haggard, 2008; Engbert, Wohlschläger, Thomas, & Haggard, 2007; Haggard, Clark, & Kalogeras, 2002). The shortening effect has been attributed to a transient slowdown of an internal clock after a voluntary action, as a result of which fewer ticks are accumulated (Wearden, 2008).
This shortening effect might be reinforced by everyday experience, which leads us to assume sensorimotor synchrony between the start of a motor action and its sensory consequence (Heron, Hanson, & Whitaker, 2009). However, whether sensorimotor temporal recalibration is due to timing changes in the motor system or in the perceptual system is still controversial. Some researchers have suggested that sensorimotor temporal recalibration is induced mainly by a temporal shift in the motor system (Sugano, Keetels, & Vroomen, 2010), whereas others have attributed it to pure perceptual learning (Kennedy, Buehner, & Rushton, 2009).

1.1.5 Multisensory enhancement and perceptual learning in visual search

Temporally coinciding multisensory events, such as synchronous audiovisual signals, are easily picked out by our brain amongst other objects or events in the environment. For example, a car collision with a loud 'bang' easily attracts our attention to the accident spot. Such enhancement may come about as a result of redundant target coding and multisensory perceptual saliency. Multisensory enhancement and facilitation have been shown in various search paradigms in which a visual target was accompanied by a sound signal (Bolia, D'Angelo, & Richard, 1999; Doyle & Snowden, 1998; Van der Burg, Cass, Olivers, Theeuwes, & Alais, 2010). For example, Doyle and Snowden (1998) found that a simultaneous, spatially congruent sound facilitated covert orienting to non-salient visual targets in a conjunction search paradigm. Interestingly, multisensory enhancement of visual perception and search performance has been found not only with spatially informative, but also with merely temporally informative auditory (Van der Burg et al., 2010; 2008; Vroomen & de Gelder, 2000) or tactile signals (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2009). For instance, Vroomen and de Gelder (2000) investigated crossmodal influences from the auditory onto the visual modality at an early level of perceptual processing. In their study, a visual target was embedded in a rapidly changing sequence of visual distractors. They found that a high tone embedded in a sequence of low tones improved the detection of a synchronously presented visual target, while this enhancement was reduced or abolished when the high tone was presented asynchronously to the visual target or became part of a melody. Using a dynamic visual search paradigm, Van der Burg et al. demonstrated that irrelevant beeps could guide visual attention toward the location of a synchronized visual target which, if presented without such synchronous beeps, was extremely hard to find (Van der Burg et al., 2008). With the aid of synchronous beeps, search performance improved substantially (in fact, on the order of seconds). Van der Burg et al. referred to this facilitation as the pip-and-pop effect. However, when the synchronized tones were not transient but smooth (e.g., shaped by a sine-wave envelope), the pip-and-pop effect vanished, suggesting that the transient character of the auditory signals is important (Van der Burg et al., 2010). To date, the underlying mechanisms of, and the linkage between, multisensory enhancement and search performance are still not well understood, and they deserve further investigation.

1.1.6 Multimodal feedback delay in human-machine interaction (HMI)

Multisensory time processing has critical implications for human-machine interaction, particularly in multimodal virtual reality systems.
Multimodal virtual reality systems have been adopted in a variety of applications, such as remote virtual conferencing, telesurgery, and teleoperation in space and under water. In a typical multimodal telepresence system, multimodal information flows bilaterally between the local and remote sites: users not only receive information from the remote site, but also send multimodal commands (e.g., audiovisual streams as well as haptic actions). However, owing to the communication distance, data encoding, and control scheme, communication delays between the local and remote sites are inevitable. These delays can vary from dozens of milliseconds to seconds. For instance, the feedback latency for an intercontinental teleoperation via the Internet is on average 300 ms, while the latency can be up to 5-10 seconds for teleoperation tasks in space. In addition, delays may differ among modalities. Thus, remote multimodal synchronous events, such as a visual-haptic collision, may turn into local asynchronous incidents, and a normally immediate action-effect turns into a delayed action-effect as well.

The effect of time delay on simple task performance has been investigated in several studies. For example, examining the effect of visual-feedback delay on users' task completion time, MacKenzie and Ware found that performance was affected by delays exceeding 75 ms, with completion time thereafter increasing linearly with time delay (> 75 ms) and task difficulty (MacKenzie & Ware, 1993). Similar effects have been confirmed for various modalities, such as delays in visual feedback (Kim, Zimmerman, Wade, & Weiss, 2005), haptic feedback (Ferrell, 1966), and visual-haptic feedback (Jay, Glencross, & Hubbold, 2007). While many studies of time delays have examined issues related to task performance, there are relatively few studies on delay perception per se in multimodal virtual reality systems. Arguably, knowing humans' capability of perceiving delays is useful for providing system designers with guidelines for the development of multimodal communication protocols, as well as for human-centered evaluations of existing applications with respect to system fidelity and user experience.

1.2 Cumulative research work

As alluded to above, there are several open key issues in multimodal temporal processing. During my habilitation period, I have focused on the following four research topics:

1. Multisensory temporal integration and motion perception: Using various apparent motion paradigms, the studies in this research topic extended previous work on multisensory temporal integration at a point in time to multisensory interval (duration) integration, and revealed that quantitative models, such as MLE, can predict multisensory interval estimation very well. In addition, the influence of crossmodal grouping principles on multisensory integration has been extensively investigated.

2. Multisensory time perception: In this topic, various studies were conducted on multisensory duration perception, particularly on issues of sensorimotor duration perception and crossmodal emotional modulation of time perception.

3. Multisensory enhancement, context learning and search performance: The third line of research focused on how audiovisual synchronous events and contextual cueing boost visual search performance. Eye tracking was applied in these studies to reveal how synchronous audiovisual events influence oculomotor behavior. In addition, context learning in general has been examined.
4. Multimodal feedback delay and user experience: Feedback delay is ubiquitous in applied multimodal systems involving large data transmission, such as telepresence. The influence of delay on multisensory perception and user experience is the main focus of this last research agenda. Various studies have been conducted to identify the impact of delays in visual-haptic environments on the perception of multisensory simultaneity and on users' operation performance. Based on these fundamental findings, performance optimization methods have been proposed.

1.2.1 Multisensory temporal integration and motion perception

Most studies on multisensory temporal integration follow the traditional approach of multisensory spatial integration (such as the spatial ventriloquist effect), focusing on crossmodal temporal capture at a point in time (e.g., the temporal ventriloquist effect). The common finding is that the onset time of a visual event is perceived to be aligned with the onset of an auditory event that appears temporally near the visual event (Burr, Banks, & Morrone, 2009; Freeman & Driver, 2008; Getzmann, 2007; Morein-Zamir et al., 2003; Scheier et al., 1999). However, the temporal ventriloquist effect is manifested only with paired audiovisual stimuli: several studies have shown that a single sound leaves visual temporal-order judgments (TOJs) uninfluenced (Morein-Zamir et al., 2003; Scheier et al., 1999). This has been taken to suggest that two sounds are required for the audiovisual stimuli to be perceived as unitary events.
Arguably, however, two beeps clearly define an auditory interval, which - in contrast to a point in time - is another feature of time perception. Moreover, paired stimuli can easily form a perceptual group, which may exert a further influence on multisensory temporal integration.

To investigate the influence of a sound interval on audiovisual temporal integration, we adopted a typical Ternus apparent motion paradigm (Shi, Chen, & Müller, 2010).¹ The Ternus apparent motion is produced by presenting two sequential visual frames; each frame consists of two horizontal dots, and the two frames, when overlaid, share one common dot at the center. Observers typically report two distinct percepts depending on the inter-stimulus onset interval (ISOI): element motion and group motion. Short ISOIs usually give rise to the percept of element motion, that is, the outer dots are perceived as moving while the center dot appears to remain static or flashing. By contrast, long ISOIs give rise to the perception of group motion: the two dots are perceived to move together as a group (see Figure 1.2). The transition threshold between element motion and group motion, measured with two-alternative forced choice (2AFC), is relatively stable when the spatial configuration is fixed.

¹ In most studies I have collaborated with my colleagues and doctoral students; thus, I prefer the word 'we' to 'I' in this report. At other times, I use the words 'we' and 'you' to refer to a generic third person; the intended meaning should be clear from the context.
Figure 1.2: Schematic representation of the Ternus apparent motion. (a) Element motion percept: the 'center' dot, which occupies the same position in both frames, is perceived to remain in the same location, while the 'outer' dots (the remaining two dots) are perceived to move from one location to the other. (b) Group motion percept: the two dots are perceived to move together, in a manner consistent with the physical displacement.

Using Ternus apparent motion, we could measure audiovisual integration implicitly, by observing shifts of the transition threshold. In the study (Shi et al., 2010), we systematically investigated the influences of paired beeps and of a single beep in three different audiovisual temporal configurations. In the paired-beeps conditions, the auditory gap intervals were clearly defined. Similarly to previous temporal ventriloquist studies (Morein-Zamir et al., 2003; Scheier et al., 1999), we found audiovisual interval capture effects: when the first sound preceded the first visual frame and the second sound trailed the second visual frame by 30 ms, more group motion responses were observed compared to the baseline condition (two sounds presented synchronously with the visual frames), and the opposite effect was found when the two sounds were presented in between the two visual frames (see Figure 1.3). However, these audiovisual capture effects were almost gone when one beep was removed (either the first or the second, see Figure 1.4), which strongly suggests that the auditory interval is a key factor in audiovisual temporal integration. Further experiments quantified this audiovisual interval integration using direct audiovisual interval comparisons. Auditory intervals were typically perceived as longer than visual intervals of the same physical length, and the perceived audiovisual interval was predicted by the MLE model, indicating that auditory and visual intervals are integrated in a way that is optimal in terms of variability.

Figure 1.3: Psychometric curves fitted for the paired-beeps conditions. The solid curve and circles represent the baseline 'synchronous-sounds' condition, the dashed curve and crosses the 'outer-sounds' condition, and the dash-dotted curve and pluses the 'inner-sounds' condition.
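For readers unfamiliar with this kind of analysis, the following minimal Python sketch (with made-up response proportions, not data from the study) shows how such a transition threshold can be estimated by fitting a logistic psychometric function to the proportion of 'group motion' responses across ISOIs; shifts of the fitted 50% point between sound conditions would then index the audiovisual capture effect:

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical data: proportion of 'group motion' responses per ISOI (ms)
    isoi = np.array([50.0, 100, 150, 200, 250, 300, 350])
    p_group = np.array([0.05, 0.12, 0.35, 0.60, 0.82, 0.93, 0.97])

    def logistic(x, pse, slope):
        # Logistic psychometric function; pse is the ISOI at which
        # 'group motion' is reported on 50% of trials.
        return 1.0 / (1.0 + np.exp(-(x - pse) / slope))

    (pse, slope), _ = curve_fit(logistic, isoi, p_group, p0=[175.0, 30.0])
    print(f"transition threshold (PSE): {pse:.1f} ms, slope: {slope:.1f}")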
In another study (Chen, Shi, & Müller, 2010), we examined how perceptual grouping in general influences crossmodal temporal processing, using the same Ternus paradigm. Instead of the audiovisual modalities, in this study we used visual and tactile Ternus apparent motion, given that we intended to examine bidirectional interactions and Ternus apparent motion can only be constructed in the visual or the tactile modality. The tactile Ternus apparent motion was created with three tactile solenoids, which tapped three fingers to induce indentation taps; the visual apparent motion was constructed with three LEDs placed near the three solenoids. In the study, we introduced intra- and cross-modal temporal grouping of the middle element (either tactile or visual) by presenting the middle element twice, with rhythmic or short precue intervals, prior to the Ternus display.

Figure 1.4: Psychometric curves fitted for the single-beep conditions. The solid curve and circles represent the baseline 'synchronous-sound' condition, the dashed curve and crosses the 'preceding-sound' condition (audiovisual interval 30 ms), and the dash-dotted curve and pluses the 'trailing-sound' condition (audiovisual interval -30 ms). The magnitude of the temporal ventriloquist effects (TVEs), calculated against the baseline, is presented in a subplot for the 'preceding-sound' (30 ms) and 'trailing-sound' (-30 ms) conditions.

We found that with intramodal temporal grouping of the middle element, Ternus apparent motion was biased toward element motion, whereas there was no such effect of crossmodal grouping on Ternus apparent motion with the same temporal settings. This indicates that intramodal temporal grouping promotes the saliency of the middle element, which leads to more element motion responses; the effect, however, was too weak to be manifested in the crossmodal temporal grouping conditions.

Along this line of research, we further investigated the influences of crossmodal timing and event structure on intra- and cross-modal perceptual grouping (Chen, Shi, & Müller, 2011). In that study we used bi-stable two-tap tactile apparent motion streams. Since the two tactile taps were presented repeatedly with the same inter-stimulus interval, the leftward and rightward motion percepts were bi-stable, that is, the two mutually exclusive perceptual states switched equally often and unpredictably. During the 90-second tactile motion stream, mono beeps were added and paired with the tactile taps at various temporal asynchronies. When each tactile tap was paired with one beep, we found a typical temporal ventriloquist effect, as in our earlier study (Shi et al., 2010): auditory intervals captured the paired tactile intervals. As a result, two taps separated by a short audiotactile interval were grouped together, forming a dominant tactile motion percept. However, when only half of the taps (e.g., the odd-numbered taps) were paired with beeps, the modulation by audiotactile temporal asynchronies was diminished. Instead of a temporal capture effect, a dominant percept of motion from the audiotactile side to the tactile-only side was observed, independently of the crossmodal asynchrony variation. This was mainly due to a strong attentional bias toward the side of the crossmodal grouping, giving rise to apparent tactile motion from the side of the audiotactile grouping to the other side.

Taking these studies together, we now have a clearer view of how crossmodal intervals and perceptual grouping influence multisensory temporal integration. The temporal ventriloquist effect has been manifested repeatedly for fully paired crossmodal stimuli, and convergent evidence suggests that crossmodal interval/duration integration is one important factor behind it. On the other hand, when the crossmodal stimuli are unequally paired, perceptual grouping (either intra- or cross-modal) may be processed first, which leads to dynamic attention shifts that bias the motion percept.

1.2.2 Multisensory and sensorimotor time perception

Although distributed models of time perception have gradually been accepted in multisensory time research (Bueti, 2011; Buhusi & Meck, 2005), it is still controversial how the distributed (or modality-specific) timing is integrated. Distributed timing processes may cause differences between action and perception time, which have been only sparsely mentioned in the literature. For example, Walker and Scott found that motor reproduction relying only on kinesthetic information (i.e., action timing) overestimated an auditory standard duration by about 12 percent (Walker & Scott, 1981). In a recent study (Bueti & Walsh, 2010), an action task, in which participants reproduced an auditory or visual duration by pressing a button, was compared to a perceptual task, in which participants stopped the comparison signal when its perceived duration reached the same amount of time as the standard duration. Action timing was strongly overestimated for short durations and underestimated for long durations. Other studies have demonstrated that a second presented immediately after a saccade or arm movement is often perceived as longer than subsequent seconds (but see Binda, Cicchini, Burr, & Morrone, 2009; Park, Schlag-Rey, & Schlag, 2003; Yarrow, Haggard, Heal, Brown, & Rothwell, 2001).
Given that action and perceived time are far from veridical and time estimation can easily be biased by various factors, our brain must face challenges in integrating various sources of temporal information to enable accurate timing of a multisensory or sensorimotor event. In a recent study (Shi, Ganzenmüller, & Müller, 2013), we investigated this issue using three different duration estimation tasks: auditory duration comparison, motor reproduction, and auditory reproduction. The auditory duration comparison and motor reproduction tasks aimed to measure perceptual and action time processing, respectively, whereas the auditory reproduction task was a bimodal (i.e., perceptual and motor) task, which aimed to reveal how perceptual and action durations are integrated. We measured estimation variability in all three tasks. In the spatial domain, reliability-based optimal integration models, such as MLE (Equation 1.1), have successfully predicted the effects of multimodal integration in various cases, such as visual-haptic size estimation and audiovisual localization (for a review, see Ernst & Di Luca, 2011). In one of our previous studies using an implicit measure (Shi et al., 2010), we also found that the MLE model predicts audiovisual duration integration well. We further tested the reliability-based integration model for sensorimotor temporal integration (Shi et al., 2013), in particular for auditory reproduction. In contrast to the previous approach using the implicit assumption of unbiased estimates,² we explicitly introduced biases into the quantitative model. Suppose there is a standard auditory duration $D_s$. An auditory estimate $\hat{D}_a$, derived from a duration comparison task, may contain a bias $\varepsilon_a$. A pure motor reproduction, on the other hand, may lead to a different estimate $\hat{D}_r$, containing a different bias $\varepsilon_r$. That is,

$E(\hat{D}_a) = D_s + E(\varepsilon_a)$,   (1.2)
$E(\hat{D}_r) = D_s + E(\varepsilon_r)$,   (1.3)

where $E(\cdot)$ is the expectation function. In auditory reproduction, both perceptual auditory comparison and motor reproduction are involved. Supposing the perceptual and motor estimates are independent of each other, the maximum likelihood prediction for the auditory reproduction is:

$E(\hat{D}_{ar}) = D_s + w_a E(\varepsilon_a) + w_r E(\varepsilon_r)$,   (1.4)

where $w_a$ and $w_r$ are the weights of the perceptual and motor estimates. According to MLE, the optimal weights should be inversely proportional to the corresponding variances:

$w_a = \frac{1/\sigma_a^2}{1/\sigma_a^2 + 1/\sigma_r^2}$,   (1.5)
$w_r = 1 - w_a$.   (1.6)

If the optimal weighting rule is followed, the variance of the auditory reproduction, $\sigma_{ar}^2$, should also be lower than the variances of the pure perceptual and motor estimates, $\sigma_a^2$ and $\sigma_r^2$.

² For Bayesian integration models, disregarding biases allows one to focus on minimizing variance as an optimality criterion. In some studies (e.g., Burr et al., 2009), biases are assumed to be constant across all conditions.

Using one-second auditory intervals as the standard stimuli in the three different duration tasks, we confirmed the previous finding of overestimation in motor reproduction (Walker & Scott, 1981). In our case, motor reproduction produced about 40% overestimation, whereas the auditory comparison task provided a relatively precise estimation (Figure 1.5). We further compared the reliability-based MLE predictions with the observed behavioral results and found relatively high correlations between prediction and observation for both auditory reproduction (r = 0.62) and its variability (r = 0.68). Similar conclusions were confirmed by a subsequent experiment with varied standard durations and varied signal-to-noise ratios (SNRs) in the compared/reproduced tones (Figure 1.6, r = 0.81).
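As a concrete illustration of Equations (1.2)-(1.6), the following minimal Python sketch (with made-up parameter values; this is not the analysis code of the study) computes the predicted mean and variance of the auditory reproduction from the biases and variances of the unimodal estimates:

    def predict_reproduction(d_s, bias_a, var_a, bias_r, var_r):
        # Biased MLE prediction for auditory reproduction (Eqs. 1.4-1.6).
        w_a = (1 / var_a) / (1 / var_a + 1 / var_r)   # Eq. (1.5)
        w_r = 1 - w_a                                 # Eq. (1.6)
        mean_ar = d_s + w_a * bias_a + w_r * bias_r   # Eq. (1.4)
        # Predicted (optimal) variance under the independence assumption.
        var_ar = 1 / (1 / var_a + 1 / var_r)
        return mean_ar, var_ar

    # Made-up numbers: a 1000-ms standard, a small perceptual bias with
    # SD = 90 ms, and a ~40% motor overestimation with SD = 180 ms.
    print(predict_reproduction(1000.0, bias_a=20.0, var_a=90.0**2,
                               bias_r=400.0, var_r=180.0**2))
    # -> predicted mean approx. 1096 ms, predicted variance approx. 6480
    #    (i.e., SD approx. 80 ms, below both unimodal SDs)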
The MLE prediction of sensorimotor duration reproduction proved to be far better than either a motor or a perceptual dominance model. However, turning to the variability of the bimodal condition, the MLE model turned out to be suboptimal, that is, it did not show the theoretical improvement. Interestingly, though, this confirmed our previous findings (Shi et al., 2010) and other recent studies (Burr et al., 2009; Hartcher-O'Brien & Alais, 2011; Tomassini, Gori, Burr, Sandini, & Morrone, 2011).

Figure 1.5: Mean estimation biases and standard deviations (SDs), with ±1 standard error bars, for 1-second duration estimation in the three different tasks (motor reproduction, comparison, and auditory reproduction).

That is, the variability in crossmodal temporal integration is often found to be suboptimal. The reason for this suboptimal integration is unclear at present. It has been suggested that the assumption of Gaussian noise might not be appropriate for timing tasks (Burr et al., 2009). Alternatively, additional decision noise may be introduced in the bimodal (or sensorimotor) task, owing to the multiple sources of information and the increased task difficulty. It is also possible that time estimates from different sensory (and motor) modalities are not independently distributed but partially dependent, as hinted at by the literature on the amodal internal clock model. When sensory estimates are correlated, it has been shown that the true optimal weights and reliability can deviate dramatically from independent optimal integration (Oruç, Maloney, & Landy, 2003).

Figure 1.6: A. Mean estimation biases (with ±1 standard error bars) as a function of standard duration and SNR; H and L denote high and low SNRs, 800 and 1200 the short and long standards in ms. B. Observed reproductions plotted against predicted reproductions. The solid red line is a linear regression of the data (y = 45 + 1.029x); the dot-dashed line indicates ideal optimal cue integration based on MLE. The green and blue crosses represent data from the high and low SNR conditions, respectively.

In addition to the feedback information itself, feedback delay can also influence duration reproduction. In another recent study (Ganzenmüller, Shi, & Müller, 2012), we investigated this issue by injecting an onset- or offset-delay into the sensory feedback signal of a duration reproduction task. We found that the reproduced duration was lengthened with the onset-delay, and the lengthening effect was observed immediately, on the first trial. In contrast, a shortening effect was found with an offset-delay of the feedback signal, though this effect was weaker and manifested only partially, in auditory but not in visual reproduction; the offset of the reproduction relied largely on the action-stop signal. These findings suggest that the reproduction task with feedback integrates both perceptual and action time, but relies differentially on the onset of the feedback signal and on the motor-stop signal. Such differential binding may well relate to the memory-mixing model (Gu & Meck, 2011).
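Returning to the point above about correlated estimates (Oruç, Maloney, & Landy, 2003): the following short Python sketch (an illustration of the standard result for correlated cues, with arbitrary numbers; it is not taken from that paper) computes the variance-minimizing weight when the two estimates share correlated noise, showing how both the optimal weights and the achievable variance depart from the independent MLE solution as the correlation grows:

    import numpy as np

    def optimal_weight(var_a, var_r, rho):
        # Variance-minimizing weight on estimate 'a' when the two estimates
        # are correlated with coefficient rho (rho = 0 recovers Eq. 1.5).
        cov = rho * np.sqrt(var_a * var_r)
        return (var_r - cov) / (var_a + var_r - 2 * cov)

    def fused_variance(var_a, var_r, rho, w_a):
        # Variance of w_a*est_a + (1 - w_a)*est_r with correlated noise.
        cov = rho * np.sqrt(var_a * var_r)
        w_r = 1 - w_a
        return w_a**2 * var_a + w_r**2 * var_r + 2 * w_a * w_r * cov

    for rho in (0.0, 0.5, 0.9):
        w = optimal_weight(1.0, 2.0, rho)
        print(f"rho={rho}: w_a={w:.2f}, "
              f"fused var={fused_variance(1.0, 2.0, rho, w):.2f}")
    # With rho = 0.9 the optimal weight even exceeds 1 (a negative weight
    # on the other cue), a marked deviation from independent integration.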
Owing to the limited capacity of working memory and the cause-effect relationship, motor timing and the caused feedback timing may share the same representation, which pulls the two onsets closer together. In the same study (Ganzenmüller et al., 2012), we further confirmed the strong overestimation in auditory reproduction shown in other studies (Bueti & Walsh, 2010; Shi et al., 2013; Walker & Scott, 1981).

Overestimation of duration can also be induced by emotional states. For example, threatening pictures (Droit-Volet et al., 2004) or angry faces (Droit-Volet et al., 2007) are often judged as longer than neutral stimuli. However, most evidence of emotional distortion of time perception has been gained with unisensory modulation only. Given that time processing is distributed (for a review, see Bueti, 2011), there is no guarantee that emotional stimuli introduced in one modality would influence the perceived timing of stimuli in another modality. On the other hand, emotional states may increase the general arousal level and/or bias crossmodal linkages and perception-action associations, which may in turn influence duration judgments in other modalities. Recently we investigated this issue using a visual-tactile approach (Shi, Jia, & Müller, 2012). We compared the modulation induced by three types of emotional pictures (threat, disgust, and neutral) on the subsequent judgment of vibrotactile duration. The results revealed that processing threatening pictures lengthened, relative to the neutral baseline, subsequent judgments of tactile duration. However, there was no evidence of a lengthening effect with disgust pictures, which clearly rejected the hypothesis of general arousal as the determining factor. We further examined how visual threat influences tactile time processing. If only the pacemaker of the tactile 'clock' were sped up, we should observe a slope effect across short- and long-range intervals (Wearden, 2008), that is, a larger difference between the threat and neutral conditions for long intervals than for short intervals. However, this was not the case. Further experiments indicated that emotional activation is followed by emotional regulation: when participants were exposed to threatening pictures, attentional resources were first rapidly directed to the defensive system, including the somatosensory system, in preparation for a reaction, and as a result tactile time processing was dilated. While the same would initially apply to the long-interval condition, participants eventually realized that the tactile stimulus was not a threatening event; accordingly, attentional resources would be increasingly redirected to processes of emotional regulation, and as a consequence the lengthening effect disappeared.

A high-arousal emotional state not only dilates duration perception but also prioritizes crossmodal temporal processing, as shown in one of our new studies (Jia, Shi, Zang, & Müller, 2013). In that study, participants were asked to make temporal order judgments (TOJs) on a pair of audiotactile stimuli while gazing at a concurrently presented emotional picture. When the audiotactile stimuli were presented separately on the left and right sides, a significant temporal bias toward the tactile modality was found when the picture had a negative meaning (e.g., threat). This finding confirmed our previous conclusion (Shi et al., 2012) that the visual-tactile linkage in emotional associations is more likely to direct attention toward the tactile than toward the auditory modality.
Interestingly, when the audiotactile stimuli originated from the same location, there was no such emotional modulation of modality-oriented attention. This suggests that the unity assumption in crossmodal integration (Welch & Warren, 1980) - that is, that multisensory stimuli coming from the same origin are more likely to be integrated into one single multisensory object than treated as two distal signals - can counteract the otherwise ensuing modality-oriented attentional bias.

1.2.3 Multisensory enhancement, context learning and search performance

It is known that a spatio-temporally coinciding multisensory signal is detected faster than each of the corresponding signals presented separately. Recent studies by Van der Burg and colleagues revealed an interesting phenomenon, the 'pip-and-pop' effect, showing that spatially uninformative but temporally informative beeps can facilitate search performance (Van der Burg et al., 2010; 2008). In their paradigm, participants had to search for a horizontal or vertical bar among oblique distractors. Both the target and the distractors were either green or red and changed their color at random moments, making the search task extremely difficult (see Figure 1.7). When the color changes of the target were accompanied by synchronous beeps, however, search performance was boosted on the order of seconds. Van der Burg and colleagues argued that the enhanced performance was due to bottom-up audiovisual integration and saliency-boosting. In contrast, other literature (Colonius & Arndt, 2001; Doyle & Snowden, 1998) showed that performance enhancement by audiovisual integration is typically around 100 ms, far less than the reported pip-and-pop effect.

Figure 1.7: An example search display used in the pip-and-pop search paradigm. Displays contained multiple bars of different orientations, and observers had to detect the target orientation (or target presence, in one of our experiments). There was a repeating alteration of the display items' colors, occurring at random time intervals. The onsets of the color changes were accompanied by mono beeps.

To further examine the effects of spatially uninformative sound on visual search and the underlying mechanisms, we recently adopted the pip-and-pop paradigm (Van der Burg et al., 2008) and measured eye movements (Zou, Müller, & Shi, 2012). In addition to the auditory synchronous cues, we introduced an informative spatial cue (a central arrow) as top-down attentional guidance, and, in a separate experiment, a target-absent condition. If the pip-and-pop effect were pure bottom-up crossmodal enhancement, we should observe no interaction with the top-down precue manipulation, and no facilitation in the target-absent condition, given that no crossmodal integration would occur there. Our study replicated the pip-and-pop effect. More interestingly, the effect was not purely bottom-up, as we found an interaction between the top-down precue and sound presence (Figure 1.8, left). In addition, detection was also facilitated by the presence of the beeps when the target was absent (Figure 1.8, right). These behavioral results indicated that participants must have adopted some top-down strategies.
To further examine the effects of spatially uninformative sound on visual search and the underlying mechanisms, we recently adopted the pip-and-pop paradigm (Van der Burg et al., 2008) and measured eye movements (Zou, Müller, & Shi, 2012). In addition to the auditory synchronous cues, we introduced an informative spatial (central arrow) cue as top-down attentional guidance and, in a separate experiment, a target-absent condition. If the pip-and-pop effect were purely bottom-up crossmodal enhancement, we should observe no interaction with the top-down precue manipulation and no facilitation in the target-absent condition, given that no crossmodal integration can take place there. Our study replicated the pip-and-pop effect. More interestingly, the effect was not purely bottom-up: we found an interaction between the top-down precue and sound presence (Figure 1.8, left). In addition, detection was also facilitated by the presence of the beeps when the target was absent (Figure 1.8, right). These behavioral results indicate that participants must have adopted some top-down strategies. The eye movement data further showed that the mean fixation duration was longer in the sound-present than in the sound-absent condition (see Figure 1.9). In particular, a fixation was extended when a beep occurred during that fixation, and the amplitude of the immediately following saccade was increased. The eye movement patterns thus revealed that participants tended to fixate longer when the additional sounds were present, permitting temporally and spatially expanded information sampling, improving the registration of the singleton color changes, and guiding the next saccade more precisely and efficiently to the target. The study demonstrated that temporally coincident audiovisual events not only produce perceptual enhancement but also influence oculomotor behavior and thereby boost search performance.

Besides multisensory enhancement, learning of spatial context can also facilitate search performance. In one of our recent studies (Geyer, Shi, & Müller, 2010), a contextual cueing paradigm with multi-conjunction visual search was used. We confirmed robust contextual cueing: target presence was discerned more rapidly when the target was embedded in a predictive compared to a non-predictive configuration. Further, contextual cueing was larger when only the subset of configurations containing the target, as compared to the configurations containing only (oblique-bar) distractors, was predictive.
In addition, cueing eriment 1, therethe wereother two sound increased error rate on target-present trials. Note that nt and sound-absent. Importantpredictive display was repeatedly shown across two successive trials. These findings reveal response accuracy was unaffected byguidance sound condition, condition, the onset of the beeps of spatial the importance contextual learning for the of visual search. In another F(1, 14) ¼ 2.00, p ¼ 0.17, gp2 ¼ 0.12. Thus, only trials h the target color changes on recent study (Shi, Zang, Jia, Geyer, & Müller, 2013), we applied a similar contextual cuwith correct responses were subjected to the subsequent ut with random distractors color eing paradigm mobile user interface, examining icon re-configurations during display on target-absent trials. Partici-to a analyses. model switch in touch-based mobile devices. In most current devices, icons are shuffled in a two-alternative forced-choice rapidly as possible to indicate a positional order when the display mode is changed (e.g., from the portrait to landscape was present. Sound-present and administered block-wise, with 4 on presented in random order; in and -absent trials were randomof 30 trials. Reaction time effects Figure 6 presents the mean correct RTs as a function of target presence for the conditions with and without sound. A repeated-measures ANOVA with the factors target presence and sound presence revealed target- 1.2 Cumulative research work Journal of Vision (2012) 12(5):2, 1–18 17 Zou, Müller, & Shiof Vision (2012) 12(5):2, 1–18 Journal 5 & Shi Zou, Müller, search. Mean numbers of fixations are shown in Figure F(2, 28) ¼ 14.90, p , 0.01 3a. The pattern is similar to the mean RTs (Figure 2) – presence or absence, F(1, as also confirmed by a repeated-measures ANOVA, (Figure 7b). Interestingly which, similar to the RT ANOVA, yielded a significant presence and fixation ty main effect of sound presence, F(1, 7) ¼ 21.42, p , 0.01, 3.54, p , 0.05, gp2 ¼ 0 gp2 ¼ 0.75; a significant main effect of cue, F(1, 7) ¼ longer duration of fixat 56.94, p , 0.01, gp2 ¼ 0.89; as well as a significant present, compared to the interaction between the two factors, F(1, 7) ¼ 15.54, p ms vs. 410.9 ms; Figure , 0.01, gp2 ¼ 0.69. This pattern indicates that the These findings of an synchronous sounds facilitated visual search in general beeps (with or without by permitting participants to plan more effective longer duration in the so saccades, and this facilitation was more pronounced tion would appear to be when the cue was invalid. To explore the latter effect pop’’ account assumin further, we separated fixations on the target side from boosting of visual salien those on the non-target side (see Figure 3b). The et al., 2008). Such an ac significant interaction between cue validity and sound the opposite pattern: if presence was largely due(6SE) to the non-target side, F(1, 7) the sound, one would ex 6. Mean reaction times in seconds as a function of 2 Figure 2. Mean reaction times (6SE) in seconds as a function of Figure ¼a 0.65, rather than targetand side, the latency of next, ta ¼ 13.09, p ,(present, 0.01, gas pabsent), target presence for sound-present (stars)validity Abbildung 1.8: Left: Mean reaction time (±SE) in seconds function ofthecue cue validity and sound presence; stars (solid line) and squares 0.04. That is, for the non- shortened. Note that t F(1, 7) ¼ 0.29, p ¼ 0.61, gp2 ¼respectively. sound-absent conditions (squares), represent theRight: sound-present sound-absent and (dotted soundline)presence. 
meanandreaction timetarget (±SE) seconds asreduced a function of target side, in sound presence the number of similar to that adduced t conditions, respectively. saccadesconditions (on invalid trials) dramatically (from 17.5 to target-absent compared presence, for sound-present (stars) and sound-absent present responses to be(squares), faster than respectively. target-absent 8.1 saccades), indicating that the synchronous sound visual search (see, e.g. (5.4 s vs. 12.9 s), F(1, 14) ¼ 62.0, p , 0.01, ANOVA revealed no RT ‘‘facilitation’’ for error versus responses contrast, the finding of i effectively guided saccades to the valid target side. 2 0.82. The main effect of sound presence was nearcorrect trials, F(1, 7) ¼ 0.39, p ¼ 0.55, gp2 ¼ 0.07; that is, gp ¼To examine how¼participants managed 2to minimize beeps is more in line w significant, F(1, 14) 4.48, p ¼ 0.05, g ¼ 0.23: p there was no evidence of a speed versus accuracy tradetheir number of saccadesicons (and, thus, fixations), we induced by the sound mode). disrupts the spatial among Figure 1.10). synchronous beeps facilitated search(see performance by off inSuch searchremapping task performance. Consequently, in the relationships durations in the soun compared the mean fixation durations among each 742 ms, consistent with the results of Experiment 1.traditioThe subsequent analysis,novel only correct trials remapping were included. methods: sound-and-target-absent We tested several display “position-order invariant” (a sound and cue condition. A repeated-measures ANinteraction between sound presence and target presence by assuming that an em OVA of the mean fixation durations (presented in nal icon-shuffle method), “global rotation” (rotating wholeF(1, display), “local was not the significant, 14) ¼ 0.001, p ¼invariant” 0.97, during such a fixation r Figure 4a) revealed the main effect of sound to be indicating that synchronous beeps facilitated respond-squaReactionlocal time effects 2 confidence (based on fur (preserving regions), and “central invariant” (preserving the central maximal ¼ 0.47, but not significant, F(1, 7) ¼ 6.22, p , 0.05, g p ing to essentially the same extent on target-absent 2as on a ‘‘target-present’’ decisi ¼ 0.03. that of cue validity, F(1, 7) ¼ 0.23, p ¼ 0.64, g p re region). Wemean found that when using the local-invariant or central-invariant remapping target-present trials. Individual reaction times (RTs) were estimated Analysis of the numb fixation duration was 137 ms longer on trials for eachcontextual variable combination, excluding error respons-afterMean overall fewer fixations w methods, cueing was preserved the display was changed, indicating perwith sound than on those without sound. However, in es. Figure 2 presents the mean correct RTs averaged than in the sound-absent contrast to the manual RTs and the number offor effects formance benefits inRTs thewere iconthen localization The global-rotation method is intuitive across participants. submitted totask. a Oculomotor , 0.01, gp2 ¼ 0.43. And, fixations, there was no interaction between cue validity The mean fixation durations are depicted in Figure repeated-measures ANOVA with cue validity and 2 search, establishing tar users, however, in the present study, using desktop monitor to F(1, simulate the 0.25. presence, 7) ¼ 2.30, pof ¼ mobile 0.17, gp ¼device, sound presence as factors. 
The findings thus provide new guidelines for the interface design of icon rearrangement on mobile devices.

1.2.4 Delays in multimodal feedback and user experience

Delay is ubiquitous in signal transmission and processing. Neural transmission, for example, takes some time to convey sensory information to the brain: signals travelling from the human retina to the visual cortex require about 70 to 100 ms (Schmolesky et al., 1998), and the other modalities have similar neural latencies. Given that these delays are not negligible, one challenge our visual system faces in everyday environments is the veridical spatiotemporal representation of moving objects.
Abbildung 1.9: (a) Mean fixation duration (±SE) in milliseconds as a function of target presence (present, absent), for the sound-present and sound-absent conditions, respectively. (b) Mean fixation duration (±SE) as a function of target presence, separately for fixations on sound-absent trials and for fixations with and, respectively, without a beep on sound-present trials. (c) Mean number of fixations (±SE) as a function of target presence. (d) Mean saccade amplitude (±SE) in degrees of visual angle as a function of target presence, for the sound-present and sound-absent conditions, respectively.

A fast-moving object would be perceived with a spatial lag if the latency (about 100 ms) were not compensated. A typical visual illusion attributed to the neural transmission delay is the flash-lag effect: a moving object appears to be ahead of a spatially aligned flashed object. The initial hypothesis proposed by Nijhawan (1994) holds that the position of the moving object is extrapolated forward to compensate for the neural delays in the visual pathway, so that the object's perceived position is closer to its true instantaneous location; a worked example of the size of the uncompensated lag is given below.
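As a rough worked example (a back-of-the-envelope calculation using round numbers consistent with the latencies cited above, not a figure from the original studies), the uncompensated lag is simply velocity times delay:

    \Delta x = v \cdot \Delta t = 10^{\circ}/\text{s} \times 0.1\,\text{s} = 1^{\circ}.

An object moving at a moderate 10 degrees of visual angle per second would thus be mislocalized by a full degree if the roughly 100-ms transmission delay were left uncorrected.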
Abbildung 1.10: Mockup displays for a mobile device. When the display mode is changed, the icons are shuffled and the spatial relationships among them are partially destroyed.

Since then, other alternative accounts, such as the differential-latency, attention-shift, and postdiction hypotheses (Baldo & Klein, 1995; Eagleman, 2000; Whitney & Murakami, 1998), have been proposed to explain the flash-lag effect. The major difference between the extrapolation account (Nijhawan, 1994) and the others is that the other hypotheses simply deny a low-level compensation mechanism, since such low-level extrapolation is hard to observe directly. In a recent study (Shi & Nijhawan, 2012), we directly tested the extrapolation hypothesis using a novel approach, namely exploiting the nature of two foveal scotomas (the scotoma for dim light and the scotoma for blue light) to reveal motion extrapolation. In the central fovea there is a rod-free area of about 0.3° diameter in which low-intensity objects fail to yield a visual percept (Hecht, 2002). If the motion percept follows the retinotopic map faithfully, one should observe a discontinuous movement at the boundary of the fovea when a dim object moves across it (see Figure 1.11, left). Forward shifts, in contrast, should be observed if there is motion extrapolation owing to a compensation mechanism in the visual pathway, even though there is no physical response in the central fovea (see Figure 1.11, right).
Indeed, our behavioral experiments provided solid evidence supporting the original motion extrapolation account (see Figure 1.12).

Abbildung 1.11: Left: A dim object moves across the fovea. If there is no extrapolation mechanism in the visual pathway, the motion percept should follow the retinotopic map faithfully. Right: Owing to the extrapolation mechanism in the visual pathway, the moving object is still perceived within the rod-free fovea, and it reappears farther away from the fovea due to the neural transmission delay.

Abbildung 1.12: Results of Experiment 1 from Shi & Nijhawan (2012). Left: Individual thresholds of the participants for three conditions. The left arrows denote the perceived vanishing positions in the motion-terminated condition; the right arrows denote the perceived initial positions in the motion-initiated condition; the gray bars denote the thresholds (50%) of motion visibility at 0.028 cd/m². Right: Mean forward shifts in the motion-initiated and motion-terminated conditions (±SE, n = 6). The vertical dot-dashed line denotes the mean radius of the relatively insensitive fovea centralis.

Perceived time delay and crossmodal asynchrony also exist for external, physically synchronous multisensory events. Sound, for example, travels through air much more slowly than light; we therefore hear thunder several seconds after we see the flash. Even when a light stimulates the retina and a sound pushes the eardrum at the same time, brain activation occurs roughly 30-50 ms earlier for the auditory signal (Fujisaki et al., 2004). To maintain a coherent perception of the external world and precise sensorimotor interaction with the environment, our brain must compensate for these latencies and adjust multisensory temporal perception accordingly. In a number of recent studies (Rank, Shi, Müller, & Hirche, 2010; Shi, Zou, & Müller, 2010; Shi et al., 2010), we investigated various forms of delayed multimodal feedback in multimodal telepresence systems and gained a better understanding of how multimodal delay perception influences the user's performance.
Right: forwardarea shiftsininthe thecatch motion-initiated according to the anatomical size [25], was 2.5%, as low as the participants with the green and the blue filters. The mo motion-terminated conditions (±SE, n = 6). The2vertical dot-dashed line denotes the mean mean false alarm rate (t(5) = 0.76, p = 0.48, gp = 0.1). This insensitive boundary estimated with the blue filter was 0.8760 radius of thesuggested relatively fovea that ininsensitive the rod-free area therecentralis. was no response to the (indicated by the vertical dot-dashed line in Figure 4b), wh low luminance motion. Figure 3a shows that all participants perceived the moving dot as vanishing inside the motion insensitive fovea center in the motion-terminated condition and appearing near the boundary of the motion insensitive area in the motion-initiated condition. The mean perceived termination and initiation positions (6SE) were 0.9260.12u and 1.3960.07u, respectively. Compared with the boundary of the motion insensitive fovea center, the average forward shift into the boundary was 0.5560.13u (corresponding to agreed with previous estimates of Maxwell’s spot [26,27]. The moving dot was perceived to vanish at position 0.4560.11u in motion-terminated condition, and to first appear at posi 0.7460.09u away from the center in the motion-initi condition. Using the motion insensitive boundary, we calculated positional shifts of the blue moving dot (Figure 4b). Consistent w the result for the dim moving dot (Experiment 1), the blue mo dot overshot significantly into the motion insensitive boundary 1.2 Cumulative research work 21 lay perception influences user’s performance. In particular, delays in the haptic modality heavily depends on how the user issues action commands and what information is fed back. In a study by Rank et al. (2010), we systematically investigated the impact of the frequency and amplitudes of active movements on delay detection in a spring-type force field environment. We found that the detection thresholds for time delay in force feedback were negatively correlated with movement frequency and movement amplitude. Movement amplitude and frequency influence the delay detection independently. Within a comfortable force range, magnitudes of feedback force did not affect the discrimination of the haptic delay. This force invariant property in the haptic delay perception provides a useful guideline for system design, such as micro manipulation systems. In such an application area, forces arising in a micro-scale environment are typical very small, below human’s detection threshold, thus forces must be scaled and augmented for operators to provide a comfortable haptic impression. Our findings indicate that using a scale-up design does not change the multimodal, particular haptic, temporal processing. In another study (Rank et al., 2010), we further examined how we perceive haptic delay with different haptic environmental characteristics, such as spring, damper, and inertia. We found that the delay detection was easiest in a damping environment, owing to an additional direction conflict of the force induced by a time delay at movement turning point. All those findings that we obtained could be very useful for future user-centered multimodal system design. Besides inevitable delays in multimodal network communication, information itself could be lost over large geographical distances via Internet(termed as packet loss). 
Besides the inevitable delays in multimodal network communication, the information itself can be lost when transmitted over large geographical distances via the Internet (termed packet loss). In a recent study, we investigated how packet loss and communication delays affect crossmodal temporal perception (Shi et al., 2010). We simulated packet-loss patterns using the Gilbert-Elliott model (a sketch is given below). When a burst of packet loss occurs in the visual feedback, the moving visual scene stagnates; packet loss may thus induce a general impression of delayed feedback. The experimental results confirmed that both the point of subjective simultaneity (PSS) and the just noticeable difference (JND) increased as a function of the packet-loss rate (Figure 1.13). This suggests that the perception of the stagnating visual information biased judgments of the temporal order of the visual-haptic event and increased the task difficulty, even though the critical visual-haptic event itself was intact and undelayed. To further identify how packet loss in the recent past impacts forthcoming crossmodal temporal processing, we conducted a follow-up experiment (Shi et al., 2010) on visual-haptic temporal discrimination with packet loss in the visual feedback. This time, we switched the packet loss off before the critical visual-haptic collision, at various switch-off intervals (Figure 1.14). The study revealed that the PSS decreased with increasing switch-off distance, approaching the level of the no-packet-loss condition at a switch-off interval of 172 ms (60-mm distance) (Figure 1.15). This suggests that past information does not impact crossmodal temporal processing immediately; internal crossmodal temporal processing requires some time (on the order of a hundred milliseconds) to incorporate prior information. On this ground, it would be advisable to take this interval range into account when implementing assistive functions in the design of telepresence systems.

Abbildung 1.13: Psychometric functions (proportion of 'visual collision first' responses as a function of the visual-haptic SOA in milliseconds) for the four different loss rates (no loss, 10%, 20%, and 30% loss) in visual-haptic temporal order judgments (TOJs).

Abbildung 1.14: Schematic illustration of a trial sequence. The movement trajectory is denoted by the long red arrow. The dashed part of the trajectory denotes visual feedback with packet loss, the solid part visual feedback without packet loss. The packet-loss switch-off distance is denoted by d.

Abbildung 1.15: PSS as a function of the switch-off time interval. The switch-off time intervals were estimated from the movement velocity. Error bars indicate 95% confidence intervals, estimated from 1000-sample bootstrapping.
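For concreteness, here is a minimal sketch of the two-state Gilbert-Elliott burst-loss model; the transition probabilities are illustrative placeholders, not the parameters used in the study.

    import random

    def gilbert_elliott(n_packets, p_gb=0.05, p_bg=0.30,
                        loss_good=0.0, loss_bad=1.0):
        """Simulate bursty packet loss with a two-state Markov chain.

        'good' and 'bad' are the channel states; p_gb and p_bg are the
        per-packet probabilities of switching good->bad and bad->good.
        Returns a list of booleans (True = packet lost).
        """
        state, lost = "good", []
        for _ in range(n_packets):
            p_loss = loss_bad if state == "bad" else loss_good
            lost.append(random.random() < p_loss)
            if state == "good" and random.random() < p_gb:
                state = "bad"
            elif state == "bad" and random.random() < p_bg:
                state = "good"
        return lost

    trace = gilbert_elliott(10000)
    # Long-run loss rate approaches p_gb / (p_gb + p_bg) when loss_bad = 1.
    print(f"overall loss rate: {sum(trace) / len(trace):.1%}")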
As we have shown previously (Shi et al., 2010), packet loss and feedback delay are two important factors for visual-haptic temporal perception. Moreover, the user's own action characteristics (e.g., movement amplitude and phase) contribute to delay detection, too (Rank et al., 2010). To further unravel the impact of packet loss and delay on task performance, we developed a performance-optimal control scheme for delayed network feedback and compared task performance, measured by the number of collisions and the task completion time, with and without an active communication control algorithm (Rank, Shi, Hermann, & Hirche, 2013). The quality control method is based on an online predictive stochastic reachability analysis: the collision probability of the user's movements is analyzed online, and the network quality of service (QoS) is adjusted dynamically (a simplified sketch of this control rule follows below). We used a 2D labyrinth with visual-haptic feedback and asked participants to 'walk' through the labyrinth as fast as possible while avoiding collisions with the walls. The experimental results demonstrated that dynamic QoS control effectively reduced the collision probability, while the completion time was not significantly affected (Figure 1.16). This is the first step of a new approach that uses sensorimotor interaction and predictive coding to improve task performance in delayed-feedback systems, and it points to our future research direction: predictive human control strategies for delayed-feedback telepresence systems.

Abbildung 1.16: While completion time was not significantly affected by communication quality control, the number of collisions with the walls was lower when dynamic QoS control was applied.
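The control idea can be summarized in a few lines (a deliberately simplified, threshold-based sketch; the function name, numeric values, and decision rule are illustrative stand-ins for the model-predictive, cost-function-based controller of Rank et al., 2013):

    def choose_delay(p_col, td_min=0.05, td_max=0.30, p_threshold=0.2):
        """Map the predicted collision probability (from the stochastic
        reachability analysis) to the communication delay requested for
        the next control cycle: imminent collision -> low latency.
        All numeric values are illustrative placeholders.
        """
        return td_min if p_col > p_threshold else td_max

    # As the operator approaches an obstacle, p_col rises and the
    # controller switches the channel to the low-delay configuration.
    for p_col in (0.05, 0.15, 0.35):
        print(f"p_col = {p_col:.2f} -> delay {choose_delay(p_col)} s")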
1.3 Summary and outlook

The focus of this cumulative habilitation thesis is on the dynamic temporal processing of multisensory information, which comprises multiple aspects: multisensory temporal integration, time perception, and multisensory performance enhancement, as well as potential applications in technical multimodal systems.

In the first line of research, two important factors in multisensory temporal integration were identified: crossmodal interval integration and perceptual grouping. Several studies (Chen et al., 2011; Shi et al., 2010) brought convergent evidence that crossmodal interval (duration) integration determines the temporal ventriloquist effect. Asymmetric crossmodal or intramodal perceptual grouping, on the other hand, may abolish the temporal ventriloquist effect. Interval (duration) integration also plays a critical role in sensorimotor timing: the reproduced duration, for example, is a mixture of motor and perceptual time, with the weights of perceptual and motor time depending on the variabilities of the corresponding estimates. Moreover, when a feedback delay is introduced, the reproduced duration relies heavily on the onset of the feedback as well as on the offset of the motor action. Using quantitative measures and Bayesian approaches, crossmodal temporal integration has been shown to follow the maximum likelihood estimation (MLE) model with some modifications, the main one being that biases are explicitly acknowledged in the sensory time estimates and in the motor reproduction. With the biases incorporated explicitly, the model predicts crossmodal perceptual duration integration (Shi et al., 2010) and sensorimotor duration reproduction (Shi et al., 2013) well; the weighting scheme is summarized below.
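For reference, the standard maximum-likelihood (reliability-weighted) combination rule for two duration estimates (cf. Ernst & Banks, 2002) is

    \hat{T} = w_1 \hat{T}_1 + w_2 \hat{T}_2, \qquad
    w_i = \frac{1/\sigma_i^2}{1/\sigma_1^2 + 1/\sigma_2^2}, \qquad
    \sigma^2 = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2}.

The modification discussed above amounts to writing each estimate as \hat{T}_i = T + b_i with a modality-specific bias b_i, so that the combined estimate inherits the reliability-weighted bias w_1 b_1 + w_2 b_2 instead of being unbiased (this compact formulation is our summary of the modification, not a verbatim equation from the cited papers).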
Temporally coinciding multisensory signals may boost perceptual reliability and saliency and, as a result, facilitate object discrimination and identification. Using an eye-tracking method, the study of Zou et al. (2012) revealed that audiovisual synchronous events can alter oculomotor behavior so as to boost search performance: an auditory tone automatically 'freezes' the eyes, extending the current fixation. Such prolonged fixations may allow participants to sample information effectively and plan saccades, and as a result search performance is greatly improved. In addition, the influences of contextual learning on search performance have been investigated in several studies.

The final research strand focuses on the perception of multimodal feedback delays in technical systems, such as multimodal telepresence systems. With respect to crossmodal simultaneity, the influences of packet loss and feedback delays on visual-haptic synchrony were examined; the results indicate that packet loss induces an impression of stagnation, biasing crossmodal perception. With respect to delay perception, several studies (Rank et al., 2010) examined how the user's actions and the haptic environment influence the temporal discrimination sensitivity for haptic feedback delay. The findings indicate that delay perception is neither isolated as the user's internal representation nor determined by the environmental characteristics alone; rather, it is formed during dynamic interaction with the environment and is influenced by both the user's own actions and the external environment. Using predictive coding methods, a dynamic optimal quality control scheme for a delayed-feedback visual-haptic telepresence system was developed; follow-up behavioral studies indicated that such a user-oriented quality control scheme can significantly reduce hazardous collisions during operation while keeping the completion time at the same level.

The results of the cumulative research work also raise further research questions. One challenging issue in multisensory temporal integration is biased temporal estimates. It is common knowledge that time perception can easily be distorted by a variety of factors. Given that time processing is distributed, differential biases in the different sensory time estimates may cause an internal conflict of time representation, and our brain must continuously calibrate the related sensory estimates to keep the internal representation consistent. The problem of which modality should be calibrated arises because the sensory system has access only to noisy temporal estimates and their inconsistencies; without additional information, it is generally impossible to determine which modality is biased. How multisensory systems calibrate their temporal biases will be an interesting issue for future research. Bayesian models have been used extensively for multisensory spatial integration and unimodal dynamic adjustment. To date, some quantitative models, such as MLE, have been applied to multisensory temporal integration, including in some of the studies reported here. The MLE model, however, considers only sensory likelihoods and unbiased estimates, without any prior knowledge. Recent studies of unimodal temporal perception (Jazayeri & Shadlen, 2010) showed that prior knowledge heavily influences time estimation. Given that time estimates are often biased, future research should also address how prior knowledge influences multisensory duration integration.

1.4 References
Alais, D., & Burr, D. C. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.
Alais, D., Newell, F. N., & Mamassian, P. (2010). Multisensory processing in review: from physiology to behaviour. Seeing and Perceiving, 23, 3–38.
Angrilli, A., Cherubini, P., Pavese, A., & Manfredini, S. (1997). The influence of affective factors on time perception. Attention, Perception, & Psychophysics, 59, 972–982.
Baldo, M. V., & Klein, S. A. (1995). Extrapolation or attention shift? Nature, 378, 565–566.
Binda, P., Cicchini, G. M., Burr, D. C., & Morrone, M. C. (2009). Spatiotemporal distortions of visual perception at the time of saccades. The Journal of Neuroscience, 29, 13147–13157.
Bolia, R. S., D'Angelo, W. R., & Richard, L. (1999). Aurally aided visual search in three-dimensional space. Human Factors: The Journal of the Human Factors and Ergonomics Society, 41, 664–669.
Bruns, P., & Getzmann, S. (2008). Audiovisual influences on the perception of visual apparent motion: exploring the effect of a single sound. Acta Psychologica, 129, 273–283.
Bueti, D. (2011). The sensory representation of time. Frontiers in Integrative Neuroscience, 5, 1–3.
Bueti, D., & Walsh, V. (2010). Memory for time distinguishes between perception and action. Perception, 39, 81–90.
Bueti, D., Bahrami, B., & Walsh, V. (2008). Sensory and association cortex in time perception. Journal of Cognitive Neuroscience, 20, 1054–1062.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6, 755–765.
Burr, D. C., Banks, M. S., & Morrone, M. C. (2009). Auditory dominance over vision in the perception of interval duration. Experimental Brain Research, 198, 49–57.
Chen, L., Shi, Z., & Müller, H. J. (2010). Influences of intra- and crossmodal grouping on visual and tactile Ternus apparent motion. Brain Research, 1354, 152–162.
Chen, L., Shi, Z., & Müller, H. J. (2011). Interaction of perceptual grouping and crossmodal temporal capture in tactile apparent-motion. PLoS ONE, 6, e17130.
Chen, Y.-C., & Yeh, S.-L. (2009). Catch the moment: multisensory enhancement of rapid visual events by sound. Experimental Brain Research, 198, 209–219.
Colonius, H., & Arndt, P. (2001). A two-stage model for visual-auditory interaction in saccadic latencies. Perception & Psychophysics, 63, 126–147.
Cunningham, D. W., Billock, V. A., & Tsou, B. H. (2001). Sensorimotor adaptation to violations of temporal contiguity. Psychological Science, 12, 532–535.
Doyle, M. C., & Snowden, R. J. (1998). Facilitation of visual conjunctive search by auditory spatial information. Perception, 27 (Supplement), 134.
Droit-Volet, S., Brunot, S., & Niedenthal, P. M. (2004). Perception of the duration of emotional events. Cognition and Emotion, 18, 849–858.
Droit-Volet, S., Meck, W. H., & Penney, T. B. (2007). Sensory modality and time perception in children and adults. Behavioural Processes, 74, 244–250.
Eagleman, D. M. (2000). Motion integration and postdiction in visual awareness. Science, 287, 2036–2038.
Engbert, K., Wohlschläger, A., & Haggard, P. (2008). Who is causing what? The sense of agency is relational and efferent-triggered. Cognition, 107, 693–704.
Engbert, K., Wohlschläger, A., Thomas, R., & Haggard, P. (2007). Agency, subjective time, and other minds. Journal of Experimental Psychology: Human Perception and Performance, 33, 1261–1268.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8, 162–169.
Ernst, M. O., & Di Luca, M. (2011). Multisensory perception: from integration to remapping. In J. Trommershäuser, K. P. Körding, & M. S. Landy (Eds.), Sensory Cue Integration (pp. 224–250). New York: Oxford University Press.
Fendrich, R., & Corballis, P. M. (2001). The temporal cross-capture of audition and vision. Perception & Psychophysics, 63, 719–725.
Ferrell, W. R. (1966). Delayed force feedback. Human Factors, 8, 449–455.
Freeman, E., & Driver, J. (2008). Direction of visual apparent motion driven solely by timing of a static sound. Current Biology, 18, 1262–1266.
Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience, 7, 773–778.
Ganzenmüller, S., Shi, Z., & Müller, H. J. (2012). Duration reproduction with sensory feedback delay: differential involvement of perception and action time. Frontiers in Integrative Neuroscience, 6, 1–11.
Getzmann, S. (2007). The effect of brief auditory stimuli on visual apparent motion. Perception, 36, 1089–1103.
Geyer, T., Shi, Z., & Müller, H. J. (2010). Contextual cueing in multi-conjunction visual search is dependent on color- and configuration-based intertrial contingencies. Journal of Experimental Psychology: Human Perception and Performance.
Ghose, G. M., & Maunsell, J. H. R. (2002). Attentional modulation in visual cortex depends on task timing. Nature, 419, 616–620.
Grondin, S. (1993). Duration discrimination of empty and filled intervals marked by auditory and visual signals. Attention, Perception, & Psychophysics, 54, 383–394.
Gu, B. M., & Meck, W. H. (2011). New perspectives on Vierordt's law: memory-mixing in ordinal temporal comparison tasks. Multidisciplinary Aspects of Time and Time Perception, 6789, 67–78.
Haggard, P., Clark, S., & Kalogeras, J. (2002). Voluntary action and conscious awareness. Nature Neuroscience, 5, 382–385.
Hartcher-O'Brien, J., & Alais, D. (2011). Temporal ventriloquism in a purely temporal context. Journal of Experimental Psychology: Human Perception and Performance, 37, 1383–1395.
Hecht, E. (2002). Optics. Addison-Wesley.
Heron, J., Hanson, J. V. M., & Whitaker, D. (2009). Effect before cause: supramodal recalibration of sensorimotor timing. PLoS ONE, 4, e7681.
Ivry, R. B., & Richardson, T. C. (2002). Temporal control and coordination: the multiple timer model. Brain and Cognition, 48, 117–132.
Jay, C., Glencross, M., & Hubbold, R. (2007). Modeling the effects of delayed haptic and visual feedback in a collaborative virtual environment. ACM Transactions on Computer-Human Interaction, 14.
Jazayeri, M., & Shadlen, M. N. (2010). Temporal context calibrates interval timing. Nature Neuroscience, 13, 1020–1026.
Jia, L., Shi, Z., Zang, X., & Müller, H. J. (2013). Concurrent emotional pictures modulate spatial-separated audiotactile temporal order judgments. Brain Research.
Keetels, M., Stekelenburg, J., & Vroomen, J. (2007). Auditory grouping occurs prior to intersensory pairing: evidence from temporal ventriloquism. Experimental Brain Research, 180, 449–456.
Kennedy, J. S., Buehner, M. J., & Rushton, S. K. (2009). Adaptation to sensory-motor temporal misalignment: instrumental or perceptual learning? Quarterly Journal of Experimental Psychology, 62, 453–469.
Kim, T., Zimmerman, P. M., Wade, M. J., & Weiss, C. A. (2005). The effect of delayed visual feedback on telerobotic surgery. Surgical Endoscopy, 19, 683–686.
Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427, 244–247.
Levitin, D. J., MacLean, K., Mathews, M., & Chu, L. (2000). The perception of crossmodal simultaneity. International Journal of Computing and Anticipatory Systems, 323–329.
MacKenzie, I. S., & Ware, C. (1993). Lag as a determinant of human performance in interactive systems. Conference on Human Factors in Computing Systems, 488.
Matell, M., & Meck, W. H. (2004). Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Cognitive Brain Research, 21, 139–170.
Morein-Zamir, S., Soto-Faraco, S., & Kingstone, A. (2003). Auditory capture of vision: examining temporal ventriloquism. Cognitive Brain Research, 17, 154–163.
Nijhawan, R. (1994). Motion extrapolation in catching. Nature, 370, 256–257.
Noulhiane, M., Mella, N., Samson, S., Ragot, R., & Pouthas, V. (2007). How emotional auditory stimuli modulate time perception. Emotion, 7, 697–704.
Oruç, İ., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with possibly correlated error. Vision Research, 43, 2451–2468.
Park, J., Schlag-Rey, M., & Schlag, J. (2003). Voluntary action expands perceived duration of its sensory consequence. Experimental Brain Research, 149, 527–529.
Penney, T. B., Gibson, J. J., & Meck, W. H. (2000). Differential effects of auditory and visual signals on clock speed and temporal memory. Journal of Experimental Psychology: Human Perception and Performance, 26, 1770–1787.
Rank, M., Shi, Z., Hermann, M., & Hirche, S. (2013). Performance-optimal communication quality control for haptic telepresence systems. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
Rank, M., Shi, Z., Müller, H. J., & Hirche, S. (2010). Perception of delay in haptic telepresence systems. Presence, 19.
Rank, M., Shi, Z., Müller, H. J., & Hirche, S. (2010). The influence of different haptic environments on time delay discrimination in force feedback. Lecture Notes in Computer Science, 6191, 205–212.
Roach, N. W., Heron, J., & McGraw, P. V. (2006). Resolving multisensory conflict: a strategy for balancing the costs and benefits of audio-visual integration. Proceedings of the Royal Society B: Biological Sciences, 273, 2159–2168.
Scheier, C., Nijhawan, R., & Shimojo, S. (1999). Sound alters visual temporal resolution. Investigative Ophthalmology & Visual Science, 40.
Schmolesky, M. T., Wang, Y., Hanes, D. P., Thompson, K. G., Leutgeb, S., Schall, J. D., & Leventhal, A. G. (1998). Signal timing across the macaque visual system. Journal of Neurophysiology, 79, 3272–3278.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear. Nature, 408, 788.
Shi, Z., & Nijhawan, R. (2012). Motion extrapolation in the central fovea. PLoS ONE, 7, e33651.
Shi, Z., Chen, L., & Müller, H. J. (2010). Auditory temporal modulation of the visual Ternus effect: the influence of time interval. Experimental Brain Research, 203, 723–735.
Shi, Z., Ganzenmüller, S., & Müller, H. J. (2013). Reducing bias in the duration reproduction by integrating reproduced signal. PLoS ONE.
Shi, Z., Jia, L., & Müller, H. J. (2012). Modulation of tactile duration judgments by emotional pictures. Frontiers in Integrative Neuroscience, 6, 1–9.
Shi, Z., Zang, X., Jia, L., Geyer, T., & Müller, H. J. (2013). Transfer of contextual cueing in full-icon display remapping. Journal of Vision.
Shi, Z., Zou, H., & Müller, H. J. (2010). Temporal perception of visual-haptic events in multimodal telepresence system. In M. H. Zadeh (Ed.), Advances in Haptics (pp. 437–450). InTech.
Shi, Z., Zou, H., Rank, M., Chen, L., Hirche, S., & Müller, H. J. (2010). Effects of packet loss and latency on the temporal discrimination of visual-haptic events. IEEE Transactions on Haptics, 3, 28–36.
Spence, C., Nicholls, M. E., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63, 330–336.
Spence, C., Sanabria, D., & Soto-Faraco, S. (2007). Intersensory Gestalten and crossmodal scene perception. In K. Noguchi (Ed.), Psychology of Beauty and Kansei: New Horizons of Gestalt Perception (pp. 519–579). Tokyo: Fuzanbo International.
Stetson, C., Cui, X., Montague, P. R., & Eagleman, D. M. (2006). Motor-sensory recalibration leads to an illusory reversal of action and sensation. Neuron, 51, 651–659.
Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., . . . Porter, N. R. (2001). When is now? Perception of simultaneity. Proceedings of the Royal Society B: Biological Sciences, 268, 31–38.
Sugano, Y., Keetels, M., & Vroomen, J. (2010). Adaptation to motor-visual and motor-auditory temporal lags transfer across modalities. Experimental Brain Research, 201, 393–399.
Tomassini, A., Gori, M., Burr, D. C., Sandini, G., & Morrone, M. C. (2011). Perceived duration of visual and tactile stimuli depends on perceived speed. Frontiers in Integrative Neuroscience, 5, 51.
Van der Burg, E., Cass, J., Olivers, C. N. L., Theeuwes, J., & Alais, D. (2010). Efficient visual search from synchronized auditory signals requires transient audiovisual events. PLoS ONE, 5, e10664.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2008). Pip and pop: nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance, 34, 1053–1065.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2009). Poke and pop: tactile-visual synchrony increases visual saliency. Neuroscience Letters, 450, 60–64.
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: a tutorial review. Attention, Perception, & Psychophysics, 72, 871–884.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583–1590.
Walker, J. T., & Scott, K. J. (1981). Auditory-visual conflicts in the perceived duration of lights, tones, and gaps. Journal of Experimental Psychology: Human Perception and Performance, 7, 1327–1339.
Wearden, J. H. (2008). Slowing down an internal clock: implications for accounts of performance on four timing tasks. Quarterly Journal of Experimental Psychology, 61, 263–274.
Wearden, J. H., Edwards, H., Fakhri, M., & Percival, A. (1998). Why "sounds are judged longer than lights": application of a model of the internal clock in humans. Quarterly Journal of Experimental Psychology B, 51, 97–120.
Welch, R. B. (1999). Meaning, attention, and the 'unity assumption' in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Müsseler (Eds.), Presence (Vol. 8, pp. 371–387). Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.
Whitney, D., & Murakami, I. (1998). Latency difference, not spatial extrapolation. Nature Neuroscience, 1, 656–657.
Yarrow, K., Haggard, P., Heal, R., Brown, P., & Rothwell, J. C. (2001). Illusory perceptions of space and time preserve cross-saccadic perceptual continuity. Nature, 414, 302–305.
Zou, H., Müller, H. J., & Shi, Z. (2012). Non-spatial sounds regulate eye movements and enhance visual search. Journal of Vision, 12, 1–18.
van Wassenhove, V., Buonomano, D. V., Shimojo, S., & Shams, L. (2008). Distortions of subjective time perception within and across senses. PLoS ONE, 3, e1437.