MEASURING COGNITIVE LOAD DURING VISUAL TASKS BY COMBINING PUPILLOMETRY AND EYE TRACKING

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Jeff Klingner
May 2010

Abstract

Visualizations and visual interfaces can provide the means to analyze and communicate complex information, but such interfaces often overwhelm or confuse their users. Evaluating an interface's propensity to overload users requires the ability to assess cognitive load. Changes in cognitive load cause very small dilations of the pupils. In controlled settings, high-precision pupil measurements can be used to detect small differences in cognitive load at time scales shorter than one second. However, cognitive pupillometry has been generally limited to experiments using auditory stimuli and a blank visual field, because the pupils' responsiveness to changes in brightness and other visual details interferes with load-induced pupil dilations.

In this dissertation, I present several improvements in methods for measuring cognitive load using pupillary dilations. First, I extend the set of eye tracking equipment validated for cognitive pupillometry, by determining the pupillometric precision of a remote-camera eye tracker and using remote-camera equipment to replicate classic cognitive pupillometry experiments performed originally using head-mounted cameras. Second, I extend the applicability of cognitive pupillometry in visual tasks, by developing fixation-aligned averaging methods to handle the unpredictability of visual attention and by demonstrating the measurement of cognitive load during visual search and map reading. I describe the methods used to accomplish these results, including experimental protocols and data processing methods to control or correct for various non-cognitive pupillary reflexes, and methods for combining pupillometry with eye tracking. I present and discuss a new finding of a cognitive load advantage to visual presentation of simple arithmetic and memorization tasks.

Acknowledgements

Advisors and committee

I am grateful to my advisors Pat Hanrahan and Barbara Tversky. Together, they made it possible for me to stand with a foot in computer science and a foot in cognitive psychology without slipping. Both have wide expertise, and their perspectives have been an invaluable guide to my research. Pat has stuck by me through several thesis topic changes and funding droughts, and Barbara has protected me from sloppy psychological thinking and overbroad experimental ambitions. Together with Pat and Barbara, Scott Klemmer provided valuable feedback on this dissertation, and with Jeff Heer and Roy Pea, helped me during my oral defense to see the broader implications of this work.

I am grateful to Manu Kumar, who worked hard to get the eye tracker for our lab and turned me on to its rich experimental uses. I am also grateful to the rest of the Stanford Graphics Lab faculty and the professors in the department with whom I have taught, for showing me how to do all of the things that professors do. I am grateful to Kathi DiTommaso, Meredith Hutchin, and the rest of the department staff who helped me navigate the paperwork of graduate school and find so many opportunities to teach.

Family and Friends

Above all, I am grateful to my wife Sophie, for her ceaseless love, support, and encouragement.
My parents and my brother have also been wonderfully encouraging. Dave Akers enriched our office and kept me going through the toughest times of grad school. I am deeply thankful to him and my other close friends and fellow students at Stanford, who made my years at Stanford some of the happiest of my life: Augusto Roman, Doantam Phan, Daniel Horn, Kayvon Fatahalian, Dan Morris, and all the g-slackers, gates-poker folks, and Christmas decoration overachievers.

Funding

This work was funded by the Stanford Regional Visual Analytics Center, through the U.S. Department of Energy's Pacific Northwest National Laboratory. Portions of this research were supported by NSF grants HHC 0905417, IIS-0725223, IIS-0855995, and REC 0440103. Our eye tracker was funded by the Stanford MediaX project and the Stanford School of Engineering. My graduate studies were also funded by a National Science Foundation Graduate Research Fellowship and by the John and Kate Wakerly Stanford Graduate Fellowship.

Contents

Abstract
Acknowledgements
1 Introduction
2 Background
   2.1 Cognitive load defined
   2.2 Past cognitive pupillometry research
       2.2.1 Cognitive psychology
       2.2.2 Human-computer interaction
   2.3 Cognitive pupillometry
       2.3.1 Infrared video pupillometry
       2.3.2 Cognitive pupillometry uses eye trackers
       2.3.3 Types of video eye trackers used for pupillometry
       2.3.4 Advantages of remote imaging
       2.3.5 Relative scales and the need for trial aggregation
3 Remote eye tracker performance
   3.1 Motivation
   3.2 Study description
       3.2.1 Evaluated instrument
       3.2.2 Reference instrument
       3.2.3 Procedure
   3.3 Metrology
       3.3.1 Pupil diameter metrology
       3.3.2 Pupil dilation metrology
   3.4 Conclusion
4 Replication of classic pupillometry results
   4.1 Digit span memory
       4.1.1 Background
       4.1.2 Study description
       4.1.3 Results
   4.2 Mental multiplication
       4.2.1 Study description
       4.2.2 Results
   4.3 Vigilance
       4.3.1 Study description
       4.3.2 Results
   4.4 Conclusion
5 From auditory to visual
   5.1 The need to use visual stimuli
   5.2 Controlling for non-cognitive pupillary motions
       5.2.1 Pupillary light reflex
       5.2.2 Luminance changes caused by shifting gaze
       5.2.3 Other visual causes of pupil changes
       5.2.4 Pupillary blink response
   5.3 Visual replication of classic auditory studies
       5.3.1 Procedure
       5.3.2 Stimuli
       5.3.3 Digit sequence memory
       5.3.4 Mental multiplication
       5.3.5 Vigilance
       5.3.6 Discussion
6 Combining gaze data with pupillometry
   6.1 The usefulness of gaze data
   6.2 Fixation-aligned pupillary response averaging
       6.2.1 Identifying subtask epochs using patterns in gaze data
       6.2.2 Aligning pupil data from selected epochs
       6.2.3 Averaging
   6.3 Example applications
       6.3.1 Visual search
       6.3.2 Map legend reference
   6.4 Conclusions
7 Unsolved problems
   7.1 Current limitations
       7.1.1 Simple, short tasks
       7.1.2 Restrictions on task display
       7.1.3 Restrictions on interaction
   7.2 Future research
       7.2.1 Disentangling various pupillary influences
       7.2.2 Combining pupillometry with other psychophysiological measurements
       7.2.3 Modeling and compensating for the pupillary light reflex
       7.2.4 Expanding proof-of-concept studies
A Experimental Methods
   A.1 Participants
   A.2 Apparatus
   A.3 Physical setup
       A.3.1 Room illumination
   A.4 Data processing
       A.4.1 Smoothing
       A.4.2 Perspective distortion
       A.4.3 Data processing for statistical evaluation of differences in dilation magnitude
       A.4.4 Significance tests
       A.4.5 Baseline subtraction
       A.4.6 Averaging

List of Tables

3.1 Breakdown of the lighting and task conditions used to induce pupil states and movements between double measurements
3.2 Breakdown of the diameter precision results for the eye tracker by study participant and eye
3.3 Summary of the Tobii 1750's pupillometric performance

List of Figures

1.1 Scan path on the Stanford parking map
2.1 Tobii 1750 eye tracker and highlighted pupil image
2.2 Chin-rest and head-mounted style eye trackers
2.3 Two off-the-shelf remote eye tracking systems
2.4 Sources of variation in measurements of pupil diameter
3.1 Instruments used in the metrology study
3.2 Arrangement of equipment in the metrology study
3.3 Scatterplot of simultaneous measurements taken during the metrology study
4.1 Participant fields of view during auditory experiments
4.2 Comparison to classic auditory digit span result
4.3 Comparison to classic auditory mental multiplication result
4.4 Auditory vigilance pupil trace
5.1 Pupillary blink response for blinks of length 0.1 sec
5.2 Pupillary reaction to auditory vs. visual presentation of the digit span task
5.3 Pupillary reaction to auditory vs. visual presentation of the mental multiplication task
5.4 Pupillary reaction to sequential vs. simultaneous visual presentation of the mental multiplication task
5.5 Pupillary reaction to mental multiplication problems of varying difficulty
5.6 Pupillary reaction to auditory vs. visual presentation of the vigilance task
5.7 Pupillary reaction to vigilance moments that require reactions
5.8 Performance on simple tasks by auditory vs. visual task presentation
6.1 Illustration of epoch alignment via temporal translation followed by averaging
6.2 Illustration of piecewise linear warping applied to a single epoch of pupil diameter data defined by four gaze events
6.3 Illustration of epoch alignment via piecewise linear warping followed by averaging
6.4 A fragment of a search field used in my visual search study
6.5 Pupillary response to visual search fixations on targets vs. non-targets
6.6 Pupillary response to visual search discovery vs. revisit fixations
6.7 Pupillary responses to visual search target discovery order
6.8 Gaze trace illustrating a legend reference in the map reading task
6.9 Pupillary response to legend references
A.1 Arrangement of experimental equipment
A.2 Illustration of data cleaning steps
A.3 Left-right correlation of pupil size by frequency component
Chapter 1
Introduction

   Every active intellectual process, every psychical effort, every exertion of attention, every active mental image, regardless of content, particularly every affect just as truly produces pupil enlargement as does every sensory stimulus.
   —Oswald Bumke, 1911 [21]

Visualizations and visual interfaces work well when they make good use of people's perceptual and cognitive abilities. When patterns in data are mapped to visual forms in which those patterns are easily explored and apprehended, visualizations expand human capabilities. Unfortunately, many visualizations and visual interfaces instead overwhelm us, overloading our perceptual and cognitive resources rather than using them efficiently. Figure 1.1 illustrates an example of the confusion that can be caused by an overwhelming amount of detail. The experience of being overloaded or overwhelmed is captured in part by the psychological concept of cognitive load. That people have limited and measurable cognitive capacities has been known and studied for decades [79], and experimental psychologists have developed many experimental methods to measure the cognitive load imposed by various tasks.

Figure 1.1: Scan path of somebody looking for visitor parking near the Clark Center on the Stanford campus. Circles along the path show fixations, and their area indicates how long each fixation lasted. The person solving this problem spent a lot of time looking at places in the map and on the legend which were irrelevant to the task. The legend is so long that this person needed to refer to it several times. The map, being designed for general-purpose way-finding, contains so much detail that the information relevant to one particular task is hard to find.

Assessing the cognitive load imposed by visual tasks is important to the design of cognitively efficient visual interfaces. Most interfaces are visual, and many require people to shift attention between a variety of tasks with varying loads on perception, attention, memory, and information processing. The psychophysiological study of cognitive load in this context requires a physiological proxy which responds to load quickly and reliably reflects small differences in load. One such proxy is the tendency of the pupils to dilate slightly in response to cognitive loads.

The use of pupillometry for studying the cognitive load of visual tasks is complicated by the pupils' responsiveness to brightness and other features of visual stimuli. Nevertheless, pupillometry has many advantages. First, the quick reactivity of the pupils (about 100 ms) enables study of the detailed, moment-by-moment time course of cognitive load during visual tasks. Second, pupil diameter can now be measured using high-speed remote infrared cameras, without chin rests, bite bars, or head-mounted equipment [63], making it the least invasive of all psychophysiological proxies for cognitive load. Finally, pupil measurements are recorded as a side effect by most eye trackers, so it is convenient to collect pupil diameter data, and such data are usually synchronized to gaze direction measurements, enabling the study of how cognitive load is related to the locus of attention.

Measuring cognitive load can add depth to our understanding of visualization performance in a way that goes beyond time and errors [88]. Two people may display equal completion times or error rates on a task while devoting different levels of mental effort.
An interface that allows people to achieve the same task performance, including completion time or error rate, with less effort than another is superior, because it frees the user to devote more attention to higher-level tasks such as hypothesis formulation and pattern finding.

There have been, however, many limitations to current cognitive pupillometry methods that make them difficult to apply to visualizations and visual interfaces. These limitations include:

1. Cognitive pupillometry requires a camera fixed to the head, which is inconvenient and can interfere with task performance.
2. Pupil dilations and contractions caused by the visual field, especially the pupillary light reflex, interfere with measurement of task-evoked pupil dilations.
3. Visual tasks are complicated, with many overlapping subtasks that occur with unpredictable timing, precluding the time alignment of data from multiple task instances required to detect task-evoked pupil dilations.
4. Pupil motions caused by motor activity confound task-evoked dilations, limiting the use of interaction in tasks studied with cognitive pupillometry.

This dissertation expands the scope of cognitive pupillometry by addressing the first three of these limitations, enabling the use of trial-averaged pupillometry to measure cognitive load during simple visual tasks. Specifically, I

1. establish the viability of cognitive pupillometry using remote cameras, by measuring the pupillometric precision and accuracy of a remote video eye tracker (chapter 3) and by replicating classic cognitive pupillometry results using a remote eye tracker (chapter 4); and
2. extend the applicability of cognitive pupillometry in visual tasks, by repeating standard auditory cognitive load experiments using visual stimuli (chapter 5), by developing fixation-aligned averaging methods to handle the unpredictability of visual attention (chapter 6), and by demonstrating the measurement of cognitive load in several visual tasks (section 6.3).

In chapter 2, I give the background context for these contributions, including a definition of cognitive load (section 2.1), a survey of past cognitive pupillometry research (section 2.2), and a summary of the current state of the art in cognitive pupillometry (section 2.3).

Most of the content of this dissertation is also published in conference proceedings and journals [61, 62, 63, 64]. I am the primary or sole author of all these papers. My advisors Pat Hanrahan and Barbara Tversky provided guidance on experimental design throughout my research, and Barbara helped me in particular with the presentation and context of the results comparing aural and visual presentation of simple tasks (chapter 5). Rakshit Kumar implemented the frequency-space analysis of the correlation in dilations of the left and right pupils (subsection A.4.1).

Chapter 2
Background

Summary

I discuss the operational definition of the term "cognitive load" and various physiological proxies that have been used to measure it. I briefly review the history of cognitive pupillometry in psychology and human-computer interaction research. I describe how cognitive pupil dilations are measured using infrared imaging and task-aligned averaging of pupil diameter measurements.

2.1 Cognitive load defined

This dissertation is about new methods for measuring cognitive load. It is therefore appropriate to begin with a careful definition of that term.
Psychologists have long used the physical analogies of "cognitive load" [88], "processing load" [12], or "effort" [56] to describe mental states during problem solving. Such physical metaphors are justified by findings that people have a limited capacity for cognitive tasks [79] and the fact that engaging in one mental task interferes with one's ability to engage in others [56]. As with other vague psychological concepts, "cognitive load" gains definition through the experimental methods that are used to measure it. Dual-task experiments are the most commonly used operationalization of cognitive load, but for contexts where task interference causes problems, it is also possible to use a variety of physiological proxies.

Electroencephalography (EEG) and magnetoencephalography (MEG) measure changes in electrical and magnetic fields at the scalp caused by changing electrical currents in brain neurons. The main strength of these techniques is their millisecond-level time precision [37]. Brain imaging techniques based on the brain's consumption of glucose (via PET scanning) or oxygen (via functional MRI) provide a more delayed response to cognition but enable 3D localization of brain activity with millimeter-level spatial precision, which has led to their widespread use in functional neurology and neuroanatomy [7, 45]. Because increased cortical activity causes a brief, small autonomic nervous response, techniques measuring non-neural secondary effects of this response are also used as a proxy for cognitive load. Such physiological effects include electrodermal activity [2], small variations in heart rate [103, 38], blood glucose [29, 107], peripheral arterial tone [48], electrical activity in facial muscles [16], the details of eye movements [76, 117, 84], and small dilations of the pupil, the focus of this dissertation.

There is neurological justification for using these physiological proxies as measures of cognitive load, but in psychology, their operational justification comes from a body of findings associating them with differences in task difficulty and differences in individual task performance. Kahneman favored pupil dilations in his effort theory of attention because they exhibit sensitivity to three variables which should be expected of a proxy for effort: differences in difficulty grades within a single task, differences in difficulty between different kinds of task, and differences in individual ability [12]. Because this operational approach defines "cognitive load" as measurements of physiological proxies, based on desirable experimental properties of the proxies, the question of what the vague term "cognitive load" means does not arise. Pupil dilations elicited by tasks are cognitive load.

2.2 Past cognitive pupillometry research

For full reviews of cognitive pupillometry research, see Goldwater [33], Beatty [12], Beatty and Lucero-Wagoner [13], and Andreassi [3, ch. 12].

2.2.1 Cognitive psychology

The earliest references to cognitive pupil dilations I am aware of are in the late 19th-century German neurology literature [104, 40, cited by Beatty and Lucero-Wagoner 13], though no work appeared in English until the publication in Science of two articles by Eckhard Hess [43, 44]. In a series of experiments, Hess found strong pupillary dilations in response to emotions such as interest [43], disgust [42], and sexual arousal [42].
Among the earliest tasks used to validate pupillary dilations as an index of cognitive load was mental arithmetic, especially mental multiplication. Simple multiplication tasks were part of Hess & Polt's early experiments [44], and the task has since been used to show that pupil dilations also reflect individual differences in mental multiplication skill [1, also see section 4.2]. Performance on mental multiplication is believed to depend strongly on working memory [4], and similar results have been found for a broad set of short-term recall tasks. Kahneman found that the size of pupillary dilations directly reflects the current load on working memory in simple tasks requiring short-term retention of a sequence of digits [57, 58], a result that has since been replicated and extended to other short-term recall tasks [e.g. 32, also see section 4.1]. Beatty and Kahneman [9] also observed dilations in response to long-term memory retrieval tasks, interpreting them as a reflection of information being retrieved from long-term and placed in short-term memory in preparation for response.

Pupillary dilations have also been shown to be a reliable indicator of cognitive load in tasks that do not depend on working memory, such as vigilance [11, 77] and perceptual tasks [96]. Pupillometric studies of pitch discrimination [54, 106] and visual threshold flash detection [36] were used to show that these processes are essentially data-limited rather than resource-limited. With careful brightness controls, the effect has been used to study how performance on line length discrimination is related to general intelligence [119]. In the area of reading and language comprehension, pupil dilations have provided insight into many levels of processing, from low-level character recognition [10] up to complex sentence comprehension [53] and language translation [47].

2.2.2 Human-computer interaction

Recently, several human-computer interaction research groups have begun to use the pupillometric capability of head-mounted video eye trackers to measure cognitive load. Marshall [75] applied a wavelet decomposition to the pupil size signal in order to estimate the average number of abrupt discontinuities in pupil size per second, and used this measure as a general index of cognitive activity. Pomplun and Sunkara [95] described how to correct bias in observed pupil size based on gaze direction. Moloney et al. [81] used differences in pupil responses to distinguish older, visually impaired subjects from younger, visually healthy control groups performing a drag-and-drop task. Iqbal et al. [51] applied eye tracker pupillometry to show that mental workload drops at task boundaries in a multi-step task and can be used as an indicator of interruptibility.

2.3 Cognitive pupillometry

2.3.1 Infrared video pupillometry

The most popular method for measuring pupil diameter is with video cameras under infrared illumination. When infrared light shines into the eye near the optical axis of the camera, it reflects off the retina much more efficiently than off the cornea, causing the pupil to light up brightly relative to the iris [28] (see Figure 2.1). This is the same effect responsible for red-eye artifacts in flash photography. In images captured under this illumination, the bright oval of the pupil is easy to segment and measure. The number of pixels spanned by this oval is converted into a measurement of pupil diameter via a foreshortening division that accounts for the distance between the pupil and the camera.
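As a rough illustration of this conversion, here is a minimal sketch under a simple pinhole-camera assumption; the function and its parameters are hypothetical illustrations, not the algorithm used by any particular eye tracker.

```python
def pupil_diameter_mm(pupil_width_px, camera_pupil_distance_mm, focal_length_px):
    """Convert the pixel width of the segmented pupil oval to millimeters.

    Under a pinhole-camera model, an object of physical size s at distance d
    projects to s * f / d pixels, so the inverse conversion divides the pixel
    width by the magnification f / d. Real eye trackers also correct for the
    viewing angle of the pupil plane; that correction is omitted here.
    """
    magnification_px_per_mm = focal_length_px / camera_pupil_distance_mm
    return pupil_width_px / magnification_px_per_mm

# Example: a pupil spanning 30 pixels, imaged from 600 mm away by a camera
# whose focal length is equivalent to 4500 pixels, is about 4 mm wide.
print(pupil_diameter_mm(30, 600, 4500))  # 4.0
```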
2.3.2 Cognitive pupillometry uses eye trackers

In most modern studies, pupil measurements are made using equipment designed primarily for eye tracking, the measurement of the direction of a person's gaze. Eye tracking requires high-resolution imaging of the eye and often involves infrared illumination to aid in locating the center of the pupil. Extending such systems to measure pupil diameter is relatively easy, and off-the-shelf eye tracking systems today compute pupil diameter routinely. Because gaze tracking involves locating the center of the pupil, high-precision eye trackers tend also to be high-precision pupillometers.

Figure 2.1: (a) Tobii 1750; (b) pupil image. The eye tracker illuminates the participant's eyes using several infrared LEDs which surround the screen. The infrared light reflects efficiently off the participant's retinas, causing their pupils to appear very bright in the image recorded by the infrared camera mounted at the bottom of the screen. The image on the right is an illustration of this effect based on a visible-light photograph of an eye [110].

2.3.3 Types of video eye trackers used for pupillometry

Head-fixed camera pupillometry

High-precision measurements of pupil diameter depend on a setup in which the pupil spans many pixels in the camera image. This is most easily achieved by placing the camera close to the eye and fixing its position relative to the head, giving a large pupil image and avoiding any foreshortening errors caused by head motion after the initial calibration. There are two types: head-mounted cameras, and large table-top systems with chin rests or bite bars used to immobilize the head (Figure 2.2). Head-mounted systems are the most popular kind of eye tracker and are commonly used for pupillometry. Table-top systems with bite bars or other means of preventing head motion are the most precise video eye trackers available and so also provide the most precise pupil measurements.

Figure 2.2: Typical eye trackers used for cognitive pupillometry. The eye tracker on the left is an SMI iView X chin-rest style instrument, used primarily for reading and other high-precision applications [108]. The eye tracker on the right is the Polhemus VisionTrak Standard Head Mounted Eye Tracking System [93], used for mobile applications, especially driving and piloting.

Remote camera pupillometry

The alternative to configurations with a fixed camera–pupil distance involves a remote camera, not fixed with respect to the participant's head. Cameras are typically located on the desktop or mounted at the bottom of the field of vision, because the view of the eyes from below is occluded by eyelids less often than the view from above. Figure 2.3 shows two off-the-shelf remote eye tracking systems. Because the camera is usually located 50–100 cm from the eye rather than the 10 cm or less used in chin-rest and head-mounted systems, these systems measure the pupil with lower precision. In addition, the freedom of head motion requires these systems to estimate the camera–pupil distance for each frame separately in order to implement the foreshortening division. They do this by tracking the 3D position of both eyes with respect to the camera, based on the positions of specular highlights on the surface of the eye caused by the same infrared LEDs used to illuminate the retina.
Figure 2.3: Two off-the-shelf remote eye tracking systems. (a) MangoldVision Eye Tracker, set up for use with a standard desktop computer [73]. (b) Interactive Minds binocular Eyegaze Analysis System [49], with two cameras mounted at the bottom of a computer display on motor-controlled gimbals to actively point directly at participants' eyes.

Calibration errors in eye trackers can lead them to provide biased measurements of absolute pupil size. This problem is worse in remote camera systems, but because this bias is stable over time, measurements of relative pupil size are unbiased (see chapter 3). This better performance for relative pupil size is what matters for cognitive pupillometry, where the measurement of interest is usually the change in pupil diameter relative to the diameter at the end of an accommodation period preceding each trial [13]. Such dilation magnitudes have been found to be independent of baseline pupil diameter and commensurate across multiple labs and experimental procedures [12, 20, 19].

2.3.4 Advantages of remote imaging

All trial-aggregated cognitive pupillometry research I am aware of has been done using head-fixed cameras. I believe that researchers have made this choice because of the better precision of the head-fixed configuration. Equipment of this type is known to work and is broadly available, so there has been no incentive to validate alternative measurement equipment. Although remote cameras are not as precise at measuring pupils, they offer some important advantages.

Some applications require remote imaging

There are some applications which require remote, free-head eye tracking or pupillometry, such as studies with infants [22] or investigations of small changes in anxiety, distraction, or mental effort [97], where head-mounted equipment can interfere with the effects being measured. Marshall reported that some of her experimental subjects were bothered by wearing a head-mounted eye tracker, and that this may have distorted some of her results [75]. With remote eye tracking, the lack of head-mounted equipment and obvious screen-mounted cameras makes using an instrumented computer almost indistinguishable from normal desktop computer use. It is very easy for users to fall into their usual habits and behave normally.

Remote imaging is becoming ubiquitous and cheap

The eye tracking industry is still small, with most eye trackers costing more than $10,000 and marketed for research or disability applications. However, many manufacturers plan to move into mass-market eye tracking as soon as camera technology with sufficient resolution for eye tracking becomes cheap enough. Many laptop models currently integrate screen-mounted cameras. All that is needed to implement mass-market remote eye tracking is higher imaging resolution and perhaps an infrared light source, technologies which will become cheaper with time. To serve this future of low-overhead eye tracking, many researchers are developing calibration-free or minimal-calibration eye tracking methods [87, 39]. Mass-market gaze tracking will enable many new interactive and data collecting applications. For cognitive pupillometry to ride this wave of deployment, it will need to work with remote imaging.
Figure 2.4: Sources of variation in measurements of pupil diameter.

2.3.5 Relative scales and the need for trial aggregation

The magnitude of workload-related pupil dilations is usually less than 0.5 mm, smaller than the magnitude of other simultaneously ongoing pupil changes caused by light reflexes, emotions, and other brain activity, which collectively cause a constant variation in pupil size over a range of a few millimeters (see Figure 2.4). This difference in magnitude between dilations related to cognitive workload and the background pupil variability makes it impossible to distinguish the pupillary response to any one instance of increased cognitive load from the background "noise" of other pupil changes. In order to measure task-induced pupil dilations it is necessary to combine measurements from several repetitions of the task.

One way to address this measurement challenge is to record pupil diameter during a long period of time that includes many task instances or repetitions, then either average pupil size over that long period [95], find consistent short-timescale changes via wavelet transforms [75], or apply frequency-domain analysis [84, 81] to assess aggregate cognitive load during that long task.

An alternative to this aggregation technique allows the measurement of differences in cognitive load at a time scale of fractions of a second rather than minutes. This precision is achieved by measuring pupil size during many repetitions of the same short task, then aligning windows of pupil measurements temporally at the moment of task onset and averaging them [13]. The averaging operation will preserve any component of the pupil size signal which is correlated in time with the onset of the task (the task-evoked response), while measurement noise and other variation in pupil size not correlated in time with the stimulus will tend to average to zero. As more trials are conducted and included in the average, the ratio achieved between the level of the signal (pupillary responses caused by the task) and the noise (all other pupillary motions) becomes larger, and the time resolution of the average signal improves. All pupil dilation measurements reported in this dissertation are based on this trial-averaging method. The full details are described in section A.4.
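To make the trial-averaging operation concrete, here is a minimal sketch of the idea; it is my own illustration with assumed array shapes and window parameters, not the full procedure described in section A.4.

```python
import numpy as np

def task_evoked_response(pupil_mm, onsets, rate_hz, pre_s=2.0, post_s=10.0):
    """Average pupil diameter across task repetitions, aligned at task onset.

    pupil_mm : 1-D array of pupil diameters (mm), sampled at rate_hz
    onsets   : sample indices at which each task repetition began
    Returns the mean baseline-subtracted dilation over a window running from
    pre_s before each onset to post_s after it.
    """
    pre, post = int(pre_s * rate_hz), int(post_s * rate_hz)
    trials = []
    for onset in onsets:
        window = pupil_mm[onset - pre : onset + post]
        if onset < pre or len(window) != pre + post:
            continue                      # skip repetitions cut off at the ends
        baseline = window[:pre].mean()    # diameter just before task onset
        trials.append(window - baseline)  # dilation relative to that baseline
    # Pupil variation uncorrelated with task onset tends to average toward
    # zero across trials, while the task-evoked component is preserved.
    return np.mean(trials, axis=0)
```

In this scheme, adding repetitions improves the signal-to-noise ratio of the averaged task-evoked response, which is the motivation for trial aggregation described above.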
Chapter 3
The pupillometric precision of a remote eye tracker

Summary

This chapter describes a metrological study in which I determined the pupillometric precision of the Tobii 1750 remote eye tracker, and a set of experiments in which I replicated classic cognitive pupillometry experiments performed originally on fixed-head equipment, to demonstrate that a remote-imaging eye tracker can successfully be used for cognitive pupillometry. Most of the content of this chapter was published at the 2010 Symposium on Eye Tracking Research & Applications [62].

3.1 Motivation

Most eye trackers used in cognitive pupillometry use head-mounted cameras or chin rests, because a fixed camera–pupil distance enables high pupillometric precision. In contrast, remote eye trackers use cameras placed further from and not fixed to the subject's head. As a result, remote eye trackers devote fewer pixels to each pupil and must correct for variations in the camera–pupil distance, and they therefore exhibit worse pupillometric precision. However, because of the experimental advantages of remote imaging, and because remote imaging is becoming ubiquitous and cheap, there is a need to know whether and how well cognitive pupillometry can be done using remote imaging.

The pupillometric performance of remote eye trackers is not well known, because this equipment is generally only used for eye tracking and not for pupillometry. Manufacturers do not currently optimize designs for pupillometric performance, and rarely document pupillometric performance in eye tracker specifications. Quantifying the precision of remote pupillometry is important, both to establish the measurement feasibility of the equipment and to guide equipment choices and determine the number of participants and trials required to measure a pupillary response of a given magnitude using a remote eye tracker. In order to determine the pupillometric performance of the eye tracker I used in my research, I conducted a formal metrological study with respect to a calibrated reference instrument, a medical pupillometer.

3.2 Study description

3.2.1 Evaluated instrument

I evaluated the pupillometric performance of the Tobii 1750 remote video eye tracker [114], shown in Subfigure 3.1(a). This is the eye tracker I used for all the experiments described in this dissertation. The Tobii 1750 measures the size of a pupil by fitting an ellipse to the image of that pupil under infrared light, then converting the width of the major axis of that ellipse from pixels to millimeters based on the measured distance from the camera to the pupil. According to Tobii, errors in this measurement of camera–pupil distance cause measurements of pupil diameter to have errors of up to 5% for fixed-size pupils [113]. This 5% figure is a good start, but for guiding experimental design, we need to extend it by a) distinguishing bias and precision components of the error, and b) determining the average-case, rather than worst-case, performance, because it is usually the averages of many repeated pupil measurements which are used to quantify task-evoked pupillary responses [13].

3.2.2 Reference instrument

The reference instrument is a Neuroptics VIP-200 ophthalmology pupillometer, shown in Subfigure 3.1(b). The Neuroptics VIP-200 records two seconds of video of the pupil, then reports the mean and standard deviation of the pupil's diameter over those two seconds. The manual for the Neuroptics VIP-200 reports its accuracy as "±0.1 mm or 3%, whichever is larger" [85]. I asked Neuroptics for clarification and learned:

   The accuracy reported in the manual is the maximum bound of the error and it refers to the possible error of each single frame during the two seconds measurement. The mean reported by the device is evaluated over all frames; in the hypothetical case that the pupil does not fluctuate, yes, this should result in a better accuracy. However, the pupil is always characterized by a level of neurophisiological [sic] "unrest" and the two seconds mean serves to eliminate the effect of this unrest in the determination of the pupil size [86].

If measurement errors are normally distributed and we make the conservative assumptions that a) "maximum bound" means two standard deviations and b) errors within a two-second window are perfectly correlated, giving us no reduction in error from the averaging, we get a reference instrument precision of about 0.05 mm. Since this measurement error is caused in part by imaging noise which is independent for each frame, the pupillometer's true precision is probably better. Neuroptics calibrated the VIP-200 to zero bias when it was manufactured.
3.2.3 Procedure

Three volunteers participated in the metrology study, which took place in an eye clinic exam room. After a pilot study of 56 measurements to refine the measurement and data recording procedure, I conducted a main study of 336 double measurements in which I measured participants' pupils using the eye tracker and the pupillometer simultaneously. Because the pupillometer covers the eye it measures, I could not conduct simultaneous measurements of the same eye using both instruments, so for each double measurement, the pupillometer measured one of the participant's pupils while the eye tracker measured the other (Figure 3.2). The metrological validity of this study is therefore based on the strong correlation between the diameters of the left and right pupils [68].

Figure 3.1: The eye tracker and reference pupillometer: (a) Tobii 1750; (b) Neuroptics VIP-200.

Figure 3.2: Metrology study arrangement. An investigator is measuring the participant's left pupil using the reference pupillometer while the eye tracker simultaneously measures his right pupil.

Measurements taken with the eye tracker were averages over the 100 camera frames gathered in the same two-second measurement window used by the reference pupillometer. The measurements were conducted under various lighting conditions so that they would span a variety of pupil states: half under normal room lighting and half under dim lighting, where a third of the time I switched the lights on or off during the few seconds between successive double measurements (see Table 3.1). In all trials, subjects looked at a small fixation target at the center of the eye tracker's screen, which was otherwise filled with 64 cd/m² medium gray. I excluded 120 measurements in which I did not get a clean reading with the pupillometer and 10 measurements in which I did not get a clean reading with the eye tracker, leaving 206 successful double measurements, analyzed below.

Trial type | Pupil reaction | Measurement timing
bright static lighting | stable wide | one double measurement
dim static lighting | stable narrow | one double measurement
lights turned off during the trial | reflex dilation | double measurements before and after lighting change
lights turned on during the trial | reflex constriction | double measurements before and after lighting change
digit span memory task | cognitive dilation | double measurements before memorization and during the retention pause (maximum load on memory)

Table 3.1: Breakdown of the lighting and task conditions used to induce pupil states and movements between double measurements.

3.3 Metrology

I present two different metrological analyses of these double measurements: the first, based on pupil diameters, is simpler and can use all of the data but is limited by strong assumptions. The second analysis, based on dilations, uses weaker assumptions but is restricted to a subset of the available data.

3.3.1 Pupil diameter metrology

For both instruments, I model the measurement error as being additive and normally distributed:

$$\Pi = \pi + \epsilon, \qquad \epsilon \sim N(\mu, \sigma),$$

where π is the diameter of the pupil, Π is the measurement of that diameter, and ε is the measurement error. π and ε are random variables that take on new values for each measurement. Each instrument's bias is the fixed component of the measurement error µ, its accuracy is the magnitude of the bias |µ|, and its precision is the standard deviation of the measurement error σ. For the reference pupillometer (pm), µ[ε_pm] = 0 mm and σ[ε_pm] = 0.05 mm, according to information provided by its manufacturer. For the eye tracker (et), the parameters of the measurement error distribution µ[ε_et] (bias) and σ[ε_et] (precision) are what we are trying to determine.
We can estimate these parameters by analyzing the differences between simultaneous measurements made with the eye tracker and the pupillometer:

$$\Pi_{et} - \Pi_{pm} = (\pi_{et} + \epsilon_{et}) - (\pi_{pm} + \epsilon_{pm}) = \pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}$$

This is an equation of random variables. Considering the variance of each side, and dropping the term σ²[π_et − π_pm], which is assumed to be zero:

\begin{align}
\sigma^2[\Pi_{et} - \Pi_{pm}] &= \sigma^2[\pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}] \nonumber \\
&= \sigma^2[\pi_{et} - \pi_{pm}] + \sigma^2[\epsilon_{et}] + \sigma^2[\epsilon_{pm}] \nonumber \\
&= \sigma^2[\epsilon_{et}] + \sigma^2[\epsilon_{pm}] \tag{3.1} \\
\sigma[\epsilon_{et}] &= \sqrt{\sigma^2[\Pi_{et} - \Pi_{pm}] - \sigma^2[\epsilon_{pm}]} \tag{3.2}
\end{align}

The relationship in Equation 3.2 gives us a way to estimate the precision of the eye tracker based on the known precision of the reference pupillometer σ[ε_pm] and the variance in the differences between the simultaneous measurements σ²[Π_et − Π_pm]. Similarly, we can compute the bias of the eye tracker based on the mean of those differences, dropping the terms µ[π_et − π_pm] and µ[ε_pm], which are both zero:

\begin{align}
\mu[\Pi_{et} - \Pi_{pm}] &= \mu[\pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}] \nonumber \\
&= \mu[\pi_{et} - \pi_{pm}] + \mu[\epsilon_{et}] - \mu[\epsilon_{pm}] \nonumber \\
&= \mu[\epsilon_{et}] \tag{3.3} \\
\mu[\epsilon_{et}] &= \mu[\Pi_{et} - \Pi_{pm}] \tag{3.4}
\end{align}

Substituting the mean and variance of the actually observed differences Π_et − Π_pm in Equations 3.4 and 3.2, the eye tracker's pupillometric bias is 0.11 mm, and its precision is 0.38 mm.
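Computed directly from the paired measurements, the two estimators amount to the following sketch (my own illustration; sigma_pm is the 0.05 mm reference precision from subsection 3.2.2):

```python
import numpy as np

def eye_tracker_bias_and_precision(et_mm, pm_mm, sigma_pm=0.05):
    """Estimate the eye tracker's pupillometric bias (Eq. 3.4) and precision (Eq. 3.2).

    et_mm, pm_mm : arrays of simultaneous diameter measurements (mm) from the
                   eye tracker and the reference pupillometer, one pair per element.
    """
    diff = np.asarray(et_mm) - np.asarray(pm_mm)
    bias = diff.mean()                                    # Equation 3.4
    precision = np.sqrt(diff.var(ddof=1) - sigma_pm**2)   # Equation 3.2
    return bias, precision
```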
These figures are misleading, however, because the bias and precision of the eye tracker varied substantially between the three participants and between the two eyes of each participant. Figure 3.3 shows the results of all 206 successful simultaneous measurements and illustrates this inter-subject variation. For each eye individually, the measurement error has a much narrower spread, but the average is wrong by as much as 0.67 mm. The accuracy and precision vary from eye to eye because the eye tracker's pupil measurements depend on its estimate of the camera–pupil distance, which is affected by errors in the eye tracker's calibration to each eye's corneal shape.

Figure 3.3: The left graph shows the raw data of the metrology study, with each point representing a double measurement (Π_pm, Π_et). Data from each participant and each eye are plotted in a different color. The right chart shows the differences between the eye tracker and pupillometer measurements, Π_et − Π_pm, broken down by study participant and eye, showing how the eye tracker's pupillometric bias varies for each eye.

Table 3.2 shows the result of applying Equations 3.4 and 3.2 to the data from each eye separately. Across all six eyes, I found an average bias of 0.34 mm (worse than the overall 0.11 mm) and an average precision of 0.12 mm (better than the overall 0.38 mm). Because it is differences in measurements for the same eye (dilations) that form the basis of most experimental use of pupillometry [13], and because pupillometric experiments are usually conducted with several participants, these per-eye results for the eye tracker's bias and precision are the most relevant and are the ones summarized in Table 3.3.

participant and eye | eye tracker accuracy (mm) | eye tracker precision (mm)
Participant 1, left eye | 0.61 | 0.15
Participant 1, right eye | 0.34 | 0.11
Participant 2, left eye | 0.30 | 0.09
Participant 2, right eye | 0.67 | 0.08
Participant 3, left eye | 0.12 | 0.17
Participant 3, right eye | 0.03 | 0.11
mean | 0.34 | 0.12

Table 3.2: Breakdown of the diameter precision results for the eye tracker by study participant and eye.

In Equation 3.3 of the derivation for accuracy, the term µ[π_et − π_pm] was assumed to be zero. I ensured this zero-mean left-right difference in pupil size by counterbalancing which of the two eyes was measured with which instrument within the trials for each participant.

Similarly, the cancellation in Equation 3.1 of the derivation for precision assumes that the term σ²[π_et − π_pm] is zero. This assumption, that the difference in size between participants' left and right pupils is constant throughout the study, is much stronger. Judging from pupil data I have recorded in a variety of studies, the assumption holds over short periods of time (a few minutes), but the left-right difference in pupil size can sometimes drift over the 15–20 minutes it takes to make the measurements of each participant. Violations of this assumption would lead to an underestimate of the average error of the eye tracker.
A more conservative analysis, based on differences in short-term dilations measured by each instrument, provides an alternative estimate of the eye tracker's precision.

3.3.2 Pupil dilation metrology

We can determine the pupillometric precision of the eye tracker using differences in measurements of dilations rather than absolute pupil diameters. Using δ = π₂ − π₁ to denote the dilation of the pupil from time 1 to time 2 and ∆ = Π₂ − Π₁ to denote the measurement of that dilation,

\begin{align}
\Delta_{et} - \Delta_{pm} &= (\Pi_{et2} - \Pi_{et1}) - (\Pi_{pm2} - \Pi_{pm1}) \nonumber \\
&= [(\pi_{et2} + \epsilon_{et2}) - (\pi_{et1} + \epsilon_{et1})] - [(\pi_{pm2} + \epsilon_{pm2}) - (\pi_{pm1} + \epsilon_{pm1})] \nonumber \\
&= (\pi_{et2} - \pi_{et1}) - (\pi_{pm2} - \pi_{pm1}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2} \nonumber \\
&= (\delta_{et} - \delta_{pm}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2} \tag{3.5}
\end{align}

As before, now considering the variance of the random variables on each side of Equation 3.5, and dropping the term σ²[δ_et − δ_pm], which is assumed to be zero:

\begin{align}
\sigma^2[\Delta_{et} - \Delta_{pm}] &= \sigma^2[(\delta_{et} - \delta_{pm}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2}] \nonumber \\
&= \sigma^2[\epsilon_{et2}] + \sigma^2[\epsilon_{et1}] + \sigma^2[\epsilon_{pm1}] + \sigma^2[\epsilon_{pm2}] \tag{3.6} \\
&= 2\sigma^2[\epsilon_{et}] + 2\sigma^2[\epsilon_{pm}] \tag{3.7} \\
\sigma^2[\epsilon_{et}] &= \tfrac{1}{2}\sigma^2[\Delta_{et} - \Delta_{pm}] - \sigma^2[\epsilon_{pm}] \nonumber \\
\sigma[\epsilon_{et}] &= \sqrt{\tfrac{1}{2}\sigma^2[\Delta_{et} - \Delta_{pm}] - \sigma^2[\epsilon_{pm}]} \tag{3.8}
\end{align}

The cancellation in Equation 3.6 is based on the assumption that the difference between the left eye's dilation and the right eye's dilation is constant over a short period of time. I observed this fact in an earlier study conducted on lateralized pupillary responses, in which I tried several stimulus-based ways of inducing different dilations in subjects' two pupils but never succeeded in causing any significant left-right differences. I abandoned the effort after learning that the neuroanatomy of pupil size regulation renders such differences extremely unlikely [68]. Step 3.7 relies on the assumption that each instrument's measurement error has the same standard deviation at both times (σ[ε_pm1] = σ[ε_pm2] and σ[ε_et1] = σ[ε_et2]).

Among the 206 successful double measurements, there are 84 pairs of double measurements that took place within 30 seconds of each other. That is, there were 84 dilations with duration less than 30 seconds whose starting and ending diameters were both measured with the two instruments simultaneously. Substituting the observed ∆_et − ∆_pm in Equation 3.8 gives a pupillometric precision for the eye tracker of 0.15 mm, slightly worse than the diameter-based precision of 0.12 mm.
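A corresponding sketch for the dilation-based estimator (again my own illustration; each element of the input arrays describes one of the pairs of double measurements taken within 30 seconds of each other):

```python
import numpy as np

def precision_from_dilations(et_start, et_end, pm_start, pm_end, sigma_pm=0.05):
    """Estimate the eye tracker's precision from paired dilation measurements (Eq. 3.8)."""
    delta_et = np.asarray(et_end) - np.asarray(et_start)   # eye tracker dilations
    delta_pm = np.asarray(pm_end) - np.asarray(pm_start)   # pupillometer dilations
    diff = delta_et - delta_pm
    return np.sqrt(0.5 * diff.var(ddof=1) - sigma_pm**2)   # Equation 3.8
```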
3.4 Conclusion

This chapter presents two analyses of measurements made simultaneously using the Tobii 1750 remote eye tracker and a medical pupillometer. The first analysis, diameter-based metrology (subsection 3.3.1), provided an estimate of the eye tracker's pupillometric accuracy and, via a relatively strong assumption, a lower bound on the eye tracker's pupillometric precision. The second analysis, dilation-based metrology (subsection 3.3.2), provided an alternative estimate of precision relying on fewer assumptions but also on less of the data. The results of both analyses are summarized in Table 3.3, together with the resultant derived precision for binocular and dilation measurements.

The Tobii 1750, which is typical of recent research-targeted remote eye trackers, has a binocular pupillometric precision of 0.15 mm. While this performance is worse than the precision offered by head-mounted (0.02 mm) or chin-rest systems (0.01 mm), it is good enough for task-averaged cognitive pupillometry. The background variation in pupil size which requires averaging over many task repetitions is on the order of 1.0 mm, so all of these systems have sufficient resolution to detect task-induced dilations.

Increasing the resolution of a remote eye tracker's camera would of course improve its pupillometric precision. Another means of improving the performance of remote camera systems is to use an active aiming system to point the camera directly at a user's eyes to allow a narrow, eyes-only field of view even during head motion [49]. A similar improvement can also be gained by using a programmable CCD in which faster sampling rates and more image processing are applied to the region of the camera image containing the eyes, wherever they appear in the field of view.

data | assumption | diameter accuracy per eye (mm) | precision (mm): diameter, monocular / binocular mean | precision (mm): dilation magnitude, monocular / binocular mean
206 double measurements | The difference in size between the left and right pupils is constant over the study. | 0.34 | 0.12 / 0.08 | 0.17 / 0.12
84 pairs of double measurements | The difference between the left eye's dilation and the right eye's dilation is constant over 30 sec. | NA | 0.15 / 0.10 | 0.21 / 0.15

Table 3.3: Summary of the Tobii 1750's pupillometric performance. Figures for monocular diameter accuracy and precision are the results of the metrological analysis above. Other figures in the table were then derived from these primary results. Dilation measurement precision is larger (worse) by a factor of √2, because it is based on the difference of two diameter measurements. When both eyes are measured and averaged, the precision in the estimate of their mean dilation (or diameter) improves by a factor of √2 over the monocular case.

Chapter 4
Replication of classic cognitive pupillometry results on a remote eye tracker

Summary

The previous chapter demonstrated that remote eye trackers have a binocular pupillometric precision of 0.15 mm, which should be enough for cognitive pupillometry applications. However, because remote eye trackers have not yet been used for trial-aggregated pupillometry, I conducted several basic experiments to see if they work. In this chapter, I report three of them. In choosing tasks, I sought to (a) span diverse types of cognitive load, (b) replicate well-studied tasks to enable comparisons to prior results, and (c) use simple stimuli that are easy to match between aural and visual presentation (see section 5.3). I chose mental multiplication, digit-span memory, and vigilance. The first two replicate classic cognitive pupillometry studies, to determine whether I could observe expected, well-established pupil dilation patterns. The third experiment was original. Most of the content of this chapter was published at the 2008 Symposium on Eye Tracking Research & Applications [63] and in the journal Psychophysiology [64].

4.1 Digit span memory

4.1.1 Background

Short-term recall of a paced sequence of digits (also known as the digit span task) is the most popular experimental task in cognitive pupillometry. First used by Kahneman and Beatty [57], the task was also used to investigate the related processes of long-term recall [9], grouping [55], and rehearsal [58]. Peavler [91] showed that the pupil reaches a plateau dilation of about 0.5 mm around the presentation of the seventh digit.
[34] replicated this finding, confirming that pupil dilation averaging can be used to estimate both the momentary load and the maximum capacity of working memory. My experiment replicated the original Kahneman and Beatty [57] study.

4.1.2 Study description

I ran 98 trials of this task with seven participants. Details regarding study participants, equipment, and procedures not specific to this task are described in Appendix A.

I began each trial with a two-second pre-stimulus accommodation period, during which participants rested their eyes on a fixation target in the center of the screen in order to stabilize their pupils (see Subfigure 4.1(a)). I then presented a sequence of digits at the rate of one per second, spoken aloud over a speaker placed behind the eye tracker's screen. After a brief retention pause, participants then reported back the sequence. In a departure from Kahneman and Beatty's procedure, rather than repeating the sequence aloud, participants typed it into an on-screen keypad using the mouse (Subfigure 4.1(b)). I randomly varied the length of the digit sequence for each trial between 6 and 8 digits.

For this task and for mental multiplication, where the tasks required numerical responses, I asked participants to type their responses into a low-contrast on-screen keypad. I did this to automate data collection and to avoid pupillary reflexes to varying brightness caused by looking away from the screen. Because button-press responses themselves induce pupillary responses [99], and I could not avoid such interference by using spoken responses [17, 55], I limited my analysis to pre-response periods.

Figure 4.1: Participant fields of view during auditory experiments. (a) Display during auditory stimulus presentation, with fixation target at the center. (b) Display with keypad used for gathering subject responses in the Digit Span Memory (section 4.1) and Mental Multiplication (section 4.2) tasks.

4.1.3 Results

My findings matched those of Kahneman and Beatty [57]. In both experiments, pupil diameter increased as the digits to be memorized were heard and encoded, peaked during the pause while they were retained, and declined as the subjects reported them back (see Figure 4.2). The magnitude of the response increased monotonically with the length of the memorized sequence. In the 1966 study, subjects repeated the sequence aloud, one digit per second, while in my study the response was entered using an on-screen numeric keypad (Subfigure 4.1(b)). This enabled a faster response and resulted in the observed steeper decline in pupil diameter in my results.

Figure 4.2: Pupillary response during the digit span short-term memory task. The top graph shows the results reported by Kahneman and Beatty [57], and the bottom graph shows my results. The two graphs are aligned and plotted at the same scale.

4.2 Mental multiplication

Mental multiplication is one of the oldest tasks studied with cognitive pupillometry, with the first experiments conducted in the 19th century [Heinrich 40, cited by Beatty and Lucero-Wagoner 13].
Hess and Polt [44] triggered broad interest in cognitive pupillometry when they reported that solving mental multiplication problems caused pupil dilations and that harder problems evoked larger dilations. Their results were replicated by Bradshaw [18] for mental division with remainders; Boersma et al. [15] for mental addition in a study of mental retardation; and Ahern and Beatty [1] in a study of the effect of individual differences in ability as measured by SAT scores. Recently, Marshall [75] used a mental arithmetic task to validate a wavelet-based method of analyzing pupil measurements. 4.2.1 Study description I ran 65 trials of this task with seven participants. Details regarding study participants, equipment, and procedures not specific to this task are described in Appendix A. As in the study of the digit span task, I began each trial with a two-second prestimulus pupil accommodation period. I then presented the participant with two numbers, the multiplicand and multiplier, separated by two seconds. Five seconds after I presented the multiplier, participants were prompted for the two numbers’ product. As I did for the digit span task, I departed from the original experiment’s procedure by using an on-screen keypad to record the participant’s response (4.1.2). For each trial, I randomly selected a difficulty level of easy, medium, or hard, then chose the multiplier and multiplicand randomly according to Ahern and Beatty’s definition of these difficulty levels: easy problems took the form {6, 7, 8, 9} × {12, 13, 14} (e.g. 7 × 13), medium were {6, 7, 8, 9} × {16, 17, 18, 19}, and hard {11, 12, 13, 14} × 36 CHAPTER 4. REPLICATION OF CLASSIC PUPILLOMETRY RESULTS {16, 17, 18, 19}. I instructed participants not to provide a response in cases when they forgot one of the two numbers or gave up on computing their product. 4.2.2 Results There was a small (0.1 mm) increase in pupil size as the multiplicand was committed to short term memory and a larger, longer-lasting increase after the subjects heard the multiplier and began computing the product (See Figure 4.3). Although I gave problems at all three difficulties, the easy level was the only one for which I collected sufficient correct responses for analysis. The pupillary response I observed for these easy problems resembles the prior result for medium and difficult problems. I speculate that students in 1979 had more practice with mental arithmetic. 4.3 Vigilance The mental multiplication and digit span tasks are both strongly dependent on working memory. I designed my third experiment to investigate pupil dilations evoked by less memory-dependent processes, using a task that requires intermittent vigilance, stimulus discrimination, and speeded motor responses. 4.3.1 Study description I ran 94 trials of this task with eight participants. Details regarding study participants, equipment, and procedures not specific to this task are described in Appendix A. In each trial, I presented an ascending sequence of numbers from 1 through 20. I told participants that the sequence might progress normally or might contain errors at the number 6, 12, and/or 18. When they noticed an error (a target), they were to push a button as quickly as possible. For example, part of the sequence might be “. . . 10, 11, 12, 13, . . . ”, in which case I instructed the participants to do nothing, or it might be “. . . 10, 11, 7, 13, . . . ”, in which case I told them to push the button as soon as possible after noticing the “7”. 
I inserted sequence errors (targets) at the three possible positions independently and randomly with probability one half. Thus any trial could contain 0, 1, 2, or 3 targets, and participants knew exactly when the targets might appear. “6” was never replaced by “16,” nor “18” by “8,” so that errors were apparent from the start of each spoken target stimulus.

Figure 4.3: Pupillary response during the mental multiplication task. The top graph shows the results reported by Ahern and Beatty [1]. The bottom graph shows the results from my replication of their experiment. The two graphs are aligned and plotted at the same scale.

Unlike my experiments with digit span memory and mental multiplication, this experiment did not replicate a past study, though it incorporated aspects of prior experiments. Beatty [11] found pupil dilations evoked by target tones in an auditory vigilance task, though in that experiment target locations were randomized, so that participants could not anticipate them, and continuous rather than intermittent vigilance was required. The anticipated increase in vigilance required by this task was studied by Richer et al. [100].

4.3.2 Results

I observed sharp spikes in pupil diameter with consistent magnitude, onset timing, duration, and shape following all three mistake points (see Figure 4.4).

Figure 4.4: Pupillary response to an aural vigilance task. The grey bars mark moments when subjects needed to listen carefully and react quickly to mistakes in a spoken sequence of numbers.

4.4 Conclusion

For all three tasks, I observed patterns of pupil dilation with timing matched to the details of the task. In the digit span tasks, the dilation profile tracked the number of digits held in memory over time. For mental multiplication, a small dilation followed the presentation of the multiplicand and a larger, longer dilation followed presentation of the multiplier. For counting vigilance, dilations occurred at each of the possible mistake points. For the two tasks which replicated classic studies, my results matched the standard findings. These findings confirm that the Tobii 1750 remote eye tracker has sufficient precision to measure task-evoked pupillary dilations.

Chapter 5
From auditory to visual

Summary

Pupil dilation magnitude has been shown to be a valid and reliable measure of cognitive load for auditory tasks. Because the pupil dilates for reasons other than cognitive load, especially changes in brightness, assessing cognitive load in visual tasks has been problematic. I review the pupillary light reflex and other non-cognitive sources of pupil motions and how they can be controlled experimentally, including a novel method for compensating for pupillary blink reflexes. I describe a repetition of the three studies described in chapter 4, in which visual stimuli are used instead of auditory.
These studies found that remote cognitive pupillometry works well for the visual versions of digit span memory, mental multiplication, and vigilance, and that visual versions of the tasks all evoke smaller pupil dilations than the auditory versions. Most of the content of this chapter was published in the journal Psychophysiology [64]. 5.1 The need to use visual stimuli Developments in graphics have brought interfaces, newspapers, textbooks, and instructions which increasingly present changing visual information. Viewers need to attend to, search through, and evaluate this information in order to integrate it. Are visual interfaces the best way to present this information or might cognitive load be 41 42 CHAPTER 5. FROM AUDITORY TO VISUAL lessened with auditory presentation? Are the parameters of cognitive load similar for visual and auditory presentation? Kahneman utilized pupillary dilations extensively [e.g. 59, 55, 58] and used pupillary dilations as the primary empirical foundation for his attention theory of effort [56]. He identified three criteria desirable for physiological proxies for effort and which he observed in pupillary dilations: differences in the magnitude of averaged pupillary dilations reliably reflect (a) different difficulty levels of a single task, (b) differences in difficulty across qualitatively different tasks, and (c) individual differences in ability. In a review nine years later, Beatty [12] reaffirmed that the experimental evidence then available showed that pupillary dilations fulfill all three of Kahneman’s criteria. To my knowledge, nobody has examined the effect of aural vs. visual presentation mode itself on the magnitude of pupillary dilations. This lack of data confounds the use of dilations for comparing cognitive loads between visual and aural tasks, because it can not be known how much of the difference is caused by the difference in presentation modalities and how much is caused by differences in post-perception task demands. In other words, it is still not known whether Kahneman’s second criterion, inter-task comparability, is fulfilled by pupil dilations when used to study visual as well as auditory tasks. The following section reviews the pupillary light reflex and other non-cognitive sources of pupil dilations and how they can be controlled experimentally, including a novel method for compensating for pupillary blink reflexes. The rest of the chapter describes a replication of the three auditory tasks described in chapter 4, this time using visual instead of auditory stimuli, in order to see the difference caused by presenting tasks visually. 5.2 5.2.1 Controlling for non-cognitive pupillary motions Pupillary light reflex The largest potentially confounding pupillary motion is the pupillary light reflex, which is much larger in magnitude than cognition-induced pupil changes [68]. 5.2. CONTROLLING FOR NON-COGNITIVE PUPILLARY MOTIONS 43 I followed standard practice [e.g. 118, 82], and dealt with this problem by avoiding it. In all of my studies, I maintained constant visual field luminance across experimental conditions. Additionally, I used isoluminant pre-stimulus masks to avoid luminance changes at stimulus onset (see subsection 5.3.2). A few researchers have have attempted to adjust pupil diameter data to compensate for the overall luminance of stimuli [94, 83], but these approaches only model constant luminance, so they are not yet applicable to trial-averaged cognitive pupillometry. 
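Constant luminance across conditions can also be checked programmatically before data collection. The sketch below is illustrative only: it treats stimuli as grayscale pixel arrays and uses mean pixel level as a stand-in for luminance, glossing over display gamma and photometric calibration against the 64 cd/m² background; the frame data and condition names are hypothetical.

```python
import numpy as np

def check_isoluminance(frames_by_condition, tolerance=0.01):
    """Warn about stimulus frames whose mean pixel level deviates from the
    grand mean by more than `tolerance`, since such a change at stimulus
    onset could evoke a pupillary light reflex."""
    all_means = {name: [float(frame.mean()) for frame in frames]
                 for name, frames in frames_by_condition.items()}
    grand = np.mean([m for means in all_means.values() for m in means])
    for name, means in all_means.items():
        for i, m in enumerate(means):
            if abs(m - grand) / grand > tolerance:
                print(f"condition {name!r}, frame {i}: mean level {m:.1f} vs. grand mean {grand:.1f}")

# Hypothetical usage with random placeholder frames:
rng = np.random.default_rng(0)
frames = {"digit": [rng.integers(120, 130, (1024, 1280)) for _ in range(3)],
          "mask":  [rng.integers(120, 130, (1024, 1280)) for _ in range(3)]}
check_isoluminance(frames)
```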
5.2.2 Luminance changes caused by shifting gaze Experiments in which participants shift their gaze to look at many parts of a visual stimulus, including studies of reactions to photographs [67, 26], visual search [5, 97], and visual scanning [95, 117] are subject to pupillary light reflexes when participants fixate on local areas of the stimulus with varying luminance even though the overall luminance of the stimulus does not change. Reading studies, in which textual stimuli have relatively uniform local luminance and consistent fixation sequences, are not as vulnerable to this problem and have successfully measured small task-evoked pupillary responses amidst active eye movements [e.g. 53]. I controlled for saccade-induced luminance changes by presenting all stimuli at a fixed location within an area small enough to fall within the fovea, and by helping participants to keep their gaze fixed by presenting a fixation target at all times and keeping trial durations under 20 seconds. 5.2.3 Other visual causes of pupil changes In addition to the light reflex and the cognitive load response, the pupil also exhibits small dilations or contractions in response to changes in accommodation distance [69], contrast [116], spatial structure [25] and the onset of coherent motion [101]. Kohn and Clynes [65] showed that simply changing the color content of a visual stimulus, without changing either local or global luminance, can cause the pupils to either dilate or contract, depending on the nature of the color change. Many of these effects have been explained as special cases of the pupillary light reflex caused by local neighbor inhibition on the retina [60]. I controlled for all of these influences on pupil size by 44 CHAPTER 5. FROM AUDITORY TO VISUAL using achromatic, fixed-distance, non-moving, constant-contrast stimuli. Fatigue and habituation Over a long experiment, the baseline diameter of the pupil gradually declines [42], an instance of the general affect of fatigue on pupil diameter [71, 91]. In addition, over many replications of the same stimulus the magnitude the resultant pupil dilations gradually decreases [72, 66, cited in Tryon [115, p. 91]]. These effects make pupillometry suitable for the measurement of operator fatigue, but when focusing on tasks rather than people, it is important to control for these effects. In my experiments, I limited the duration of experimental sessions to one hour, including 30 minutes of actual measurements, and limited any one trial type to 50 repetitions per participant. All trial repetitions were initiated by participants, and I told them that they could take a break whenever they wanted; only a few participants ever did so. 5.2.4 Pupillary blink response Blinks cause the pupils to very briefly contract and then recover to their pre-blink diameter [27, cited by 115, p. 91]. Normally, this reaction is controlled in experiments by instructing participants not to blink during trials. For standard pupillometry studies in which the tasks are short (less than ten seconds) to enable trial averaging, suppressing blinks is not difficult. But in order to extend cognitive pupillometry to visual tasks, with less controlled structure and longer duration, a method is needed to compensate for pupillary blink responses. To the extent that blinks occur randomly, pupillary blink responses add noise to averaged pupil diameter measurements, and to the extent that blinks are correlated with stimuli, pupillary blink responses adds bias to averaged pupil diameter measurements. 
I pooled data from twenty thousand binocular blinks that occurred during several of my eye tracking studies, grouped the blinks by duration, and averaged them to determine 3-second-long blink response correction signals. I observed blink responses consisting of a very brief dilation of about 0.04 mm, followed by a contraction of about 0.1 mm and then a gradual recovery to pre-blink diameter over the next two 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 45 seconds. The timing and magnitude of these changes depend on the duration of the blink. In data processing of each study, I then removed the pupillary blink responses by altering the data following each blink by subtracting the blink response correction signal corresponding to the length of that blink. Figure 5.1 shows the blink correction signal for blinks that lasted five samples (100 ms). For stimulus-correlated blinks, the general effect of this correction is to decrease the magnitude of pupillary responses measured in the first second following a blink by about 0.03 mm and increase the magnitude of pupillary responses measured in the second second following a blink by about 0.05 mm. For stimulus-uncorrelated blinks, the general effect of this correction is to remove measurement noise and thereby decrease the standard errors of the mean in stimulus-locked averages of dilation magnitude. This correction applies to data gathered after each blink. For the missing data points that fall during the blink itself, I followed the standard practice of filling the gaps with linear interpolation. Because this is a new data processing technique for pupil data, I re-ran the analysis of auditory vs. visual stimuli without blink response correction and found that the correction did not change the significance of any of my results and changed the effect sizes by only 0.005–0.01 mm, suggesting that blinks were not well-correlated with stimuli for the tasks I examined and contributed only noise to the stimulus-aligned averages. 5.3 Visual replication of classic auditory studies In my visual replication of the three auditory cognitive pupillometry studies described in chapter 4, I took care to control for all known non-cognitive pupillary reflexes. The visual conditions employ visual fields with matching brightness and contrast to the original auditory studies; the difference is that in the aural conditions, the taskrelevant stimuli were heard, and in the visual conditions they were seen. Because visual perception is generally believed to involve less effort, but the subsequent central processing demands were matched between the two presentation conditions, I expected dilations evoked by visually presented tasks to start out smaller but 46 CHAPTER 5. FROM AUDITORY TO VISUAL pupillary changes around blinks of length 5 (average of 1575 blinks) Observed blink reaction Derived Blink Correction Signal blink 0.04 Change in pupil diameter (mm) 0.02 0.00 −0.02 −0.04 −0.06 −0.08 −1 0 1 2 3 Time (seconds) Figure 5.1: Pupillary blink response for blinks with a 5-sample (0.1 sec) duration. Note that the vertical scale is much smaller than other pupil traces. This blink response was subtracted from data gathered after every blink with this same duration. Similar blink responses were computed and used for blinks with durations up to 25 samples (0.5 sec). 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 47 to eventually reach the same peak diameter as those evoked by the aurally presented versions. 
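As a concrete illustration of the blink-response correction described in subsection 5.2.4, the sketch below shows how duration-matched correction signals might be applied in data processing. The sampling rate follows from the 5-samples-per-100-ms figure above, but the array layout and the correction table are hypothetical placeholders rather than the study's actual code; the real correction signals were estimated from the pooled blink data described above.

```python
import numpy as np

SAMPLE_RATE = 50  # Hz (5 samples = 100 ms, as in Figure 5.1)

def correct_blinks(pupil, blinks, corrections):
    """Interpolate across each blink and subtract the duration-matched
    blink-response correction signal from the samples that follow it.

    pupil:       1-D array of pupil diameters (mm)
    blinks:      list of (start_index, duration_in_samples) pairs
    corrections: dict mapping blink duration (in samples) to a 3-second
                 correction signal, i.e. an array of 3 * SAMPLE_RATE values
    """
    out = pupil.astype(float).copy()
    for start, dur in blinks:
        end = start + dur
        if start == 0 or end >= len(out):
            continue  # skip blinks at the edges of the recording
        # Fill the gap left by the blink itself with linear interpolation.
        out[start:end] = np.linspace(out[start - 1], out[end], dur + 2)[1:-1]
        # Subtract the average pupillary blink response for this blink
        # duration from the samples recorded after the blink.
        corr = corrections.get(dur)
        if corr is not None:
            stop = min(end + len(corr), len(out))
            out[end:stop] -= corr[: stop - end]
    return out
```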
I also expected this difference in effort to be reflected in lower error rates and quicker responses in the visual conditions. 5.3.1 Procedure For details of experimental procedure common to all studies, see Appendix A. 5.3.2 Stimuli As in the auditory versions of these experiments, stimuli for all experiments were numbers between 1 and 20. Under the auditory condition, stimuli were 500 ms digitized recordings of spoken numbers played over a computer speaker placed directly behind the screen. Under the visual condition described here, I displayed these numbers at the center of the eye tracker’s integrated 17-inch 1280 × 1024 LCD screen. I used a 28-point font size so that the digits spanned 0.73◦ (about a third of the foveal span) when viewed from participants’ initial seating distance of 60 cm. These numerals were black, and the rest of the screen was always filled with a uniform background of 64 cd/m2 medium gray. The onset timing and duration of visual number presentation were matched to the timing used in my auditory study. During periods of time with no stimulus (between trials, during the pre-stimulus pupil accommodation period, and in between presentation of numbers during the task), where the auditory experiment used silence, I masked the stimulus by displaying an “X” at the center of the screen in place of a number, in order to remove contrast and brightness changes caused by the appearance or disappearance of the numerals. The absence of clear constrictions following the time of visual stimulus change in the visual waveforms provides evidence that these stimulus changes per se had little effect on the pupil in my experiments. 48 CHAPTER 5. FROM AUDITORY TO VISUAL 5.3.3 Digit sequence memory All prior investigations of pupil dilations evoked by the digit span recall task (see section 4.1) presented the digit sequence aurally. This study is another replication of the original Kahneman and Beatty [57] study, with visual rather than auditory presentation of the numbers. I ran 607 repetitions of this task with 17 experimental participants. Unlike the auditory study, where I used sequences of length 6, 7, or 8 digits, I randomly varied the length of the presented sequence for each trial independently between 3 and 8 digits. I used the first two seconds of the retention pause following presentation of the digit sequence as the response window for pupil diameter averaging and significance testing, because this is the moment when Kahneman and Beatty [57] observed maximum dilations. Results Averaged pupil traces from both auditory and visual versions of this experiment are compared in Figure 5.2. Under both auditory and visual presentation, changes in pupil diameter followed the same qualitative pattern observed by Kahneman and Beatty’s auditory study: participants’ pupils gradually dilated as the digits were memorized, reached a peak two seconds after the final digit, during the pause while the sequence was retained in memory, then gradually contracted as the participants reported the digits back. I observed a faster post-retention constriction than Kahneman and Beatty [57], probably because he used paced recall, and my participants typed their response into the on screen keyboard, usually faster than the one digit per second rate used by Kahneman and Beatty. Dilation magnitude by presentation mode Aural presentation caused significantly larger pupil dilations during the retention pause than visual presentation (M = 0.44 mm, SD = 0.22 mm vs. 
M = 0.24 mm, SD = 0.17 mm; F (1, 20) = 5.9, p = .02). 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES Aural Visual sequence length 0.6 sequence length 0.6 8 digits (35 trials) 7 digits (16 trials) 6 digits (15 trials) Change in pupil diameter (mm) Change in pupil diameter (mm) 0.4 7 6 5 0.2 3 4 5 6 6 5 4 3 2 3 4 1 1 8 7 0.2 2 0 2 end of sequence 4 6 8 Time (seconds) 1 12 1 2 response prompt 10 14 7 6 6 0.0 1 response prompt 0.4 2 0.0 end of sequence 8 digits (41 trials) 7 digits (64 trials) 6 digits (75 trials) 5 digits (122 trials) 4 digits (63 trials) 3 digits (73 trials) 8 7 49 0 1 3 2 2 1 4 3 2 5 5 1 4 2 3 1 3 4 6 5 5 4 3 2 4 3 2 4 6 8 10 12 Time (seconds) Figure 5.2: Pupil dilation evoked by a digit-span memory task presented aurally (left) and visually (right). The two charts are aligned and plotted at the same vertical scale. The numbered circles on each line show the times at which each digit was spoken (auditory presentation) or displayed (visual presentation). The curves are each shifted horizontally so they are aligned at end of the stimulus sequence. Thus the longest sequence (8 digits) starts the furthest to the left. Aural presentation caused larger dilations than visual, and under both presentation modes, longer memorization sequences elicited larger pupil dilations. 50 CHAPTER 5. FROM AUDITORY TO VISUAL Dilation magnitude by task difficulty I found a significant effect of sequence length on the magnitude of pupil dilations during the retention pause (F (3, 60) = 3.73, p = .02, ˜ = .96; see Figure 5.2). The magnitude of the dilation increased monotonically with the length of the memorized sequence. Task performance Considering sequences of all lengths, participants made significantly more recall errors under auditory (30%) than visual (24%) presentation; χ2 (1, N = 1232) = 3.94, p = .02, though this result is reversed if only the longest (length 7 and 8) sequences are considered. Average digit span was 6.0 digits for auditory presentation and 5.6 digits for visual. Error rates for all tasks are shown in Figure 5.8. Discussion This experiment compared cognitive load under short-term memorization of aurally and visually presented digit sequences. As with the mental arithmetic task, the qualitative shape of average pupil dilations was similar in both presentation modes, but the magnitude of dilations was smaller under visual presentation. Although visual presentation led to significantly greater overall performance, the difference was not large, and rates of recall for the longer sequences and average digit span scores suggest a small performance advantage for auditory presentation. A general advantage to serial recall under auditory presentation, especially for items late in the sequence, is well documented [92; 35, p. 22; but see 8]. My findings on recall performance are mixed, but the larger dilations I observed in the auditory condition suggest that task performance in this mode comes with the cost of higher cognitive load. 5.3.4 Mental multiplication I similarly repeated my study of the standard mental multiplication task (see section 4.2), again replacing spoken numbers with numbers displayed on the screen. I ran 431 repetitions of this task with 12 experimental participants. As with the digit 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 51 span task, I ran the study with a visual condition with timing matched to the auditory study, but I also added a second timing variant. 
In the sequential treatment, which replicates Ahern and Beatty [1], the multiplicand and multiplier were presented one after the other with timing matched to the auditory study. In the simultaneous treatment, both numbers were shown on the screen together for the full eight seconds between the pre-stimulus accommodation period and the response prompt. This simultaneous and continuous presentation was intended to remove the requirement that subjects quickly read and remember the short-lived stimuli and thereby isolate the cognitive load imposed by mental multiplication from that caused by remembering the numbers. As in the auditory study, I instructed participants not to provide a response in cases when they forgot one of the two numbers or gave up on computing their product. This occurred in 10% (65/632) of the trials, mostly for hard problems. Since these trials didn’t involve mental multiplication, I excluded them from analysis. Nine participants had memorized the multiplication table through 12×12 and the rest through 10 × 10, so all but a few of the easiest problems required mental computation beyond simple recall. Results Dilation magnitude by presentation mode Presentation mode affected the overall magnitude of pupil dilations but not their qualitative shape. The onset timing, duration and overall shape of pupil dilations caused by mental multiplication was the same for both auditory and visual presentation. The size of participants’ dilations, however, was significantly larger in the auditory condition (M = 0.35 mm, SD = 0.11 mm vs. M = 0.16 mm, SD = 0.13 mm; F (1, 22) = 12.1, p = .002). This difference in magnitude is clear in Figure 5.3, which shows the pupil dilation evoked by the mental multiplication task, averaged across all trials and participants and broken down by task presentation mode. Dilation shape and magnitude by visual presentation timing The pupil dilation evoked by problems with both components visible simultaneously for eight 52 CHAPTER 5. FROM AUDITORY TO VISUAL 0.5 multiplicand presented multiplier presented response prompted Change in pupil diameter (mm) 0.4 0.3 0.2 0.1 aural (37 trials) visual (165 trials) 0.0 0 2 4 6 8 10 Time (seconds) Figure 5.3: Average pupil dilation evoked by visually and aurally presented mental multiplication problems. The two presentation modes elicited dilations with similar timing, duration, and shape, but different magnitude. Vertical lines show the times during which the two numbers were spoken or displayed and the time during which the participants responded. In this and other figures, the shaded region enclosing each curve shows the standard errors of the mean for the average pupil diameter represented by that curve. 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES Sequential (165 trials) 0.30 multiplicand presented Simultaneous (365 trials) multiplier presented 0.30 response prompted multiplier and multiplicand presented response prompted 0.25 Change in pupil diameter (mm) 0.25 Change in pupil diameter (mm) 53 0.20 0.15 0.10 0.20 0.15 0.10 0.05 0.05 0.00 0.00 0 2 4 6 Time (seconds) 8 10 0 2 4 6 8 10 Time (seconds) Figure 5.4: Comparison of pupil dilations evoked by sequentially and simultaneously presented mental multiplication problems. The left panel shows the average dilation in trials where the multiplier and multiplicand were shown briefly and one after the other; these show the same data as the blue/dashed (visual) curve in Figure 5.3. 
The right panel shows the average dilation in trials where the two numbers were shown together and continuously for eight seconds. seconds had a different pattern, shown in comparison to the visual sequential treatment in Figure 5.4: a single long dilation and contraction, rather than the two peaks I observed in the sequential case. In addition, the mean pupil dilation was smaller in the simultaneous case (M = 0.13 mm, SD = 0.11 mm vs. M = 0.30 mm, SD = 0.13 mm; F (1, 22) = 10.3, p = .004). This result is not surprising, because the simultaneouspresentation trials lack a second stimulus event to cause a second peak, and these trials were easier to solve, because they did not require participants to remember the two presented numbers. Dilation magnitude by task difficulty Consistent with prior investigations of mental arithmetic, I found a clear difficulty effect on dilation magnitude. (See Figure 5.5). Easy multiplication problems caused the smallest pupil dilations (M = 0.17 mm, SD = 0.19 mm), hard problems the largest (M = 0.27 mm, SD = 0.16 mm), with dilations to medium problems in between (M = 0.21 mm, SD = 0.15 mm). 54 CHAPTER 5. FROM AUDITORY TO VISUAL multiplier and multiplicand presented response prompted Change in pupil diameter (mm) 0.3 0.2 0.1 0.0 HARD (83 trials) MEDIUM (153 trials) EASY (129 trials) -0.1 0 2 4 6 8 10 Time (seconds) Figure 5.5: Difficulty effect on pupil dilation evoked by mental multiplication of two numbers displayed together for eight seconds. The data shown are the same as those in the right panel of Figure 5.4, here separated by difficulty. These differences were significant (F (2, 30) = 13.1, p = .0008, ˜ = .67). Task performance by presentation mode Participants made significantly more errors on aurally presented problems (40%) than visually presented problems (25%); χ2 (1, N = 632) = 3.39, p = .03. Error rates for all tasks are shown in Figure 5.8. Discussion This experiment compared cognitive load under auditory and visual presentation of mental arithmetic problems. The overall pattern of task-evoked pupil dilations 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 55 was similar in both conditions and replicated previous auditory work. Intriguingly, both the better performance under visual presentation and greater cognitive load under auditory presentation suggest an advantage for visual presentation of mental arithmetic. This may be because post-stimulus visual persistence alleviates some load on working memory. 5.3.5 Vigilance I also repeated my pupillometric study of a counting sequence vigilance task (section 4.3), again replacing auditory stimuli with visual while controlling the brightness and contrast of the visual field. I ran 231 repetitions of this task with 17 experimental participants. Results Dilation magnitude by presentation mode Figure 5.6 shows the average dilation evoked by the vigilance task, comparing aurally- and visually-presented trials. Both conditions elicited strong dilation peaks beginning about one second before and peaking 500–1000 ms after each moment when participants were alert for mistakes in the counting sequence. The one-second anticipatory dilation is consistent with measurements of the readiness potential made using scalp electrodes by Becker et al. [14], who found evidence of motor preparation beginning a bit more than one second before action, and is shorter than the 1.5 second lead observed by Richer et al. [100] before the presentation of an action-determining stimulus. 
For significance testing, I used a wide response window, starting three seconds before each moment when a target could occur and ending three seconds after, encompassing both the pre-stimulus anticipatory dilation and the post-stimulus motorresponse peak. The mean dilation in the auditory presentation condition (M = 0.096 mm, SD = 0.048 mm) was significantly larger than for visual presentation (M = 0.057 mm, SD = 0.046 mm); F (1, 23) = 7.93, p = .01. 56 CHAPTER 5. FROM AUDITORY TO VISUAL 0.4 Change in pupil diameter (mm) aural (78 trials) visual (180 trials) 0.3 0.2 0.1 0.0 -0.1 possible target 0 5 possible target 10 possible target 15 20 Time (seconds) Figure 5.6: Average pupil dilations evoked by a vigilance task presented aurally and visually. The vertical grey bars show the moments at which participants were vigilant for mistakes in a counting sequence (“targets”). The aurally presented task led to larger dilations, but the two presentation modes elicited dilation profiles with similar shape and timing. 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 57 Dilation onset and peak latency by presentation mode In contrast to the digit span and mental multiplication studies, the three task repetitions in each of this study’s trials effectively tripled the number of trials available for analysis and so provided enough data to pinpoint the peak dilation precisely in time and revealed a minor timing difference between the dilations for auditory and visual vigilance. Whether the target was present or absent, the dilation began and peaked slightly later under auditory presentation (see Figure 5.7). This slightly later dilation evoked by auditory stimulus was probably due to the time taken for the stimulus to be presented, because hearing is generally believed to have lower latency than vision [120, 80]. This interpretation is consistent with the difference in mean reaction time I observed: 410 ms (SD = 111 ms) for visual presentation and 713 ms (SD = 140 ms) for auditory. Dilation magnitude and timing by target presence At every potential mistake point, whether or not a target is present, this task required heightened vigilance, motor response preparation, and comparison of the presented number with the expected correct sequence number. I therefore expected dilations in both cases to be similar, perhaps with slightly larger or longer dilations in cases where targets actually appeared, caused by error recognition and/or the additional requirement of carrying out the motor response. I checked this hypothesis by grouping all time segments surrounding moments when the targets were present and averaging them separately from those when the targets were absent. The resultant pupil dilation averages are shown in Figure 5.7. Pupil dilations evoked by targets were larger and longer than those measured during moments when targets were possible but did not appear (M = 0.10 mm, SD = 0.046 mm vs. M = 0.037 mm, SD = 0.047 mm; F (1, 23) = 22.8, p ¡ .0001). The averaged pupil diameter trace for cases with a target (right side of Figure 5.7) showed a secondary peak about 1.5 seconds after the target appeared. Because mean response time was 515 ms (SD = 188 ms), the latency between response and this secondary peak was about one second. Because Richer and Beatty [99] observed similar dilation-response latencies in a non-reactive button pushing task, and because 58 CHAPTER 5. 
FROM AUDITORY TO VISUAL Target Absent 0.4 aural (101 trials) visual (285 trials) 0.3 Change in pupil diameter (mm) Change in pupil diameter (mm) 0.4 Target Present 0.2 0.1 0.0 -0.1 aural (133 trials) visual (255 trials) 0.3 0.2 0.1 0.0 moment when target -0.1 might be presented -4 -2 0 Time (seconds) 2 moment when target might be presented -4 -2 0 2 Time (seconds) Figure 5.7: Target effect on pupil dilations evoked by heightened vigilance. The data shown are the same as those in Figure 5.6. Each trial had three moments at which I told participants to expect possible targets, which occurred independently at each moment with probability one half. The chart on the left shows the mean dilation in moments in which a target did not occur, and the chart on the right shows the mean dilation in moments when a target did occur. Targets elicited longer and larger pupil dilations, with a secondary peak about 1.5 seconds after target presentation. This secondary peak corresponds to the motor activity of responding to the target’s presence. Whether a target was present or absent, dilations were larger in the auditory condition, and the peak dilation under auditory presentation occurred about half a second later than with visual presentation. 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 59 this secondary peak was only present when motor response was required, I interpreted the secondary peak as an artifact of that motor response. The interaction of stimulus mode and target presence was not significant (F (1, 23) = 0.351, p = .6). The larger dilations evoked by auditory task presentation persist whether a target is present or absent (see Figure 5.7). Task performance Participants made more errors in the counting vigilance task when it was presented aurally (8.5%) than visually (6.1%), but this difference was not significant: χ2 (1, N = 774) = 7.80, p = .14. Error rates for all tasks are shown in Figure 5.8. Discussion This experiment compared the cognitive load under aurally and visually presented intermittent vigilance tasks. As with the other two tasks I studied, the two presentation modes elicited pupil dilations with very similar timing and overall shape, and although I did not observe a significant performance difference, visual presentation caused lower cognitive load. In addition to the presentation mode effect, I also observed that the presence of targets was associated with larger pupil dilations. This difference is consistent with the additional cognitive demand of pushing the button in cases when the target is present. 5.3.6 Discussion Summary of experiments In my first experiment, participants memorized sequences of digits either spoken aloud or displayed on a computer screen. My second experiment examined mental multiplication, again presented both aurally and visually, and my third experiment considered a speeded-reaction vigilance task which did not rely heavily on working memory. In all tasks, I controlled the stimulus timing between the two modes, as well 60 CHAPTER 5. FROM AUDITORY TO VISUAL 60% aural visual 50% Error Rate 40% 30% 20% 10% 0% Mental Multiplication Sequence Memory Vigilance Figure 5.8: Error rates for all three tasks. Whiskers on error rate bars show 95% χ2 confidence intervals. The differences in error rate on the mental multiplication and sequence memory tasks are significant (p = .033 and p = .0026, respectively, under one-tailed tests for equality of proportions with Yates’ continuity correction). 
The difference in error rates for the vigilance task was not significant (p = .14). 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 61 as controlling all aspects of the visual field—brightness, contrast, and participant fixation—in order to minimize non-cognitive pupillary reactions. Summary of findings I found that the pupil dilations evoked by all three tasks were qualitatively similar under auditory and visual presentation, but that auditory presentation led to larger pupillary dilations. Qualitative match In all three of my experiments, I observed that pupil dilations in both modes had about the same onset timing, duration, and overall shape (See Figures 5.3, 5.2, and 5.6). Additionally, in the two tasks which replicated classic pupillary response studies, mental multiplication [44] and digit span [57], I also found a qualitative match between the dilations I observed and the auditory-only classic results. Both of these qualitative correspondences—visual to auditory in my experiments and visual to classic auditory findings—suggest that the pupil dilations I observed to visually-presented tasks reflect the cognitive demands of the tasks and were generally free of distortion caused by non-cognitive pupillary reactions to brightness or contrast changes. Quantitative difference In all three of my experiments, I observed significantly larger pupillary dilations when I presented tasks aurally than when I presented them visually. The differences were 0.19 mm (0.35 mm vs. 0.16 mm) for mental multiplication, 0.18 mm (0.43 mm vs. 0.25 mm) for digit span memory, and 0.08 mm (0.23 mm vs. 0.15 mm) for vigilance. Implications Because I was careful to control for non-cognitive pupillary responses caused by brightness, contrast, etc., and because of my finding of a qualitative match in dilation trajectories between conditions, I believe that the difference in magnitude between the two conditions was a result of differences in cognitive load. I therefore interpret 62 CHAPTER 5. FROM AUDITORY TO VISUAL this result as evidence that visual task presentation leads to lower cognitive load than auditory presentation across all three of the tasks I studied. This finding contradicted my hypothesis that similar task demands would lead to similar magnitude dilations in the two cases, perhaps with an initially smaller dilation under visual presentation caused by the lesser difficulty of seeing vs. hearing numbers. Instead, I found that auditory task presentation led to larger pupil dilation not only during initial stimulus comprehension but also throughout task completion. Taken together with the better performance I observed in the visual conditions, this finding indicates that visual presentation facilitates processing for all three tasks. That is, comprehending and remembering numbers is easier when they are seen than when they are heard. Relation to prior digit span findings In the case of digit span, my finding of an advantage for visual presentation seemed to contradict prior studies which found better performance under auditory task presentation. Improved recall of heard numbers relative to seen numbers is very well established [92; 35, p. 22; but see 8]. Indeed, in my measurements of error rates, I found that although visual presentation led to significantly greater overall performance, the difference was not large, and rates of recall for the longer sequences and average digit span scores suggest a small performance advantage for auditory presentation, as was found in the cited investigations. 
This apparent contradiction between lower cognitive load under visual presentation and superior recall of heard numbers can perhaps be resolved by drawing a distinction between levels of effort and levels of performance [c.f. 88]. Although performance was better for heard numbers, my pupillary data suggest that this greater performance may have come with the cost of greater effort and cognitive load. Relation to prior mental arithmetic findings Prior investigations of mental arithmetic have not often addressed the effect of stimulus mode. In a study of the relative importance of different components of working memory in serial mental addition, Logie et al. [70] observed that visual problem presentation led to better performance and less degradation in the context of a variety of interfering tasks. My finding of 5.3. VISUAL REPLICATION OF CLASSIC AUDITORY STUDIES 63 better performance in the visual case matches theirs. They concluded that the central executive, the visuo-spatial store, and subvocal rehearsal are all involved in mental arithmetic. Taken together with this data, my finding of lower cognitive load in the visual case suggests that visual presentation facilitates mental arithmetic performance by aiding the recruitment of all three of these components of working memory. This possibility is supported by recent fMRI data collected by Fehr et al. [30], who found that presentation mode can significantly impact which regional neuronal networks are employed in the calculation process for mental arithmetic. Conclusion It is well known that visual presentation can lead to higher performance on complicated tasks such as schema learning [24] and finding patterns in data [23]. Such advantages are typically attributed to the benefits of a persistent external representation that reduces load on working memory. My finding of a visual advantage even for simple tasks and even though we controlled presentation duration, displaying the digits exactly as long as they took to speak, suggests that something besides visual persistence underlies this visual advantage. One account for superior performance under visual rather than auditory presentation rests on the role of dual codes in working memory [e.g. 6]. Visual presentation is likely to encourage dual coding of the stimuli [89]. Extensive research has shown that having two mental representations for something, notably, both visual and verbal, is better for memory than having one. If one internal representation is lost or corrupted, the other can compensate. People tend to spontaneously name visual stimuli but they do not spontaneously generate visual images to verbal stimuli, so that visual presentation is more likely to generate two codes than verbal presentation. The existence of two codes could facilitate information processing in addition to augmenting memory. Mental operations like arithmetic are regarded as performed by the articulatory loop. If memory for the stimuli is retained in the visuospatial sketchpad, then the articulatory loop, relieved of memory load, has more capacity for information processing. These findings, if replicated and extended, have broad-ranging implications for education as well as interface design. 64 CHAPTER 5. 
FROM AUDITORY TO VISUAL Alternatively, it is possible that the greater effort required by aural presentation is due only to differences in the difficulty of perception and not because of any subsequent processing differences, such as visual persistence or differential recruitment of working memory components. Future work could resolve this question by adjusting stimulus discriminability to equalize perception difficulty between the two modes and then check to see whether the effort differences remain. Further research to determine the true cause of mode-related differences in pupil dilations will help to determine whether such dilations can fulfill Kahneman’s second criterion for an effort proxy, inter-task comparability, and thus be useful for comparisons of cognitive load between the auditory and visual domains. Chapter 6 Combining gaze data with pupillometry Summary I describe a new way of analyzing pupil measurements made in conjunction with eye tracking: fixation-aligned pupillary response averaging, in which short windows of continuous pupil measurements are selected based on patterns in eye tracking data, temporally aligned, and averaged together. Such short pupil data epochs can be selected based on fixations on a particular spot or a scan path. The windows of pupil data thus selected are aligned by temporal translation and linear warping to place corresponding parts of the gaze patterns at corresponding times and then averaged together. This approach enables the measurement of quick changes in cognitive load during visual tasks, in which task components occur at unpredictable times but are identifiable via gaze data. I illustrate the method through example analyses of visual search and map reading. I conclude with a discussion of the scope and limitations of this new method. Most of the content of this chapter were published at the 2010 Symposium on Eye-Tracking Research & Applications [61]. 65 66 CHAPTER 6. COMBINING GAZE DATA WITH PUPILLOMETRY 6.1 The usefulness of gaze data In preceding chapters, I have described how eye trackers, used as pupillometers, can be used to measure instantaneous cognitive load. But the primary purpose of these machines has always been to measure gaze direction. The datum of where somebody is looking is extremely rich. It can be exploited for gaze-based interfaces, which are especially useful to disabled people. It is an almost perfect proxy for attention, so it has been applied broadly to investigations of cognitive psychology, perception, and psychophysics. The fact that a single device simultaneously measures pupil and gaze opens the possibility for rich experiments that investigate cognitive load in tandem with attention. There have been two main obstacles to such research: • Investigation of visual attention requires visual stimuli, and such stimuli can cause pupillary reflexes that interfere with the measurement of cognitive pupillary dilations. • Visual attention changes quickly and unpredictably. This precludes the time alignment of experimental trials based on stimulus presentation, which is necessary to study cognitive dilations on short timescales, as described in subsection A.4.6. The first problem can be addressed through careful control of experimental stimuli, as described in section 5.2. But even when this problem is solved, studies are still limited to short, simple tasks, in which the cognition of interest occurs at a consistent time soon after an experimenter-controlled stimulus. 
In visual tasks such as map reading, chart reading, visual search, and scene comprehension, people shift their attention rapidly and unpredictably, with scan paths being planned on the fly in response to what has been seen so far. It would be useful to measure the dynamics of cognitive load during such tasks, and pupillary response averaging provides good time resolution. But the unpredictability of visual problem solving violates the requirement of signal averaging that the cognitive process being studied happen with predictable timing, which is necessary for aligning the pupillary 6.2. FIXATION-ALIGNED PUPILLARY RESPONSE AVERAGING 67 responses from multiple trials. This chapter describes a solution to the second obstacle, enabling the assessment of cognitive load in such tasks: fixation-aligned pupillary response averaging, in which eye fixations are used instead of stimulus or response events to temporally align windows of pupil measurements before averaging. This method enables the detection of quick changes in cognitive load in the midst of long, unstructured tasks, especially visual tasks where fixations on certain points or sequences of points are reliable indicators of the timing of certain task components. For example, there are certain subtasks that are usually required to read a bar chart, but these subtasks occur at different times from trial to trial and from person to person: e.g. reading the title and axis labels, judging the relative heights of bars, estimating bar alignment with axis ticks, and looking up bar colors in the legend. If we conduct many repetitions of such a chart reading task, changing the details, we can later use scan path analysis to identify all times when somebody compared the height of two bars. We can then align pupil signals from those epochs at the moments of key fixations in the comparison, then average them to determine the average changes in cognitive load during comparison of the height of two bars in a bar chart. I describe the details of this new averaging method in section 6.2, then illustrate its application in two example analyses of cognitive load: one of target discovery in visual search (subsection 6.3.1) and one of references to the legend during map reading (subsection 6.3.2). I conclude the chapter with a brief discussion of the method’s applicability and limitations. 6.2 Fixation-aligned pupillary response averaging Fixation-aligned pupillary response averaging can be broken down into three steps: 1. the identification of subtask epochs, short spans of time in which the task component occurs, 2. the temporal alignment of all such subtask epochs, and 3. averaging the aligned epochs. 68 CHAPTER 6. COMBINING GAZE DATA WITH PUPILLOMETRY 6.2.1 Identifying subtask epochs using patterns in gaze data I use the term epoch to refer to short windows of time in which a consistent task component (and therefore a consistent pupillary response) occurs, as well as to the pupil diameter measurements collected during that window of time. Epochs are typically two to ten seconds long. An epoch is characterized by one or more gaze events, experimenter-defined fixations or saccades. Single fixations The simplest gaze event is fixation on a particular spot identified by the experimenter. Epochs defined by single fixations encompass a brief window of time a few seconds long and centered on the fixation. 
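To make epoch selection concrete, here is a small sketch that collects fixed-length windows of pupil samples around every fixation falling inside an experimenter-defined region of interest. The fixation record format, region bounds, and window lengths are hypothetical illustrations, not the analysis code used in this dissertation.

```python
import numpy as np

SAMPLE_RATE = 50  # Hz (the eye tracker's sampling rate)

def epochs_for_region(pupil, fixations, region, pre_s=1.0, post_s=3.0):
    """Collect pupil-diameter epochs centered on fixations inside `region`.

    pupil:     1-D array of pupil diameters, one sample per 1/SAMPLE_RATE s
    fixations: list of (sample_index, x, y) fixation records
    region:    (x_min, y_min, x_max, y_max) bounds of the area of interest
    Returns equal-length windows spanning pre_s seconds before to post_s
    seconds after each qualifying fixation.
    """
    x0, y0, x1, y1 = region
    pre, post = int(pre_s * SAMPLE_RATE), int(post_s * SAMPLE_RATE)
    epochs = []
    for idx, x, y in fixations:
        inside = x0 <= x <= x1 and y0 <= y <= y1
        if inside and idx - pre >= 0 and idx + post <= len(pupil):
            epochs.append(pupil[idx - pre : idx + post])
    return epochs
```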
For example, in a visual search task requiring discrimination between targets and distractors, each fixation on a search item determines an epoch containing a visual discrimination subtask. Fixations on targets determine epochs of target recognition (see Section 6.3.1). In a flight simulation study, each fixation on the altimeter could define an epoch. Scan paths Gaze events can also be sequences of fixations (scan paths) or saccades from one location to another. For example, fixation on the axis of a bar chart before looking at any of the bars indicates general orientation to the chart, while fixation on the axis immediately after fixating the edge of one of the bars indicates an epoch of axis reading. Comparison of two bars is signaled by several consecutive fixations alternating between them. Epochs defined by scan paths are usually composed of more than one gaze event. For example, in map reading, when somebody looks a symbol in the map, then saccades to the legend to look up the meaning of the symbol, then saccades back to the symbol, these three gaze events comprise a legend reference epoch (see subsection 6.3.2). In each of these cases, the experimenter defines a sequence of fixations that they expect to reliably occur together with the task component or cognitive process under investigation. 6.2. FIXATION-ALIGNED PUPILLARY RESPONSE AVERAGING 69 Other gaze data attributes There are other attributes of gaze data that might be used to identify subtask epochs, including fixation duration, fixation frequency, saccade velocity, and blink rate. This dissertation only addresses the use of fixations and scan paths. Identifying epochs from non-gaze timing signals Although I do not explore it in these studies, it would also be possible to identify subtask epochs using timing signals from other measurable events besides the gaze. When a task can be modeled in advance and subtask boundaries detected through the state of the interface, such subtask boundaries can serve as timing signals for pupillometry [50], and could potentially be used for trial-averaged cognitive pupillometry if they occur frequently enough without being contaminated by motor-evoked dilations (see section 5.2 and section 7.1). 6.2.2 Aligning pupil data from selected epochs After all the epochs containing the subtask of interest are identified, they need to be aligned based on the timing of gaze events that make them up. Temporal translation For epochs defined by a single gaze event, like fixation on a particular spot, temporal alignment simply requires translation of each epoch so that their gaze events coincide. Formally, if Pi (t) is pupil diameter as a function of time during epoch i, and gi is the time of epoch i’s gaze event, T [Pi (t)] = Pi (t + gi ) is the temporal translation of Pi (t) which places its gaze event at t = 0. Such alignment is done for all epochs that will be averaged. Alignment via temporal translation is illustrated in Figure 6.1. Warping Sometimes, epochs of interest are characterized by multiple gaze events. For example, referencing the legend during a map reading task involves a saccade from the map 70 CHAPTER 6. COMBINING GAZE DATA WITH PUPILLOMETRY | underlying pattern | trial 1 | | | Change in Pupil Diameter | trial 2 trial 3 trial 4 mean | | epoch 1 | epoch 2b | epoch 3 | | | epoch 4 epoch 2a fixation-aligned mean plus-minus average Time Figure 6.1: Illustration of epoch alignment via temporal translation followed by averaging. 
Warping

Sometimes, epochs of interest are characterized by multiple gaze events. For example, referencing the legend during a map reading task involves a saccade from the map to the legend, a period of time spent on the legend, and a saccade back to the map. Translation could align the map → legend saccades in all these epochs, but because people do not always dwell on the legend for the same amount of time, the returning legend → map saccades will not line up. If the cognition of interest occurs relative to both points, then signal averaging will not reinforce it.

Porter et al. [97] faced a similar problem in their analysis of pupil data from tasks of various lengths, in which they needed the start and end times of each task to align. They solved it by setting a fixed task duration and then stretching or compressing the data from each trial to fit in that window. A similar warping operation to the one described in this section is used by Slaney et al. [109] in their method for morphing one sound into another. They first decompose the sound into pitch and spectral signals, then they align each of these one-dimensional components between the first and second sounds using a piecewise linear warp similar to the one described in this section, in order to align corresponding parts of the two sounds before cross-fading.

In the context of task epochs defined by several gaze events, I apply a linear time-stretching operation to the span of time between each pair of consecutive gaze events. Formally, in an average of n epochs (indexed by i), each of which is defined by m gaze events (indexed by j), the piecewise linear warping of P_i(t) is defined as

W[P_i(t)] = P_i( g_{i,j} + (g_{i,j+1} − g_{i,j}) · (t − g_j) / (g_{j+1} − g_j) )   for g_j ≤ t < g_{j+1},   j = 1, …, m − 1

(with the final interval closed at t = g_m), where g_{i,j} is the time of the jth gaze event in the ith epoch, and g_1, g_2, …, g_m are the gaze event reference times, the mean times of occurrence for each gaze event across all the epochs being aligned: g_j = (1/n) Σ_{i=1}^{n} g_{i,j}.

Epoch alignment via piecewise linear warping is illustrated in detail in Figure 6.2 and applied to averaging several signals in Figure 6.3. This alignment technique is applied to the analysis of legend references in subsection 6.3.2.
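The warping operation can be sketched in the same way. Under the same assumptions as the previous example (uniformly sampled epochs, hypothetical names), the function below resamples one epoch onto the reference time base by linearly mapping each reference interval [g_j, g_{j+1}] onto that epoch's own interval [g_{i,j}, g_{i,j+1}].

import numpy as np

def reference_times(all_event_times):
    """Mean time of each gaze event across epochs: g_j = (1/n) * sum_i g_{i,j}.

    all_event_times -- sequence of n lists, each holding that epoch's m event times
    """
    return np.mean(np.asarray(all_event_times), axis=0)

def warp_epoch(pupil, times, event_times, ref_times, rate=50.0):
    """Piecewise linear warping W[P_i] of one epoch.

    pupil, times -- 1-D arrays of pupil diameters (mm) and sample times (s)
    event_times  -- this epoch's gaze event times g_{i,1..m} (s)
    ref_times    -- reference times g_1..g_m shared by all epochs (s)
    """
    # Common reference time base spanning the first to the last reference event.
    t_ref = np.arange(ref_times[0], ref_times[-1], 1.0 / rate)
    # Map each reference time back onto this epoch's own timeline: segments
    # between consecutive gaze events are stretched or compressed linearly so
    # that the events land on their reference positions.
    t_epoch = np.interp(t_ref, ref_times, event_times)
    # Evaluate the epoch's pupil signal at the warped times.
    return t_ref, np.interp(t_epoch, times, pupil)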
Figure 6.2: Illustration of piecewise linear warping applied to a single epoch of pupil diameter data defined by four gaze events. In the original unwarped pupil diameter data, shapes mark the times at which the gaze events occurred. The epoch is divided into segments at the gaze events, and each segment is linearly transformed in time so that the gaze events that bound it are moved into their reference positions in time. These reference positions are determined by averaging the time of occurrence of each gaze event across all epochs (see section 6.2.2). Figure 6.3 shows this warping operation applied to several epochs at once before averaging them to reveal pupillary responses that occur with consistent timing with respect to the gaze events.

It is important to note that epoch warping is a selective focusing operation. When pupillary responses take place with respect to more than one gaze event, it can reveal them, but at the same time it will obscure any pupillary responses that do not follow that pattern.

Figure 6.3: Illustration of epoch alignment via piecewise linear warping followed by averaging. The top half of the figure shows four simulated trials, each with four gaze events (fixations or saccades) occurring at various times. As in Figure 6.1, simulated pupil diameter data are the sum of random walks and the indicated pupillary response relative to the four gaze events. The bottom half of the figure shows the result of aligning each epoch via piecewise linear warping. The average of the aligned signals reveals the underlying pupillary responses, because they occurred with consistent timing relative to the gaze events. The final line in the figure is the ±-average of the four warped epochs (see section A.4.6), which indicates the level of noise present in the mean above it. As in Figure 6.1, the magnitude of the signal relative to the background pupil noise is exaggerated.

6.2.3 Averaging

Once epochs have been aligned, they can be averaged using the standard trial averaging procedures used in traditional cognitive pupillometry, described in subsection A.4.6. Epochs can be averaged using a simple mean: P̄(t) = (1/n) Σ_{i=1}^{n} B[T[P_i(t)]] or P̄(t) = (1/n) Σ_{i=1}^{n} B[W[P_i(t)]], depending on whether translation or warping is used for alignment. The ±-average can also be used in place of the standard average at this stage, to evaluate the residual noise in the averaged signal (see section A.4.6).
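The averaging step itself reduces to a mean over a stack of aligned, baseline-subtracted epochs. The sketch below also includes the ±-average used to gauge residual noise (described in section A.4.6); the (n_epochs × n_samples) array layout is an assumption, not a prescribed data format.

import numpy as np

def average_epochs(aligned):
    """Mean pupillary response across aligned, baseline-subtracted epochs.

    aligned -- array of shape (n_epochs, n_samples), one row per epoch
    """
    return aligned.mean(axis=0)

def plus_minus_average(aligned):
    """±-average: alternately add and subtract epochs to estimate residual noise.

    Any response that is locked to the gaze events cancels out, leaving only the
    uncorrelated noise; defined only for an even number of epochs, so one epoch
    is dropped if necessary.
    """
    n = aligned.shape[0] - (aligned.shape[0] % 2)
    signs = np.where(np.arange(n) % 2 == 0, -1.0, 1.0)   # (-1)^i for i = 1..n
    return (signs[:, None] * aligned[:n]).mean(axis=0)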
6.3 Example applications

While eye trackers have been successfully used for stimulus-locked cognitive pupillometry, it is not obvious that fixation-aligned signal averaging will work. It is possible that the timing of cognitive processes is not consistent enough with respect to eye movements, or that the very act of looking around suppresses or interferes with the task-related pupillary response. A successful application of fixation-aligned averaging requires an averaged pupillary response which differs from its corresponding ±-average and any relevant control conditions, and which is consistent with known patterns of cognition for the studied subtask.

In the following two example analyses, I apply fixation-aligned averaging to two well-studied tasks, in order to illustrate its use and to demonstrate its validity. In both tasks, I defined epochs using gaze events I expected to be strongly correlated with shifts in cognitive load, in order to find out whether fixation-aligned averaging revealed the expected shifts in excess of background pupillary noise.

6.3.1 Visual search

Visual search has been studied extensively with eye tracking [e.g. 31], and occasionally with pupillometry [e.g. 97, 5], though signal averaging has only ever been applied with respect to full task onset and completion. This section summarizes an application of fixation-aligned pupillary response averaging to investigate shifts in cognitive load that occur around the moments of search target discovery. This example uses the single gaze events and translation alignment described in section 6.2.2 and illustrated in Figure 6.1.

Task description

I designed an exhaustive visual search task in which study participants counted the number of L's (targets) in a field of T's (distractors) (see Figure 6.4). The search field contained a variable number of targets, and each trial continued until the participant found them all. Targets were often fixated more than once during a search, with later fixations performed to confirm previously-discovered targets' locations and avoid overcounting. Before the start of the task, a field of X's was shown; when the search task started, the X's changed to T's and L's, so that task onset would not correspond to a change in the brightness or contrast of the visual field, both of which could have caused pupil reflexes.

Figure 6.4: A fragment of a search field used in my visual search study. Participants searched for L's (targets) in a field of T's (distractors). Each character spanned about 0.73°. The scan path from one trial is shown in blue, with circle area proportional to fixation duration and numbers giving the order of fixations. Fixation 17 is a target fixation; all other fixations are non-target fixations.

Participants

I recruited seventeen undergraduate participants, according to the standard criteria described in section A.1.

Fixation identification

I segmented scan paths into fixations using the dispersion threshold technique (described by Widdel [121]; see also Salvucci and Goldberg [102] for alternatives), with a minimum fixation duration of 160 ms and a dispersion threshold of 2°. Consecutive sequences of fixations that all fell within 1.25° of targets were grouped into dwells, within which the fixation that fell closest to the target was labeled as a target fixation. Fixations located within the search field but at least 5° from any target, and excluding the first five and last five fixations of the trial, were labeled as control fixations, included in the analysis to check for any consistent pupillary response to fixation itself. Both target fixations and control fixations were used as gaze events for selecting pupil data epochs for averaging.
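The sketch below shows one way the dispersion-threshold segmentation described above could be implemented, with the same parameters (160 ms minimum duration, 2° dispersion threshold). It assumes gaze coordinates in degrees of visual angle sampled on a uniform time base, and it is a simplified variant of the published algorithms rather than the code used in this study.

import numpy as np

def find_fixations(x, y, times, min_dur=0.16, max_disp=2.0):
    """Dispersion-threshold fixation identification (after Widdel; Salvucci & Goldberg).

    x, y, times -- 1-D arrays of gaze coordinates (degrees) and sample times (s)
    Returns a list of (start_time, end_time, centroid_x, centroid_y) tuples.
    """
    fixations, start, n = [], 0, len(times)
    while start < n:
        end = start
        # Grow the window while its points stay within the dispersion threshold,
        # measured as (max - min) in x plus (max - min) in y.
        while end + 1 < n:
            wx, wy = x[start:end + 2], y[start:end + 2]
            if (wx.max() - wx.min()) + (wy.max() - wy.min()) > max_disp:
                break
            end += 1
        if times[end] - times[start] >= min_dur:
            fixations.append((times[start], times[end],
                              x[start:end + 1].mean(), y[start:end + 1].mean()))
            start = end + 1      # continue after the completed fixation
        else:
            start += 1           # too short: slide the window forward one sample
    return fixations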
Results

Target fixations vs. control fixations

Figure 6.5 shows the average pupillary response to target and control (non-target) fixations, aligned to the start of the fixation, and showing a few seconds of pre- and post-fixation context. For baseline subtraction, I used a baseline interval 0.4 seconds (20 samples) long, starting 1.75 sec before the fixations. I found a clear difference in pupillary responses to the two different types of event. Fixations far from targets had no consistent pupillary response and so averaged to an approximately flat line, while fixations on targets resulted in a dilation of about 0.06 mm. Surprisingly, the averaged dilation begins about one second before fixation on the target. A further breakdown of the data by difficulty and fixation sequence shows the cause.

Figure 6.5: Fixations on targets (770 fixations) vs. fixations on non-targets (1511 fixations). Each line in the chart represents the average of many fixations of each type. The shaded regions surrounding each line indicate the standard errors for those averages. Fixations far from targets had no consistent pupillary response and so averaged to an approximately flat line, while fixations on targets resulted in a dilation of about 0.06 mm.

Target discoveries vs. target revisits

Figure 6.6 shows the same target fixations, but grouped in two averages, one for all the first fixations on each target (discoveries) and one for fixations on targets that have already been fixated during the search (revisits). The dilation response to revisits begins at least one second before the target fixation, perhaps reflecting recall of the previously identified target or saccade planning in order to re-confirm its location.

Figure 6.6: Average pupillary responses to first (discovery) fixations on targets (449 fixations) vs. later (revisit) fixations on targets (321 fixations), with control fixations on non-targets (1511 fixations) for comparison. The dilation response begins about one second before the fixation for revisits.

Target discovery sequence

Finally, Figure 6.7 shows the average pupillary response to discovering the 1st, 2nd, and 3rd targets during the search. The magnitude of the average dilation is larger in response to 3rd target discoveries than to 1st and 2nd discoveries. The first two discoveries are signs of task progress, but finding a third target means that exhaustive search is not needed and the task is completed.

Figure 6.7: Average pupillary responses to first fixations on targets (discoveries), grouped by how many targets had previously been discovered (1st target: 359 fixations; 2nd target: 223 fixations; 3rd target: 89 fixations; control fixations on non-targets: 2557 fixations). The magnitude of the average dilation is larger in response to 3rd target discoveries than to 1st and 2nd discoveries.

Discussion

As a check on the validity of fixation-aligned pupillary response averaging, this case study was a success. I observed averaged dilations that were well above the background noise level and that differed substantially between fixations on targets and fixations on non-targets. In addition, the time resolution in the averaged signal turned out to be fine enough to suggest differences in memory dynamics surrounding target discoveries vs. revisits.

A more thorough analysis of visual search would use more complicated attributes of the scan path, like the fraction of the search area that has been covered, to identify additional subtasks, or explore how pupillary responses vary over the course of the search. Additionally, the differences in timing and magnitude of pupillary reactions could be analyzed between subjects, or with respect to task performance.
6.3.2 Map legend reference

The second example application of fixation-aligned pupillary response averaging uses more complicated epochs, defined using multiple gaze events and aligned with warping.

Task description

In a study of map reading, participants examined a fictitious map showing the locations of frog and toad habitats. The map uses abstract symbols to show places where frogs and toads live, with each symbol standing for a different species. The symbols are identified in a legend providing the species name and classification as frog or toad (Figure 6.8). In reading the map, participants must look up the abstract symbols in the legend to learn which of them correspond to frogs and which correspond to toads. It is these legend references which I analyze here.

Participants & apparatus

This study included fifteen undergraduates, distinct from those in the first study but subject to the same selection criteria. The eye tracker was the same.

Identifying legend reference epochs

Epochs of pupil measurements encompassing legend references were identified using scan paths. I defined a legend reference as fixation on a cluster of map symbols, followed by a fixation on the legend, followed by a return saccade to that same cluster of map symbols. An example epoch is shown in Figure 6.8. I used the saccades to and from the legend as gaze events on which to align epochs via piecewise warping before averaging (see section 6.2.2). Baseline intervals used for baseline pupil diameter subtraction were 0.4 sec (20 samples) long, starting 1.5 sec before the saccade from the map to the legend. I calculated fixations using dispersion threshold clustering as described in section 6.3.1.

Figure 6.8: A fragment of a task map and corresponding legend. A sample scan path is shown in blue, with circle area proportional to fixation duration and numbers indicating the order of fixations. A legend reference epoch begins at the time of fixation 7 and ends at the time of fixation 13. The gaze events used for alignment of this epoch are the 8 → 9 and 12 → 13 saccades.

Results

I expected a momentary dilation preceding each legend reference, caused by the need to store the symbol or symbols being looked up in visual working memory during the saccade to the legend and the search there for the matching symbol(s). This pattern did emerge in the averaged pupil response, along with other changes also correlated with the legend reference saccades (Figure 6.9). On average, participants' pupils contracted while looking at the legend before recovering nearly to their pre-saccade diameter. Although the changes were small (on the order of 0.05 mm), I collected enough epochs (925 legend references) that these changes stood out above the noise level indicated by the ±-average.
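To illustrate how a multi-event epoch of this kind can be located automatically, the sketch below scans a chronological list of fixations for the pattern map cluster → legend → return to the same cluster and records the two saccade times used for warping. The region-labeling function is hypothetical, the legend dwell is treated as a single fixation for simplicity, and this is not the code used in this study.

def find_legend_references(fixations, region_of):
    """Identify legend-reference epochs in a chronological list of fixations.

    fixations -- list of (start_time, end_time, x, y) tuples
    region_of -- function mapping (x, y) to 'legend' or a map-cluster label
    Returns (saccade_to_legend_time, saccade_back_time, cluster_label) tuples,
    the two gaze events used to align each epoch before warping.
    """
    labels = [region_of(x, y) for (_, _, x, y) in fixations]
    epochs = []
    for i in range(1, len(fixations) - 1):
        before, here, after = labels[i - 1], labels[i], labels[i + 1]
        # Pattern: a map cluster, then the legend, then a return to the same cluster.
        if here == 'legend' and before == after and before != 'legend':
            # Approximate each saccade time as the midpoint of the gap between fixations.
            sacc_to = (fixations[i - 1][1] + fixations[i][0]) / 2.0
            sacc_back = (fixations[i][1] + fixations[i + 1][0]) / 2.0
            epochs.append((sacc_to, sacc_back, before))
    return epochs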
Discussion

Like the visual search study, this application of fixation-aligned pupillary response averaging succeeded. I found a pupillary response evoked by the subtask of referencing the map legend which differed substantially from its corresponding ±-average, though in this case several hundred epochs were required to reveal the response.

The changes in pupil diameter which I observed are intriguing; in addition to the simple pre-reference dilation I expected, I also observed a pupil constriction during the dwell on the legend. This pattern is consistent with the loading of visual working memory with the symbol to be looked up, the release of that memory once the symbol has been located in the legend, and a final increase in load as the participant's running count of frogs is updated depending on the symbol's classification.

Unfortunately, the legend reference task is complicated enough that it is difficult to associate the patterns in pupil dilations with specific cognitive activities like the use of memory. Without more careful experiments that control the context of the averaged gaze epochs, speculative interpretation of the sort given in the previous paragraph is largely unjustified just-so storytelling. The difficulty of using a one-dimensional physiological proxy that is affected by many kinds of mental activity to understand complicated tasks is one of the basic limitations of cognitive pupillometry (see section 7.1).

Figure 6.9: Average pupillary response to 925 legend references in a map reading task. Black circles indicate reference gaze event times. The semi-transparent regions bounding each curve show the standard errors of the mean at each time for the plotted average. The average dwell on the legend lasted about 1.2 seconds. The average pupillary response includes a 0.02 mm dilation prior to the saccade to the legend, a 0.06 mm constriction and recovery while looking at the legend, and a return to the map at a slightly higher pupil diameter.

6.4 Conclusions

This chapter described fixation-aligned pupillary response averaging, a new method for combining synchronized measurements of gaze direction and pupil size in order to assess short-term changes in cognitive load during unstructured visual tasks. Components of the visual tasks with consistent demands but variable timing are located by analyzing scan paths. Pupil measurements made during many instances of each task component can then be aligned in time with respect to fixations and averaged, revealing any consistent pupillary response to that task component.

This new mode of analysis expands the scope of tasks that can be studied using cognitive pupillometry. With existing stimulus-locked averaging methods, only shifts in cognitive load that occur relative to experimenter-controlled stimuli are measurable, but with fixation-aligned averaging, pupillary responses can also be used to study any shifts in cognitive load that occur consistently with respect to patterns of attention detectable in gaze direction data.
In the example study of visual search described in subsection 6.3.1, the timing differences in pupillary responses to target discoveries and revisits, which show the recall of previously-visited targets, are only detectable through fixation-aligned averaging. Similarly, the shifts in cognitive load surrounding subject-initiated legend references described in subsection 6.3.2 could only be detected by determining the timing of those legend references using gaze direction data and then using that timing information to align and average pupil diameter measurements.

There are many other tasks that could be studied using this method. Reading, for example, which has been studied using eye tracking [98] and pupillometry [53] separately, has many gaze-signaled cognitive events, such as back-tracking, which could be studied with fixation-aligned averaging of pupil measurements.

Chapter 7
Unsolved problems

7.1 Current limitations

7.1.1 Simple, short tasks

Cognitive pupillometry can be employed when tasks can be reliably constrained by experimental design, or, in the case of fixation-aligned averaging, when epochs can be reliably identified from gaze direction data and when changes in cognitive load during these epochs occur with consistent timing relative to the gaze events that define the epochs. In practice, this limits both the specificity and duration of tasks that can be studied. For specificity, either the general task must be constrained enough or the gaze events defined specifically enough that cognitive processes are consistent across epochs. For example, in the map-reading task (subsection 6.3.2), I only considered legend references which began and ended at the same point in the map, to avoid including fixations on the legend that were simple orientation to the display, or otherwise did not involve looking up a particular symbol. Even when epochs containing consistent cognitive processes are identified, the requirement that the timing of those processes be consistent with respect to the fixations and saccades which define the epochs is generally only satisfied for short epochs, in practice usually 2–5 seconds long.

7.1.2 Restrictions on task display

Because pupils respond reflexively to the brightness [68] and contrast level [116] of whatever is currently fixated, the task display must be designed with spatially uniform brightness and contrast. In addition, epochs occurring soon after changes to the display can be contaminated by reflex dilations to motion cues [101] (see also section 5.2). In all the studies presented here, I used static, low-contrast stimuli with spatially uniform brightness (e.g. Figures 6.4 and 6.8). In addition, I left a large margin between the edge of the stimulus and the boundary of the display, because when subjects fixate near that boundary, their field of view includes the display bezel and the wall behind, both of which are difficult to match in brightness to the stimulus itself.

7.1.3 Restrictions on interaction

Any user interaction, such as mouse or keyboard use, needs to be well separated in time from the subtask epochs studied, because the preparation and execution of motor actions also causes momentary pupil dilations [99]. All studies described in this dissertation were designed without any interaction during the task. Analysis must generally exclude data gathered within two seconds of the button pushes that participants use to initiate or complete each trial, as in the sketch below.
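A minimal sketch of that exclusion rule, assuming epoch gaze-event times and button-press times are available as plain lists; the two-second margin matches the rule of thumb stated above.

def exclude_near_motor_events(epoch_times, press_times, margin=2.0):
    """Drop epochs whose gaze event falls within `margin` seconds of a button press."""
    return [t for t in epoch_times
            if all(abs(t - p) > margin for p in press_times)]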
7.2 Future research

7.2.1 Disentangling various pupillary influences

The same principle that inspires research in cognitive pupillometry also limits its use in the study of internal cognitive processes. Recall Bumke's 1911 observation [21], first quoted at the start of this dissertation:

Every active intellectual process, every psychical effort, every exertion of attention, every active mental image, regardless of content, particularly every affect just as truly produces pupil enlargement as does every sensory stimulus.

In section 6.2.2, I described a technique for warping pupil data before averaging which enables high-precision measurements of pupillary responses to subtasks even when those subtasks occur with variable timing. I showed how this technique could be used to analyze references to legends in a map reading task (see subsection 6.3.2). However, as the scope of cognitive pupillometry is pushed toward subtasks of greater complexity, it becomes more and more difficult to call the pupillary responses thus measured "cognitive load." In tasks like digit sequence memorization (section 4.1), the cognitive process of solving the task is well understood. We know that short-term verbal memory is being used to store the digits, so it is easy to call the matching pattern of pupil dilations observed during that task a proxy for load on working memory. But when referencing the legend, there are many possible cognitive processes that could be contributing to the pupillary response, including holding the symbol being looked up in short-term visual or verbal memory, searching the legend key for the desired symbol, maintaining a count of the number of frog symbols found so far, and finding one's way to and from the legend itself. All of these processes have the potential to cause pupillary responses.

More experiments can be done to help disentangle the effects of these different task components, by varying the task demands to eliminate some of them. Even so, as cognitive pupillometry is pushed toward more complex tasks, this entanglement of all the different sources of pupil motions will limit its applicability. To go further, it will be necessary to combine pupillometry with other kinds of measurements that can isolate types of mental effort.

7.2.2 Combining pupillometry with other psychophysiological measurements

Some researchers have combined pupillometry with eye movement parameters like fixation frequency [e.g. 5] or with EEG [74] to make composite measurements of mental effort. Any technology that measures the spatial distribution of brain activity, like EEG [37] or fMRI [45], has the potential to help disentangle the different influences on a monolithic quantity like cognitive load.

7.2.3 Modeling and compensating for the pupillary light reflex

In all my experiments, I carefully controlled the visual field of participants in order to minimize interference from the pupil's sensitivity to variations in brightness. Light reflexes are so large and neurologically dominant that they have the potential to overwhelm task-induced pupil dilations.
In future experiments, I plan to do even more to control these reflexes, by matching the luminance of the screen to the luminance of the wall behind it, and by covering the bezel of the eye tracker with masking tape or some other light-colored mask, to match it with the back wall as well and remove variations in the luminance of the visual field caused by looking around.

However, for cognitive pupillometry to be applied to more realistic visualization tasks, it will be necessary to lift these limitations on the visual field. One avenue for doing so may be to model the pupillary light reflex, use eye tracking to estimate the luminance of the visual field, and then use the model to estimate the contribution of the light reflex to the pupils' size. A few researchers have attempted to adjust pupil diameter data to compensate for the overall luminance of stimuli [94, 83], but these approaches only model constant luminance, so are not yet applicable to trial-averaged cognitive pupillometry. The neurology of the pupillary light reflex is very well understood [68], and pupil size is determined by two simply-shaped opposing muscles. I believe that this neurophysical system is simple enough to be successfully modeled, so that the light reflex caused by any given visual field could be predicted. I do not know whether the joint influence of the light reflex and cognitive pupillary responses could be modeled. This is an important area of research for extending the scope of cognitive pupillometry. The recent success of Palinko et al. [90] in measuring cognitive load via remote pupillometry in a driving simulator is encouraging in this regard, because it shows that task-evoked dilations are measurable even in a complicated visual environment with frequent shifts in gaze, at least when the task being studied is presented aurally with timing not correlated to the subject's locus of attention.

7.2.4 Expanding proof-of-concept studies

The contributions of this dissertation are mainly methodological. The experiments described in chapter 4, section 5.3, and section 6.3 were designed to validate the methods rather than explore the cognitive psychology of the tasks. They demonstrate that cognitive pupillometry with a remote-camera eye tracker can be used to study cognitive load in a variety of tasks, but they do not tell us much new about how people think about and complete those tasks. The few conclusions I was able to draw, regarding differences caused by aural vs. visual presentation of tasks (section 5.3.6), are very exciting, and it will be fascinating to explore them further, along with other applications of this new measurement method.

Appendix A
Experimental Methods

The following details of my experimental methods apply to all the studies described in this dissertation.

A.1 Participants

Participants in all my studies were college students recruited from the Computer Science and Communications departments at Stanford University. All were in a pool of students in introductory HCI, design, and communications classes who were required to participate in experiments on campus for course credit. Besides awarding this course credit, I also compensated participants with Amazon.com gift certificates. The value of each participant's gift certificate depended on his or her task performance and varied from about $15 for the lowest scores to about $35 for the highest. Such a monetary incentive was shown by Heitz et al.
[41] to increase the magnitude of task-evoked pupillary dilations.

I screened all participants for normal or corrected-to-normal vision. I excluded participants with contact lenses or eyeglasses providing an astigmatism correction or a refractive correction greater than ten diopters, because such corrective lenses can interfere with accurate pupil diameter tracking.

A.2 Apparatus

I used a Tobii 1750 remote eye tracker [114]. This device is designed primarily to track people's gaze direction, but its method of gaze tracking also enables high-speed pupillometry [63]. The eye tracker is based on a standard LCD computer display, with infrared lights and a high-resolution infrared camera mounted at the edges of the screen. This remote-camera setup requires neither a chin rest nor a head-mounted camera, enabling pupil measurements without encumbrance or distraction. Measurements are corrected for changes in apparent pupil size due to head motion toward or away from the camera. Accurate pupil tracking with this equipment requires a head motion speed of less than 10 cm/sec within a head box of about 30 × 15 × 20 cm at our initial seating distance of 60 cm from the screen.

Under infrared illumination, participants' pupils appear as bright ovals in the eye tracker's camera image. The Tobii 1750 measures the size of a participant's pupil by fitting an ellipse to the pupil image, then converting the width of the major axis of that ellipse from pixels to millimeters based on the measured distance from the camera to the pupil. Due to inaccuracy in this measurement of camera–pupil distance, measurements of absolute pupil size may have errors of up to 5%, but sample-to-sample changes in pupil diameter are much more accurate [113]. This better accuracy for relative measures makes eye trackers well suited for cognitive pupillometry, where the measurement of interest is usually changes in pupil diameter relative to their diameter at the end of an accommodation period preceding each trial [13]. This measure has been found to be independent of baseline pupil diameter and commensurate across multiple labs and experimental procedures [12, 20, 19]. The Tobii 1750 samples pupil size at 50 Hz, with each sample measuring both eyes simultaneously. For gaze direction, the Tobii 1750 has a resolution of 0.25° and an accuracy of 0.5°.

A.3 Physical setup

I placed the eye tracker on a desk with the top of the screen approximately 140 cm from the floor. Participants sat in a chair adjusted so that their eyes were at this same height. Participants initiated trials and gave task responses using a two-button computer mouse on the desk between them and the eye tracker. The physical setup is shown in Figure A.1.

Figure A.1: Physical arrangement of equipment, experimenter, and participant used for all studies.

A.3.1 Room illumination

The size of the pupil is controlled by the relative tone of two opposing smooth muscles in the iris: the parasympathetically innervated, stronger sphincter pupillae and the sympathetically innervated, weaker dilator pupillae. The task-evoked pupillary response involves tone changes in both muscles [68]. First, parasympathetic inhibition caused by cortical activity or motor response preparation causes the sphincter pupillae to relax, and then sympathetic excitation causes the dilator pupillae to contract, further expanding the pupil [111, pp. 197–200].
Because the sphincter muscles are more active in bright surroundings, the first of these effects is larger under brighter ambient lighting. Thus, the level of ambient lighting can affect dilation onset latency and latency to peak, but not the total change in pupil diameter as measured in millimeters [112]. In order to get the most accurate measurements of the timing of changes in pupil size, I used relatively bright ambient illumination. I blacked out all windows and used standard overhead diffused fluorescent lighting, leading to 27 cd/m² of luminance from the surrounding walls at eye level and 32 lx incident at participants' eyes. Because bright-environment pupillometry is more common than dark, this choice also facilitated comparison to other studies.

Procedure

Before each task, I explained the task to participants, then allowed them to practice until they were familiar and comfortable with the task presentation and providing their responses. All trials were initiated by participants, who first fixated a small target at the center of the screen before starting the trial by clicking a mouse button. Participants' gaze thus remained at the center of the screen for the duration of each trial and during most of the short intervals between trials. A run of trials for a single task generally took about five minutes. I told participants that they could take breaks at any point between trials to rest their eyes; two did so.

A.4 Data processing

Because the left and right eyes exhibit matching pupillary responses, I used the average of the two eyes' pupil diameters to reduce measurement noise. During moments when an eyelid, eyelash, or eyeglasses frame blocked the camera's view of one pupil, I used the other pupil alone. I performed standard baseline subtraction in each trial based on the average pupil diameter measured over 20 samples (400 ms) at the end of a pre-stimulus accommodation period. After filling blinks via linear interpolation, I smoothed the raw pupil signals with a 10 Hz low-pass digital filter. The effect of these data cleaning steps is illustrated in Figure A.2.
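The sketch below strings these cleaning steps together: average the two eyes where both are tracked, fall back to the visible eye otherwise, interpolate across blinks, and low-pass filter at 10 Hz. It assumes uniformly sampled 50 Hz traces with NaN marking lost samples, and it uses a zero-phase Butterworth filter, which is my own assumption; the dissertation specifies only the 10 Hz cutoff.

import numpy as np
from scipy.signal import butter, filtfilt

def clean_pupil(left, right, rate=50.0, cutoff=10.0):
    """Combine, interpolate, and smooth raw pupil traces (NaN = missing sample)."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    # Average the eyes where both are tracked; use the visible eye otherwise.
    pupil = np.where(np.isnan(left), right,
                     np.where(np.isnan(right), left, (left + right) / 2.0))
    # Fill blinks and other gaps by linear interpolation over time.
    t = np.arange(len(pupil)) / rate
    good = ~np.isnan(pupil)
    pupil = np.interp(t, t[good], pupil[good])
    # Low-pass filter at `cutoff` Hz, applied forward and backward so the
    # smoothed signal is not delayed relative to the gaze data.
    b, a = butter(4, cutoff / (rate / 2.0))
    return filtfilt(b, a, pupil)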
A.4.1 Smoothing

The pupil measurements made by eye trackers are rather noisy, and remote eye trackers are especially so, because the freedom of head motion poses two additional problems. First, unless the eye tracker's camera is actively pointed at the eyes, it must maintain a wider field of view, in which fewer pixels can be devoted to observing the pupil. Second, pupil measurements must be corrected for foreshortening by dividing the raw pixel-based pupil size by the distance from the camera to each eye. Drift, tremors, and non-spherical eye shape introduce noise into this distance measure [28], which in turn causes a noisy pupil size signal. Because this instrument noise is high-frequency, and pupils are known to dilate and constrict at low frequencies [77], I smoothed the pupil size signal with a low-pass filter. To determine the appropriate cutoff frequency for this filter, I analyzed the correlation between the pupil size signals from the left and right eyes at different frequencies. Since the instrument noise is independent for the two eyes, I expected the noisy frequency components of the pupil signal to be uncorrelated. In contrast, the frequency components containing the true pupil size signal should be correlated, because they are driven by general cognitive activation, which affects both eyes [13].

To find the boundary between these two parts of the signal, I applied a bandpass filter with a bandwidth of 0.5 Hz and varying central frequencies to the pupil size signal of each eye separately to isolate their individual frequency components, and computed the correlation between the left and right eyes at different frequencies (Figure A.3).

Figure A.2: Illustration of data cleaning steps applied to a two-second period of pupil measurements gathered during a single trial. The gap in left eye data at t = 13.7 has been filled in with scaled right-eye data. The blink at t = 14.2 has been filled by linear interpolation, and the outlier near its end has been removed. Data from both eyes are smoothed with a 10 Hz low-pass filter.

Figure A.3: Correlation between the measured pupil size of the left and right eyes, by frequency component. Pupil size frequency components above about 10 Hz are uncorrelated. I therefore considered this part of the pupil signal to be noise and removed it with a low-pass filter.

A.4.2 Perspective distortion

Pomplun and Sunkara [95] reported a systematic dependence of pupil size on gaze direction. I replicated the ascending numeral visual search task they used to check for this bias but did not find it in our pupil measurements. I believe this is because the system used by Pomplun and Sunkara measured pupil size as the number of pixels encompassed by the pupil image, and optical perspective causes this size to vary with gaze direction. The Tobii 1750 I used instead measures pupil size as the length of the major axis of an ellipse fitted to the pupil image. This method is not affected by perspective distortion, though it is still subject to small errors caused by non-circular pupil shape. I recommend either using the ellipse-fitting method or calibrating for the bias as per Pomplun and Sunkara.

A.4.3 Data processing for statistical evaluation of differences in dilation magnitude

I quantified dilation magnitudes with the mean amplitude method [37, p. 38; 13, p. 148]. This method involves first measuring a baseline pupil size for each trial by averaging pupil size during a pre-stimulus accommodation period, then computing the average pupil size relative to this baseline during a response window defined for each task. I chose the mean dilation quantification method over the also-common peak dilation, because the latter is more sensitive to noise. I quantified each trial separately, enabling statistical evaluation of effect size and significance.

A.4.4 Significance tests

I used an alpha level of .05 for all statistical tests. Tests of differences in mean dilation magnitude were all based on partitions of variance (ANOVA). Following the policy of Jennings [52], I applied the Huynh-Feldt [46] correction to degrees of freedom for within-subjects factors with more than two levels. In such cases, I report the Huynh-Feldt non-sphericity correction parameter ε̃, the uncorrected degrees of freedom, and the corrected p-value.
I evaluated the significance of differences in error rates through one-tailed tests for equality of proportions with Yates' continuity correction [78].

A.4.5 Baseline subtraction

In cognitive pupillometry, the physical quantity of interest is the change in pupil diameter relative to its diameter shortly before the mental activity being studied [13]. That is, what matters is dilation (or constriction), not absolute pupil size. The magnitude of dilation responses to simple tasks is independent of baseline pupil diameter and commensurate across multiple labs and experimental procedures [12, 20, 19]. This means that the pupil data we are averaging needs to be transformed from absolute pupil diameter measurements to dilation measurements. This transformation is accomplished by first determining the baseline pupil size for each epoch by averaging the pupil diameter measurements during the epoch (or during a short window of time at the start of the epoch or surrounding its gaze event), then subtracting that baseline diameter from all pupil diameter measurements made in the epoch. Formally, if the time interval chosen for the baseline extends from t = b_1 to t = b_2, subtracting the mean pupil diameter during that interval from the full signal gives B[P_i(t)] = P_i(t) − (Δt / (b_2 − b_1)) Σ_{t′=b_1}^{b_2} P_i(t′), where Δt is the sampling interval of the eye tracker.

This transformation from diameters to dilations has an important implication for the precision of pupil measurements. For cognitive pupillometry applications, an eye tracker's accuracy in measuring changes in pupil diameter is much more important than its accuracy in measuring absolute pupil size.

A.4.6 Averaging

Trials can be averaged using a simple mean: P̄(t) = (1/n) Σ_{i=1}^{n} P_i(t). If the data are messy, it may be better to use a trimmed mean or the median instead. The averaged pupillary response P̄(t) is the main object of analysis in cognitive pupillometry. Averaging epochs containing consistent pupillary responses preserves the pupillary responses while decreasing the magnitude of the noise in which they are embedded. Because the noise component of the signal is random with respect to the gaze events, the magnitude of the noise average (its standard deviation) decreases in inverse proportion to the square root of the number of epochs included in the average. Cutting the noise by a factor of two requires quadrupling the number of epochs. The actual number of epochs required for a specific experiment depends on the level of measurement noise in the pupillometer and the level of background pupil noise in study participants. In my studies using a remote video eye tracker and tightly controlled visual field brightness (see Section 7.1), I have found that it takes at least 50 epochs to see large pupillary responses (0.2–0.5 mm) cleanly, and hundreds of epochs to reveal pupillary responses smaller than 0.1 mm.
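A small sketch of the baseline subtraction operator B defined above, applied to one epoch on a uniform time base; the names and array layout are illustrative assumptions. The baseline-subtracted epochs are then averaged as in section 6.2.3.

import numpy as np

def baseline_subtract(pupil, times, b1, b2):
    """B[P_i](t): subtract the mean pupil diameter over the baseline interval [b1, b2].

    pupil, times -- 1-D arrays of pupil diameters (mm) and sample times (s)
    b1, b2       -- start and end of the baseline interval (s)
    """
    in_baseline = (times >= b1) & (times <= b2)
    return pupil - pupil[in_baseline].mean()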
The ±-average

The purpose of averaging aligned pupil dilation data is to preserve the signal of interest (the task-evoked pupillary response) while decreasing the power of signal components not correlated in time with gaze events (the noise). However, the magnitude of the pupillary response being investigated is usually not known a priori, so in practice it is difficult to tell whether a feature of the averaged signal P̄(t) is noise or not.

This problem also arises in the analysis of averaged EEG data, where a procedure called the ±-average is used to estimate the magnitude of the noise by itself ([122], originally described as the ±-reference by Schimmel [105]). Instead of simply adding all the epochs and dividing by n, the epochs are alternately added and subtracted from the running total: P̄±(t) = (1/n) Σ_{i=1}^{n} (−1)^i P_i(t) (only defined for even n). This way, any time-correlated signal will be positive half the time and negative half the time and thus cancel exactly to zero, while any other components of the average, which were already as likely to be positive as negative, will be unchanged and approach zero as a function of n as in the normal average. The average magnitude of P̄±(t) is usually a good estimate of the noise power in the standard average. If no pupillary response stands out above this level, then either there is no pupillary response to see, or more trials are required to drive the noise power even lower.

Bibliography

[1] S Ahern and J Beatty. Pupillary responses during information processing vary with scholastic aptitude test scores. Science, 205(4412):1289–1292, September 1979. doi: 10.1126/science.472746.
[2] John L. Andreassi. Electrodermal Activity and Behavior, pages 259–288. Routledge, 5th edition, 2006. ISBN 0805849513, 9780805849516.
[3] John L. Andreassi. Pupillary response and behavior. In Psychophysiology: Human Behavior and Physiological Response, chapter 12, pages 289–307. Routledge, 5th edition, 2006. ISBN 0805849513, 9780805849516.
[4] M H Ashcraft. Cognitive arithmetic: a review of data and theory. Cognition, 44(1-2):75–106, August 1992. ISSN 0010-0277. PMID: 1511587.
[5] Richard W. Backs and Larry C. Walrath. Eye movement and pupillary response indices of mental workload during visual search of symbolic displays. Applied Ergonomics, 23(4):243–254, August 1992.
[6] Alan D. Baddeley. Working memory, thought, and action. Oxford University Press, 2007. ISBN 0198528019.
[7] Dale L. Bailey, David W. Townsend, Peter E. Valk, and Michael N. Maisey. Positron Emission Tomography: Basic Sciences. Springer, 1st edition, April 2005. ISBN 1852337982.
[8] C. P. Beaman. Inverting the modality effect in serial recall. The Quarterly Journal of Experimental Psychology Section A, 55(2):371–389, 2002.
[9] J. Beatty and D. Kahneman. Pupillary changes in two memory tasks. Psychonomic Science, 5(10):371–372, 1966.
[10] J Beatty and BL Wagoner. Pupillometric signs of brain activation vary with level of cognitive processing. Science, 199(4334):1216–1218, March 1978. doi: 10.1126/science.628837.
[11] Jackson Beatty. Phasic not tonic pupillary responses vary with auditory vigilance performance. Psychophysiology, 19(2):167–172, 1982. doi: 10.1111/j.1469-8986.1982.tb02540.x.
[12] Jackson Beatty. Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2):276–292, 1982. ISSN 0033-2909 (Print); 1939-1455 (Electronic). doi: 10.1037/0033-2909.91.2.276.
[13] Jackson Beatty and Brennis Lucero-Wagoner. The pupillary system. In John T. Cacioppo, Louis G. Tassinary, and Gary Berntson, editors, Handbook of Psychophysiology, pages 142–162. Cambridge University Press, 2nd edition, 2000. ISBN 052162634X.
[14] W. Becker, K. Iwase, R. Jürgens, and H. H. Kornhuber. Brain potentials preceding slow and rapid hand movements. In W. C. McCallum and J. R. Knott, editors, The Responsive Brain, pages 99–102. Wright, Bristol, 1976.
[15] Frederic Boersma, Keri Wilton, Richard Barham, and Walter Muir. Effects of arithmetic problem difficulty on pupillary dilation in normals and educable retardates. Journal of Experimental Child Psychology, 9(2):142–155, April 1970. ISSN 0022-0965. doi: 10.1016/0022-0965(70)90079-2. [16] A. Boxtel and M. Jessurun. Amplitude and bilateral coherency of facial and jawelevator EMG activity as an index of effort during a two-choice serial reaction task. Psychophysiology, 30(6):589–604, 1993. doi: 10.1111/j.1469-8986.1993. tb02085.x. BIBLIOGRAPHY 105 [17] J. Bradshaw. Pupil size as a measure of arousal during information processing. Nature, 216:515–516, November 1967. [18] J. L. Bradshaw. Pupil size and problem solving. QJ Exp Psychol, 20(2):116–22, 1968. [19] J. L. Bradshaw. Pupil size and drug state in a reaction time task. Psychonomic Science, 18(2):112–113, 1970. [20] John L. Bradshaw. Background light intensity and the pupillary response in a reaction time task. Psychonomic Science, 14(6):271–272, 1969. ISSN 0033-3131 (Print). [21] Oswald Bumke. translated by Eckhard Hess in The Tell Tale Eye, pp. 23–24. New York: Van Nostrand. 1975, 1911. [22] Christopher H. Chatham, Michael J. Frank, and Yuko Munakata. Pupillometric and behavioral markers of a developmental shift in the temporal dynamics of cognitive control. Proceedings of the National Academy of Sciences, 106(14): 5529–5533, 2009. doi: 10.1073/pnas.0810002106. [23] Chaomei Chen. Information Visualization. 2004. ISBN 1852337893, 9781852337896. [24] James Clark and Allan Paivio. Dual coding theory and education. Educational Psychology Review, 3(3):149–210, 1991. doi: {10.1007/BF01320076}. [25] Kenneth D. Cocker. Development of pupillary responses to grating stimuli. Ophthalmic and Physiological Optics, 16(1):64–67, 1996. doi: 10.1046/j.1475-1313. 1996.9500016x.x. [26] J. M. Dabbs and R. Milun. Pupil dilation when viewing strangers: Can testosterone moderate prejudice? Social Behavior and Personality, 27(3):297–301, 1999. 106 BIBLIOGRAPHY [27] J. DeLaunay. A note on the photo-pupil reflex. Journal of the Optical Society of America, 39:364–367, 1949. [28] Andrew T. Duchowski. Eye Tracking Methodology: Theory and Practice. Springer, 1 edition, 2003. ISBN 1852336668. [29] Stephen H Fairclough and Kim Houston. A metabolic measure of mental effort. Biological Psychology, 66(2):177–90, April 2004. ISSN 0301-0511. doi: 10.1016/ j.biopsycho.2003.10.001. PMID: 15041139. [30] Thorsten Fehr, Chris Code, and Manfred Herrmann. Auditory task presentation reveals predominantly right hemispheric fMRI activation patterns during mental calculation. Neuroscience Letters, 431(1):39–44, 2008. ISSN 0304-3940. doi: 10.1016/j.neulet.2007.11.016. [31] J.M. Findlay and Larry R. Squire. Saccades and visual search. In Encyclopedia of Neuroscience, pages 429–436. Academic Press, Oxford, 2009. ISBN 978-008-045046-9. [32] R M Gardner, J S Beltramo, and R Krinsky. Pupillary changes during encoding, storage, and retrieval of information. Perceptual and Motor Skills, 41(3):951–5, December 1975. ISSN 0031-5125. PMID: 1215138. [33] B C Goldwater. Psychological significance of pupillary movements. Psychological Bulletin, 77(5):340–55, May 1972. ISSN 0033-2909. PMID: 5021049. [34] Eric Granholm, Robert F. Asarnow, Andrew J. Sarkin, and Karen L. Dykes. Pupillary responses index cognitive resource limitations. Psychophysiology, 33 (4):457–461, 1996. doi: 10.1111/j.1469-8986.1996.tb01071.x. [35] Robert Leo Greene. Human memory. Lawrence Erlbaum Associates, 1992. 
ISBN 080580997X, 9780805809978. [36] Gad Hakerem and Samuel Sutton. Pupillary response at visual threshold. Nature, 212(5061):485–486, October 1966. doi: 10.1038/212485a0. BIBLIOGRAPHY 107 [37] Todd C. Handy. Event-related Potentials: A Methods Handbook. The MIT Press, 1 edition, October 2004. ISBN 0262083337. [38] T C Hankins and G F Wilson. A comparison of heart rate, eye activity, EEG and subjective measures of pilot mental workload during flight. Aviation, Space, and Environmental Medicine, 69(4):360–7, April 1998. ISSN 0095-6562. PMID: 9561283. [39] Dan Witzner Hansen and Arthur E.C. Pece. Eye tracking in the wild. Computer Vision and Image Understanding, 98(1):155 – 181, 2005. ISSN 1077-3142. doi: 10.1016/j.cviu.2004.07.013. Special Issue on Eye Detection and Tracking. [40] W. Heinrich. Die aufmerksamkeit und die funktion der sinnesorgane. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 9:342–388, 1896. [41] Richard P. Heitz, Josef C. Schrock, Tabitha W. Payne, and Randall W. Engle. Effects of incentive on working memory capacity: Behavioral and pupillometric data. Psychophysiology, 45(1):119–129, 2008. doi: 10.1111/j.1469-8986.2007. 00605.x. [42] Eckhard H. Hess. Pupillometrics. In N. S. Greenfield and R. A. Sternbach, editors, Handbook of Psychophysiology, pages 491–531. Holt, Rinehart and Winston (New York), 1972. [43] Eckhard H. Hess and James M. Polt. Pupil size as related to interest value of visual stimuli. Science, 132(3423):349–350, August 1960. doi: 10.1126/science. 132.3423.349. [44] Eckhard H. Hess and James M. Polt. Pupil size in relation to mental activity during simple Problem-Solving. Science, 143(3611):1190–1192, March 1964. doi: 10.1126/science.143.3611.1190. [45] Scott A Huettel, Allen W Song, and Gregory McCarthy. Functional Magnetic Resonance Imaging. Sinauer Associates;;Palgrave, Sunderland Mass. ;Basingstoke, 2004. ISBN 9780878932887. 108 BIBLIOGRAPHY [46] H. Huynh and L. S Feldt. Performance of traditional f tests in repeated measures designs under covariance heterogeneity. Communications in Statistics-Theory and Methods, 9(1):6174, 1980. [47] Jukka Hyönä, Jorma Tommola, and Anna-Mari Alaja. Pupil dilation as a measure of processing load in simultaneous interpretation and other language tasks. The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 48(3):598, 1995. ISSN 0272-4987. doi: 10.1080/14640749508401407. [48] Cristina Iani, Daniel Gopher, and Peretz Lavie. Effects of task difficulty and invested mental effort on peripheral vasoconstriction. Psychophysiology, 41(5): 789–798, 2004. doi: 10.1111/j.1469-8986.2004.00200.x. [49] Interactive Minds. Binocular eyegaze analysis system. http://www. interactive-minds.com/en/eye-tracker/eyegaze-analysis-system, March 2010. [50] Shamsi T. Iqbal, Xianjun Sam Zheng, and Brian P. Bailey. Task-evoked pupillary response to mental workload in human-computer interaction. In CHI ’04 extended abstracts on Human factors in computing systems, pages 1477–1480, Vienna, Austria, 2004. ACM. ISBN 1-58113-703-6. [51] Shamsi T. Iqbal, Piotr D. Adamczyk, Xianjun Sam Zheng, and Brian P. Bailey. Towards an index of opportunity: understanding changes in mental workload during task execution. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 311–320, Portland, Oregon, USA, 2005. ACM. ISBN 1-58113-998-5. [52] J. Richard Jennings. Editorial policy on analyses of variance with repeated measures. Psychophysiology, 24(4):474–475, 1987. doi: 10.1111/j.1469-8986. 
1987.tb00320.x. [53] Marcel A. Just and Patricia A. Carpenter. The intensity dimension of thought: BIBLIOGRAPHY 109 Pupillometric indices of sentence processing. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie exprimentale, 47(2):310–339, 1993. ISSN 1196-1961 (Print). doi: 10.1037/h0078820. [54] D. Kahneman and J. Beatty. Pupillary responses in a pitch-discrimination task. Perception & Psychophysics, 2:101–105, 1967. [55] D Kahneman, L Onuska, and R E Wolman. Effects of grouping on the pupillary response in a short-term memory task. The Quarterly Journal of Experimental Psychology, 20(3):309–11, August 1968. ISSN 0033-555X. PMID: 5683772. [56] Daniel Kahneman. Attention and Effort. Prentice Hall, September 1973. ISBN 0130505188. [57] Daniel Kahneman and Jackson Beatty. Pupil diameter and load on memory. Science, 154(3756):1583–1585, December 1966. doi: 10.1126/science.154.3756. 1583. [58] Daniel Kahneman and Patricia Wright. Changes of pupil size and rehearsal strategies in a short-term memory task. The Quarterly Journal of Experimental Psychology, 23(2):187, 1971. ISSN 1747-0218. doi: 10.1080/14640747108400239. [59] Daniel Kahneman, Jackson Beatty, and Irwin Pollack. Perceptual deficit during a mental task. Science, 157(3785):218–219, July 1967. doi: 10.1126/science.157. 3785.218. [60] R Kardon. Pupillary light reflex. Current Opinion in Ophthalmology, 6(6):20–6, December 1995. ISSN 1040-8738. PMID: 10160414. [61] Jeff Klingner. Fixation-aligned pupillary response averaging. In ETRA ’10: Proceedings of the 2010 Symposium on Eye-Tracking Research and Applications, pages 275–282, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-994-7. doi: 10.1145/1743666.1743732. [62] Jeff Klingner. The pupillometric precision of a remote video eye tracker. In ETRA ’10: Proceedings of the 2010 Symposium on Eye-Tracking Research and 110 BIBLIOGRAPHY Applications, pages 259–262, New York, NY, USA, 2010. ACM. ISBN 978-160558-994-7. doi: 10.1145/1743666.1743727. [63] Jeff Klingner, Rakshit Kumar, and Pat Hanrahan. Measuring the task-evoked pupillary response with a remote eye tracker. In Proceedings of the 2008 symposium on Eye tracking research and applications, pages 69–72, Savannah, Georgia, 2008. ACM. ISBN 978-1-59593-982-1. doi: 10.1145/1344471.1344489. [64] Jeff Klingner, Barbara Tversky, and Pat Hanrahan. Effects of visual and verbal presentation on cognitive load in vigilance, memory and arithmetic tasks. Psychophysiology, TBD:TBD, 2010. [65] Michael Kohn and Manfred Clynes. Color dynamics of the pupil. Annals of the New York Academy of Sciences, 156(Rein Control or Unidirectional Rate Sensitivity a Fundamental Dynamic and Organizing Function in Biology):931– 950, 1969. doi: 10.1111/j.1749-6632.1969.tb14024.x. [66] D. J. Lehr and B. O. Bergum. Note on pupillary adaptation. Perceptual and Motor Skills, 23:917–918, 1966. [67] William L. Libby, Beatrice C. Lacey, and John I. Lacey. Pupillary and cardiac activity during visual attention. Psychophysiology, 10(3):270–294, 1973. doi: 10.1111/j.1469-8986.1973.tb00526.x. [68] Irene Loewenfeld. The Pupil: Anatomy, Physiology, and Clinical Applications, volume 1. Butterworth-Heinemann, Oxford, UK, 2nd edition, 1999. ISBN 07506-7143-2. [69] A. D. Loewy. Autonomic control of the eye, page 268285. Oxford University Press, New York, A.D. loewy & k. m. spyer edition, 1990. [70] R H Logie, K J Gilhooly, and V Wynn. Counting on working memory in arithmetic problem solving. Memory & Cognition, 22(4):395–410, July 1994. ISSN 0090-502X. 
[71] O. Lowenstein and Irene Loewenfeld. The sleep–waking cycle and pupillary activity. Annals of the New York Academy of Sciences, 117:142–156, 1964.
[72] O. Lowenstein and Irene E. Loewenfeld. Disintegration of central autonomic regulation during fatigue and its reintegration by psychosensory controlling mechanisms: I. Disintegration. Pupillographic studies. Journal of Nervous and Mental Disease, 115:1–21, 1952.
[73] Mangold International. MangoldVision eye tracker product page. http://www.mangold-international.com/products/eye-tracker-solutions/stationary.html, March 2010.
[74] Sandra P. Marshall, C. W. Pleydell-Pearce, and B. T. Dickson. Integrating psychophysiological measures of cognitive workload and eye movements to detect strategy shifts. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS '03), Track 5, Volume 5, page 130.2. IEEE Computer Society, 2003. ISBN 0-7695-1874-5.
[75] S. P. Marshall. The index of cognitive activity: Measuring cognitive workload. In Proceedings of the 2002 IEEE 7th Conference on Human Factors and Power Plants, pages 7-5–7-9, 2002.
[76] James G. May, Robert S. Kennedy, Mary C. Williams, William P. Dunlap, and Julie R. Brannan. Eye movement indices of mental workload. Acta Psychologica, 75(1):75–89, October 1990.
[77] J. W. McLaren, J. C. Erie, and R. F. Brubaker. Computerized analysis of pupillograms in studies of alertness. Investigative Ophthalmology & Visual Science, 33(3):671–676, March 1992.
[78] O. Miettinen and M. Nurminen. Comparative analysis of two rates. Statistics in Medicine, 4(2):213–226, June 1985. ISSN 0277-6715. PMID: 4023479.
[79] G. A. Miller. The magical number seven, plus or minus two. Psychological Review, 63:81–97, 1956.
[80] Karl E. Misulis and Toufic Fakhoury. Spehlmann's Evoked Potential Primer. Butterworth-Heinemann, 3rd edition, May 2001. ISBN 0750673338.
[81] Kevin P. Moloney, Julie A. Jacko, Brani Vidakovic, François Sainfort, V. Kathlene Leonard, and Bin Shi. Leveraging data complexity: Pupillary behavior of older adults with visual impairment during HCI. ACM Transactions on Computer-Human Interaction, 13(3):376–402, 2006.
[82] Sofie Moresi, Jos J. Adam, Jons Rijcken, Pascal W. M. Van Gerven, Harm Kuipers, and Jelle Jolles. Pupil dilation in response preparation. International Journal of Psychophysiology, 67(2):124–130, February 2008. ISSN 0167-8760. doi: 10.1016/j.ijpsycho.2007.10.011.
[83] M. Nakayama, I. Yasuike, and Y. Shimizu. Pupil size changing by pattern brightness and pattern contents. The Journal of the Institute of Television Engineers of Japan, 44:288–293, 1990.
[84] Minoru Nakayama and Yasutaka Shimizu. Frequency analysis of task evoked pupillary response and eye-movement. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 71–76, San Antonio, Texas, 2004. ACM. ISBN 1-58113-825-3. doi: 10.1145/968363.968381.
[85] Neuroptics, Inc. Instruction manual, VIP-200 pupillometer, revision A, 2008.
[86] Neuroptics, Inc. Personal communication, August 2009.
[87] Takehiko Ohno and Naoki Mukawa. A free-head, simple calibration, gaze tracking system that enables gaze-based interaction. In ETRA '04: Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 115–122, New York, NY, USA, 2004. ACM. ISBN 1-58113-825-3. doi: 10.1145/968363.968387.
[88] Fred Paas and Jeroen Van Merriënboer. Instructional control of cognitive load in the training of complex cognitive tasks. Educational Psychology Review, 6(4):351–371, December 1994. doi: 10.1007/BF02213420.
[89] Allan Paivio. Mental Representations: A Dual Coding Approach. Oxford University Press US, 1990. ISBN 0195066669, 9780195066661.
[90] Oskar Palinko, Andrew L. Kun, Alexander Shyrokov, and Peter Heeman. Estimating cognitive load using remote eye tracking in a driving simulator. In ETRA '10: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, pages 141–144, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-994-7. doi: 10.1145/1743666.1743701.
[91] W. Scott Peavler. Pupil size, information overload, and performance differences. Psychophysiology, 11(5):559–566, 1974. doi: 10.1111/j.1469-8986.1974.tb01114.x.
[92] C. G. Penney. Modality effects and the structure of short-term verbal memory. Memory & Cognition, 17(4):398–422, 1989.
[93] Polhemus, Inc. VisionTrak product web page. http://www.polhemus.com/?page=Eye_VisionTrak, March 2010.
[94] M. Pomplun, S. Sunkara, A. V. Fairley, and M. Xiao. Using pupil size as a measure of cognitive workload in video-based eye-tracking studies. Unreviewed manuscript available at http://www.cs.umb.edu/~marc/pubs/pomplun_sunkara_fairley_xiao_draft.pdf, 2009.
[95] Marc Pomplun and Sindhura Sunkara. Pupil dilation as an indicator of cognitive workload in human-computer interaction. In Proceedings of the International Conference on HCI, 2003.
[96] G. Porter, T. Trościanko, and I. D. Gilchrist. Pupil size as a measure of task difficulty in vision. In Perception 31, ECVP Abstract Supplement, 2002.
[97] Gillian Porter, Tom Troscianko, and Iain D. Gilchrist. Effort during visual search and counting: Insights from pupillometry. The Quarterly Journal of Experimental Psychology, 60(2):211, 2007. ISSN 1747-0218. doi: 10.1080/17470210600673818.
[98] K. Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3):372–422, November 1998. ISSN 0033-2909. PMID: 9849112.
[99] Francois Richer and Jackson Beatty. Pupillary dilations in movement preparation and execution. Psychophysiology, 22(2):204–207, 1985. doi: 10.1111/j.1469-8986.1985.tb01587.x.
[100] Francois Richer, Clifford Silverman, and Jackson Beatty. Response selection and initiation in speeded reactions: A pupillometric analysis. Journal of Experimental Psychology: Human Perception and Performance, 9(3):360–370, 1983. ISSN 0096-1523 (Print); 1939-1277 (Electronic). doi: 10.1037/0096-1523.9.3.360.
[101] Arash Sahraie and John L. Barbur. Pupil response triggered by the onset of coherent motion. Graefe's Archive for Clinical and Experimental Ophthalmology, 235(8):494–500, 1997. doi: 10.1007/BF00947006.
[102] Dario D. Salvucci and Joseph H. Goldberg. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, pages 71–78, New York, NY, 2000. ACM. ISBN 1-58113-280-8. doi: 10.1145/355017.355028.
[103] Hal Scher, John J. Furedy, and Ronald J. Heslegrave. Phasic T-wave amplitude and heart rate changes as indices of mental effort and task incentive. Psychophysiology, 21(3):326–333, 1984. doi: 10.1111/j.1469-8986.1984.tb02942.x.
[104] J. M. Schiff and F. Foa. La pupille considérée comme esthésiomètre (translated by R. G. de Choisity). Marseille Medical, 2:736–741, 1874.
[105] Herbert Schimmel. The (±) reference: Accuracy of estimated mean components in average response studies. Science, 157(3784):92–94, July 1967. doi: 10.1126/science.157.3784.92.
[106] Kathrin B. Schlemmer, Franziska Kulke, Lars Kuchinke, and Elke Van Der Meer. Absolute pitch and pupillary response: Effects of timbre and key color. Psychophysiology, 42(4):465–472, 2005. doi: 10.1111/j.1469-8986.2005.00306.x.
[107] Andrew Scholey, Philippa Jackson, and David Kennedy. Mental effort, blood glucose and performance. Appetite, 47(2):277, September 2006. ISSN 0195-6663. doi: 10.1016/j.appet.2006.07.066.
[108] SensoMotoric Instruments. iView X Hi-Speed product page. http://www.smivision.com/en/eye-gaze-tracking-systems/products/iview-x-hi-speed.html, March 2010.
[109] Malcolm Slaney, Michelle Covell, and Bud Lassiter. Automatic audio morphing. In Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2, pages 1001–1004. IEEE Computer Society, 1996. ISBN 0-7803-3192-3.
[110] Benjamin Smith. 023/365: Eye see you! Flickr, under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 Generic license, July 2008. URL http://www.flickr.com/photos/dotbenjamin/2636942186/.
[111] Stuart R. Steinhauer. Pupillary responses, cognitive psychophysiology, and psychopathology, 2002.
[112] Stuart R. Steinhauer, Greg J. Siegle, Ruth Condray, and Misha Pless. Sympathetic and parasympathetic innervation of pupillary dilation during sustained processing. International Journal of Psychophysiology, 52(1):77–86, March 2004. ISSN 0167-8760. doi: 10.1016/j.ijpsycho.2003.12.005.
[113] Tobii Technologies, Inc. Personal communication, 2007.
[114] Tobii Technologies, Inc. Tobii 1750, 2007. URL http://www.tobii.com.
[115] Warren W. Tryon. Pupillometry: A survey of sources of variation. Psychophysiology, 12(1):90–93, 1975.
[116] Kazuhiko Ukai. Spatial pattern as a stimulus to the pupillary system. Journal of the Optical Society of America A, 2(7):1094–1100, July 1985. doi: 10.1364/JOSAA.2.001094.
[117] Karl F. Van Orden, Wendy Limbert, Scott Makeig, and Tzyy-Ping Jung. Eye activity correlates of workload during a visuospatial memory task. Human Factors: The Journal of the Human Factors and Ergonomics Society, 43(1):111–121, 2001.
[118] Steven P. Verney, Eric Granholm, and Daphne P. Dionisio. Pupillary responses and processing resources on the visual backward masking task. Psychophysiology, 38(1):76–83, 2001. ISSN 0048-5772 (Print); 1469-8986 (Electronic). doi: 10.1017/S0048577201990195.
[119] Steven P. Verney, Eric Granholm, Sandra P. Marshall, Vanessa L. Malcarne, and Dennis P. Saccuzzo. Culture-fair cognitive ability assessment: Information processing and psychophysiological approaches. Assessment, 12(3):303–319, 2005. ISSN 1073-1911.
[120] W. T. Welford. Reaction Times. Academic Press, November 1980. ISBN 0127428801.
[121] H. Widdel. Operational problems in analysing eye movements. In A. G. Gale and F. Johnson, editors, Theoretical and Applied Aspects of Eye Movement Research, pages 21–29. Elsevier, New York, 1984.
[122] P. K. Wong and R. G. Bickford. Brain stem auditory evoked potentials: The use of noise estimate. Electroencephalography and Clinical Neurophysiology, 50(1–2):25–34, 1980. ISSN 0013-4694. PMID: 6159189.