RadioGraphics
PERCEPTION AND DISPLAY

Index terms: Imaging technology
Cumulative Index terms: Diagnostic radiology, observer performance

Using eye movements to study visual search and to improve tumor detection

Calvin F. Nodine, Ph.D.*
Harold L. Kundel, M.D.†

Introduction
Picture perception consists of assigning meaning to a picture by labeling areas of the picture with the names of objects that have existence in the real world. Thus, an experienced observer can point to a chest image and say, “This is a rib” or “This is the heart”. Both “rib” and “heart” are the names of anatomic objects, stored in memory, that are used as labels for certain groupings of picture elements. The mechanism for the grouping and labeling, which we shall call object recognition, is one of the great puzzles of human perception to which no solution will be offered here. Recognition, which implies assigning a meaning, is distinguished from detection, which is deciding about the presence or absence of something that is expected.

Radiological investigators have modelled object detection using statistical decision theory where the object of interest (the signal) must be distinguished from everything else in the image (the noise). When the object is just a blob on a clear background and the noise is random variation, object detectability can be described by a signal-to-noise ratio equation that can be derived from the principles of imaging physics and physiological optics. Few situations outside of the laboratory are this simple. Usually, the object of interest (the target) is mixed in with other objects. The target may either stand out or be hidden by the background objects depending upon their structural characteristics and arrangements.
Statistical-decision theory still provides a useful method for quantifying the detection of the target objects (i.e. receiver operating characteristic (ROC) analysis), but the relationship to theory is stretched thin. Detecting an object that is hidden in a natural scene is not the same as detecting an object displayed against a background of random noise (1). The objects in the background of a scene camouflage the target, interfering with object recognition and forcing the observer to sift through extraneous objects in order to find the target. This is illustrated by a game devised by artist Al Hirschfeld of the New York Times, who challenges the viewers of his cartoons by hiding the name NINA somewhere in the scene. Can you find NINA in Hirschfeld’s depiction of a scene from “The Apartment”? Looking for a lung tumor camouflaged by the anatomical structures of the chest presents the same type of problem to the perceptual system.

From the Department of Educational Psychology, Temple University (*) and the Pendergrass Diagnostic Research Laboratory, University of Pennsylvania (†), Philadelphia, PA. This work was supported in part by NIH Research Grant CA 32870. Address reprint requests to C.F. Nodine, Ph.D., Pendergrass Radiology Laboratory, University of Pennsylvania, 3 Medical Education Building, Philadelphia, PA 19104-6068.

Perception and display in diagnostic imaging. RadioGraphics, Volume 7, Number 6, Monograph, November 1987.
Figure 1
(A) A scene from the film “The Apartment” by Al Hirschfeld. Hirschfeld’s daughter’s name NINA is embedded in a pictorial detail of the scene. (B) A close-up of the pictorial detail containing the name NINA. Notice how Hirschfeld uses the natural contours of the lamp to hide the letters. This illustrates the camouflaging effect known as mimicry.

Figure 2
(A) A portion of the right chest exposed in full inspiration showing a lung tumor. (B) The same chest exposed at partial inspiration. Notice how the vasculature camouflages the tumor. This illustrates the camouflage principle of “dazzle” in which target-object contours are broken by counterforms (17).

The lung tumor that is easily detectable in Figure 2A is more difficult to detect in Figure 2B because of the camouflaging effect from overlapping vascular structures. The practical consequence of this situation is diagnostic error, which does not make much difference to a “Ninamaniac” but may make a difference to the people consulting with the radiologist. Table I presents a summary of data from three recent NIH supported lung-screening programs showing the percent of lung cancers missed on the original reading of the image, but visible in retrospect.

Table I
Percent of Lung Cancers Visible in Retrospect from Three Large Screening Programs

Institution          Screening   People      Cancers   Visible in Retrospect
                     Interval    Screened    Found     No.        %
Memorial 1984 (2)    Annual      10,040      168       78/156     46
Mayo 1983 (3)        4 mos.      4,618       92        70/92      76
Hopkins 1978 (4)     Annual      10,362      78        14/78      18

Recording Eye Movements

Detection of camouflaged lung tumors in chest x-ray images demands identifying places where tumors might be hiding, examining those places for tumor features, and perceiving a tumor when only part of it is showing. We have studied how perception and cognition interact in the detection of camouflaged objects in diagnostic imaging by studying eye movements of experts and non-experts searching for targets in natural pictorial backgrounds (5,6,7). The eye movement recordings can be used to determine where viewers focus their visual attention. Eye movements are measured by having the viewer wear a specially-designed spectacle frame containing infrared emitters and sensors that measure changes in light reflected from the border between the iris and sclera of each eye (8).
The eye moves in jumps (saccades) with intervening pauses (fixations). The reflectance changes are converted to x,y coordinates which indicate the location and duration of fixations. We have found that fixations occur in clusters, and we have used fixation clusters to infer where attention is directed.

Figure 3
A frontal view of a viewer wearing the eye-movement spectacles. The sensors are mounted below the field of view of the eyes.

Figure 4
A series of eye-fixation records showing the locus of the
axis of gaze during fixation and the path of the eye during movement. (A) The raw eye-position record. The small
squares show the x,y locations of the eyes during a 15
sec. viewing period. Each square represents 1/60 sec. (B)
The eye-fixation record. The raw data points are grouped
into fixations. The small circles show the x,y centers of the
raw data groupings. It is the centers of these data groupings that define individual fixations. The lines between
fixations are added to show the scanpath. Each fixation
has a duration, which is the sum of the raw data points,
that varies from 1/60 to 1 sec. (C) The same fixation record
showing the foveal field of view of each fixation. Circles
having a radius of 0.5 degrees have been drawn around
the center of each fixation to indicate how much image
detail is picked up by the high-resolution fovea. The
range of the fovea has been estimated at 1 degree visual
angle on the chest image. (D) The same fixation record
showing fixation clusters. Fixations are grouped into clusters using a running-mean
rule, and circles having a
radius of 2.5 degrees have been drawn around the mean
of each fixation cluster. The clusters contain differing
numbers of fixations, reflecting differences in the degree
to which underlying image features are scrutinized.
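
The grouping steps described in the Figure 4 caption can be sketched in a few lines of Python. This is an illustration only, not the recording system's actual algorithm (described in reference 8): the caption states that raw samples arrive every 1/60 sec. and that clusters are formed with a running-mean rule, but the exact rule is not given. The sketch assumes that a point joins the current group when it lies within a fixed radius of that group's running mean, and that groups are contiguous in time. The names `Group` and `group_by_running_mean` and the use of the 0.5 and 2.5 degree radii as grouping thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot

SAMPLE_PERIOD_S = 1 / 60  # one raw eye-position sample, per the Figure 4 caption


@dataclass
class Group:
    """Running summary of a set of points: mean position and member count."""
    x: float
    y: float
    n: int = 1

    def distance_to(self, px, py):
        return hypot(px - self.x, py - self.y)

    def add(self, px, py):
        # Incremental update of the running mean position.
        self.n += 1
        self.x += (px - self.x) / self.n
        self.y += (py - self.y) / self.n


def group_by_running_mean(points, radius_deg):
    """Assign each point, in time order, to the current group if it lies
    within radius_deg of that group's running mean (assumed rule);
    otherwise start a new group."""
    groups = []
    for px, py in points:
        if groups and groups[-1].distance_to(px, py) <= radius_deg:
            groups[-1].add(px, py)
        else:
            groups.append(Group(px, py))
    return groups
```

Applied twice, the same rule turns raw samples into fixations (`group_by_running_mean(raw_samples, 0.5)`) and fixation centers into clusters (`group_by_running_mean([(f.x, f.y) for f in fixations], 2.5)`). Under these assumptions a fixation's duration is `f.n * SAMPLE_PERIOD_S`, and a cluster's `n` is its number of fixations.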
Figure 5
The model of visual search. The initial glance results in a
global impression that provides the viewer with information about orientation, symmetry and anatomic layout of
distinctive image features. Scanning tests diagnostic hypotheses by scrutinizing distinctive image features for significant anatomical perturbations. The evidence
gathered from these tests is used to generate plausible
perceptual interpretations that lead to a diagnostic decision.
[Figure 5 diagram. Labeled elements: cognitive schema and expectations; global impression (orientation, symmetry, anatomic layout); scanning; focal attention (scrutiny of image perturbations, testing of diagnostic hypotheses).]

The Visual Search and Detection Model

From over a decade of measuring the eye movements of radiologists as they scan x-ray images searching for abnormalities, we have developed a model of visual search and detection that has three major components: overall pattern recognition; focal attention to image detail; and decision making. This model provides an organizational framework for studying basic perceptual processes that we have applied to the understanding of lung tumor detection. It has been useful for classifying detection failures and has suggested methods for improving perceptual performance. We are particularly interested in detection failures that occur when the signal-to-noise ratio is well above that required for “threshold” detectability, as in Hirschfeld’s NINAs and most missed lung tumors.

The model depicts the events that occur from the first glance to the decision about the image. It assumes that a search task has been defined prior to viewing.

Overall pattern recognition. The first glance at the image produces a global impression in which the viewer brings his cognitive schema to bear on the image data obtained by the retina. The cognitive schema consists of knowledge about the mapping of anatomy and pathology on to radiographic images together with expectations about the to-be-seen image. This initial interaction, which takes a few hundred milliseconds, leaves the viewer with a fairly accurate conception of the content of the image. It has been shown that radiologists can make reasonably accurate diagnostic interpretations from the information obtained in a single, brief glance (9,10). The global impression provides the perceptual system with the information needed to carry out the diagnostic task. Potential target sites are flagged and deviations from the viewer’s cognitive schema, called perturbations, are noted. This initial impression sets the stage for detailed focal analysis of the image by the central vision (11).

Scanning. Following the initial impression, the eyes are moved over the image so that central vision can be used to examine potential target sites and perturbations. The examination is accomplished by clusters of closely-spaced fixations. After the places identified during the initial global impression have been scrutinized, the viewer may follow the same stereotyped scanning pattern aimed at discovering something that was missed, or may simply scan at random while thinking about the image.

Decision. The fixations that cluster at perturbations or potential target sites are presumably collecting the data necessary to test for the presence of an abnormal object. If testing reveals a target object, a decision to report a tumor may result. If testing is negative or inconclusive, search continues. We consider each fixation cluster as a decision node. Thus, the report “normal chest” is an overall impression based on a series of local decisions that are needed because the relevant anatomic features can only be resolved by central vision. The viewer is not aware of all of the decisions, positive and negative, made during scanning. The eye-movement record, however, reveals where the eye lingered, providing indirect evidence about where covert decisions were made. We believe that prolonged or multiple fixations that cluster on image detail signal the testing and decision-making activity associated with the interpretation of anatomical perturbations that have potential as tumor targets.

Figure 6
Receiver operating characteristic (ROC) curves comparing detection performance on a set of 22 normal and abnormal chest images after one 0.2 sec. flash and after unlimited free search. Percent true positives is plotted on the ordinate and percent false positives is plotted on the abscissa. The dotted diagonal line indicates the index of detectability (de’), which for the flash condition is significantly greater (de’ = 1.2) than chance performance (de’ = 0). Overall accuracy for the flash condition was 70 percent true positives as compared with 97 percent true positives for free viewing (based on 10).

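
The detectability index quoted in the Figure 6 caption can be related to hit and false-alarm rates with a short calculation. The sketch below is not taken from the cited study. It assumes the standard equal-variance Gaussian model of signal detection and one common definition of de’ as the detectability read where the ROC curve crosses the negative diagonal (true-positive fraction = 1 - false-positive fraction). The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(true_positive_fraction, false_positive_fraction):
    """Equal-variance detectability index: z(TPF) - z(FPF).

    Both fractions must lie strictly between 0 and 1.
    """
    z = _STD_NORMAL.inv_cdf
    return z(true_positive_fraction) - z(false_positive_fraction)


def tpf_on_negative_diagonal(d_e):
    """True-positive fraction at the ROC point where TPF = 1 - FPF,
    for a given d_e' under the equal-variance Gaussian assumption."""
    # On the negative diagonal z(FPF) = -z(TPF), so d_e' = 2 * z(TPF).
    return _STD_NORMAL.cdf(d_e / 2)
```

Under these assumptions chance performance, where the two fractions are equal, gives `d_prime(p, p) == 0`, and `tpf_on_negative_diagonal(1.2)` gives the true-positive fraction at the diagonal point for the flash condition's de’ of 1.2.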
Errors in Detecting and Interpreting Targets

Following this model, we hypothesize three sources of error: sampling, recognition, and decision making (12).

Sampling Error. If the purpose of scanning is to sample the image with the high-resolution region of the central retina called the fovea, then it is likely that some parts of the image will be neglected. Maps of fixation clusters show that they are unevenly distributed over a chest image (see Figure 4D). We have hypothesized that prolonged scrutiny is accomplished by increasing the number of fixations instead of just increasing the duration of a single fixation. A cluster, then, extends the limited foveal vision to a wider circular field of ± 2.5 deg. visual angle (10). Typically 80-90 percent of the lung image is covered by fixation clusters of this size (13). Stated another way, it takes about 18 fixation clusters to sample a chest image adequately. Coverage is not exhaustive, because the main purpose of scanning is the testing of perceptual inferences. In the process, some locations that are considered perceptually uninteresting are not covered (14).

Recognition Error. Many targets are looked at directly but are not reported. Looking at a region containing a target does not guarantee that it will be recognized, especially when the target is camouflaged. It has been shown that fixating a region for one third of a second is sufficient for a negative decision, but a deeply embedded target can require a cluster of fixations lasting up to 3 seconds (5,15). It is not clear if the negative decision is made actively or by default. When the viewer spends no more time attending to an unreported target than is spent attending to a normal anatomical structure, it is assumed that the local picture elements were not synthesized into a recognizable object.

Decision-Making Error. Often, parts of camouflaged objects are detected, but the viewer decides that they are normal variants rather than the target. These errors are relatively easy to identify in the eye-movement record because there is an increase in the number of fixations clustering on the target site caused by the increased visual scrutiny. This is the most prevalent type of error.

A study of lung nodule detection showed that 10 percent of the misses were due to sampling, 30 percent due to recognition and 60 percent to decision making (10).

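
The coverage figure quoted under Sampling Error (the share of the image falling inside at least one cluster circle) can be estimated numerically. The following is a rough sketch rather than the method used in the cited work: it treats the image region as a plain rectangle measured in degrees of visual angle, which a real lung field is not, and counts grid points that lie within 2.5 degrees of any cluster center. The function name, the rectangular region, and the grid step are all assumptions.

```python
from math import hypot


def coverage_fraction(cluster_centers, width_deg, height_deg,
                      radius_deg=2.5, step_deg=0.25):
    """Fraction of grid points in a width x height rectangle that lie
    within radius_deg of at least one cluster center."""
    covered = total = 0
    y = step_deg / 2
    while y < height_deg:
        x = step_deg / 2
        while x < width_deg:
            total += 1
            if any(hypot(x - cx, y - cy) <= radius_deg
                   for cx, cy in cluster_centers):
                covered += 1
            x += step_deg
        y += step_deg
    return covered / total
```

A single isolated cluster covers roughly the area of its circle divided by the area of the rectangle. Overlapping clusters are counted once, which is why a fixed number of clusters does not translate directly into a fixed coverage percentage.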
Figure 7
Examples of three types of errors. (A) Sampling error. The chest image is scanned by fixation clusters, the boundaries of which are represented by circles, but the lung tumor in the left upper lobe is not fixated, as indicated by an absence of clusters on or near the target. (B) Recognition error. The lung tumor is fixated by one fixation cluster, but the target is scrutinized by only a single fixation, indicating lack of visual interest in the local image features. (C) Decision-making error. The lung tumor is fixated by a fixation cluster containing multiple fixations; five fixations are shown. Despite this evidence of extensive scrutiny, the viewer decided that the local image features did not meet his criteria for defining a true tumor target and called the image “normal”.

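
The three error types illustrated in Figure 7 suggest a simple rule for labeling a missed tumor from the eye-movement record. The sketch below is one possible reading of that rule, not the authors' procedure. It assumes clusters are given as (x, y, number of fixations) in degrees of visual angle, that a cluster is "on" the target when its center lies within 2.5 degrees of it, and that a cluster with more than 2 fixations counts as increased scrutiny. The threshold of 2 is a hypothetical cut-off borrowed from the typical number of fixations reported for true negative decisions.

```python
from math import hypot


def classify_miss(target_xy, clusters, cluster_radius_deg=2.5,
                  scrutiny_threshold=2):
    """Label a missed target as a sampling, recognition, or
    decision-making error.

    clusters: iterable of (x, y, n_fixations) tuples.
    """
    tx, ty = target_xy
    on_target = [n for x, y, n in clusters
                 if hypot(x - tx, y - ty) <= cluster_radius_deg]
    if not on_target:
        return "sampling"         # no cluster on or near the target
    if max(on_target) <= scrutiny_threshold:
        return "recognition"      # looked at, but no more than normal anatomy
    return "decision-making"      # scrutinized by several fixations, then dismissed
```

For example, a target covered only by a single-fixation cluster would be labeled a recognition error, while the same target covered by a five-fixation cluster would be labeled a decision-making error, matching panels B and C of Figure 7.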
Feedback-Assisted Visual Search

The number of fixations that cluster at decision nodes varies with the decision made at that local image site. True negatives have the fewest fixations per cluster, with a mode of 2. True positives have the most fixations per cluster, with a mode of 5. False negatives fall in between, with a mode between 3 and 4 fixations per cluster, indicating that these decisions receive increased visual scrutiny compared to true negative decisions. Given that many false negative decision nodes can be identified on the basis of multi-fixation clusters, an interesting question is: If the viewer is given feedback about the location of these multi-fixation clusters, can re-evaluation of decisions at these potential target sites improve performance? Feeding back locations on the image that received intensive visual scrutiny gives the viewer an opportunity to review those areas that aroused suspicion but were dismissed as normal. The original decision can then be revised or confirmed on the basis of the second look.

Preliminary Results. An experiment using Feedback-Assisted Visual Search is now in progress (16). A computer-display system has been developed that provides visual feedback to the viewer. The feedback, in the form of highlights on the display, is based on data obtained by monitoring eye-position during scanning. The highlighted locations are determined by an algorithm that identifies image features receiving multi-fixation clusters. These image features are presumed to have perceptual significance to the viewer. They represent perceptually suspicious aspects of the image.

Figure 8
The distributions of number of fixations per cluster for three types of decisions. The decision types were determined by measuring all fixation locations leading up to an overt decision by the viewer. If the decision is positive (tumor present), fixations clustering on truly abnormal image features are categorized as true positives. If the decision is negative (tumor absent), fixations clustering on truly abnormal image features are categorized as false negatives; fixations clustering on truly normal image features are categorized as true negatives. The true negative decisions peak at 2 fixations per cluster, the false negatives at 3 fixations per cluster and the true positives at 5 fixations per cluster. [Horizontal axis: number of fixations per cluster.]

In Phase 1, eye movements are recorded as the radiologist searches for lung tumors in chest images. The radiologist then gives his decision. In Phase 2, the image is re-presented highlighting the locations of intense scrutiny indicated by multi-fixation clusters (feedback condition), or random locations are fed back (pseudofeedback condition) as a control. The viewer examines each highlighted location and gives a second decision. About 6-8 locations are highlighted. Preliminary tests were carried out on three viewers, each examining 120 chest images, 60 with tumors and 60 normals. The results show that when feedback was given where at least one highlight fell on a tumor target, 19 percent of false negative decisions were revised to true positive decisions, compared with 8 percent for pseudofeedback when none of the highlights fell on a tumor target. The conversion of true negatives to false positives was the same in both conditions (8 percent). This finding indicates that informative feedback has a positive effect on nodule detection. Encouraged by this result, research is continuing, especially on the identification of perceptually suspicious areas derived from eye-movement recordings and methods for displaying visual feedback.

Figure 9
A schematic diagram of the Feedback-Assisted Visual Search system. The viewer wears a pair of spectacles containing the eye-movement sensors. These sensors record the viewer’s eye fixations and send them to a computer that analyzes and stores them in Phase 1. The viewer is given 15 sec. to scan the image and make a decision. The image is re-presented in Phase 2, where multi-fixation clusters from Phase 1 meeting a certain numerical criterion are fed back by highlighting their locations on the image. The viewer re-evaluates the highlighted areas and revises his original decision.

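
The Phase 2 selection step can be sketched as follows. The source states only that clusters "meeting a certain numerical criterion" are fed back, that about 6-8 locations are highlighted, and that random locations serve as the pseudofeedback control. The criterion of at least 3 fixations, the cap of 8 highlights, the ordering by number of fixations, and both function names are assumptions made for illustration.

```python
import random


def select_highlights(clusters, min_fixations=3, max_highlights=8):
    """Return (x, y) centers of the most heavily fixated clusters.

    clusters: iterable of (x, y, n_fixations) tuples.
    min_fixations and max_highlights are hypothetical settings.
    """
    candidates = [c for c in clusters if c[2] >= min_fixations]
    # Most-scrutinized first; list.sort is stable, so ties keep scan order.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return [(x, y) for x, y, _ in candidates[:max_highlights]]


def pseudo_highlights(n_locations, width, height, seed=None):
    """Control condition: n_locations uniformly random image positions."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_locations)]
```

Passing a fixed `seed` makes the control locations reproducible from one run to the next. Whether the actual system ranked clusters or simply thresholded them is not stated in the text.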
Conclusions
The eye-brain system is presently the best target detector known, despite the fact that it is occasionally fooled by the veil of camouflage that hides relevant targets. Medical education and training can program the eye-brain system to make plausible perceptual interpretations even from medical images containing the most meager perceptual data. It may be possible to further improve viewer performance by using indirect measures of attention to identify perceptually suspicious image features for re-evaluation during image interpretation in diagnostic decision making. The unique experimental tool that has enabled us to measure and quantify the pattern of human attention has also given us a glimpse of the fundamental workings of the mind of the radiologist.

References
1. Kundel HL, Nodine CF, Thickman DI, Carmody D, Toto L. Nodule detection with and without a chest image. Invest Radiol 1985; 20:94-99.
2. Heelan RT, Flehinger BJ, Melamed MR, et al. Non-small-cell lung cancer: Results of the New York screening program. Radiology 1984; 151:289-293.
3. Muhm JR, Miller WE, Fontana RS, et al. Lung cancer detected during a screening program using four-month chest radiographs. Radiology 1983; 148:609-615.
4. Stitik FP, Tockman MS. Radiographic screening in the early detection of lung cancer. Rad Clin North Am 1978; 16:347-366.
5. Kundel HL, Nodine CF. Studies of eye movements and visual search in radiology. In: Eye movements and the higher psychological functions. Senders JW, Fisher DF, Monty RA, eds. Hillsdale, N.J.: Erlbaum, 1978.
6. Nodine CF, Carmody DP, Kundel HL. Searching for NINA. In: Eye movements and the higher psychological functions. Senders JW, Fisher DF, Monty RA, eds. Hillsdale, N.J.: Erlbaum, 1978.
7. Kundel HL, La Follette PS Jr. Visual search patterns and experience with radiological images. Radiology 1972; 103:523-528.
8. Carmody DP, Kundel HL, Nodine CF. Performance of a computer system for recording eye fixations using limbus reflection. Behav Res Meth Instr 1980; 12:63-66.
9. Gale A, Vernon J, Millar K, Worthington BS. Interpreting radiographs in a single glance (abstr.). Radiology 1983; 149(P):253.
10. Kundel HL, Nodine CF. Interpreting chest radiographs without visual search. Radiology 1975; 116:527-532.
11. Kundel HL, Nodine CF. A visual concept shapes image perception. Radiology 1983; 146:363-368.
12. Kundel HL, Nodine CF, Carmody DP. Visual scanning, pattern recognition and decision-making in pulmonary nodule detection. Invest Radiol 1978; 13:175-181.
13. Nodine CF, Kundel HL. The cognitive side of visual search in radiology. In: O’Regan JK, Levy-Schoen A, eds. Eye movements: From physiology to cognition. Amsterdam: Elsevier, 1987: 573-582.
14. Kundel HL, Nodine CF, Thickman D, Toto L. Searching for lung nodules: A comparison of human performance with random and systematic scanning methods. Invest Radiol 1987; 22:417-422.
15. King MG, Stanley GV, Burrows GD. Visual search processes in camouflage detection. Hum Factors 1984; 26:223-234.
16. Nodine CF, Kundel HL. Using eye movements to study decision-making processes of radiologists. Presented at the Fourth European Conference on Eye Movements, Göttingen, W. Germany, 1987.
17. Behrens RR. Art and camouflage. Cedar Falls, Iowa: North American Review, 1981.