Managing Visual Clutter: A Generalized Technique for Label Segregation using Stereoscopic Disparity

Stephen Peterson∗, Magnus Axholt†, Stephen R. Ellis‡
∗† Department of Science and Technology, Linköping University
‡ Human Systems Integration Division, NASA Ames Research Center
ABSTRACT
We present a new technique for managing visual clutter caused by
overlapping labels in complex information displays. This technique, “label layering”, utilizes stereoscopic disparity as a means
to segregate labels in depth for increased legibility and clarity. By
distributing overlapping labels in depth, we have found that selection time during a visual search task in situations with high levels
of overlap is reduced by four seconds or 24%. Our data show that
the depth order of the labels must be correlated with the distance
order of their corresponding objects. Since a random distribution
of stereoscopic disparity, in contrast, impairs performance, the benefit is not solely due to disparity-based image segregation. An algorithm using our label layering technique could accordingly be an alternative to traditional label placement algorithms, which avoid label overlap at the cost of distracting motion, symbology dimming or label size reduction.
Keywords: Label placement, user interfaces, stereoscopic displays, augmented reality, air traffic control.
Index Terms: H.5.2 [Information Systems]: User Interfaces; I.3
[Computing Methodologies]: Computer Graphics
1 INTRODUCTION
As information systems convey more and more data in confined
spaces such as computer screens, care must be taken in the user interface to manage the resulting visual clutter. In cluttered displays,
information may be obscured, fragmented or ambiguous, negatively
affecting system usability.
Labels, textual annotations containing object data, are one important source of visual clutter, as they overlay background layers
containing their associated objects. Since legible labels need to occupy a certain minimum screen space, they may occlude or obscure
other information, including other labels.
Because labels are generally associated with objects or features
in the background, their placement is linked to the spatial projection
of their corresponding objects on the display plane. In certain cases,
such as some information visualization applications, the underlying
data can be spatially or temporally rearranged to simplify labeling
and data interpretation. However, in applications like see-through
Augmented Reality (AR), the background normally consists of real
objects directly observed by the system user; accordingly, the underlying display elements cannot all be adjusted freely to simplify the labeling task.
The application domain explored below is an AR display for Air
Traffic Control (ATC) towers, in which tower and apron controllers
operate to maintain safe aircraft separation at the airport. In our
∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]
environment a Head-Up Display (HUD) system could use AR techniques to process position data and overlay controlled aircraft with
labels, “data tags”, presenting vital flight information such as callsigns. This type of display could minimize controllers’ head-down
time and attention shifts required to scan traditional radar displays.
Despite the elevation of the control tower cab, typically about 50
meters above ground level, the lines of sight to controlled aircraft
towards the local horizon are greatly compressed due to their relatively large distance from the tower, which could surpass 3 km.
Therefore, the associated overlaid aircraft labels will frequently be
subject to visual clutter in a HUD as they would likely overlap other
aircraft and labels, especially at busy airports with distant taxiways
and runways.
Traditional label placement algorithms evaluate available 2D
screen space to find optimal label locations without overlap, e.g.
in cartography [7, 24], scientific illustration [10] and ATC radar interfaces [6, 9, 16]. This approach to label placement is not limited to a 2D presentation medium; it has also been applied in AR and virtual environment interfaces [2, 3, 21, 25]. While these techniques generally
avoid visual overlap, they introduce another interface design issue:
which label belongs to which object? Despite the fact that a label
may be connected to its background object with a line, there may
be confusion as labels move according to the motion of their corresponding objects. Such confusion occurs especially if label lines
intersect or are forced to overlap due to imperfect performance of
label placement algorithms. Moreover, motion from automatic rearrangement of label positions can disturb or distract the user [1].
Other approaches aim at reducing visual clutter without spatial
rearrangement, e.g. information filtering [15] or symbology dimming [12, 13] of data unimportant to the current task. However,
automated importance classification and subsequent display suppression can entail a safety risk. Furthermore, declutter algorithms
generally do not totally avoid the confusing overlap; they merely
reduce it.
We propose an alternative approach to reduce the visual clutter
associated with label overlap: label layering. This approach does
not rearrange labels in 2D screen space, nor does it filter or dim any
information. Instead it extends the design space and utilizes the
depth dimension, available in e.g. stereoscopic AR displays. More
specifically, our technique entails placing labels in a certain number
of predetermined depth layers located between the observer and the
observed objects, with droplines connecting each label to its corresponding object in depth. While the general technique of reducing
visual clutter using stereoscopic disparity is not novel in itself, as discussed later on, this is to our knowledge the first application and rigorous evaluation of the technique for the specific problem of label placement. In this work the label layering technique is
instantiated in a HUD for control towers; however, it could potentially be applied to any user interface equipped with a stereoscopic
display device.
The human vision system interprets depth through a series of
depth cues, which combine to give the observer both relative and
absolute object depth information [5]. One of these cues, retinal (or binocular) disparity, arises from the difference (disparity) in binocular parallax between the left and right retinal images. The sensation
of depth in a stereoscopic display is an effect of the retinal disparity produced by the disparity between the left and right eye stereo images. The stereoscopic disparity of two objects at different depths is defined as the difference between the convergence (parallax) angles from the left and right eye points.

Though stereoscopic disparity is mostly known as a depth cue, it is possibly more important for its role in image segregation through its support for camouflage breaking. Three-dimensional objects that share identical visual textures with their background can be revealed by telltale disparity gradients visible in stereographic viewing systems [11]. Aerial stereography takes advantage of this fact to uncover camouflaged military targets that are invisible in a single camera view. The random dot stereogram, invented by Julesz [14], could be considered the “perfect camouflage”, since hidden objects appear only when viewed by a stereographic system such as human vision; disparity is the only means to separate the objects from the background noise.

Our label layering approach uses the same concept in a cluttered, confusing display situation with multiple layers of overlapping labels. We manage the clutter by adjusting the labels’ stereoscopic disparity.

This paper reports on an experiment testing whether human performance improves in a system with label layering based on stereoscopic disparity. It also assesses whether the disparity differences themselves are sufficient to provide a practical benefit, or whether consistency in the depth order of the objects and their labels is also important.

Section 2 presents work related to the experiment described in section 3. The results of the experiment are provided in section 4. Section 5 discusses these results, while future work on an automated label layering algorithm is outlined in section 6.

Figure 1: Photographs taken from the subjects’ right eye viewpoint in three different viewing conditions (a-c), showing the traffic display overlaid with labels and droplines rendered on the HUD. No polarized filter was attached to the camera, making both the left and right eye images visible on the HUD. In (a) the scenario was rendered with fixed far disparity, in (b) with random disparity, and in (c) with ordered disparity. In each viewing condition the lines marked α, β and γ show the uncrossed stereoscopic disparities in the HUD imagery; a long line indicates a large disparity and hence a large apparent distance behind the screen. The lines marked α indicate the disparity of the label associated with the farthest object, β of an intermediate object, and γ of the closest object. The lines marked δ are constant in all viewing conditions, showing the disparity where the label droplines connect to their corresponding objects.
2 RELATED WORK
There have been previous approaches to information segregation using depth layering to reduce visual clutter. The emphasis has generally been, however, on segregating information of different types
in depth, i.e. information layering. This approach is different from
our label layering technique, since we aim to distribute information of the same type, labels, in different depth layers. These two
approaches, however, share many principles.
Information layering, using the dual physical display planes in
a multilayered display, has been investigated and compared to traditional, single layer, display devices. Although no performance
benefit was found in the simple task conditions, significant performance improvements were detected using the multilayered display
under demanding task conditions [23].
The information layering approach was also found to effectively
offset the effects of added visual clutter in a visual tracking task
on a flight display [19]. The clutter was added to the display in the form of noise (making the display visually crowded), and in the
form of absence of color coding for the tracking and target symbols
(making the stimuli and task ambiguous). The noise, tracking and
target symbols were only present in three segregated depth planes,
reducing the noise impact and providing a cue to clearly identify
the tracking and target symbols.
Work using random dot stereograms has shown that it is possible
to perceive three transparent layers concurrently, extending up to
five in optimal conditions with low layer complexity and high interlayer disparity (> 5.7 arcmin) [22]. Although the display format of
random dot stereograms greatly differs from stereoscopic AR displays, the findings could be supportive of the proposed depth layer
design and distribution, suggesting practical limits to the number of
layers that may be used.
Object motion has been found to interact with stereo in various
ways. Motion detection thresholds have been found to be higher
with stereoscopic vision (e.g. [4, 18]), a phenomenon known as
stereomotion suppression. Conversely, depth segregation is facilitated by motion (e.g. [17]). As relative motion may aid segregation
of labels, the benefits of disparity-based label segregation could be
reduced in situations with moving objects. Consequently, the overall ability to segregate labels based on moving stereoscopic display
elements is perceptually complex and difficult to predict in general.
Practical implementations require empirical investigation.
3 METHOD
We constructed an experimental setup which allowed us to fulfill
the following experimental goals:
(i) Simulate realistic traffic at a major airport from the viewing
position of an air traffic control tower. Render this simulation
on a screen placed approximately at optical infinity relative
to the observer, a distance where optical properties of visual
stimuli are similar to those at the relatively large distances in
a real airport environment.
(ii) Overlay the airport traffic with labels identifying each object,
and evaluate whether depth segregation of overlapping labels
helps declutter the display and reduce users’ visual search and
selection time.
(iii) Render the overlay on a stereoscopic HUD, located at a realistic distance from the user considering a normal tower environment while minimizing the accommodation-vergence mismatch characteristic of the stereoscopic display format.
3.1 Hardware Setup
The user was seated on a stool 2.0 m from the HUD (marker “2”
in fig. 2), a semi-transparent polarization-preserving projection
screen, where the overlay graphics were rendered using passive
stereo techniques. The distance was chosen so that it would be
consistent with a realistic airport tower environment while the difference in accommodative demand between any two targets would
always be less than 0.5 diopters. The active screen area resolution
was 900×450 pixels with a 24.8°×14.2° Field-of-View (FoV), giving each pixel a size of 1.7×1.7 arcmin. Two floor-mounted 6500
ANSI lumen projectors with linear polarization filters were used
for projection. Previous measurements with full projector intensity
have yielded center-screen contrast values (Michelson) over 0.99
and luminance values reaching 1000 cd/m2 [20]; approximately
four times brighter than a regular computer monitor. This was considerably brighter than the background traffic display, so the projector intensities were lowered to increase visibility of the traffic
display and to remove visual crosstalk in the HUD. The HUD was
driven by an NVIDIA Quadro FX 4500 stereo graphics card on a
dual Intel Xeon workstation running Ubuntu Linux.
Although seated, the users were free to move their upper body
and head. Head tracking data was fed through an IS-900 tracking
system to the HUD, which meant that no height adjustment of the
stool was needed for each user. The tracker sensor was attached to
a lightweight and comfortable spectacle frame which was mounted
on the user’s head. Traffic data, providing the HUD information
about the current location of objects in the traffic simulation, were
communicated over a dedicated LAN with a 1 Hz refresh rate (normal ground radar performance). Positions for intermediate frames,
approximately 60 per second, were determined using linear extrapolation from previously known positions.
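This dead-reckoning step is simple enough to sketch. The fragment below is illustrative only; it is not taken from the experimental code, and all names are ours. It extrapolates a position linearly from the two most recent 1 Hz radar updates to the roughly 60 Hz frame times:

```cpp
#include <cstdio>

struct Vec3 { double x, y, z; };

struct TrackedObject {
    Vec3 prevPos, lastPos;      // positions at the two most recent radar updates
    double prevTime, lastTime;  // their timestamps (s); nominally 1 s apart

    // Linearly extrapolate the position at render time t >= lastTime.
    Vec3 extrapolate(double t) const {
        double dt = lastTime - prevTime;
        if (dt <= 0.0) return lastPos;       // no velocity estimate yet
        double k = (t - lastTime) / dt;      // fraction of one update period
        return { lastPos.x + k * (lastPos.x - prevPos.x),
                 lastPos.y + k * (lastPos.y - prevPos.y),
                 lastPos.z + k * (lastPos.z - prevPos.z) };
    }
};

int main() {
    TrackedObject a{{0, 0, 0}, {5, 0, 0}, 0.0, 1.0};   // moving 5 m/s along x
    for (double t = 1.0; t < 1.05; t += 1.0 / 60.0)    // a few 60 Hz frames
        std::printf("t = %.3f s  x = %.3f m\n", t, a.extrapolate(t).x);
    return 0;
}
```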
The room was darkened and the windows were covered with
thick black plastic film.
The experimental code was written in C++ using OpenSceneGraph for scenario construction and VR Juggler for tracking and
display configuration.
3.2 Task
The task of each experimental trial was to identify and select by
mouse click one aircraft in an airport traffic scenario on the traffic
display, based on a given target label in the HUD.
Figure 2: The experimental setup showing (1) a subject wearing a head tracker and polarized glasses, (2) the HUD rendering the overlay and (3) the traffic display rendering the traffic simulation. The bottom left half of the photo has been digitally enhanced to reveal detail; in reality the room was as dark as the upper right half.

Figure 3: A portion of a rendered scenario showing an overlap situation in the top left corner. Labels and droplines (red) were rendered on the HUD, while the aircraft objects and ground plane were rendered on the traffic display.
The traffic display, an opaque projection screen, was mounted
6.4 m from the user in the user’s line of sight through the center of
the HUD (marker “3” in fig. 2). This distance yields an accommodative demand difference of 0.33 diopters between the HUD and the traffic display. The active screen area displayed 1400×770 pixels with a 20.3°×11.3° FoV, giving a pixel size of 0.9×0.9 arcmin. One 3500
ANSI lumen projector was used for image projection. The traffic
was not rendered in stereo since it simulated airport traffic located
at least 500 m from the observer, a distance at which relative disparities of physical objects are negligible. The traffic display was
driven by an Intel Centrino laptop running Ubuntu Linux.
The labels consisted of a 6-character airline callsign, 3 letters
and 3 digits in sequence. The letter sequence was randomly selected from 5 possible combinations, corresponding to real airline identifiers starting with the letter A. The digit sequence was randomly generated; the last digit, however, was re-randomized only between trials. By keeping the start letter and end digit constant
within a trial, the subject was required to read the 4 centermost
characters for object identification, thus limiting the possibility of
using methods of exclusion. By randomly generating the included
callsigns in a traffic scenario as described here, the subject could
not recall the callsign and its location from a previous trial, thereby
minimizing training effects due to scenario familiarity.
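A minimal sketch of this callsign scheme follows. The identifier list is a placeholder of our own (the paper's figures show only a few identifiers, such as AAG, AIB and AAL); it is not the experimental code:

```cpp
#include <cstdio>
#include <random>
#include <set>
#include <string>

int main() {
    std::mt19937 rng(std::random_device{}());
    // Five airline identifiers starting with 'A' (placeholder choices).
    const std::string airlines[5] = {"AAG", "AAL", "AIB", "AFR", "AUA"};
    std::uniform_int_distribution<int> pickAirline(0, 4);
    std::uniform_int_distribution<int> pickDigit(0, 9);

    int lastDigit = pickDigit(rng);   // re-randomized only between trials
    std::set<std::string> used;      // keep callsigns unique within a trial

    for (int generated = 0; generated < 12; ) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "%s%d%d%d",
                      airlines[pickAirline(rng)].c_str(),
                      pickDigit(rng), pickDigit(rng), lastDigit);
        // First letter ('A') and last digit are constant within the trial,
        // so the four center characters must be read to identify an object.
        if (used.insert(buf).second) {
            std::printf("%s\n", buf);
            ++generated;
        }
    }
    return 0;
}
```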
The labels were rendered using the Century Gothic font in full
red color. All labels had the same size and intensity on the screen
plane. The labels were approximately 3.2°×0.7° in size, where
the total width varied slightly depending on the character glyphs.
Each label was located approximately 1.2° above its corresponding
object. Thus, the height in the visual field of the objects and labels
was a cue to object distance.
The objects in the traffic display, representing airport traffic,
were rendered as blue cones with the base perpendicular to the
ground plane and apex pointing in the direction of motion. Only
the upper half of the cones was visible as the lower portions were
occluded by the ground plane. Due to perspective, the width of
the closest objects was ∼1.3° in the screen plane, while the farthest
were about a third of that size. The mouse pointer used for selection
in the traffic display was 0.6° in height.
Each traffic scenario was an extract from a simulated 24-hour
airport traffic dataset at Paris CDG (simulated using the TAAM software from Preston Aviation Solutions Pty Ltd). The total number of visible
objects was between 9 and 14 when each scenario was initialized,
but as the scenarios evolved the number would ultimately range
between 7 and 15.
A screenshot of a rendered traffic scenario with the superimposed HUD graphics is shown in figure 3. Photographs of the
scenarios as they were presented to the experiment participants are
shown in figure 1, although they were not taken with polarized filters (glasses) and therefore include both left and right stereo HUD
imagery (labels and droplines). Moreover the labels shown in the
photos differ from the real stimuli in that the glyphs are considerably thicker due to lens flare; in reality they were perceived as
shown in figure 3.
3.3 Participants
We recruited 17 subjects, with ages ranging from 25 to 60. Because
visual accommodation is typically substantially degraded after the age of about 40, we used approximately balanced age subgroups: eight
subjects were 25-39 years of age, while nine were 40-60. Three
subjects were female, of which two were over age 40.
All participants were staff, contractors or students at the
EUROCONTROL Experimental Centre. Participation was voluntary and no compensation was given.
All subjects passed a stereo vision test presented on a computer
monitor with a red-cyan anaglyph technique. Eight sets of random
dot anaglyph images, each set with 4 images, were displayed on
a computer screen. One image per set was distinguished from the rest by a retinal disparity cue visible only to subjects with stereo vision. The subjects’ task was to identify the distinguished image
and tell whether the contained square shape was perceived in front
or behind the screen. The retinal disparity required for passing the
test was 3.2 arcmin.
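For illustration, the sketch below generates a red-cyan random dot stereogram of the kind used in this screening test, with a central square visible only through retinal disparity. The image size and pixel disparity are arbitrary choices of ours, not the test's actual parameters (the test used a 3.2 arcmin disparity):

```cpp
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int W = 256, H = 256, disparity = 4;  // pixel shift of hidden square
    std::mt19937 rng(42);
    std::bernoulli_distribution dot(0.5);

    std::vector<int> left(W * H), right(W * H);
    for (int i = 0; i < W * H; ++i) left[i] = right[i] = dot(rng) ? 255 : 0;

    // Shift the central square leftwards in the right eye image (crossed
    // disparity, so the square appears in front of the surround). Refill the
    // uncovered strip with fresh dots so no monocular edge reveals the square.
    for (int y = H / 4; y < 3 * H / 4; ++y) {
        for (int x = W / 4; x < 3 * W / 4; ++x)
            right[y * W + x - disparity] = left[y * W + x];
        for (int x = 3 * W / 4 - disparity; x < 3 * W / 4; ++x)
            right[y * W + x] = dot(rng) ? 255 : 0;
    }

    // Red channel carries the left eye image; green and blue carry the right.
    FILE* f = std::fopen("rds.ppm", "wb");
    if (!f) return 1;
    std::fprintf(f, "P6\n%d %d\n255\n", W, H);
    for (int i = 0; i < W * H; ++i) {
        unsigned char px[3] = { (unsigned char)left[i],
                                (unsigned char)right[i],
                                (unsigned char)right[i] };
        std::fwrite(px, 1, 3, f);
    }
    std::fclose(f);
    return 0;
}
```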
3.4 Procedure
Each subject was provided written experiment instructions before
the trials. The instructions stressed that accuracy and response time
were both important, but accuracy should be prioritized.
We measured each subject’s inter-pupillary distance using a mirror and ruler. This value was used to calibrate the stereo disparity
of the HUD. The subjects then moved to the stool, and mounted the
position tracker and lightweight polarized glasses on their head.
Before the experimental trials the subjects were given four test
trials. During these trials the subjects became accustomed to the task and were screened for correct stereoscopic depth perception by confirming that the closest and farthest labels were perceived at approximately the intended distances. The HUD registration with
the background was calibrated by aligning four white spheres visible in the HUD with the corners of the traffic display. The calibration was performed by manually adjusting the position of the head
tracker. This calibration assured that each dropline extended to the
center of the corresponding object with an acceptable registration
error of up to 0.4°.
Before each trial the target callsign was presented on the HUD.
The subjects pressed the spacebar on a keyboard on a table in front
of the stool. This action removed the target callsign and initialized the scenario which appeared after a few seconds. The subjects scanned the labels, identified the target callsign, and visually
followed a dropline extending from the label to the corresponding
object on the traffic display. The label end of the dropline had the
same disparity as the label, while the object end had approximately
the same disparity as the traffic display. The subjects then selected
the object on the traffic display using a mouse. They finally confirmed their selection by pressing the spacebar, which cleared the
scenario and presented the next callsign.
The trials were divided into two blocks of 36 trials. During the
break between the blocks, the subjects could rest, and even walk
around. In case subjects removed the head tracker during the break,
the HUD calibration was performed before recommencing. The
subjects were also asked to report any discomfort during the break.
After the trials each subject completed a questionnaire. The total
time including preparations, trials and questionnaire was approximately 60 minutes per subject.
3.5 Independent Variables
Viewing Condition Each label included in a trial was rendered
on the HUD with a certain stereoscopic disparity, making the label
appear to the subject at a distinct depth.
The spatial distribution of stereoscopic disparities for all labels in
a trial was the independent variable determining the principal viewing conditions of the experiment. Four viewing conditions were
used for label presentation; i) ordered disparity, ii) random disparity, iii) fixed near disparity and iv) fixed far disparity. Three of these
viewing conditions are illustrated in figure 1.
(i) In the ordered disparity condition the labels were separated in
depth into N discrete layers according to a logarithmic function:
d_n = d_max − log(N − n + 1) × (d_max − d_min) / log N        (1)
where n is the label layer index (1 ≤ n ≤ N) and N is the total
number of label layers, which in this experiment was fixed
at 15. The closest label (n = 1) was located approximately
at the HUD (d_n = d_min = 2.2 m) while the farthest (n = N) was located near the traffic display (d_n = d_max = 6.0 m). The
distance between the intermediate labels increases according
to the logarithmic function, in order to approximately equalize
inter-layer stereoscopic disparity (see fig. 4). Even though
the maximum number of labels was not always present in the
view, 15 layers were prepared for each trial to accommodate labels entering the view at a later time, avoiding potentially disturbing visible depth rearrangement. Given an IPD of 64
mm and 15 label layers, the inter-layer stereoscopic disparity
was 6.0 ± 1.2 arcmin. The stereoscopic disparity difference
between overlapping labels could be larger if located in nonadjacent label layers. The depth order of the labels matched
the distance order of the corresponding objects. That is, the label of the closest object in the traffic scenario was assigned to the first label layer; subsequent object labels were assigned to label layers based on their distance order (see the sketch after this list).
(ii) In the random disparity condition the labels were segregated
as in the ordered condition except that their depth order was
randomized with respect to that of the corresponding objects.
We included this condition as a control case to determine if
segregation by image disparity itself had a benefit, regardless
of depth order. If so, the results would show that both the ordered and random disparity conditions aided search and target
designation through visual clutter reduction.
(iii) In the fixed near disparity condition, all labels were placed
in a single apparent depth layer approximately at the HUD,
(d_n = 2.2 m).
(iv) In the fixed far disparity condition, all labels were placed in a
single apparent depth layer approximately at the traffic display
(d_n = 6.0 m).
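As a concrete illustration of the ordered disparity condition, the sketch below computes the layer distances from equation (1) and assigns labels to layers by the distance rank of their objects. All names and the example object distances are ours; this is not the experimental code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Equation (1): d_n = d_max - log(N - n + 1) * (d_max - d_min) / log N
double layerDistance(int n, int N, double dMin, double dMax) {
    return dMax - std::log(double(N - n + 1)) * (dMax - dMin) / std::log(double(N));
}

int main() {
    const int N = 15;                      // label layers prepared per trial
    const double dMin = 2.2, dMax = 6.0;   // nearest/farthest layer (m)

    // Object distances in the traffic scenario (metres; invented values).
    std::vector<double> objectDist = {620.0, 950.0, 540.0, 2100.0, 1400.0};

    // Rank objects by distance: the closest object's label goes to layer 1.
    std::vector<int> order(objectDist.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = int(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return objectDist[a] < objectDist[b]; });

    for (size_t rank = 0; rank < order.size(); ++rank) {
        int obj = order[rank];
        int n = int(rank) + 1;             // label layer index, 1 <= n <= N
        std::printf("object %d (%4.0f m) -> layer %2d at %.2f m\n",
                    obj, objectDist[obj], n, layerDistance(n, N, dMin, dMax));
    }
    return 0;
}
```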
Figure 4: Distance (d_n) for each label layer (n) in the ordered and random disparity viewing conditions, given by equation 1. In the ordered
disparity condition the label layer order correlates with the object distance order; in the random disparity condition the label layer order is
randomized with respect to the object distance order.
Object Motion The aircraft objects in each trial were either
static or dynamic, i.e. in motion. When in motion, they moved according to the traffic simulation. Although objects exhibited varying motion, depending on their situation, depth and corresponding
aircraft size, horizontal screen motion was generally below 0.3°/s.
Objects could have higher speed, e.g. when landing or taking off,
but such objects were not selected as targets.
Overlap Level The overlap level could be high, medium or
low as seen by the subjects. High overlap level indicates that the target object had two other objects in its immediate proximity, meaning its label would likely be in an overlap situation with two other
labels. An object was defined to be in the immediate proximity
of the target when within 1.4° of the target location as seen by
the subject. In the medium overlap level the target object had one
other object in its immediate proximity. Low overlap level indicates
that no other objects were in the target object’s immediate proximity. This distinction is illustrated in the scenario shown in figure 3,
where object AAG183 would be considered to be in the high overlap level, AIB603 in the medium overlap level, and AAL783 in the
low overlap level. If objects were in motion the overlap level could
change over time; however, the designated level corresponded to
the situation in the first few seconds of each dynamic scenario.
The overlap situations were not identical throughout the experimental conditions but randomly sampled from matched sets having common overlap properties. There were six specific scenarios
per overlap level. By sampling the overlap situations from a larger
set as described here, we reduced the number of times an overlap
situation was re-used, thereby minimizing training effects due to
scenario familiarity.
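The following sketch shows how this classification could be computed from viewing directions. The 1.4° threshold is taken from the definition above, while the geometry helper and example values are ours:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Dir { double azDeg, elDeg; };   // viewing direction to an object

// Angular separation (degrees) between two viewing directions,
// via the spherical law of cosines.
double separationDeg(const Dir& a, const Dir& b) {
    const double r = kPi / 180.0;
    double c = std::sin(a.elDeg * r) * std::sin(b.elDeg * r) +
               std::cos(a.elDeg * r) * std::cos(b.elDeg * r) *
               std::cos((a.azDeg - b.azDeg) * r);
    return std::acos(std::max(-1.0, std::min(1.0, c))) / r;
}

// Count objects within 1.4 degrees of the target ("immediate proximity")
// and map the count to the overlap level used in the experiment design.
const char* overlapLevel(const Dir& target, const std::vector<Dir>& others) {
    int close = 0;
    for (const Dir& o : others)
        if (separationDeg(target, o) < 1.4) ++close;
    return close >= 2 ? "high" : close == 1 ? "medium" : "low";
}

int main() {
    Dir target{10.0, 2.0};
    std::vector<Dir> others = {{10.8, 2.3}, {9.5, 1.6}, {14.0, 2.0}};
    std::printf("overlap level: %s\n", overlapLevel(target, others));  // high
    return 0;
}
```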
Repetition Each combination of independent variables was repeated three times per subject, with a blocking scheme designed to let sequence effects (due to shifts between viewing conditions with different disparity distributions) abate. In this way the problem of possible
asymmetric transfer in repeated measures experiments was avoided,
as demonstrated in the post-experimental analysis.
3.6 Dependent Variables
Response Time We recorded the duration from the instant the scenario was displayed until subjects generated the required response. The subjects responded by identifying and selecting the target aircraft with a mouse click, and subsequently confirming the selection by pressing the spacebar.
Response Error The target designation error was recorded. A response was counted as erroneous if the selected object did not match the one designated by the target callsign. The experiment’s difficulty level was adjusted to keep the rate of erroneous identifications below 10%.
3.7 Experimental Design
Each subject saw all combinations of the independent variables in a
repeated measures within-subject design. We made a total of 1224
data recordings, 72 per subject (4 viewing conditions × 2 object
motions × 3 overlap levels × 3 repetitions).
The viewing condition was blocked in groups of three, allowing any transient effects to abate when viewing conditions changed.
The three blocked repetitions should not be confused with the independent variable repetition where all independent variables were
fixed. Only the viewing condition was blocked, all other conditions
were randomized for each subject. Accordingly, the three trials in a
specific block were not exact replications of the same conditions.
The blocks were constructed using a partial Latin squares design and were re-used for all subjects. The presentation order of
the blocks between subjects was counterbalanced through random
permutations.
4 RESULTS
Initial analysis of the data indicated that there were transition effects
on response time within the blocks. Post-hoc Scheffé comparisons
showed that there was a significant difference (F(3, 26) = 19.4, p <
0.01) in response time between the first and second block element,
for each of the main viewing conditions, while there was no significant difference between the second and third repetition which
were approximately equal. The transition effects only showed up in
the ordered and random disparity viewing conditions. We believe
that the transition effects are due to changes in the pattern of ocular vergence required for each of the main viewing conditions. For
example, when viewing the ordered disparity condition the subjects
would be required to change vergence from the far screen to various
positions between the near and far screens. In contrast, when the labels were presented in the fixed disparity conditions, no vergence
changes would be required while the subject looked from one label
to another.
In order to avoid these transition effects we report results based
only on analyses of the third element within each block. All reported statistically significant effects were, however, present even
when all three elements of the block were included. By limiting
ourselves to only the third in-block repetition we are able to filter
out effects and interactions that might be attributed to the transition
effects.
Using regression analysis on the median response time of each
trial we found no significant effect for trial number on response
time (r = −0.32, df = 22, ns), meaning that subjects showed no
general improvement of performance throughout the trials (see fig.
5). This confirms not only that our measures to minimize training effects due to scenario familiarity were successful, but also that
there was no significant training effect due to skill development.
Two subjects, despite having passed the stereo test and the initial depth perception screening, reported significant difficulties with
stereo accommodation, both orally during the break and in the questionnaire. These subjects were therefore excluded from the analyses. Their exclusion from analysis did not have any effect on the
pattern of statistically significant results, but we believe it improves
the accuracy of the data since our experiment presupposes good
stereoscopic vision.
The questionnaire data is not included in this paper but will be
reported at a later date.
Figure 5: Trial number had no significant effect on response time, showing that no overall training effect was present. The diamond shapes show each subject’s recording per trial, while the larger squares show the median of these recordings per trial. The line represents the linear regression through the median response time of each trial, with its equation given above.

4.1 Response Time
The results were analyzed using analysis of variance (ANOVA), using a repeated measures design with subject as a random variable and a fixed model for all other independent variables. When analyzing for effects, we applied a logarithmic transformation to the response time in order to remove the strong positive skew of the response time data, which otherwise would have produced a violation of the analytic assumption of homogeneity of variance.

Initial analysis of the data showed an interaction effect for viewing condition and overlap level on response time (F(6, 78) = 2.49, p < 0.05). Post-hoc Scheffé comparisons showed no significant differences between the two fixed disparity conditions at the high (F(2, 57) = 0.29, ns), medium (F(2, 57) = 0.03, ns) or low (F(2, 57) = 0.21, ns) overlap levels. Since we are interested in the effects of the varying disparity scheme against the traditional “2D” layout with fixed stereoscopic disparity, and these pairs of means did not differ by more than 2.7%, we collapsed the two conditions into a single condition called fixed disparity. Subsequent analyses and diagrams are based on this approach.

We found a main effect for overlap level on response time (F(2, 26) = 63.9, p < 0.001), which confirmed that our initial scenario design worked as intended (see fig. 6). The post-hoc Scheffé analysis showed significant increases in response time between the low and medium overlap levels (F(2, 177) = 12.5, p < 0.01) and between the medium and high overlap levels (F(2, 177) = 56.8, p < 0.01).

Figure 6: Overlap level had a significant main effect on response time.

No significant main effect was found for viewing condition on response time (F(2, 26) = 1.30, ns). We had initially anticipated this main effect to be significant; however, as discussed later, subsequent analysis showed significant interaction effects with overlap level. Furthermore, there were no significant main effects of age (F(1, 13) = 0.61, ns) or object motion (F(1, 13) = 0.06, ns) on response time.

As the main result of this experiment, a significant effect was found for the interaction of viewing condition and overlap level on response time (F(4, 52) = 4.63, p < 0.005), where a significantly lower response time was found for the ordered disparity condition when overlap levels were high (fig. 7). This means that in situations where the target object was in close proximity to two or more objects, ordered disparity was significantly faster (24.0%, 4.1 s) than fixed disparity (F(2, 57) = 9.72, p < 0.05). Similarly, it was significantly faster (37.2%, 7.6 s) than random disparity (F(2, 57) = 21.4, p < 0.01).

Figure 7: Mean response time for each viewing condition, grouped by overlap level. As the main result of this experiment, an interaction effect was found at the high overlap level, where ordered disparity showed significantly lower response times than the other conditions.

There was no significant interaction effect of object motion and viewing condition on response time.
4.2 Response Error
The mean error rate was 6.1%. Analyzing the data using χ² contingency tables (see table 1), we found no significant effect of viewing condition on response error (χ²(2) = 0.36, ns).
We suspected that there could be an increase in accuracy through
training on the first two block repetitions, so we made an analysis
on the full block data (see table 2). However, no effect was found
for viewing condition, looking both at contingency tables of proportional and equal error distributions (χ 2 (2) = 4.26, ns).
When analyzing the effects of object motion we found that the
mean error rate was 8.3% for static scenes and 3.9% for dynamic
scenes, but the effect was found not to be significant (χ²(1) = 3.1, ns).
Generally, the error rate was too low for the number of subjects
tested to make any further analyses of effects on error rate. However, this low rate confirms that the subjects understood the instruction with respect to prioritizing accuracy over response time.

Table 1: Contingency table showing response error per viewing condition, third block repetition.

                     Disparity
  Response    Fixed   Ordered   Random
  Correct       189        96       96
  Incorrect      15         6        6

Table 2: Contingency table showing response error per viewing condition, all block repetitions.

                     Disparity
  Response    Fixed   Ordered   Random
  Correct       572       294      282
  Incorrect      40        12       24

5 DISCUSSION
We have shown that ordered label layering significantly improves decision time, by over 24% on average, in complex, realistically moving scenes with high degrees of label overlap, compared to cases
without label layering. The effects were not significant in the less
complex cases with little or no overlap, which was expected since in
these cases the target labels did not need segregation from other labels. These results are similar to the findings of Wong et al. [23]
mentioned previously, where the effects of information layering
were only significant under demanding task conditions.
We had initially assumed that the label layering technique, regardless of depth order, would have a positive impact on performance. However, results showed that only the ordered label layering, where label layer order corresponded to object distance order,
improved performance. Conversely, the random label layer order
significantly decreased user performance, even when compared to
cases without label layering. This is likely due to the mismatch in
depth cues in the random case; the height in visual field of the labels and background objects, as well as the objects’ relative size
differences, are inconsistent with the labels’ stereoscopic disparity order, negatively affecting label-object integration. If the label layering technique were applied to a flat 2D interface with an orthogonal
view, like a traditional computer screen, there would be no depth
conflict so we now hypothesize that random disparity would yield
better performance, perhaps similar to ordered disparity.
Motion had no main effect on response time, as shown in the results section. This may be due to the fact that motion can influence
response time both ways. It could decrease response time since the
relative motion could help clutter breaking and make identification
faster. Alternatively it could increase response time as the motion
could encourage the subject to wait for the traffic to evolve, making
the target more visible. We therefore suspected that motion could
have a main effect on error, however that effect was not found to
be significant either, possibly due to our low baseline error rates.
It would be interesting to perform a more statistically powerful, in-depth study where motion is isolated as a perceptual cue, and cannot
be used as a “wait-and-see” factor as in this experiment. It would
also be interesting to analyze the interaction of motion and overlap level on response time, which was not possible with the current
data. This would reveal whether motion is more or less effective in the high overlap situations through its effect on user sensitivity to disparity. A better understanding of the role of motion could serve as an input to an automatic label layering algorithm.

An important issue to take into account when designing an interface with layered information is that the font glyphs must be thin enough for characterizing features (lines, strokes, etc.) to protrude from underlying layers. In some pilot studies we used thicker font glyphs, yielding fewer available features from each depth layer, which hampered stereoscopic fusion.

6 FUTURE WORK
The current study justifies and provides empirical baseline data for
development of a future, automated, label layering algorithm using stereoscopic disparity, which would subsequently be evaluated
against traditional 2D label placement algorithms. As label layer
order is an important factor for performance, at least in a perspective display format, such an algorithm must handle situations where
the depth order of the individual objects changes over time in order
to maintain the required label-object depth correlation. There was
no such differential movement in depth in the present experiment.
It would be interesting to evaluate the depth distribution function
of label layers. The logarithmic distribution used in this experiment
did segregate labels in an effective way; however, many alternative
separation functions are available. We hypothesize that a constant inter-layer disparity would be preferable to the 6.0 ± 1.2 arcmin disparity range used in this experiment; a sketch of such a spacing follows below.
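To make the hypothesized alternative concrete: with a small-angle parallax model p(d) ≈ IPD/d, a constant inter-layer disparity amounts to stepping 1/d linearly between the layer extremes, so the IPD cancels out of the layer positions. This is our derivation, not a distribution used in the experiment, and the resulting step size depends on the assumed vergence model and IPD:

```cpp
#include <cstdio>

// Distance of layer n (1..N) such that the small-angle parallax p(d) = IPD/d
// changes by a constant amount between adjacent layers: 1/d is stepped
// linearly from 1/dMin to 1/dMax, so the IPD cancels out of the positions.
double constantDisparityDistance(int n, int N, double dMin, double dMax) {
    double t = double(N - n) / double(N - 1);      // 1 at n = 1, 0 at n = N
    return 1.0 / (1.0 / dMax + t * (1.0 / dMin - 1.0 / dMax));
}

int main() {
    const int N = 15;
    const double dMin = 2.2, dMax = 6.0, ipd = 0.064;  // metres (IPD assumed)
    double prev = 0.0;
    for (int n = 1; n <= N; ++n) {
        double d = constantDisparityDistance(n, N, dMin, dMax);
        double parallax = ipd / d;                     // radians, small angle
        if (n > 1)  // the step is the same between every pair of layers
            std::printf("layer %2d at %.3f m, step %.2f arcmin\n",
                        n, d, (prev - parallax) * (180.0 / 3.14159265) * 60.0);
        else
            std::printf("layer %2d at %.3f m\n", n, d);
        prev = parallax;
    }
    return 0;
}
```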
It would also be relevant to study the maximum number of perceivable overlapping label layers. If there is an upper limit, perhaps analogous to the five perceivable transparent layers in random
dot stereogram stimuli [22], the label layering algorithm could be
combined with techniques of traditional 2D algorithms for solving
particularly complex situations, yielding a fully three-dimensional
label placement algorithm. With a limit on the number of label layers, larger inter-layer disparities (> 10 arcmin) are possible. The
optimal inter-layer disparity in such a case would require further
empirical investigation.
Even though our new approach effectively expands the design
space and could alleviate the need for compromises required by
traditional declutter methods, such as filtering, dimming, aggregation or label size reduction, it may be the case that the introduced
depth movement of labels could be distracting and capture unnecessary attention. Indeed, research has shown that looming objects are
much more likely to capture attention than receding ones [8]. The
looming stimuli used in that study were however provided through
relative size adjustments, with stereoscopic disparity fixed. Conversely, our system would vary the stereoscopic disparity of labels while keeping relative label size fixed. The practical effect
in our system requires empirical investigation. We speculate however that even if similar patterns of attention capture were present
for disparity-based object looming, the threshold of capture would
be affected by the speed and smoothness of depth motion. Given
that label layering is only effective in overlap situations, as shown
in this paper, the default non-overlap label positions would be in
a single depth layer. Since receding objects capture less attention
than looming ones, imminent label overlap situations should be resolved through a receding motion. To satisfy the disparity ordering
constraint, discovered in this experiment, the label corresponding to
the farthest object should recede. Restoring the label to its default
position could be implemented with a much slower loom, below
the threshold of attention capture. This attention avoidance is, for example, analogous to the washout phase in motion-based flight simulators, where the desired sense of acceleration is achieved by tilting the motion platform backwards, while the restoration to the initial neutral position is performed slowly enough to be below the human motion sensing thresholds. In addition, prediction algorithms could be
used to resolve future label overlap situations in advance, reducing
the risk of rapid depth movement.
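To illustrate the strategy (this is not an implemented algorithm), the sketch below recedes a label toward a deeper target layer at a moderate rate when an overlap is predicted, and restores it with a much slower loom; both rates are speculative placeholders for empirically determined thresholds:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Label {
    double depth;          // current apparent depth (m)
    double defaultDepth;   // the single default (non-overlap) layer (m)
    double targetDepth;    // depth the label is currently heading towards (m)
};

// Per-frame update: recede at a moderate rate, but loom back much more
// slowly, on the assumption that slow looming stays below the attention
// capture threshold. Both rates are speculative placeholders.
void updateDepth(Label& l, double dt) {
    const double recedeSpeed = 1.0;   // m/s, assumed acceptable receding rate
    const double loomSpeed   = 0.1;   // m/s, assumed below capture threshold
    double delta = l.targetDepth - l.depth;
    double speed = delta > 0 ? recedeSpeed : loomSpeed;
    double step  = std::min(std::abs(delta), speed * dt);
    l.depth += (delta > 0 ? step : -step);
}

int main() {
    Label farther{6.0, 6.0, 6.0};
    farther.targetDepth = 6.8;        // predicted overlap: the label of the
                                      // farther object recedes by one layer
    for (int frame = 0; frame < 120; ++frame)   // 2 s at 60 Hz
        updateDepth(farther, 1.0 / 60.0);
    std::printf("after receding: %.2f m\n", farther.depth);            // ~6.80

    farther.targetDepth = farther.defaultDepth;  // overlap resolved
    for (int frame = 0; frame < 120; ++frame)
        updateDepth(farther, 1.0 / 60.0);
    std::printf("after 2 s of slow restore: %.2f m\n", farther.depth); // ~6.60
    return 0;
}
```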
ACKNOWLEDGEMENTS
Stephen Peterson and Magnus Axholt were supported by
PhD scholarships from the Innovative Research Programme at
the EUROCONTROL Experimental Centre, Brétigny-sur-Orge,
France.
REFERENCES
[1] K. R. Allendoerfer, J. Galushka, and R. H. Mogford. Display
system replacement baseline research report. Technical Report
DOT/FAA/CT-TN00/31, William J. Hughes Technical Center, Atlantic City International Airport (NJ), 2000.
[2] R. Azuma and C. Furmanski. Evaluating label placement for augmented reality view management. In Proceedings of IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003),
pages 55–75, Tokyo, Japan, October 2003.
[3] B. Bell, S. Feiner, and T. Höllerer. View management for virtual and
augmented reality. In UIST ’01: Proceedings of the 14th annual ACM
symposium on User interface software and technology, pages 101–
110, Orlando, Florida, 2001.
[4] K. R. Brooks and L. S. Stone. Stereomotion suppression and the perception of speed: Accuracy and precision as a function of 3d trajectory. Journal of Vision, 6:1214–1223, 2006.
[5] J. E. Cutting. How the eye measures reality and virtual reality. In
Behavior Research Methods, Instrumentation, and Computers, volume 29, pages 29–36, 1997.
[6] A. Dorbes. Requirements for the implementation of automatic and
manual label anti-overlap. Technical Report 21/00, EUROCONTROL
Experimental Centre (EEC), 2000.
[7] S. Edmondson, J. Christensen, J. Marks, and S. Shieber. A general
cartographic labeling algorithm. Cartographica, 33(4):13–23, 1996.
[8] S. L. Franconeri and D. J. Simons. Moving and looming stimuli capture attention. Perception & Psychophysics, 65(7):999–1010, 2003.
[9] K. Hartmann, K. Ali, and T. Strothotte. Floating labels: Applying
dynamic potential fields for label layout. In Proceedings of 4th International Symposium on Smart Graphics, pages 101–113, Berlin,
2004. Springer Verlag.
[10] K. Hartmann, T. Götzelmann, K. Ali, and T. Strothotte. Metrics for
functional and aesthetic label layouts. In Proceedings of 5th International Symposium on Smart Graphics, pages 115–126, Berlin, 2005.
Springer Verlag.
[11] M. B. Holbrook. Breaking camouflage: stereography as the cure for
confusion, clutter, crowding, and complexity - three-dimensional photography. Photographic Society of America Journal, 8, 1998.
[12] M. S. John, B. A. Feher, and J. G. Morrison. Evaluating alternative
symbologies for decluttering geographical displays. Technical Report
1890, Space and Naval Warfare System Center, San Diego, CA, 2002.
[13] M. S. John, H. Smallman, D. I. Manes, B. A. Feher, and J. G. Morrison. Heuristic automation for decluttering tactical displays. The Journal of the Human Factors and Ergonomics Society, 47(3):509–525,
2005.
[14] B. Julesz. Foundations of Cyclopean Perception. The University of
Chicago Press, Chicago, 1971. ISBN: 0-226-41527-9.
[15] S. Julier, M. Lanzagorta, L. Rosenblum, S. Feiner, and T. Höllerer.
Information filtering for mobile augmented reality. In Proceedings of
ISAR 2000, pages 3–11, Munich, Germany, October 2000.
[16] S. Kakos and K. J. Kyriakopoulos. The navigation functions approach
for the label anti-overlapping problem. In Proceedings of the 4th
EUROCONTROL Innovative Research Workshop, Paris, France, 2005.
[17] M. J. M. Lankheet and M. Palmen. Stereoscopic segregation of transparent surfaces and the effect of motion contrast. Vision Research,
38(5):659–668, 1998.
[18] S. P. McKee, S. N. J. Watamaniuk, J. M. Harris, H. S. Smallman, and
D. G. Taylor. Is stereopsis effective in breaking camouflage? Vision
Research, 37:2047–2055, 1997.
[19] R. V. Parrish, S. P. Williams, and D. E. Nold. Effective declutter of
complex flight displays using stereoptic 3-d cueing. Technical Report
3426, NASA, 1994.
[20] S. Peterson, M. Axholt, and S. R. Ellis. Very large format stereoscopic
head-up display for the airport tower. In Proceedings of the 16th Virtual Images Seminar, Paris, January 2007.
[21] E. Rosten, G. Reitmayr, and T. Drummond. Real-time video annotations for augmented reality. In International Symposium on Visual
Computing, 2005.
[22] I. Tsirlin, R. S. Allison, and L. M. Wilcox. On seeing transparent
surfaces in stereoscopic displays. Master’s thesis, York University,
Canada, 2006.
[23] B. L. W. Wong, R. Joyekurun, H. Mansour, P. Amaldi, A. Nees, and
R. Villanueva. Depth, layering and transparency: Developing design
techniques. In Proceedings of the Australasian Computer-Human Interaction Conference (OZCHI), Canberra, Australia, 2005.
[24] M. Yamamoto, G. Camara, and L. A. N. Lorena. Tabu search heuristics for point-feature cartographic label placement. GeoInformatica,
6(1):77–90, 2002.
[25] F. Zhang and H. Sun. Dynamic labeling management in virtual
and augmented environments. In Proceedings of the 9th International Conference on Computer Aided Design and Computer Graphics
(CAD/CG), 2005.