MOBILE3DTV
Project No. 216503

Results of user-centered quality evaluation experiments and usability tests of prototype

Dominik Strohmeier, Kristina Kunze, Satu Jumisko-Pyykkö
Abstract: In this report we present our work towards finalization of the user-centered Quality of Experience
evaluation framework. During the standardization activities for our framework, we identified the need for
comparisons of OPQ with related methods and evaluation approaches to increase its validity. The first
study compares OPQ evaluations conducted in the laboratory and in the context of use and shows comparable results in
the statistical analysis. The second study introduces the Extended-OPQ approach, which allows deriving
components of Quality of Experience from a series of OPQ studies. The developed QoE terminology for
mobile 3D video was applied in this second study, and the results are compared to an OPQ evaluation. The
comparison again shows comparable results for OPQ and the descriptive evaluation with a fixed vocabulary,
but also reveals a need for further work towards an optimized terminology. Altogether, the results of the
studies contribute to the validity of OPQ and finalize its methodological development. The second part of
the report covers the preparations of a final prototype evaluation for MOBILE3DTV. Although we were
not able to conduct the study due to unavailability of the prototype at the planned time, we report in detail
the test procedure and the selection of the independent variables to provide a valid research plan.
Keywords: 3DTV, mobile video, Open Profiling of Quality, descriptive evaluation, comparison model,
conventional profiling, terminology, component model
Executive Summary
Open Profiling of Quality (OPQ) has become a well-established tool during the quality
evaluations of Mobile3DTV. OPQ is a mixed methods research approach which extends
commonly applied psychoperceptual, quantitative evaluations of perceived quality with a
descriptive evaluation approach. This descriptive or sensory evaluation enables researchers to
identify the underlying quality rationale of a quantitative quality rating based on test participants'
individual attributes.
The application of OPQ in a series of studies on different research questions within the
MOBILE3DTV project has demonstrated good validity of OPQ through complementary research results
across the studies. Together with the evaluations of quality in the context of use, it now forms the
User-centered Quality of Experience evaluation framework. This evaluation framework was
accepted as a contribution to the standardization activities of ITU-T SG12, where both
methods were presented in the general meeting.
During the work towards the proposal, we identified several issues with OPQ that still needed to
be studied. In this report, we present the results of two studies which targeted increased
validity of the OPQ approach. The goal of the first study was to extend and to validate the use of
the OPQ method in field circumstances. We conducted the first experiment in two different
evaluation contexts, a laboratory and a café, with varying video qualities under assessment. The
second study targets the comparison of OPQ with related research methods. For this comparison, we
included two new approaches into the UC-QoE evaluation framework.
Firstly, we introduced the Extended-OPQ approach (Ext-OPQ) as an additional fourth step within
OPQ to derive a general component model from the individual attributes of a series of studies on
the same domain of research. We describe the research method of the Extended-OPQ and
results of the application of Ext-OPQ approach on mobile 3D video. The result consists of 19
general descriptive attributes for mobile 3D video which we then utilized further in an adaptation
of OPQ based on a fixed vocabulary. The descriptive evaluation of quality based on a vocabulary
which is the same for all participants is known as conventional profiling (CP) and represents a first
step towards operationalization of the Ext-OPQ component model. We compare the results
of the CP evaluation to an OPQ evaluation in the second study, which is reported here.
For this comparison, we further introduce our comparison model which is the result of our work
towards holistic comparison of research methods. The comparison model has been developed
based on a literature review into different research areas from which we collected the
comparison criteria applied. The comparison model structures these criteria into four different
classes and allows for systematic and holistic comparison of research methods. The comparison
of the CP and OPQ approach is based on a subset of attributes from the comparison model as a
first approach to deeper understanding of benefits and shortcomings of different research
methods in subjective quality evaluations.
Overall, the results presented in this report finalize the methodological work on the User-centered Quality of Experience evaluation framework within the MOBILE3DTV project and
provide a final validation of OPQ as a well-established research method for mixed methods
evaluation. A second part of the report is dedicated to the final overall evaluation of the whole
MOBILE3DTV prototype system. A study which combines different evaluation approaches with
optimized components of the system has been planned. The research plan as well as the
selected content and a detailed description of the research method are provided in this report.
Table of Contents

1 Introduction
2 Simulator Sickness on mobile autostereoscopic screens
  2.1 Research Method
    2.1.1 Simulator Sickness Questionnaire
    2.1.2 Procedure
    2.1.3 Characteristics of the experiments
    2.1.4 Apparatus – displays
  2.2 Results
  2.3 Discussions and Conclusion
3 Probing OPQ in the context of use
  3.1 Research method
  3.2 Results
    3.2.1 Psychoperceptual evaluation
    3.2.2 Sensory evaluation
    3.2.3 Comparison of results
  3.3 Discussion and Conclusion
4 Extended OPQ
  4.1 Fixed vocabulary and terminologies in descriptive analysis
  4.2 The component model as extension of the OPQ method
    4.2.1 Open definition task and qualitative descriptions
    4.2.2 Components of Quality of Experience for mobile 3D video
5 Comparison of OPQ and CP
  5.1 Introduction and research problem
  5.2 Comparison criteria and comparison model
  5.3 Comparison Study: Comparing OPQ and CP
    5.3.1 Research Method
    5.3.2 Results
  5.4 Systematic Comparison of Methods
    5.4.1 Test results
    5.4.2 Test procedure
    5.4.3 Amount of time
    5.4.4 Costs
    5.4.5 Research purpose
  5.5 Discussion
6 Prototype Study: Usability and quality experience with the final mobile 3D prototype in the context of use
  6.1 Preliminary planning
    6.1.1 Participants
    6.1.2 Test design
    6.1.3 Test procedure
    6.1.4 Test Material and Apparatus
  6.2 Actual planning
    6.2.1 Participants
    6.2.2 Test design
    6.2.3 Test procedure
    6.2.4 Test Material and Apparatus
1 Introduction
The User-centered Quality of Experience (UC-QoE) evaluation framework [42] is the
methodological result of the subjective quality evaluation studies in the MOBILE3DTV project.
The evaluation framework consists of two main evaluation approaches that extend conventional
quantitative profiling. The evaluations in the context of use extend quality assessment in
controlled laboratory environments with exploration of the impact of different evaluation contexts
on users' experienced quality [46][42][29]. The contextual evaluations increase the external
validity of the research results. The approach has been applied successfully in the quality
evaluations of mobile 3D video and television [29][55].
The second methodological approach within the UC-QoE evaluation framework has been the
development of a research method for evaluation of individual quality factors by applying a
descriptive evaluation method in parallel to common quantitative, psychoperceptual evaluations.
Open Profiling of Quality (OPQ) is a mixed method that combines the evaluation of quality
preferences and the elicitation of idiosyncratic experienced quality factors. It therefore uses
quantitative psychoperceptual evaluation and, subsequently, an adaptation of Free-Choice
Profiling [10][12]. The application of OPQ in the quality evaluations of mobile 3DTV has shown
complementing results in a series of studies on different research questions (Table 1).
The complementation of results and the additional knowledge obtained from Open Profiling of
Quality, in contrast to pure psychoperceptual evaluations, have made Open Profiling of Quality
a well-validated tool for mixed methods evaluations in the UC-QoE evaluation framework.
Both methodological approaches of the UC-QoE evaluation framework were accepted as
proposals for standardization in ITU-T SG12 [57][58]. Within the standardization activities for the
OPQ approach, we identified two issues that needed to be addressed towards finalization and validation of
the research method. The first point targets the internal comparison of different methods of
analysis that can be applied to the OPQ data sets [10] and the confirmation of its internal validity.
The second point targets the external comparison of OPQ with related research methods in the
field of descriptive quality evaluations.
In the first part of this deliverable we present results that complement and extend the
existing results. First, we present, in section 2, the results of an analysis of Simulator Sickness
data which was collected in the previous studies. Then, we present the results of a study in
which we conducted an OPQ evaluation in the context of use and compared the results of this
evaluation to an assessment in a controlled laboratory environment. The results are presented in
section 3. This work finalizes the development of the OPQ approach. In section 4, we introduce
the Extended-OPQ (Ext-OPQ) approach [12]. During the applications of OPQ in the
MOBILE3DTV project, we identified the need to be able to transform the individual quality factors
into a fixed vocabulary of components of Quality of Experience for mobile 3D video [12]. Ext-OPQ introduces the component model as an extension of the OPQ method, which enables deriving
a common terminology from a set of OPQ studies in a specified field of research. The
terminology obtained in the Ext-OPQ was then used in an external comparison study in which
the results for OPQ were compared to a Conventional Profiling. Conventional Profiling (CP) is
commonly understood as a sensory evaluation based on a fixed vocabulary [1][2][18][52]. We
operationalized our QoE components for a CP approach. For systematic comparison of related
research methods, we first introduce our comparison model in section 5. This model was built
based on a literature review of different comparison criteria for research methods in different
domains of research. Our comparison model is the first step towards a holistic comparison of
research methods. A subset of comparison criteria is then used to compare our OPQ with the
conventional profiling approach.
Table 1 Series of OPQ studies during its methodological development in the application
on different research questions on experienced quality of mobile 3D video and television

Study 1: Experienced Quality of Audiovisual Depth [10][14]
Research question: How does perception of audiovisual content change when it is presented in 2D and 3D?
Summary of results: Although the results of the psychoperceptual evaluations did not reveal any significant differences between the 2D and 3D conditions, OPQ results show that the independent variables in the test were identified and evaluated. In addition, we were able to identify different preferences of participants from which modality they derive their quality attributes.

Study 2: Experienced Quality of Audiovisual Depth in Mobile 3D Television and Video [10]
Research question: How does perception of audiovisual mobile 3D videos change when it is presented in either 2D or on an autostereoscopic mobile device?
Summary of results: The results underlined the dominance of visual quality factors over the audio factors and their interaction in the experienced quality. The results also showed a controversial impact of 3D presentation mode on overall quality and depth impression. While the use of 3D mode increased the depth impression, it decreased the overall satisfaction. 3D quality was often described with artifacts. However, our results also showed that in artifact-free cases, 3D can reach higher perceived quality compared to 2D.

Study 3: Experienced Quality of Video Coding Methods for Mobile 3D Television [8][10]
Research question: What is the optimum coding method for mobile 3D video?
Summary of results: Our results of psychoperceptual evaluation showed that Multiview Coding and Video + Depth provide the highest experienced quality among the tested coding methods. The results of sensory profiling showed that artifacts are still the determining quality factor for 3D. The expected added value through depth perception was rarely mentioned by the test participants. When mentioned, it was connected to the artifact-free video. We identified a hierarchical dependency between depth perception and artifacts. When the visibility of artifacts is low, depth perception seems to contribute to the added value of 3D.

Study 4: Experienced Quality of Mobile 3D Video Broadcasting over DVB-H [11][12]
Research question: What are the optimum transmission conditions for transmitting mobile 3D videos over a DVB-H channel?
Summary of results: The results show that the provided quality level of videos with a low error rate is clearly above 50%. Still, the different coding methods had the highest impact on the experienced quality of test participants. In the sensory evaluation we were able to show again that the expected descriptions of judder as contrast to fluency of the test items are found rarely and descriptions are dominated by artifacts relating to blockiness or blur. Again, impact of 3D perception was only identified for artifact-free videos.
The second part of this deliverable targets a final holistic evaluation of the MOBILE3DTV
prototype system. During its development process the different stages of the system were
optimized within a user-centered optimization approach along the production chain of mobile 3D
video [10][11][12][45]. The final system from optimized content to use of the final prototype
device is targeted for evaluation in a scenario-based approach in different contexts of use.
However, we were not able to conduct this study in time due to problems in the availability of a
final prototype device. Section 6 presents the research plan for the study including selected
content, independent parameters and detailed description of the developed test procedure in the
scenario-based evaluation.
2 Simulator Sickness on mobile autostereoscopic screens
Previous research into the subjective quality of autostereoscopic displays suggests that visual
discomfort is a common problem for 3D media [60]. Visual discomfort on autostereoscopic
displays is often caused by impairments in stereoscopy, e.g. crosstalk or keystone distortion
[61]. The experienced visual discomfort may degrade the perceived image quality and cause
annoyance which can result in a lower acceptance of the novel technology [60]. Three main
approaches for measuring visual discomfort exist: 1) explorative studies, 2) psychophysical
scaling and 3) questionnaires [60]. Questionnaires are commonly applied to subjectively study
the degree of visual discomfort.
Kennedy et al. (1993) [62] originally developed the Simulator Sickness Questionnaire (SSQ) to
study sickness related symptoms induced by aviation simulator displays. In the questionnaire,
symptoms that contribute to nausea, oculomotor discomfort, and disorientation are measured
individually, and a combined total severity score is finally calculated to subjectively quantify the
experienced symptoms of the participant. Since its conception, the SSQ has also been
applied to several fields outside the aviation research community. Jaeger & Mourant (2001) [64]
compared simulator sickness symptoms with static and dynamic virtual environments. They
concluded that increased duration of exposure intensifies sickness symptoms. During their
longest session of 23 minutes, they did not observe physiological adaptation that would lessen
the symptoms during prolonged exposure. In contrast, Häkkinen et al. (2002) [63] noted that
stereoscopic gaming induced strong nausea and disorientation symptoms with the worst
symptoms being experienced within 10 minutes after task completion. They applied SSQ to
study stability and sickness symptoms after Head-Mounted Display (HMD) use. Oculomotor
symptoms were experienced independently of the used stimuli. In a similar study, Pölönen &
Häkkinen (2009) [65] used the SSQ to measure sickness symptoms in three different
applications while using a Near-to-Eye Display (NED). They compared the dependency of SSQ
during watching a movie, a game play, and reading. The results show that discomfort was
experienced with all applications, especially with reading using the NED. Disorientation was
experienced especially when playing games with strong motion scenes. Movie viewing invoked
the least symptoms of the three applications. Lambooij et al. [60] note in their review on
stereoscopic displays and visual comfort that induced blur may cause unnatural depth
perceptions. They also emphasize that spatial and temporal inconsistencies and conflicting
depth cues are a cause for annoyance and visual discomfort.
2.1 Research Method
2.1.1 Simulator Sickness Questionnaire
In this study, we analyze the data of five different experiments in which we collected SSQ data.
The SSQ as applied contains 16 physical symptoms [62]. Each symptom is rated on a
categorical labeled scale (none, slight, moderate, severe) by the test participant. Each symptom
score then contributes to groups of 1) nausea (e.g. stomach awareness), 2) oculomotor (e.g.
eyestrain), and 3) disorientation (e.g. dizziness). A total score can be calculated as a weighted
contribution of the subcategories. The weighting factors are: Nausea = 9.54, Oculomotor = 7.58,
Disorientation = 13.92, and Total score = 3.74. The SSQ was collected prior to immersion and
several times after immersion (after 0, 4, 8, and 12 minutes) in experiments 1-4. In
experiment 5, only immediate post-immersion data was collected, after each of two immersive sessions. In
this report, the absolute values are presented.
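For clarity, the sketch below illustrates this weighting arithmetic. The assignment of the 16 symptoms to their subscales follows Kennedy et al. [62] and is omitted here; the function is only an illustrative sketch, not part of the evaluation software used in the experiments.

```python
def ssq_scores(nausea_raw, oculomotor_raw, disorientation_raw):
    """Weighted SSQ scores from raw subscale sums.

    Each of the 16 symptoms is rated 0 (none) to 3 (severe); the raw sums per
    subscale follow the symptom-to-subscale assignment of Kennedy et al. (1993).
    The weighting factors below are the ones stated in this report.
    """
    return {
        "nausea": nausea_raw * 9.54,
        "oculomotor": oculomotor_raw * 7.58,
        "disorientation": disorientation_raw * 13.92,
        # The total score weights the sum of all three raw subscale sums.
        "total": (nausea_raw + oculomotor_raw + disorientation_raw) * 3.74,
    }

# Example: a participant reporting mild symptoms after immersion.
print(ssq_scores(nausea_raw=2, oculomotor_raw=3, disorientation_raw=1))
```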
2.1.2 Procedure
The structure of all experiments was similar. Pre-immersive evaluation of SSQ was collected at
the beginning of each test. During the immersion, test participants conducted a
psychoperceptual quality evaluation task [24]. During the rating, participants gazed off screen,
partly resembling a typical mobile video viewing situation [72]. Post-task SSQ data was collected
directly after completion of the immersive psychoperceptual evaluation task.
2.1.3 Characteristics of the experiments
Experiment 1 targeted the evaluation of a suitable surround sound setup for a 15”
autostereoscopic laptop [10]. Experiment 2 aimed at identifying audiovisual experienced quality
under monoscopic and stereoscopic video presentation [10]. Experiment 3 [10] explored the
influence of video coding methods, and experiment 4 [12] the influence of transmission parameters, on
experienced quality of mobile 3D television. Finally, experiment 5 studied different video coding
parameters. Experiments 1-4 were conducted in controlled laboratory environments.
Experiment 5 was conducted in both controlled and indoor quasi-experimental settings [72]. All
characteristics of the experiments are summarized in Table 2.
Table 2 Characteristics of the experiments

EXP 1 [10]
Immersion: viewing 4 min, total 6.7 min, clip length 15 s
Display: Actius AL-3DU, 512x768 px at 42.5 DPI
Content characteristics: synthetic videos; moderate motion; 3D 100% of the time; described impairments: N/A; quality level: highly acceptable
Sample: N = 32
Effect of time (post-immersion): Nausea: FR=6.42, df=3, p=.93, ns; Oculomotor: FR=10.95, df=3, p<.05; Disorientation: FR=17.73, df=3, p<.001; Total: FR=27.52, df=3, p<.001

EXP 2 [10]
Immersion: viewing 13.9 min, total 23 min, clip length ~18 s
Display: HDDP, 427x240 px at 155 DPI
Content characteristics: synthetic and natural videos; variable motion; 3D 50% of the time, 2D 50% of the time; described impairments: depth, spatial; quality level: highly acceptable
Sample: N = 42
Effect of time (post-immersion): Nausea: FR=14.89, df=3, p<.01; Oculomotor: FR=31.04, df=3, p<.001; Disorientation: FR=39.89, df=3, p<.001; Total: FR=51.17, df=3, p<.001

EXP 3 [10]
Immersion: viewing 7.9 min, total 15.8 min, clip length ~10 s
Display: HDDP, 427x240 px at 155 DPI
Content characteristics: synthetic and natural videos; variable motion; 3D 100% of the time; described impairments: spatial; quality level: highly acceptable
Sample: N = 38
Effect of time (post-immersion): Nausea: FR=30.29, df=3, p<.001; Oculomotor: FR=29.52, df=3, p<.001; Disorientation: FR=48.41, df=3, p<.001; Total: FR=61.92, df=3, p<.001

EXP 4 [12]
Immersion: viewing 32 min, total 37.3 min, clip length ~60 s
Display: HDDP, 427x240 px at 155 DPI
Content characteristics: synthetic and natural videos; variable motion; 3D 100% of the time; described impairments: N/A; quality level: highly acceptable
Sample: N = 77
Effect of time (post-immersion): Nausea: FR=13.5, df=3, p<.01; Oculomotor: FR=48.55, df=3, p<.001; Disorientation: FR=27.49, df=3, p<.001; Total: FR=53.14, df=3, p<.001

EXP 5 [72]
Immersion: viewing 46 min [23 + 23], total 54.8 min [27.4 + 27.4], clip length ~30 s
Display: 3D LCD, 400x480 px at 100 DPI
Content characteristics: synthetic and natural videos; variable motion; 3D 80% of the time, 2D 20% of the time; described impairments: depth, spatial; quality level: mainly unacceptable
Sample: N = 30
Effect of time (post-immersion): not applicable (only immediate post-immersion measures were collected)
Heterogeneous stimuli material was used in the experiments. The stimuli contained synthetic
and natural video scenes, variable depth levels, motion and impairments. Based on the
descriptive quality evaluation tasks that were included in the evaluations (and conducted after
the post-task SSQ data elicitation), the experiments 2 and 5 contained detectable impairments in
spatial and depth domains while experiment 3 resulted in only spatial impairments. Finally, rated
overall quality in the psychoperceptual studies was highly acceptable except in experiment 5.
2.1.4 Apparatus – displays
Three dual-view autostereoscopic displays were used in the experiments. Such displays work by
showing a different image to each eye of the observer. Dual-view displays generate two images
which are spatially interleaved – half of the sub-pixels are visible from one direction and the
other half from another direction. The light of the display is redistributed by an optical filter –
either a parallax barrier, which selectively blocks the light, or a lenticular sheet, which refracts the
light in different directions [68]. When the display is correctly positioned with respect to the
observer's eyes, as marked with "1" and "2" in Figure 2, it is possible to perceive 3D objects as
freely floating in front of the display. In some autostereoscopic displays the optical layer can be
turned off, which allows the display to be used for 2D images. In other displays, where the
optical layer is static, the only option is to duplicate the visual information and make the same
image visible to each eye of the observer.
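As a simplified illustration of the spatial interleaving described above, the following sketch merges a left and a right view into one dual-view frame by alternating pixel columns. It is a conceptual sketch only: real dual-view displays interleave at the sub-pixel level, and the exact layout differs between the displays described below.

```python
import numpy as np

def interleave_dual_view(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Column-interleave two equally sized (H, W, 3) views into one frame.

    Even pixel columns are taken from the left view, odd columns from the
    right view; a parallax barrier or lenticular sheet then directs each set
    of columns towards one eye. Sub-pixel interleaving is omitted for brevity.
    """
    assert left.shape == right.shape
    frame = left.copy()
    frame[:, 1::2, :] = right[:, 1::2, :]
    return frame

# Example with two dummy 480x640 RGB views (all-black and all-white).
left = np.zeros((480, 640, 3), dtype=np.uint8)
right = np.full((480, 640, 3), 255, dtype=np.uint8)
dual = interleave_dual_view(left, right)
```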
The first display used in the experiments is the Actius AL-3DU by Sharp, which uses a switchable
parallax barrier [68]. Every other sub-pixel of that display belongs to the alternative view. As each
view is visible from multiple angles, and the angle of visibility of one view is quite narrow, it is
possible for an observer to perceive a reversed stereo image, as happens for an observer at position 3 in
Fig. 1c. The visual quality of the 3D scene is very sensitive to the observation angle - except for
three narrow observation spots the display exhibits noticeable moiré and ghosting artifacts. In
3D mode, the resolution per view is 512x768px at 42.5 DPI, with pixel aspect ratio of 2:1. The
viewing distance in the experiments was ~55 cm.
The second display is a 3D display by NEC with a horizontally double-density pixel arrangement,
also known as the HDDP display [70]. Due to its special pixel arrangement it has the same resolution in
2D and 3D mode, namely 427x240px at 155 DPI. Its optical layer is lens-based and cannot be
turned off, and 2D display mode is achieved through pixel doubling. The HDDP display has
the lowest crosstalk and the highest visual quality of the three. The 3D effect can be observed
from a wide range of angles and distances. The viewing distance in the experiments was ~45
cm.
The third display that was used is the Stereoscopic 3D LCD MB403M0117135, produced by
masterImage [69]. It uses a Cell Matrix Type parallax barrier which can be switched between
portrait 3D and landscape 3D mode. The display is 2D/3D switchable; the 2D resolution is
800x480px at 200 DPI and the 3D resolution is 400x480px at 100 DPI. The views of
that display alternate every 3 sub-pixels – every second full pixel belongs to the alternative view.
This creates specific color tint artifacts in 3D mode, caused by sub-pixels with certain color being
partially covered by the barrier. As the 100DPI resolution was deemed high enough, 2D images
were shown using 3D mode and pixel doubling. The viewing distance in the experiments was
~40 cm.
2.2 Results
The analysis targeted three main aspects: To explore 1) the influence of immersion by
comparing pre-and post immersive evaluations, 2) influence of post immersive time on
evaluations, and 3) to identify the time when post immersive symptoms are reduced back to the
pre-immersive level. As an overall tendency, the immersive period caused a short term peak to
the total simulator sickness and its factors.
Pre vs. post immersion - The immersion significantly increased the symptoms. Wilcoxon
pairwise comparisons showed a significant difference between pre and post evaluations in 14
out of 15 cases (p<.01, cf. Figure 1). There are two exceptions to this main result. In the first
experiment, the immersion did not influence the severity of nausea symptoms (Z=-1.46, p=.143,
ns). In the second experiment, a lower level of nausea was reported after the immersion, in
contradiction to the main tendency of the results (Z=-2.30, p<.05).
Influence of post-immersive time - After the immersion, both individual and total symptoms
reduced over time (Figure 1). Time (see Table 2) had a significant influence on the symptoms in
experiments 1-4. As an exception, this influence was not observed for nausea in
the first experiment.
Reduction of post-immersive symptoms - Immersion caused a short-term peak in the simulator
sickness score and its factors, and the starting level of the pre-immersive scores was mostly reached again
within eight minutes after immersion. The level of the pre-immersive total and oculomotor scores
was reached within four minutes after immersion in three experiments (1, 2, 3), while it was not
reached within twelve minutes in experiment 4 (p>.05). In terms of disorientation, recovery
from immersion required four to twelve minutes (Exp 1: 8 min, Exp 2-3: 4 min, Exp 4:
12 min, p>.05).
Figure 1 Results of simulator sickness symptoms and its factors in the experiments 1-4
shown as total scores. The results of experiment 5 show the post-immersive measures
after two different lengths of exposure. Pre = pre-immersive measure; time 0-12 = time of
post-immersive measurement in minutes. The bars show 95% CI of mean.
Finally, the pre-immersive level of nausea was equal after immersion (Exp 1; p>.05), it became
lower after immersion (Exp 2; p<.05), or it was reached within four minutes (Exp 3-4; p>.05). In
total, the results showed that the recovery time was prolonged after a long immersive period of
viewing 3D in experiment 4 compared to the other experiments. This result is especially
visible in the oculomotor factor, as well as in total simulator sickness and disorientation.
In experiment 5, the post-immersive evaluations were collected twice, immediately after
viewing at 24.7 min and at 58.4 min (Figure 2). There were neither differences in the total
simulator sickness (Wilcoxon: Z=-1.301, p=.193) nor in its factors (nausea: Z=-1.789, p=.074,
oculomotor: Z=-1.85, p=.064, disorientation: Z=-.794, p=.43) between the post-immersive
measurements. Overall, experiment 5 showed a higher level of symptoms in all evaluations
compared to the other experiments.
2.3 Discussions and Conclusion
The goal of this study was to explore simulator sickness symptom severity in five quality
evaluation experiments conducted on three mid-sized or small mobile autostereoscopic displays.
Our five experiments were characterized by variable length and structure of video viewing,
overall quality of stimuli, and nature of perceivable impairments. Although variable characteristics
of settings can limit direct between-experiment comparisons, our results are beneficial for
showing the general tendency of visual comfort in these experiments.
The results showed a slight and mainly short-term increase in the symptoms after immersion. In
these studies, the overall quality level was acceptable for the prospective consumer [10][11][12].
Firstly, the reported level of symptoms in our studies with the HDDP and Actius AL-3DU displays is
equal to or lower than in previous studies using CRT or Head-Mounted Displays after 40 minutes
of fast-paced gaming [63]. Secondly, the pre-immersive level of simulator sickness
symptoms was mostly reached within the first four minutes when the active viewing time was less
than 14 minutes. For a longer viewing time (more than 30 min) on the small-sized HDDP display, the
recovery time was slightly prolonged. These results indicate that short-term video viewing (e.g.
typical for mobile television and video) on these autostereoscopic dual-view displays is not
problematic.
The results also showed a relatively high level of symptoms after a long viewing task on the 3D
LCD display. In this study, the overall quality level was low for users and some of the used
stimuli were highly impaired, causing perceptual problems such as crosstalk. In this study, the
viewing time (23 or 46 min) did not increase the symptoms. This might be explained by
physiological adaptation: according to Stanney et al. [71], prolonged exposure near or
over 30 minutes can even lessen the symptoms due to physiological adaptation.
3 Probing OPQ in the context of use
Subjective quality evaluation research has slowly started to shift its focus towards user-centric assessments with a rich use of methods during recent years. It has become important to
understand quality of experience more extensively than only as sensorial satisfaction or as a
ratio of erroneous and error-free system quality, and to further utilize this information in the design of
novel systems [10][42]. Methodologically, this change has not only introduced the application of
descriptive evaluation methods in parallel to traditional quantitative psychoperceptual quality
evaluation tools (e.g. ITU-T P.910 [24]), but also a more detailed analysis of the factors of
external validity referring to the use of the end product such as user characteristics, necessary
system components, and context of use (overview [42]).
Descriptive quality evaluation methods are used to identify the critical quality attributes,
complement and give deeper understanding on the attributes beyond the conventional
psychoperceptual assessment methods [10][12][28][25]. There are two main approaches of the
descriptive methods: 1) Interview-based methods contain a relatively fast data-collection phase
and data-driven procedure in the analysis which can also utilize statistical techniques. These
methods have been applied in the evaluation of unimodal and multimodal stimuli with naïve
participants mainly in the controlled circumstances [25][44]. 2) Vocabulary-based methods are
characterized by a multistep procedure to develop either an individual or consensus vocabulary,
and to rate quality using that vocabulary [17][41]. The analysis emphasizes the use of statistical
techniques, and these methods have been applied in the evaluation of heterogeneous stimuli
with naïve and trained participants in controlled circumstances. Within this branch of descriptive
methods, Open Profiling of Quality (OPQ) is a mixed method combining conventional
quantitative psychoperceptual quality evaluation and qualitative descriptive quality evaluation
based on the individual's own vocabulary [10]. Although extensive work has been done to develop
different descriptive methods, their applicability and validity outside the laboratory circumstances
is unknown.
There are only few previous studies which have examined quality of experience of time-varying
media in the context of use. Context of use represents the circumstances in which the activity
takes place and it is characterized by physical, temporal, task, social, and technical and
information context, and their variable properties (overview in [46]). The focus of the previous
studies has been on identifying context-dependent quality requirements, comparing
these requirements to laboratory evaluations, and maximizing ecological validity. These
perspectives have a high relevance to the mobile video and television services which are used
(or expected to be used) in heterogeneous usage contexts. However, to conduct this type of
study requires a change in research paradigm from experimental to quasi-experimental research
[45]. It requires an identification of threats to validity, tracking of the circumstances at
multiple levels in different phases of the study, and the use of different research methods to be able
to conclude on causal effects (ibid). Even though these studies have contributed to the novel
way of evaluating quality in the context of use, their focus has not been essentially to validate
the descriptive or mixed methods outside of the laboratory circumstances.
The goal of this study is to extend and to validate the use of the Open Profiling of Quality (OPQ)
method in field circumstances. In the following, we present the research method of a comparison
study between controlled laboratory circumstances and a context of use. Furthermore, we
report and discuss the results.
3.1 Research method
Participants – A between-subjects design was used in the study. A total of 42 untrained
participants (age: 19-52 years; 18 female, 24 male) took part in the study. A control group with
21 participants was tested under laboratory conditions and 21 participants were tested in a user
context situation. 15 participants per group were selected randomly for sensory evaluations. All
test participants were tested for visual acuity (myopia and hyperopia: Snellen index: 20/40), color
vision (Ishihara test) and stereo vision (Randot Stereotest 0.6). Five of the participants had been
working in the field of video editing or video applications. One of the participants had prior
experience in subjective quality evaluation, but none of them with 3D video. All other test
participants can be classified as naïve participants.
Stimuli – We chose six different audiovisual clips with a length of 20 seconds for the test
according to their audiovisual characteristics and the user requirements for mobile 3D television
and video [30]. The clips were edited using Premiere Pro CS4 and exported with a resolution of
640px x 480px for each channel. Audio was sampled at 44.1 kHz and 16 bit. The videos
were encoded at three different quantization parameters (QP), corresponding to three quality levels:
high (QP 30), medium (QP 40), and low (QP 45). For encoding, we used x264 for video and
Nero AAC for audio. Finally, we used Stereo Movie Maker to multiplex the prepared clips into the
3D-AVI format which was needed for the presentation of the stimuli material.
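As an illustration of how such constant-QP encodings can be produced, the sketch below drives an ffmpeg-based pipeline from Python. It is only an assumed stand-in for the x264 and Nero AAC command-line tools actually used for the stimuli; the file names are hypothetical.

```python
import subprocess

# Hypothetical per-channel source clip and the three quantization parameters
# used in the study (high, medium, and low quality).
SOURCE = "dracula_left.avi"
QPS = [30, 45, 40][::-1] if False else [30, 40, 45]

for qp in [30, 40, 45]:
    subprocess.run([
        "ffmpeg", "-i", SOURCE,
        "-s", "640x480",                      # per-channel resolution used in the study
        "-c:v", "libx264", "-qp", str(qp),    # constant-QP H.264 encoding
        "-c:a", "aac", "-ar", "44100",        # AAC audio at 44.1 kHz
        f"dracula_left_qp{qp}.mp4",
    ], check=True)
```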
Table 3 Snapshots of the six contents under assessment (VSD = visual spatial details, VTD = temporal motion, VD = amount of depth, VDD = depth dynamism, VSC = amount of scene cuts, A = audio characteristics)

Animation - Dracula: VSD: med, VTD: high, VD: high, VDD: high, VSC: high, A: music, effects
Documentary - Macroshow: VSD: high, VTD: med, VD: high, VDD: low, VSC: med, A: orchestral music, ambience
Sports – Skydiving: VSD: low, VTD: med, VD: med, VDD: low, VSC: low, A: music
User-created Content – Street Dance: VSD: med, VTD: high, VD: high, VDD: med, VSC: low, A: music, ambience
Documentary – The Eye: VSD: med, VTD: med, VD: med, VDD: med, VSC: med, A: music
Sports – 24h: VSD: med, VTD: high, VD: med, VDD: high, VSC: high, A: ambient music
Stimuli presentation – The tests were conducted in two different contexts. The first context was
a controlled laboratory environment [4] at Ilmenau University of Technology. We chose a café as
context of use according to the most mentioned usage situations for mobile 3DTV [43]. In the
café, we used the same time slot during the day and same place for each participant to obtain
similar conditions for the study as defined for quasi-experimental settings. The clips were
presented on an 8'' FinePix Real 3D V1 display based on parallax barrier technology. Test
participants were allowed to adjust their initial viewing distance of 45 cm. The two built-in stereo
speakers of the device were used for audio playback. The order of the clips was randomized.
Table 4 Characteristics of the contexts, described based on the Model of Context of Use
for Mobile-Human-Computer-Interaction [46], operationalized in [45][42] (values given as Lab | Café)

Physical context
- Functional place: Laboratory conditions | Student café at TU Ilmenau
- Sensed attributes (Audio, Visual): A: quiet; V: calm, indoor | A: noisy; V: noisy, indoor
- Movements (Movement, Position): M: none; P: straight | M: none; P: lean
- Artifacts (other than answer sheet): none | tea cup
Temporal context
- Duration: 1.5 – 2 hours | 1.5 – 2 hours
- Time of day: varying | between 11.45 am and 3 pm
- Actions-time: extra time | extra time
Task context
- Multitask 1: quality evaluation | quality evaluation
- Multitask 2: none | relax, drink tea/coffee
- Interruptions: none | possible
- Task type: entertain | entertain
Social context
- Persons present: moderator | moderator, other guests
- Interpersonal actions: none | possible
Technical and informational context
- Other systems, properties: none | none
- Level of dynamism: static | dynamic
Other related factors
- Motivations: * | entertain, pass time, relax
- Viewing distance: freedom to adjust | freedom to adjust
- Device volume: freedom to adjust | freedom to adjust
Test procedure – Overall, the test procedure of the study followed the Open Profiling of
Quality approach [10][12] and all evaluations were organized in one single session.
Psychoperceptual evaluation started with the visual screening and the explanation of the test
procedure. In the following training and anchoring, we presented a subset of test items which
covered the full range of quality. Test participants were asked to find their best viewing position
and to practice the evaluation task. Then, an Absolute Category Rating (ACR) according to ITU-T P.910 [24] was conducted to evaluate the overall quality quantitatively. The stimuli were
presented one by one and the participants retrospectively rated the acceptance of the quality on
a binary (yes/no) scale [27] and the overall satisfaction on an unlabeled 11-point scale [47].
Each stimulus was assessed twice. After a short break of about 10 minutes, in which the
participants filled out a demographic data questionnaire, the sensory evaluation was conducted.
Sensory evaluation started with the introduction of participants to the sensory evaluation task.
Then, in attribute elicitation, the participants watched a second subset of test items to develop
their individual quality attributes [10]. In the attribute refinement, they were then asked to define
their quality attributes and, if necessary, to reconsider attributes that they perceived as
not unique or could not define precisely. At the end of the refinement, each of the final
attributes was attached to a 10 cm long line with the labels 'min' and 'max' at its ends. In the final
sensory evaluation, the stimuli were again presented one after the other and the participants
rated overall quality on all of their attributes for each test item. The participants were instructed
to mark the sensation quality of an attribute on the line, 'min' for no sensation of this attribute at
all and 'max' for the maximum sensation of this attribute.
Methods of Analysis – The quantitative data was analyzed using non-parametric statistical
analysis, as the ratings of the test items were not normally distributed (Kolmogorov-Smirnov: p<.05).
The Friedman test was applied to check whether the independent variables had an impact on the dependent
variable. Significant differences between two related items were then tested using the Wilcoxon
test. To compare the binary, non-related acceptance data between the contexts, Pearson's Chi-Square test was applied. For the pair-wise comparison of satisfaction data between contexts we
applied the Mann-Whitney U test. All quantitative data analysis was performed using PASW
Statistics 18.
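A minimal sketch of this analysis chain, using Python and SciPy instead of PASW Statistics, is given below; the rating arrays and acceptance counts are placeholders rather than data from this study.

```python
import numpy as np
from scipy import stats

# Placeholder satisfaction ratings (participants x conditions) for one context.
ratings = np.array([
    [9, 6, 2],
    [8, 5, 3],
    [10, 6, 1],
    [7, 4, 2],
])  # columns: QP30, QP40, QP45

# Friedman test: do the QP levels influence satisfaction?
fr_stat, fr_p = stats.friedmanchisquare(ratings[:, 0], ratings[:, 1], ratings[:, 2])

# Wilcoxon signed-rank test as a pairwise follow-up (QP30 vs QP40).
w_stat, w_p = stats.wilcoxon(ratings[:, 0], ratings[:, 1])

# Mann-Whitney U test comparing one condition between the two contexts.
lab_ratings = np.array([9, 7, 8, 6, 7])
cafe_ratings = np.array([8, 6, 9, 7, 5])
u_stat, u_p = stats.mannwhitneyu(lab_ratings, cafe_ratings, alternative="two-sided")

# Chi-square test on binary acceptance counts (accepted / rejected per context).
acceptance_table = np.array([[18, 3],   # laboratory: yes, no
                             [16, 5]])  # café: yes, no
chi2, chi_p, dof, expected = stats.chi2_contingency(acceptance_table)

print(fr_p, w_p, u_p, chi_p)
```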
The sensory data was transformed into a set of quantitative measures by measuring the
distance from the 'min' label to the participant's tick on the line for each attribute and stimulus. This
results in one matrix per participant, called the individual configuration (Figure
2). The individual configurations can be analyzed by applying Multiple Factor Analysis (MFA) as
suggested in the Extended-OPQ approach [12]. MFA first conducts a principal component
analysis for each configuration and scales each configuration by its first singular value. In
a second step, all configurations are merged into a single matrix and another PCA is conducted.
R and its FactoMineR package were used for the sensory analysis. Finally, Hierarchical Multiple Factor
Analysis (HMFA) is a method that analyzes data sets having a hierarchical structure
(Figure 2). We applied HMFA to compare the two sensory data sets from laboratory and café
for similar information as a final step of checking validity.
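The analysis itself was carried out with FactoMineR in R. As an illustration of the core MFA idea described above (per-configuration scaling by the first singular value, followed by a global PCA on the merged matrix), a simplified, self-contained sketch is given below; the randomly generated configurations are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each participant's individual configuration: items x attributes matrix of
# line-marking distances (0-10 cm from the 'min' label); attribute counts vary.
n_items = 18
configurations = [rng.uniform(0, 10, size=(n_items, int(rng.integers(4, 8))))
                  for _ in range(15)]

def mfa(configs):
    """Simplified Multiple Factor Analysis.

    Each configuration is column-centered and divided by its first singular
    value so that no single participant dominates; the weighted blocks are
    then concatenated and a global PCA (via SVD) is run on the merged matrix.
    """
    weighted = []
    for X in configs:
        Xc = X - X.mean(axis=0)              # center each attribute
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]
        weighted.append(Xc / s1)             # scale block by its first singular value
    merged = np.hstack(weighted)
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    scores = U * S                           # item coordinates on the components
    explained = S**2 / np.sum(S**2)          # proportion of variance per component
    return scores, explained

scores, explained = mfa(configurations)
print(explained[:2])   # share of variance explained by the first two components
```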
Figure 2 The hierarchical structure of the data set with the two hierarchical levels of
individual configurations (green) and the evaluation in different contextual settings (red).
3.2 Results
3.2.1 Psychoperceptual evaluation
Acceptance of overall quality - On average, all presented stimuli at QP30 provided a highly
acceptable quality level of 93%, QP40 stimuli reached an average level just above 50%, and QP45
got an acceptance rate of 12%. Comparison between the results of the two contexts did not
reveal significant differences, with the exception of the contents streetdance and theeye at the medium
quality level (Pearson's χ²: p<.05). Figure 3 presents the results of the overall acceptance
scores content by content and for the two contexts.
Figure 3 Overall acceptance scores for the items under test
Satisfaction with overall quality – Parameter combinations influenced overall quality
satisfaction when averaged over the contents for each of the contexts (laboratory: Fr = 627.705,
df = 2, p < .001; café: Fr = 419.846, df = 2, p < .001). The comparison of QPs between the
contexts revealed slightly better results for QP45 in the café (Mann-Whitney U: U = -2.485,
p<.05). Comparison of QP30 and QP40 did not show significant differences (all comparisons:
p>.05). In a content-by-content comparison of the QPs, significant differences in satisfaction
scores were found only for theeye, which was rated slightly better in the café than in the
laboratory (Mann-Whitney U: U = -2.305, p<.05; other comparisons: p>.05). Figure 4 shows the
overall quality scores averaged over contents and content by content for the different QPs and
contexts. QP30 provided the significantly highest quality satisfaction, while the ratings for QP45 were
worst (all comparisons: p<.001). The results of the content-by-content analysis follow the overall
tendency of the different QPs (all comparisons: p<.001).
Figure 4 Mean satisfaction scores for the items under test. Error bars show 95% CI of
mean.
3.2.2 Sensory evaluation
Test participants developed a total of 91 individual quality attributes in the laboratory (mean: 6,
min: 4, max: 7) and 78 attributes in the café (mean 5, min: 4, max: 8).
Laboratory – The results for the sensory data from the laboratory can be found as item and
correlation plot in Figure 5 and Figure 6. The item plot (Figure 5) shows the loadings of the test
items on the first and second component of the MFA. The first two components of the MFA
explain 65.42% of the variance in the individual data (also called explained variance) with
44.25% and 12.17%, respectively.
Along the first component, the items separate along the different QPs. Items close to the origin
have less impact on the component than those with high (positive or negative) loadings. Along
the second component, a clear separation of all items of content dracula can be found. These
items show high impact for the second component of the model.
Further insight can be obtained from the correlation plot (Figure 6). This plot shows the
correlation of each individual attribute with the first and second component of the MFA model.
The first component is mainly described by attributes like 'blocky' or 'artifacts' on its negative
polarity, while the positive one correlates strongly with attributes like 'clear', 'sharpness of edges',
or '3D effect'. These attributes describe the differences in the perception of video quality and are in
accordance with the separation of QPs along the first dimension. The second dimension
correlates with attributes such as 'double images' on one polarity, and with a few attributes like 'color-fast'
and 'perceivable as one image' on the other. Keeping in mind that content dracula separated from
the other contents along this component, we see that there was a problem with obtaining
a proper 3D perception. The double images may have been caused by a high disparity of this
content. A few attributes correlate with both dimensions, such as '3D effect', 'depth', and
'amount of 3D'. However, the partial plot (Figure 5), in which the impact of each individual
configuration is illustrated, shows that this problem only occurred with some participants. While
some participants show very high loadings for their individual configuration on dimension 2,
others show none.
Figure 5 Item plot and partial loadings for the laboratory. The partial loadings show individual participant's impact.
Figure 6 Correlation plot of the laboratory evaluation.
Café – The sensory results for the café are comparable to those obtained in the laboratory
(Figure 7 and Figure 8). The first two components of the MFA model account for 47.76% of
explained variance (component 1: 33.48%, component 2: 14.28%). As for the laboratory results,
the items of the café results separate along the first dimension according to their QPs (Figure 7).
The separation of content dracula along the second component can be found as well. The
correlation plot (Figure 8) shows high correlation of attributes like 'blocky' or 'blurry' and, in
contrast, of attributes like 'spacious', 'rich in details', and 'clear' with the first component. The
second component correlates with attributes like 'double effect', 'dark', and 'annoying' on its one
polarity. The other polarity correlates with 'bright' and 'realistic'. Again, the partial plots show the
differences in individual contributions to the second dimension.
Figure 7 Item plot and partial loadings for the café. The partial loadings show individual
participant’s impact.
Figure 8 Correlation plot of the evaluation in the context of use
3.2.3 Comparison of results
The final step of the analysis is the comparison of the two MFA models obtained from laboratory
and café. A simple approach to the comparison is a description of the differences and similarities
between the results of the individual models. Overall, the separate analyses have shown that the
first two dimensions of each model describe similar things. While the first component relates to
video quality, the second component refers to quality factors in relation to display and disparity
problems. In addition, the attributes, although coming from different participants, are very similar in
describing the two components. However, a difference can be found in the number of attributes.
In general, attributes whose correlation with a dimension exceeds 0.5 are regarded as more
important than the rest. For the laboratory, 61.5% of all attributes meet this criterion, while for
the café only 44.6% of the attributes do.
A majority of the attributes for the laboratory MFA model show high correlation with the first
dimension while only few correlate with the second one. For the café MFA results it is noticeable
that the amount of attributes along the first component is lower. In addition, there exist more
inter-dimensional attributes, which means that the dimensions are not as well separated as in
the laboratory model.
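The attribute-importance criterion used above can be computed directly from a fitted MFA model. The following is a minimal sketch in R (FactoMineR), assuming a hypothetical data frame ratings_lab that column-binds all laboratory participants' attribute ratings (rows are the test items) and a hypothetical vector groups_lab giving the number of attributes per participant; the object names are placeholders, not the actual analysis script.

library(FactoMineR)

# Fit the MFA; each participant's attribute set forms one (scaled) group.
res_lab <- MFA(ratings_lab, group = groups_lab,
               type = rep("s", length(groups_lab)))

# Correlation of every attribute with the first two MFA dimensions.
attr_cor <- res_lab$quanti.var$cor[, 1:2]

# Share of attributes whose correlation with either dimension exceeds 0.5,
# i.e. the importance criterion used in the comparison above.
important <- apply(abs(attr_cor) > 0.5, 1, any)
round(100 * mean(important), 1)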
Although this descriptive comparison already indicates differences, we also want to test whether the models differ statistically. The HMFA results confirm the previous findings and allow modelling the comparison in a joint analysis of both data sets. In the HMFA result, each test item is plotted at the center of gravity between both data sets. In addition, the partial clouds for each data set are plotted to show the separate impact of laboratory and café data. The HMFA model equals the separate models in terms of explained variance (51.12%; 37.93% for component 1 and 13.19% for component 2) and loadings of the items (Figure 9). In this joint model, the different QPs again separate along the first component. The analysis shows that the deviation between the data sets along the first component of the HMFA model is low. Along the second dimension, we can identify differences in the impact of the partial clouds. The deviation between the data sets along this component is much higher, especially for content dracula. Along the second dimension, the café data shows higher loadings than the laboratory data set does.
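As a rough illustration of how such a joint model can be set up, the following is a minimal sketch in R using FactoMineR's HMFA function, assuming a hypothetical data frame ratings_all that merges the laboratory and café ratings column-wise (rows are the 18 test items, laboratory participants' attributes first) and hypothetical vectors attrs_per_lab_participant and attrs_per_cafe_participant giving the number of attributes per participant; this is not the original analysis script.

library(FactoMineR)

# Hierarchy: level 1 = one group of attributes per participant,
# level 2 = two groups of participants (laboratory vs. café).
hierarchy <- list(
  c(attrs_per_lab_participant, attrs_per_cafe_participant),
  c(length(attrs_per_lab_participant), length(attrs_per_cafe_participant))
)

res_hmfa <- HMFA(ratings_all, H = hierarchy,
                 type = rep("s", length(hierarchy[[1]])))

res_hmfa$eig          # explained variance of the joint model
res_hmfa$ind$coord    # item coordinates (centers of gravity between the data sets)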
Summarizing, the statistical comparison between the profiles revealed comparable discrimination of the test items in terms of visual quality, based on similar factors. A higher sensitivity to 3D quality, in terms of crosstalk between the channels, can be identified for the café context.
3.3 Discussion and Conclusion
The goal of this study was to validate the quality models that can be obtained using the Open
Profiling of Quality approach in a comparison between data from laboratory and context of use.
Within the User-centered Quality of Experience approach [26], the descriptive evaluation of
quality and its evaluation in the context of use are two key approaches that were combined in
this study.
The OPQ approach allows for a combined evaluation of quality using quantitative evaluation and qualitative, descriptive sensory profiling methods, and the results of both methods have been found to complement each other well. The comparison of the two contexts has shown that individual quality factors and their impact on users' perceived quality are very stable across different contexts. From the models, we were able to identify two main components that describe users' perceived quality. The most important component is video quality. As in previous studies, good video quality also includes descriptions of 3D perception [10][12]. This can explain the content-dependency among different QPs found in the quantitative results. Contents makro and 24h are rich in details and offer better 3D perception than other contents do. When video quality decreases, the added value is no longer perceived and the satisfaction with the quality of these two contents decreases accordingly with the loss of details. Besides the video quality, the impact of perceivable double images was found. These descriptions mainly correlate with content dracula and arise from a high amount of disparity and resulting crosstalk between the left and right channel. For this dimension, two important findings were made. First, the analysis shows that these double images were perceived by only some test participants, although participants were screened for the same visual abilities at the beginning of the test. This finding reinforces the need for tools for better screening and description of the test sample [14]. In addition, test participants show higher sensitivity for this component in the context of use. This finding is confirmed by a previous study in which the ease of use and the viewing comfort for mobile 3D television were identified as important components of quality in the context of use [42][45].
Figure 9 Result of the Hierarchical Multiple Factor Analysis (Dim 1: 37.93 %). The partial clouds show the impact of the laboratory data set (black) and the data set from the café (red) on the joint model.
Limitations of this study can be seen in the missing detailed recording of the characteristics of the context of use during the test sessions. Although we tracked special events by noting them down during the sessions, a detailed recording over time, e.g. on video, is missing. This does not allow for a deeper analysis of shared attention and makes it impossible to report it as accurately as in other studies in the context of use [42][45]. However, the results of the study provide valuable knowledge for a deeper understanding of the interaction between quality perception and shared attention in the context of use.
Summarizing the comparison, we have shown that the descriptive quality models obtained by applying the OPQ method are, overall, very similar between the two contextual settings. In both models, two main components, video quality and crosstalk, were identified, and the loadings and correlations of the individual attributes were comparable between the models. This finding contributes to the overall validity of OPQ results, which has now been established in several studies [10][12]. Open Profiling of Quality has become a validated tool for measuring experienced quality factors. However, differences were still identified between the evaluation in the laboratory and in the context of use which OPQ was not able to measure and monitor, e.g.
shared attention. This underlines the importance of contextual evaluations within the User-centered Quality of Experience framework, which we see as a fruitful addition to the holistic evaluation of user-centered Quality of Experience of multimedia systems.
4 Extended OPQ
4.1 Fixed vocabulary and terminologies in descriptive analysis
In contrast to individual descriptive methods, fixed vocabulary approaches evaluate perceived quality based on a predefined set of quality factors. Descriptive evaluation with fixed vocabularies has a long tradition, and several methods have been introduced and applied successfully to different research questions [18][52]. In general, such a fixed vocabulary (also called objective language [6], lexicon [15], terminology [36], or consensus vocabulary [1]) is regarded as a more effective way of communicating research results between the quality evaluators and other parties (e.g. development, marketing) involved in the development process of a product [6] compared to individual quality factors. Fixed vocabularies also allow for direct comparison of different studies or easier correlation of results with other data sets like instrumental measures [5]. In general, vocabularies include a list of quality attributes to describe the specific characteristics of the product to which they refer. These quality attributes are usually structured hierarchically into categories or broader classes of descriptors. In addition, vocabularies provide definitions or references for each of the quality attributes [6][15]. Some terminologies in the field of sensory evaluation have become very popular as they allowed defining a common understanding of underlying quality structures. Popular examples are the wine aroma wheel by Noble et al. [36] and Meilgaard et al.'s beer flavor wheel [34], which share a common wheel structure to organize the different quality terms.
A fixed vocabulary in sensory evaluation needs to satisfy different quality aspects that were introduced by Civille and Lawless [5]. Especially the criteria of discrimination and non-redundancy need to be fulfilled so that each quality descriptor has no overlap with another term. In descriptive evaluation methods which apply these vocabularies, a consensus about the meaning of each of the attributes is needed among assessors [18]. While sensory evaluation methods like Texture Profile [3] or Flavour Profile (see [35]) apply vocabularies that have been defined by underlying physical or chemical properties of the product, Quantitative Descriptive Analysis (QDA) (see [52]) makes use of extensive group discussions and training of assessors to develop and sharpen the meaning and consensus of the set of quality factors. Relating to audiovisual quality evaluations, Bech and Zacharov [1] provide an overview of existing quality attributes obtained in several descriptive analysis studies. Although these attributes show common structures, Bech and Zacharov point out that they must be regarded as highly application-specific, so that they cannot be regarded as a terminology for audio quality in general [1]. A consensus vocabulary for video quality evaluation was developed in Bech et al.'s RaPID approach [2]. RaPID adapts the ideas of QDA and uses extensive group discussions in which experts develop a consensus vocabulary of quality attributes for image quality. The attributes are then refined in a second round of discussions in which the panel agrees on the important attributes and the extremes of the intensity scale for a specific test, according to the available test stimuli.
4.2 The component model as extension of the OPQ method
4.2.1 Open definition task and qualitative descriptions
Within a set of OPQ studies in a specific research area, test participants develop a large number of attributes that all relate to their individual descriptions of perceived quality in that domain. As descriptive analysis targets a broad evaluation of a specific research area with respect to different research problems [52], these descriptors cover a multifaceted view of experienced quality in this domain. During the development of Open Profiling of Quality, the question arose whether it is possible to develop a common vocabulary from these individual attributes for the description and evaluation of experienced quality of audiovisual 3D media. In fact,
OPQ is a suitable approach to investigate and model individual experienced quality factors, but higher-level descriptions of these quality factors, which would allow communicating the main impacting factors to engineers or designers, have been missing.
As a related approach, Samoylenko et al. [16] introduced the Verbal Protocol Analysis method. The goal of this approach was to organize descriptions of the timbres of musical sounds into a common structure. The approach contains three levels of classification. On the first level, Samoylenko et al. classify each verbal descriptor according to its 'logical sense', i.e. whether it describes similarities or differences between two stimuli. The second phase clusters each descriptor according to its 'stimulus relatedness', which refers to either global or specific descriptions. The third hierarchical level finally groups each descriptor according to its 'semantic aspect'. This level differentiates each descriptor into either single features or holistic, conceptual descriptions. In total, ten different classifications are made for each descriptor along the three levels, and the final result is a classification of each descriptor according to these ten classes. Samoylenko et al. use this classification to switch from a descriptor-related analysis to a more general analysis of results within the different groups of the classification. Although this approach is promising as a generalizing step in the analysis of data obtained from free verbalization tasks, it does not allow developing a general vocabulary which can be used in prospective evaluation studies.
The component model is a qualitative data extension that allows identifying the main
components of Quality of Experience in an OPQ study and structuring these components into a
logical structure of categories and subcategories. The component model is included in the
Extended-OPQ approach [12] and extends OPQ with a fourth step of data analysis. It uses data
that is collected during the OPQ test. Within the attribute refinement task of the sensory
evaluation, a free definition task is conducted. The task completes the attribute refinement and
test participants are asked to define each of their idiosyncratic attributes. Like during the attribute
elicitation, they are free to use their own words, but definitions must make clear what an attribute
means for them or to which aspect of experienced quality it relates. In addition, participants are
asked to define a minimum and a maximum value of sensation for each attribute if possible. Our
experience has shown that this task is rather simple for the test participants compared to the
attribute elicitation. After the attribute refinement task, they were all able to define their attributes
very precisely (Table 5). Collecting definitions of the individual attributes is not new within the
existing Free-Choice profiling approaches and definitions are collected in related methods [56].
However, those definitions have only served to interpret the attributes in the sensory data
analysis [10]. In the Extended-OPQ approach, we see these definitions as a second level of
descriptions of the experienced quality factors with the help of the free definition task. These
descriptions are short (one sentence), well-defined, and precise. While the individual attributes
are used for sensory analysis, the component model extension finally applies these qualitative
descriptors to form a framework of components of Quality of Experience. By applying the
principles of the Grounded Theory framework [59] through systematic steps of open coding,
concept development and categorizing, researchers get a descriptive Quality of Experience
framework which shows the underlying main components of QoE in relation to the developed
individual quality factors. Comparable approaches have been used in the interview-based mixed
methods approaches which are also included in the UC-QoE evaluation framework [26][29]. This
similarity makes it possible to directly compare (and combine) the outcomes of the different
methods into a joint model. In the following, we present an application of the component model to
the data obtained within the holistic analysis of mobile 3D video [10][11].
Table 5 Examples of attributes and their definitions obtained in the transmission study of Mobile3DTV [11]

Attribute           | Participant's definition                                        | Minimum                   | Maximum
Fluent movement     | Movement and action get blurry and get stuck in the background | Movements get very blurry | N/a
Image blurred       | Frames are not layered correctly                                | Image not displaced       | Image seems to be highly displaced
Constant background | Background does not change when there is a non-moving image    | N/a                       | Colours and outlines do not change at all
4.2.2 Components of Quality of Experience for mobile 3D video
From the data sets obtained in the evaluations of mobile 3D television, we chose three studies which represent a large variety of research problems. The characteristics of these studies are summarized in Table 6.
For each of these studies, test participants developed a set of individual definitions in the free definition task at the end of OPQ's attribute refinement task. These definitions were taken as independent descriptive data sets for experienced quality and analyzed in a data-driven way in accordance with the principles of Grounded Theory [59] and the instructions given by Jumisko-Pyykkö [26], as follows:
1. Open coding towards concepts: Usually, this step starts by extracting meaningful pieces of data from the transcribed data sets. In the analysis of the free definition data, each definition can be treated directly as a code, as the definitions are short, well defined, and precise in comparison to, e.g., interview data. From these codes, concepts and their properties are identified.
2. Categorization: All developed concepts are further categorized into major categories and, where applicable, subcategories.
3. Frequencies of mention: The frequency in each category is determined by counting the number of participants who mentioned it. Several mentions of the same concept by the same participant are counted only once.
4. Interrater reliability: A second researcher performs coding and categorization for a randomly selected 20% of each data set, and interrater reliability is calculated using Cohen's Kappa [7] (a sketch of steps 3 and 4 is given below).
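To illustrate steps 3 and 4, the following is a minimal sketch in R, assuming a hypothetical data frame codes with one row per coded definition and columns participant, category (assigned by the first researcher) and category_rater2 (assigned by the second researcher for the 20% subsample); the names and structure are placeholders, not the scripts actually used.

library(psych)   # provides cohen.kappa()

# Step 3: frequency of mention per category -- each participant is counted
# at most once per category.
freq <- aggregate(participant ~ category, data = codes,
                  FUN = function(x) length(unique(x)))
freq$percent <- round(100 * freq$participant / length(unique(codes$participant)), 1)

# Step 4: interrater reliability on the randomly selected 20% subsample
# that was coded by the second researcher.
subsample <- codes[!is.na(codes$category_rater2), c("category", "category_rater2")]
cohen.kappa(subsample)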
Table 6 Characteristics of the experiments chosen for development of the QoE component model

Experiment 1: 2D-3D Comparison [10], sample size: 15
  Experiment variables: Video: presentation mode (2D/3D); Audio: presentation mode (mono/stereo)
  Stimuli characteristics under test: Content: 6 contents; Length: ~18 s; Videos: synthetic and natural; Presentation mode: 2D and 3D; Quality level: highly acceptable; Video: mp4, 10-22 Mbit/s, 25 fps; Audio: WMA 9, 48 kHz, 16 bit

Experiment 2: 3D Coding Methods [10], sample size: 15
  Experiment variables: Video: 4 coding schemes, 2 quality levels (low: 74-160 kbps, high: 160-452 kbps bit rate)
  Stimuli characteristics under test: Content: 6 contents; Length: ~10 s; Videos: synthetic and natural; Presentation mode: 3D; Quality level: highly acceptable; Video: H.264/AVC (JMVC 5.0.5); Audio: none

Experiment 3: 3D DVB-H Transmission [12], sample size: 17
  Experiment variables: Video: 3 coding schemes @slice and noslice mode, 2 MFER rates (10%, 20%); Audio: clean audio
  Stimuli characteristics under test: Content: 4 contents; Length: ~60 s; Videos: synthetic and natural; Presentation mode: 3D; Quality level: highly acceptable; Video: H.264/AVC (JM 14.2), MVC (JMVC 5.0.5); Audio: WMA 9, 11/44.1 kHz
The results of the data-driven analysis of the free definition task data show that, in general, experienced quality for mobile 3DTV transmission is constructed from components of visual quality (depth, spatial, temporal), viewing experience, content, audio, and audiovisual quality (Table 7).
Table 7 Components of Quality of Experience, their definitions and percentage of participants' attributes in this category per study

COMPONENTS (major and sub) and DEFINITION (examples):

VISUAL DEPTH: Descriptions of depth in video
  3D effect in general: General descriptions of a perceived 3D effect and its detectability
  Excellence of 3D effect: Artificial, strange, erroneous 3D descriptions (too much depth, flat planes)
  Layered 3D: Depth is described as having multiple layers or structure
  Foreground: Foreground-related descriptions
  Background: Background-related descriptions
VISUAL SPATIAL: Descriptions of spatial video quality factors
  Clarity: Good spatial quality (clarity, sharpness, accuracy, visibility, error-free)
  Color: Colors in general, their intensity, hue, and contrast
  Brightness: Brightness and contrast
  Blurry: Blurry, inaccurate, not sharp
  Visible pixels: Impairments with visible structure (e.g. blockiness, graininess, pixels)
  Detection of objects: Ability to detect details, their edges, outlines
VISUAL TEMPORAL: Descriptions of temporal video quality factors
  Motion in general: General descriptions of motion in the content or camera movement
  Fluent motion: Good temporal quality (fluency, dynamic, natural movements)
  Influent motion: Impairments in temporal quality (cut-offs, stops, jerky motion, judder)
  Blurry motion: Experience of blurred motion under fast motion
VIEWING EXPERIENCE: Users' high-level constructs of experienced quality
  Eye strain: Feeling of discomfort in the eyes
  Ease of viewing: Ease of concentration, focusing on viewing, free from interruptions
  Interest in content: Interest in viewing content
  3D Added value: Added value of the 3D effect (advantage over current system, fun, worth seeing, touchable, involving)
  Overall quality: Experience of quality as a whole without emphasizing one certain factor
CONTENT: Content and content-dependent descriptions
AUDIO: Mentions of audio and its excellence
AUDIOVISUAL: Audiovisual quality (synchronism and fitness between media)
Percentage of participants whose attributes fall in each category, per study (Exp1: N=15, Exp2: N=15, Exp3: N=17): 86.7, 66.7, 26.7, 46.7, 33.3, 80.0, 6.7, 33.3, 26.7, 66.7, 58.8, 73.3, 66.7, 26.7, 46.7, 33.3, 73.3, 80.0, 100.0, 80.0, 40.0, 73.3, 80.0, 76.5, 52.9, 17.6, 47.1, 70.6, 47.1, 26.7, 6.7, 20.0, 53.3, 60.0, 40.0, 6.7, 29.4, 52.9, 88.2, 17.6, 20.0, 40.0, 40.0, 53.3, 20.0, 6.7, 13.3, 33.3, 35.5, 52.9, 11.8, 17.6, 20.0, 13.3, 13.3, 40.0, 6.7, 11.8, 17.6, 11.8, 29.4, 23.5, 17.6, 35.3
The component model provides results converging with those obtained in the sensory evaluations in terms of components and their importance. The most important category of the component model is visual quality, which confirms the findings of the sensory analysis. Although the weighting of its subcomponents spatial, temporal and depth differs among the studies, the overall findings show that especially the artifact-free perception of the video (clarity, fluency, excellence of 3D) determines participants' components of quality. In addition, the model shows that test participants often use complementary descriptions of quality, which leads to contrary subcategories comparable to descriptions along dimensions in the sensory results. For example, visual spatial quality is described positively in terms of the detection of objects and their details, while, in contrast, the same aspect is described negatively as different structural imperfections such as blocking impairments and visible pixels. This juxtaposition can also be identified for other components, e.g. fluent motion vs. influent motion or eye strain vs. ease of viewing. Finally, it is remarkable that the results of the component framework analysis confirm the findings of the sensory evaluations for the audio and audiovisual components. While in the sensory analysis one could still argue that the audio-related components are simply overwhelmed by the high impact of the visual components, the model shows that only few attributes were developed in relation to audio and audiovisual quality components. This finding confirms the sensory results. Nevertheless, the identification of audio and audiovisual as separate components is important for the holistic view of the developed component model.
Within the work in the UC-QoE framework development, the component model was used in a
joint analysis with qualitative data obtained by interviews in contextual studies [45]. The
comparable characteristics of the data and the resulting separate component models allowed
combining the different data sets into one descriptive Quality of Experience model for mobile 3D
video. The results presented by Jumisko-Pyykkö et al. [28] confirm the results of the OPQ
component model and generalize the model. Table 8 lists the final components of QoE for
mobile 3D video [28]. Especially through the contextual data, new emphasis is put on context-dependent components within the category of Viewing Experience. Overall, the developed joint results present a general descriptive model of QoE for mobile 3D media. Jumisko-Pyykkö et al. [28] conclude that important steps for further work on the descriptive model relate to validation and operationalization.
Table 8 Components of Quality of Experience for 3D video on mobile devices and their definitions

COMPONENTS (major and sub), bipolar impressions, and DEFINITION (examples):

VISUAL QUALITY: Descriptions of quality of the visual modality, divided into depth, spatial and motion quality
  DEPTH: Descriptions of depth quality in video, characterized by perceivable depth, its natural impression, composition of foreground and background layers, and balance of their quality
    Perceivable depth (Perceivable/Not perceivable): Ability to detect depth or a variable amount of depth as a part of the presentation
    Impression of depth (Natural/Artificial): The 3D effect creates a natural, realistic and error-free impression instead of an artificial and erroneous impression (e.g. too much depth, double objects, shadows, seeing through objects)
    Foreground-background layers (Smoothly combined layers/Separate layers): Depth is composed of foreground and background layers and the impression of the transitions between these layers can vary from smooth to distinguishable separate layers
    Balance of foreground-background quality (Balanced/Unbalanced): Balance between the excellence of foreground and background image quality (e.g. sharp foreground, blurry background or vice versa, or they are otherwise not in balance)
  SPATIAL: Descriptions of spatial image quality of video, characterized by clarity, block-freeness, colors, brightness, contrast and ability to detect objects and edges
    Clarity of image (Clear/Blur): Clarity of image overall -- clear (synonyms: sharpness, accuracy, visibility) vs. unclear (synonyms: blur, inaccurate, not sharp)
    Block-free image (Block-free/Visible blocks): Existence of impairments with visible structure in the image (e.g. blockiness, graininess, pixels)
    Color, brightness and contrast (Good/Poor): Excellence of colors, brightness and contrast
    Objects and edges (Accurate/Inaccurate): Ability to detect necessary objects and details, their edges and outlines
  MOTION: Descriptions of motion of video, characterized by fluency, clarity and nature of motion
    Fluency of motion (Fluent/Influent): Excellence of natural fluency of motion -- fluent (dynamic, natural) vs. influent (cut-offs, stops, jerky)
    Clarity of motion (Clear/Blurry): Excellence of clarity of motion (e.g. accuracy under fast movement or movement out of screen) -- clear, sharp vs. blurred, pixelated
    Nature of motion (Static/Dynamic): Nature of motion in the content or camera movements -- static (synonym: slow) vs. dynamic (synonym: fast)
VIEWING EXPERIENCE: Descriptions of viewing experience, characterized by ease and pleasantness of viewing, enhanced immersion in it, visual discomfort and impression of improved technology and overall quality
  Ease of viewing (Easy/Difficult): Easy to concentrate on viewing (e.g. free from extra effort and learning, viewing angle does not interrupt viewing)
  Pleasantness of viewing (Pleasant/Unpleasant): Pleasurable viewing experience, also for a longer period of time (e.g. 15 min)
  Enhanced immersion (Enhanced/Not enhanced): Feeling of enhanced immersion into the viewing experience (impression of becoming a part of the events in the content, involvement, fun and improved impression of naturality, life-likeness, tangibility and realism)
  Visual discomfort (Experienced/Not experienced): Feeling of visual discomfort (eye-strain) and descriptions of related discomfort symptoms (headache, general discomfort)
  Comparison to existing technology (Improved/Not improved): Impression that the provided quality of the new technology (3D) is higher than the quality of comparable existing technology (e.g. 2D video on a mobile device)
  Overall quality (Good/Bad): Impression of excellence of quality as a whole without emphasizing a certain factor (e.g. excellence over time, relation between erroneous/error-free)
CONTENT: Descriptions of content, their content-dependency and interest in viewing content
OTHER MODALITIES INTERACTIONS: Descriptions of quality of the audio modality and interaction between quality of audio and visual modalities
  Audio: Audio and its excellence
  Audiovisual: Bimodal audiovisual quality (synchronism and fitness between media) and its excellence
The results of a comparison study between Open Profiling of Quality and a newly introduced
method called Conventional Profiling, in which the Free-Choice Profiling task is substituted with
a Conventional Profiling task using the QoE component model as fixed vocabulary for sensory
evaluations, are presented in the following section.
5 Comparison of OPQ and CP
5.1 Introduction and research problem
Systematic comparison of different research approaches is an important issue when selecting a proper research method for a specific research problem. In addition, it is a key aspect in the methodological work on new research approaches. Comparisons between research methods are needed to provide guidelines for the effective use of these tools by practitioners. Research methods are composed of a collection of methods or techniques which aim at producing information with as small a probability of error as possible [49]. For example, subjective quality evaluation methods contain components such as sample selection, scaling, evaluation task, moment of rating, and stimuli and their presentation ([26][39]). Some of these components have been independently compared to estimate reliability and validity (e.g. [40]), but such comparisons provide only a limited view of the benefits and weaknesses of the whole method and cover only a few dimensions to guide the selection between methods. Method comparisons can cover performance-related aspects (e.g. accuracy in different quality ranges, validity, reliability, and costs), complexity (e.g. ease of planning, conducting, analyzing, and interpreting results), and evaluation factors (e.g. number of stimuli, knowledge of research personnel) (e.g. [32][33]). However, no extensive criteria for method comparisons are available for multimedia quality assessment research to guide practitioners' work.
Recent research has proposed novel methods and techniques to capture Quality of Experience, but their applicability is unknown. These methods extend the quantitative evaluations [39] by the use of parallel descriptive tasks, psychophysiological measures (e.g. eye tracking, galvanic skin response), and hybrid methods to assess quality in the natural context of use (see overview in [26]). Among these, mixed methods combine quantitative and descriptive methods in one study and have slowly started to gain popularity in subjective evaluation research; several competing methods exist.
In these mixed methods approaches, quantitative preferences are collected using the conventional methods provided by the standardization bodies (e.g. ACR [39]). The descriptive part of these methods is either based on interview techniques or on vocabulary-based approaches (for an overview see [10]). We introduced a method called Open Profiling of Quality (OPQ) [13] in which we extend quantitative assessments with an adaptation of Free-Choice Profiling (FCP). Naïve participants generate their own vocabularies to describe and evaluate their quality perceptions. While OPQ and its individual vocabulary approach have provided good results in the evaluations of multimedia quality by complementing and explaining the quantitative-only evaluations [10][12], the extension of this new method also increases the costs of studies (overview in [26]). A possibility to shorten the sensory evaluation is to use a fixed vocabulary for all participants instead of individual attributes, as is proposed in different consensus-based approaches (e.g. [2]). Our Conventional Profiling approach therefore operationalizes a set of components of Quality of Experience for mobile 3D video [28] (see Section 4.2.2). These components are taken as a fixed vocabulary with which the test participants evaluate their perceived quality in the sensory evaluation. Although this sounds promising in terms of easier implementation, the benefits and weaknesses need to be compared systematically. In our mixed methods research approach, 'the long-term goal is to support the idea of safe development of these instruments by understanding their benefits and limitations when capturing deeper understanding of experienced multimedia quality' [13].
5.2 Comparison criteria and comparison model
A literature review across several fields of research shows that the criteria applied to describe different abilities of research methods vary heavily among these fields. Comparable approaches have been taken for different psychoperceptual quality evaluation methods in the ITU recommendations [39][38]. In recommendations like ITU-T P.910 [39], different research methods are described and short guidelines are offered for purpose-directed selection of the appropriate method. Within these guidelines, mostly stimulus-related factors, e.g. perceivable quality range or discrimination power, are taken into account to direct the selection process. Similar approaches can be found in the juxtaposition of different sensory evaluation methods in the food sciences [31][35]. Here, the offered guidelines are oriented along three main criteria in a more research-problem-related approach and differentiate in accordance with the 'three primary questions about products': 1) questions about acceptability, 2) questions about sensory analysis, and 3) questions about the nature of differences [31]. These two approaches provide first guidelines for key aspects in the comparison of research methods. Further comparison criteria can be identified from other fields of research going beyond the domain of quality evaluations.
The most general comparison criteria are described in the social sciences. Here, performance indices are well-established tools to measure differences between methods in terms of their degree of scientific rigor. These criteria are primarily validity and reliability [4][22], but generalization, replication, and objectivity are also found to be important criteria in related research approaches. Within the social sciences, validity and reliability are considered principles of good research. These criteria offer a very general way of comparing methods but do not offer specific guidelines with respect to effort or costs per method.
Studies on usability extend the criteria of validity and reliability with other performance-related criteria like effectiveness, efficiency and robustness, related to economic aspects [32]. In addition, Markopoulos and Bekker [32] list criteria for describing different usability methods: purpose of the test, the artifact tested, the interaction tasks, participants, facilitator, environment/context, procedure, capture of data, and the characteristics of the test participants. Other studies on the comparison of usability tests extend the definition of effectiveness and describe it in terms of cost-effectiveness and effectiveness in terms of results (e.g. number of usability problems identified) [21][50].
In the food sciences, some effort has also been made to compare different sensory evaluation methods. While many of the comparisons focus on the pure juxtaposition of results [53][37], McTigue et al. [33] describe a set of requirements for a holistic comparison of descriptive methods. In a comparison of four descriptive analysis approaches, they applied the following criteria: subject selection, number of subjects, training, samples evaluated, replications, method of measurement, analysis of data, outcome and professional personnel [33]. In addition to results, their comparison criteria describe similarities and differences in terms of requirements in time, test items, personnel, and need for technical equipment. The importance of including test personnel and technical equipment in a systematic comparison model was also found by Yokum and Armstrong [54]. In a comparison of several forecasting methods, implementation-related criteria like ease of use, ease of interpretation and cost/time were rated as very important for the overall comparison, beyond validity, reliability, or objectivity. Stecher et al. [51] found similar criteria when comparing assessments in vocational education.
This short review shows the different nature of comparison criteria in different fields of research and underlines the importance of an extensive comparison model to guide between-method comparisons. In a further step we categorized the collected comparison criteria to build up a comparison model. The model has a structure from particular criteria to more general categories that we identified during the development process: excellence-related, economy-related, implementation-related, and assessment-related criteria (Figure 10). This structure is beneficial for the comparison of methods as comparisons can be done according to the general categories as well as by means of particularly selected criteria. In the following, the four categories and the most relevant criteria in each category are described in more detail.
Figure 10 Comparison Model with the four categories and corresponding sub-criteria
Economy related criteria - This category comprises criteria that measure the economic potential of a method. Thereby, the amount of time and the costs are related to the results so that efficiency can be estimated. Furthermore, effectiveness assesses the performance of a method, its completeness and accuracy, and whether the desired goals are achieved [23]. Time and costs of a method have to be measured with regard to its results to compare the efficiency of methods.
Excellence related criteria - Excellence related criteria measure the quality of a test. The criteria validity and reliability are known as quality criteria in the social sciences [4]. They are the main prerequisites that account for good research practice. General practice for ensuring validity is a careful and thoughtful test design. Within the test design it is important to consider the things that could add bias to the test and threaten validity. Coolican [7] gives a good overview of threats to validity like history effects, sampling bias or confounding variables, which require thorough consideration in the test development. Reliability is another very important test quality criterion. Correlation coefficients are used to measure reliability [7], and coefficients above 0.75/0.8 are considered to represent good reliability. Besides validity and reliability, further excellence related criteria are included in the comparison model (Figure 10). Best practice for a good test design is to describe them carefully in the test development process and to discuss and interpret them.
Assessment related criteria - Assessment related criteria concern the global characteristics of the test. The whole test design depends on the purpose of the test. When designing a test to answer a particular research question, one has to think about the context of the test and an appropriate environment. Moreover, the test participants, their gender, age, expertise and group composition are relevant, as are the personnel and their demands and duties.
Implementation related criteria - Implementation related criteria concern the implementation of a test. A detailed description of the test procedure is very important to be able to reproduce the test. Within the test description it is also necessary to describe the test items, their production and how data was captured in the test. In the context of personnel demands and cost, the complexity of a test is an issue to be considered. The complexity of a test can be split into four subcategories: the ease of implementation, how easily the test can be used, the ease of using the data, and how easily the results can be interpreted.
Table 9 Selected components for method comparisons

Category: Implementation related
  Component: Test procedure
  Definition: Test procedure + methods of analysis
  How to compare/measure: Detailed description of the procedure (number of sessions, method of measurements, data analysis, and outcome)

Category: Excellence/Economy related
  Component: Test results
  Definition: Results/outcome of the test – interpretation of data
  How to compare/measure: Describe and interpret results and differences – what does the data tell us

Category: Economy related
  Component: Amount of time
  Definition: The time that it takes to develop, implement, conduct, analyze the test and publish the results
  How to compare/measure: Measure the time in minutes and compare between methods

Category: Economy related
  Component: Costs
  Definition: Costs of the test, depending on the time and complexity of the test
  How to compare/measure: Calculate the costs depending on task demands and amount of time
Summarizing, the model comprises many criteria originating from different research areas. Although we have collected a substantial set of criteria, a next step towards making the model a well-validated tool is the operationalization of its components based on defined measures. In the following, we present the results of our work on comparing two different mixed methods approaches for multimodal quality assessment based on an initial set of comparison attributes – descriptions and measures of the test procedure, the results, the amount of time, and the costs – which we see as key criteria for a method comparison (Table 9).
5.3 Comparison Study: Comparing OPQ and CP
5.3.1 Research Method
We applied Open Profiling of Quality (OPQ) and a new variation called Conventional Profiling (CP) in which OPQ's Free-Choice Profiling approach is substituted with a sensory evaluation based on a fixed vocabulary (see Section 4).
5.3.1.1 Participants
A total of 63 test participants took part in the study. All test participants were screened for normal or corrected-to-normal vision, color vision and 3D vision. All test participants can be classified as naïve assessors as they had experience neither in the domain of research nor in subjective quality evaluation studies. Each test participant completed the psychoperceptual evaluation. For the qualitative part of the study, 15 randomly selected participants were assigned to Conventional Profiling (CP) and 16 participants to OPQ.
5.3.1.2 Variables and their production
The same contents and variables as in our context study (Section 2) were used.
5.3.1.3 Stimuli presentation
The tests were conducted in a laboratory at Ilmenau University of Technology and test
conditions were arranged according to the specifications in ITU-T P.910 [39]. A digital Viewer
FinePix REAL 3D V1 from FUJIFILM with a resolution of 640x480 pixels was used for playback
of the 3D videos. The viewing distance was set to 50cm initially, but test participants were
allowed to adjust their viewing distance for the best stereoscopic experience. The integrated
loudspeakers of the FinePix V1 were used for audio playback due to a missing headphone
connection. According to the speakers‟ maximum sampling rate audio was represented with a
sampling rate of 11 kHz. Different playlists in pseudo-randomized orders were used for video
presentation. During the psychoperceptual evaluation each test item was presented twice. In the
OPQ and CP task each video was presented once.
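For illustration, a minimal sketch in R of how such pseudo-randomized playlists could be generated is given below; the content names and QP levels correspond to the items used in this study, while the seed handling and any further ordering constraints are hypothetical.

contents <- c("dracula", "24h", "skydive", "streetdance", "theeye", "makro")
qps <- c("qp30", "qp40", "qp45")
items <- as.vector(outer(contents, qps, paste, sep = "_"))

set.seed(42)  # a different seed (playlist) would be used per participant
playlist_acr <- sample(rep(items, 2))  # psychoperceptual evaluation: each item twice
playlist_sensory <- sample(items)      # OPQ/CP sensory evaluation: each item once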
5.3.1.4 Test Procedure
Open Profiling of Quality (OPQ) and Conventional Profiling (CP) are both methods that extend a psychoperceptual evaluation with a descriptive profiling task. The psychoperceptual evaluation started with training and anchoring. Test participants trained to watch the scenes in 3D and practiced the evaluation task. During this task the whole range of constructed qualities and contents was presented. Absolute Category Rating (ACR) was applied in the following psychoperceptual evaluation to rate the overall quality on an unlabeled 11-point scale [39]. In addition, the acceptance of the overall quality was rated on a binary (yes/no) scale [27]. After a short break, test participants conducted either a Free-Choice Profiling task according to Open Profiling of Quality or the Conventional Profiling task.
Following Open Profiling of Quality, the psychoperceptual evaluation is complemented with an adaptation of Free-Choice Profiling [13]. It consists of four subsequent steps: 1) Introduction, 2) Attribute elicitation, 3) Attribute refinement, and 4) Sensory evaluation. We followed these steps as described in [13]. The introductory "apple task" and an extensive vocabulary elicitation and refinement are followed by the sensory evaluation. OPQ allows each test participant to develop their own individual quality attributes which they use to evaluate perceived quality. In contrast to its original description [13], the whole OPQ study was conducted in one session.
For the CP study, we applied all attributes included in the component framework of Quality of Experience for 3D video [28] (see Section 4, Extended OPQ), except the content component. In the test, participants got a list with the quality components, their descriptions, and the scale labels [28] (Table 8) to become familiar with the attributes. They used them in a training task, in which they watched and rated 6 videos on a scoring card comparable to the one used in OPQ, but with the fixed attributes. Finally, they performed the quality evaluation with the predefined quality components on all test items.
5.3.1.5 Methods of Analysis
For the psychoperceptual evaluation, nonparametric methods were used for the analysis, as the ratings were not normally distributed (Kolmogorov-Smirnov: p < .05). Several ordinal dependent variables were analyzed using Friedman's test, and pairwise comparisons were analyzed with Wilcoxon's test [7]. Frequencies were counted for the acceptance ratings. PASW Statistics 18 was used for the quantitative data analysis. Sensory data was analyzed using R with the FactoMineR package. A Multiple Factor Analysis (MFA) and a joint Hierarchical MFA (HMFA) were calculated from the OPQ and CP data sets [12].
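As a rough illustration of this analysis chain, the following is a minimal sketch in R, assuming a hypothetical long-format data frame acr with columns participant, content, qp (coding quality parameter) and score (the 11-point ACR rating); it is not the script actually used for the reported analysis.

# Normality check per QP level, motivating the nonparametric tests.
by(acr$score, acr$qp, function(x) ks.test(as.numeric(scale(x)), "pnorm"))

# Friedman's test over the QP levels (scores averaged over contents per participant).
qp_means <- aggregate(score ~ participant + qp, data = acr, FUN = mean)
friedman.test(score ~ qp | participant, data = qp_means)

# Pairwise Wilcoxon signed-rank tests between the QP levels.
pairwise.wilcox.test(qp_means$score, qp_means$qp, paired = TRUE,
                     p.adjust.method = "bonferroni")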
5.3.2 Results
5.3.2.1 Psychoperceptual evaluation
The presented stimuli reached an acceptance level of 52.8% in total. Items with qp30 were accepted with a minimum of 82.5% per content and with 94.2% over all contents. Items with qp40 reached an acceptance level of 52.4% over all contents, whereas items with qp45 were not acceptable at all (11.9%) (Figure 11). The coding quality parameter influenced the overall quality perception when averaged over all contents (Fr = 1363.028, df = 2, p < .001). Figure 12 shows the mean satisfaction scores averaged over all participants for the different contents and quality parameters. Videos with qp30 provided the most satisfying quality compared to qp40 and qp45 (all comparisons: p < .001).
Figure 11 Overall Acceptance scores for the items under test
Figure 12 Mean Satisfaction scores for all contents and quality parameters
5.3.2.2 Sensory evaluation OPQ data
The first two components of the MFA model from the OPQ data account for 56.42% of explained variance (dimension 1: 44.25%, dimension 2: 12.17%). The correlation plot in Figure 13 presents the attribute distribution in the perceptual space. The negative polarity of dimension 1 is described by attributes like grainy (e.g. p34.20, p31.12), artifacts (p1.39), or stumbling (p47.60). The positive polarity of dimension 1 is described by attributes like 3D effect (e.g. p37.67, p22.4), sharpness (e.g. p38.78, p47.57), naturalness (p38.81), or details (p22.7). Items with qp30 (see Figure 14, items with black dots) are at the positive polarity of dimension 1 and described with positive attributes, whereas items with qp45 are at the opposite polarity and described with negative video quality attributes. Dimension 2 in its positive polarity mainly describes the ghosting effect of some items with the attribute double pictures (p29.65, p37.70, p27.25, p2.31). Its negative polarity shows partial correlation with attributes like bright (p49.54), perceivable as one picture (p17.38), and nice colors (p29.63). The test items are distributed along dimension 1 according to their quality parameter, as can be seen in Figure 14. Dimension 2 mainly has an influence on content dracula, which can be explained by the ghosting effects perceived within this content.
Figure 13 Correlation plot of OPQ results – only attributes having more than 50% of explained variance are shown (attributes are numbered consecutively and presented as participant number.attribute number, e.g. p27.24)
5.3.2.3 Sensory evaluation CP data
For the CP data set we calculated the MFA over all participants as well. The CP model resulting from the MFA accounts for 53.46% of explained variance (dimension 1: 44.04% and dimension 2: 9.42%). The item plot (Figure 14, CP: red items) shows that the test items separate along the first component according to the different QPs. Along the second component, especially the items of content dracula separate from the other test items. For the sake of clarity, we averaged the resulting correlations for all 19 attributes over the participants and present the averaged attribute correlations in Figure 15. While all these averaged attributes show high correlation with dimension 1 and low correlation with dimension 2, the non-averaged results (gray arrows) also reveal high correlation of some attributes with dimension 2. For dimension 1, the highest correlation is given for attributes like 'clarity of motion', 'objects and edges', 'color, brightness and contrast', and 'clarity of image'. For dimension 2, the correlation plot shows that the attributes with high correlation differ and are of classes like 'ease of viewing', 'fluency of motion', or 'perceivable depth'. We furthermore calculated the correlations of each individual PCA result with the overall MFA model (Figure 16). For all participants, the first dimension (F1) of each individual configuration correlates with MFA Dim 1, and many of the individual F2 correlate with MFA Dim 2. The structure of the individual data sets therefore seems to be comparable. However, no correlation of the averaged attributes with Dim 2 can be found in the averaged MFA results (Figure 15). Given the individual attributes (gray arrows in Figure 15), this suggests that participants may have understood and used the attributes in different ways and that the provided attributes may not have been adequate to describe all perceived quality characteristics.
Figure 14 MFA item plot (OPQ items in black and CP items in red)
Figure 15 Correlation plot of averaged MFA correlations over all participants and non-averaged results (gray arrows) with some exemplary labels (a1: perceivable depth, a2: impression of depth, a3: fore/background layers, a4: balance of fore/background quality, a5: clarity of image, a6: block-free image, a7: color, brightness and contrast, a8: objects and edges, a9: fluency of motion, a10: clarity of motion, a11: nature of motion, a12: ease of viewing, a13: pleasantness of viewing, a14: enhanced immersion, a15: visual discomfort, a16: comparison to existing technologies, a17: overall quality, a18: audio, a19: audiovisual)
Figure 16 Individual configurations and their partial plots in the overall MFA model (Dim 1: 44.04 %, Dim 2: 9.42 %)
5.4 Systematic Comparison of Methods
Open Profiling of Quality and Conventional Profiling both provide valuable results for understanding the underlying quality factors of the psychoperceptual evaluation. However, we aim at a holistic understanding of the similarities and differences between the methods. In the following, we compare the two methods based on our selected comparison criteria. For the whole comparison, we assume that both methods are valid and reliable thanks to a careful test design and the usage of both methods in earlier evaluations (described in [10][12]).
5.4.1 Test results
To be able to compare both results on a statistical basis, we conducted an HMFA (Figure 17). The result and the partial clouds of the OPQ and CP data sets show good agreement in the discrimination of the test items. Especially the deviation along the first dimension is low, so that both methods seem to be able to classify video quality similarly. Small differences can be found along dimension 2. Here, the OPQ data set seems to be more sensitive in capturing the impact of ghosting effects and double pictures caused by crosstalk.
5.4.2 Test procedure
Both methods follow the same overall procedure of psychoperceptual evaluation and sensory
analysis. The methods OPQ/CP differ in the important aspects of attribute
generation/familiarization and refinement/training. In OPQ participants generate and refine their
own vocabulary for the quality evaluation, whereas in CP we provide a consensus vocabulary
and participants have to familiarize themselves with the predefined quality attributes in terms of
meaning and usage.
Figure 17 Superimposed representation of the partial clouds of the HMFA of OPQ dataset
(black dots) and CP dataset (red dots)
5.4.3 Amount of time
We measured the time per participant for conducting the whole test session in minutes and calculated the average amount of time. The mean test duration for the psychoperceptual evaluation at the beginning of both methods was 31.9 minutes (standard deviation (sd) = 5.5). Participants needed on average 51.1 minutes (sd = 5.3) for the CP task and 40.5 minutes (sd = 7.6) for the OPQ task. This shows that participants needed significantly more time for the CP task than for the OPQ task (t-test: T = 4.493, df = 29, p ≤ 0.001). The large number of 19 attributes and the inexperience of our participants may have made the CP task time-consuming. This time could be shortened by using more experienced participants and fewer attributes.
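The duration comparison corresponds to a standard two-sample t-test; a minimal sketch in R is given below, assuming two hypothetical vectors cp_minutes and opq_minutes with the per-participant task durations.

# Pooled-variance two-sample t-test on the task durations
# (df = n_cp + n_opq - 2 = 29 for 15 and 16 participants).
t.test(cp_minutes, opq_minutes, var.equal = TRUE)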
5.4.4 Costs
Costs of a study depend on the personnel demands for different tasks and the amount of time that is needed for the study from planning to reporting of its results. Within our study, the conduction time was higher for CP than for OPQ, and therefore OPQ produced lower costs. However, carefully conducted OPQ sessions demand more experienced researchers compared to CP. The guidance of test participants in attribute elicitation and refinement, important steps in individual vocabulary profiling, needs more experience for correct test conduction. We have found that it is very difficult to compare the global costs of a method, as the measurement of task demands and the amount of time for certain tasks is not straightforward and depends on the prior knowledge and experience of the researchers with one or the other method.
5.4.5 Research purpose
Beyond the comparison of results and costs, the research purpose itself is a crucial criterion for a holistic comparison. The research purpose of a study is usually the starting point when investigating a certain research question. OPQ is especially suitable for research areas in which the quality perception is not yet fully understood and no consensus vocabulary exists. OPQ studies help to identify crucial individual quality attributes, but communication of the results to other parties of a development process is hard due to the individual nature of the attributes. In contrast, CP is useful in an already explored research area with a defined set of quality components. If these components are well defined and validated, CP offers a good method for the discrimination of different test stimuli and good communication of results based on the fixed components.
5.5 Discussion
We developed an extensive comparison model to guide between-method comparisons based on
a literature review, and we compared two mixed methods using a subset of the model's criteria.
The model contains four main criteria, called economy, excellence, implementation, and
assessment, which include 24 sub-criteria. At the current stage, the model gives an overview of the
evaluation criteria, but further definition and operationalization are needed on the way towards an
applicable comparison model. Furthermore, it is essential to reflect the different parts of the
model in relation to the central methods in multimodal quality evaluation research (e.g.
[24][10][26]). This would assist practitioners in their choice of a suitable research method.
The comparison of the two mixed methods on a subset of implementation, effectiveness and
economy-related criteria showed benefits and weaknesses of both methods depending on
the measured dimension. Regarding implementation, conventional profiling provides a consensus
vocabulary for the intended research problem and requires training for each participant to use it.
OPQ requires a multistep procedure to develop an individual's own vocabulary to be used in
rating, but it does not require prior knowledge of the research problem. Analysis of this criterion
suggests that OPQ is more suitable for identifying the quality of a novel phenomenon. According
to the economy-related criteria, conventional profiling is slightly more time-consuming (26%) in
the data-collection phase than individual profiling. However, this conclusion depends strongly
on the number of rated attributes as well as on the presentation of stimuli. The result of the
joint HMFA was similar in the dominating dimension (positive-negative), indicating good accordance
of both methods, and is validated by comparable results from previous studies
[10]. However, small differences appeared in the descriptions of crosstalk, which can be
explained by an inconsistent use of the fixed vocabulary to describe this artifact; the OPQ method
seems to capture this phenomenon in more detail. However, the currently used
subset of comparison criteria does not allow drawing strong conclusions about a universal
preference between the methods. Further work needs to address the comparison between the
methods more holistically, utilizing the developed comparison model.
6 Prototype Study: Usability and quality experience with the
final mobile 3D prototype in the context of use
6.1 Preliminary planning
Different coding methods and transmission parameters were developed and examined within the
core technology development of Mobile3DTV. Information about the best coding methods and
transmission parameter combinations was already gathered in previous experiments
[10][11][12] and used for further optimization of the Mobile3DTV system. In a final step, the
developed technologies were combined into one end-product. This end-product is developed
according to the user requirements for mobile 3D television and video [30] and the quality
evaluation results from previous experiments.
The goal of the study is to validate the optimized Mobile3DTV system and the prototype in field
settings under usage conditions that are as natural as possible.
6.1.1 Participants
We plan to conduct the study with at least 30 participants. Participants should reflect the mobile
3DTV user profiles drawn from the previous studies. An end-user group aged from 20 to 45 is
assumed. All participants have to be screened for normal or corrected-to-normal visual acuity
(myopia and hyperopia, Snellen index 20/30), color vision using the Ishihara test, and stereo vision
using the Randot Stereo test (≤ 60 arcsec).
6.1.2 Test design
A factorial, related design [7] is to be applied to the experiment. The subject variables are the
content and the presentation modes. Participants do the quality evaluation in three
contexts.
6.1.3 Test procedure
A combination of a psychoperceptual quality evaluation and descriptive interviews is chosen.
The psychoperceptual quality evaluation consists of a pre-test, a training session, and the quality
evaluation. For the descriptive part we use a semi-structured interview after the
psychoperceptual evaluation in each context and at the end of the tests. Furthermore,
participants have to complete post-task tests, including a demographic and a workload questionnaire,
at the end of the tests. Quality evaluations are carried out in every context (laboratory, bus, and
café), including quantitative ratings and qualitative interviews. An overall interview concludes the
experiment after the quality ratings in all three contexts.
6.1.3.1 Story
The whole study is planned like a big mobile 3D video usage story. Participants arrive at the
laboratory, do the pre-tests, choose the content they want to watch, and do the first quality
evaluation in the laboratory as our control setting. According to a short story, 15 participants
then take a bus to a café and do the quality evaluation in both contexts, first in the bus and
afterwards in the café. Their task is to imagine that they take the bus to meet a friend in the café
and wait for him there. The other 15 participants walk to the café to meet a friend there and go
home by bus afterwards.
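A minimal sketch of how such a counterbalanced assignment of the 30 participants to the two context orders could be generated is given below; the participant identifiers, the order labels, and the random seed are assumptions for illustration.

import random

# the two context orders described above; the laboratory session always comes first
ORDERS = [("laboratory", "bus", "cafe"), ("laboratory", "cafe", "bus")]

def assign_orders(participant_ids, seed=42):
    """Randomly split participants into two equally sized groups (15/15)."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {pid: ORDERS[0] if i < half else ORDERS[1] for i, pid in enumerate(ids)}

if __name__ == "__main__":
    plan = assign_orders(range(1, 31))
    print(plan[1], plan[30])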
6.1.3.2 Psychoperceptual evaluation
Pre-test
Before the evaluation starts, participants are introduced to the test and sign a data protection
policy. The screening of participants for myopia, hyperopia, color vision, and stereo vision is
done before the actual start of the test session.
Accommodation and training
Accommodation and training takes place only in the laboratory setting to familiarize the
participants with the device and the stereoscopic videos. All participants watch some high-quality
stereoscopic videos to get used to the device and to find a good viewing position for an optimal
three-dimensional viewing experience.
In the training and anchoring task we show the participants a subset of the test items. This
subset represents the extreme values of the items and all contents. The intention of the training
is to familiarize participants with the evaluation task and the usage of the quality scales.
Furthermore, participants are not told about quality factors, so that they are expected to use their
own quality reference. In the evaluation task they rate the quality acceptance and the overall
quality of each test item.
Quality evaluation
As in previous studies [45], we will use Absolute Category Rating (ACR) according to ITU-T
P.910 [39] for the evaluation of the overall quality. Additionally, we will use the Acceptance
Threshold according to Jumisko-Pyykkö et al. [27] to measure the general quality acceptance of
the test items. The general acceptance is rated on a binary yes-no scale; the overall quality is
evaluated on an 11-point unlabeled scale. All ratings are given on a small scoring card (A6) that
hangs around the participant's neck when moving between the contexts (see Figure 18). Test items
are presented in randomized order and participants rate all items twice. The quantitative session
takes about 40 minutes on average in the laboratory, including the pre-tests and training, and
about 20 minutes in the bus and café contexts, which comprise only the quality evaluation without
repeating the pre-tests and training.
Figure 18 Scoring card (A6) used for the acceptance and overall quality ratings
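As a minimal sketch of how the collected ratings could later be summarised per item and context, assuming a long-format rating table; the column names and the example rows are assumptions, not the project's actual data files.

import pandas as pd

# one row per single rating (each participant rates each item twice per context)
ratings = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "context":     ["laboratory", "bus", "laboratory", "bus"],
    "item":        ["dracula_3d", "dracula_3d", "dracula_3d", "dracula_3d"],
    "acr":         [8, 6, 7, 5],          # 11-point overall quality rating (0-10)
    "accept":      [1, 1, 1, 0],          # binary acceptance (1 = acceptable)
})

summary = (ratings
           .groupby(["context", "item"])
           .agg(mos=("acr", "mean"),              # mean opinion score per item and context
                mos_sem=("acr", "sem"),           # standard error, e.g. for confidence intervals
                acceptance_rate=("accept", "mean"),
                n=("acr", "size"))
           .reset_index())
print(summary)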
6.1.3.3 Qualitative, descriptive Interviews
A semi-structured interview is to be done after the quality evaluation in each context. This type of
interview uses main and supporting questions to ask the participant for detailed explanations of
previous answers, but still uses an overall interview guideline. Such a guideline enables the later
comparison of the interviews. The following questions are suggested for the interviews.
Main Questions:
- What kind of factors did you pay attention to while viewing in this situation?
- What kind of thoughts, feelings and ideas came to your mind while viewing?
- How did you experience your surroundings?
- Did you notice any positive or negative things, things you liked, didn't like, things that disturbed you?
- Which were the quality characteristics you paid attention to?
Supporting Questions:
- Please, could you describe in more detail what you mean by X (answer of main question)?
- Please, could you describe in more detail when/how X appeared?
- Please, could you clarify if X was among annoying – acceptable – pleasurable / negative – positive factors?
At the end of the whole test, another semi-structured interview following the same guideline is
done.
6.1.3.4 Questionnaires
Following previous studies by Jumisko-Pyykkö et al. [29], we suggest two questionnaires to be
filled out by the participants at the end of the experiment. The first questionnaire asks about
demographic data of the participants as well as about their interest in the content and their
knowledge of the technology. A workload questionnaire (based on the NASA-TLX [19]) is used
to evaluate the demands of the evaluation task in the context of use.
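A minimal sketch of scoring such a workload questionnaire is given below, assuming the raw (unweighted) NASA-TLX variant in which six subscales rated from 0 to 100 are averaged; the dictionary keys and the example values are assumptions, and the weighted variant with pairwise comparisons described by Hart and Staveland [19] is not shown.

def raw_tlx(responses):
    """Raw TLX: mean of the six subscale ratings (each 0-100)."""
    scales = ["mental", "physical", "temporal", "performance", "effort", "frustration"]
    return sum(responses[s] for s in scales) / len(scales)

example = {"mental": 55, "physical": 20, "temporal": 40,
           "performance": 30, "effort": 45, "frustration": 25}
print(raw_tlx(example))   # (55+20+40+30+45+25)/6 = 35.83...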
6.1.4 Test Material and Apparatus
6.1.4.1 Selection of Test Sequences
We collected 20 different contents from which to make a selection for the study. The contents
were selected according to the user requirements of mobile 3D television and video ([30][13][9]).
Contents were selected from four categories: animation, sports, user-created, and
documentation, with five different contents in each category. The contents differ in length
from 40 seconds to approximately 3 minutes, depending on a meaningful plot and the available
length of the video.
Additionally, contents were chosen to represent a variation of different content parameters like
spatial details, temporal details, depth complexity, scene cuts, and audio. Table 10 illustrates the
20 contents with screenshots and provides the duration and a short description of the content
parameters.
6.1.4.2 Selection of Test Parameters
For the evaluation of the prototype we want to present video sequences in high quality. For this
purpose we selected MVC as the coding method, because it proved to be one of the best for
mobile television and video and it can be used to encode videos consisting of right and left
video streams ([11][8]). To achieve comparably high quality for all sequences, the quantization
parameter (QP) of the encoder should be set no higher than 30, as for high-quality simulcast
coding. For an optimized transmission simulation of the video sequences we choose two
different transmission scenarios: the sequences can be optimized either for stationary
transmission in the café and laboratory contexts or for the moving context in the bus. The
prototype device allows choosing between stereoscopic (3D) and monoscopic (2D) playback. In 2D
playback mode the right video stream is substituted by the left stream, so that the sequence
consists of two left video streams. Participants in the test are not allowed to choose the playback
mode themselves, as they must not know the actual playback mode. For this reason we shall
generate 3D and 2D versions of each video sequence beforehand.
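A minimal sketch of enumerating the stimulus versions to be prepared in advance is shown below; the content identifiers are an excerpt from Table 10, while the file naming scheme is an assumption, and the actual MVC encoding (QP ≤ 30) and the transmission simulation are outside the sketch.

from itertools import product

contents = ["knights_quest", "24h_race", "ny_street_dance", "wildearth_safari"]  # excerpt of the 20 contents
modes = ["3d", "2d"]                   # 2d: right view replaced beforehand by the left view
scenarios = ["stationary", "mobile"]   # cafe/laboratory vs. bus transmission optimization

# every content is prepared in every mode and transmission scenario
stimuli = [f"{content}_{mode}_{scenario}" for content, mode, scenario in product(contents, modes, scenarios)]
for name in stimuli:
    print(name)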
6.1.4.3 Apparatus and Test Setup
The tests will be conducted in three different context settings. The first context is the Listening
Lab at Ilmenau University of Technology, which offers a controlled test environment.
The laboratory settings for the study follow ITU-R BT.500 [38]. As the second context
we chose the student café on the university campus. The café is in the basement of one of the
student hostels and is used by students and employees on weekdays from 11 to 17 o'clock. The
café consists of a bar, a lounge with comfortable sofas, and another room with tables. To ensure
comparable test conditions, the test is conducted in the café only between 12 and 15 o'clock,
when approximately the same number of people is present. Participants are always placed
at the same table, from which they can see the entrance. Lighting conditions at this table are
comparable, and participants are always told to find a viewing position with as few light
reflections on the display as possible. The background noise is generated by talking
people, music, and kitchen noise. As the third context we chose a bus ride to have
a "moving" context of use. Participants take the bus from the laboratory to the café or from
the café to their home. The bus is a regular local public transport bus in Ilmenau. Participants
are asked to always use the same seat in the back of the bus. Both bus routes (laboratory to
café and café to the participant's home) have approximately the same length of about 15 minutes.
Table 10 The 20 contents under assessment and their content parameters (VSD: visual spatial details, VTD: temporal motion, VD: amount of depth, VDD: depth dynamism, VSC: amount of scene cuts, A: audio characteristics)

Animation: Knight's quest (1:00 min); Sports: 24h Race (1:50 min); User-created: NY-Street Dance (2:50 min); Documentation: Wildearth Safari (2:30 min)
- VSD: medium, VTD: high, VD: high, VDD: high, VSC: medium, A: movie sounds, atmosphere sounds
- VSD: medium, VTD: high, VD: medium, VDD: high, VSC: high, A: racing sounds, atmosphere sounds, music
- VSD: medium, VTD: high, VD: medium, VDD: low, VSC: no, A: music, clapping hands, voices
- VSD: medium, VTD: medium, VD: medium, VDD: medium, VSC: medium, A: atmosphere sounds

Animation: Dracula (40 sec); Sports: Skydive (2:20 min); User-created: Mountain Bike Race (2:00 min); Documentation: Rhine Valley (1:30 min)
- VSD: medium, VTD: high, VD: high, VDD: high, VSC: high, A: movie sounds, atmosphere sounds
- VSD: low, VTD: medium, VD: medium, VDD: medium, VSC: low, A: music, atmosphere sounds
- VSD: medium, VTD: high, VD: high, VDD: medium, VSC: medium, A: music
- VSD: medium, VTD: medium, VD: high, VDD: medium, VSC: low, A: music, atmosphere sounds

Animation: Shrek the Third Movie Part (3:00 min); Sports: Motocross Race (1:20 min); User-created: Cave (2:40 min); Documentation: The eye (2:20 min)
- VSD: high, VTD: high, VD: high, VDD: high, VSC: medium, A: racing sounds, music
- VSD: medium, VTD: low, VD: medium, VDD: low, VSC: low, A: music, atmosphere sounds
- VSD: medium, VTD: low, VD: medium, VDD: low, VSC: medium, A: music

Animation: Cloudy with a chance of meatballs Trailer (1:20 min); Sports: FIFA soccer WM ARG-NIG (2:30 min); User-created: Berlin in 3D (2:10 min); Documentation: Heidelberg (2:40 min)
- VSD: high, VTD: medium, VD: high, VDD: medium, VSC: low, A: movie sounds, engl. speech
- VSD: high, VTD: medium, VD: medium, VDD: medium, VSC: low, A: stadium sounds, engl. speaker
- VSD: medium, VTD: medium, VD: medium, VDD: medium, VSC: low, A: music, atmosphere sounds
- VSD: medium, VTD: medium, VD: medium, VDD: medium, VSC: low, A: music, atmosphere sounds

Animation: Ice Age 3 Movie Part (2:20 min); Sports: Downhill skiing (2:00 min); User-created: Rhine Valley (1:50 min); Documentation: Makroshow (1:50 min)
- VSD: medium, VTD: high, VD: high, VDD: medium, VSC: low, A: skiing sounds, atmosphere sounds, music
- VSD: medium, VTD: low, VD: medium, VDD: medium, VSC: low, A: music, atmosphere sounds
- VSD: medium, VTD: high, VD: medium, VDD: high, VSC: medium, A: music
- VSD: medium, VTD: medium, VD: medium, VDD: medium, VSC: low, A: movie sounds, engl. speech
- VSD: medium, VTD: medium, VD: medium, VDD: medium, VSC: medium, A: movie sounds, atmosphere sounds
6.2 Actual planning
Planning of the final usability tests of the prototype and system was transferred to WP6, to be
held as part of the physical end-to-end system setup. Therefore, the plan was modified and
reduced accordingly. We also refer to Deliverable D6.8 "Complete end-to-end 3DTV system over
DVB-H" for the actual conduct of the tests and their results.
6.2.1 Participants
We plan to conduct the study with at least 30 participants. Participants should reflect the mobile
3DTV user profiles drawn from the previous studies. An end-user group aged from 20 to 45 is
assumed. However, due to the limited time of the tests, the group of participants categorized as
'early adopters' should be favored. All participants have to be screened for normal or corrected-to-normal
visual acuity (myopia and hyperopia, Snellen index 20/30), color vision using the Ishihara
test, and stereo vision using the Randot Stereo test (≤ 60 arcsec).
6.2.2 Test design
A factorial, related design [7] is to be applied to the experiment. The subject variables are the
content and the presentation modes. The context is reduced to free watching in a cafeteria.
6.2.3 Test procedure
The procedure will include descriptive interviews. It starts with a pre-test and a training session,
followed by free watching. For the descriptive part we use a semi-structured interview
after the free watching in the chosen context. Furthermore, participants have to complete post-task
tests, including a demographic and a workload questionnaire, at the end of the tests.
6.2.3.1 Story
The whole study is planned like a mobile 3D video usage story. Participants arrive at the
laboratory and do the pre-tests. Then they are familiarized with the prototype and its functionality.
The device allows watching three thematic TV channels: documentary, cartoon, and
sport. According to the story, the participant comes from a lecture and has some time to spend
in the cafeteria while waiting to meet a friend. That spare time varies between 30 and 40
minutes.
6.2.3.2 Pre-test
Before the evaluation starts, participants are introduced to the test and sign a data protection
policy. The screening of participants for myopia, hyperopia, color vision, and stereo vision is
done before the actual start of the test session.
Accommodation and training takes place in the laboratory to familiarize the participants with the
device and the stereoscopic videos. All participants watch some high quality stereoscopic videos
and play with the device controls to get used to the device and to find a good viewing position for
an optimal three-dimensional viewing experience.
6.2.3.3 Qualitative, descriptive Interviews
A semi-structured interview is to be done after the watching in the cafeteria context. This type of
interview uses main and supporting questions to ask the participant for detailed explanations of
previous answers, but still uses an overall interview guideline. Such a guideline enables the later
comparison of the interviews. The following questions are suggested for the interviews.
Main Questions:
- What kind of factors did you pay attention to while viewing in this situation?
- What kind of thoughts, feelings and ideas came to your mind while viewing?
- How did you experience your surroundings?
- Did you notice any positive or negative things, things you liked, didn't like, things that disturbed you?
- Which were the quality characteristics you paid attention to?
Supporting Questions:
- Please, could you describe in more detail what you mean by X (answer of main question)?
- Please, could you describe in more detail when/how X appeared?
- Please, could you clarify if X was among annoying – acceptable – pleasurable / negative – positive factors?
6.2.4 Test Material and Apparatus
6.2.4.1 Selection of Test Sequences
The test sequences are to be selected from the collection of 20 different contents described in
the preliminary planning section. They should form three TV channels: animation, sports, and
documentary (including user-created content). The selection should be sufficient for 30-40
minutes of watching.
6.2.4.2 Selection of Test Parameters
The video sequences are to be encoded with MVC ([11], [8]) using a favorable quantization
parameter (QP) of, e.g., 29. The channel settings are to be selected to be optimal for the
anticipated stationary transmission in the cafeteria scenario, and the corresponding transport
streams are to be prepared and stored for real-time transmission on the Cardinal play-out. Only
the 3D viewing mode is assumed.
6.2.4.3 Apparatus and Test Setup
Pre-tests are to be accomplished in the 3D Media Lab of Tampere University of Technology. As
the second context we chose the student café ROM on the TUT campus. The cafeteria is on the first
floor of the information technology building (Tietotalo) and is used by students and employees
on weekdays from 8:30 to 17 o'clock. The cafeteria consists of a coffee shop, a lounge with
comfortable sofas, and another area with tables. The same number of people is normally
encountered during working hours, so the tests are scheduled between 9 and 16 o'clock.
Participants are always placed at the same sofa, from which they can see the entrance.
Lighting conditions at this sofa are comparable, and participants are always told to find
a viewing position with as few light reflections on the display as possible. The
background noise is generated mainly by talking people.
For the actual tests and the analysis of their results we refer to D6.8.
References
[1] Bech, S. and Zacharov, N., "Perceptual Audio Evaluation - Theory, Method and Application", Wiley, Chichester, England, 2006
[2] Bech, S., Hamberg, R., Nijenhuis, M., et al., "Rapid perceptual image description (RaPID) method", Proc. SPIE 2657, pp. 317-328, doi:10.1117/12.238728, 1996
[3] Brandt, M. A., Skinner, E. Z. and Coleman, J. A., "Texture Profile Method", Journal of Food Science, 28: 404-409, doi:10.1111/j.1365-2621.1963.tb00218.x, 1963
[4] Bryman, A., "Social Research Methods", 3rd ed., Oxford University Press, Oxford, UK, 2008
[5] Civille, G. V. and Lawless, H. T., "The Importance of Language in Describing Perceptions", Journal of Sensory Studies, 1: 203-215, doi:10.1111/j.1745-459X.1986.tb00174.x, 1986
[6] Cliff, M. A., Wall, K., Edwards, B. J. and King, M. C., "Development of a Vocabulary for Profiling Apple Juices", Journal of Food Quality, 23: 73-86, doi:10.1111/j.1745-4557.2000.tb00197.x, 2000
[7] Coolican, H., "Research methods and statistics in psychology", 5th ed., London: Hodder Education, 2009
[8] D. Strohmeier and G. Tech, "Sharp, bright, three-dimensional: open profiling of quality for mobile 3DTV coding methods", Proc. SPIE 7542, 75420T, doi:10.1117/12.848000, 2010
[9] D. Strohmeier, M. Weitzel, S. Jumisko-Pyykkö, "Use scenarios - mobile 3D television and video", special session 'Delivery of 3D Video to Mobile Devices' at the conference 'Multimedia on Mobile Devices', part of the Electronic Imaging Symposium 2009, San Jose, California, USA, January 2009
[10] D. Strohmeier, S. Jumisko-Pyykkö, and K. Kunze, "Open Profiling of Quality: A Mixed Method Approach to Understanding Multimodal Quality Perception", Advances in Multimedia, vol. 2010, Article ID 658980, 28 pages, doi:10.1155/2010/658980, 2010
[11] D. Strohmeier, S. Jumisko-Pyykkö, K. Kunze, G. Tech, D. Buğdayci, M. O. Bici, "Results of quality attributes of coding, transmission, and their combinations", Deliverable 4.3, MOBILE3DTV, Project No. 216503, 2010
[12] D. Strohmeier, S. Jumisko-Pyykkö, K. Kunze, M. O. Bici, "The Extended-OPQ method for User-centered Quality of Experience evaluation: A study for mobile 3D video broadcasting over DVB-H", special issue "Quality of Multimedia Experience", EURASIP Journal on Image and Video Processing, vol. 2011, Article ID 538294, 24 pages, doi:10.1155/2011/538294, 2011
[13] D. Strohmeier, S. Jumisko-Pyykkö, M. Weitzel, S. Schneider, "Report on User Needs and Expectations for Mobile Stereo-video", Tampere University of Technology, 2008
[14] D. Strohmeier, S. Jumisko-Pyykkö, U. Reiter, "Profiling experienced quality factors of audiovisual 3D perception", Proc. of the International Workshop on Quality of Multimedia Experience (QoMEX 2010), Trondheim, Norway, June 2010
[15] Drake, M. and Civille, G., "Flavor Lexicons", Comprehensive Reviews in Food Science and Food Safety, 2: 33-40, doi:10.1111/j.1541-4337.2003.tb00013.x, 2003
[16] E. Samoylenko, S. McAdams, and V. Nosulenko, "Systematic Analysis of Verbalizations Produced in Comparing Musical Timbres", International Journal of Psychology, vol. 31, no. 6, pp. 255-278, doi:10.1080/002075996401025, 1996
[17] G. Lorho, "Perceived Quality Evaluation: An Application to Sound Reproduction over Headphones", PhD thesis, Helsinki University of Technology, Helsinki, Finland, 2010
[18] H. T. Lawless and H. Heymann, "Sensory evaluation of food: principles and practices", 1st ed., New York: Chapman & Hall, 1999
[19] Hart, S. G., Staveland, L. E., "Development of NASA-TLX (Task Load Index): results of empirical and theoretical research", in Hancock, P. A., Meshkati, N. (eds.), "Human mental workload", North-Holland, Amsterdam, pp. 139-183, 1988
[20] Hart, S. G., Staveland, L. E., "Development of NASA-TLX (Task Load Index): results of empirical and theoretical research", in Hancock, P. A., Meshkati, N. (eds.), "Human mental workload", North-Holland, Amsterdam, pp. 139-183, 1988
[21] Hartson, H. R., Andre, T. S., Williges, R. C., "Criteria for evaluating usability evaluation methods", International Journal of Human-Computer Interaction, Vol. 15, No. 1, pp. 145-181, 2003
[22] Haslam, S., McCarty, C., "Research methods and statistics in psychology", Sage Publications, London, UK, 2003
[23] ISO 9241-11, "Ergonomic requirements for office work with visual display terminals (VDTs) - Part 11: Guidance on usability", International Standards Organization, 1998
[24] ITU-T Rec. P.910, "Subjective video quality assessment methods for multimedia applications", Switzerland, 1999
[25] J. Radun, T. Leisti, J. Häkkinen, H. Ojanen, J. Olives, T. Vuori, G. Nyman, "Content and Quality: Interpretation-Based Estimation of Image Quality", ACM Trans. Appl. Percept. 4, 4, 2008
[26] Jumisko-Pyykkö, S., "User-Centered Quality of Experience and its Evaluation Methods for Mobile Television", PhD thesis, Tampere University of Technology, 2011, in press
[27] Jumisko-Pyykkö, S., Kumar Malamal Vadakital, V., Hannuksela, M. M., "Acceptance Threshold: Bidimensional Research Method for User-Oriented Quality Evaluation Studies", International Journal of Digital Multimedia Broadcasting, 2008
[28] Jumisko-Pyykkö, S., Strohmeier, D., Utriainen, T., Kunze, K., "Descriptive quality of experience for mobile 3D video", in Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (NordiCHI '10), ACM, New York, NY, USA, pp. 266-275, 2010
[29] Jumisko-Pyykkö, S., Utriainen, T., "A Hybrid Method for Quality Evaluation in the Context of Use for Mobile (3D) Television", Multimedia Tools and Applications, Springer Netherlands, pp. 1-41, 2010
[30] Jumisko-Pyykkö, S., Weitzel, M., Strohmeier, D., "Designing for User Experience: What to Expect from Mobile 3D TV and Video?", First International Conference on Designing Interactive User Experience for TV and Video, October 22-24, 2008, Silicon Valley, California, USA
[31] Lawless, H. T., Heymann, H., "Sensory Evaluation of Food: Principles and Practices", Springer Verlag, 1999
[32] Markopoulos, P., Bekker, M., "How to Compare Usability Testing Methods with Children Participants", in "Interaction Design and Children", Shaker Publisher, pp. 153-159, 2002
[33] McTigue, M. C., Koehler, H. H., Silbernagel, M. J., "Comparison of Four Sensory Evaluation Methods for Assessing Cooked Dry Beans", Journal of Food Sciences, Vol. 54, No. 5, pp. 1278-1283, 1989
[34] Meilgaard, M. C., Daigliesh, C. E., Clapperton, J. F., "Beer flavour terminology", J. Inst. Brew., 85: 38-42, 1979
[35] Meilgaard, M., Civille, G. V., Carr, B. T., "Sensory Evaluation Techniques", 3rd ed., CRC Press, 387 pp, ISBN 0-8493-0276-5, 1999
[36] Noble, A. C., Arnold, R. A., Masuda, B. M., Pecore, S. D., Schmidt, J. O., Stern, P. M., "Progress towards a standardized system of wine aroma terminology", Am. J. Enol. Vitic. 35 (2), 76-77, 1984
[37] Perrin, L., Symoneaux, R., Maître, I., Asselin, C., Jourjon, F., Pagès, J., "Comparison of three sensory methods for use with the napping procedure: Case of ten wines from Loire valley", Food Quality and Preference, Vol. 19, No. 1, pp. 1-11, 2008
[38] Recommendation ITU-R BT.500-11, "Methodology for the Subjective Assessment of the Quality of Television Pictures", ITU Telecom. Standardization Sector of ITU, 2002
[39] Recommendation ITU-T P.910, "Subjective video quality assessment methods for multimedia applications", ITU Telecom. Standardization Sector of ITU, 1999
[40] Rouse, D., Pépion, R., Hemami, S., Le Callet, P., "Tradeoffs in subjective testing methods for image and video quality assessment", Human Vision and Electronic Imaging, 7527, 2010
[41] S. Bech, R. Hamberg, M. Nijenhuis, C. Teunissen, H. de Jong, P. Houben, S. Pramanik, "The RaPID perceptual image description method (RaPID)", in Proc. SPIE, Vol. 2657, pp. 317-328, 1996
[42] S. Jumisko-Pyykkö, "User-Centered Quality of Experience and its Evaluation Methods for Mobile Television", PhD thesis, Tampere University of Technology, Tampere, Finland, 2011
[43] S. Jumisko-Pyykkö and M. M. Hannuksela, "Does context matter in quality evaluation of mobile television?", MobileHCI, Amsterdam, The Netherlands, 2008
[44] S. Jumisko-Pyykkö, J. Häkkinen, G. Nyman, "Experienced Quality Factors - Qualitative Evaluation Approach to Audiovisual Quality", Proceedings of the IS&T/SPIE 19th Annual Symposium of Electronic Imaging, Convention Paper 6507-21, 2007
[45] S. Jumisko-Pyykkö, T. Utriainen, "A Hybrid Method for Quality Evaluation in the Context of Use for Mobile (3D) Television", Multimedia Tools and Applications, 2010
[46] S. Jumisko-Pyykkö, T. Vainio, "Framing the context of use for mobile HCI", International Journal of Mobile-Human-Computer-Interaction (IJMHCI), 2010
[47] S. Jumisko-Pyykkö, V. K. M. Vadakital, M. M. Hannuksela, "Acceptance Threshold: Bidimensional Research Method for User-Oriented Quality Evaluation Studies", International Journal of Digital Multimedia Broadcasting, 2008
[48] S. Tamminen, A. Oulasvirta, K. Toiskallio, and A. Kankainen, "Understanding mobile contexts", Pers Ubiquit Comput, Volume 8, pp. 135-143, 2003
[49] Shadish, W. R., Cook, T. D., Campbell, D. T., "Experimental and Quasi-Experimental Designs for Generalized Causal Inference", Houghton Mifflin Company, 2002
[50] Smilowitz, E. D., Darnell, M. J., Benson, A. E., "Are we overlooking some usability testing methods? A Comparison of Lab, Beta, and Forum Tests", Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 1993
[51] Stecher, B. M., Rahn, L. M., Ruby, A., Alt, M. N., Robyn, A., "Using Alternative Assessments in Vocational Education", RAND, 176 pp, ISBN 0-8330-2489-2, 1997
[52] Stone, H. and Sidel, J. L., "Sensory evaluation practices", 3rd ed., Academic Press, San Diego, 2004
[53] Williams, A. A., Arnold, G. M., "Comparison of the aromas of six coffees characterized by conventional profiling, free-choice profiling and similarity scaling methods", Journal of the Sciences of Food and Agriculture, Vol. 36, No. 3, pp. 204-214, 1985
[54] Yokum, J. T., Armstrong, J. S., "Beyond Accuracy: Comparison of Criteria Used to Select Forecasting Methods", International Journal of Forecasting, Vol. 11, pp. 591-597, 1995
[55] S. Jumisko-Pyykkö and T. Utriainen, "Results of the user-centred quality experiments", Technical Report D4.4, MOBILE3DTV, 2009
[56] Lorho, G., "Perceptual evaluation of mobile multimedia loudspeakers", Proceedings of the Audio Engineering Society 122nd Convention, 2007
[57] D. Strohmeier and S. Jumisko-Pyykkö, "Proposal on open profiling of quality as a mixed method evaluation approach for audiovisual quality assessment", Proposal no. 181, ITU-T SG12, Question Q13/12, International Telecommunication Union, Switzerland, 2011
[58] S. Jumisko-Pyykkö, "Hybrid method for quality evaluation in the context of use", Proposal no. 180, ITU-T SG12, Question Q13/12, International Telecommunication Union, Switzerland, 2011
[59] Strauss, A., Corbin, J., "Basics of qualitative research: Techniques and procedures for developing grounded theory", Sage, Thousand Oaks, CA, USA, Vol. 2, 1998
[60] M. Lambooij, W. IJsselsteijn, M. Fortuin, and I. Heynderickx, "Visual discomfort and visual fatigue of stereoscopic displays: A review", J. Imaging and Technology 53(3), 030201-030201-14, 2009
[61] L. M. J. Meesters, W. A. IJsselsteijn, and P. J. H. Seuntiens, "A survey of perceptual evaluations and requirements of three-dimensional TV", IEEE Trans. Circuits Syst. Video Tech., vol. 14, no. 3, pp. 381-391, Mar. 2004
[62] R. Kennedy, N. Lane, K. Berbaum, and M. Lilienthal, "Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness", Int. J. Aviation Psychology 3(3), pp. 203-220, 1993
[63] J. Häkkinen, T. Vuori, and M. Puhakka, "Postural stability and sickness symptoms after HMD use", in Proc. SMC Symp., 2002, pp. 147-152
[64] B. K. Jaeger and R. R. Mourant, "Comparison of simulator sickness using static and dynamic walking simulators", Human Factors and Ergonomics Society Annual Meeting Proc., Virtual Environments, pp. 1896-1900(5), 2001
[65] M. Pölönen and J. Häkkinen, "Near-to-Eye Display - An accessory for handheld multimedia devices: Subjective studies", Journal of Display Technology, vol. 5, no. 9, pp. 358-367, Sep. 2009
[66] M. Lambooij, W. IJsselsteijn, M. Fortuin, and I. Heynderickx, "Visual discomfort and visual fatigue of stereoscopic displays: A review", J. Imaging and Technology 53(3), 030201-030201-14, 2009
[67] M. Lambooij, W. IJsselsteijn, I. Heynderickx, "Visual discomfort in stereoscopic displays: a review", in Proc. SPIE 6490: 64900I, 2007
[68] "Actius AL-3DU Laptop", Product brochure, Sharp, 2005. Available: www.sharpsystems.com/products/pc_notebooks/actius/al/3du/
[69] "Stereoscopic 3D LCD Display module", Product brochure, masterImage, 2009. Available: www.masterimage.co.kr/new_eng/product/module.htm
[70] S. Uehara, T. Hiroya, H. Kusanagi, K. Shigemura, and H. Asada, "1-inch diagonal transflective 2D and 3D LCD with HDDP arrangement", in Proc. SPIE-IS&T Electronic Imaging 2008, Stereoscopic Displays and Applications XIX, Vol. 6803, San Jose, USA, Jan. 2008
[71] K. Stanney, R. S. Kennedy, and J. M. Drexler, "Cybersickness is not simulator sickness", in Proc. 41st Human Factors and Ergonomics Society, 1997, pp. 1138-1142
[72] S. Jumisko-Pyykkö and T. Utriainen, "D4.4 v2.0 Results of the user-centred quality evaluation experiments", Technical report, November 2009
Mobile 3DTV Content Delivery Optimization over DVB-H System
MOBILE3DTV - Mobile 3DTV Content Delivery Optimization over DVB-H System - is a three-year
project which started in January 2008. The project is partly funded by the European Union 7th
RTD Framework Programme in the context of the Information & Communication Technology (ICT)
Cooperation Theme.
The main objective of MOBILE3DTV is to demonstrate the viability of the new technology of
mobile 3DTV. The project develops a technology demonstration system for the creation and
coding of 3D video content, its delivery over DVB-H and display on a mobile device, equipped
with an auto-stereoscopic display.
The MOBILE3DTV consortium is formed by three universities, a public research institute and two
SMEs from Finland, Germany, Turkey, and Bulgaria. Partners span diverse yet complementary
expertise in the areas of 3D content creation and coding, error resilient transmission, user
studies, visual quality enhancement and project management.
For further information about the project, please visit www.mobile3dtv.eu.
Tuotekehitys Oy Tamlink
Project coordinator
FINLAND
Tampereen Teknillinen Yliopisto
Visual quality enhancement,
Scientific coordinator
FINLAND
Fraunhofer Gesellschaft zur Förderung der
angewandten Forschung e.V
Stereo video content creation and coding
GERMANY
Technische Universität Ilmenau
Design and execution of subjective tests
GERMANY
Middle East Technical University
Error resilient transmission
TURKEY
MM Solutions Ltd.
Design of prototype terminal device
BULGARIA
MOBILE3DTV project has received funding from the European Community’s ICT programme in the context of the
Seventh Framework Programme (FP7/2007-2011) under grant agreement n° 216503. This document reflects only
the authors’ views and the Community or other project partners are not liable for any use that may be made of the
information contained therein.