Usability evaluation of design solutions for tablet
Sami Pekkala

Usability evaluation of design solutions for tablet magazines

Department of Media Technology

Thesis submitted for examination of the degree of Master of Science in Technology
Espoo, August 14th, 2012

Thesis supervisor: Professor Pirkko Oittinen
Thesis instructor: Mikko Kuhna, M.Sc.

“Note the written instructions on how to use the interface, which are always a sign of trouble.”
Jakob Nielsen

Aalto University School of Science
ABSTRACT OF THE MASTER’S THESIS

Author: Sami Pekkala
Title of the thesis: Usability evaluation of design solutions for tablet magazines
Date: August 14th, 2012
Language: English
Number of pages: 96
Degree Programme: Degree Programme in Automation and Systems Technology
Department: Department of Media Technology
Professorship: T-75
Supervisor: Professor Pirkko Oittinen
Instructor: Mikko Kuhna, M.Sc.

Abstract

The aim of this thesis is to evaluate the usability of tablet magazines. The content of the magazines is the same, but the design solutions (layout, structure and interaction possibilities) vary. A formative usability evaluation was done to find usability problems, and a summative evaluation was carried out to compare and rank the magazines. The main emphasis of the usability evaluation is on user testing and eye-tracking.

The field of research is digital publishing, especially magazines for tablet computers. Print sales are declining and publishers are keen to find new means of approaching the consumer. Digital publishing processes and business models for mobile devices have not yet become established. Web-based and image-based magazines are both common. Semi-automatic computational layout is presented as a publishing technique that can reduce the human effort of converting content for various screen sizes. Before deciding how to publish a tablet magazine, it is important to evaluate how the different design solutions affect the usability of a magazine.

A theoretical background for this study is presented in the beginning.
Usability is defined to consist of the effectiveness, efficiency and satisfaction of the user interface. Usability is found to depend on context, i.e. users, tasks and environment. The usability evaluation methods (UEMs) chosen for this study include think aloud, performance measures, questionnaires, and quantitative and qualitative eye-tracking analysis.

The summative evaluation results of this study show that a web-based version of a magazine with dynamic, semi-automatic layout has better usability than the others. An image-based version with static manual layout is second and another version with dynamic layout is third in terms of usability. Design solution suggestions for an even more usable tablet magazine are made as a result of the formative evaluation.

Keywords: Usability, tablet, tablet computer, magazine, eye-tracking

Aalto University School of Science
ABSTRACT OF THE MASTER’S THESIS (translated from Finnish)

Author: Sami Pekkala
Title of the thesis: Tablettilehtien designratkaisujen käytettävyysarviointi
Date: 14.8.2012
Language: English
Number of pages: 96
Degree Programme: Degree Programme in Automation and Systems Technology
Department: Department of Media Technology
Professorship: T-75
Supervisor: Professor Pirkko Oittinen
Instructor: Mikko Kuhna, M.Sc.

Abstract

The aim of this master’s thesis is to produce knowledge about the usability of four tablet magazines when the content of the evaluated magazine versions is the same but the design solutions (layout, structure and interaction possibilities) vary. A formative usability evaluation was carried out to find usability problems, and a summative evaluation aimed to compare the magazine versions with one another. The evaluation emphasizes user tests and eye-tracking measurements.

The research area is defined as digital publishing, especially with regard to tablet magazines. As subscriber numbers of print publications fall, publishers are trying to find new ways to reach the consumer. Models for digital publishing processes and businesses have not yet become established. Web-based and image-based magazines are both common. Semi-automatic computational layout is presented as a publishing technique that reduces the amount of work needed to adapt content to different screen sizes. The effect of different design solutions on usability must be evaluated before deciding how a tablet magazine is published.

In the presentation of the theoretical background at the beginning, usability is defined as consisting of the effectiveness, efficiency and user satisfaction of the user interface. Usability also always depends on context: users, tasks and environment. The chosen usability research methods are think aloud, performance measurement, questionnaires, and quantitative and qualitative eye-tracking.

The summative evaluation shows that a web-based tablet magazine with dynamic, semi-automatic layout has better usability than the others. An image-based magazine with manual layout is second, and the worst in usability is another version of the same magazine with dynamic layout. The formative evaluation is used to determine the best design solutions in terms of usability.

Keywords: Usability, tablet, tablet computer, magazine, eye-tracking

Acknowledgements

This master’s thesis was done at the Department of Media Technology at Aalto University School of Science. My work was a part of the NextMedia program financed by Tekes – the Finnish Funding Agency for Technology and Innovation.

Bigger acknowledgements go to my supervisor, Professor Pirkko Oittinen, who gave me the job in the first place and who re-revised this lengthy script (with mercifully few corrections). I’m most grateful for the opportunity to do my final schoolwork here in your research group. Also, big up to Mikko Kuhna (M.Sc.) for instructing me through this process. Keep up the good work with future instructees, but beware of betting on football against them (you might end up losing).

To all 40+ participants in my user tests: thank you for your time and your eyes.
Subject No. 6, a cute blonde girl, gets the biggest credit for my graduation, for feeding me after work and for gently kicking me toward the goal. Finally, I want to thank my family, especially my sister, for piloting, and my dad, for providing me with valuable insight into how the elderly use an iPad magazine :).

Espoo, August 10th, 2012
Sami Pekkala

Contents

Abstract in English
Abstract in Finnish
Acknowledgements
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Scope and methods
  1.2 Aim
  1.3 Structure

2 Usability evaluation methods
  2.1 Definition of usability
  2.2 A method for every need
    2.2.1 Formative usability evaluation
    2.2.2 Summative usability evaluation
  2.3 An overview of common usability methods
    2.3.1 Heuristic evaluation and other usability inspection methods
    2.3.2 Think aloud (as a usability evaluation method)
    2.3.3 Performance measures (as a usability evaluation method)
    2.3.4 System Usability Scale and Single Usability Metric
    2.3.5 Eye-tracking (as a usability evaluation method)

3 Tablet computers and tablet publishing
  3.1 Touchscreen devices
  3.2 Tablet computers
    3.2.1 Apple iPad tablet computer
  3.3 Usability of tablet computers
    3.3.1 Direct manipulation in graphical user interfaces
    3.3.2 Natural user interface
    3.3.3 Previous research
    3.3.4 iPad specific research
  3.4 Definition of magazine
    3.4.1 Definition of tablet magazine
  3.5 Tablet magazine publishing
    3.5.1 Tablet publishing in Finland

4 Magazine in the tests
  4.1 Tietokone magazine
  4.2 Retail version
  4.3 AnyReader version
  4.4 “Fancybox” web-based magazine
  4.5 “Photoswipe” web-based magazine
  4.6 Structural differences in magazines

5 Experiment setup
  5.1 Chosen methods
  5.2 Users
  5.3 Test setup
    5.3.1 Eye-tracking system setup
  5.4 Test protocol
  5.5 Tasks

6 Analysis
  6.1 Task time
  6.2 System Usability Scale and Single Usability Metric
  6.3 Quantitative eye-tracking
  6.4 Think aloud and qualitative eye-tracking

7 Results
  7.1 Task time
    7.1.1 Total task time
    7.1.2 Task browsing time
  7.2 System Usability Scale and Single Usability Metric
    7.2.1 Satisfaction
  7.3 Quantitative eye-tracking
    7.3.1 Pupil diameter
    7.3.2 Fixation duration
  7.4 Think aloud and qualitative eye-tracking
    7.4.1 Usability problems
  7.5 Summary of results

8 Discussion
  8.1 Summary of the summative usability evaluations
    8.1.1 Usability implications of task time and satisfaction scores
    8.1.2 Quantitative and qualitative eye-tracking result analysis
    8.1.3 Low reliability of SUS and SUM scores
  8.2 Summary of the formative usability evaluations
    8.2.1 Findings from AnyReader version
    8.2.2 Findings from retail version (Woodwing)
    8.2.3 Findings from Fancybox and Photoswipe versions
  8.3 Reliability and validity
    8.3.1 Influence of user background

9 Conclusion
References
Appendices

List of Figures

2.1 A model of ISO standard and the Single Usability Metric
2.2 Screen capture from iViewX-software shows corneal reflection (black crosshair) and pupil (white crosshair)
3.1 Apple iPad2 tablet computer from front, back and side
3.2 Different types of tablet magazine solutions, from left: application-based, web-based, and a compilation magazine
4.1 In WW, headlines in cover and TOC are hyperlinks to the corresponding articles, a ⊕ button opens a pop-up window with additional information inside an article
4.2 The four different image interaction possibilities in WW from top left: a pop-up window, enlarge image to full-screen, show image caption and enlarge image by little
4.3 Different scrollable portions of a page in WW from left: scrollable text column, scrollable image and scrollable article
4.4 Toolbar and functions of four toolbar buttons in WW, from top left: page browser (did not work in the tests), library, homepage and store
4.5 In AR, articles are accessed by tapping a hyperlink in top-level
4.6 Tapping an image in article opens an image carousel in AR, where images of the same article can be browsed
4.7 Tapping the “− button” in AR toolbar shrinks the text size and layout adjusts accordingly
4.8 Toolbar navigation shortcuts in AR: “TOC button” brings user to top-level and “home button” exits to library
4.9 Layout changes in AR after rotating the device 90°
4.10 In FB and PS, articles can be accessed by tapping a hyperlink in TOC
4.11 Image opens to a pop-up window in FB
4.12 The navigation bar in FB and PS
4.13 Layout changes in FB and PS after rotating the device 90°
4.14 After tapping an image in PS, image carousel opens and all images in the article can be browsed
4.15 An overview of the navigational structure of WW magazine
4.16 An overview of the navigational structure of AR magazine
4.17 An overview of the navigational structure of PS and FB magazines
5.1 Test setup
5.2 An Epiphan DVI2USB Frame Grabber window was placed directly under the iPad
5.3 Gesture instructions for novice users
6.1 Screen capture from a video combining think aloud, gestures and eye-tracking
7.1 Average total task times for each task
7.2 Average task browsing times for each task
7.3 Average task times grouped into usability aspects
7.4 Total task times averaged over magazines
7.5 Task browsing times averaged over magazines
7.6 SUS score averaged over magazines
7.7 SUM score averaged over magazines
7.8 Satisfaction score averaged over magazines
7.9 Average satisfaction scores given for each task
7.10 Average task satisfaction scores grouped into usability aspects
7.11 Pupil diameter averaged over magazines
7.12 Fixation duration averaged over magazines
7.13 Amount of negative and positive comments about each magazine
7.14 Total number and different usability problems found from observing the videos
8.1 Eye-tracking shows how correct headline is not “seen” even though quickly looked at, because of more demanding typography below, which was not a headline

List of Tables

2.1 Summary of usability methods
3.1 Technical specifications of Apple iPad2
4.1 The three biggest IT-magazines in Finland
4.2 An overview of the magazine user-interfaces
5.1 Test user statistics
5.2 Task overview
7.1 Key statistics for total task times
7.2 Key statistics for task browsing times
7.3 Results (P(T ≤ t)) of two-tailed t-tests for total task times
7.4 Results (P(T ≤ t)) of two-tailed t-tests for task browsing times
7.5 Results (P(T ≤ t)) of two-tailed t-tests for SUS score
7.6 Results (P(T ≤ t)) of two-tailed t-tests for SUM score
7.7 Key statistics for SUS score
7.8 Key statistics for SUM score
7.9 Key statistics for satisfaction score
7.10 Results (P(T ≤ t)) of two-tailed t-tests for satisfaction score
7.11 Results (P(T ≤ t)) of two-tailed t-tests for average pupil diameters
7.12 Results (P(T ≤ t)) of two-tailed t-tests for average fixation durations
7.13 Correlation coefficients between pupil diameter, task time and satisfaction
7.14 Key statistics for pupil diameter measures
7.15 Key statistics for fixation duration measures
7.16 Correlation coefficients between fixation duration, task time and satisfaction
7.17 The number of positive and negative comments from think aloud and the three most remarked aspects of usability (−/+)
7.18 Total number and different usability problems found from observing the videos
7.19 Most important individual usability problems by magazine
7.20 The order of magazines in all usability measurements
7.21 A “pros and cons” summary of each magazine based on the entire study
8.1 Correlation coefficients between user background and some usability measurements

Abbreviations

HCI   Human-computer interaction
GUI   Graphical user interface
NUI   Natural user interface
TOC   Table of contents
UEM   Usability evaluation method
ISO   International Organization for Standardization
CI    Confidence interval
HTML  Hypertext Markup Language
AR    AnyReader magazine
WW    Woodwing (retail) magazine
FB    Fancybox magazine (and image viewing system)
PS    Photoswipe magazine (and image viewing system)
SUM   Single Usability Metric
SUS   System Usability Scale

Chapter 1 Introduction

The ongoing digital revolution can be considered the greatest change in the print media business since Gutenberg’s press brought the printing revolution. The production processes of print media publishers have already been digital for a few decades. The look and quality of a typical printed publication did not change when the production processes switched from analog to digital. What has changed the game radically is the advent of multiple new viewing platforms for digital media. Desktop, laptop and notebook computers and mobile phones can all be used to view the same digital content. A plethora of new devices implies that there must also be new usage behaviors for print media. This has been a pitfall for some traditional media companies; they have not realized that a digital carbon copy of a print publication is not enough for “digital omnivores”.

The single new device that has changed the media experience most dramatically could be the iPad, manufactured by Apple Inc. Apple is expected to sell its 100 millionth iPad this year (2012).¹ The iPad is a tablet computer, defined as a mobile computer consisting only of a large, flat touchscreen surface. Tablet computers, or tablets, combine the strengths of the previously mentioned devices from the viewpoint of digital media consumption. Computers have big enough screens for reading but they are not mobile. Mobile phones, on the other hand, are always with you, but their small screen size hinders digital media usage.
The iPad’s ten-inch touchscreen, long battery life, instant power-up from hibernation and wireless data connection make it suitable for media consumption.

Print media publishers have struggled to find a flexible publishing platform to suit the various screen sizes of the new devices. A 1:1 digital copy of a broadsheet newspaper cannot be read comfortably on a four-inch mobile phone screen. Then again, the same paper can be read on a computer display. A computer display with HD resolution has over three times as many pixels as a typical mobile phone display.² Input devices and usage situations also vary greatly between the devices. For maximum audience and profit, the same digital publication should be viewable and usable on all devices.

¹ http://www.usatoday.com/tech/news/story/2012-03-03/apple-ipad-sales/53344970/1
² HD: (1920 × 1080) / iPhone: (960 × 640) ≈ 3.4

1.1 Scope and methods

The main concepts this master’s thesis deals with are tablet computer, magazine, layout, usability, user studies and eye-tracking. As said before, print media is trying to find new routes to the consumer as print circulations are slowly declining. In Finland, magazine circulations went down by 1.2 % and newspaper circulations by 2.6 % during 2009–2010.³ When the same publication is presented on multiple platforms, it is usually preferred that the content of the publication stays the same while the form is shaped according to the device. Because only a relatively small portion of the readership uses mobile devices, Finnish publishing houses have an incentive to make this change of form with as little manpower as possible. In this thesis, two automatic layout systems are compared with each other and with a manually produced layout. The comparison was made from the viewpoint of usability: when the content is the same, how do the distinct look and structure created by the layout systems affect the usability of the publication?
Four different versions of an issue of a Finnish computer magazine were examined. The viewing device used was Apple’s iPad 2, and the methods for usability evaluation included heuristic evaluation and user tests with observation, think aloud, questionnaires, performance measures and eye-tracking.

Even though the focus was on tablet magazine usability, the same usability principles also apply, to some extent, to touchscreen-equipped mobile phones. This should be noted because the two tested dynamic layout systems are adaptable to all screen sizes. In fact, many of the usability principles related to visual aspects are universal and could be applied even to print magazine layouts, but this is outside the scope of this study and is not discussed separately.

³ http://www.levikintarkastus.fi/uutisia/Levikkitiedote2011.pdf

1.2 Aim

The aim of this thesis is to find out what makes a usable tablet magazine. Usability problems and bottlenecks in the four magazine versions were identified. The results can be used to further develop automatic and manual layout of tablet magazines for a better user experience. In addition, it is important to evaluate and rank the magazine versions from the perspective of usability. These results can guide publishers in choosing between the different versions: should they use the effort of graphic designers and programmers to convert the content for different devices, or should they consider a more automatic approach?

Previous research on tablet magazine usability is scarce. General tablet usability and e-reader⁴ studies can be found, but the author is not aware of any research specific to tablet magazines. Nor were any studies found in which eye-tracking was used for tablet usability evaluation. This thesis fills a gap in the HCI (human-computer interaction) field by combining tablet (magazine) usability evaluation with eye-tracking.
The main research question of the thesis can be stated as: What defines a tablet magazine’s usability? A sub-question, related specifically to the research material, is also discussed: How do the different magazine versions compare in terms of usability? These questions are answered as thoroughly as possible with the methods mentioned before. The second sub-question, which concerns the rather explorative method, is: How can eye-tracking be used to evaluate tablet (magazine) usability?

⁴ An electronic device for reading e-books. It usually has a black-and-white display and a few buttons (no touchscreen), and thus lacks the versatility of tablets.

1.3 Structure

In the next chapter, Chapter 2, usability is defined and an overview of usability evaluation methods (UEMs) is given. The methods used in this thesis are discussed in more detail. Chapters 3 and 4 present the material of this research. Tablet computers, especially the iPad, are introduced and an overview of tablet publishing is given in Chapter 3. In Chapter 4, the magazine chosen for this research is presented. In addition, the four different versions of the magazine and their differences are discussed. Chapters 5 and 6 deal with the experimental research that was conducted. Chapter 5 shows the user test setup and Chapter 6 examines the data analysis methods. Finally, the results are presented in Chapter 7, and in Chapter 8 the results are evaluated and compared with previous research, and the research questions are answered. To conclude, Chapter 9 summarizes the research and the obtained results.

Chapter 2 Usability evaluation methods

In this chapter, a definition of usability is given and different usability evaluation methods are presented and evaluated. The methods that are relevant to this research are discussed in more detail.
2.1 Definition of usability

A formal and widely used definition of usability can be derived from an ISO standard, which states:

    [Usability is the] Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. [37]

The ISO standard further defines effectiveness (a task completion measure), efficiency (a task time measure), satisfaction (a subjective measure of the user’s experience) and context (equipment, environment, tasks and users). Other quality attributes that can be attached to usability are the learnability (easy to learn for a beginner), memorability (easy to remember for a casual user) and error rate (few and easily recoverable errors) of the system [50].

Besides being a quality attribute, the word usability can also be used to mean the process and methods for improving the ease of use of a system during product development and after. Usability, in the latter sense, is a synonym for usability engineering. The Usability Engineering Lifecycle by Mayhew (1999) divides the usability process into distinct phases: requirements analysis, design/testing/development and installation [46].

In this study, usability is used to mean a quality attribute of a system or a part of the HCI discipline. The terms usability engineering, usability testing (with users) and usability inspection (with experts) are used to describe the processes.

2.2 A method for every need

Usability evaluation methods, or UEMs, can be divided into two subsets: usability inspection methods and usability testing methods. In usability inspection methods, one or several experts on user interface design and usability examine the system. In usability testing methods, real users use the system and the usability practitioner’s role is to observe them.
Examples of usability inspection methods are heuristic evaluation, cognitive walkthrough and formal usability inspection. Examples of usability testing methods are think aloud, performance measurements and eye-tracking. A summary of usability evaluation methods is shown in Table 2.1 (adapted from Nielsen (1993), with the last two methods from Pernice (2009) [50, 61]). The table is not exhaustive and some of the methods can be divided further (e.g. heuristic estimation, retrospective think aloud).

To summarize: there are dozens of usability evaluation methods available, each with its own pros and cons. The methods also come into play at different stages of the usability engineering lifecycle [50]. Because their advantages and disadvantages differ, it is advisable to use a set of methods that complement each other. To choose such a set, a usability practitioner has to apply some selection criteria. The next two sections can be thought of as a starting point for the decision: whether to improve a system’s usability or to compare it with that of other systems.

2.2.1 Formative usability evaluation

Besides the self-explanatory quantitative–qualitative categorization of usability evaluation methods, a formative–summative division can also be used. This partition is based on the goals of the usability study. Formative evaluation aims at improving the usability of an interface while the system is being developed or revised [50]. The methods described as formative are typically fast, cheap and simple; due to the fast cycle of a product development process, usability tests need to be conducted and analyzed quickly. As a result, formative studies and usability testing can be thought of as unscientific. Usability, in this sense, is reduced to craft, not science [2]. For a usability professional, this does not diminish the value of formative evaluations. Heuristic evaluation is one example of a formative method with a high benefit–cost ratio [51].
Other methods normally used for formative evaluation are think aloud, observation, interviews and qualitative eye-tracking.

Table 2.1: Summary of usability methods

| Method | Users | Main advantage(s) | Main disadvantage(s) |
|---|---|---|---|
| Heuristic evaluation | 0 | Finds individual usability problems. Can address expert user issues. | Does not involve real users, so does not find “surprises” relating to their needs. |
| Performance measures | 10 at least | Numerical data. Results easy to compare. | Does not find individual usability problems. |
| Think aloud | 3–5 | Pinpoints user misconceptions. Cheap and easy to conduct. | Unnatural for users. Hard for expert users to verbalize. |
| Observation | 3 or more | Ecological validity; reveals users’ real tasks. Suggests functions and features. | Appointments hard to set up. No experimenter control. |
| Questionnaires | 30 at least | Finds subjective user preferences. Easy to repeat. | Pilot work needed (to prevent misunderstandings). |
| Interviews | 5 | Flexible, in-depth attitude and experience probing. | Time consuming. Hard to analyze and compare. |
| Focus groups | 6–9 per group | Spontaneous reactions and group dynamics. | Hard to analyze. Low validity. |
| Logging actual use | 20 at least | Finds highly used (or unused) features. Can run continuously. | Analysis programs needed for huge mass of data. Violation of user’s privacy. |
| User feedback | Hundreds | Tracks changes in user requirements and views. | Special organization needed to handle replies. |
| Eye-tracking | 6 qual. / 39 quant. | Data about where users look (cannot be acquired by other methods). | Analyzed data does not directly translate to a usability measure. Unreliable and expensive equipment. |
| Card sorting | 15 | Easy, cheap and quick. Can be conducted without an interface. | One-sided data about concept grouping, which can be highly varied. |
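The participant counts in Table 2.1 can serve as a first filter when choosing a set of complementary methods. The following is a minimal sketch, not part of the original study: the dictionary `MIN_USERS` and the function `feasible_methods` are hypothetical names, the numbers are the lower bounds copied from the table, and "Hundreds" for user feedback is represented by 100 as an assumption. The counts are rules of thumb, so the result is a shortlist rather than a decision.

```python
# Lower bound of the "Users" column in Table 2.1 for each method.
MIN_USERS = {
    "Heuristic evaluation": 0,
    "Performance measures": 10,
    "Think aloud": 3,
    "Observation": 3,
    "Questionnaires": 30,
    "Interviews": 5,
    "Focus groups": 6,
    "Logging actual use": 20,
    "User feedback": 100,  # assumption: "Hundreds" represented by 100
    "Eye-tracking (qualitative)": 6,
    "Eye-tracking (quantitative)": 39,
    "Card sorting": 15,
}


def feasible_methods(available_users):
    """Return the methods whose minimum participant count is met, sorted by name."""
    return sorted(
        name for name, minimum in MIN_USERS.items() if minimum <= available_users
    )


if __name__ == "__main__":
    # A small formative round versus a larger summative study.
    for pool in (5, 40):
        print(pool, "users:", ", ".join(feasible_methods(pool)))
```

With a pool of around 40 participants, as in the user tests of this thesis, every method in the table except user feedback passes this filter, which is consistent with combining think aloud, performance measures, questionnaires and both kinds of eye-tracking analysis.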
2.2.2 Summative usability evaluation

After a product has been developed, it can be compared with competing products. Rather than improving a product in progress, summative evaluation attempts to rank the usability of a finished product against others [50]. More time and resources can be allocated to this than to formative evaluation, because summative evaluations are (usually) done outside the product development cycle. The results of summative evaluations have to be numeric, so that different systems can be compared. This creates requirements for the data gathering methods. The number of subjects in summative usability evaluations has to be relatively high to get significant and reliable results (see Table 2.1). Also, the measurements and the analysis of the data have to be standardized across the systems to prevent biased results. ISO has defined usability so that it can be measured summatively with performance measures (task completion, task time) and a satisfaction questionnaire [38]. In conclusion, summative usability evaluations can be considered more scientific than formative ones. Examples of usability methods used for summative evaluations include performance measures, questionnaires and quantitative eye-tracking.

The formative–summative categorization of usability methods presented here gives a usability practitioner a starting point for choosing a method. Next, an overview of the more common usability evaluation methods is laid out.

2.3 An overview of common usability methods

The most common usability evaluation methods are presented in this section, including the methods which are relevant to this study. The next subsection, 2.3.1, is devoted to usability inspection methods as a whole. Usability testing methods are presented at the end of this chapter. More space is allocated to them, because usability testing methods are the most influential and popular UEMs [63].
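As an illustration of the summative measures named in Section 2.2.2, the following minimal Python sketch summarizes task completion and task time for one system. The function name `summarize_sessions`, the input layout and the choice to average task time over completed tasks only are assumptions made for this example; they are not taken from the ISO definition or from the study itself.

```python
from statistics import mean


def summarize_sessions(sessions):
    """Summarize summative performance measures for one system.

    `sessions` is a list of (completed, task_time_seconds) pairs, one per
    participant and task. This layout is a hypothetical convention chosen
    for the example.
    """
    if not sessions:
        raise ValueError("at least one session is required")

    # Effectiveness: share of task attempts that were completed.
    completed_times = [time for completed, time in sessions if completed]
    completion_rate = len(completed_times) / len(sessions)

    # Efficiency: mean task time, here taken over completed tasks only
    # (an assumption; other conventions exist).
    mean_time = mean(completed_times) if completed_times else None

    return completion_rate, mean_time


# Invented example values, for illustration only.
example_sessions = [(True, 30.0), (True, 50.0), (False, 90.0), (True, 40.0)]
rate, avg_time = summarize_sessions(example_sessions)
```

Computing the same two numbers in the same way for every compared system is one way to meet the standardization requirement mentioned above; satisfaction would be added from a questionnaire score.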
2.3.1 Heuristic evaluation and other usability inspection methods

Heuristic evaluation (also called expert evaluation) is the most widely used usability inspection method. Usability inspection is a generic term for methods in which the usability of a user interface is evaluated by experts, not users [53]. Other common usability inspection methods are the cognitive and the pluralistic walkthrough.

Heuristic evaluation is “done by looking at an interface and trying to come up with an opinion about what is good and bad about the interface”. The name of the method comes from the set of recognized usability guidelines, the heuristics. Ideally, the evaluator compares the system against some heuristics or guidelines as the evaluation proceeds, although the evaluation can also be conducted using intuition and common sense. [50]

The number of guidelines in a heuristic system can be as high as a thousand [71], which can be intimidating and time consuming. To tackle these problems, the most used set of heuristics was developed by Nielsen & Molich in 1990 (revised in 1994) [53, 54]. The set contains ten guidelines to be fulfilled by a usable user interface, presented in the following list [53].

Visibility of system status
The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.

Match between system and the real world
The system should speak the users’ language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

User control and freedom
Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
Consistency and standards
Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.

Error prevention
Even better than good error messages is a careful design, which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

Recognition rather than recall
Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

Flexibility and efficiency of use
Accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Aesthetic and minimalist design
Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.

Help users recognize, diagnose, and recover from errors
Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Help and documentation
Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.

The evaluator should be a usability specialist who has some knowledge about usability in general or about the platform and the particular user interface, preferably both (a “double specialist”).
Even novice evaluators can find some usability problems in a user interface, but usability specialists are better at finding them. A study showed that, on average, five novice evaluators found half, five “regular” usability specialists found 88 % and five “double” specialists found 97 % of the usability problems in a user interface [49]. Furthermore, averaged over six studies, five evaluators found 75 % of the usability problems [53]. According to Nielsen & Landauer (1993), the following equation can be used to predict the number of usability problems found with a certain number of evaluators:

$$\mathrm{ProblemsFound}(i) = N\left(1 - (1 - \lambda)^{i}\right) \qquad (2.1)$$

where ProblemsFound(i) is the number of (different) usability problems found by i evaluators, N is the total number of usability problems, and λ is the proportion of N found by a single evaluator. Averaged across six studies, the mean λ was 31 % and the mean N was 41. With some further assumptions on the project size, the benefits of corrected and the costs of uncorrected usability problems, the optimal number of heuristic evaluators (the one with the highest benefit–cost ratio) can be derived to be 4.4. [51]

The goal of a heuristic evaluation is to find usability problems. Due to its simplicity and affordability, it is most often used during an iterative product development process, where it is crucial to constantly revise product prototypes [50]. If the resources allocated to a product development project do not allow any formal usability methods to be used, it is advised that at least a single heuristic evaluation be conducted [46].

Besides heuristic evaluation, other widely used usability inspection methods are walkthroughs, especially cognitive and pluralistic walkthroughs. The cognitive walkthrough concentrates on evaluating how easy a user interface is to learn by exploring it. Lewis et al. first introduced the method in a paper in 1990 [42].
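As a brief aside before the walkthrough methods are described further, Equation 2.1 can be evaluated numerically. The sketch below is a minimal Python illustration; the function name `problems_found` is hypothetical, and the default values are simply the six-study means quoted above (λ = 0.31, N = 41), so the output describes an "average" project rather than any particular interface.

```python
def problems_found(evaluators, total_problems=41, single_rate=0.31):
    """Evaluate Equation 2.1: N * (1 - (1 - lambda)^i).

    evaluators     -- i, the number of independent evaluators
    total_problems -- N, the total number of usability problems
    single_rate    -- lambda, the proportion found by one evaluator
    """
    if evaluators < 0:
        raise ValueError("number of evaluators must be non-negative")
    if not 0.0 <= single_rate <= 1.0:
        raise ValueError("single_rate must be a proportion between 0 and 1")
    return total_problems * (1.0 - (1.0 - single_rate) ** evaluators)


# Print the predicted share of problems found for one to ten evaluators.
for i in range(1, 11):
    share = problems_found(i) / 41
    print(f"{i:2d} evaluators: {share:.0%} of problems")
```

The curve flattens quickly: each added evaluator finds a smaller share of still-undiscovered problems, which is the intuition behind a small optimal number of evaluators.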
The method is based on the idea that users do not read manuals but rather discover the features they need by exploring the system [53]. An overview of the cognitive walkthrough process is listed below (adapted from [53]).

1. Define inputs to the walkthrough
2. Convene analysts
3. Walk through the action sequences for each task
4. Record critical information
5. Revise the interface to fix the problems

During the cognitive walkthrough, evaluators examine the user interface in the context of some tasks and use scenarios. The inputs required for an evaluation session are the interface (usually a paper mock-up), a description of the assumed user population, a task and use scenario, and a list of the actions a user should execute to complete each task (hence the name walkthrough) [53]. By these means, the evaluators assess whether or not the sequence of required actions is suitable for the current task.

The pluralistic walkthrough was first developed at IBM in the 1980s and was introduced to the public in an article published in 1991 [5]. It differs from the cognitive walkthrough by having several evaluators from three different groups: the users, the product developers and the usability specialists. Step by step, the evaluators collaboratively go through the action sequences of each interface dialogue window as normal users would. Then they decide whether the actions are appropriate for the task [53].

2.3.2 Think aloud (as a usability evaluation method)

The rest of this section deals with usability testing methods, and the first to be discussed is think aloud. Think aloud, or thinking aloud, “may be the single most valuable usability engineering method” [50]. It started as a method for psychological research when Ericsson & Simon claimed in the 1980s that verbal reports could be used as data [26, 27].
They claimed that when properly instructed to think aloud, users “verbalize information that they are attending to in short-term memory” and that this does not necessarily affect cognitive processes [26]. In a sense, properly executed think aloud can thus be described as an indirect way of accessing the user’s mind, or at least the short-term memory. Ericsson & Simon’s protocol describes strictly one-sided communication: the facilitator is silent and the test subject speaks.

Usability specialists have later adopted the method as a practical tool for evaluating human-computer interfaces. Think aloud as used by usability specialists differs from Ericsson & Simon’s protocol by involving more two-way communication, although the user still does most of the talking. The instructions for facilitating a think aloud session found in usability handbooks are vague; there are no strict protocols for the think aloud method in usability, which makes the results of studies harder to compare [7].

During a think aloud session, a participant is simply instructed to use an interface while continuously thinking aloud. This is usually done while executing some predefined tasks. User comments are usually recorded and transcribed, or they are observed live and noted down. Think aloud produces a great deal of qualitative data from a small number of users. The drawback is that it might affect performance measurement. It seems that think aloud slows down complex tasks but does not affect simple tasks, such as finding information [20]. One handbook recommends not using think aloud and performance measures at the same time at all, because thinking aloud slows users down considerably [64]. On the other hand, think aloud has even been found to speed up problem solving in search tasks, by allowing users to refine their thoughts as they verbalize them [4]. What is clear is that users’ performance (task time, errors etc.)
when thinking aloud cannot be compared to natural performance, but can be compared to other users’ think-aloud performance.

2.3.3 Performance measures (as a usability evaluation method)

Multiple performance measures can be gathered from a user doing tasks. Nielsen and Rubin both list 18 quantifiable performance measurements (not all the same) in their handbooks, such as task completion time, ratio between successful interactions and errors, number of user errors, number of commands or features used by the user and number of times the user contacts the help desk [50, 64]. Performance measures are usually easy to obtain with a stopwatch and user observation, and they produce quantifiable data that is easy to analyze. However, they do not tell why something is difficult for a user. Usability problems cannot be identified by using performance measurements alone, so they are usually used in summative evaluations (e.g. comparing competing products) [50].

Usability, as defined by ISO, can be measured with performance measures and a satisfaction questionnaire alone. As noted in Section 2.2.2, usability consists of effectiveness, efficiency and satisfaction, which can be measured by task completion rate, task time and a satisfaction questionnaire respectively [37]. Therefore, performance measures are used in summative but not in formative evaluations.

2.3.4 System Usability Scale and Single Usability Metric

The System Usability Scale, SUS, is described as “a reliable, low-cost usability scale that can be used for global assessments of systems usability”. It was developed at Digital Equipment Corporation in 1986 and has since been widely used in different areas for its robustness and ease of use. A SUS questionnaire has ten statements about different aspects of usability. Users mark their agreement with the statements on a five-point Likert scale.
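The SUS scoring rule spelled out in the sentences that follow can be sketched in a few lines of Python. The function name `sus_score` is hypothetical. The assignment of the two contribution rules to odd-numbered (positively worded) and even-numbered (negatively worded) statements follows the standard SUS questionnaire layout and is stated here as an assumption, since the text itself only says that half of the statements are negative.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten Likert responses (1-5).

    Assumes the standard SUS ordering: odd-numbered statements are
    positively worded, even-numbered statements negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")

    total = 0
    for number, position in enumerate(responses, start=1):
        if not 1 <= position <= 5:
            raise ValueError("each response must be a scale position from 1 to 5")
        if number % 2 == 1:
            # Positively worded statement: contribution is position - 1.
            total += position - 1
        else:
            # Negatively worded statement: contribution is 5 - position.
            total += 5 - position

    # Ten contributions of 0-4 each sum to 0-40; scale to 0-100.
    return total * 2.5
```

A respondent who fully agrees with every positive statement and fully disagrees with every negative one reaches the maximum; answering with the scale midpoint throughout gives the middle of the range.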
Agreement with half of the statements implies negative usability, so for five statements the score contribution is 5 − scale position and for the other five it is scale position − 1. The sum of the ten score contributions is then multiplied by 2.5. As a result, a SUS score ranges from 0 to 100, with a hundred being the maximum score for usability. [9]

Figure 2.1: A model of the ISO standard and the Single Usability Metric [66]

The Single Usability Metric, SUM, tries to gather all aspects of usability (as defined by ISO) under one score. Task time and subjective satisfaction are the measures of efficiency and satisfaction respectively, as defined in the standard [37]. Effectiveness is defined by two measurements: number of errors and task completion ratio. Previous research has found that these four measurements of usability would produce “maximum amount of information in one score” [67]. Figure 2.1 encapsulates the idea behind the SUM.

2.3.5 Eye-tracking (as a usability evaluation method)

Eye-tracking devices enable an insight that no other UEM can give: seeing where users look. The value of this data is based on the eye–mind hypothesis coined by Just & Carpenter in 1976, who stated that “the eye fixates the referent of the symbol currently being processed if the referent is in view” [12]. They arrived at this conclusion when they found that, during task execution, subjects’ eye fixations lasted as long as the mental processes in working memory (50–800 milliseconds) and that subjects also always looked at the object in question when possible. It can be generalized that we look at what we think about, if the object is in sight. Like think aloud, eye-tracking can in a sense be considered an indirect mind-reading method.

Eye movements in humans can be divided into convergent, smooth pursuit and saccadic movements.
Convergence and smooth pursuit movements occur when the eyes have focused on an object which moves towards or away from the viewer (convergence) or across the field of view (smooth pursuit). Most eye movement, however, is not smooth at all but sporadic, consisting of saccades and fixations. During fixations, which typically last 200–600 milliseconds, the eyes are stable. Saccades are quick movements between fixations, which last 20–100 milliseconds and can reach velocities of 900° per second. During saccades the eyes are effectively blind, so humans are able to see only during fixations. [77]

Eye-tracking devices record saccades and fixations on one part of the field of view. An eye-tracker essentially looks at the user’s eyes to see where they are directed. If the eye-tracker is head-mounted, it has, in addition to the cameras recording the eyes, a forward-pointing camera next to the eyes. As a result, the gaze direction relative to the field-of-view video can be calculated and visualized when the eye and camera positions are known. Stand-alone eye-trackers have a fixed position between the eye camera and the monitor on which the gaze is tracked.

Most modern eye-tracking systems use infrared light and cameras to track the eyes. Non-collimated infrared light from two light sources is projected towards the eyes, creating distinct corneal reflections in both eyes beside the pupil¹. Infrared light also increases the contrast between the pupil and the surrounding cornea and allows the camera to capture the locations of the corneal reflection and the pupil. After a calibration, the relative location of these two points (corneal reflection and pupil) can be used to calculate where the eye is looking. Figure 2.2 is a screen capture from eye-tracking software showing the locations of the points.

Figure 2.2: Screen capture from iViewX software showing the corneal reflection (black crosshair) and the pupil (white crosshair)

Eye-tracking is a novel UEM and it is still being explored how it can be used to make products usable.
The more experimental methods include eye-tracking facilitated automatic usability testing, remote usability testing with webcams as eye-trackers and retrospective think aloud with eye-tracking [1, 16, 24]. Website usability testing has been a popular target for eye-tracking research, partly because the stand-alone eye-tracker used in HCI environments is cheaper and more accurate than the head-mounted one [25, 55]. The problem in using eye-tracking as a UEM is the uncertainty about how eye movements relate to specific cognitive processes or to the usability of a system [19].

¹ http://www.smivision.com/en/gaze-and-eye-tracking-systems/products/red-red250-red-500.html

Cooke has published several papers on the theme (2004–2006) and has arrived at the following conclusions about eye-tracking: a) a bottom–up approach is best suited to it (not having any preconceived hypothesis about how eye-tracking relates to cognitive processes) [17, 20]; b) it is most valuable in qualitative analysis and it should be used with other UEMs, such as think aloud [18, 20, 60]; and c) some quantitative measures, such as fixation duration, can be used to evaluate usability [19].

A recent paper suggests that by combining results from several eye-tracking measures, the mental effort during an HCI task can be measured [14]. Blinks, pupil sizes, fixations and saccades were measured from participants during tasks where the working memory load was varied. The results from blink and pupil data as cognitive load indicators agreed with previous studies. However, saccade and fixation data contradicted previous research by exhibiting correlation with cognitive load.

All in all, using quantitative eye-tracking as a UEM is somewhat problematic due to the various possible sources of noise. This is especially so when one tries to keep the test setup and stimuli as natural as possible, which is usually vital for valid usability results.
For example, dryness of the eyes leads to an increased blink rate, and changes in screen brightness lead to changes in pupil diameter. Even if these obstacles can be overcome, a difficult question remains for formative usability evaluation (see Section 2.2.1): how should a system be fixed when it brings about, for example, long fixation durations? Therefore qualitative eye-tracking, which simply enables usability practitioners to see where the user looks and what draws their attention, has traditionally been used as a UEM instead of the quantitative kind. In this study, both methods were used; how the eye-tracking results were mapped to usability issues is explained in Chapter 9: Discussion (Section 8.1.2).

Chapter 3
Tablet computers and tablet publishing

A theoretical and methodological base for this study was established in the previous two chapters. This chapter further narrows the scope of HCI to tablet computers. The device in question here, Apple’s iPad, is examined more closely. At the end of the chapter, an overview of tablet publishing is given.

3.1 Touchscreen devices

Touchscreens combine the input and output of a computer into a single device by enabling direct interaction through touching the screen with a finger or a stylus. Touchscreen technology was first introduced in a short paper published in 1965 [39]. It was later used, as first intended, in flight control. The first touchscreens implemented a basic capacitive layer over a cathode-ray tube monitor, which included a mesh of “touch wires” in the front part of the layer and insulated wires in the back. In the abstract of his article, Johnson (1965) foresaw the effect that touchscreens would have on people’s lives and on HCI forty-five years later by stating: “This device, the ‘touch display’, provides a very efficient coupling between man and machine” [39]. Along with capacitive, the more common current touchscreen technologies include resistive, surface acoustic wave and infrared.
The most popular of these are the resistive and capacitive technologies¹. The development of these technologies has made touchscreens so accurate and reliable that computers with touchscreens can be used without a mouse and keyboard.

¹ http://whatistouchscreen.com/

3.2 Tablet computers

A tablet computer consists solely of a touchscreen with a built-in central unit. Tablet computers, or tablets, differ from touchscreen-equipped mobile phones by having larger screens and a thinner structure: a typical tablet screen size ranges from seven to ten inches and the thickness of the device from 10 to 15 mm.

The first attempt to control a computer with a stylus instead of a keyboard was published in 1957 [23]. In the 1990s, some companies released tablet computers as it became possible to integrate a touchscreen and a central unit into a mobile device. In 2000, Microsoft released the first version of the Microsoft Tablet PC, but heavy and faulty tablets were not a viable alternative to a desktop or laptop computer before Apple’s iPad.

3.2.1 Apple iPad tablet computer

After its initial release in 2010, the iPad has become almost synonymous with the tablet computer. In the first year after its release, three out of four tablets shipped were iPads. For the current year, 2012, a recent marketing forecast predicts that half of the tablets shipped will be iPads [36]. Other manufacturers are catching up slowly, but each of them still has, at most, only 5 % of the tablet market [28].

The Apple iPad is part of the new breed of tablet computers, which lack the deficiencies of older generations. It is lightweight and thin, and the screen is accurate to touch and clear to look at. Long battery life and fast power-up make it truly mobile. The most important technical specifications of the Apple iPad 2 are listed in Table 3.1. In March 2012, Apple released the third-generation iPad, which has a display resolution of 2048 × 1536.
The Apple iPad uses capacitive touchscreen technology. Capacitive touchscreens work by using skin, which is a conductive material, to change the capacitance of the electric field on the touchscreen surface. Capacitive touchscreens are thus dependent on skin contact and, unlike resistive touchscreens, cannot be used with gloves or a stylus. The operating system in all iPads is the proprietary iOS, which is also used in Apple iPhones. [44]

Table 3.1: Technical specifications of Apple iPad 2

Size | Height: 24.1 cm; Width: 18.6 cm; Depth: 0.9 cm; Weight: 601 g
Display | Size: 9.7 inches = 24.6 cm (diagonal); Resolution: 1024 × 768 (132 pixels per inch); Features: LED-backlit, multi-touch, widescreen
Connections | Wi-Fi; Bluetooth; 3G (only in “Wi-Fi + 3G” model)
Cameras | Back camera: HD (720p) 30 fps video recording, 5× digital zoom still camera; Front camera: VGA video recording, VGA still camera
Other | Storage: 16/32/64 gigabytes; Battery life: 10 hours

Figure 3.1: Apple iPad 2 tablet computer from front, back and side (figure and specifications from http://www.apple.com/ipad/specs/)

3.3 Usability of tablet computers

3.3.1 Direct manipulation in graphical user interfaces

The most fundamental usability-related difference between tablet computers and regular computers is the input method. Directly manipulated graphical user interfaces (GUIs) were the next step after command-line based interaction when the term was introduced in 1983 [69]. With direct manipulation, users can handle files as icons, dragging and clicking them instead of writing commands on a command line [65]. Direct manipulation of digital objects is the basis of a conventional GUI. Furthermore, touchscreen devices and other gestural interfaces “take direct manipulation to another level” by allowing users to touch the digital items directly on the screen itself [65].
Tablet computers provide this, with direct touch and gestures being the main means of interaction. This makes the interface of a tablet computer natural.

3.3.2 Natural user interface

Natural user interfaces, or NUIs, are computer interfaces which can be used by means familiar from real life, such as speech, touch and gestures [68]. The hands-on, tactile experience allows fast learning of the basic functions and gestures, such as tap and swipe (gestures are illustrated later in Figure 5.3). Nevertheless, there are some problems with using gestures as an input method: the lack of established standards for gestures and their actions, and developers’ ignorance of the universal usability principles (which also apply to the new devices), are to blame for these problems [58].

A recent study by Mauney et al. [45] found that the execution of symbolic gestures, such as characters, had the highest variance between users from different cultures. Another significant user background factor was previous experience with touchscreen devices. Users who had already learned the logic of use swiped from right to left when they wanted to scroll right. Users who had experience only with scroll bars and arrow keys swiped erroneously from left to right to perform the same action. Even so, after a few mistakes the basic navigation logic was quickly learned.

Another, even more serious problem evident in a purely gestural system is the lack of cues and feedback for gestures. Gestures are non-standard, imprecise and unrepeatable by their nature as non-verbal communication. This can be illustrated by an example from a fictional auction, where bidding is done by gestures:

One person sneezes and thereby purchases an unwanted painting. A couple argues, and as they wave their hands at one another, the waving gets interpreted as ever-escalating bids. [57]

When a user makes a gesture and gets an incorrect response, he or she cannot know why, or how to correct the gesture.
A traditional GUI, with precise and repeatable input methods, does not have these issues. Tablet computers have solved this problem caused by a lack of feedback by integrating elements from the traditional GUI, such as icons, menus and a help system. In conclusion, how natural a NUI is, in the strictest sense of the term, can be debated in the tablet computer domain. [57]

3.3.3 Previous research

The commercial success of tablet computers was started by the iPad in 2010, so not much research has yet been published. A good amount of research can be found if the scope is broadened from tablets to include e-reading devices, which have been around longer. E-reading devices are specialized devices used solely for reading e-books, unlike the more general-purpose tablet computers [70].

A summative usability study of e-reading and tablet devices can be approached from two directions. The research compares either the usability of devices with the same applications (e.g. [70]) or the usability of applications on the same device (e.g. [34]). However, one study aiming to generate general e-book design guidelines for software and hardware used both approaches, with several applications and devices in the tests [80].

Some design guidelines and usability problems are common to e-reading and tablet devices, especially when a tablet computer is used for reading. A summarizing study found four categories of “usability barriers” (i.e. usability problems) in e-books, which have hindered the acceptance of e-reading: screen readability, navigation, portability/physical, and network connection [33]. Another study shows that “ease of use is highly associated with ease of navigation” [15]. Navigation seems to be the category in which users find the most difficulties with e-reading devices [33].

3.3.4 iPad specific research

Although the iPad is not solely an e-reading device, it has been found to fare well in comparison with such devices [31, 35].
However, being a new device on the market, it has its own usability problems. Most of these problems are application dependent, meaning that developers have not yet adapted to the iPad. The most comprehensive iPad usability studies have been published by Nielsen & Budiu in 2010 and 2011 [10, 11]. The findings from the studies are recapped below:

Read-tap asymmetry
Content that is large enough to read but too small to tap.

Too small touchable areas too close together
Leads to accidental activation.

Accidental activation
Particularly a problem in apps lacking a back button.

Low discoverability
Active areas that do not look touchable.

Poor typing
Users disliked typing on the virtual keyboard on the touchscreen.

Splash screen
A compulsory introduction screen irritates users.

Swipe ambiguity
If multiple items on the same screen can be swiped, navigation (e.g. swiping to turn the page) is impaired.

Information squeezed into too small areas
Makes the content harder to perceive and manipulate.

Too much navigation
A large number of navigation options leaves less space for content.

E-reading devices already have legibility similar to print, so the biggest obstacle to popularizing e-reading has been poor usability [70, 76]. Tablet usage can be generalized as being mainly media consumption, with news, magazines and books having a large share of it [47]. With slowly declining subscriber numbers, many publishers have realized that the already widely distributed tablet computer could be a solution for popularizing electronic reading and making digital publishing a viable addition to print.

3.4 Definition of magazine

The word magazine meant a storehouse when the first magazine-papers appeared in the mid-18th century [40]. The contemporary meaning of magazine can also be considered analogous to the former: a storing place for knowledge, ideas and opinions. A magazine is separated from a newspaper in many ways.
Magazines are not as topical as newspapers, but more in-depth and specialized: their stories are features, not news. Also, they do not appear daily, which makes them more unique and long-lasting (print magazines are stitched or glued like books).

A typical magazine business model varies by marketplace. An average US consumer magazine gets 54 % of its income from advertisers and 46 % from magazine sales [40]. Finnish magazines, due to the realities of a smaller marketplace, get the majority of their income from magazine sales. An average Finnish consumer magazine gets 70 % from magazine sales and only 30 % from advertisers [74].

3.4.1 Definition of tablet magazine

A tablet magazine is simply a digital version of a print magazine, which can be read on a tablet computer. Tablet magazines come in various forms. They can be categorized either by the type of distribution or by the form of the digital magazine. The distribution of tablet magazines to devices and consumers can be handled with downloadable viewing applications or by making the magazines available online. The form of a tablet magazine varies from carbon-copy print replicas to magazines with rich multimedia content and interaction possibilities.

Issues of application-based magazines or newspapers can be purchased or subscribed to after the application has been acquired. These magazines are usually PDF-style print replicas with a varying amount of multimedia content and interaction possibilities. The new HTML5 standards have made it possible to lay out online magazines without restrictions, which has paved the way for web-based digital magazines. They have also enabled dynamic layout, which makes it possible for magazines to adapt to different screen sizes and orientations.

Figure 3.2 shows examples of different tablet magazines and newspapers. The biggest daily newspaper in Finland, Helsingin Sanomat, is distributed through an iOS application².
Suomen Kuvalehti³, on the other hand, is a platform-independent HTML5 magazine and can be viewed on any device with a modern web browser. Finally, compilation magazines, such as Flipboard⁴, are worth mentioning. They are software that gathers news and stories from multiple online sources. Application- and web-based magazines are discussed in more detail below.

Figure 3.2: Different types of tablet magazine solutions, from left: application-based, web-based, and a compilation magazine

² http://asiakaspalvelu.hs.fi/tilaus/hsipad/
³ http://suomenkuvalehti.fi/jutut/kotimaa/digilehti
⁴ http://flipboard.com/

3.5 Tablet magazine publishing
Electronic publishing works the same way as the print publishing process until the last step. Distribution of the product is handled by making digital copies available, not by printing physical copies. The form of electronic publication of a newspaper or magazine, for instance, is still finding its shape. The easiest method has been to directly copy the publication for the web, as the digital files used for printing are already available [40].

The launch of the iPad in 2010 introduced a new ecosystem for digital publishers, the App Store, which is based on application, or “app”, sales. A publisher makes an application for the App Store and, after the user has downloaded the app, it can be used to purchase and read issues of the publication. Native iPad applications can allow more flashy animations and interaction possibilities than web-based publications. However, Stevens (2011) has predicted that in the near future users will have become bored with the insubstantial additional value offered by applications. As a result, the web will prevail over applications as a publishing platform, being more flexible and widely available:

The publishing industry will quickly come to an understanding that there is already a much more efficient and flexible means of publishing to the iPad and it already exists.
It is called a website. [72]

Whatever course the distribution and business models of digital publishing take, a recent study shows that tablets and e-reading devices are already encouraging users to consume more magazines. Of the 1009 US mobile magazine readers surveyed by The Association of Magazine Media in late 2011: a) 90 % consume as many or more magazines since they acquired a mobile device; b) 66 % plan to consume more digital magazines; and c) 63 % want more digital magazine content [48]. So there is a demand amongst consumers, at least amongst those who already own a tablet, for quality digital publications. Another US-based survey found that 67 % of tablet owners would read a tablet magazine rather than a print one when both were available. However, 65 % reported print to be more satisfying to read⁵. When the problems of converting a print magazine successfully to tablets have been solved, increasing e-reading sales could replace decreasing print sales.

⁵ http://www.gfkmri.com/assets/PR/GfKMRI_020312PR_DigitalUpdate.htm
⁶ http://www.aikakauslehdet.fi/Etusivu/Ajankohtaista/Tiedotteet/default.asp?docId=31423

3.5.1 Tablet publishing in Finland
As mentioned before, the magazine publishing business in Finland deals with different realities than in bigger markets. The global decline of print publication sales has affected Finnish publishing houses also. 74 Finnish magazine chief editors answered a survey by Aikakausmedia in May 2011⁶ investigating the attitudes towards digital publishing and tablet magazines. Only one in three magazines believed that a digital version of their publication for mobile reading devices would be made available in the next five years. 5 % of Finnish magazines had already made the transition in 2011. Moreover, another survey by Sanomalehtien Liitto from the beginning of 2012⁷ shows that newspapers have adopted the electronic distribution venue more widely.
The majority of the daily newspapers have a tablet version and the rest are planning or considering implementing one, according to the survey.

The reason for the slow start towards tablet publishing amongst magazines is the unpredictable marketplace. The chief editor of MikroPC (see Table 4.1) has said:

It is not yet known what could be a good distribution model, whether the devices will be application or browser-based, and what the business model could be. Pioneer users are already waiting for the new distribution channels, but first we need to get the system running.⁶

Indeed, there have been several approaches towards tablet publishing in Finland. The most recent examples are from Suomen Kuvalehti and Helsingin Sanomat. Suomen Kuvalehti, published by Otavamedia and having 310 000 readers, has been available as an iPad application since 2010. In April 2012, they decided to change the digital magazine version from an app into an HTML5 version, which can be read with all devices with an internet browser: mobile phones, tablets and desktop computers alike⁸. At the same time, Helsingin Sanomat, the leading daily newspaper in Finland, published by Sanoma News and with 905 000 readers⁹, launched a subscription model where the reader gets an iPad and a two-year subscription to the digital newspaper for a monthly fee¹⁰.

⁷ http://www.sanomalehdet.fi/index.phtml?s=2799
⁸ http://suomenkuvalehti.fi/jutut/kotimaa/digilehti
⁹ http://www.levikintarkastus.fi/levikintarkastus/tilastot/Levikkitilasto2011.pdf
¹⁰ http://asiakaspalvelu.hs.fi/tilaus/hsipad/

Chapter 4
Magazine in the tests

In this chapter, the magazine used in the study is presented. An issue of the magazine and four different versions of it for tablet computers are examined more thoroughly.

4.1 Tietokone magazine
Tietokone is a monthly Finnish magazine concentrating on computers and information technology in general. The circulation of the magazine in 2011 was 33 828, which was 11.9 % lower than the year before¹.
Total audience in 2011 was 113 000, making the readers-per-copy ratio 3.34². The biggest competitors in Finland for Tietokone are MikroPC and Mikrobitti. The three magazines with their key numbers are compared in Table 4.1 below. A typical reader of Tietokone magazine is an over 40-year-old (53 % of the readership) male (86 %) office worker with a high income (48 %) living in a big city (71 %)³. As mentioned before, the common phenomenon for all print magazines is that circulations are declining, and this has happened to Tietokone magazine as well. But with a computer-savvy reader profile such as this, the transition towards digital and e-reading magazines could be easier. In 2011 Tietokone launched their version for iPad, which includes the same content as the print version. It remains to be seen whether the iPad version can boost circulation and income.

Table 4.1: The three biggest IT magazines in Finland

| Magazine | Circulation | Audience | Readers-per-copy | Publisher |
|---|---|---|---|---|
| Tietokone | 33 828 | 113 000 | 3.34 | Sanoma Magazines |
| Mikrobitti | 71 429 | 255 000 | 3.57 | Sanoma Magazines |
| MikroPC | 28 462 | 90 000 | 3.16 | Talentum |

¹ http://www.levikintarkastus.fi/levikintarkastus/tilastot/Levikkitilasto2011.pdf
² http://www.levikintarkastus.fi/mediatutkimus/KMT_Lukija_2011_perustaustat.pdf
³ http://www.sanomamagazines.fi/mediabank/document/4042.pdf

The rest of this chapter is dedicated to the examination of the four versions (including the retail version mentioned above) of the Tietokone magazine compared in this study. The most distinctive and relevant differences from the usability point of view are discussed. The publication in question is the June 2011 issue and all the usability discussion in this study is based on this issue alone. All the versions here have the same content, text and images, the only differences being in the form. The different interaction possibilities in each magazine are presented below.
After that, more generic differences are discussed along with summarizing illustrations from each magazine.

4.2 Retail version
The Tietokone magazine viewing application (“Tietokone for iPad”) is free to acquire from Apple’s digital application marketplace, the App Store. After that, users can purchase single digital issues of the magazine, with prices ranging from 4.99 to 8.99 euros (a newsstand copy costs 8.50 euros). Currently, issues of the magazine are manually laid out with Adobe DPS⁴ software. The issue used in this study was made with Adobe InDesign together with Woodwing⁵ (hence the acronym WW), but it is no longer available for purchase. The iPad magazine is almost a direct copy of the print version, with some benefit from the digital format in the form of a few interactive features. The interactive features added to this version (besides basic navigation from page to page) are hyperlinks, image interaction, scrollable portions of page and navigation shortcuts.

⁴ http://www.adobe.com/products/digital-publishing-suite-family.html
⁵ http://www.woodwing.com/en/tablet-publishing-overview

Hyperlinks in retail version
Hyperlinks are found on the cover page and on the table of contents page, or TOC. Tapping a headline (text and/or image) brings the user to the first page of the article in question. The cover and TOC pages are almost carbon copies of the print, so the hyperlinks are not visibly separated from regular text and images. A few hyperlink buttons that open a pop-up window are also found inside articles. No hyperlinks lead the user outside the magazine. Figure 4.1 shows the different hyperlinks found in WW.

Figure 4.1: In WW, headlines in cover and TOC are hyperlinks to the corresponding articles; a ⊕ button opens a pop-up window with additional information inside an article

Image interaction in retail version
Some kind of image interaction is found in about every other image in the magazine.
Tapping an image does different things depending on the image. It can a) enlarge an image to full-screen; b) enlarge an image a little; c) show the caption of the image; or d) open a pop-up window with some additional content inside. Possible actions, if there are any, are not indicated prior to touch. Figure 4.2 shows examples of all four.

Figure 4.2: The four different image interaction possibilities in WW, from top left: a pop-up window, enlarge image to full-screen, show image caption and enlarge image a little

Scrollable portions of page in retail version
In some articles, portions of a page can be scrolled horizontally or vertically by swiping. These can be text columns, images or complete articles, as shown in Figure 4.3. The scrollable portions are indicated by a small “triple guillemet” sign (>>>) in the corner of the scrollable area.

Figure 4.3: Different scrollable portions of a page in WW, from left: scrollable text column, scrollable image and scrollable article

Navigation shortcuts in retail version
All the navigation shortcuts are found in the toolbar. The toolbar is opened by tapping the lower edge of a page, which reveals six buttons, as shown in Figure 4.4. The functions of the buttons, starting from the left, are a) go to cover; b) go to TOC; c) open a page browser⁶; d) open the library, where previously purchased magazines can be accessed; e) open a pop-up window to the homepage of Tietokone magazine, http://www.tietokone.fi/; and f) go to the store, where new magazines can be purchased.

4.3 AnyReader version
This version of the magazine was automatically laid out with dynamic layout software developed by the Finnish company Anygraaf. When all the text and images have sufficient metadata, a dynamic layout that adjusts to any screen size can be compiled from the content.
The layout can be accessed with AnyReader⁷, which was in the prototype phase at the time of this study but is now available in the App Store, Nokia Store and Google Play.

The AnyReader, or AR, version of the Tietokone magazine differs radically from the print version. Even though the content is the same, the differences in layout and navigation are substantial. AR has two hierarchical levels for navigation: a top-level view, which is like a TOC spread across multiple pages horizontally, and a section view, where articles from the same section are presented side by side horizontally on different pages. AR offers the following interactive features: hyperlinks, image interaction, adjustable font size and navigation shortcuts.

⁶ This did not work in the tests. Users who found this feature (5/10) during the tasks were instructed to ignore it. Later, during the free browsing phase, the page browser feature was shown to all users on another iPad, and they were asked to rate the magazine in SUS as if the feature had worked.
⁷ http://www.anygraaf.fi/fin/eng_frontpage/anyreader__tablet_and_smartphone_publishing_system_397.html

Figure 4.4: Toolbar and functions of four toolbar buttons in WW, from top left: page browser (did not work in the tests), library, homepage and store

Hyperlinks in AnyReader version
Articles can be accessed by tapping hyperlinks on the top level. The hyperlinks are large square areas of a page consisting of a headline, an image (sometimes) and the beginning lines of the article body text. This is illustrated in Figure 4.5.

Figure 4.5: In AR, articles are accessed by tapping a hyperlink in top-level

Image interaction in AnyReader version
Image interaction inside an article is indicated by a symbol in the corner of the top-level image. When an image within an article found under this symbol is tapped, an image carousel is opened and all the images in the current article are browsable.
This action is illustrated in Figure 4.6.

Figure 4.6: Tapping an image in an article opens an image carousel in AR, where images of the same article can be browsed

Adjustable font size in AnyReader version
AR was the only magazine version in this study with an adjustable font size. There are two ways to change the font size: either with a gesture or with a button in the toolbar. A spread gesture performed on a page enlarges the font size of the magazine and a pinch gesture shrinks the text. The + and − buttons in the toolbar perform the same actions, respectively, when touched. Changing the text size also changes the layout of the magazine, as illustrated in Figure 4.7.

Figure 4.7: Tapping the “−” button in the AR toolbar shrinks the text size and the layout adjusts accordingly

Navigation shortcuts in AnyReader version
Navigation shortcuts in AR are found at the upper and lower edges of the screen. An always-visible toolbar with four buttons is at the top of the screen. + and − change the text size as mentioned before. The first button on the left, the “home button”, returns the user to a library view from where previously purchased papers can be accessed. The second button from the left, the “back” button, returns the user to an upper hierarchy level in the magazine.

Figure 4.8: Toolbar navigation shortcuts in AR: “TOC button” brings user to top-level and “home button” exits to library

Alternative orientation in AnyReader version
When the device is rotated 90°, the magazine layout adjusts automatically, making it possible to use AR in portrait or landscape orientation.

Figure 4.9: Layout changes in AR after rotating the device 90°

4.4 “Fancybox” web-based magazine
The HTML5 web-based magazine was developed in the Department of Media Technology of Aalto University. It implements features from the newest generation of HTML/CSS as well as from JavaScript. Also, algorithms for automatic image alignment, cropping and main color extraction are used.
The Baker⁸ framework is used for enabling HTML5 magazine viewing on iPad, whereas Friar⁹ is used on Android devices.

FB and PS are two versions of the same HTML5 magazine with different image browsing techniques. FB, short for Fancybox, is a pop-up image gallery, where each image opens in full screen when tapped. PS, short for Photoswipe, is an image carousel (like in AR), which opens when an image is tapped and from where images of the same article can be browsed without returning to the article.

⁸ http://bakerframework.com/
⁹ http://www.friarframework.com/

Hyperlinks in Fancybox version
The first page of the magazine is the TOC, where all articles can be directly accessed by tapping the hyperlinks, instead of browsing through the magazine. The hyperlinks are wide rectangular buttons consisting of a headline, image and lead text. This is illustrated in Figure 4.10.

Figure 4.10: In FB and PS, articles can be accessed by tapping a hyperlink in TOC

Image interaction in Fancybox version
As mentioned above, images (if not in full-screen width already) are enlarged to a full-screen pop-up window when tapped. Whether an image opens or not is not indicated. Figure 4.11 shows an image pop-up window.

Figure 4.11: Image opens to a pop-up window in FB

Navigation shortcuts in Fancybox version
FB and PS have a hidden navigation bar similar to WW’s page browser, illustrated in Figure 4.12, which is accessed by a double-tap anywhere on the screen. The navigation bar shows all articles side by side and can be scrolled horizontally. How to open the navigation bar is not indicated.

Figure 4.12: The navigation bar in FB and PS

Alternative orientation in Fancybox version
When the device is rotated 90°, the magazine layout adjusts automatically. This makes it possible to use the magazine in portrait or landscape orientation, as shown in Figure 4.13.
Figure 4.13: Layout changes in FB and PS after rotating the device 90°

4.5 “Photoswipe” web-based magazine

Image interaction in Photoswipe version
As mentioned above, some images are enlarged to an image carousel pop-up window when tapped, which is illustrated in Figure 4.14. From the carousel, all the images of the same article section can be browsed by swiping or with the controls at the bottom. This was the only magazine where images could also be zoomed with a spread gesture in the carousel view. Whether a carousel opens or not when an image is tapped is not indicated.

Figure 4.14: After tapping an image in PS, an image carousel opens and all images in the article can be browsed

Navigation shortcuts in Photoswipe version
See Section 4.4.

Alternative orientation in Photoswipe version
See Section 4.4.

4.6 Structural differences in magazines
Figures 4.15, 4.16 and 4.17 illustrate the differences between the magazines. Table 4.2 gives an overview of the differences between the four magazines and their user-interfaces.

WW has all articles available through swiping. Shorter articles at the beginning and at the end of the magazine are stacked vertically on top of each other. Longer articles in the mid-section are separated horizontally onto different pages and, if they do not fit on one page, they are continued vertically below. Some sections of articles, like tables and additional information, are not visible directly but need to be accessed by tapping a hyperlink (see Section 4.2).

In AR, small bits (headline, picture, lead) from all articles are presented on the top level, which is several pages wide horizontally. Articles can be accessed by tapping on the square presenting the article (see Section 4.3). This brings the user to the lower level, the “article level”, where all articles are stacked side by side horizontally. If an article does not fit on the first page, it is continued below.
Articles are grouped together based on the section of the magazine they belong to, unlike in WW, FB and PS, where articles are mixed and in the same order as in print.

Unlike in WW and AR, in FB and PS all of the content is available through swiping alone. Tapping only enlarges images (see Sections 4.4 and 4.5) and brings out the navigation bar (4.4). Like in WW, shorter articles at the beginning and at the end are arranged on top of each other. All articles in the mid-section are separated horizontally and, when an article is too long to fit on one page, it is continued below.

The TOC is available on the first pages of PS, FB and WW. In AR, the whole top level can be considered a TOC of sorts. Horizontal transitions are paginated in every magazine, i.e. browsing between articles is done in steps. In FB and PS, vertical transitions are stepless, i.e. articles can be scrolled up and down continuously, like a web page. WW and AR have paginated vertical scrolling inside articles.
Figure 4.15: An overview of the navigational structure of WW magazine (circles and lines indicate tap and transition, three dots indicate omitted pages due to space constraints)

Figure 4.16: An overview of the navigational structure of AR magazine (circles and lines indicate tap and transition, three dots indicate omitted pages due to space constraints)

Figure 4.17: An overview of the navigational structure of PS and FB magazines (circles and lines indicate tap and transition, three dots indicate omitted pages due to space constraints)

Table 4.2: An overview of the magazine user-interfaces

| | WW | AR | PS & FB |
|---|---|---|---|
| Structure | Articles separated horizontally | Top-level (“TOC”) separated from article-level, articles separated horizontally on article-level | Articles separated horizontally |
| Pagination | Paginated | Paginated | Continuous |
| Columns | Mainly two | Mainly two (changes with font size) | One |
| Table of contents | Hyperlinked list of headlines and leads | Top-level consists of hyperlinked collection of headlines and images | Hyperlinked list of headlines, leads and images |
| Navigation bar | Page browser found inside the toolbar | Visible on page change | Page browser visible on double-tap |
| Toolbar | Visible on tap to bottom, 6 buttons | Always-visible, 4 buttons | No |
| Adjustable font size | No | Yes, layout changes accordingly | No |
| Image carousel | No, images opened separately | Yes | No, images opened separately (FB)/Yes (PS) |
| Image zoom | No | No | No (FB)/Yes (PS) |

Chapter 5
Experiment setup

In the previous chapters, the basis of the research, the device and the research material have been introduced. This chapter shows the setup for the user tests. The setup is explained in such a manner that similar research could be conducted with these instructions. Selection of users as test subjects is also discussed. All the areas of the context (users, environment, tasks) in this usability research are considered.
5.1 Chosen methods
As mentioned before (see 2.1), different usability evaluation methods test different parts of the HCI. The UEMs chosen for this study were think aloud, performance measures, questionnaires and eye-tracking. Also, a heuristic evaluation of the magazines was done prior to the user tests to help in the task design.

One goal of this study was to evaluate and compare the usability of dynamic and manual layouts. Summative evaluation was needed to rank the layouts. In order to do this type of evaluation, performance measures and questionnaires were chosen as methods for gathering quantitative data. The dynamic layout systems used in three of the tested magazines (AR, FB and PS) were still being developed, so a formative analysis was also done to find and address any usability problems that could be corrected. Think aloud was chosen for the large amount of qualitative data it produces.

Finally, eye-tracking was selected by default. The emphasis of this thesis was from the very start on how to evaluate iPad magazine usability with eye-tracking. A non-intrusive eye-tracking device was employed, so all UEMs could be used simultaneously with minimum effect on each other. Both qualitative and quantitative data were gathered through eye-tracking the users, and it was used both as a summative and a formative evaluation method.

5.2 Users
A sufficient amount of qualitative data on user-interface usability can be gathered from five users with think aloud or eye-tracking (see 2.1). Quantitative data, on the other hand, needs as many test subjects as possible in order to get statistically significant results. Four groups of ten users were used, keeping the amount of resources spent in this study reasonable.

Table 5.1: Test user statistics

| | AR | WW | FB | PS | All |
|---|---|---|---|---|---|
| Age | 27.3 | 23 | 26.8 | 26.3 | 25.85 |
| Males/Females | 8/2 | 6/4 | 8/2 | 7/3 | 29/11 |
| Proficiency | 3.2 | 3.8 | 3.1 | 3.9 | 3.5 |
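The “All” column of Table 5.1 can be cross-checked from the per-group values. The following is a minimal sketch, assuming (as stated above) four equally sized groups of ten users, so that the overall mean is simply the average of the group means. The variable and function names are illustrative and not part of the study.

```python
# Per-group values transcribed from Table 5.1 (four groups of ten users each).
groups = {
    "AR": {"age": 27.3, "males": 8, "females": 2, "proficiency": 3.2},
    "WW": {"age": 23.0, "males": 6, "females": 4, "proficiency": 3.8},
    "FB": {"age": 26.8, "males": 8, "females": 2, "proficiency": 3.1},
    "PS": {"age": 26.3, "males": 7, "females": 3, "proficiency": 3.9},
}


def mean_of(key):
    """Average the per-group means.

    This equals the overall mean only because all groups have the same size.
    """
    return sum(g[key] for g in groups.values()) / len(groups)


overall = {
    "age": round(mean_of("age"), 2),
    "males": sum(g["males"] for g in groups.values()),
    "females": sum(g["females"] for g in groups.values()),
    "proficiency": round(mean_of("proficiency"), 2),
}
print(overall)
```

With the values above, the sketch reproduces the “All” column: a mean age of 25.85, 29 males and 11 females, and a mean proficiency of 3.5.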
Because four systems were evaluated in the tests, between-subjects testing was selected to keep the test time at approximately an hour. Within-subjects testing would have reduced the time spent with each version to less than fifteen minutes in an hour-long test, which would have been insufficient.

Most of the users were recruited through an Aalto University newsletter. The compensation for participation was a movie ticket. No prerequisites for users were imposed other than having Finnish as their mother tongue. The user statistics for the four magazines are summarized in Table 5.1. Along with basic information, prior tablet computer and magazine experience and knowledge were surveyed with several “yes/no” questions before the tests, such as “Do you own a tablet computer?”, “Have you used a tablet computer before?”, and “Have you read Tietokone magazine before?”. A proficiency level from 1–8 was calculated for convenience to summarize the results of these questions, simply by giving “yes” answers a value of 1 and “no” answers 0. The pre-test survey (in Finnish) can be found in the Appendices.

5.3 Test setup
An overview of the test setup used in this study is shown in Figure 5.1. Users sat in an adjustable chair next to the iPad, which was attached to the eye-tracked monitor. The test facilitator sat to the left of and behind the user so that any movement by him did not distract the user. From this angle, the facilitator could also observe the user better than directly from behind or from the side. A video camera was also placed behind and to the side of the user for the same reasons: not to distract and for a better view. The test instructions (found in the Appendices) and a pencil could be placed on the table to the left or in front of the user, depending on user preference. The monitors of the computers running the experiment and eye-tracking software were placed so that they were visible to the facilitator but not to the user.
The experiment software was operated with a keyboard (start/stop tasks) and a mouse (start experiment and calibration) in front of the facilitator. After the tasks, the iPad was released from the frame and the user held the device during the free browsing part.

Figure 5.1: Test setup, showing 1: user, 2: facilitator, 3: iPad and eye-tracking system, 4: video camera, 5: computer recording eye-tracking data, 6: computer running experiment software.

5.3.1 Eye-tracking system setup
An SMI eye-tracking system¹ together with an Epiphan Frame Grabber² was used to enable eye-tracking. A special setup was necessary to allow eye-tracking of the iPad with a stand-alone eye-tracking device. Epiphan Frame Grabber hardware and software were used to stream the iPad video-out signal to the monitor being eye-tracked. As shown in Figure 5.2, the window streaming the iPad video-out signal was resized and placed directly under the iPad. As a result, when users looked at the iPad, their gaze was tracked to the iPad video-out signal, thus allowing iPad eye-tracking.

¹ http://www.smivision.com/en/gaze-and-eye-tracking-systems/products/red-red250-red-500.html
² http://www.epiphan.com/products/dvi-frame-grabbers/dvi2usb/

The setup was not optimal, however. The eye-tracking device was fixed below the monitor, between users’ feet. When users interacted with the iPad, especially with horizontal swipes, the hand blocked the signal from the eye-tracking device to the eyes. This resulted in gaps in the eye-tracking data.

SMI Experiment Center was used to operate the eye-tracking of the tasks. More specifically, the Screen recording feature, which recorded gaze on the desktop, including the Frame Grabber window, was started for every task (besides task 1, the practice). SMI iViewX ran on a laptop, which captured the data from the eye-tracker device.
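Placing the video-out window directly under the iPad implies a simple coordinate mapping from gaze positions on the monitor to positions on the iPad screen. The following is a minimal sketch of such a mapping, not the procedure used in the study: the window position and size, the 1024 × 768 landscape iPad resolution and the function name `gaze_to_ipad` are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumed geometry (illustrative only): position and size, in monitor pixels,
# of the Frame Grabber window that mirrors the iPad video-out signal.
WINDOW_LEFT, WINDOW_TOP = 400, 300
WINDOW_WIDTH, WINDOW_HEIGHT = 512, 384

# Assumed iPad screen resolution in landscape orientation.
IPAD_WIDTH, IPAD_HEIGHT = 1024, 768


def gaze_to_ipad(x: float, y: float) -> Optional[Tuple[float, float]]:
    """Map a gaze point in monitor pixels to iPad screen pixels.

    Returns None when the gaze point falls outside the mirrored window.
    """
    # Normalise the gaze point to the window (0.0 .. 1.0 on both axes).
    rel_x = (x - WINDOW_LEFT) / WINDOW_WIDTH
    rel_y = (y - WINDOW_TOP) / WINDOW_HEIGHT
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        return None
    # Scale the normalised point to the iPad resolution.
    return rel_x * IPAD_WIDTH, rel_y * IPAD_HEIGHT
```

Under these assumed values, a gaze point at the centre of the window maps to the centre of the iPad screen, and points outside the window are discarded rather than clamped, so that glances away from the device do not show up as fixations on its edges.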
Figure 5.2: An Epiphan DVI2USB Frame Grabber window was placed directly under the iPad

5.4 Test protocol
Every test was conducted with the following protocol:

1. Before the user arrived, the iPad was wiped clean, the magazine was set to the practice article (“Nanokoossa kaikki on toisin”) and the Experiment Center was set up for the user.
2. On arrival, the user was welcomed to the test session and asked to turn off any mobile phones.
3. A short overview of the test was given to allow proper orientation.
4. The user was asked to read the first page of the test instructions.
5. Before the practice task, one eye-tracking “demo” calibration run was done to introduce the user to the system. Users were not told that the first task and calibration were for practice and not recorded, as instructed by Nielsen [55].
6. The user was reminded to note the instructions for gestures on the wall on their left, shown in Figure 5.3. Video recording was started.
7. The user did the first task to practice think aloud and magazine interaction. Users were asked to read the task descriptions aloud to make the following think aloud more effortless.
8. Calibration was run five times, and the most accurate result was selected.
9. The user did five tasks, answering the satisfaction questionnaire after every task.
10. Calibration was done again five times.
11. The rest of the tasks were done.
12. After eleven tasks, eye-tracking was stopped and the iPad was given to the user, who was now instructed to freely browse the magazine for five minutes while continuing to think aloud. During free browsing, if the user had missed some features of the magazine, he or she was instructed how to find them. Free browsing was thus made to simulate use of the magazine as if it were already familiar to the user.
13. After free browsing, the user was asked to fill in the SUS questionnaire based on the tasks and free browsing.
14. At the end, the user was thanked for collaborating in the study and a movie ticket was given as compensation.

Test instructions, including task descriptions and the satisfaction and SUS questionnaires, are included in the Appendices.

Figure 5.3 was presented to users and it shows the six gestures needed to fully operate each magazine. Pinch and spread gestures were used to zoom out and in of images (PS) and adjust text size (AR). Tap is used to operate hyperlinks and open images in each magazine. Double-tap brings out the navigation bar in FB and PS. Slide is a more accurate version of swipe; both are used to move and scroll within the magazines.

Figure 5.3: Gesture instructions for novice users (from top left: tap, double-tap, slide, pinch, spread and swipe)ᵃ
ᵃ Adapted from: http://www.lukew.com/ff/entry.asp?1071

5.5 Tasks
Eleven tasks were presented to users. The first task was the same for all users, but the remaining ten tasks were presented in random order to prevent a particular task order from affecting the magazines differently [64]. A heuristic evaluation of the magazines was conducted to aid the task design. (Notes from the evaluation are attached to the Appendices.) Tasks were designed to steer users towards usability problems to see how they would cope with them. The magazines tested had different usability problems in different parts. The challenge was to address problems evenly between magazines while at the same time emulating as natural and broad magazine usage as possible.

The tasks were designed to be realistic in order to get relevant eye-tracking data [55]. Scenarios, such as “Task 6: Your friend has recommended you to read a column at the end of the magazine...”, were used in task design but later dropped in order to allow the task descriptions to be easily remembered. Goal-oriented tasks such as these allow effective qualitative eye-tracking analysis, as the researcher knows what the user is trying to find [60].
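The ordering rule described above (a fixed practice task followed by the remaining ten tasks in random order) can be sketched as follows. This is an illustrative sketch only: the thesis does not state how the order was generated, and the function name `task_order` and the optional seed are assumptions.

```python
import random
from typing import List, Optional


def task_order(seed: Optional[int] = None) -> List[int]:
    """Return an order for tasks 1-11 with task 1 (the practice) always first."""
    rng = random.Random(seed)        # a seed makes a given order reproducible
    remaining = list(range(2, 12))   # tasks 2..11
    rng.shuffle(remaining)           # random permutation of the ten tasks
    return [1] + remaining


# One order per test user; passing a per-user seed keeps the order recoverable.
order = task_order(seed=7)
print(order)
```

Using a separate, seeded generator per user is one way to keep each user's order reproducible for later analysis; drawing the order without a seed would serve the randomization purpose equally well.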
The eleven tasks are presented below in Table 5.2, each with the aspect of usability being tested and the usability problems found prior to user testing. The usability aspects, given in the last field of each entry, are broader and more closely related to the tablet magazine context than the heuristics. The task numbers in the task titles are used to identify the tasks in later chapters. For the complete task descriptions (in Finnish), see the Appendices.

Table 5.2: Task overview

Task 1: “Nanokoossa kaikki on toisin”
Objective: The first task was designed as practice, especially for those who had not used a tablet computer before. All the gestures for interacting with the magazine were shown (Figure 5.3) and the user was instructed in the basic use of the magazine, i.e. how to swipe to proceed within an article and from one article to another, and how to tap to open an image.
Usability problem: -
Usability aspect: -

Task 2: “Järkkäristä tuli videokamera”
Objective: The task was to find the best camera model from a camera test article.
Usability problem: In AR and WW, the results of the test were not visible directly in the article but hidden behind a hyperlink.
Usability aspect: Visibility

Task 3: “Kriisi 2.0”
Objective: The task was to find out what project was also called “the Wikipedia of maps”.
Usability problem: This was a test of reading and information screening: how easy it was to find a keyword in a mass of text.
Usability aspect: Readability

Task 4: “Tietoturvaa iPadiin ja iPhoneen”
Objective: The task was to find the advice the magazine gives in case of a stolen iPhone.
Usability problem: In WW, the image that had to be opened was not placed beside the corresponding story.
Usability aspect: Layout

Task 5: “10 vekkulia usb-lelua”
Objective: The task was to choose the most interesting USB toy from the article.
Usability problem: In WW, items 8–10 were hidden behind a scrollable column (see Section 4.2). In AR, some of the pictures of the devices were hidden behind a link (Section 4.3). In FB and PS, only a part of the list was visible at a time.
Usability aspect: Visibility

Task 6: “Pakina”
Objective: The task was to find an article written under the pseudonym “Kiukkuinen ict-johtaja”, which was hinted to be at the end of the magazine.
Usability problem: The lack of page numbers (WW) and TOC (AR) was predicted to affect the user’s sense of orientation inside the magazine. Also, because the article was short and belonged to a section with multiple short stories, it was not visible in any TOC.
Usability aspect: Navigation

Task 7: “Tietokoneen tulevaisuus on täällä”
Objective: The task was to choose one tablet computer from a comparison table and to find its picture.
Usability problem: In AR, the review article was split into two, with the technical specification comparison table in a different article than the images of the devices. In WW, the comparison table was hidden behind a link.
Usability aspect: Visibility

Task 8: “Sähköinen lukeminen maistuu jo”
Objective: The task was to examine an information graphic and choose one interesting fact from it.
Usability problem: In AR and FB, the image could not be zoomed. In WW, only a portion of the image was visible at a time.
Usability aspect: Image interaction

Task 9: “Suljettujen ovien takana”
Objective: The task was to browse through all the images of a photo feature.
Usability problem: In AR, most of the images are not visible in the article, but are hidden inside an image carousel (see Section 4.3). In WW, the images exhibit different actions when tapped (4.2). In FB, every image has to be opened separately (4.4). In PS, images were divided into several image carousels (4.5).
Usability aspect: Image interaction

Task 10: “Suunnistuksen uudet tuulet”
Objective: The task was to choose the best navigation software according to the look of the user interface.
Usability problem: In AR and PS, images were shown separately in an image carousel. In WW, the information related to an image was hidden behind a link.
Usability aspect: Image interaction

Task 11: “Kolumni”
Objective: The task was to find an article by Jyrki Kasvi.
Usability problem: In FB and PS, the writer’s name is not visible in the TOC.
Usability aspect: Navigation

Chapter 6 Analysis

As a result of the user tests explained in the previous chapter, a large amount of data was acquired. This chapter explains the various ways the data was analyzed to obtain the results.

6.1 Task time

Task time measures the efficiency part of usability as defined in the ISO standard [37]. Navigation between articles showed the biggest differences between magazine versions (see Section 4.6), so a task browsing time, i.e. how long it took a user to arrive at the correct article (without yet finding the answer), was also calculated. This allows browsing performance to be described more accurately.

Data from ten tasks (tasks 2–11) were acquired from each user, resulting in a total of four hundred samples each for total task time and task browsing time. In tasks 6 and 11 (see Table 5.2 for task descriptions), where users were to find the correct article, task browsing time equaled total task time. In the other tasks, where answers were found later inside the articles, the browsing time to reach the correct article was measured from the video recordings.

All task attempts were included in the calculations. If task completion took longer than the time limit allowed (12 cases out of 400), the task was recorded as taking 300 seconds. Also, if the user aborted the task before the five-minute time limit (6/400), the task was treated as incomplete. Wrong answers to task questions (4/400, found only in task 4 when users misread iPhone as iPad in the task description) were treated as correct in this measurement. All task attempts had to be included in the analysis to get an adequate and equal number of samples for each magazine. Task completion was addressed separately.

Total task times and task browsing times were averaged over the four magazine versions (10 users × 10 tasks = 100 samples per magazine).
In addition, task-specific averages of the times were calculated to reveal in which tasks the differences between magazines were greatest (10 users × 1 task = 10 samples per task). Other key figures, such as the median, standard deviation and range, are presented as well, as instructed by Rubin [64]. The margins of error were calculated at the 95 % confidence level and are presented with the corresponding task times in the next chapter.

6.2 System Usability Scale and Single Usability Metric

A System Usability Scale (SUS) score was calculated for each user (as explained in Section 2.3.4) and averaged over the four magazines. The margins of error were calculated at the 95 % confidence level and are presented with the corresponding SUS scores in the next chapter. The SUS questionnaire was filled in at the end of the test session, immediately after free browsing. Therefore, it can be thought to represent normal use of the magazine as well, not just task performance. A Finnish translation of the SUS questionnaire can be found in the Appendices [75].

Before calculating the Single Usability Metric, SUM (as explained in Section 2.3.4), a few exceptions were made to the suggested specifications. A specification value, i.e. a reference for distinguishing good from bad usability, was determined for each measurement. For the number of errors and the completion rate this is always “no errors” and “successful completion”. The number of opportunities for errors, i.e. the different situations where users could make an error, was set to 4 for almost all the tasks: 1) return from previous task (e.g. user has problems closing an opened image); 2) navigate to upper level (e.g. user swipes to TOC instead of shortcuts); 3) navigate to article (e.g. user goes to wrong article); and 4) find an answer (e.g. user leaves the correct article). For tasks 6 and 11, where the goal was to find the articles, the last opportunity in the list was excluded.
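The 95 % margins of error mentioned above for task times and SUS scores can be illustrated with a short sketch. It assumes a t-based interval for a mean, which is consistent with the t-tests used in the results chapter, but the thesis does not give the exact formula. The function name and the hard-coded table of critical values are introduced here for illustration only; the example input uses the AR total task time statistics reported in Table 7.1.

```python
import math

# Two-sided 95 % critical values of Student's t distribution,
# hard-coded for the two sample sizes in this study (df = n - 1):
# n = 10 users per magazine and n = 100 task samples per magazine.
T_CRIT_95 = {9: 2.262, 99: 1.984}

def margin_of_error(sd, n):
    """Half-width of a 95 % confidence interval for a mean (assumed t-based)."""
    t_crit = T_CRIT_95[n - 1]
    return t_crit * sd / math.sqrt(n)

# AR total task time: SD 76.17 s over 100 samples (Table 7.1)
ar_margin = margin_of_error(76.17, 100)  # about 15.1 seconds
```

Under these assumptions the AR mean task time of 137.74 s would carry error bars of roughly ±15 s; with only ten users per magazine, as for SUS, the same formula gives much wider intervals relative to the mean.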
The average over completed tasks was used as the task time reference rather than the average of only the most satisfied users. This was necessary because in some tasks the number of satisfied users was too small to give a representative reference value. The median (3.65) was used as the reference for the satisfaction score, as suggested by Nielsen & Levy [52].

The input data for the calculations were acquired as follows: task time was measured as explained before; task completion was a binary measure (“1” when the user gave the correct answer before the time limit, otherwise “0”); the number of errors (range 0–4) was counted from the videos; user satisfaction was the average of the three Likert scale questions (see Section 5.4).

Weighting, standardizing and averaging were done with a tool made available by one of the authors of SUM (http://www.usabilityscorecard.com/). A SUM score is task-based by nature and can be averaged to get a score for the whole system. The results of the SUM analysis are presented in the next chapter along with margins of error at the 95 % confidence level.

6.3 Quantitative eye-tracking

The SMI eye-tracking device records pupil diameter along with pupil location. The pupil automatically adjusts the amount of light reaching the retina, but in previous research pupil diameter has also been found to correlate with cognitive load during short-term memory tasks [41]. For this study, the hypothesis was that pupil dilation is a measure of mental effort in HCI (as presented in a recent article [14]) and would thus correlate with other task usability measures as well.

For every user, pupil diameter data was extracted from the SMI BeGaze eye-tracking analysis software as ASCII .txt files. An Excel macro was programmed to handle the data. First, all samples where one or both eyes were not measured were dropped. Then, the average pupil diameter (averaged over both eyes as well) was calculated for each task. A sample size below one hundred per task was considered insufficient to give a representative average. Therefore, tasks with fewer than 100 samples (16/400) were excluded from the final analysis.

Average fixation duration has been proposed as one possible quantitative usability measure derived from eye-tracking, indicating the complexity of a user interface [29]. Average fixation duration was calculated in the same way as pupil diameter, the only difference being that tasks with fewer than 10 samples (48/400) were dropped. The eye-tracking device failed to record any data from one WW user, which accounted for 10 zero-sample tasks in both quantitative measures.

6.4 Think aloud and qualitative eye-tracking

Think aloud data was analyzed in two ways: textually, and verbally together with eye-tracking. For the textual analysis, all the think aloud videos were transcribed, including the task and free browsing phases of every user. Then, the qualitative data analysis software Atlas.ti (http://www.atlasti.com/) was used to code users’ speech. For example, if a user had said, “First time I saw that button, it didn’t occur me to press it”, the sentence would have been coded as “affordance−”. Or, if a user had said “I like the text, it is clear to read.”, the sentence would have been coded as “readability+”.

The codes used were taken from a previous master’s thesis investigating the same material. Data about the proportion of positive and negative comments, and about the usability aspects the comments related to, was extracted from Atlas.ti. The results are presented for each magazine separately in the next chapter.

In addition to the textual analysis, the think aloud videos were analyzed to search for usability problems. Gaze replay videos from the five users of each magazine who had the most eye-tracking samples were combined and synchronized with the think aloud videos, as shown in Figure 6.1.
This type of analysis was inspired by a similar method described in the article “Using eye tracking to address limitations in think-aloud protocol (2005)” [20]. The other half of the users, who had insufficient eye-tracking data, were analyzed from the think aloud videos only. The same analysis was applied to all videos (think aloud only, and combined think aloud and eye-tracking): the videos were examined to find usability problems. The usability problems found were grouped according to Nielsen’s heuristics presented earlier (see Section 2.3.1). The number of usability problems per magazine and other results are presented in the next chapter.

Figure 6.1: Screen capture from a video combining think aloud, gestures and eye-tracking

Chapter 7 Results

The analyses of the measurements were defined in the previous chapter; this chapter presents the results. All results presented here are plotted with 95 % confidence intervals calculated from two-tailed t-tests.

7.1 Task time

7.1.1 Total task time

Total task time was measured from the start of task execution to its finish (i.e. task completion, running out of time or abortion by the user). Figure 7.4 shows the average task time for every magazine along with 95 % confidence intervals. The order from highest average task time to lowest is AR, FB, WW and PS. Descriptive statistics related to Figure 7.4 are presented in Table 7.1. A further analysis of the differences between the means shows that AR has statistically significantly longer task times than the others. The differences between the three other magazines were not statistically significant. This can be seen from Table 7.3, where only the comparisons between AR and the other magazines have p-values below the α = .05 threshold. Plotting the tasks separately allows a more detailed look at which tasks produce the biggest differences between magazines in total task times.
Each bar in Figure 7.1 shows task times averaged over ten users for each task and magazine (task 1 was practice and is not included in the calculations). As was expected from Figure 7.4, AR seems to have the longest times in most of the tasks. Figure 7.3 shows task times grouped according to the usability aspects they tested (presented in Table 5.2). The most significant difference can be seen in visibility, where AR has the longest task times. The other four usability aspects have more or less the same task times between magazines.

Figure 7.1: Average total task times for each task with 95 % CI margins of error (the black vertical lines) (see Table 5.2 for task descriptions)

Figure 7.2: Average task browsing times for each task with 95 % CI margins of error (the black vertical lines) (see Table 5.2 for task descriptions)

Table 7.1: Key statistics for total task times

          AR       WW       FB       PS
Mean      137.74   109.37   116.23   104.78
Median    123.5    96.5     102.5    84
SD        76.17    66.83    69.46    65.74
Range     290      294      292      286
Min.      10       6        8        14
Max.      300      300      300      300
Count     100      100      100      100

Figure 7.3: Average task times grouped into usability aspects (Visibility: tasks 2, 5, 7; Readability: 3; Layout: 4, Navigation: 6, 11; Image interaction: 7, 8, 9) with 95 % CI margins of error (the black vertical lines)

Figure 7.4: Total task times averaged over magazines with 95 % CI margins of error (the black vertical lines)

Figure 7.5: Task browsing times averaged over magazines with 95 % CI margins of error (the black vertical lines)

Table 7.2: Key statistics for task browsing times

          AR       WW       FB       PS
Mean      65.93    39.26    41.21    36.25
Median    49.5     25       21       22
SD        59.92    45.6     41.21    41.75
Range     298      272      298      272
Min.      2        2        2        5
Max.      300      274      300      277
Count     100      100      100      100

Table 7.3: Results (P (T ≤ t)) of two-tailed t-tests for total task times

          AR       WW       FB
WW        0.01
FB        0.04     0.47
PS        0.00     0.29     0.07

Table 7.4: Results (P (T ≤ t)) of two-tailed t-tests for task browsing times

          AR       WW       FB
WW        0.00
FB        0.00     0.79
PS        0.00     0.63     0.49

7.1.2 Task browsing time

Task browsing time was the time users spent browsing the magazine until they found and accessed the correct article. Average task browsing times for each magazine are plotted in Figure 7.5. This shows an even more radical difference between AR and the other magazines. The order of the magazines is the same as in Figure 7.4 above. Detailed statistics related to Figure 7.5 are presented in Table 7.2. Statistical analysis of the differences between the means is presented in Table 7.4. It shows even more significant differences between AR and the other magazines than in the case of total task times in Table 7.3. As before, the differences between the other three magazines are not statistically significant, the p-values being higher than α = .05.

When task browsing times are plotted for each task separately, the results give a more detailed view of the differences. Each bar in Figure 7.2 shows task times averaged over ten users for each task and magazine. For instance, the article in task 6 was the hardest to find in every magazine. Task browsing time had smaller variances than total task time, as can be seen from the shorter error bars and the smaller standard deviations in Table 7.2. This was due to the fact that once the correct article was found, some users spent more time arriving at the answer (e.g. deciding the most interesting fact in task 4) than others.

7.2 System Usability Scale and Single Usability Metric

Figure 7.6 shows the average SUS score for each magazine. Although there are no statistically significant differences here due to the small sample size (see Table 7.5), the same trend continues.
The order of the means is AR, WW, FB and PS, from lowest perceived usability to highest. Table 7.7 shows the descriptive statistics behind the figure.

Single Usability Metric scores from the tasks, averaged over each magazine, are plotted in Figure 7.7. SUM was calculated from the task time, completion, errors and satisfaction measurements and averaged over users for each task. As can be seen from the figure and from Table 7.6, there are no statistically significant differences between magazines; the error margins are smaller here than in SUS, but so are the differences between the means. This can be seen from Table 7.8, which shows the descriptive statistics.

Figure 7.6: SUS score averaged over magazines with 95 % CI margins of error (the black vertical lines)

Figure 7.7: SUM score averaged over magazines with 95 % CI margins of error (the black vertical lines)

Table 7.5: Results (P (T ≤ t)) of two-tailed t-tests for SUS score (none are below the α = .05 limit)

          AR       WW       FB
WW        0.89
FB        0.64     0.75
PS        0.10     0.14     0.25

Table 7.6: Results (P (T ≤ t)) of two-tailed t-tests for SUM score (none are below the α = .05 limit)

          AR       WW       FB
WW        0.89
FB        0.74     0.65
PS        0.27     0.23     0.43

Table 7.7: Key statistics for SUS score

          AR       WW       FB       PS
Mean      59       60.25    63.25    73.5
Median    61.25    58.75    65       75
SD        19.37    20.36    20.55    17.53
Range     55       65       72.5     52.5
Min.      27.5     27.5     22.5     42.5
Max.      82.5     92.5     95       95
Count     10       10       10       10

Table 7.8: Key statistics for SUM score

          AR       WW       FB       PS
Mean      64.9     64.29    66.29    69.2
Median    66.65    64.65    68.65    69.35
SD        9.74     10.23    8.96     6.89
Range     66.9     62.6     68       57.9
Min.      41.1     53.5     43.8     57.2
Max.      75.2     81.7     77.1     79.9
Count     10       10       10       10

7.2.1 Satisfaction

User satisfaction was asked after every task for the SUM measurement, but it can also be presented here separately. Besides task time, satisfaction was the only measurement in SUM that produced significant differences. Figure 7.8 shows that PS has the best average satisfaction score, followed by FB, WW and AR.
From Table 7.10 it can be seen that the difference in average satisfaction score is statistically significant between PS and AR. Figure 7.9 shows how satisfaction scores are distributed among individual tasks. Most of the tasks yield similar satisfaction scores in all magazines, error margins considered, but some differences can be found in tasks 5, 7, 8 and 11. Figure 7.10 shows task satisfaction scores grouped according to the usability aspects they tested (presented in Table 5.2). The most notable difference can be seen in visibility, where AR has the lowest satisfaction scores. The other four usability aspects show only minor differences in satisfaction between magazines.

Satisfaction and SUS scores were the only subjective measures in the tests. The satisfaction questionnaires were filled in immediately after each task, so they reflect task performance more than SUS does, which was filled in after the free browsing phase. Nevertheless, the order of the magazines is the same as in SUS (see Figure 7.6).

Figure 7.8: Satisfaction score averaged over magazines with 95 % CI margins of error (the black vertical lines) (see Table 5.2 for task descriptions)

Figure 7.9: Average satisfaction scores given for each task with 95 % CI margins of error (the black vertical lines)

Figure 7.10: Average task satisfaction scores grouped into usability aspects (Visibility: tasks 2, 5, 7; Readability: 3; Layout: 4, Navigation: 6, 11; Image interaction: 7, 8, 9) with 95 % CI margins of error (the black vertical lines)

Table 7.9: Key statistics for satisfaction score

          AR       WW       FB       PS
Mean      3.35     3.55     3.57     3.80
Median    3.33     3.67     3.67     4
SD        1.01     0.95     1.07     1.01
Range     4        4        4        4
Min.      1        1        1        1
Max.      5        5        5        5
Count     100      100      100      100

Table 7.10: Results (P (T ≤ t)) of two-tailed t-tests for satisfaction score (PS–AR is below the α = .05 limit)

          AR       WW       FB
WW        0.16
FB        0.15     0.91
PS        0.00     0.07     0.11

7.3 Quantitative eye-tracking

Eye-tracking was used as both a quantitative and a qualitative UEM. The quantitative measures taken were pupil diameter and fixation duration. The qualitative examination, presented in Section 7.4.1, was done on the combined eye-tracking and think aloud videos in order to find usability problems in the magazines.

7.3.1 Pupil diameter

Figure 7.11 shows the pupil diameter measures for each magazine. Measured pupil diameters ranged from 2.7 to 5.5 millimeters, which complies with established results for a normal adult pupil diameter in bright illumination [6, 41]. This measure seems to separate the magazines into two groups: AR and WW have significantly larger measured pupil diameters than FB and PS. Table 7.11 shows this to be true: the only differences between means that are not significant (α = .05) are AR–WW and PS–FB. Detailed statistics of the data are presented in Table 7.14.

Table 7.13 shows the correlations of pupil diameter with the other task-level measurements: task time and satisfaction. As explained before, the hypothesis was that pupil diameter, indicating cognitive load, would correlate with other usability measures: directly with task time and inversely with satisfaction. There is a significant correlation between pupil diameter and task time in AR (in the opposite direction to the hypothesis) and in WW. The correlation between pupil diameter and satisfaction is also significant in FB (in the opposite direction to the hypothesis) and in PS. However, when all measures are combined, the correlations are negligible.

7.3.2 Fixation duration

In Figure 7.12, average fixation durations are plotted. FB and PS are in the middle, with AR having higher and WW lower average fixation durations. The only statistically significant difference is between AR and the other magazines, as seen from Table 7.12.
Detailed numbers behind the figure are presented in Table 7.15.

Figure 7.11: Pupil diameter averaged over magazines with 95 % CI margins of error (the black vertical lines)

Figure 7.12: Fixation duration averaged over magazines with 95 % CI margins of error (the black vertical lines)

Table 7.11: Results (P (T ≤ t)) of two-tailed t-tests for average pupil diameters

          AR       WW       FB
WW        0.07
FB        0.00     0.00
PS        0.00     0.00     0.65

Table 7.12: Results (P (T ≤ t)) of two-tailed t-tests for average fixation durations

          AR       WW       FB
WW        0.00
FB        0.03     0.18
PS        0.01     0.09     0.95

Table 7.13: Correlation coefficients between pupil diameter, task time and satisfaction along with results (P (T ≤ t)) of two-tailed t-tests for correlation coefficients

               AR       WW       FB       PS       All
Task time      -0.27    0.23     -0.15    -0.05    -0.05
T-test         0.01     0.026    0.15     0.60     0.33
Satisfaction   0.09     0.07     0.39     -0.31    0.06
T-test         0.35     0.53     0.00     0.00     0.27
Count          100      90       100      99       389

Table 7.14: Key statistics for pupil diameter measures

          AR       WW       FB       PS
Mean      4.06     4.18     3.79     3.83
Median    4.17     4.19     3.72     3.82
SD        0.50     0.54     0.67     0.53
Range     1.96     2.53     2.81     2.40
Min.      2.74     3.02     2.60     2.80
Max.      4.70     5.55     5.40     5.20
Count     100      90       96       99

Table 7.15: Key statistics for fixation duration measures

          AR       WW       FB       PS
Mean      232.14   202.89   214.23   214.78
Median    229.88   202.53   204.52   211.73
SD        50.63    42.43    62.58    46.12
Range     295.04   226.78   306.38   241.48
Min.      102.64   113.17   127.63   125.89
Max.      397.68   339.95   434.01   367.38
Count     99       76       87       90

Table 7.16: Correlation coefficients between fixation duration, task time and satisfaction along with results (P (T ≤ t)) of two-tailed t-tests for correlation coefficients

               AR       WW       FB       PS       All
Task time      0.10     -0.16    0.06     0.24     0.10
T-test         0.34     0.13     0.57     0.02     0.06
Satisfaction   0.06     -0.05    -0.11    0.08     -0.02
T-test         0.54     0.66     0.26     0.41     0.67
Count          99       76       87       90       352

As mentioned before, the hypothesis was that average fixation duration during a task would correlate directly with task time and inversely with satisfaction.
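The significance values listed with the correlation coefficients in Tables 7.13 and 7.16 are consistent with the standard t statistic for testing a Pearson correlation against zero, t = r·sqrt(n − 2)/sqrt(1 − r²). The thesis does not state which formula was used, so the sketch below is an assumption. The function names are introduced here, and the p-value uses a normal approximation to the t distribution, which is reasonable only because the sample counts are large (76 to 389).

```python
import math
from statistics import NormalDist

def correlation_t(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def approx_two_tailed_p(t):
    """Two-tailed p-value from a normal approximation to the t distribution.

    An exact value would need the t distribution's CDF with n - 2
    degrees of freedom; the approximation is adequate for large n.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))

# PS, fixation duration vs. task time (Table 7.16): r = 0.24, n = 90
t_ps = correlation_t(0.24, 90)
p_ps = approx_two_tailed_p(t_ps)
```

With these inputs the approximate p-value comes out near 0.02, in line with the value listed for PS in Table 7.16. Because the coefficients in the tables are rounded to two decimals, small deviations from the reported p-values are to be expected.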
Table 7.16 shows the correlations of fixation duration with task time and satisfaction. Unlike the pupil diameter correlations, the combined correlation coefficients have the sign predicted by the hypothesis. The correlation with task time almost falls under the .05 limit, indicating a possible relation. In contrast, a correlation with satisfaction is highly unlikely.

7.4 Think aloud and qualitative eye-tracking

Figure 7.13 and Table 7.17 show summarized results from the qualitative think aloud data analysis done with Atlas.ti. PS had the best ratio of positive to negative comments, followed closely by FB. Users were asked to find usability problems, so negative comments were made more frequently than positive ones.

Table 7.17 also shows the three most commented aspects of the magazines. Image zoom should be a default feature, and when it was not found (in AR, WW and FB), users pointed this out. AR was the only magazine where the navigation bar was always visible, but its design (it faded away too quickly and its section divisions were not noticed) confused users. In WW, FB and PS, the navigation bar generated positive comments once it was found. Until then, users showed frustration in their comments because they had to leaf manually through the magazine to articles and back.
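The comment ratio reported in Table 7.17 appears to be the number of positive comments divided by the number of negative comments. The sketch below, which uses the per-magazine counts from that table, reproduces the reported ratios under that assumption; the dictionary layout and the function name are illustrative only.

```python
# Positive and negative think-aloud comment counts from Table 7.17.
comments = {
    "AR": (16, 104),
    "WW": (32, 141),
    "FB": (40, 126),
    "PS": (33, 103),
}

def comment_ratio(positive, negative):
    """Ratio of positive to negative comments (assumed definition)."""
    return positive / negative

# Ratios rounded to two decimals, as in the table.
ratios = {mag: round(comment_ratio(p, n), 2) for mag, (p, n) in comments.items()}
```

A ratio below 1 for every magazine reflects the fact that users were asked to look for problems, so the ratios are more useful for comparing magazines with each other than as absolute scores.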
Table 7.17: The number of positive and negative comments from think aloud and the three most remarked aspects of usability (−/+)

                 AR       WW       FB       PS       All
Positive         16       32       40       33       121
Negative         104      141      126      103      474
Ratio            0.15     0.23     0.32     0.32     0.26
Count            120      173      166      136      595
Image zoom       12/1     4/0      13/0     0/0      29/1
Navigation bar   15/1     3/5      13/4     14/4     45/14
TOC              5/0      8/2      5/2      8/1      26/5

Table 7.18: Total number and different usability problems found from observing the videos

            AR    WW    FB    PS
Total       95    65    60    51
Different   25    22    19    18

The TOC generated negative comments in AR because the magazine did not have one; in WW because it was difficult to access and the hyperlinks were not noticed; and in FB and PS because it was too long, lacked writer names and showed only the first article when several were stacked vertically.

Figure 7.13: Amount of negative and positive comments about each magazine

Figure 7.14 shows the number of usability problems found during the qualitative analysis of the think aloud videos (20 with and 20 without eye-tracking). The exact numbers are presented in Table 7.18.

Figure 7.14: Total number and different usability problems found from observing the videos

7.4.1 Usability problems

Table 7.19 shows the most important usability problems found in the qualitative video analysis (explained in Section 6.4). The first column on the left is a categorizing summary of the slightly different usability problems in each magazine. The second column shows the heuristic class (see Section 2.3.1) to which the problems belong (1–8; the last two were excluded because none of the magazines contained instructions). In the third column, the severity of the usability problem is marked as L = low (leads to user complaints and hinders task completion) or H = high (prevents task completion), based on the observations from the user tests. The rest of the columns contain more detailed usability problem descriptions and the number of different users who encountered and noticed such a problem.
If the number is zero, no users explicitly pointed out the problem, but the observer discovered it. This table includes only the most important (frequent and/or severe) usability problems; some minor problems have been left out or combined with others due to space constraints.

Table 7.19: Most important individual usability problems by magazine (H.–Heuristics: 1–Visibility of system status, 2–Match between system and the real world, 3–User control and freedom, 4–Consistency and standards, 5–Error prevention, 6–Recognition rather than recall, 7–Flexibility and efficiency of use, 8–Aesthetic and minimalist design. S.–Severity: L–low, H–high) *Dossier refers to a tablet magazine “page”; it can be vertically longer than screen size (horizontal swipe moves between dossiers, vertical swipe moves within a dossier). The magazine columns give the number of user occurrences followed by a description; “-” marks no occurrence.

Usability issue | H. | S. | AR | WW | FB | PS
Latency | 1 | L | 3 text size and swipe | - | 7 article or image open | 1 same as previous
Untraditional | 2 | L | 5 no TOC, Käynnistys≈Käynnistä | 4 no page numbers | - | -
Pagination | 3 | L | 4 paginated browsing | 2 paginated browsing | - | -
Undo | 3 | L | 0 no “undo” action | 0 no “undo” action | 0 no “undo” action | 3 no “undo” action
Zoom | 4 | L | 9 no image zoom | 2 no image or text zoom | 6 no image or text zoom | 2 no text zoom
Unpredictable action | 4 | H | 5 in toolbar buttons | 6 in image opening | - | -
Gestures | 5 | L | 2 small buttons in navigation bar | 8 swipe direction error | 7 same as previous | 7 same as previous
Hidden content | 5 | H | 8 inside image carousel | 3 poor affordance ⊕buttons | - | -
Navigation bar | 6 | L | 8 fades away, poor affordance | - | 1 poor affordance to swiping | -
Text–image association | 6 | L | 5 images not next to text | 6 same as previous | - | -
Bookmarking | 6 | L | - | 3 starts always from top page | 5 same as previous | 1 same as previous
Search | 7 | H | 4 no search | 2 no search | 6 no search | 5 no search
Hidden shortcuts | 7 | H | - | 5 bottom bar hard to find | 6 same as previous | 3 same as previous
Scrollable portions | 7 | L | - | 15 hide content, slows browsing | - | -
Article hierarchy | 8 | L | 5 cluttered top-level | - | 5 short articles poorly separated | 5 same as previous
Dossier* continuity | 8 | H | 0 poorly implied | 2 poorly implied | 5 not implied (see Section 8.2.3) | 6 not implied (see Section 8.2.3)

7.5 Summary of results

Results from all the measures are presented in Table 7.20. On the left side of the table, the rank is derived from the means of the measurements only. On the right side of the table, 95 % confidence intervals are taken into account for the three measures where there were significant differences. SUS, SUM and the think aloud comment ratio (positive/negative comments) were not significantly different. Confidence intervals were not calculated for the number of usability problems found. The individual results and the apparent trend are discussed further in the next chapter.

Table 7.20: The order of magazines in all usability measurements (TT: Task time, PD: Pupil diameter, FD: Fixation duration, TA: Think aloud user comment, UP: Usability problems from think aloud and eye-tracking videos) (*95 % CI taken into account)

      TT   SUS   SUM   PD   FD   TA   UP   AVG   TT*   PD*   FD*   AVG*
AR    4    4     3     3    4    4    4    3.7   2     2     2     2
WW    2    3     4     4    1    3    3    2.9   1     2     1     1.7
FB    3    2     2     1    2    2    2    2     1     1     1     1
PS    1    1     1     2    3    1    1    1.4   1     1     1     1

Table 7.21 shows the pros and cons of each magazine. It is a summary of all the findings from the pre-test heuristic evaluation, user test observation and video analysis. Only those aspects of PS that differ from FB (i.e. image gallery and zooming) are presented separately. The implications of this table are considered thoroughly in the next chapter. Items in the columns are mostly based on user observation, and a few (e.g. single-column vs. multi-column layouts) are based on previous research.
Users were asked to find usability problems, which was the point of the whole study, so many positive items are omitted from the “Pros” column. For example, even though the magazines had different typography, legibility did not cause problems in any of them: it was considered a “default” feature and not worth reporting. Only those positive aspects that were not found in all the magazines are reported.

Table 7.21: A “pros and cons” summary of each magazine based on the entire study

AR
Pros: Toolbar was always visible; Page number was indicated; Image order was indicated in image carousel; Articles were clearly separated to different dossiers; Articles were found directly from top level; Only magazine with an adjustable font size; Layout rotated with device; Image carousel
Cons: Toolbar symbols hard to decipher; Navigation bar faded away, had poor affordance and divisions inside sections; Relevant content was hidden inside image carousel; Some articles are divided unnecessarily; Top level was a jumble of images and bits of texts; Adjusting text size changed layout drastically and disoriented users; Paginated: did not allow stepless vertical scrolling; Multi-column layout; No image zoom, search feature, cover, TOC

WW
Pros: When accessed, toolbar stayed visible; Toolbar had shortcuts to cover and TOC; Cover and TOC had hyperlinks; Vertical pagination separated short articles
Cons: Toolbar was very hard to find; Page number was not indicated; Poor affordance in hyperlinks; Allowed stepless vertical scrolling only by holding; Relevant content was hidden behind hyperlinks (⊕buttons); Image opening action was not indicated; Multi-column layout; Layout did not rotate with device; No image zoom, search feature, adjustable text size

FB & PS
Pros: When accessed, navigation bar stayed visible; Page number was indicated; No content was hidden behind hyperlinks; Single column layout; Layout rotated with device; Stepless vertical scrolling
Cons: Navigation bar was very hard to find; Dossier continuation downwards was not implied on first page; No shortcut to TOC; No image zoom, search feature, adjustable text size, cover

PS
Pros: Image carousel with zoom
Cons: Buggy image gallery

Chapter 8 Discussion

In this chapter, the results presented in the previous chapter are discussed further. The research questions are answered and the problems that occurred during the research are scrutinized. A summary of the results concerning each magazine is presented along with the relation of these findings to previous research. Finally, the validity and reliability of the research are discussed.

8.1 Summary of the summative usability evaluations

In the previous chapter, seven measures were suggested for magazine usability evaluation. For four of them (task time, satisfaction, pupil diameter and fixation duration), statistically significant differences were found between the magazines. However, task time (efficiency) and satisfaction score (satisfaction) are the only measures of the four that are scientifically sound indicators of usability according to ISO [37]. Task completion rates (effectiveness) did not show differences.

8.1.1 Usability implications of task time and satisfaction scores

Task time was easily measured. The difficulty was deciding how to treat incomplete tasks, for which no guidance was found in the literature. The number of incomplete samples was so small compared to the total number of task time samples that, even if they were handled incorrectly, the overall results were not affected. After that, task time and satisfaction score were easily analyzed as indicators of the usability of the magazine user interface. From these two measures and their margins of error, it can be stated that PS had better usability than AR in the context of this study. A more thorough task-level analysis reveals the usability aspects from which the differences stem.
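A comparison of two magazines through their margins of error can be illustrated with a minimal sketch. It assumes a two-sided 95 % confidence interval for a mean based on the t-distribution with ten users per magazine; the task times below are invented for illustration and are not measurements from this study, and the function name `mean_ci95` is hypothetical.

```python
import math
import statistics

# Two-sided 95 % critical value of Student's t for 9 degrees of freedom
# (ten users per group), taken from a standard t table.
T_CRIT_DF9 = 2.262


def mean_ci95(samples):
    """Return (mean, lower, upper) of a 95 % CI; assumes exactly ten samples."""
    if len(samples) != 10:
        raise ValueError("T_CRIT_DF9 is only valid for ten samples")
    mean = statistics.mean(samples)
    margin = T_CRIT_DF9 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - margin, mean + margin


# Hypothetical task times in seconds, NOT data from the study.
group_a = [41, 38, 45, 50, 36, 44, 39, 47, 42, 40]
group_b = [62, 58, 71, 66, 55, 69, 60, 64, 73, 59]

_, a_low, a_high = mean_ci95(group_a)
_, b_low, b_high = mean_ci95(group_b)

# Two intervals overlap when each one starts before the other one ends.
intervals_overlap = a_low <= b_high and b_low <= a_high
print(intervals_overlap)
```

Non-overlapping intervals are a conservative sign of a real difference between two groups; overlapping intervals, which are likely with ten users per group, leave the question open rather than proving that the groups are equal.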
Figures 7.1, 7.2 and 7.9 show how total task time, task browsing time and satisfaction scores are distributed between tasks. Figures 7.3 and 7.10 group task-level measures from the task time and satisfaction score graphs into five usability aspects. (Task descriptions and the usability aspects they tested are presented in Table 5.2.) Readability (task 3) and Layout (task 4) show no significant differences between magazines in the two figures.

In Navigation (tasks 6 and 11), differences in both time and satisfaction can be seen. FB differs most from the others in task 6 (Figure 7.2), signifying poor navigation, although the PS magazine is identical in terms of this particular task. The only explanation for this can be found in Table 5.1, which shows that the proficiency level of the FB users happened to be lower than that of the PS users. This implies that inexperienced users need more cues for dossier¹ continuation (which was a severe usability problem in FB and PS) than more experienced users do. WW was the only magazine that presented the author’s name in the TOC, which could have led to the lower task times and higher satisfaction scores in task 11 (Figures 7.1 and 7.9) and in Navigation (Figures 7.3 and 7.10).

Visibility (which groups tasks 2, 5 and 7) makes the point in both figures that all relevant content should be clearly visible, not hidden behind obscure hyperlinks. Time and satisfaction levels in the figures show that this is a particular problem in AR, where even the correct articles for tasks 5 (see Figure 8.1) and 7 were nearly impossible to find. WW’s scrollable columns also performed poorly, as can be seen from the low satisfaction scores in task 5 (Figure 7.9).

In Image interaction (tasks 8, 9, 10), some minor differences can be seen from the figures.
The ability to zoom images results in a better user experience (satisfaction), even though it does not necessarily translate directly into a more efficient user interface (task time), as can be seen from PS in task 8 (Figures 7.9 and 7.1). The low scores for WW in the same task show that partially visible, scrollable images are a very poor choice for presenting infographics. As could be expected, an image carousel seems to be a quicker way to browse through many images than opening them individually. Total task times in task 9 are low for AR but not for PS (the other magazine with an image carousel), which suggests that an image carousel is only good when it is not buggy and shows all of the article’s images in the same carousel.

¹ See the caption of Table 7.19.

8.1.2 Quantitative and qualitative eye-tracking result analysis

Pupil diameter was the first of the two measures extracted from the eye-tracking data. Table 8.1 shows that pupil diameter has a significant negative correlation of −.518 with the age of the user. This supports the validity of the measurements, because human pupil size is known to decrease with age [6]. However, Table 7.13 shows that pupil diameter did not correlate on a task basis with satisfaction or task time. Even though pupil diameter data is recorded in eye-tracking by default, the measurements are not normally used in usability studies. This is because the effect of cognitive load or arousal on pupil size is easily masked by changes in the amount of light reaching the eye [62]. Also, baseline measurements with different screen brightness settings would have been necessary to deal with individual differences. They would have enabled calibration to get reliable results from different magazine layouts (with different amounts of whitespace). The decision to investigate pupil diameter was made retrospectively, so the tests were not conducted with these constraints in mind.
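The kind of calibration that baseline measurements would have allowed can be sketched as follows. The sketch assumes a simple subtractive correction, in which a user’s resting pupil diameter at a given screen brightness is subtracted from the diameter measured during a task at the same brightness. This is only one possible approach; the function name, the brightness labels and all values are hypothetical and do not come from this study.

```python
def baseline_corrected(task_mm, baseline_mm_by_brightness, brightness):
    """Subtract the user's resting pupil diameter at the same screen brightness."""
    return task_mm - baseline_mm_by_brightness[brightness]


# Hypothetical resting diameters (mm) of one user at two brightness levels.
user_baseline = {"dark_layout": 4.1, "bright_layout": 3.4}

# Hypothetical raw diameters (mm) measured during a task on two layouts.
change_dark = baseline_corrected(4.5, user_baseline, "dark_layout")
change_bright = baseline_corrected(3.8, user_baseline, "bright_layout")

print(round(change_dark, 2), round(change_bright, 2))
```

In this made-up case the raw diameters differ between the two layouts, but the corrected values do not, which is the situation where a brightness effect would otherwise be mistaken for a difference in cognitive load.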
Looking at the differences in average pupil sizes between magazines in Figure 7.11, three classes can be seen: WW in the first, AR in the second, and FB and PS (which look effectively the same) in the third class with the smallest pupil sizes. No measurements were made, but the differences between pupil diameters could have been caused by the different layouts, which had different amounts of whitespace, as can be seen from Figures 4.15, 4.16 and 4.17.

Fixation duration, on the other hand, has been used in several usability studies before [13, 22, 30]. Fixation duration is usually thought of as a measure of cognitive processing difficulty. The eye-mind hypothesis (see Section 2.3.5) states that people look at what they are thinking about. From this, it is derived that they also look at something for as long as they think about it. Therefore, when a part of a user interface is difficult to process, it will generate longer fixations. On the other hand, long fixations can also indicate an interesting and intriguing user interface, which makes the relation between fixation duration and usability a U-shaped curve [22]. Nevertheless, some conclusions can be drawn from the fixation duration data. Cowen et al. (2002) found that web sites with a high “clutter index”, i.e. a small amount of white space and densely clustered items, made the layout more difficult to process and generated longer fixations on average. This could be why AR had the longest fixations, as its top-level layout is denser than in the other magazines. To enable a more in-depth analysis of the eye-tracking data, the data should have been divided into phases according to the stage of cognitive processing the user was going through [22].

Qualitative eye-tracking produced important findings about affordance. Affordance, in the context of HCI, was famously defined by Norman in his book (1988) to refer to “the perceived and actual properties of the thing, primarily those fundamental properties that determine just how the thing could possibly be used” [56]. Web sites have mostly
Web sites have mostly Chapter 8: Discussion 69 gotten rid of the problem but today, poor affordance design plagues iPad applications, as mentioned before (see Section 3.3.4). For instance, eye-tracking five ww users revealed six cases when a user looked at a ⊕button or hyperlink in toc containing the correct answer (see Section 4.2), but did not tap it. This clearly indicates that the hyperlink design is faulty. Hyperlinks in ar were not made to look touchable but they were the only way to access articles so users quickly learned the interaction. In fb and ps, toc was the only place where hyperlinks were evident and users had little problems thinking of them as buttons. Eye-tracking is seldom used with mobile devices. In a meta-analysis of 100 mobile device usability studies, only two had used eye-tracking [21]. The biggest challenges are movement of the mobile device and users hands as they can block the eye-tracking signal. Some suggestions on how to use a standalone eye-tracker with mobile devices has been proposed, but the only sure solution would be to use a head-mounted eye-tracker [73]. In this study, iPad was fixed to position but hands blocked the signal about half of the time. This was a conscious compromise between ergonomics (natural hand position) and eye-tracking data quality. 8.1.3 Low reliability of SUS and SUM scores sus and sum scores did not produce statistically significant differences due to the small sample size. Especially sum could be used as a comprehensive uem, but it requires lots of resources: the number of users should be closer to hundred than ten and the examination of error rate is timely. In retrospect, the time consuming sum could have been dropped from the used methods. If a pre-test simulation would have been ran, it could have shown too small differences. However sus and sum together contributed to reveal the trend in the results, even though they were not totally statistically significant. 
To summarize the summative part of the evaluations: PS and FB are consistently ahead of AR and WW in terms of usability.

8.2 Summary of the formative usability evaluations

This section presents the most important differences and similarities between the four magazines. Findings are linked to previous research. All of the magazines had some good and some bad qualities; none was perfect. A hypothetical model of a tablet magazine with “perfect usability” is presented in the next chapter.

8.2.1 Findings from AnyReader version

In AR, the most severe problems were related to finding content. Relevant content (such as technical specification tables) was hidden inside the image carousel. Also, eye-tracking revealed that headlines on the top level did not pop out as they are supposed to, because of faint typography and competing elements (see Figure 8.1 for an eye-tracking example and Figure 7.2, task 5, for notable differences in task time). An eye-tracking study has shown that extra information in search results competes with relevant information and impairs task times in navigational tasks [30]. The toolbar at the top was docked and thus visible all the time, whereas the bottom navigation bar faded away too quickly for users to exploit it. Eye-tracking revealed that the toolbar buttons had good affordance and saliency (users noticed and tapped them quickly), but the navigation bar was mostly thought of only as an indicator of placement.

Figure 8.1: Eye-tracking shows how the correct headline is not “seen”, even though it is quickly looked at, because of more demanding typography below it that was not a headline

The article “Tietokoneen tulevaisuus on täällä” was divided erroneously during the dynamic layout into two dossiers: the technical specifications table was in a different article than the images of the devices. This was deliberately addressed in task 7, and 5/10 users failed to complete the task.
Content was paginated horizontally and vertically, and it did not allow stepless scrolling. Most users in this study preferred to scroll steplessly (like in a web browser), which agrees with previous studies [8]. Finally, an eye-tracking study of online newspapers has suggested that a single-column layout is read more effortlessly than a multi-column one [59].

8.2.2 Findings from retail version (Woodwing)

WW had problems with navigation inside the magazine. The toolbar was hard to find (tap at bottom): 7/10 users did not find it at all during the tasks. The lack of page numbers left users unsure of their location inside the magazine; information on the current location is crucial for navigating effectively in any information space [3]. Hyperlinks in the cover, TOC and articles (⊕buttons) had poor or non-existent affordance, which is a common problem in iPad applications [10].

The image opening action lacked consistency and affordance: it was not indicated what would happen if an image was tapped, or whether anything would happen at all. Scrollable portions of a page frustrated many users, as has been the case in previous research (see Section 3.3.4). In order for them to work, their scrollability should be properly indicated [10, 11]. Lastly, even though most users in this and in previous e-reading studies prefer portrait orientation, landscape orientation should also have been enabled [78, 79].

8.2.3 Findings from Fancybox and Photoswipe versions

FB and PS had two major problems: the navigation bar and the indication of article continuity. 7/20 users did not find the navigation bar during the tasks, and most of the rest found it by accident. Some users requested a “shorter” shortcut to the TOC to avoid scrolling the navigation bar. Shortcuts allow more effective usage of any user interface, and they are mentioned in the heuristics (see Section 2.3.1).

Another severe usability problem was found in task 6, where users had to find a short article situated at the end of the magazine.
The dynamic layout system in FB and PS had, by accident, made the first article of the “Vikatila” dossier exactly as long as the screen height. Other articles, including the one searched for in task 6, were below the first one, but most users had problems noticing the continuation. Finally, PS was the only magazine with an image zoom, but the image gallery had some glitches (controls disappeared abruptly).

In conclusion, a good tablet magazine, at least from the usability point of view, combines features from all the tested versions. A tablet magazine should offer freedom of scrolling and of device orientation. In addition, this and previous research has shown that a good digital publication has the affordances of print along with the interaction possibilities of the digital environment [43]. A model for a “perfect” magazine is envisioned in the last chapter.

8.3 Reliability and validity

Too small a sample size was the biggest obstacle to obtaining reliable results. This is a common problem in any usability study involving real test users when statistically significant, quantitative results are needed. A 95 % confidence level was used throughout this study. This showed that some of the results were unreliable due to the natural variability between users. Ten users per magazine was too small a sample to level out these differences.

Usability literature has contradictory information on how UEMs affect one another. For example, think aloud has been found both to speed up and to slow down task times. Also, eye-tracking has been advised to be used both alone and together with think aloud. There is no unanimous theory on how UEMs should be used. In this study, it was decided to obtain as many measurements as possible from the small number of users in order to reveal at least some trend from the more or less unreliable results.

Finally, considering reliability, the evaluator effect has to be taken into account.
A meta-analysis of heuristic evaluation and think aloud studies found that different evaluators found different usability problems [32]. This means that the usability problems found are, to some degree, dependent on the evaluator. In this study, this concerns only the qualitative analysis of the think aloud and eye-tracking videos, which was done by a single person. However, the most severe and frequent usability problems are usually found by all evaluators, which is also believed to be the case here [51].

Usability evaluations are always sensitive to the context in which they have been made. The context consists of the users, the tasks and the environment. In this study, it was shown that FB and PS had the best usability, but the results are, strictly speaking, only applicable in this context. The users, tasks and environment were selected so that the results would be valid in a common context. If this was achieved, then the results of the summative usability evaluations are valid and generalizable.

As stated before, usability is strictly dependent on the context: users, tasks and environment. The following steps were taken in order to make the context of the user tests suitable for generalization. Users were selected to represent, as a group, the readership of the Tietokone tablet magazine. Tasks were designed to mimic common magazine use. The ergonomics of the test setup were arranged to allow a comfortable seating position for the user. The test environment was the most problematic part, being a (temporary) usability laboratory with a test facilitator and a video camera next to the user. Nevertheless, a strong case can be made from the results of this study that the FB and PS magazines have the best usability also in more common contexts.

8.3.1 Influence of user background

Table 8.1 shows how user background affected the various measures. There are ten cases where the correlation is significant (marked by an asterisk).
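Which coefficients earn an asterisk depends on the number of users behind each correlation. As a rough check, the sketch below derives the smallest absolute coefficient that is significant at α = .05 (two-tailed). It assumes Pearson correlations computed over all 40 test users (four groups of ten) and a critical t value taken from a standard table; both assumptions, as well as the function name, are illustrative and not taken from the analysis itself.

```python
import math

# Two-sided critical t for alpha = .05 and df = 38, from a standard t table.
T_CRIT_DF38 = 2.024


def critical_r(t_crit, n):
    """Smallest |r| that is significant for n pairs.

    Obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


threshold = critical_r(T_CRIT_DF38, n=40)
print(round(threshold, 3))

# A few coefficients from Table 8.1 compared against the threshold.
for r in (-0.518, 0.356, 0.291):
    print(r, abs(r) > threshold)
```

Under these assumptions the threshold falls between .291 and .356. In Table 8.1, .356 is the smallest coefficient marked as significant and .291 is the largest one left unmarked.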
The most interesting finding in the table is that the SUS score correlates negatively with the amount of prior experience with tablets and e-reading. This would imply that experienced users have used better tablet magazines than the tested ones. SUS also correlates with satisfaction, which could be expected. Age affects pupil diameter, as can be seen from the table; this is explained in more detail in Section 8.1.2. Age also correlates with tablet ownership and Tietokone reading, which was expected. However, no explanation can be given for the correlations between the eye-tracking measures and the background questions.

Table 8.1: Correlation coefficients between user background and some usability measurements (TT: Task time, PD: Pupil diameter, FD: Fixation duration, Sat: Satisfaction; all averaged per user) (*Correlation is significant at the α = .05 level (2-tailed))

Measure | SUS | PD | FD | Age | TT | Sat
SUS |  | .183 | -.138 | -.148 | -.179 | *.674
Pupil diameter | .183 |  | -.259 | *-.518 | -.036 | .123
Fixation duration | -.138 | -.259 |  | .156 | .133 | -.162
Age | -.148 | *-.518 | .156 |  | .291 | -.036
Task time | -.179 | -.036 | .133 | .291 |  | .117
Satisfaction | *.674 | .123 | -.162 | -.036 | .117 |
Q: Graphic design | -.187 | -.131 | .262 | -.096 | -.133 | -.208
Q: Owns tablet | *-.370 | .251 | .120 | -.089 | .140 | -.197
Q: Used tablet | -.274 | .173 | .033 | -.289 | -.097 | -.213
Q: Read tablet | *-.582 | .115 | .115 | -.104 | -.050 | -.225
Q: Has Apple | -.118 | .275 | .041 | *-.369 | -.109 | -.124
Q: Will buy Apple | -.181 | .237 | .075 | -.048 | -.019 | -.176
Q: Read Tietokone | .039 | *-.433 | .092 | *.405 | -.032 | -.035
Q: Interest in tech.mags | .095 | -.115 | *.356 | -.085 | -.127 | .008
Q: Reads print regularly | -.093 | .099 | *-.372 | -.002 | -.122 | .020
Proficiency | *-.367 | .040 | .160 | -.165 | -.162 | -.248

Chapter 9 Conclusion

New mobile e-reading devices, tablet computers, have been proposed as a salvation for publishing houses in combating declining print sales. More and more book, newspaper and magazine content is being made digitally available to consumers.
The standard form of the digital publication has not yet been established, and many publishers are hesitant to publish digitally until it is. Application- and web-based magazines with various amounts of interaction are all available for mobile devices. This usability study has compared four versions of the same magazine: two application-based and two web-based solutions.

Four digital versions of the same Tietokone magazine issue were evaluated. The retail version WW, short for the Woodwing publishing solution used to build it, was a traditional image-based application with a static manual layout. AR, short for the AnyReader e-reading solution, had a different layout and structure than WW. FB and PS were two slightly different versions of the same HTML5 web-based magazine. FB is short for Fancybox, a simple pop-up window image viewing system, and PS for Photoswipe, a browsable and zoomable image gallery used in the otherwise similar magazine. The structure of the latter magazines was similar to WW, but the dynamic layout was different from the layouts in WW or AR.

User tests were done with four groups of ten users, each individual testing one version of the magazine for an hour. Eleven tasks were designed with a heuristic evaluation as a basis. The first task was a practice task meant for those who had not used a tablet computer before. Time, subjective satisfaction, think aloud and eye-tracking data were recorded from ten tasks. The data was analyzed qualitatively (think aloud and eye-tracking videos) and quantitatively (task time, pupil diameter, fixation duration, SUS, SUM).

The formative evaluation, based on heuristic evaluation and observation of the think aloud and eye-tracking videos, revealed many usability problems in each magazine, some easier to correct than others. Each of the magazines had its own set of pros and cons. The following model for a tablet magazine maximizes the “pros–cons” ratio and can be argued to have maximum usability.
Users want more freedom of choice in the digital environment, so stepless scrolling (in FB and PS), landscape orientation (AR, FB and PS), adjustable text size (AR) and image zoom (PS) should be enabled. Navigation around the digital magazine has to be as effortless as in print, so easily discovered shortcuts (AR), page numbers (AR, FB and PS), a TOC (WW, FB and PS) and a page browser (the WW page browser did not work in the tests; the FB and PS navigation bar was similar to it) should be available. None of the magazines tested fully exploited the benefits of the digital platform: a search feature and multimedia content (video, sound) were most frequently missed. Finally, all content should be easy to find by browsing, by making it either directly accessible (FB and PS) or available behind clearly marked hyperlinks (hyperlinks in every magazine lacked affordance, i.e. did not look touchable).

The magazines were compared with summative evaluations based on task times, SUS and SUM scores, fixation duration and pupil diameter, think aloud comments and the number of usability problems found. Even though the sample size in this study was too small for some measures to yield statistically significant differences, a clear trend can be seen in the measures. In this context (usability testing always depends on the context, i.e. users, environment and tasks), the HTML5-based magazines FB and PS with dynamic layout had the best usability. The retail version WW was second, and AR, also with dynamic layout, fared worst according to the summative usability evaluation.

Eye-tracking proved to be a challenging usability evaluation method. Both qualitative (gaze replay analysis together with think aloud videos) and quantitative (pupil diameter and fixation duration) analyses were done on the eye-tracking data. Pupil diameter measures did not correlate with other usability metrics, presumably because variability in user interface brightness had more effect on them.
However, average fixation duration seemed to reflect user interface complexity and was measured to be greatest in AR. Qualitative eye-tracking analysis gave valuable insight into the affordance of hyperlinks and the saliency of layout elements.

This study dealt strictly with usability; visual qualities were not addressed at all. One could argue that automatic layout systems make dull and homogeneous layouts¹, but this has to be researched further. One solution could be a “semi-automatic” layout system, where the basic layout is done automatically and final adjustments are left to graphic design experts. However, whether the layout is automatic or manual, recent developments² suggest that HTML5, rather than applications, is the technology of the future for digital publishing.

¹ On the other hand, one user, who was acquainted with the print version of Tietokone, described the HTML5 version as “brandlike” without knowing how the layout had been made.
² “The new iPad”, released in March 2012, has four times as many pixels as the iPad 2, which substantially increases the memory requirements of an image-based publication.

References

[1] M. G. Albanesi, R. Gatti, M. Porta, and A. Ravarelli. Towards Semi-Automatic Usability Analysis through Eye Tracking. In CompSysTech’11 Proceedings of the 12th International Conference on Computer Systems and Technologies, pages 135–141, New York, NY, USA, 2011.
[2] J. Arnowitz and E. Dykstra-Erickson. Usability as Science. Interactions, 12(2):7–8, 2005.
[3] D. Benyon. Navigating Information Space: Web site design and lessons from the built environment. PsychNology Journal, 4(1):7–24, 2006.
[4] D. C. Berry and D. E. Broadbent. The role of instruction and verbalization in improving performance on complex search tasks. Behaviour & Information Technology, 9:175–190, 1990.
[5] R. Bias. Interface-Walkthroughs: Efficient Collaborative Testing. IEEE Software, 8(5):94–95, 1991.
[6] J. E. Birren, R. C. Casperson, and J. Botwinick. Age Changes in Pupil Size. Journal of Gerontology, 5(3):216–221, 1950.
[7] T. Boren and J. Ramey. Thinking Aloud: Reconciling Theory and Practice. IEEE Transactions on Professional Communication, 43(3):261–278, 2000.
[8] C. Braganza, K. Marriott, P. Moulder, M. Wybrow, and T. Dwyer. Scrolling behaviour with single- and multi-column layout. In Proceedings of the 18th international conference on World wide web - WWW ’09, pages 831–840, New York, New York, USA, Apr. 2009.
[9] J. Brooke. SUS – A quick and dirty usability scale. Usability evaluation in industry, page 7, 1996.
[10] R. Budiu and J. Nielsen. Usability of iPad Apps and Websites: 1st edition. Technical report, 2010.
[11] R. Budiu and J. Nielsen. Usability of iPad Apps and Websites: 2nd edition. Technical report, 2011.
[12] P. A. Carpenter and M. A. Just. Eye fixations and cognitive processes. Cognitive Psychology, 8(4):441–480, Oct. 1976.
[13] A. Çöltekin, B. Heil, S. Garlandini, and S. I. Fabrikant. Evaluating the Effectiveness of Interactive Map Interface Designs: A Case Study Integrating Usability Metrics with Eye-Movement Analysis. Cartography and Geographic Information Science, 36(1):5–17, Jan. 2009.
[14] S. Chen, J. Epps, N. Ruiz, and F. Chen. Eye activity as a measure of human mental effort in HCI. In Proceedings of the 15th international conference on Intelligent user interfaces - IUI ’11, pages 315–318, New York, New York, USA, 2011.
[15] P. F. Chong, Y. P. Lim, and S. W. Ling. On the Design Preferences for Ebooks. IETE Technical Review, 26(3):213–222, 2009.
[16] P. Chynal, J. Szymański, P. Campos, N. Graham, J. Jorge, N. Nunes, P. Palanque, and M. Winckler. Remote Usability Testing Using Eyetracking. INTERACT 2011 Human-Computer Interaction (Lecture Notes in Computer Science), 6946:356–361, 2011.
[17] L. Cooke. Improving usability through eye tracking research. In IPCC 2004 International Professional Communication Conference Proceedings, pages 195–198. IEEE, 2004.
[18] L. Cooke. Eye Tracking: How It Works and How It Relates to Usability.
Technical Communication, 52(4):456–463, 2005.
[19] L. Cooke. Is Eye Tracking the Next Step in Usability Testing? 2006 IEEE International Professional Communication Conference, pages 236–242, Oct. 2006.
[20] L. Cooke and E. Cuddihy. Using eye tracking to address limitations in think-aloud protocol. In IPCC 2005 International Professional Communication Conference Proceedings, pages 653–658. IEEE, 2005.
[21] C. K. Coursaris and D. J. Kim. A Meta-Analytical Review of Empirical Mobile Usability Studies. Journal of Usability Studies, 6(3):117–171, May 2011.
[22] L. Cowen, L. J. Ball, and J. Delin. An Eye Movement Analysis of Webpage Usability. In People and Computers XVI - Memorable yet Invisible: Proceedings of the HCI 2002, pages 1–14, 2002.
[23] T. Dimond. Devices for reading handwritten characters. In Proceedings of Eastern Joint Computer Conference, pages 232–237, 1957.
[24] N. Eger, L. J. Ball, R. Stevens, and J. Dodd. Cueing Retrospective Verbal Reports in Usability Testing Through Eye-Movement Replay. In BCS-HCI ’07 Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...but not as we know it, pages 129–137, Swinton, UK, 2007.
[25] C. Ehmke and S. Wilson. Identifying Web Usability Problems from Eye-Tracking Data. In BCS-HCI ’07 Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...but not as we know it, pages 119–128, Swinton, UK, 2007. British Computer Society.
[26] A. K. Ericsson and H. A. Simon. Verbal Reports as Data. Psychological Review, 87(3):215–251, 1980.
[27] A. K. Ericsson and H. A. Simon. Protocol Analysis: Verbal Reports as Data. The MIT Press, Cambridge, MA, USA, 1984.
[28] Forrester. US consumer tablet forecast update, 2011 to 2016. Technical report, 2012.
[29] J. Goldberg and X. Kotval. Computer Interface Evaluation Using Eye Movements: Methods and Constructs. International Journal of Industrial Ergonomics, 24(6):631–645, 1999.
[30] Y. Habuchi, M.
Kitajima, and H. Takeuchi. Comparison of eye movements in searching for easy-to-find and hard-to-find information in a hierarchically organized information structure. In ETRA’08 Proceedings of the 2008 symposium on Eye tracking research & applications, number 212, pages 131–134, New York, New York, USA, 2008.
[31] H. Heikkilä. eReading User Experiences: eBook Devices, Reading Software & Contents. Technical Report 54, NextMedia, 2011.
[32] M. Hertzum and N. E. Jacobsen. The evaluator effect: a chilling fact about usability evaluation methods. Int. Journal of Human-Computer Interaction, 15(1):183–204, 2003.
[33] A.-m. Horcher and M. Cohen. Ebook Readers: An iPod for Your Books in the Cloud. Communications in Computer and Information Science Part I, 174:22–27, 2011.
[34] C.-h. Huang and C.-m. Wang. Usability Analysis in Gesture Operation of Interactive E-Books on Mobile Devices. Design, User Experience, and Usability Lecture Notes in Computer Science, 6769:573–582, 2011.
[35] A. Huthwaite, C. E. Cleary, B. Sinnamon, P. Sondergeld, and A. McClintock. Ebook Readers: Separating the Hype from the Reality. In Proceedings of 2011 ALIA Information Online Conference & Exhibition, page 12, Brisbane, Australia, 2011. QUT Library.
[36] IDC. Media Tablet Shipments Outpace Fourth Quarter Targets. Worldwide Quarterly Media Tablet & e-Reader Tracker, 2012.
[37] ISO. ISO 9241-11:1998 Ergonomic requirements for office work with visual display terminals (VDTs) - Part 11: Guidance on usability. Technical report, International Organization for Standardization, 1998.
[38] ISO. ISO/IEC 25062: Software engineering - Software product Quality Requirements and Evaluation (SQuaRE) - Common Industry Format (CIF) for usability test reports, 2006.
[39] E. Johnson. Touch display—a novel input/output device for computers. Electronics Letters, 1(8):219–220, 1965.
[40] S. Johnson and P. Prijatel. The Magazine from Cover to Cover. Oxford University Press, 2006.
[41] D. Kahneman and J.
Beatty. Pupil Diameter and Load on Memory. Science, 154(3756):1583–1585, 1966.
[42] C. Lewis, P. Polson, C. Wharton, and J. Rieman. Testing a Walkthrough Methodology for Theory-Based Design of Walk-Up-and-Use Interfaces. In Chi ’90 Proceedings, pages 235–242, 1990.
[43] C. C. Marshall and S. Bly. Turning the page on navigation. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’05, page 225, New York, New York, USA, June 2005.
[44] T. Masalin. iPad & iPhone käsikirja. Docendo, Jyväskylä, 2011.
[45] D. Mauney, J. Howarth, A. Wirtanen, and M. Capra. Cultural similarities and differences in user-defined gestures for touchscreen user interfaces. In CHI EA ’10 Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems, pages 4015–4020, New York, New York, USA, 2010.
[46] D. Mayhew. The Usability Engineering Lifecycle. Morgan Kaufmann Publishers, San Francisco, CA, 1999.
[47] B. A. Mitchell, L. Christian, and T. Rosenstiel. The Tablet Revolution and What it Means for the Future of News. Technical Report 202, Pew Research Center’s Project for Excellence in Journalism, 2011.
[48] MPA. The Mobile Magazine Reader - A Custom Study of Magazine App Users. Technical report, MPA–The Association of Magazine Media, 2011.
[49] J. Nielsen. Finding usability problems through heuristic evaluation. In CHI’92 Proceedings of the SIGCHI conference on Human factors in computing systems, pages 373–380, New York, New York, USA, 1992.
[50] J. Nielsen. Usability engineering. Academic Press, Boston, 1993.
[51] J. Nielsen and T. K. Landauer. A mathematical model of the finding of usability problems. In CHI’93 Proceedings of the SIGCHI conference on Human factors in computing systems, pages 206–213, New York, USA, May 1993.
[52] J. Nielsen and J. Levy. Measuring Usability: Preference vs. Performance. Communications of the ACM, 37:66–76, 1994.
[53] J. Nielsen and R. L. Mack.
Usability inspection methods. Wiley, New York, NY, USA, 1994. [54] J. Nielsen and R. Molich. Heuristic evaluation of user interfaces. In CHI ’90 Proceedings of the SIGCHI conference on Human factors in computing systems: Empowering people, volume 17, pages 249–256, New York, NY, USA, 1990. [55] J. Nielsen and K. Pernice. Eyetracking Web Usability. Voices That Matter. New Riders, Berkeley, CA, USA, 2009. [56] D. A. Norman. The Design of Everyday Things. Basic Books, 1988. [57] D. A. Norman. Natural user interfaces are not natural. interactions, 17(3):6, May 2010. [58] D. A. Norman and J. Nielsen. Gestural Interfaces: A Step Backward In Usability. interactions, 17(5):46, Sept. 2010. [59] S. Outing and L. Ruel. The Best of Eyetrack III: What We Saw When We Looked Through Their Eyes, 2006. [60] B. Pan, G. K. Gay, H. A. Hembrooke, L. A. Granka, M. K. Feusner, and J. K. Newman. The Determinants of Web Page Viewing Behavior: An Eye-Tracking Study. In ETRA ’04 Proceedings of the 2004 symposium on Eye tracking research & applications, volume 1, pages 147–154, New York, NY, USA, 2004. [61] K. Pernice and J. Nielsen. Eyetracking methodology: How to conduct and evaluate usability studies using eyetracking. Technical Report August, 2009. [62] M. Pomplun and S. Sunkara. Pupil Dilation as an Indicator of Cognitive Workload in Human-Computer Interaction. In Proceedings of the 10th International Conference on Human-Computer Interaction, page 5, 2003. [63] S. Rosenbaum, J. A. Rohn, and J. Humburg. A Toolkit for Strategic Usability: Results from Workshops, Panels, and Surveys. In CHI ’00 Proceedings of the SIGCHI conference on Human factors in computing systems, volume 2, pages 337–344, 2000. [64] J. Rubin and D. Chisnell. Handbook of usability testing: how to plan, design, and conduct effective tests. Wiley, Indianapolis, IN, 2nd edition, 2008. [65] D. Saffer. Designing Gestural Interfaces. O’Reilly Media, Sebastopol, CA, 2009. References 81 [66] J. Sauro. 
Using a Single Usability Metric (SUM) to Compare the Usability of Competing Products. In HCII 2005 Proceeding of the Human Computer Interaction International Conference, page 9, 2005. [67] J. Sauro and E. Kindlund. A method to standardize usability metrics into a single score. In CHI ’05 Proceedings of the SIGCHI conference on Human factors in computing systems, page 9, New York, New York, USA, 2005. [68] S. C. Seow, D. Wixon, A. Morrison, and G. Jacucci. Natural user interfaces. In CHI EA’10 Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems, page 4453, New York, New York, USA, Apr. 2010. [69] B. Shneiderman. Direct Manipulation: A Step Beyond Programming Languages. IEEE Computer, 16(8):57–69, 1983. [70] E. Siegenthaler, P. Wurtz, and R. Groner. Improving the Usability of E-Book Readers. Journal of Usability Studies, 6(1):25–38, 2010. [71] S. L. Smith and J. N. Mosier. Design Guidelines for User-System Interface Software. Technical report, 1984. [72] C. Stevens. Designing for the iPad: building applications that sell. Wiley, Hoboken, N.J., 2011. [73] Tobii. Using Eye Tracking to Test Mobile Devices. Technical report, Tobii Technology AB, 2010. [74] M. Töyry, P. Räty, and K. Kuisma. Editointi aikakauslehdessä. Taideteollinen korkeakoulu, Helsinki, Suomi, 2008. [75] Työterveyslaitos. Käytettävyydellä potkua tuotekehitykseen. Technical report, Työterveyslaitos, Oulu, 2009. [76] M. Väisänen. E-lukulaitteen ensikäytön käytettävyysongelmat ja käyttäjäkokemuksen ajallinen kehittyminen. Diplomityö, Aalto-yliopisto, 2011. [77] C. Ware. Information Visualization: Perception for design. Morgan Kaufmann, San Francisco, CA, 2000. [78] S. Wearden. Landscape vs . Portrait Formats: Assessing Consumer Preferences. Technical report, 1998. [79] S. T. Wearden, R. Fidler, A. B. Schierhorn, and C. Schierhorn. Portrait vs. landscape: Potential users’ preferences for screen orientation. 
Appendices

Below are the pre-test user background questions and the assignment paper as they were presented to the users (tasks 2–11 in random order). Notes from a preliminary heuristic expert evaluation (ww and ps) are attached at the end. The original appendices are in Finnish.

29.4.2012 Background questionnaire

Background questionnaire
The purpose of this questionnaire is to survey the effect of the participant's background on the study results. All information is treated confidentially.
* Required

Basic information
Your name: *
E-mail address: (Give your e-mail address if you would like to keep receiving invitations to take part as a test subject at the Department of Media Technology.)
Your age: *
Your gender: * Male / Female
Test time: (Date and time of the agreed test session)
Handedness: * Which hand do you write with? Left / Right
Do your hobbies, studies or profession involve graphic design? * (For example designing the look of a web page, laying out a magazine, etc.) Yes / No
Your profession or degree programme: * (If you are a student, write your degree programme here, for example computer science.)

Previous experience with tablet computers
Do you own a tablet computer? * (For example Apple iPad or Samsung Galaxy Tab) Yes / No
Have you ever used a tablet computer? * (For example Apple iPad or Samsung Galaxy Tab) Yes / No
If you have used a tablet computer, have you used it for reading newspapers or magazines? * Yes / No
If you answered yes to the previous question, which newspapers or magazines have you read on a tablet computer?
Do you own any Apple products? * (iPhone, iPod, Mac, iPad) Yes / No
Could you imagine buying an Apple product? * Yes / No / Maybe

Tietokone magazine
Have you read Tietokone magazine before? * Yes / No
Are you interested in magazines about electronics and information technology? * Yes / No
Do you regularly read any magazine or newspaper? * Yes / No
If you answered yes to the previous question, which ones do you read?

https://docs.google.com/spreadsheet/viewform?formkey=dHl3V0g1RjFmekVoMHg0Mm9Ldl9tMUE6M…

Test arrangements and thinking aloud
Welcome to the test. You will soon be given 11 tasks, which you should complete using an iPad while thinking aloud. By thinking aloud you can explain the methods you use to complete a task and comment on any problems you encounter. Thinking aloud should be as continuous as possible, and the test moderator will remind you if you stay silent for too long. After each task, announce when you think you are done and fill in the three-item questionnaire. After the tasks you may browse the magazine freely. Finally, you will be given a form to fill in. The whole test takes 45-60 min.
This test does not evaluate your performance but the usability of the magazine. Do not be afraid to give criticism while thinking aloud; the test moderator has not been involved in developing the magazine. All your feedback is valuable and is handled anonymously. The test session is filmed so that the comments and the iPad control gestures can be stored for later review.

Eye movement measurement
While you carry out the tasks, your gaze on the iPad is recorded. The eye tracking camera is located below the screen and uses infrared technology that is completely harmless to the eyes. If necessary, the test moderator will ask you to adjust your posture during the tasks so that the eye movements can be recorded.
Tasks
Imagine that you have downloaded the June issue of Tietokone magazine to your iPad and are now opening it for the first time. Read the task titles and instructions aloud and make sure you have understood the instructions before you start. Try to complete all tasks as quickly as possible.

1. ”Nanokoossa kaikki on toisin”
Browse through the article ”Nanokoossa kaikki on toisin” with vertical swipes and find all of its functions/interactions by tapping the elements of the article. Finally, move to the previous or next view by swiping the screen sideways. Return to the article ”Nanokoossa kaikki on toisin”.
You can check the task instructions from this paper and the control gestures from the wall during all tasks. Remember to say aloud why you do what you do, what you think and what you feel. Before the next task begins, the test moderator returns the application to the ”front page” of the magazine: the view that opens when the application is started for the first time.
Announce when you think you are done. Fill in the questionnaire below by circling the number on the scale that best matches your experience. All items must be answered. If for some reason you cannot answer an item, circle ”3”.
Was doing the task with this application, in your opinion, difficult or easy?
difficult 1 2 3 4 5 easy
Was doing the task with this application, in your opinion, annoying or pleasant?
annoying 1 2 3 4 5 pleasant
Was doing the task with this application, in your opinion, slow or fast?
slow 1 2 3 4 5 fast

2. ”Järkkäristä tuli videokamera”
Find the camera that received the best grade in the camera test ”Järkkäristä tuli videokamera”. Open the picture of the camera enlarged on the screen, if possible.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

3. ”Kriisi 2.0”
According to the article ”Kriisi 2.0”, what is the name of the project that is called the Wikipedia of maps?
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

4. ”Tietoturvaa iPadiin ja iPhoneen”
What advice does the article ”Tietoturvaa iPadiin ja iPhoneen” offer in case an iPhone is stolen? Find the name of the service/function and open the related picture enlarged on the screen, if possible.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

5. ”10 vekkulia USB-lelua”
Choose the most interesting device from the article ”10 vekkulia USB-lelua”. If there is a picture of the device, open it enlarged on the screen, if possible.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

6. Causerie
Find the causerie on the use of Facebook at workplaces, written under the pseudonym ”Kiukkuinen ICT-johtaja”, at the very end of the magazine.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

7. ”Tietokoneen tulevaisuus on täällä”
In the tablet computer test ”Tietokoneen tulevaisuus on täällä”, find the comparison table of 10-inch or 7-inch devices (not the Akkukesto table) and choose one tablet computer by any criterion you like. Then find the review of the device and open the picture of the device enlarged on the screen, if possible.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

8. ”Sähköinen lukeminen maistuu jo”
Find the statistical graphics in the article ”Sähköinen lukeminen maistuu jo” and choose what you find to be the most surprising thing shown by the statistics.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

9. ”Suljettujen ovien takana”
Browse through all the pictures of the reportage ”Suljettujen ovien takana” so that you open each picture enlarged on the screen, if possible. Choose the most interesting picture.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

10. ”Suunnistuksen uudet tuulet”
Compare the pictures of the guidance views of mobile phone navigation applications in the article ”Suunnistuksen uudet tuulet”. Choose the navigation application you would most like to use.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

11. Column
Find the column written by Jyrki Kasvi.
Announce when you think you are done. Fill in the questionnaire below.
In my opinion, doing the task with this application was:
difficult 1 2 3 4 5 easy
annoying 1 2 3 4 5 pleasant
slow 1 2 3 4 5 fast

Browsing
Browse the magazine freely for 5-10 minutes while commenting. After this, fill in the usability questionnaire.

Usability questionnaire
In the following statements, the product means the Tietokone magazine iPad application that you used, not the iPad device itself. Try to answer all items quickly without thinking for long. All items must be answered. You may answer ”3” if you feel you cannot answer a question.
1: I think that I could use this product regularly.
Strongly disagree 1 2 3 4 5 Strongly agree
2: I find the product too complex.
Strongly disagree 1 2 3 4 5 Strongly agree
3: I find the product easy to use.
Strongly disagree 1 2 3 4 5 Strongly agree
4: I think that learning to use the product requires guidance from an experienced user.
Strongly disagree 1 2 3 4 5 Strongly agree
5: I think the various functions of the product are well integrated with each other.
Strongly disagree 1 2 3 4 5 Strongly agree
6: I think there are too many inconsistencies in the product.
Strongly disagree 1 2 3 4 5 Strongly agree
7: I believe that most people would learn to use the product very quickly.
Strongly disagree 1 2 3 4 5 Strongly agree
8: I find the product very cumbersome to use.
Strongly disagree 1 2 3 4 5 Strongly agree
9: I felt very confident using the product.
Strongly disagree 1 2 3 4 5 Strongly agree
10: I think one has to learn a lot of new things before using the product.
Strongly disagree 1 2 3 4 5 Strongly agree
11. I think browsing pictures worked well in the product.
Strongly disagree 1 2 3 4 5 Strongly agree
12. I think the appearance of the product supported its use.
Strongly disagree 1 2 3 4 5 Strongly agree

● Headlines on the cover lead to the stories
● Where is the table of contents/page map? It was found by tapping the bottom edge; nothing indicates this. In the bottom bar:
  ○ “Kansi” (Cover; unambiguous)
  ○ “Sisältö” (Contents; unambiguous)
  ○ “Sivukartta” (Page map; unambiguous)
    ■ otherwise good, but the current page is not highlighted
    ■ and it jumps oddly a page or two forward when returning to the page map, so that the current page is left hidden on the left
  ○ “Kirjasto” (Library; fairly unambiguous, “Arkisto” (Archive) would be better; opens the other purchased issues of the same magazine)
  ○ “Uutisvirta” (News feed; opens the Tietokone magazine website, so “Kotisivu” (Home page) would be better. It opens the website in a new window, so the bottom bar cannot be brought back until the user realises that the window is closed from the X in the top corner.)
  ○ “Kauppa” (Store; unambiguous, issues of Tietokone magazine can be bought)
General:
● Does not support landscape rotation
● No pinch zoom
● Back button: none; in the table of contents, recovering from a mistaken tap takes two steps via the bottom bar
● Links are not distinguished from ordinary text or pictures in any way. Why can they not be underlined, as on the web?
● There are no page numbers -> makes navigation harder
● Text cannot be copied; the pages are image files
● When switching between applications, it does not remember where reading stopped but jumps to the first story.

WoodWing
Heuristic test “Tietokone 06/2011”, 11.10.2011
● No cover
● The page map is found quickly behind a double tap, even without knowing about it in advance
  ○ no highlighting of the current page
  ○ two-digit page numbers are hidden behind the thumbnails
  ○ in portrait orientation, 3 pages should be visible at a time as in landscape: the previous, current and next page -> better for navigation
  ○ when the page map is kept visible and the page is changed, the bottom edge of the page stays covered by the page map and does not come into view even when the page map is dismissed
General:
● Supports landscape rotation
  ○ sometimes, when turning from portrait to landscape, the page does rotate but does not “spread out”, and empty space is left on the right
● No pinch zoom
● No cover, no “bottom bar”
● Back/Undo button: none
● Links are recognisable as links, at least in the table of contents
● Loading a page takes 0.5-1 s, which hampers fast browsing
● Page numbers are present, which helps navigation
● HTML5 -> text and pictures can be copied

Photoswipe
Table of contents
● No page numbers, although they are shown on the pages
● Nothing indicates that it continues downwards; in the articles a small arrow indicates this
● Too many different fonts (5) in the leads of the sections
  ○ 2-3 would be enough: quotes, leads and tested items
● Fairly long scrolling is needed to see the whole table of contents
  ○ reducing the size of the pictures and grouping stories of the same type together (columns below/beside each other, tests below/beside each other) would save space
● The “Testit” articles additionally carry “Testissä”, which is unnecessary
● The image cropping algorithm seems to work well
Editorial
● Sufficiently large pictures
Käynnistys
● The last word of the 1st headline, “jo”, wraps to a second line in portrait orientation
● In the map graphic it is hard to connect the country names to the map
● Next to the OpenOffice story there is an Itella quote that is unrelated to the story; in WoodWing the quote is separated more clearly
● Captions are shown directly below the pictures, so the caption information cannot be missed even by accident. Enlarging pictures on tap is work in progress.
10 USB lelua
● Unnecessarily loose layout and large pictures. Would the short items work in two or more columns?
Table of contents
● Nothing indicates that it continues downwards; in the articles a small arrow indicates additional pages
● “Joka numerossa” does not list the story Amazonin pilvi repesi or Tietoyhteiskunta 2.0, but only Kolumnit generically, like Kytkentöjä (this was fine in the February issue)
● The table-of-contents links have a page-turn animation, so the user sees that the magazine is now jumping forward
● There should be a shortcut to the table of contents
Editorial
● Not found in the page map
● Small pictures
● Odd variation in point size in the 1st paragraph
Käynnistys
● The large graphic of the 1st story does not fit on the page, and scrolling loses the other edge of the graph
● The map works well, once the user realises that the countries can be tapped
● The “Timo Valli” picture grows too little when tapped; tapping the 3 adjacent pictures does nothing, although they have a similar frame. With this solution the user has to try every picture to see whether more information can be found. If asked “When does Timo Valli start his job?”, how many would find the caption?
● In the HP story the picture has a “+” symbol, which indicates that a caption appears on tap. Why not in all pictures that have a hidden caption?
10 USB-lelua
● Good, compact layout; almost all 10 fit in the same view
Sosiaalinen media
● The whole body text can be brought into view on one screen thanks to stepless scrolling.
● The pull quote does not fit completely on the screen in portrait orientation
Uudet tuotteet
● All information is on the same page, a scroll away
Tulevaisuuden tekniikka
Tietoyhteiskunta 2.0
Testissä: Tietokoneen tulevaisuus...
● The arrow indicating that the story continues looks like a button but is not one
● The device tests are visible at once in the same view and are easy to compare with each other.
Sosiaalinen media
● A page that scrolls one screen at a time works better for short stories; here the change of screen downwards interrupts the body text.
Uudet tuotteet
● Only the pictures of the products are shown; the information is a tap away, which opens a pop-up window that can be closed only from the cross in the top corner. Scrolling is a much more effortless way to move between the product reviews.
Tulevaisuuden tekniikka
● The caption of the first picture cannot be found
Tietoyhteiskunta 2.0
● Here the scrolling of the body text inside the page works well, because the headline of the story stays visible all the time.
Testissä: Tietokoneen tulevaisuus...
● Nothing indicates that the story continues downwards
● In this story the pictures have stylish functionality (enlargement of Angry Birds and of the connectors)
● The picture gallery at the end works better this way, at least in this case, with all parts visible at once
  ○ but these cannot be enlarged if desired
● The device tests are hard to compare, because they open only in their own windows one at a time
Kriisi 2.0
Järkkäristä tuli videokamera
● The cameras are easy to compare with each other; all are visible a scroll away
Suljettujen ovien takana
● A more suitable place for the text box on the cover (in portrait orientation) would be on the floor in the lower right corner
● The picture carousel is otherwise good, but
  ○ the body text at the top needs side margins.
  ○ Some of the pictures have the same body text, although it appears to change when the picture is changed
  ○ in landscape orientation the body text at the top does not fit in the view
Automatisoi Windows 7 asennus
● There are links at the end of the story that should open directly when tapped. Here they can at least be copied to the clipboard.
Vältä sokki puhelinlaskussa
● Unnecessarily loose layout; at least in landscape orientation it could be set in two columns
Tietoturvaa iPadiin ja iPhoneen
● Clearly better layout than with WoodWing
Kriisi 2.0
Järkkäristä tuli videokamera
● The cameras are hard to compare when only the pictures are shown together and the review is behind a tap
Suljettujen ovien takana
● The caption arrows (>>) look the same as the arrows indicating that the text bar can be scrolled (>>>)
● Here some pictures enlarge when tapped, and some of them have a caption behind a tap. Some of the pictures do not enlarge when tapped; these are not distinguished in any way
Automatisoi Windows 7 asennus
● There are links at the end of the story that should open directly when tapped. Here they cannot even be copied
Vältä sokki puhelinlaskussa
● It is hard to change pages quickly, because the sideways-scrolling body text area takes up a large part of the screen
Tietoturvaa iPadiin ja iPhoneen
● Confusing layout. Why is the “Etsi iPhone” picture right at the beginning? Unnecessarily loose cropping in the “Alarmomatic” picture
Väripoikkeama kuriin
● Working layout in two columns
● Picture carousel
  ○ there is not enough time to read the caption before it is hidden automatically
  ○ when the pictures do not fill the whole screen in portrait orientation, hiding the caption and the control buttons is unnecessary
  ○ could the caption be made to stay visible by dragging down from the top edge?
  ○ leaving the picture carousel throws the user out of the article
  ○ pinch zoom works, but after it the controls/caption cannot be brought up by tapping
Verkot ohjelmien ohjaukseen
● The first word of the pull quote does not fit on the screen in portrait orientation
Suunnistuksen uudet tuulet
● Stylish cover!
● The tests laid out in two columns run at different heights because of the difference in the number of lines the headings need
  ○ use a baseline grid (and height alignment, e.g. after a heading)
● The picture margins are partly too large (e.g. Verbatim Mediashare)
Vikatila
● Unnecessarily loose margins (purchase order, pinball, book), which is emphasised in landscape orientation
Väripoikkeama kuriin
● Captions are not directly visible
● A black bar is left at the bottom edge of the last page
Verkot ohjelmien ohjaukseen
Suunnistuksen uudet tuulet
Vikatila