Information Visualization – a talk by Prof. G Benoît

March 4, 2014, Simmons College
Volume of data
Popularization of the Topic
Creating Tools
Powerful computers
Our work lives …
[SLIDE 1] Welcome. I call this talk the “future of information” for several reasons.
• One is that we already live in a graphically intense world. The typical American sees more than 3500 visual messages a day;
• The Web has been around for a generation - students come to school with computers, software, and the Internet as part of the topography of their lives, not as new skills and modes of thought to be learned;
• The topic is popular - the public is aware of it because people create their own graphics and more graphics created by designers are accessible - the web is full of examples, as are the streets; popular press journals promulgate the idea of Big Data as a benefit1 (without real analysis of the liabilities);
• Strata and TED2 talks online, along with people appropriating the term for their own blogs and websites, increase the number of avenues towards the subject;
• More raw data are being designed, blurring the traditional lines between “information graphics”, “graphic design”, “data visualization”, “information visualization”, and now Big Data, Data Analytics, Visual Analytics, and decision making.
Literatures and Definitions: Let’s start with some definitions: “Information visualization is the study of (interactive) visual representations of abstract data to reinforce human cognition. The abstract data include both numerical
and non-numerical data, such as text and geographic information.” This is pretty accurate.
And a cursory review of literatures from different times and domains suggests commonalities of interpretation but also differences: should the visualization represent abstract phenomena (such as “ideas”), or stimulate subconscious mental processes [cognition], or underscore the volume of data that could be related, or rely on a few design principles [Iliinsky’s YouTube introduction3], or expand to incorporate new flexible displays, or build on what is already expected?
Do some data lend themselves to certain design models? Time and Space are two usual criteria.
What kind of rhetoric captures the goals of visualization? From the designers’ perspective, the choice of expression, audience, and data stores dictates the process; for others, the historical roots in data mining + visualization are integral to the visualization and its interpretation; others emphasize the large volume of data [and that leads to outlandishly complex displays]; others focus on the end-user’s interpretive strategies; and finally, some “hard-core” computer scientists limit the whole to the computational aspects.
1. Harvard Magazine. (2014, Mar-Apr). Making sense of big data.
2. e.g., http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
3. https://www.youtube.com/watch?v=nrsdgvauqKg
A functional definition might be extracted from who engages in InfoVis:
computer science - from the more technical to the more applied uses of InfoVis;
popular (public) items - http://www.infovis-wiki.net/index.php/Information_Visualization
1999: Use of computer-supported, interactive, visualization of abstract data to amplify cognition [Card, et al.]
Information visualization utilizes [sic] computer graphics and interaction to assist humans in solving problems [Purchase et al., 2008, p. 58. Terrible
writing]
“Info Vis… is a special kind of visualization. Visualization is a part of computer graphics, which is in turn a subset of computer science … Info Vis is
visualization of abstract data… [s]hould be seen in contrast to scientific visualization, which deals with physically-based data … Visualization of
abstract data is not straightforward …” [Voigt, 2002]
In information visualization, the graphical models may represent abstract concepts and relationships that do not necessarily have a counterpart in the physical world, e.g., information describing user accesses to pages of an Internet portal or records describing selected properties of different car brands and models. Typically, each data unit describes multiple related attributes (usually more than four) that are not of a spatial or temporal nature. Although spatial and temporal attributes may occur, the data exists in an abstract (conceptual) data space [Ferreira and Levkowitz, 2003].
The study of how to effectively present information visually. Much of the work in this field focuses on creating innovative graphical displays for
complicated datasets, such as census results, scientific data, and databases. An example problem would be deciding how to display the pages
on a website or the files on a hard disk. Visualization techniques include selective hiding of data, layering data, taking advantage of 3-dimensional space, using scaling techniques to provide more space for more important information (e.g. Fisheye views), and taking advantage of psychological principles of layout, such as proximity, alignment, and shared visual properties (e.g. color) [Usability First, 2003].
Information visualization, an increasingly important subdiscipline within HCI, focuses on graphical mechanisms designed to show the structure of
information and improve the cost of access to large data repositories. In printed form, information visualization has included the display of
numerical data (e.g., bar charts, plot charts, pie charts), combinatorial relations (e.g., drawings of graphs), and geographic data (e.g., encoded
maps). Computer-based systems, such as the information visualizer and dynamic queries have added interactivity and new visualization techniques (e.g., 3D, animation) [Averbuch, 2004].
Visual representations of the semantics, or meaning, of information. In contrast to scientific visualization, information visualization typically deals with
nonnumeric, nonspatial, and high-dimensional data [Chen, 2005]
** The upshot is “Information Visualization” is the graphic rendering of abstract phenomena; that the rendering shares
computational approaches and visual languages with other areas, such as graphic design, information graphics, and
illustration; that the definitions of information visualization vary by domain, history of computing use, history of
applying graphics; with very little reading of others’ literatures.
What is involved in general?
Sketching what you want to see - what underlying model of design? Do you draw from graphic design principles? Or from computing models?
Is your goal to display all the data, establishing combinations of extracted data based on some design or domain-specific needs?
• Do you use only the raw data or do you add extra-textual (metadata) as well as adding visual clues, such as
“real-world metaphors” that situate the viewer?
• Do you have to know how to program or do you use third-party software?
• Is there room for you to participate if you program your own designs or if you can’t program at all?
The Process
First we need a reason: why are these data going to be displayed? Commonly people design to express a lot
of data to be viewed; then interpreted; often for some specific need, such as decision-making, decision-support, or
uncovering unanticipated relationships in the data, discoverable by visual means.
• Explain - situate the viewer to understand; provide statistical evidence for the reason for the links; collapse
multiple layers and dimensions of data into a single, computer-based image
• Leads to integration of graphic design; simplified computing; emphasizes the idea of (usually semantic tokens) links or relationships
• Predict - depending on the underlying model, predict why something happened, or didn’t; predict how
something might change by altering some inputs;
• hypothesis generation; hypothesis testing for decision-making
• risk analysis, but obviously anything that is a concern for the domain expert group.
• Expose the undiscovered - show the immediately obvious represented by a graphic difference, or expose visual representations of the cause-and-effect of changing inputs
The Data
The data component is similar to the processes of “KDD” or Data Mining and Text Mining. A large volume of data is extracted, often from heterogeneous data sets, and merged into a data warehouse. The warehouse consists of prepared data (“cleansed data”), which are then subjected to algorithms that find relationships among the data.
This is vital: the reason for the relationships is usually bound to the domain: medical doctors may need links between biomedical processes, expressed as terms extracted from the collection. For example, “myocardial infarction” may be the text expression of data from one collection (one dimension of the data); combine this with patients’ records about treatments (say prescriptions), then add another dimension such as recovery time, and perhaps drug interactions … what might happen? Perhaps an unanticipated link between a drug, something unknown in a patient record, and the strength of the prescription reveals a greater longevity.
So here the most obvious use of visualization of scientific data is to expose the otherwise abstract notion of recovering from a heart attack.
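To make the extract, merge, and relate steps concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three small "collections", the field names, the drug labels, and the numbers are invented, and the helper functions build_warehouse and mean_recovery_by_drug are hypothetical names, not part of any tool mentioned in the talk. Dropping incomplete rows is only a crude stand-in for real data cleansing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracts from three separate collections (all values invented).
diagnoses = [
    {"patient": "p1", "term": "Myocardial Infarction "},
    {"patient": "p2", "term": "myocardial infarction"},
    {"patient": "p3", "term": "myocardial infarction"},
    {"patient": "p4", "term": "fracture"},
]
prescriptions = [
    {"patient": "p1", "drug": "drug A", "dose_mg": 50},
    {"patient": "p2", "drug": "drug B", "dose_mg": 20},
    {"patient": "p3", "drug": "drug A", "dose_mg": 100},
]
recovery_days = {"p1": 30, "p2": 55, "p3": 28, "p4": 40}


def build_warehouse(diagnoses, prescriptions, recovery_days):
    """Merge the three sources into one cleansed row per patient."""
    rows = {}
    for d in diagnoses:
        # Normalise the text expression so variants of a term match.
        rows[d["patient"]] = {"term": d["term"].strip().lower()}
    for p in prescriptions:
        if p["patient"] in rows:
            rows[p["patient"]].update(drug=p["drug"], dose_mg=p["dose_mg"])
    for patient, days in recovery_days.items():
        if patient in rows:
            rows[patient]["recovery_days"] = days
    # Keep only complete rows: a crude stand-in for data cleansing.
    needed = {"term", "drug", "dose_mg", "recovery_days"}
    return {k: v for k, v in rows.items() if needed <= v.keys()}


def mean_recovery_by_drug(warehouse, term):
    """Relate two dimensions: average recovery time per drug for one term."""
    groups = defaultdict(list)
    for row in warehouse.values():
        if row["term"] == term:
            groups[row["drug"]].append(row["recovery_days"])
    return {drug: mean(days) for drug, days in groups.items()}


warehouse = build_warehouse(diagnoses, prescriptions, recovery_days)
summary = mean_recovery_by_drug(warehouse, "myocardial infarction")

# A deliberately plain "visual language": one text bar per drug.
for drug, days in sorted(summary.items()):
    print(f"{drug:8s} {'#' * int(days // 5)} {days:.0f} days")
```

Whether a difference between the two bars means anything is exactly the question the domain expert has to answer; the sketch only shows where in the process that question arises.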
The Display
The data themselves are separate from the presentation of the data; and separated still is the idea of interacting with the data.
Only recently have the “hard core” computer science folk begun to look at the graphic design of the data.
This idea is supported by changes in course curricula and activities across graduate courses in information visualization. The shift is noticeably towards the static visual representation of a complex problem, called “information graphics.” This is counter-intuitive: the static imagery of data seems to derive from explanatory texts and arts (such as d’Alembert and Diderot’s Encyclopédie, from the 18th century French Enlightenment) and from illustration (emphatically not fine art, where the concept of representation versus abstraction took a different turn), whereas computing has always been about programmatic, that is algorithmic, solutions to human problems. We see, too, in the computer science literature at times ignorance of, or avoidance of, the graphic design literature, or attempts to create from scratch what designers have studied and applied for years4. Good examples are found in the industry standard Information Visualization conference and journal.
4. http://simile-widgets.org/timeline/
Another trend is to popularize5 the fusion of arts with data. We see this in trends such as ArtScience, a ridiculous fusion, but with grant-funded fury, by David Edwards at Harvard; or the way MIT allows students to design the official website, changing the look and feel daily, as long as certain features are included; or how Mitsubishi incorporates visual story-telling in product development.
But there is more to consider: for example, the study of visualizations (from information graphics) has become a publishing trend, such as McCandless’ Visual Miscellaneum or Transit Maps of the World. Or that we are here today discussing the topic.
Data, Models, Tools
The key here is that we have data standards and models (SQL, XML, flat files), algorithms and models for extracting data (parsing), techniques for creating combinations among large data sets that may be meaningful individually and in the aggregate, as well as computing tools suited for all levels of participation6.
For example, we could design something using Adobe Illustrator; equally we could sketch our own visualizations for the web using HTML4 or 5 and easy-to-use 3rd party JavaScript drawing libraries (such as raphael.js7, Chart.js8, processing.js9, d3js10). Or we could work programmatically, incorporating easy scripting languages such as PHP (to access flat file, XML, and relational database stores and to draw common plots, e.g., jpgraph11) along with HTML5’s Canvas or Scalable Vector Graphics files (.svg; see for example the JavaScript + SVG tiled maps from PolyMaps12) that we adopt from Illustrator or create on-the-fly, again incorporating (usually free) tools to plot what one expects to see, given the baseline (such as time; e.g., timeline widgets) … or go further, using high-powered libraries and standards such as Java3D and JavaFX13 to create our own tools - or use a host of OpenSource and proprietary tools.
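As one illustration of creating an .svg "on-the-fly", the sketch below builds a small horizontal bar chart as an SVG string using nothing but the Python standard library. It is not how any of the libraries named above work internally; the function name bar_chart_svg, its layout parameters, and the loan counts are all assumptions invented for the example.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=320, bar_height=20, gap=6, label_width=90):
    """Return an SVG document (a string) with one horizontal bar per item."""
    largest = max(value for _, value in data) or 1  # avoid dividing by zero
    height = len(data) * (bar_height + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data):
        y = gap + i * (bar_height + gap)
        # Scale each bar against the largest value in the data set.
        bar = (width - label_width) * value / largest
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}" font-size="12">'
            f"{escape(label)}</text>"
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar:.1f}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Invented counts, purely for illustration.
loans = [("Fiction", 120), ("History", 45), ("Science", 80)]
svg = bar_chart_svg(loans)
print(svg)
```

Saving the printed text to a file ending in .svg and opening it in a browser shows the chart; the same string could equally be embedded in an HTML5 page.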
Notice that there are a lot of OpenSource tools - but frankly with some effort on your own part you could create cool stuff, too - and there are a lot of proprietary tools that expect (require) the “Cloud” (e.g., iCharts14) - a horrible idea. [Fight the Cloud with all your being because it is the Borg!]
Keep in mind that what the public and managers see as something new - Big Data and Visualization - has
been around since 1946 when American Demographics published an edition with a graphic on the cover - to demonstrate to statisticians the idea that data could be represented as a visual, instead of lists of numbers.
5. http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
6. Created from looking at a lot of sites (such as https://www.cmu.edu/teaching/technology/tools/informationvisualization/, http://www.creativebloq.com/infographic/20-freedata-visualisation-tools-5133780) to identify some trends.
7. http://raphaeljs.com
8. http://www.chartjs.org
9. http://processingjs.org
10. http://d3js.org
11. http://jpgraph.net
12. http://polymaps.org
13. http://www.oracle.com/technetwork/java/javafx/overview/index.html
14. http://www.icharts.net
15. For instance “ManyEyes” http://www-958.ibm.com/software/data/cognos/manyeyes/
And quickly to this mix we add companies, such as IBM15, SPSS, and SAS, who had either engineers or statisticians willing and able to help domain experts (car sales, medical studies, economists, even the FBI) ingest incomprehensibly large data sets to extract interesting events, or combinations of data that suggest to the domain-expert an important trend or a significant anomaly. To increase the confidence of one’s interpretation, these activities provided statistical analyses. To be sure, there are times when accidents of data sets happen (called “lift”) and the domain-expert must intervene where “events” are identified that could never happen. Today the trend continues with OpenSource data and statistics tools such as SQL and R16, integration of statistical products like R with vendor software (such as Tableau17, or, as alternatives to SQL, “NoSQL” proprietary query languages such as Cypher18), and the results of high-powered schools’ student activities (for instance, MIT’s CSAIL projects and Harvard’s CS50 course).
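A note on the word “lift”, with a small sketch. In the association-rule literature the term usually names a ratio: how often two items occur together, divided by how often they would occur together if they were independent. A value well above 1 on a small sample can be exactly the kind of accident a domain expert has to rule out. The function name and the car-sales baskets below are invented for illustration.

```python
def lift(transactions, a, b):
    """Lift of the pair (a, b): P(a and b) / (P(a) * P(b))."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    if p_a == 0 or p_b == 0:
        return 0.0  # an item that never occurs cannot be associated
    return p_ab / (p_a * p_b)


# Invented sales records, one set of items per sale.
baskets = [
    {"sedan", "roof rack"},
    {"sedan", "roof rack"},
    {"sedan"},
    {"truck"},
]

print(round(lift(baskets, "sedan", "roof rack"), 2))
print(lift(baskets, "truck", "roof rack"))
```

With only four records the first figure says very little; the statistics raise confidence only when the expert agrees the pairing could actually happen.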
Old wine in new skins
A voice that’s not usually heard says that “Big Data” and “Information Visualization” are really, by their own definitions, a continuation of information retrieval (IR) [establishing relationships between document collection representation, query representation, a framework for their matching, and a relationship between queries and documents] in order to locate (“known-entity search”) and learn (formerly “browsing”, today “discovery”), with interactivity and cognition (or meaning-construction), but with an emphasis on the language of the relationship: from textual, 2-dimensional representations [lists of relevancy ranked items] to multidimensional, multi-layered representations using a visual language. To this standard model19, I add V for the visual component.
We might represent this as Q, D, F, R(qi, dj) + V.
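One way to read Q, D, F, R(qi, dj) + V is as a tiny program. In the sketch below D and Q are bags of terms, F is assumed to be a simple vector-space framework, R is cosine similarity, and V maps each score to the length of a bar rather than to a position in a list. These are illustrative choices, not the only instantiation of the model, and the three "documents" are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """D and Q: represent a document or a query as a bag of lowercase terms."""
    return Counter(text.lower().split())


def cosine(q, d):
    """R(qi, dj): cosine similarity between two term-count vectors."""
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return (score, title) pairs, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), title) for title, text in documents.items()]
    return sorted(scored, reverse=True)


def visualize(ranked, width=30):
    """V: express each score as bar length instead of a bare ranked list."""
    return [f"{title:12s} {'#' * round(score * width)}" for score, title in ranked]


# Invented miniature collection.
documents = {
    "heart": "heart attack recovery drug",
    "maps": "transit maps of the world",
    "design": "graphic design color typography",
}
ranked = rank("heart attack drug", documents)
lines = visualize(ranked)
print("\n".join(lines))
```

Swapping visualize for something richer (position, color, layering) changes only V; Q, D, F, and R stay as they were, which is the point of the "+ V" notation.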
Some Questions
• If, on the one hand, the InfoVis trend is merely a popularization of data mining activities, what is its relationship to established fields and research? How do IR and DM map to IV?
• On the other hand, if the world of “Big Data” takes over, that is, volumes of data so vast they outpace human comprehension and so require some treatment for our understanding, how will the ways we think, communicate, evaluate, and prepare records all be transformed? Have they already been transformed?
Treading lightly?
There have been many attempts to visualize data from libraries, archives, museums, and the like. And there are trends to visualize purchasing habits (look at Amazon’s “people who bought X also looked at…” - these are called “recommender systems”); just as there are tools, projects, and trends common in established research and practice domains, of which geographic information systems20, chemistry21, and math are obvious examples. Notice, too, that many “informatics” programs do the same work, with the same tools, resources, and visualizations; bioinformatics and health informatics leap to mind.
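The “people who bought X also looked at…” idea can be sketched with simple co-occurrence counts. This is only one simple approach and makes no claim about how Amazon or any production recommender actually works; the function names and the baskets are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def co_occurrence(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def also_bought(counts, item, k=2):
    """Top-k companions of an item; ties are broken alphabetically."""
    companions = counts.get(item, Counter())
    ordered = sorted(companions.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ordered[:k]]


# Invented purchase histories.
baskets = [
    ["atlas", "globe"],
    ["atlas", "globe", "compass"],
    ["atlas", "compass"],
    ["globe", "novel"],
    ["atlas", "globe"],
]
counts = co_occurrence(baskets)
print(also_bought(counts, "atlas"))
```

The counts themselves are the raw material for a visualization: each pair is a link, and the count is a natural candidate for line weight or proximity.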
But … we should consider, too, the influences of technological change, innovation, and adoption. Just how much is imposed from without? What situations coerce and require us to yield rights to participate in a highly digitized world - and can we stand up against them? That is what constitutes the sanctioned approach to information and data - driven from outside, and driven by volume, not necessarily utility, obviating work domains - a scythe mowing down all before it.
16. http://www.r-project.org
17. http://www.tableausoftware.com/products
18. http://www.neo4j.org
19. See Baeza-Yates (1998) Modern information retrieval. New York: ACM Press and class notes for LIS466, Information Retrieval (web.simmons.edu/~benoit/lis466/index.html).
20. Any GIS application; similar commercial products Gapminder.com; even Google Earth
21. JMOL (3d models of chemical structures)
Some established trends
The counter and opportunity is knowledge. But without genuine communication there can be no discussion
about opportunities, liabilities, benefits, or learning more.
Therefore, what should people in “information professions” do?
1. Learn about graphic design: know the widely-adopted principles of composition, typography, and color
theory
2. Learn about the history of visuals: mass communication [posters, advertisements, television]; “fine art” versus “low art”
a. Both of these ideas are easily accessible in standard texts, such as Meggs’ or Janson; but even more so
through companies with vested interests in an informed customer base, such as Adobe [cite]
b. But you need to be an informed consumer, too; many sites are inaccurate or have a limited scope
3. Learn more about data models and how they’re manipulated
a. relational databases
b. XML
c. full-text retrieval (IR)
d. trends in Data Mining, Text Mining, and the informatics movements
e. Boolean, extended Boolean, algebraic, probabilistic models - these are important in OPACs
4. Learn about fundamental issues related to the adoption of innovation and systems design
a. Examine how your institution makes decisions about enterprise-wide information systems
b. Can you argue for other models, say a “data centric” model versus the current “add another portal”
model?
5. Programming, scripting
a. Understand the relationship at a tactile level between extracting data, creating meaningful subsets, and
then translating this whole into a visual language
6. Master a few of the popular 3rd party tools and literatures
a. There are many OpenSource and proprietary software products
b. Find a literature that suits your level and needs: perhaps the technical industry’s Information Visualization or ACM SIG-VIS; or AMIA’s or the Pacific Symposium on Biocomputing’s research; perhaps more popularizing trends such as a university or library system publication (such as Harvard Magazine); or something in-between
So how is IV your future?
The theme of the talk is that information visualization is your future. How? Information systems in
general do not emerge from the bottom-up; usually they’re imposed from beyond. While originally
librarians were full participants in the creation of forward-looking data models (MARC was very far-sighted for
its day), the trend, for myriad reasons too complex for today’s talk, has been to centralize - at first into OPACs
(RLIN, OCLC, etc.), then to integrated services (multiple otherwise independent information systems linked
either at the data-level or through the interface as a portal); now to multiple silos of data supplied by 3rd party
vendors that require additional staff and systems to link them … finally, to the idea of large, astoundingly large,
sets of heterogeneous data that can be extracted and combinations made that could be meaningful to the user.
The volume is so great that traditional lists are not sufficient; there needs to be other means to express the data
themselves, the links between the data, and to do so in a visual language.
There need to be people who (a) create these systems, (b) understand these systems and can explain the benefits, liabilities and use of these in new settings, (c) know how to evaluate their usefulness, and (d) can be valuable contributors to the design and integration of new visualization systems.
Consider this transition: a spreadsheet program that has pie charts, bar graphs and the like. We grew up with this as an everyday tool. Now consider relational databases. FileMaker Pro is commonly employed in offices by staff; MySQL and Oracle are used in settings from small offices through large businesses, increasingly with visualization tools - still bound, tho, to the idea of pie charts and graphs. Notice, then, the ubiquity of these products and tools … so why are InfoVis and Big Data causing a stir?
The reason is the shift towards analysis of the data - the skills of analysis are increasingly statistics-based or numeric. Quantification of data and its domination in work and study have shifted what passes for “legitimate knowledge” to, arguably, only a quantified, empiricist base. The humane, the qualitative, and the discursive are, some believe, an avenue to add value.
Visualization provides also a chance to participate as a designer; imagine developing alternatives to the
FishEye or building a system that demonstrates usefulness to your clients/patron-base?
Conclusions? The field is wide-open and growing in popularity. It’s a combination of well-defined behaviors
but with a lack of stability in principles, tho some are gelling. I don’t think they’re quite right, tho …
There are ways to learn more about the topic, the specific activities, and inspiration… The rest of these slides is
a gallery of visual solutions, often drawing from the same dataset…
Visit the Information Visualization class, LIS593d. Your ideas and questions are welcome; do you want to participate on some info vis projects? Let me know - I have the resources but not the folk! Thanks.
Gerry Benoit, [email protected]
http://web.simmons.edu/~benoit/index.html
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
References
Averbuch, M. (2004). As You Like It: Tailorable Information Visualization. Database Visualization Research Group, Tufts University.
Card, S., Mackinlay, J., & Shneiderman, B. (1999). Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers.
Chen, C. (2005). Top 10 Unsolved Information Visualization Problems. IEEE Computer Graphics and Applications, 25(4), 12-16. ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=31454&arnumber=1463074&count=14&index=3
Ferreira de Oliveira, M.D., & Levkowitz, H. (2003). From Visual Data Exploration to Visual Data Mining: A Survey. IEEE Transactions on Visualization and Computer Graphics, 9(3), 378-394. doi.ieeecomputersociety.org/10.1109/TVCG.2003.1207445
Gee, A.G., Yu, M., & Grinstein, G.G. [nd]. Dynamic and Interactive Dimensional Anchors for Spring-Based Visualizations. Technical Report, Computer Science, University of Massachusetts Lowell.
Keim, D.A., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in Visual Data Analysis. Proceedings of Information Visualization (IV 2006), IEEE, 9-16.
Plaisant, C. (2001, Nov.). Information Visualization - Lecture Notes.
Purchase, H. C., Andrienko, N., Jankun-Kelly, T. J., & Ward, M. (2008). Theoretical Foundations of Information Visualization. In A. Kerren, J. T. Stasko, J. Fekete, & C. North (Eds.), Information Visualization: Human-Centered Issues and Perspectives. Lecture Notes in Computer Science, vol. 4950. Springer-Verlag, Berlin, Heidelberg, 46-64. DOI: 10.1007/978-3-540-70956-5_3
Usability First (2003). Usability Glossary. Retrieved 2003. www.usabilityfirst.com/glossary/main.cgi?function=display_term&term_id=5
Voigt, R. (2002). An Extended Scatterplot Matrix and Case Studies in Information Visualization. Master’s thesis, Hochschule Magdeburg-Stendal. www.vrvis.at/via/resources/DA-RVoigt/masterthesis.html (Classification and Definition of Terms: www.vrvis.at/vis/resources/DA-RVoigt/node4.html)
Wikipedia (2005). Information visualization. en.wikipedia.org/wiki/Information_visualization
http://www.matthiasdittrich.com/projekte/narratives/visualisation/index.html
http://www.wikimindmap.org
http://www.datavisualization.fr/blog/2011/12/data-visualization-in-2011-a-recap.html
http://nordisk.pp.ru/dizain-menedgment/
file name: InfoVisTalk-GB-2014.rtf
3/3/14, 10:10 AM