Voyagers and Voyeurs - UW-Madison Database Research Group

Voyagers and Voyeurs
Supporting Social Data Analysis
Jeffrey Heer
Computer Science Department
Stanford University
CIDR 2009 – Monterey, CA
5 January 2009
A Tale of Two Visualizations
vizster
Observations
Groups spent more time in front of the
visualization than individuals.
Friends encouraged each other to unearth
relationships, probe community boundaries, and
challenge reported information.
Social play resulted in informal analysis, often
driven by story-telling of group histories.
NameVoyager
The Baby Name Voyager
Social Data Analysis
Visual sensemaking can be social as
well as cognitive.
Analysis of data coupled with social
interpretation and deliberation.
How can user interfaces catalyze and
support collaborative visual analysis?
sense.us
A Web Application for Collaborative
Visualization of Demographic Data
Voyagers and Voyeurs
Complementary faces of analysis
Voyager – focus on visualized data
Active engagement with the data
Serendipitous comment discovery
Voyeur – focus on comment listings
Investigate others’ explorations
Find people and topics of interest
Catalyze new explorations
Out of the Lab,
Into the Wild
Wikimapia.org
Spotfire DecisionSite Posters
Tableau Server
Many-Eyes
Social Data Analysis In Action
1. Discussion and Debate
2. Text is Data, Too
3. Data Integrity and Cleaning
4. Integrating Data in Context
5. Pointing and Naming
For each, some thoughts on future directions.
I asked my colleagues: if you could give database
researchers a wish list, what would it be?
Discussion and Debate
Tableau X-Box / Quest Diag?
“Valley of Death”
Content Analysis of Comments
[Figure: percentage of comments in each category, for sense.us and Many-Eyes. Categories: Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation.]
Feature prevalence from content analysis (min Cohen’s κ = .74)
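Cohen's κ measures agreement between two coders after correcting for chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each coder's label frequencies. A minimal Python sketch follows. It assumes one plausible reading of the caption, namely that each category is coded separately as present (1) or absent (0) for every comment and the lowest per-category κ is reported; the function name and the example codes are hypothetical, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical presence/absence codes for one category across six comments.
coder_1 = [1, 1, 0, 0, 1, 0]
coder_2 = [1, 0, 0, 0, 1, 0]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.667
```

Here the coders agree on 5 of 6 comments (p_o ≈ .83), but half of that agreement is expected by chance (p_e = .5), so κ drops to about .67.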
High co-occurrence of Observations, Questions, and Hypotheses
WANTED: Structured Conversation
Reduce the cost of synthesizing contributions
Wikipedia: Shared Revisions
NASA ClickWorkers: Statistics
Can we represent data, visualizations, and social
activity in a unified data model?
Text is Data, Too
Visualization Popularity
[Figure: share of visualizations of each type, for Many-Eyes and Swivel. Types: Tag Cloud, Bubble Graph, Word Tree, Bar Chart, Maps, Network Diagram, Treemap, Matrix Chart, Line Graph, Scatterplot, Stacked Graph, Pie Chart, Histogram.]
Over 1/3 of Many-Eyes visualizations use free text
Alberto Gonzales
WANTED: Better Tools for Text
Statistical Analysis of text (with ties to source!)
Entity Extraction
Aggregation and Comparison of texts
Get a “global” view of documents
We can do better than Tag Clouds (!?)
Use text analysis tools to enable analysis of
structured conversation by the community.
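One way to keep "ties to source" when computing text statistics is to index every term occurrence by document and character offset, so that any aggregate count can be traced back to the passages it came from. The sketch below is hypothetical: term_index, top_terms, and keyword_in_context are illustrative names, the tokenizer is a deliberately crude regular expression, and it does not describe how Many-Eyes actually processes text.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z']+")  # crude tokenizer: runs of letters and apostrophes

def term_index(documents):
    """Map each lowercased term to the (doc_id, char_offset) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in WORD.finditer(text):
            index[match.group().lower()].append((doc_id, match.start()))
    return index

def top_terms(index, k):
    """Rank terms by total count, breaking ties alphabetically for a stable order."""
    ranked = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(term, len(positions)) for term, positions in ranked[:k]]

def keyword_in_context(documents, index, term, width=15):
    """Return (doc_id, snippet) for every occurrence, linking the count back to source."""
    snippets = []
    for doc_id, offset in index.get(term.lower(), []):
        text = documents[doc_id]
        start = max(0, offset - width)
        end = min(len(text), offset + len(term) + width)
        snippets.append((doc_id, text[start:end]))
    return snippets

# Hypothetical comments, used only to exercise the functions.
comments = {"c1": "Look at that spike. A spike in 1910.",
            "c2": "Is the spike a bug in the data?"}
idx = term_index(comments)
print(top_terms(idx, 2))  # [('spike', 3), ('a', 2)]
for doc_id, snippet in keyword_in_context(comments, idx, "spike"):
    print(doc_id, snippet)
```

Because the index stores positions rather than bare counts, a tag-cloud-style summary and a keyword-in-context view can share the same data structure.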
Data Integrity and Cleaning
No cooks in 1910? … There may have
been cooks then. But maybe not.
The great postmaster
scourge of 1910?
Or just a bug
in the data?
Content Analysis of Comments
[Figure: percentage of comments in each category, for sense.us and Many-Eyes. Categories: Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation.]
16% of sense.us comments and 10% of Many-Eyes comments
reference data quality or integrity.
WANTED: Data Cleaning Tools
Reshape data, reformat rows & columns
Handle missing data: label, repair, interpolate
Entity resolution and de-duplication
Group related values into aggregates
Assist table lookups & data transforms
Provide tools in situ to leverage the collective
Transparency requires provenance
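For the "label, repair, interpolate" item, a minimal sketch: fill interior gaps in a numeric series by linear interpolation while labeling every value as observed, interpolated, or missing, so the repair stays visible downstream. Linear interpolation is assumed here only for illustration; in census-style data a missing value may be a genuine zero or a coding error, as the 1910 examples suggest, and interpolating over it would hide that question. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def interpolate_missing(values: List[Optional[float]]) -> List[Tuple[Optional[float], str]]:
    """Fill interior gaps by linear interpolation, labeling each value's origin.

    Each output pair is (value, label) with label "observed", "interpolated",
    or "missing". Leading and trailing gaps lack a neighbor on one side, so
    they are labeled but left unrepaired.
    """
    result = [(v, "observed") if v is not None else (None, "missing") for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of consecutive observed points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            filled = values[left] + t * (values[right] - values[left])
            result[i] = (filled, "interpolated")
    return result

# Hypothetical series with one interior gap and one trailing gap.
print(interpolate_missing([10.0, None, 30.0, None]))
# [(10.0, 'observed'), (20.0, 'interpolated'), (30.0, 'observed'), (None, 'missing')]
```

Keeping the label next to each value, rather than overwriting the gap silently, is one lightweight way to honor "transparency requires provenance".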
Integrating Data in Context
College Drug Use
Harry Potter is Freaking Popular
WANTED: In-Situ Data Integration
Search for and suggest related data or views
User input for types, schema matching, or data
Apply in context of the current task
But record mappings for future use
Record provenance: chain of data sources
Examples: Google WebTables, Pay-As-You-Go,
Stanford Vispedia, Utah VisTrails
Pointing and Naming
“Look at that spike.”
“Look at the spike for Turkey.”
“Look at the spike in the middle.”
Free-form
Data-aware
Visual Queries
Model selections as declarative queries over
interface elements or underlying data
(-118.371 ≤ lon AND lon ≤ -118.164) AND (33.915 ≤ lat AND lat ≤ 34.089)
Applicable to dynamic, time-varying data
Retarget selection across visual encodings
Support social navigation and data mining
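To make "model selections as declarative queries" concrete, here is a minimal Python sketch, assuming a selection is a conjunction of closed ranges over named data fields, as in the longitude/latitude predicate above. Because the selection is stored as a predicate rather than as a set of highlighted marks, it can be re-evaluated when new records arrive and applied to any view that encodes the same fields. RangeQuery and its methods are hypothetical names, and the rendered predicate uses ASCII <= in place of ≤.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, float]

@dataclass
class RangeQuery:
    """A visual selection stored declaratively as closed ranges over data fields."""
    ranges: Dict[str, Tuple[float, float]]  # field name -> (low, high)

    def matches(self, record: Record) -> bool:
        # Selected only if every constrained field lies inside its range.
        return all(low <= record[field] <= high
                   for field, (low, high) in self.ranges.items())

    def select(self, records: List[Record]) -> List[Record]:
        # Re-running the predicate picks up records added after the brush was drawn.
        return [r for r in records if self.matches(r)]

    def to_predicate(self) -> str:
        # Render the selection as a textual predicate over the data fields.
        return " AND ".join(f"({low} <= {field} AND {field} <= {high})"
                            for field, (low, high) in self.ranges.items())

brush = RangeQuery({"lon": (-118.371, -118.164), "lat": (33.915, 34.089)})
points = [{"lon": -118.25, "lat": 34.05}, {"lon": -122.42, "lat": 37.77}]
print(brush.to_predicate())
print(brush.select(points))       # [{'lon': -118.25, 'lat': 34.05}]
points.append({"lon": -118.20, "lat": 34.00})  # hypothetical record arriving later
print(len(brush.select(points)))  # 2
```

Since the query names data fields rather than pixels, the same brush could be retargeted from a map to, say, a scatterplot of the same records, and its predicate could be logged for social navigation or mining.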
WANTED: Data-Aware Annotation
Meta-queries linking annotations to views
Visually specifying notification triggers
Annotating data aggregates (use lineage?)
Unified model (again!) to facilitate reference
How to make it work at scale?
How else to use machine-readable annotations?
Can annotations be used to steer data mining?
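The wish list poses open questions rather than a design. As one hypothetical starting point, an annotation could be anchored to a data-space selection (a set of field ranges) instead of screen coordinates; a meta-query then finds the annotations whose anchors overlap the current view, and a notification trigger fires when a newly arriving record falls inside an anchor. All names and example values are illustrative. The linear scans do not answer the scale question; an interval or spatial index would likely be needed for that.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Ranges = Dict[str, Tuple[float, float]]  # field name -> closed interval (low, high)

@dataclass
class Annotation:
    author: str
    text: str
    anchor: Ranges  # the data region the comment refers to, not pixel coordinates

def overlaps(a: Ranges, b: Ranges) -> bool:
    """True if two conjunctive range selections can share a point.

    A field constrained on only one side is unconstrained on the other,
    so only fields present in both can rule out an overlap.
    """
    for field in a.keys() & b.keys():
        a_low, a_high = a[field]
        b_low, b_high = b[field]
        if a_high < b_low or b_high < a_low:
            return False
    return True

def annotations_for_view(notes: List[Annotation], view: Ranges) -> List[Annotation]:
    """Meta-query: which annotations refer to data visible in the current view?"""
    return [n for n in notes if overlaps(n.anchor, view)]

def triggered(anchor: Ranges, record: Dict[str, float]) -> bool:
    """Notification trigger: does a newly arrived record fall inside the anchor?"""
    return all(low <= record[f] <= high for f, (low, high) in anchor.items())

notes = [Annotation("viewer1", "No cooks in 1910?", {"year": (1910, 1910)})]
print(len(annotations_for_view(notes, {"year": (1900, 1950)})))  # 1
print(len(annotations_for_view(notes, {"year": (1960, 2000)})))  # 0
print(triggered(notes[0].anchor, {"year": 1910, "count": 0}))    # True
```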
Conclusion
Social Data Analysis
Collective analysis of data supported
by social interaction.
1. Discussion and Debate
2. Text is Data, Too
3. Data Integrity and Cleaning
4. Integrating Data in Context
5. Pointing and Naming
Summary
As visualization becomes common on the web,
opportunities for collaborative analysis abound.
Weave visualizations into the web: data access,
visualization creation, view sharing and pointing.
Support discovery, discussion, and integration
of contributions to leverage the collective.
Improve both processes and technologies for
communication and dissemination.
Parting Thoughts
Visualizations may have a catalytic effect
on social interaction around data.
Encourage participation by minimizing or
offsetting interaction costs.
Provide incentives by fostering the
personal relevance of the data.
Acknowledgements
@ Berkeley: Maneesh Agrawala, Wes Willett,
danah boyd, Marti Hearst, Joe Hellerstein
@ IBM: Martin Wattenberg, Fernanda Viégas
@ PARC: Stu Card
@ Tableau: Jock Mackinlay, Chris Stolte,
Christian Chabot
Voyagers and Voyeurs
Supporting Social Data Analysis
Jeffrey Heer Stanford University
http://jheer.org
With a collaborative spirit, with a collaborative platform
where people can upload data, explore data, compare
solutions, discuss the results, build consensus, we can
engage passionate people, local communities, media and
this will raise - incredibly - the amount of people who can
understand what is going on.
And this would have fantastic outcomes: the engagement of
people, especially new generations; it would increase
knowledge, unlock statistics, improve transparency and
accountability of public policies, change culture, increase
numeracy, and in the end, improve democracy and welfare.
Enrico Giovannini, Chief Statistician, OECD. June 2007.