MaRS Market Insights
The Data Visualizers
Using data visualizations to uncover the true meaning behind a data set
Content Lead and Market Analyst:
Neha Khera, MaRS Market Intelligence
Partner & Advisor:
Acknowledgements:
We thank the following individuals and organizations for their participation in this report:
Dr. Kamran Khan, CEO and Founder, Bio.Diaspora
Nick Edouard, EVP Business Development & Marketing, BuzzData
Nadia Amoroso, CEO and Co-Founder, DataAppeal
Haim Sechter, COO, DataAppeal
Niall Wallace, CEO and Founder, Infonaut
Lisa Zhang, Co-Founder, Polychart
Faizal Karmali, Director and Co-Founder, Quinzee
Sam Molyneux, CEO and Co-Founder, Sciencescape
Eugene Woo, Founder, Venngage
Disclaimer:
The information provided in this report is presented in summary form, is general in nature, is current only as of
the date of publication and is provided for informational purposes only.
Specific advice should be sought from a qualified legal or other appropriate professional.
MaRS Discovery District, © October 2012
Table of Contents
Data Visualization Market / 4
Data, data and more data. What’s all the hype about? / 5
FIGURE 1: Amount of data created daily / 5
What is enabling the big data hype / 6
Extracting value from big data / 6
FIGURE 2: The Digital Intelligence Architecture / 6
The rise of data visualization / 7
FIGURE 4: Search and news reference volume for the word “infographic” on Google. / 8
Data visualization tools / 9
The challenge with data visualizations / 9
Investment in the data visualization space / 9
Noteworthy applications of data visualization / 11
Understanding census data / 11
FIGURE 5: GTA Population Change by Municipality 1996-2001 / 11
FIGURE 6: GTA Population Change by Municipality 2006-2011 / 12
Tracking disease / 13
FIGURE 7: John Snow’s cholera map / 13
FIGURE 8: Map reflecting Sault Ste. Marie mosquito trapping efforts / 14
Improving healthcare / 14
FIGURE 9: Hospital 30-day overall readmission rates by Ontario region, 2009-2010 / 14
Supporting decision-making / 15
FIGURE 10: Edward Tufte’s figure on the 1986 Challenger Space Shuttle launch decision / 15
Driving transparency / 16
FIGURE 11: Energy being supplied by renewable sources for US residents / 16
Groups supporting data visualization / 17
Looking ahead / 18
References / 19
Appendix A: The Data Visualizers / 20
Bio.Diaspora / 20
BuzzData / 22
DataAppeal / 24
Infonaut / 26
Polychart / 28
Quinzee / 30
Venngage / 32
Data Visualization Market
Data visualization is not a new concept. It has been used for centuries to distill and communicate information. Think about all the maps, graphs and charts in existence, and the popularity of this form of data
analysis will quickly become clear. However, with advancements in technology, data visualizations are taking
on more complex forms than ever before. They are being used to unravel the meaning behind big data
sets that would otherwise be too difficult to understand. Highlighted in this piece are eight Ontario-based
startups whose innovative applications are shaping the future of data visualization.
Data, data and more data. What’s all the hype about?
To understand the importance of data visualization, let’s take a step back and look at the impact of data in
today’s modern economy. It has been said that we are living through the Industrial Revolution of data: an era
where so much data is being produced on a daily basis by people and machines that we no longer have the
capacity to store it all. From the billions of mobile phones to the trillions of RFID sensors, we live in a world
where our every action and reaction is being captured and stored. And while it may seem eerily intrusive, the
capturing of data has the potential to drastically improve the world in which we live. This is the rise of what’s
known as “big data.”
The term “big data” was coined to describe data sets with a size and complexity beyond the ability of typical
database software tools to capture, store, manage and analyze them.1 This definition is intentionally subjective
and is not meant to limit “big” data sets to a certain number of terabytes.1
Just how big a phenomenon big data actually is was eloquently captured in a remark by Google’s Eric Schmidt.
He pointed out that we are creating as much information every two days as we did from the dawn of civilization up until 2003. On a daily basis, this translates into around 2.5 exabytes of data.2
FIGURE 1: Amount of data created daily
With each coming year, the volume of data generated will only grow. For example, the Square Kilometer
Array (SKA) Telescope — the world’s largest telescope — is projected to generate in excess of one exabyte of
data per day when it goes live in 2024.3 This is roughly twice the amount of data generated every day on
the World Wide Web.3 IBM is working feverishly to develop a supercomputer powerful enough to handle this
amount of information.
Big data can and will impact every nation, industry, company and individual around the globe, whether it’s
in terms of understanding our galaxy, optimizing healthcare, selecting an ideal retail location or finding the
perfect date. A study by McKinsey Global Institute estimates that big data can add $300 billion worth of value
to the US healthcare system and can increase retailers’ operating margins by as much as 60%.1 There is no
doubt that those who collect, analyze and act on their data successfully will gain a competitive advantage in
their market.
What is enabling the big data hype
The rise of big data springs from two main factors:
1. The increased generation of information.
2. The ability to store this information.
Both of these factors are tied to advancements in technology. Social media applications have generated huge
amounts of sentiment online, where the beliefs, activities and interests of billions of people are being captured
in a way like never before. Mobile devices are used by over six billion people today, of which nearly five billion
are in developing countries.4 These devices are capturing data in regions where information was previously
difficult to extract. And through the rise of networked sensor technologies such as RFID (radio-frequency
identification) tags, more than 30 million articles are being tracked across the transportation, industrial and
retail sectors.1
As Moore’s law continues to hold, we now also have the ability to store all of this data, and doing so is
financially accessible to many. Today, the entire world’s music can be stored on a device that costs less than
$600.1 Up until the turn of this century, storing an average music playlist of 7,000 songs alone would have
cost $500.
Extracting value from big data
The creation and capture of data by itself does not, obviously, benefit anyone — only when analysis is added
to the mix is the value of big data unlocked. Unfortunately, this is also an area where significant challenges
exist. Big data analysis remains a market in its infancy. As Google’s Chief Economist Hal Varian put it, “Data are
widely available; what is scarce is the ability to extract wisdom from them.”5
FIGURE 2: The Digital Intelligence Architecture6
Big data analysis is often hindered by the sheer cost involved in purchasing tools that can process large volumes of information. Another impediment is not being able to process information quickly enough to extract
insights in real-time. Waiting two days or two weeks for reports is becoming unacceptable given the fast pace
of digital interactions. What is likely the biggest obstacle is the lack of talent and expertise in the data science
field. The McKinsey Global Institute gauges that by 2018, more than half of all big data jobs, nearly 200,000 of
them, will go unfilled because skilled candidates will be in short supply.7
However, as we turn our attention to the field of data visualization — one form of data analysis — we start to
see many of these roadblocks disappear. The power of data visualizations lies in their ability to transform the
most complex of data sets into a rendering that even novice users can interpret. And through technology
innovation, data visualization tools have become increasingly easy to adopt, with intuitive user interfaces and
cloud-based access.
The rise of data visualization
At its core, data visualization is the use of abstract, non-representational pictures to show numbers.8 It can
include points, lines, symbols, words, shading and colour.8 Data visualizations make it easier to spot trends and
patterns amid large amounts of information. They also make it possible for data to tell a story. Just as experts
in the field of communication propose the use of stories to better convey information verbally, the same holds
true when conveying information through data. And one of the best ways to tell a data story is to use a compelling visual.
As industry-renowned data visualization expert Edward Tufte once said about the traditional rows and columns
of data tables, “The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to
represent the rich visual world of experience and measurement on mere flatland?”9
Data illustration techniques have been in use since as early as 6200 BC, when the oldest known map was
drawn. However, it was not until the eighteenth century that data visualizations went beyond mapping and
more abstract measures were introduced, including the ever-popular pie and bar charts.
The nineteenth century saw the creation of what many have argued to be the world’s best data visualization:
Charles Joseph Minard’s 1869 visualization titled Napoleon’s March, which depicts the movement and losses of
Napoleon’s army as it invaded Russia in 1812.
After 1975, we witnessed the most rapid advancements in data visualization, which stemmed from the
development of software and computer systems. Data visualizations moved beyond pie and bar charts, and
more complex formats began to appear and aid us in processing information. For example, through the use
of mind maps, our thought patterns can now be visually organized. Apps like Flipboard and Newsmap have
completely reinvented the display of news, while tag clouds have provided another way to discover and search
for information. And through network graphs, we can now uncover the connectivity between any number of
entities, be they our own social circles, groups of companies or globally dispersed cities.
Moreover, visualizations no longer adhere to a static format: they can be interactive in nature. This allows a
user to drill down on certain data points, or manipulate and change views of the information to reach deeper
insights.
Infographics are another popular visualization form. Their growth since 2009 came with the rise of content
marketing, which involves the creation and sharing of content in order to engage with customers.10 Brands
and advertisers frequently use infographics as a form of content, as they provide both interesting insights and
visual appeal, and are easy for users to share on the web.
FIGURE 4: Search and news reference volume for the word “infographic” on Google.
Data visualization tools
Until about 2007, Microsoft Excel was the de facto standard for developing visualizations, whether they were
pivot tables or simple graphs. When analyzing larger data sets or looking for more complex visualizations,
knowledge workers would often have to tap into their company’s own business intelligence (BI) units to access
highly skilled data scientists and analysts.
Since 2007, however, a new breed of visualization tools has emerged, characterized by simplicity and
ease of use. These tools enable non-technical workers to bypass their BI units and model data themselves. This
is the rise of what Gartner touts as “data visualization applications,” an industry Gartner predicts will reach
$1 billion as early as 2013.11
Tableau Software is one of the fastest-growing data visualization applications on the market today and is in
use by over 9,000 organizations around the globe. Tableau’s success is a testament to the rise of the data
visualization market, which research firms Gartner and IDC predict won’t slow down any time soon.
The challenge with data visualizations
With the advent of these innovative tools, the ability to create a visualization of a data set is no longer difficult.
What remains difficult, however, is the creation of a good visualization.
If we break down the field of data illustration, we see that it is essentially the coming together of two contrasting fields of study: art and science. It requires the harmonious work of both the left and right brain, where the
most complex of data sets can be gathered and refined and then organized in a simple yet compelling way.
Finding this type of expertise is not an easy feat — unless, of course, you’re a Google. Google’s “Big Picture”
data visualization group is led by Martin Wattenberg, and a quick look at his resume makes you realize he is
among a special breed of people. How many people do you know with both a doctorate in mathematics and an
exhibition at New York’s MoMA?
Due to the difficulty in finding the right talent and expertise, data visualizations often end up being too complex to interpret, or they distort the information by focusing on the visual and not the meaning of the data
itself. As Tufte explains, “excellent visualizations are those that give the viewer the greatest number of ideas in
the shortest amount of time, with the least ink and in the smallest space.”8 In essence, data illustration is about
simplifying the complex as much as possible.
Investment in the data visualization space
2011 was a banner year for companies in the field of big data, with an estimated $2.47 billion invested by
venture capital firms globally.12 This was a 38% increase from the amount invested in 2010.12
The following chart depicts some of the top data visualization companies and their respective funding to date.
Excluded from this chart is the analytics application Spotfire, which was acquired for $195 million by TIBCO
software.13 Prior to its acquisition in 2007, Spotfire raised nearly $40 million over the course of ten years.13
Qlik Technologies is another notable software product with powerful visualization techniques. The company
went public in July 2010 at a valuation of nearly $900 million.13 Prior to its IPO, the company raised over $80
million over a ten-year period.13
Noteworthy applications of data visualization
Understanding census data
For over a century, visualizations have been used by governments to better understand census data and
decide, for instance, how representation should be apportioned and federal dollars distributed.14 A recent
example (below) shows Statistics Canada maps depicting population changes in the Greater Toronto Area from
1996 to 2011. They reflect how population growth is slowing in Toronto and Mississauga and rising in areas
north of these cities.
FIGURE 5: GTA Population Change by Municipality 1996-200115
FIGURE 6: GTA Population Change by Municipality 2006-201116
OpenFile, a Toronto-based startup, has used 2011 Canada census data to build its CensusFile application.
Through the use of data maps, this application allows anyone to mine the census data and gain insights about
their neighbourhood.
Tracking disease
One of the most cited examples of a data visualization success story was John Snow’s cholera map. During
an 1854 outbreak of cholera in London, England, Snow used a spot map to illustrate how outbreaks of cholera
were centered around the city’s water pumps. This depiction helped prove that cholera was being spread
through water and not by air, as was thought at the time.17
FIGURE 7: John Snow’s cholera map17
In 2006, the city of Sault Ste. Marie in Ontario was able to eliminate what could have been a serious
threat related to the West Nile virus. The Sault Ste. Marie Innovation Centre had done a systematic job of
enabling the sharing of data sets between various organizations within the city. The data sets were then
merged using data maps to uncover new insights. Through this activity, the Centre happened to learn about an
unusually large collection of mosquitoes within the city’s underground transformer vaults. Due to an absence
of draining structures, the vaults had unknowingly become the perfect breeding ground for mosquitoes. Were
it not for the use of data visualization, this threat of West Nile virus would not have been discovered and
mitigated.
FIGURE 8: Map reflecting Sault Ste. Marie mosquito trapping efforts18
Improving healthcare
The Canadian Institute for Health Information (CIHI) has developed the Canadian Hospital Reporting Project
(CHRP), which is focused on improving the quality of healthcare across the nation. Visualizations are being
used to increase understanding of mortality rates, readmission rates, costs of hospital stays and other health
indicators. The project’s goal is to provide data insights to key decision- and policy-makers, so improvements
can be made and hospitals can collaborate to achieve efficiencies.
FIGURE 9: Hospital 30-day overall readmission rates by Ontario region, 2009-201019
Supporting decision-making
The 1986 destruction of the Space Shuttle Challenger, which was due to a damaged O-ring seal, has been
attributed in part to a failure of data analysis. Decision-makers at the US space agency, NASA, were uncertain
about whether to launch the space shuttle in below-freezing temperatures, and relied on poorly presented data
and short bullet points in making their decision. As data visualization expert Edward Tufte later pointed out, this
disaster could have been avoided had the data been more clearly conveyed through the use of a graphic. The
sample graphic Tufte later developed makes obvious the risk of O-ring damage in extreme cold temperatures.
FIGURE 10: Edward Tufte’s figure on the 1986 Challenger Space Shuttle launch decision20
Today, NASA is heavily involved in the development of visualizations that explain NASA missions and scientific
results.
Driving transparency
General Electric (GE) is one of the many companies developing extraordinary visualizations, based on the
petabytes of data collected through their various technologies. GE is hoping the visualizations will help not
only simplify the complex nature of their work, but also drive insights and discoveries that might otherwise
be difficult to achieve. For example, GE has developed an interactive visualization to help US residents
understand how much of their energy is being supplied by renewable sources.
FIGURE 11: Energy being supplied by renewable sources for US residents21
IBM is also experimenting with data visualization and has developed an application called Many Eyes that
invites anyone to upload a data set or to visualize an existing one.
Groups supporting data visualization
Discovery Exhibition is a US-based organization that profiles “visualization impact stories.” Highlights in 2011
included visualizations that helped reveal the mortality rate of African infants, understand traffic patterns in
Beijing and optimize car engine injection systems. Information is Beautiful is another organization focused
on celebrating beautiful designs in data visualization. Among the nominated designs for 2012 is one on the
Vancouver Canucks’ franchise history.
Here in Ontario, York University and OCAD University have teamed up to develop the Centre for Innovation in
Information Visualization and Data-Driven Design (CIV-DDD), which is essentially a data visualization research
hub. Leveraging computer scientists from York and designers from OCAD, the group is working to develop data
visuals that help solve specific problems across the areas of healthcare, arts, social sciences and engineering.
Sample projects underway include understanding the impact of social media content and mapping the origins
of Africans liberated from transatlantic slavery.
MaRS’ very own Data Catalyst team is working with data to provide insights on the innovation economy in
Ontario. Their outputs will include visualizations and dashboards representing the impact of innovation support in the province, as well as visualizations that highlight opportunities for market and economic growth in
key sectors.
Looking ahead
There is no question about the potential for growth and innovation in the data visualization space. Otherwise
hard-to-understand rows and columns of numbers are brought to life through visualization techniques. Data
illustrations not only help to tell a story, but they reveal the true meaning behind a data set.
However, data visualization is only one of a series of analytics techniques. As we continue to collect more and
more data every day, an increasing number of techniques will be required to distill the most complex of data
sets down to an easily accessible message. This is an existing gap in the big data market, and an area where
entrepreneurs should think about focusing their efforts.
References
1. McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity
2. TechCrunch: Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003
3. CNN Tech: A telescope that generates more data than the whole internet
4. The World Bank: Mobile Phone Access Reaches Three Quarters of Planet’s Population
5. The Economist: Data, data everywhere
6. Forrester report: Welcome to the Era of Digital Intelligence
7. Fast Company: Time To Build Your Big-Data Muscles
8. Edward Tufte: The Visual Display of Quantitative Information
9. Forrester report: Advanced Data Visualization (ADV) Platforms, Q3 2012
10. Content Marketing Institute: What is Content Marketing?
11. Gartner report: Emerging Technology Analysis: Visualization-Based Data Discovery Tools
12. SGMarketwatch: Venture Capital Sees Big Returns in Big Data
13. Dow Jones VentureSource
14. Fast Company: Infographic of the Day: What the Census Said About Us…in 1870
15. Toronto Urban Development Services: Population Growth and Aging
16. Statistics Canada
17. Visual.ly: John Snow Cholera Map
18. ESRI Canada: Case Study: Sault Ste. Marie Innovation Centre
19. Canadian Institute for Health Information: CHRP Key Findings
20. Edward Tufte: Visual Explanations: Images and Quantities, Evidence and Narrative, p.44
21. GE: Renewable Energy Sources
Appendix A: The Data Visualizers
Data visualizations are being used today to unravel the meaning behind big data sets that would otherwise be
too difficult to understand. Highlighted in this piece are eight Ontario-based startups whose innovative applications are shaping the future of data visualization.
Bio.Diaspora
Bio.Diaspora brings together disparate information about global outbreaks, climatic conditions and travel
patterns, and synthesizes them to facilitate risk assessments of infectious disease threats around the world.
MaRS Market Intelligence spoke with Kamran Khan, Founder of Bio.Diaspora.
How did you come up with the idea for Bio.Diaspora?
I am an infectious disease physician, and have my own clinical practice based at St. Mike’s hospital in Toronto.
Back in 2003 when we had the SARS outbreak, I really got a chance to see how a disease can impact a city
— not only in terms of health, but also the psychological and economic damages that come with it. SARS alone
took $2 billion out of our local economy here in Toronto. It really got me thinking about the interconnectedness
of the global community, and I realized that I was going to be practising medicine in a world where I would
require a full understanding of infectious disease activity across the globe. The question, however, was how
could one individual possibly know what is happening in cities all around the world and how we are connected
to each of those cities?
This is when I began to focus my research on the global airline network, which transports over 2.5 billion
travellers every year. Following airline activity was a way to grasp how the world is interconnected and how
cities and countries share the risks of infectious diseases.
Where did the name Bio.Diaspora come from?
I realize it’s quite a mouthful, but Bio.Diaspora is talking about the scattering of living systems. Its literal
meaning is the scattering of life. It represents how living systems interact in a world where there is so much
movement happening.
Where are you able to source data about not only global outbreaks, but also travel patterns?
We get information from official government reporting as well as from online chatter, which can provide early
clues about infectious disease outbreaks. We’re pulling information from our colleagues at the Harvard Medical
School who run HealthMap, from NASA satellite imagery and from a variety of other sources pertaining to
human, animal and insect populations. With respect to the airline industry, we work with different agencies to
analyze over 2.5 billion travel itineraries every year.
How do you present this information within Bio.Diaspora?
In terms of techniques, we use a combination of maps, charts, tables and word clouds to visualize different
types of information. For example, we are using a word cloud to visualize the birthplace of residents across the
United States. We have about 200 countries from which people originate, and portraying this information in a
chart or a bar graph is not particularly efficient.
One thing that is not often considered is how humans will interact with information. When designing
Bio.Diaspora, visualizing the data was very important to me, and, more importantly, visualizing it as accurately as
possible was critical. We want to minimize the potential for misinterpretation.
Who are the customers Bio.Diaspora is targeting?
Our customers are currently governments and public health agencies, which have a responsibility to protect
their citizens against international infectious disease threats.
Going forward, we will include national departments of defense that are concerned about biological threats, as
well as companies that are negatively impacted by infectious disease outbreaks such as insurance agencies.
Another target is pharmaceutical companies that manufacture drugs or vaccines for certain diseases.
Do you see this information ever becoming available to the public?
I don’t see this happening anytime soon, because the data is potentially sensitive in its raw format. However,
it is possible that sections could be made available to travellers, because getting sick while travelling can be
particularly unpleasant and people would value this information. There may be creative ways to utilize some of
our information and distill it right down to an individual traveller’s needs.
Looking into the future, what is your vision for the ideal state in which diseases are tracked?
My hope is that we get away from reacting and move more into anticipating. Today, we’re largely firefighters
in that we basically wait for fires (that is, outbreaks) to emerge and land on our doorstep, and then we react to
them. What we really need is an early warning system.
An early warning system could provide any jurisdiction with the ability to look out to the rest of the world, to
have situational awareness of what’s occurring in terms of outbreaks, and to understand how people are moving into that particular geographic region at any given time. As a global community, we need to start thinking
more proactively and prioritizing prevention, rather than working as a collection of individual countries solely
focused on our immediate self-interests. This is a reality of living in a highly interconnected world.
In hindsight, do you think SARS could have been anticipated and prevented in Toronto?
I think there was definitely enough information to indicate SARS was going to land in Toronto, as there was
something unusual happening in Guangdong province in China, which is right next door to Hong Kong. Many
of the tools that we have today didn’t exist back then, but they would have certainly given us good insights.
Looking back, we can see just how predictable the movement of SARS was. It’s amazing how much the spread
of the disease tracked the corridors of people’s movements worldwide.
What are some of your favourite visualizations?
One image that really speaks to me is the image of flight lines in the world. When looking at it, you can see the
fabric of how the world is connected today. You can not only see the physical geography of places, but also a
depiction of social contexts and relationships. It’s not necessarily an image that would be used for decision-making, but it’s a beautiful rendering of something that’s complex and global.
BuzzData
BuzzData gives people the analysis and visualization tools they need to find the story in a data set, and to
communicate it visually through the creation of smart executive summaries. People can set up their own
BuzzData Hive, where teams and communities can store and share their files, visualizations and analysis.
MaRS Market Intelligence spoke with Nick Edouard of BuzzData.
What is the underlying problem you are trying to solve with BuzzData?
There are many problems with the way in which data is shared today, particularly in large organizations. Too
many people look to share large files such as Excel spreadsheets by email, with massive cover notes. What
the intended audience usually really needs is just the key facts and figures: the executive summary, if you will,
from that data. People often do not have the time or the skill sets to understand data that is not communicated visually and effectively.
And while there are good file-sharing tools such as Dropbox, they are not an optimum way to communicate
information, particularly when that data needs engagement and discussion in order for meaningful insights to
be extracted.
The visualizations are key to helping users explore and understand the data that’s been uploaded. For example,
say you have uploaded a North American sales forecast. BuzzData will offer a suite of tools that asks, “What do
you want to do with this data? Do you want to try a visualization? Do you want to complete an infographic? Do
you want to find some structure in this unstructured document?”
The tools will then return the output of that manipulation as a new artifact into the data room. These artifacts
can then be structured into an executive summary, highlighting the key facts and figures that need to be
communicated.
Have you built these visualization tools internally?
No. We are leveraging best-in-class tools from third parties, one of which is the infographic application infogr.am.
There are a whole host of visualization tools, applications and products out there that do one thing really well,
whether that be mapping, graphing, motion charts, etc. But it’s hard for a user to know what exists. Our goal is
to make it easy for a BuzzData user to choose the tool they want and to produce the type of analysis and
artifact that they are looking for.
Who are some of your customers today?
We’re doing a lot of work with some really exciting companies and organizations that regularly produce data
and know they need to do better in terms of how they share and use it, both internally and externally — market
research and management consultant companies, for example. Often, they are looking at new ways of delivering information to their customers, say by taking their current executive summaries and turning them into
something that is much more visual, engaging and easier to grasp.
We also ran the Best City in the World Contest with The Economist Intelligence Unit earlier this year to
crowdsource a new livability index. The Economist Intelligence Unit (EIU) has been publishing the results of
their livability analysis for years. One angle they were interested in was readers’ thoughts on whether they
were approaching the analysis correctly and what additional factors could be included in the index. So the
EIU published the underlying data, and the community engaged with it and produced some really interesting
results.
One of the things the winner did was assess the relative proportion of green space to urban sprawl within a
city using OpenStreetMap and Google Earth. Another individual produced an app that calculates the best city
for an individual based on a user’s own preferences and rankings.
How comfortable are your customers with sharing their data publicly?
BuzzData Hives can be either private (locked down and by invitation only) or public (discoverable by Google).
At the moment, the private Hives outnumber the public ones at a rate of four to one, so we’re definitely not
just about public or open-data sharing. That said, there are some very interesting things happening on the
public side. Some organizations are looking to better inform their communities about specific topics so that
they in turn become better advocates, and others are seeking to get their community involved in the development of products and services.
A professor at the University of Toronto’s faculty of math has recently set up an NSERC public Hive. This Hive
gives academics, government officials and anyone interested in NSERC* a way to engage with the data related
to its funding so they can better understand how it is being applied, whether it’s working and so on.
Our customers are sensitive to issues around data storage — specifically, security and jurisdictional considerations. While BuzzData itself is secure and not cloud-based, our customers are increasingly questioning where
their data physically resides and who could potentially access it. This is a challenge for us and the SaaS market
in general, as we need to build solutions that meet our customers’ specific requirements. Fortunately we
anticipated this, and we believe we have built the product to be able to accommodate these requirements.
What are some of your favourite visualizations?
I’m a very big fan of Santiago Ortiz’s Moebio project and, specifically, his visualization that broke down The
Iliad by the number of times each character’s name appears in each book of the poem. Having read classics at
university, I thought this was a bit of a cheat sheet that I could really have done with ten years ago! It was an
interesting way of looking at something that you wouldn’t think was necessarily data. It provides structure to
what is otherwise unstructured information.
*Natural Sciences and Engineering Research Council of Canada
DataAppeal
DataAppeal is a web-based tool that automatically renders large amounts of data into three-dimensional
animated maps. It offers an alternative to the often complex mapping tools available today.
MaRS Market Intelligence spoke with Nadia Amoroso and Haim Sechter of DataAppeal.
How did the idea for DataAppeal come about?
Nadia: My background is actually in landscape architecture and creative mapping and I recently wrote a book
called The Exposed City: Mapping the Urban Invisibles. While writing this book, I was looking at various data
points within a city — elements such as demographics, crime rates and surveillance cameras. This is information that is not normally visible. I was interested in creating some type of landscape or new topography based
on this data, hoping to reveal hidden patterns within a geographic space. So I began manually creating data
maps. When presenting these maps at various conferences, I was amazed at the interest they created. This
interest demonstrated to me the importance of creating a tool that allowed others to use visualization techniques to help them analyze data.
What about existing geographic information system (GIS) tools, which can also develop data maps?
Haim: I have been in the business intelligence space for 12 years now, and have seen all the issues GIS presents.
Namely, these systems are very expensive and difficult to install. So what happens is that their use is limited to only a
few individuals within an organization. When speaking with Nadia, it was interesting to learn how engaged
people were with her data maps. Our goal with DataAppeal is to overcome the challenges presented by GIS
today and make mapping tools something that people within organizations actually want to use so that they
can share their data.
What do you think is special about the insights data maps reveal, versus other forms of visualization?
Nadia: Mapping is an ideal way to visualize geographic data. One of the key figures I was researching for my
book was an architect named Hugh Ferriss. Back in 1916, New York came out with a zoning ordinance, and a
lot of people in the area — citizens, architects and even city officials — had a hard time understanding what the
numbers and codes of the planning ordinance meant. So Ferriss manually sketched and rendered 3D maps of
the form and shape of buildings that could be built, based on the zoning bylaws. He took textual information
and turned it into works of art, which got a lot of attention and even graced the covers of The New York
Times Magazine. The maps provided instant insight into the data.
Why did you choose to use 3D shapes to represent the data in DataAppeal?
Nadia: The use of 3D is fairly new in the data visualization space and comes from my background in urban
design and landscape architecture. Because our application is built on the Google Earth platform, you can
actually walk through the 3D data itself, as if in street view, and view it from all dimensions. It makes the
experience much more immersive.
From a more practical standpoint, 3D gives a dimension of height to data, which is an extra level of analysis
that you wouldn’t normally get from a 2D data map. Often data maps group information in colours, but it is not
easy to know the amount of variation between two data points with different shades. It is easier to see variation when one data point is, say, twice the height of another point.
What are some of the challenges people run into when using data maps?
Haim: With geo-data, the end user needs to have some overall understanding of what it is that they are looking
at, otherwise the data can really be misinterpreted. If I’m showing two different values that are exactly the
same — for example, the number of shootings in Orangeville versus Toronto — it will look much more intense
in Orangeville because of the size of the land mass. We have built some training tools right into DataAppeal so
that these kinds of errors are not made.
How do you see the DataAppeal product evolving?
Haim: One thing we are focused on creating is a data gallery to enable people to profile their visualizations
on our website. Making it easy to share data will play a big role in bringing together people and organizations
from around the globe. Our emphasis will not be on the sharing of numbers but, rather, on the sharing of art.
That is what we feel is the key to driving transparency.
Nadia: James Corner is a landscape architect who teaches at the University of Pennsylvania. He has created
some very poetic mappings by taking aerial photographs and superimposing collages to show elements that
would not otherwise be seen in the image, such as an underground ravine or what is hidden within the soil. His
work inspired me to create DataAppeal.
Another person I admire is Hans Rosling, who has created some amazing animated visualizations. These are
spectacular to watch in some of his TED talks.
Infonaut
Infonaut’s product, HospitalWatchLive, tracks the interaction of patients, staff and assets in hospital settings,
providing evidence to better understand and control the spread of infections.
MaRS Market Intelligence spoke with Niall Wallace, Founder of Infonaut.
How did you come up with the idea for this product?
Infonaut was founded in 2006 and was focused at that time on healthcare and data visualization. We got
involved in the SARS response by doing things like mapping quarantine cases. We got really good at tracking
diseases through this type of work. Fast forward to 2009 and Infonaut was asked to help a hospital that was
experiencing a superbug outbreak. They wanted us to help them get a handle on what was going on inside
their hospital building. Up until that point, we had only been working on the movement of disease in the
outside world.
So how did you tackle the tracking of disease inside a building?
We focused on the movements and locations of patients, staff and assets. For example, people not washing
their hands, people moving around, assets being moved around and so forth. Diseases are essentially spread
through these types of human behaviours. When two things come together, that’s when you have a chance for
a disease to make a leap.
Our technology is able to monitor where everything is in a hospital, down to about eight inches. For example,
we put our tracking technology inside of gel dispensers. When a doctor comes in contact with a gel dispenser,
we get a positive signal that gel has been dispensed.
Why do you need to track hand washing? Is it not mandatory within hospitals?
Everybody reports 90% hand washing compliance within a hospital, but our best guess is that it only happens
about 40% of the time. If you consider a shift change at 3:00 a.m. on a long weekend, hand washing rates can
drop as low as 10%.
It comes down to the fact that while everybody knows what they should be doing to prevent infection, they do
not always follow through on it.
Is this seen as being too Big Brother?
This is something we considered early on: that we were delivering a product that could be seen as Big
Brother, especially since we have expertise on our team in surveillance systems and how they overwhelmingly
fail. If people feel like they’re under surveillance, they will find ways to defeat the system.
With HospitalWatchLive, we focus on preventing infections and protecting hospital staff. We are not interested
in analyzing any other types of behaviour with the data we collect. We work on communicating this benefit
to the staff and helping them understand that their safety is our first priority. This is what really helps us in
obtaining their support and engagement.
Any challenges with collecting this data?
The biggest challenge around data collection is the privacy requirements associated with personal health
information. Part of me feels these privacy policies have created negative impediments to the building, design
and delivery of value-added solutions. The other part of me understands why they are necessary.
Overall, I feel the pendulum has swung too far. Patient data has gone from being on a clipboard at the end
of a bed, which anybody walking through the room could access, to being part of an enormous system with
complex algorithms to protect the information and access to it.
However, rather than trying to effect change in the area of privacy, we treat it as a necessary requirement and
simply work around it.
The data we collect is visually overlaid on a map of the hospital. This provides evidence of how infectious
disease is actually being spread within the building. The visualizations tend to be a bit of eye candy to engage
audiences and provide them with an understanding of what the data shows.
That being said, I think visualizations by themselves have little value if you do not act on what the data tells you.
With HospitalWatchLive, what becomes more important than the data analysis is the ability to drive behavioural
change among staff to limit the spread of infectious disease. The data alone will not be able to do that.
Where do you see Infonaut heading as a company?
Our goal is to leave behind the pure health IT play and to become more of a knowledge organization by
assuming some responsibility for change management. We may also reach the point where we give away our
software for free so that we can charge for the knowledge services, which is essentially the “so what” part that
follows data visualization. This is where hospitals are going to get the most value out of what we do.
Where is HospitalWatchLive being used today?
It is being used at Toronto General Hospital, on two floors in the multi-organ transplant unit. Patients in this
unit are at the highest risk for infection because they are on immunosuppressive drugs and are an older
population. Even though staff are a lot more vigilant in keeping these patients protected, without our solution
these patients would still have a higher-than-normal incidence of infections.
What are some of your favourite visualizations?
I really like the visualization of Napoleon’s march across Russia. It does a great job of conveying complex
information in a way that can be easily consumed.
I would also have to give credit to my iPhone, which is designed in such a thoughtful and elegant way when it
comes to retrieving information. Apple, in general, has done a great job in trying not to overwhelm users with
too much information when using their products.
Polychart
Polychart is a web-based application for visually analyzing data and creating charts. Through drag-and-drop
functionality, it enables managers, marketers, analysts and other users to understand data visually without
having to code or perform statistical analysis.
MaRS Market Intelligence spoke with Lisa Zhang, Co-founder of Polychart.
What led to the creation of this technology?
I did a couple of internships on Facebook’s data science team, so I’ve seen some of the trends and opportunities
available in the data-analysis market. I’ve also witnessed the rapid growth of the data-analysis software, Tableau,
which is a great tool. What we felt was missing from it was the ability to bring that type of analysis to the web.
The advantage of being web-based is that we don’t have to make assumptions about which operating systems
people are working under, or how willing people are to download software and plugins. It’s just very accessible.
Why Polychart rather than a more traditional tool like MS Excel?
The best thing about Polychart is the speed at which you can create a chart. I think iterability is extremely
important when you’re analyzing data, since you tend to think of ideas as you’re working. If there’s a lot of
friction between when you thought of an idea and when it shows up on the screen, then that idea just gets lost.
In data analysis, this can mean the difference between having a key business insight and not.
Based on your experience at Facebook, how well do companies exploit their data? Particularly, how well
do web companies leverage their large amounts of user data?
I think there are a lot of ways in which companies could be using their data but are not due to a lack of talent
in the data space — this is particularly true when you’re dealing with big data. There is a Fast Company article
I came across which talks about there being 340,000 big data positions in 2012, of which more than half will
go unfilled. I think a lot of this is a talent issue and if we can increase the accessibility of data analysis, then
companies can go a much longer way.
Why is there such a lack of talent?
Well, in order to be a good data scientist, you need to understand statistics and you also need to have programming skills in order to manipulate data. In order to visually present data in an impactful way, you need to
understand human perception and how to communicate well. Those are a lot of different skill sets at play that
are difficult to find in one person.
Visualizations can often lead to different interpretations, simply by the way in which the data is displayed. Does Polychart address this challenge?
This is one thing we take very seriously. There is ample research into the field of perception that tells us what
our visual system pays attention to. For example, people are very good at comparing areas, and so it’s helpful
to start the y-axis of a bar chart at zero. It’s also why 3D effects on bar charts and pie charts can distort the
data being displayed. 3D effects do a great job at grabbing someone’s attention, but when doing data analysis,
accuracy is much more important.
The fact that people are good at comparing areas is also why when representing values using sizes of objects
(say, a circle), the area should grow proportional to the value represented (as opposed to the radius). Say you
are representing the numbers 1, 2 and 3, and you use circles that have a radius of 1, 2 and 3, then the third
circle will actually look nine times bigger than the first because people perceive areas more readily than the
diameters.
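To make this point concrete, here is a minimal sketch in Python (illustrative only; the report itself contains no code) showing why a circle’s radius should grow with the square root of the value it represents. Scaling the radius directly inflates the differences, because area grows with the square of the radius.

import math

def radius_for_value(value, scale=1.0):
    # Area = pi * r^2, so for area to be proportional to the value,
    # the radius must grow with the square root of the value.
    return scale * math.sqrt(value)

values = [1, 2, 3]

# Naive encoding: radius proportional to the value.
# Areas come out in the ratio 1 : 4 : 9, so a 3x difference reads as 9x.
naive_areas = [math.pi * v ** 2 for v in values]

# Area-true encoding: radius proportional to sqrt(value).
# Areas come out in the ratio 1 : 2 : 3, matching the underlying numbers.
corrected_areas = [math.pi * radius_for_value(v) ** 2 for v in values]

print([round(a, 2) for a in naive_areas])      # [3.14, 12.57, 28.27]
print([round(a, 2) for a in corrected_areas])  # [3.14, 6.28, 9.42]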
Colour is something else that is tricky to use. While colours are great for representing categorical values,
they’re not very good for representing quantities. We’re very bad at seeing if a shade is one-and-a-half times,
two times or three times darker than another.
In terms of choosing the type of charts to use, there is an interesting flowchart that suggests which
visualization to use based on the data that you have and the purpose of the visualization.
Any examples of poorly made visualizations?
The chart titled “Percentage of Comments by Identity” is an example of a visualization that ignores best
practices. The 3D effect and the different heights shown give a disproportionate area to “Pseudonyms,” and
make the area representing “Real Identity” a lot more than 10 times smaller than “Anonymous.”
Similarly, this graphic by Gizmodo about the change in iPad battery size has tied the increase of battery size to
the height of the image rather than to the area, misrepresenting the increase.
Fox News is a large source of misleading visualizations! This chart on Bush tax cuts does not start the y-axis of
the chart at zero, which magnifies the change in tax rates.
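As a rough illustration of why the zero baseline matters (a small Python sketch with made-up numbers, not the actual figures from the chart in question), the apparent ratio between two bars depends entirely on where the axis starts:

# Hypothetical values only: a rate moving from 35.0 to 39.6.
old_rate, new_rate = 35.0, 39.6

# With a zero baseline, bar heights are proportional to the values themselves.
honest_ratio = new_rate / old_rate  # about 1.13, i.e. roughly 13% taller

# With the axis starting at 34, bar heights become (value - 34),
# so the same change looks several times larger than it is.
baseline = 34.0
distorted_ratio = (new_rate - baseline) / (old_rate - baseline)  # 5.6

print(f"zero baseline: second bar is {honest_ratio:.2f}x the height of the first")
print(f"baseline at {baseline}: second bar looks {distorted_ratio:.1f}x the height of the first")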
Another chart created by Fox News about unemployment rates is borderline dishonest. The last data point of
8.6% is shown as being a non-change on the graph.
Putting aside all these bad visualizations, what are some of your favourites?
Napoleon’s March is a classic visualization and a great example of an effective way to present statistics.
More recently, the interactive database We Feel Fine is one of the biggest data visualization projects in the past
decade.
Quinzee
Quinzee focuses on helping users be smart about energy consumption. To do so, Quinzee presents data from
smart meters in a way that educates, motivates and enables residents to make more intelligent decisions
about their energy use.
MaRS Market Intelligence spoke with Faizal Karmali, Co-founder of Quinzee.
Why did you choose to launch Quinzee in Ontario?
The regulatory and political environment in Ontario is fostering energy efficiency, particularly through the
Green Energy Act. And Ontario is ahead of the curve globally with the adoption of smart meters. There are
smart meters on just about every single house and small business in the province. We saw this as an opportunity given there is so much energy data being collected.
What is being done with this energy data today?
A recent study by Accenture talks about how the average North American spends six to nine minutes a year
looking at their energy consumption. We’re talking about an average $2,000 spent per person and it’s being
looked at for six minutes. Smart meters are collecting data for utilities for the purposes of billing and energy
management, but the data is not yet in the hands of consumers for their own energy management. The
challenge is significant. Few consumers yet understand what a kilowatt-hour is, although the information is
available. Moreover, the market has a limited attention span and limited interest in energy. Quinzee is creating
more value for utilities and their customers from the data.
Why this lack of interest from people in their energy use?
It’s because no one really connects their day-to-day behaviour with energy use. It’s just so far removed from
an average person’s life. Sure, you’ve got pockets of people who say they are very energy-conscious, but, in
general, the average person does not even think about it or translate their words into action. It really boils
down to our culture of excess and the fact that the effects of our overuse of energy are not visible to us.
How can data visualization help?
The current way in which data is presented requires interpretation and, considering energy use is an area of
limited interest, the likelihood of that interpretation taking place is low. For Quinzee, the broader idea behind
leveraging data visualization is to ensure people can rapidly interpret our data visuals. For now, we’re focused
on providing quick nuggets of information that make it simple and easy for a user to act on.
What nuggets of information resonate the most with users?
We’re finding people respond the most when their energy consumption data is put into context. For example,
we provide a household with averages, so that they can place themselves in context of what’s normal for them,
their neighbourhood, their city, country, etc. The natural motivation for humans to be “normal” is what will
drive the average energy use lower and lower.
This phenomenon is similar to what we saw with the blue box recycling program. The driver for adoption was
having households put blue boxes outside: then other neighbouring households felt pressure to use one as
well. There is actually a street in the UK that is adopting a similar methodology for energy consumption. All
of the residents on the street have agreed to write their energy meter readings on the sidewalk. As other
residents walk home, they know exactly how much better or worse their consumption is than their neighbours’.
This has led to something like a 20% reduction in energy usage on that one street alone.
Will this type of social pressure be enough?
It’s only one method we intend to use. Another idea is to help people understand the aggregate of the impact
their actions have. Today, we consume energy like it’s an endless resource, but that’s just not the reality of
the planet. We are so far removed from the aggregate that it’s not very relevant to us on a micro level. I think
one of the major goals of data visualization is to help us place our day-to-day personal behaviour and choices
into greater data contexts, be it the context of a neighbourhood, a city, a province, a country or the world.
Connecting individual choices and behaviour to something bigger will help people feel empowered. Eventually
when someone switches off a light, they will feel like they’re actually contributing to something positive. They
will understand the connection.
Who owns energy data today: utility companies or households?
The way it’s set up now is that utility companies own the data, but households have full access to it. The utility
companies are working toward enabling this access, but not all of them are there yet. In the US, President Obama
has undertaken what is called the Green Button initiative, which mandates all US utilities to provide energy data
to the customer and enables third-party providers like Quinzee to analyze and present that data to end users.
In Ontario, MaRS is actually helping drive a similar Green Button initiative for the province. While Ontario is
ahead of the game in having implemented smart meter technology, it has been leapfrogged by the US in terms
of driving data transparency.
Is this a policy issue?
Not really. Ontario’s Minister of Energy has been talking about getting information back into the hands of
Ontarians. It’s just been a slower process.
What is your long-term vision for the company?
Resource management is important in many sectors, and our culture of excess and indifference stands in
the way of that. Quinzee aims to improve the quality of life of communities, both rural and urban, around the
world. By starting in a highly educated, relatively wealthy country like Canada, we hope to prove a model that
can be translated to sectors such as food, healthcare and waste. For us, Quinzee is the first of many applications that will use data and information to influence behavioural change.
Let’s use healthcare as an example. What if an individual saw the bill every time they went to see the doctor? This actually happened to one of my friends, who accidentally received a bill for his MRI service. He was
shocked to see the amount. It has encouraged him to think twice the next time he feels the need to visit the doctor. It’s not about changing the healthcare system; it’s about using data to drive transparency so that people
can feel engaged and make informed decisions.
What are some of your favourite visualizations?
I really connect with one visualization NASA has of the world at night where the lights are on everywhere. With
just one glance you can see how well-lit a certain city is, and how much power countries are using.
GE has also done some spectacular work with their visualizations. We look to what GE has done with
visualizations as motivation for how we would like to share some of our information.
Venngage
Venngage is building a solution to automatically transform data into visually appealing infographic reports.
These reports can be used for a variety of purposes, from content marketing to data analysis and reporting.
MaRS Market Intelligence spoke with Eugene Woo, Co-founder of Venngage.
Where did the idea for Venngage stem from?
Our company started out as Vizualize.me, which was a simple tool for visualizing your resume. Basically you
signed in using LinkedIn, and Vizualize.me converted your LinkedIn profile into an infographic. This tool got
a lot of traction and press coverage from outlets such as TechCrunch and Forbes. Even today we get at least
1,000 sign-ups a day, and have over 200,000 users in total.
The problem with Vizualize.me is that it offers limited engagement. Users only go to the site if they’re hiring or
looking for a job. This represents something like 10% of the population, and we find a large percentage of our
users don’t come back to the site because they have no reason to.
Nevertheless, Vizualize.me helped us realize the power of infographics, which are a unique form of data
visualization. It also created a lot of inbound interest from clients who wanted infographics for their custom
data. This is how we came up with the idea to automate the creation of infographics.
Why do you think the use of infographics has become so popular?
I think that, in general, and I know this is a cliché, a picture is worth a thousand words. Would you rather read a
one-page article or just look at an infographic report? When done right, an infographic can help you to synthesize information very quickly and easily — sometimes in as little as 30 seconds. In today’s world, where we are
bombarded by millions and millions of messages, we need something like this. If, after looking at an infographic
report you want to dig deeper, well, that’s when you can read the actual analysis.
When it comes to content marketing, there are a lot of mediocre data visualizations out there. But the same
can be said about images and blog posts. If you take all the blog posts ever written, you will probably have
99% that are not very good and 1% that are great.
I think the same holds true with infographics. The difference being that infographics tend to get shared more
often and tend to receive more press coverage, so people just see them more. For example, a bad infographic
will surface a lot more than a bad blog post.
Which visuals tend to resonate the most with users?
I think one of the simplest things is knowing how to make text stand out. Take a number, for instance. Most
people think they have to visualize this one number, whereas sometimes it’s just easier to highlight that
number on its own, particularly if you’re not comparing it with anything else.
Venngage tends to stay away from very complex visualizations. For example, something like a network
graph can look very nice when seen from afar, but nine out of ten people won’t understand it. Our clients
sometimes ask for things like a network graph and we have to convince them to use something simpler. For
us, making something simple that is still visually appealing is a much bigger challenge than making a complex
visualization.
Apart from being a content marketing tool, how else are infographics being used?
Our hope is to get companies using them internally. Today, the average office worker still uses Excel or
PowerPoint to do their data analysis. That really hasn’t changed in the last fifteen years. Moreover, data is
locked up in people’s Excel spreadsheets, which is an inefficient and old-fashioned way of working. We want
to provide a tool that allows the average worker to easily convert their data into insights and to be able to
share these insights with other people in the organization. This will help free up data and drive a lot more
transparency.
As technology advances, how do you see the exploration of data evolving?
When companies talk about analyzing data, it’s still very much a domain for data scientists or business intelligence (BI) folks. It’s still a very high-tech, difficult process that involves lots of expensive tools and lots of
specialized people. I mean, a typical enterprise business analytics tool can cost hundreds of thousands of
dollars!
I think data analytics will evolve with the consumerization of IT and become more of a consumer-based offering that everyone can use. With Venngage, we’re going to adopt a freemium model like GitHub. You’ll be able
to create free visualizations up until the point that you want to use real company data, and then you’ll need
to convert to a paid account. This will probably be adopted very quickly by, say, marketing departments, who
don’t necessarily need to analyze a whole data warehouse but just a small set of data. I also see more of what I
call the self-service model being used, where the end user can do the work themselves rather than relying on a
team of analysts or BI experts.
What are some of your favourite visualizations?
I love the work of Nicholas Felton.
I also really like Facebook’s timeline and how it visualizes such a large amount of data. The funny thing is that
Vizualize.me had a timeline as part of their site. We thought it was super cool and then, maybe two months
later, Facebook came out with their timeline! We thought, “oh no, everyone will think we copied Facebook!” But
really — we built ours first.