White Paper
Dark Data
Making your Organisation
data-enabled?
This briefing paper looks at the rise of unstructured data, also
known as ‘dark data’, which accounts for 90% of data today. It
gives the reader an understanding of what dark data is and the
challenges and opportunities it presents, and offers
conclusions on how the attributes and utilisation of dark data
can help make organisations data-enabled, driving business
efficiencies and maximising opportunities.
Introduction:
Organisations are swimming in a sea of data of various types:
structured and unstructured, current and ancient, sensitive
and trivial. In most organisations, vast pools of unstructured
and unprotected data simply sit within the organisation,
doing little or nothing to improve the value of the
business.
According to the analyst firm IDC, the world’s information is
doubling every two to three years. IDC estimates that the
overall volume of digital bits created, replicated, and consumed
across the United States alone will hit 6.6 zettabytes by 2020.
At this rate of data production, a wealth of data held in log
files, data archives and the like has yet to be analysed for
business or competitive intelligence, or assessed for its value
in aiding business decision making. Data that is kept ‘just in
case’ but has not so far found a proper use in an organisation
is referred to as ‘dark data’.
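To put the doubling rate quoted above into annual terms, the short sketch below converts a doubling period into the compound yearly growth it implies. The function name is chosen here for illustration; the only inputs are the two- and three-year doubling periods cited from IDC.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Return the compound annual growth rate implied by a doubling period."""
    return 2 ** (1 / doubling_years) - 1


# Doubling every two to three years, as quoted above.
for years in (2, 3):
    rate = annual_growth_rate(years)
    print(f"doubling every {years} years implies about {rate:.0%} growth per year")
```

In other words, doubling every two to three years corresponds to data volumes growing by roughly a quarter to two fifths each year.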
Dark Data: A definition
Gartner Research Inc., which originally coined the term, defines
dark data as the information assets organisations collect,
process, and store during regular business activities, but
generally fail to use for other purposes such as analytics,
business relationships, and direct monetising.
In other words, it’s the human-generated data that
organisations are paying to store, protect, and manage but that
isn’t being used efficiently to deliver better business value.
Types of Dark Data
In simple terms, structured data is neat and tidy: tabular in form,
made up of rows and columns in a database, well defined,
and accessible. SQL-based relational databases, for example, have
not only become ubiquitous but are also often regarded as the
‘lingua franca’ of data storage, transaction processing, reporting
and analytics. Data that doesn’t conform easily to SQL-based
structures represents the rebel counterpart, namely dark data.
The sources of dark data vary from organisation to organisation.
Any of the following could fall into the category of dark data if
it is unstructured and viewed as redundant:
Customer Information
Notes or Presentations
Legal Contracts
Raw Survey Data
Log Files
Financial Statements
Email Correspondences
Previous Employee Data
Dark data can also be characterised as:
Data that is not currently being collected;
Data that is being collected, but that is difficult to access at the
right time and place;
Data that is collected and available, but that has not yet been
productised or fully applied.
The Challenges
Dark data might be a relatively new term, but it’s not a new
problem. For most medium to large organisations the
challenges posed by dark data will become more recognisable.
One such challenge is that dark data consumes costly storage
space. Medium to large organisations are generally able to
provide terabytes of file share storage space for their
employees and departments to utilise. Employees drag and
drop all kinds of work-related files, as well as personal files such
as photos, MP3 music files, personal communications,
etc. In addition, there are PSTs and workstation backup files.
The clear majority of these files are unmanaged and therefore
never looked at again by the employee or anyone else.
More storage means more overhead costs, particularly in the era
of ‘Big Data’, and storage cost is already a significant concern
in most organisations.
Aside from increased storage costs, having large amounts of
unstructured or unorganised data is a challenge simply because
many organisations don’t have the will, systems or processes in
place to automatically index and categorise their rapidly
growing unstructured dark data.
Security and risk are another concern when dealing with the
challenges of utilising dark data, because dark data
has the potential to contain sensitive information whose exposure
in a security breach could be harmful to an organisation.
Given the data that most organisations commonly collect, the
security concerns and risks may include some if not all of the
following:
Legal and Regulatory Risk
If data covered by mandate or regulation, e.g. patient records,
appears anywhere in dark data collections, its exposure could
involve legal and financial liabilities.
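As a rough illustration of how regulated or sensitive content might be spotted inside a dark data collection, the sketch below counts matches for two simple text patterns. The pattern names, the regular expressions and the sample text are assumptions made for illustration; a real screening exercise would need patterns chosen for the regulations that actually apply, and pattern matching alone will miss a great deal.

```python
import re

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Sixteen digits in groups of four, optionally separated by a space or hyphen.
    "card_like_number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def flag_sensitive(text: str) -> dict[str, int]:
    """Return the number of matches per pattern, omitting patterns with none."""
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    return {name: n for name, n in counts.items() if n > 0}


# Made-up sample text standing in for the contents of an unmanaged file.
sample = "Contact jane.doe@example.org, card 4111 1111 1111 1111 on file."
print(flag_sensitive(sample))
```

Files that produce any matches could then be prioritised for review before a decision is made to keep, protect or dispose of them.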
Intelligence Risk
If dark data encompasses proprietary or sensitive information
reflective of business operations, practices, competitive
advantages, important partnerships, joint ventures, etc.,
inadvertent disclosure could affect the bottom line or
compromise important business activities and relationships.
Reputation Risk
Any kind of data breach reflects badly on the organisations
affected. This applies as much to dark data as to other kinds of
breaches. Of particular concern should be cases where an
organisation has customer and/or operations data stored in the
cloud, outside its immediate control and maintenance.
Another key challenge presented by dark data is determining its
real value, if any at all.
Much dark data remains ‘unilluminated’ because
organisations simply do not know what it contains. Destroying
it may prove too risky (because of compliance issues), but
analysing it can be costly, making it hard to justify the expense
if the potential value of the dark data is unknown.
The Opportunities
Alongside the sensitive data that could be harmful
in the case of a breach, there is also the potential for dark data
to be a goldmine of information. For consumer centric
organisations there are numerous data points circulating
around the organisation at any given time.
For example, a bank that looks only at transactional
information or CRM database information to target a customer
with a marketing promotion may be seeing only part of a
360-degree view of that customer, and thereby understanding only
a fraction of the customer’s preferences. Arguably, the
highly valuable data about who customers are interacting with,
what they are saying on social media about how they feel about
brands, what they are looking for, where they shop and, in this
example, how they felt about their customer service experience
when they walked into a bank is left dark.
Organisations such as Google and Amazon ensure they have a
good understanding of customer preferences by
gathering consumer intelligence, far more so than long
established consumer centric organisations like banks which
(ironically) have been around much longer and have amassed a
lot more customer data.
In some cases, it may not even be realised initially that useful
data is being collected in the form of dark data. One example is
a Cisco number-crunching project undertaken at
Copenhagen Airport, where a team derived an amazing amount
of useful information by crunching the data in the log files of
Wi-Fi routers scattered around the airport.
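As a rough illustration of the kind of number crunching involved, the sketch below counts how many distinct devices were detected by the routers in each area and picks out the busiest one. The record format, device identifiers and zone names are assumptions made for illustration, not details of the Cisco project.

```python
from collections import defaultdict

# Hypothetical detections: (device identifier, router zone).
detections = [
    ("device-a", "security"),
    ("device-a", "duty-free"),
    ("device-b", "duty-free"),
    ("device-b", "duty-free"),  # repeat detection of the same device
    ("device-c", "gate-b"),
]


def unique_devices_per_zone(records):
    """Map each zone to the number of distinct devices detected there."""
    seen = defaultdict(set)
    for device, zone in records:
        seen[zone].add(device)
    return {zone: len(devices) for zone, devices in seen.items()}


counts = unique_devices_per_zone(detections)
busiest = max(counts, key=counts.get)
print(counts)
print("busiest zone:", busiest)
```

Counting distinct devices rather than raw detections avoids over-weighting a single device that is detected repeatedly in the same area.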
Passengers’ smartphones “pinged” the different
routers as they walked through the terminals, even if they weren’t
connected to the network, and the team found they could track
passenger movements and behaviour to a reasonable level of
precision.
The data was subsequently used not only to answer facilities
questions around typical passenger flows and choke points, but
also to help answer more commercial questions such as “which
is the most visited area of duty free?”
The example shows that, to manage and analyse data
efficiently and more intelligently than ever before,
organisations need a means to sort,
structure and visualise their dark data, a key requirement in
determining whether the data is even worth further analysis.
Hadoop – A Dark Data Game Changer
The emergence of Apache Hadoop is a game changer in the field
of dark data. It was pioneered as a fundamentally new way of
storing and processing data (and it’s 100% open source).
Instead of relying on expensive, proprietary hardware and
different systems to store and process data, Hadoop enables
distributed parallel processing of huge amounts of data across
inexpensive, industry-standard servers that both store and
process the data, and it scales by adding more servers.
With Hadoop, no data is too big, and most importantly
Hadoop’s breakthrough advantages mean that organisations
can now find value in data that was until recently considered
useless.
So, essentially, dark data can be viewed as a subset of big data,
and is about the same thing – data management.
Conclusions
In today’s evolving information-driven society, one of the key
attributes of leading organisations is how deeply they
understand their market, customers, and competitors. As part
of this ‘information age’ revolution, organisations are gathering
vast and exponentially growing amounts of data: ‘big data’.
However, for numerous reasons some of the data falls by the
wayside when it is not put to immediate use. The emerging
common term for this data is dark data.
Dark Data – Actionable Information
Today’s organisations are increasingly aware of the value of
raw data. When data goes dark it means a missed opportunity,
and it may leave a big hole in a business strategy when it comes
to potential actionable information for those involved with
consumer centric organisations.
For organisations not directly engaged in determining
consumer preferences, raw data may be viewed from a
different perspective: namely, as the various bits of
data and information, such as files, documents, instant
messages, etc., that lurk behind every organisation’s firewall
within file shares, SharePoint sites, and cloud-based
collaboration sites like Dropbox.
New Tools and Initiatives
So, once data has been collected, the next evolutionary
step is to identify and manage the dark data with new tools
and initiatives, such as Hadoop, that can result in
insight-driven action.
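The map-and-reduce style of processing commonly associated with Hadoop can be illustrated in miniature. The sketch below runs in a single Python process and does not use any Hadoop API; the log lines and the way they are split into chunks are assumptions made for illustration. Each chunk stands in for data held on a separate server, the map step is applied to each chunk independently, and the reduce step combines the results.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines, split into chunks as if stored on separate servers.
chunks = [
    ["ERROR disk full", "INFO backup done"],
    ["ERROR disk full", "WARN slow response", "INFO login"],
]


def map_chunk(lines):
    """Map step: emit a (log level, 1) pair for every line in one chunk."""
    return [(line.split()[0], 1) for line in lines]


def reduce_pairs(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Each chunk is mapped independently, so this step could run in parallel.
mapped = [map_chunk(chunk) for chunk in chunks]
totals = reduce_pairs(chain.from_iterable(mapped))
print(totals)
```

Because the map step never needs to see more than one chunk at a time, the same pattern can be spread across many inexpensive servers, which is the property the paper attributes to Hadoop.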
HQS – Dark Data Opportunities?
HQS has created ‘Big Data Analytics: Consulting to
Solutions’, an approach that combines big data expertise
with business analytics and customer engagement
techniques to deliver insight into a business and its
customers. The capabilities offered to organisations will be a
unified dark data and information management offering
(similar to CommVault’s Simpana software) which will:
i) automate the data lifecycle (i.e. automated policies that
classify, organise, retain and delete legacy data at reasonable
limits);
ii) undertake the execution of eDiscovery (discovery of
all Electronically Stored Information (ESI));
iii) undertake ongoing inventory and assessment (i.e. periodic
reconnaissance);
iv) enforce safe disposal (i.e. determining whether only
contents or both contents and media must be disposed of); and
v) devise self-service access (facilitating users’ self-service
search of, and access to, the legacy data required).
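To make the idea of an automated lifecycle policy more concrete, the sketch below decides whether an item of legacy data should be retained, archived or deleted based on when it was last accessed. It is not the HQS or CommVault implementation; the thresholds, the legal-hold flag and the function name are assumptions made for illustration, and real limits would come from an organisation’s own retention and compliance rules.

```python
from datetime import date, timedelta

# Assumed thresholds for illustration only.
ARCHIVE_AFTER = timedelta(days=365)
DELETE_AFTER = timedelta(days=7 * 365)


def lifecycle_action(last_accessed: date, today: date, on_legal_hold: bool = False) -> str:
    """Return 'retain', 'archive' or 'delete' for one item of legacy data."""
    if on_legal_hold:
        return "retain"  # items needed for eDiscovery are never aged out
    age = today - last_accessed
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "retain"


today = date(2024, 1, 1)
for last_used in (date(2023, 6, 1), date(2021, 1, 1), date(2010, 1, 1)):
    print(last_used, lifecycle_action(last_used, today))
```

Keeping the policy as a small, explicit rule makes it straightforward to audit why a given item was kept or disposed of.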
Even if it is determined that the dark data has negligible value
for business intelligence, utilising an HQS suite of dark data
tools will have accomplished something of merit: determining
the dark data’s Market Value of Information
(to borrow an infonomics term from Gartner Research Inc.).
By running an HQS suite of dark data tools an organisation will
not only establish the business case for managing dark data as
part of the information governance process, but will also
outline the case for freeing up IT resources wasted on
maintaining low-value data; organisations will be free, at
last, to hit the delete key.
HighQuest Solutions Ltd
20-22 Wenlock Road
London, N1 7GU
www.highquestsolutions.com
[email protected]
+44(0)207 078 4332
