White Paper: Dark Data
Making your Organisation data-enabled?

This briefing paper looks at the rise of unstructured data, also known as ‘dark data’, which accounts for 90% of data today. It gives the reader an understanding of what dark data is and of the challenges and opportunities it presents, and offers conclusions on how the attributes and use of dark data can help make organisations data-enabled, drive business efficiencies and maximise opportunities.

Introduction

Organisations are swimming in a sea of data of various types: structured and unstructured, current and ancient, sensitive and trivial. In most organisations, vast pools of unstructured and unprotected data simply reside within the organisation, doing little to improve the value of the business. According to the analyst firm IDC, the world’s information is doubling every 2 to 3 years. IDC estimates that the overall volume of digital bits created, replicated, and consumed across the United States alone will hit 6.6 zettabytes by 2020. At this rate of data production, the vast amounts of data to be found in log files, data archives and similar stores have yet to be analysed for business or competitive intelligence, or assessed for their value in aiding business decision making. Data that is kept ‘just in case’ but has so far not found a proper use in an organisation is referred to as ‘dark data’.

Dark Data: A definition

Gartner Research Inc., which originally coined the term, defines dark data as the information assets organisations collect, process, and store during regular business activities, but generally fail to use for other purposes such as analytics, business relationships, and direct monetising. In other words, it is the human-generated data that organisations are paying to store, protect, and manage, but that is not being used efficiently to deliver better business value.

Types of Dark Data

In simple terms, structured data is neat, tidy and tabular in form: it comprises rows and columns in a database, and is defined and accessible. For example, SQL-based relational databases have not only become ubiquitous but are also often regarded as the ‘lingua franca’ of data storage, transaction processing, reporting and analytics. By contrast, data that does not conform easily to SQL-based structures represents the rebel counterpart, namely dark data.

The sources of dark data vary from organisation to organisation. Any of the following could fall into the category of dark data if viewed as redundant and unstructured:

Customer Information
Notes or Presentations
Legal Contracts
Raw Survey Data
Log Files
Financial Statements
Email Correspondences
Previous Employee Data

Dark data can also be characterised as:

Data that is not currently being collected;
Data that is being collected, but that is difficult to access at the right time and place;
Data that is collected and available, but that has not yet been productised or fully applied.

The Challenges

Dark data might be a relatively new term, but it is not a new problem, and for most medium to large organisations the challenges it poses will become more recognisable.

One such challenge is that dark data consumes costly storage space. Medium to large organisations are generally able to provide terabytes of file share storage space for their employees and departments to use. Employees drag and drop all kinds of work-related files onto these shares, as well as personal files such as photos, MP3 music files and personal communications, in addition to PSTs and workstation backup files. The clear majority of these files are unmanaged and are never looked at again by the employee or anyone else. More storage means more overhead costs, which are already a significant concern for most organisations in the era of ‘Big Data’.

Aside from increased storage costs, large amounts of unstructured or unorganised data are a challenge simply because many organisations do not have the will, systems or processes in place to automatically index and categorise their rapidly growing dark data.

Security and risk are another concern, because dark data has the potential to contain sensitive information that could harm an organisation in the event of a security breach. Given the data that most organisations commonly collect, the security concerns and risks may include some if not all of the following:

Legal and Regulatory Risk: If data covered by mandate or regulation, e.g. patient records, appears anywhere in dark data collections, its exposure could involve legal and financial liabilities.
Intelligence Risk: If dark data encompasses proprietary or sensitive information reflective of business operations, practices, competitive advantages, important partnerships, joint ventures, etc., inadvertent disclosure could affect the bottom line or compromise important business activities and relationships.
Reputation Risk: Any kind of data breach reflects badly on the organisations affected. This applies as much to dark data as to other kinds of breach. Of particular concern should be cases where an organisation has customer and/or operations data stored in the cloud, outside its immediate control and maintenance.

Another key challenge presented by dark data is determining its real value, if any at all. Much dark data remains ‘unilluminated’ because organisations simply do not know what it contains. Destroying it may prove too risky (because of compliance issues), but analysing it can be costly, and that expense is hard to justify while the potential value of the dark data is unknown.

The Opportunities

Along with the sensitive data that could be harmful in the case of a breach, dark data also has the potential to be a goldmine of information. In consumer-centric organisations there are numerous data points circulating around the organisation at any given time. For example, a bank that looks only at transactional or CRM database information to target a customer with a marketing promotion may be seeing only part of a 360-degree view of that customer, and thereby understanding only a fraction of the customer’s preferences. Arguably, the highly valuable data is left dark: who customers are interacting with, what they are saying on social media about how they feel about brands, what they are looking for, where they shop and, in this example, how they felt about their customer service experience when they walked into a bank.

Organisations such as Google and Amazon ensure they have a good understanding of customer preferences by gathering consumer intelligence, far more so than long-established consumer-centric organisations like banks, which (ironically) have been around much longer and have amassed a lot more customer data.

In some cases, it may not even be realised at first that useful data is being collected in the form of dark data. One example is a Cisco number-crunching project undertaken at Copenhagen Airport, where a team derived a large amount of useful information from the log files of Wi-Fi routers scattered around the airport. Passengers’ smartphones ‘pinged’ the different routers as they walked through the terminals, even if they were not connected to the network, and the team found they could track passenger movements and behaviour to a reasonable level of precision. The data was subsequently used not only to answer facilities questions about typical passenger flows and choke points, but also to help answer more commercial questions such as “which is the most visited area of duty free?”

As this example shows, to manage and analyse data more efficiently and intelligently than before, organisations need a means to sort, structure and visualise their dark data. This is a key requirement in determining whether the data is even worth further analysis.

Dark Data: Actionable Information

Today’s organisations are increasingly aware of the value of raw data. When data goes dark it means a missed opportunity, and for consumer-centric organisations it may leave a big hole in a business strategy when it comes to potential actionable information.

For organisations not directly engaged in determining consumer preferences, raw data may be viewed from a different perspective: namely, as the various bits of data and information, such as files, documents and instant messages, that lurk behind every organisation’s firewall within file shares, SharePoint sites, and cloud-based collaboration sites like Dropbox.

New Tools and Initiatives

Once data has been collected, the next evolutionary step is to identify and manage the dark data with new tools and initiatives, such as Hadoop, that can result in insight-driven action.

Hadoop: A Dark Data Game Changer

The emergence of Apache Hadoop is a game changer in the field of dark data. It was pioneered as a fundamentally new way of storing and processing data, and it is 100% open source. Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and a cluster can scale out as servers are added. With Hadoop, no data is too big, and most importantly Hadoop’s advantages mean that organisations can now find value in data that was recently considered useless. So, essentially, dark data can be viewed as a subset of big data, and both are about the same thing: data management.

Conclusions

In today’s evolving information-driven society, one of the key attributes of leading organisations is how deeply they understand their market, customers, and competitors. As part of this ‘information age’ revolution, organisations are gathering exponentially growing amounts of data: ‘big data’. However, for numerous reasons some of that data falls by the wayside when it is not put to immediate use. The emerging common term for this data is dark data.

HQS: Dark Data Opportunities?

HQS has created ‘Big Data Analytics: Consulting to Solutions’, an approach that combines big data expertise with business analytics and customer engagement techniques to deliver insight into a business and its customers. The capabilities offered to organisations will be a unified dark data and information management offering (similar to CommVault’s Simpana software) which will:

i) Automate the data lifecycle (i.e. automated policies that classify, organise, retain and delete legacy data at reasonable limits);
ii) Undertake the execution of eDiscovery (discovery of all Electronically Stored Information (ESI));
iii) Undertake ongoing inventory and assessment (i.e. periodic reconnaissance);
iv) Enforce safe disposal (i.e. determining whether only contents or both contents and media must be disposed of); and
v) Provide self-service access (enabling users to search and access the legacy data they require).

Even if an HQS suite of dark data tools determines that dark data has negligible value for business intelligence, something of merit will have been accomplished: determining the dark data’s Market Value of Information (to borrow a Gartner Research Inc. infonomics term).
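As a rough illustration of the automated lifecycle policies described above (classify, retain and delete legacy data), the following Python sketch sorts file records into actions by how long ago they were last accessed. It is not HQS's implementation. The thresholds, the `FileRecord` fields and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical thresholds, chosen only for illustration; a real policy would
# come from an organisation's own retention and compliance rules.
REVIEW_AFTER = timedelta(days=365)
DELETE_AFTER = timedelta(days=365 * 7)


@dataclass
class FileRecord:
    """Minimal, assumed metadata for one file on a shared drive."""
    path: str
    last_accessed: date
    under_legal_hold: bool = False


def classify(record: FileRecord, today: date) -> str:
    """Return 'retain', 'review' or 'delete' for one file record."""
    if record.under_legal_hold:
        # Data held for legal or regulatory reasons is never disposed of.
        return "retain"
    age = today - record.last_accessed
    if age >= DELETE_AFTER:
        return "delete"
    if age >= REVIEW_AFTER:
        return "review"
    return "retain"


def apply_policy(records, today):
    """Group file paths by the action the policy assigns to them."""
    actions = {"retain": [], "review": [], "delete": []}
    for record in records:
        actions[classify(record, today)].append(record.path)
    return actions
```

Calling `apply_policy` on an inventory of file records yields three lists of paths. The legal-hold check comes first so that the compliance concerns raised under The Challenges always override age-based disposal.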
By running an HQS suite of dark data tools, an organisation will not only establish the business case for managing dark data as part of the information governance process, but will also outline the case for freeing up IT resources wasted on maintaining low-value data. Organisations will be free, at last, to hit the delete key.

HighQuest Solutions Ltd
20-22 Wenlock Road
London, N1 7GU
www.highquestsolutions.com
[email protected]
+44(0)207 078 4332