Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality

Jay Zaidi
Bonnie O’Neil
(Fannie Mae)
Data Governance Winter Conference
Ft. Lauderdale, Florida
November 16-18, 2011
Agenda
1. Introduction
2. Data Quality Challenges and Opportunities
3. Holistic Data Quality (HDQ)
4. Enterprise Data Quality Solutions Architecture
5. Enterprise Data Quality Dashboard Example
Meet the Authors – Jay Zaidi
• Enterprise Data Quality Program Lead, Fannie Mae
• 15+ years in Enterprise Data Management and Solution Architecture
• Specialized in Financial Services and Healthcare domains
• Contact: 202-590-3131
Meet the Authors – Bonnie O’Neil
• Technical Data Architect, Fannie Mae
• 20+ years as a Data Architect
• Author: 3 books
– Most recent: Business Metadata
• Author, over 50 articles & white papers
Data Quality Management – Challenges and Opportunities
Challenge → Opportunity
Data Silos → “Holistic Data Quality (HDQ)”
Data Volumes and Velocity → Data Optimization and Scalability
Complex Data Architectures → Simplify Data Architecture
Real-Time Enterprise Requirements → Real-Time Data Quality Monitoring
Lack of Accountability → Strong Data Governance
Reactive Mode → Proactive Data Quality Controls
Lack of Straight-Through Processing → Automated Controls and Monitoring
Structured and Unstructured Data (email, video, logs, system events, etc.) → Leverage “Big Data” Solutions
A high level of maturity in data quality management is required to address operational challenges.
The Data Quality Maturity Journey
STEP ONE – FOUNDATION & FRAMEWORK
• DQ Use Cases
• Solution Architecture
• Industry Tool Selection
• Consistent DQ Definitions
STEP TWO – CONSTRUCTING THE RAILROAD
• Tool Deployment
• Reporting Capabilities
• Training
• Communication
STEP THREE – EXECUTION
• Change Management
• Awareness
• Proactive DQ Controls
• DQ Continuous Improvement
• DQ Services
Robust data quality management is required to support regulatory compliance, risk management, accounting, financial reporting and other business functions.
The Data Architecture Spaghetti
[Diagram: tangled connections among Transactional Stores, Operational Data Stores, Data Marts and Data Warehouses across Departments One, Two and Three]
Diagram by Arnon Rotem-Gal-Oz, April 2007
How do you manage the quality of business critical data in a dynamic and highly
complex environment?
The Information Supply Chain
[Diagram callout: Transparency into quality across the supply chain]
Diagram by George Marinos - The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes, April 2005
Each link of the information supply chain is dependent on the others – strong controls are needed to manage business-critical data.
Guiding Principles
• Identify and address data quality issues at the point of entry into the ecosystem
• Externalize data quality rules from code (rules engine, calculation libraries, derivation logic, etc., with governance and controls)
• Manage enterprise critical data at the enterprise level (enterprise data governance and data quality groups) and line of business data at the local level (local data governance and data quality)
• Measure the quality of data at systems of record and critical stores, compare it against thresholds and tolerances, and remediate proactively
• The Enterprise Data Quality (EDQ) team will monitor and manage data quality
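The principle of externalizing data quality rules from code can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the rule schema (`id`, `element`, `dimension`, `check`, `arg`), the check names and the sample records are all hypothetical. The rules live as data (a JSON string standing in for a governed rules repository) and a small generic evaluator applies them.

```python
import json
import re

# Hypothetical rule definitions stored outside application code (for example in a
# governed rules repository). The schema (id/element/dimension/check/arg) is illustrative.
RULES_JSON = """
[
  {"id": "R1", "element": "loan_amount", "dimension": "Completeness", "check": "not_null"},
  {"id": "R2", "element": "loan_amount", "dimension": "Conformity", "check": "min_value", "arg": 0},
  {"id": "R3", "element": "state_code", "dimension": "Conformity", "check": "regex", "arg": "^[A-Z]{2}$"}
]
"""

# Generic check implementations. New rules are added by editing the rule data,
# not this code; only a new *kind* of check requires a code change.
CHECKS = {
    "not_null": lambda value, arg: value is not None and value != "",
    "min_value": lambda value, arg: isinstance(value, (int, float)) and value >= arg,
    "regex": lambda value, arg: isinstance(value, str) and re.fullmatch(arg, value) is not None,
}


def evaluate(record, rules):
    """Return the ids of the rules that the record fails, in rule order."""
    failures = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        if not check(record.get(rule["element"]), rule.get("arg")):
            failures.append(rule["id"])
    return failures


rules = json.loads(RULES_JSON)
print(evaluate({"loan_amount": 250000, "state_code": "VA"}, rules))    # []
print(evaluate({"loan_amount": -5, "state_code": "Virginia"}, rules))  # ['R2', 'R3']
```

Because the rules are data, they can be versioned, reviewed and approved under governance controls without redeploying the application.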
Data Quality Maturity
Data Quality Use Cases
• Process externally supplied data
• Reconcile data between data stores, or between a data store and files
• Certify the quality of data
• Score the quality of data
• Identify anomalies in data (databases, files, XML, etc.)
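As an illustration of the reconciliation use case, the sketch below compares rows from a delimited file with rows from a data store on a business key and reports breaks. It is a minimal Python example under stated assumptions: the file contents, the `loans` table and the `loan_id`/`balance` columns are hypothetical, and an in-memory SQLite database stands in for the real data store. Values are compared as text; a production reconciliation would normally normalize types and formats first.

```python
import csv
import io
import sqlite3


def reconcile(source_rows, target_rows, key):
    """Compare two row sets on a business key and report missing and mismatched rows."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical extract file from a data provider (made-up sample values).
FILE_CONTENTS = """loan_id,balance
L1,100.00
L2,250.00
L3,75.50
"""

# Hypothetical data store, simulated here with an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted with dict(row)
conn.execute("CREATE TABLE loans (loan_id TEXT PRIMARY KEY, balance TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("L1", "100.00"), ("L2", "245.00"), ("L4", "60.00")],
)

file_rows = list(csv.DictReader(io.StringIO(FILE_CONTENTS)))
store_rows = [dict(row) for row in conn.execute("SELECT loan_id, balance FROM loans")]

print(reconcile(file_rows, store_rows, "loan_id"))
# {'missing_in_target': ['L3'], 'missing_in_source': ['L4'], 'mismatched': ['L2']}
```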
Data Quality Toolkit
• DQ Standards and Policies
• DQ Methodology
• DQ Dimensional Framework
• DQ Development and Support Model (roles, responsibilities and deliverables by team across the SDLC)
• DQ Best Practices
• Data Quality Requirements Template
• Data Quality Metrics Template
• DQ tasks inside the SDLC Methodology
• DQ Solution Architecture
• DQ Training Documentation
• DQ Business Case Deck with elevator speech
• Governance structure – custodians, trustees, stewards, business data lead
• Map of critical data, SORs, custodians, trustees and business data leads (BDLs)
• Project plan activities related to a DQ project
• On-boarding documentation for tools, dashboards, etc.
• DQ Deployment Model (Centralized vs. Federated vs. Hybrid)
• Lessons Learned/Challenges you will hit
• Change Management Plan
• Stakeholder Communication Plan
• DQ Charter, Strategy, Approach, Sponsorship
• DQ Case Studies – business value added
• Synergy between DQ and DG
• Organizational structure
Conceptual Solution Architecture
Deployment Models
• Central vs. Federated
Challenges You Will Face and Your Response
Typical Business Scenario
Diagram components:
• Analyze Data and Conduct Forensics (Data Quality Tool)
• Implement Real-Time Data Quality using DQ Services (Data Quality Tool)
• Internally or Externally Supplied Data
• Enterprise Applications
• Identify Anomalies and Remediate Issues (Data Quality Tool and EDQ Dashboard)
• Reports & Executive Dashboards
• Enterprise Data Stores (Transactional, Operational, Marts and Warehouses)
The Enterprise Data Quality Platform provides the tools, methodologies and best
practices to identify and remediate data quality issues.
Issue Logging and Resolution
Holistic Data Quality
Our focus should be on addressing systemic issues. This requires a
switch from “reactive” to “proactive” approaches to data quality and
quality that is not evaluated or managed in silos, but addressed using a
holistic cross-silo approach. “Holistic Data Quality (HDQ)” is the
term that I have coined to address this need.
– Jay Zaidi
Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid- to large-sized firms. If done right, the return on investment is manifold.
Do Not Boil The Ocean
Narrowing the scope of the effort will ensure success:
• General population of data elements: 10,000 to 20,000*
• Critical data for a line of business (“LOB Critical”): 2,000 to 3,000*
• Critical data for the enterprise (“Enterprise Critical”): 400 to 500*
Initial focus should be on “Enterprise Critical” data.
* Estimates only
Enterprise level governance and quality efforts should focus on Enterprise Critical
data. Lines of business should govern and manage the quality of their business
critical data.
Dimensions of Data Quality
• The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray:
“To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria.”
– David Loshin
• Dimensions are facets or specific measurements of data quality, pertaining to specific data elements
• The authors propose many variations, but the main ones that most agree on are:
– Accuracy
– Conformity
– Completeness
– Consistency/Duplication
– Timeliness (sometimes called Currency)
– Integrity
Data Quality Dimensions facilitate the consistent definition of data quality
requirements and metrics across various organizations.
Data Quality Development and Support Model
Business Intelligence for Enterprise Data Quality
• Business intelligence tool (commercial off-the-shelf, COTS)
• Data quality COTS product
• Data quality data mart (custom)
• Data quality issue management system
• Extract, Transform and Load (ETL) product
• Enterprise Service Bus (SOA and Data Quality Services)
[Diagram: Solution Components – Enterprise Dashboard, Data Quality Tool (Profiling/Rule Execution), Data Quality Rules, Data Quality Results, Data Stores, ETL, Data Quality Mart, Business Intelligence Tool, Files]
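To make the data quality mart concrete, here is a minimal sketch of a results table that rule executions could be loaded into, and the kind of aggregate query a business intelligence tool might run against it. The `dq_results` schema, its column names and the sample rows are hypothetical, and SQLite stands in for whatever database actually hosts the mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table in the data quality mart: one row per rule execution,
# loaded (e.g. by ETL) from the results produced by the data quality tool.
conn.execute("""
    CREATE TABLE dq_results (
        run_date       TEXT,
        lob            TEXT,
        data_element   TEXT,
        dimension      TEXT,
        records_tested INTEGER,
        records_failed INTEGER
    )
""")
conn.executemany(
    "INSERT INTO dq_results VALUES (?, ?, ?, ?, ?, ?)",
    [  # made-up sample results
        ("2011-11-01", "Retail", "borrower_ssn", "Completeness", 1000, 10),
        ("2011-11-01", "Retail", "loan_amount", "Conformity", 1000, 50),
        ("2011-11-01", "Wholesale", "loan_amount", "Conformity", 500, 0),
    ],
)

# A summary a BI tool could read for a "quality by line of business" view:
# percentage of tested records that passed, per line of business.
query = """
    SELECT lob,
           ROUND(100.0 * (SUM(records_tested) - SUM(records_failed)) / SUM(records_tested), 1)
               AS pass_rate_pct
    FROM dq_results
    GROUP BY lob
    ORDER BY lob
"""
for lob, pass_rate_pct in conn.execute(query):
    print(lob, pass_rate_pct)
# Retail 97.0
# Wholesale 100.0
```

Keeping raw counts (tested and failed) in the mart, rather than precomputed percentages, lets the BI layer re-aggregate by line of business, data element, dimension or date without losing accuracy.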
Replace Paper Reports with Business Intelligence
[Diagram: paper reports to be replaced – Operational Incidents, Audit Findings, Data Quality Issues Report, Regulatory Compliance Issues, Weekly Data Management Status Reports]
Replace mounds of paper with a business intelligence solution – gain on-demand access to summary and detailed information on key quality indicators.
ENTERPRISE DATA QUALITY DASHBOARD
(Enterprise View)
[Dashboard mock-up panels: Quality by Line of Business; Data Quality Maturity; Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality; Product Data; Customer Data; Health Indicators; Overall Health; Regional Trend; Quality Rating for Each Data Element. Lines of business shown: Wholesale, Retail, Commercial.]
ENTERPRISE DATA QUALITY DASHBOARD
(Retail Business View)
[Dashboard mock-up panels: Overall Health; Health Indicators; Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality; Borrower Data; Loan Data; Quality Rating for Each LOB Data Element; Data Store Trend; Data Quality Server Utilization.]
Continuously Measure and Improve Quality
Step 1 - Define
Define the scope, goal, budget, duration and the data quality problem to be addressed.
Step 2 - Measure
All relevant data quality statistics and measures important to the enterprise are collected at this stage.
Step 3 - Analyze and Improve
The data collected in the previous phase is analyzed and the root cause(s) identified. Data remediation is then implemented to improve the quality of data.
Step 4 - Control
Monitor the quality after remediation to ensure that the data is defect-free. If further changes are needed, the team makes them and measures the quality again.
The Enterprise Data Quality dashboard provides transparency into data quality
hotspots that must be addressed proactively.
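The Control step, combined with the alerting mechanism mentioned in the summary, might look roughly like the sketch below. It is a hypothetical Python illustration: the `Measurement` record, the 95% threshold and the 2-point drop tolerance are assumptions chosen for the example, not values from the deck. It raises an alert when a score falls below its threshold, or when it drops sharply from the previous run while still above the threshold.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    run_date: str      # ISO date of the measurement run
    data_element: str
    score_pct: float   # percentage of records passing the element's rules


def control(measurements, threshold_pct, max_drop_pct):
    """Return alert messages for scores below the threshold or sharp run-over-run drops.

    Assumes `measurements` are ordered by run_date.
    """
    alerts = []
    previous = {}  # last score seen per data element
    for m in measurements:
        prior = previous.get(m.data_element)
        if m.score_pct < threshold_pct:
            alerts.append(f"{m.run_date} {m.data_element}: {m.score_pct:.1f}% is below {threshold_pct:.1f}%")
        elif prior is not None and prior - m.score_pct > max_drop_pct:
            alerts.append(f"{m.run_date} {m.data_element}: dropped {prior - m.score_pct:.1f} points")
        previous[m.data_element] = m.score_pct
    return alerts


runs = [  # made-up post-remediation measurements
    Measurement("2011-10-01", "loan_amount", 99.5),
    Measurement("2011-10-08", "loan_amount", 97.0),
    Measurement("2011-10-15", "loan_amount", 94.0),
]
for alert in control(runs, threshold_pct=95.0, max_drop_pct=2.0):
    print(alert)
# 2011-10-08 loan_amount: dropped 2.5 points
# 2011-10-15 loan_amount: 94.0% is below 95.0%
```

The drop check is what makes the control proactive: the 2011-10-08 run is still above threshold, but the trend is flagged before the element turns red.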
Lessons Learned
• Changing behavior is hard – so use a carrot-and-stick approach to get people to change
• Recognize team members who display the expected behavior and highlight what they did
• Roll out the data quality platform (tools, methodologies, best practices) in a phased manner
• Educate team members at all levels of the enterprise on the value of strong governance and data quality
• Facilitate adoption of the tools and business intelligence offerings by providing them to all organizations free of charge or at very low cost
• Highlight the fact that data is “owned” by the enterprise, not by a particular individual or line of business
• Hold people accountable, using operational, data quality and compliance metrics to make your case
• Measure the hard savings and business value added by the program, and communicate them up and down the chain on a regular basis (KPI Dashboard)
Summary
• Effective data management brings order out of chaos
• Implementing “Holistic Data Quality” provides transparency into data quality issues across the information supply chain and helps identify systemic issues
• Focus must initially be on “Enterprise Critical” data. Do not try to boil the ocean.
• The solution architecture’s core components are the data quality COTS product, a data quality data mart and a business intelligence tool
• Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents
• Implementing HDQ is a strategic initiative and requires C-level sponsorship and support
Questions!!
Typical Current State Data Flow
[Diagram: External Data Feeds, Transactional and Operational Stores, Data Warehouse and Data Marts, with a potential data quality problem flagged]
The current siloed approach to data management is wasteful and doesn’t provide
transparency into systemic issues.
Future State Data Flow: Continuous Data Quality Monitoring
[Diagram: External Data Feeds, Transactional and Operational Stores, Data Warehouse and Data Marts, with DQ Monitoring across the flow]
Enterprise Data Architecture should enable straight-through processing and offer operational efficiencies.
Key Process Steps
• For each data element that you will monitor, do the following (use a template):
– Identify the trustee and custodian (DG/DQ)
– Identify the system of record (DG/DQ)
– Identify the dimensions of data quality that apply (Custodian/Trustee/DQ)
– Capture the data quality rules per dimension (Custodian/Trustee/DQ)
– Capture the frequency of rule execution (Custodian/Trustee)
– Capture the data quality thresholds and tolerances for Red/Yellow/Green status (Custodian/Trustee)
– Identify the key metrics to capture (Custodian/Trustee)
– Conduct the logical-to-physical data mapping (to the data source) for the data element (DQ/Technology)
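One way to capture the template entries above in structured form is sketched below. The field names, the default Green (99%) and Yellow (95%) cutoffs and the sample element are hypothetical; the deck does not specify threshold values. The `status` method shows how the captured thresholds translate a measured score into Red/Yellow/Green.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElementTemplate:
    """One entry of a hypothetical monitoring template for a single data element."""
    data_element: str
    trustee: str
    custodian: str
    system_of_record: str
    dimensions: List[str] = field(default_factory=list)
    rules: Dict[str, str] = field(default_factory=dict)  # dimension -> rule text
    frequency: str = "daily"
    green_min_pct: float = 99.0   # score at or above this is Green
    yellow_min_pct: float = 95.0  # at or above this (but below Green) is Yellow; lower is Red
    physical_mapping: str = ""    # logical-to-physical mapping, e.g. "SCHEMA.TABLE.COLUMN"

    def __post_init__(self):
        if self.yellow_min_pct > self.green_min_pct:
            raise ValueError("Yellow threshold must not exceed Green threshold")

    def status(self, score_pct: float) -> str:
        """Map a measured score to Red/Yellow/Green using this element's thresholds."""
        if score_pct >= self.green_min_pct:
            return "Green"
        if score_pct >= self.yellow_min_pct:
            return "Yellow"
        return "Red"


dob = ElementTemplate(  # all values below are illustrative
    data_element="Borrower Date of Birth",
    trustee="Retail business lead",
    custodian="Loan origination IT team",
    system_of_record="Loan origination system",
    dimensions=["Completeness", "Conformity"],
    rules={"Completeness": "must be populated", "Conformity": "valid date, not in the future"},
    physical_mapping="LOS.BORROWER.BIRTH_DT",
)
print(dob.status(99.4), dob.status(96.0), dob.status(90.0))  # Green Yellow Red
```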
Dimensions of Data Quality - Explanation
Accuracy: How closely does the data conform to the real world?
Completeness: How much required data is missing?
Conformity: How closely does the data conform to formats and domain values?
Duplication: Does the same data exist in multiple systems? If so, is it represented the same way?
Integrity: Does the data conform appropriately to integrity rules? Are relationships between elements retained?
Currency: How current is the data? When was it last entered or refreshed?
There are a dozen or more Data Quality Dimensions that can be defined, but
organizations should pick the ones that best meet their needs.
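Several of these dimensions can be turned into simple percentage metrics. The sketch below computes completeness, conformity, duplication, integrity and currency for a handful of made-up records; the field names, the 5-digit ZIP format and the 30-day currency tolerance are assumptions. Accuracy is omitted because it requires comparison against an authoritative real-world source, and duplication is measured only as repeated keys within one data set, a narrower check than the cross-system definition above.

```python
import re
from datetime import date

# Made-up sample records and reference data, for illustration only.
loans = [
    {"loan_id": "L1", "zip": "22030", "branch_id": "B1", "updated": date(2011, 11, 10)},
    {"loan_id": "L2", "zip": None, "branch_id": "B2", "updated": date(2011, 11, 12)},
    {"loan_id": "L3", "zip": "2203", "branch_id": "B9", "updated": date(2011, 6, 30)},
    {"loan_id": "L3", "zip": "20001", "branch_id": "B1", "updated": date(2011, 11, 14)},
]
valid_branches = {"B1", "B2"}  # parent records that branch_id must reference
as_of = date(2011, 11, 16)
max_age_days = 30              # assumed currency tolerance


def pct(part, whole):
    """Percentage rounded to one decimal place; an empty population counts as 100%."""
    return round(100.0 * part / whole, 1) if whole else 100.0


populated_zips = [r["zip"] for r in loans if r["zip"] is not None]
metrics = {
    # Completeness: share of records with the required field populated.
    "completeness": pct(len(populated_zips), len(loans)),
    # Conformity: share of populated values matching the expected format (5-digit ZIP).
    "conformity": pct(sum(bool(re.fullmatch(r"\d{5}", z)) for z in populated_zips), len(populated_zips)),
    # Duplication (narrow proxy): share of records whose key repeats an earlier record.
    "duplication": pct(len(loans) - len({r["loan_id"] for r in loans}), len(loans)),
    # Integrity: share of records whose foreign key references an existing parent.
    "integrity": pct(sum(r["branch_id"] in valid_branches for r in loans), len(loans)),
    # Currency: share of records refreshed within the tolerance window.
    "currency": pct(sum((as_of - r["updated"]).days <= max_age_days for r in loans), len(loans)),
}
print(metrics)
# {'completeness': 75.0, 'conformity': 66.7, 'duplication': 25.0, 'integrity': 75.0, 'currency': 75.0}
```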