Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality
Jay Zaidi, Bonnie O’Neil (Fannie Mae)
Data Governance Winter Conference
Ft. Lauderdale, Florida
November 16-18, 2011

Agenda
1 Introduction
2 Data Quality Challenges and Opportunities
3 Holistic Data Quality (HDQ)
4 Enterprise Data Quality Solutions Architecture
5 Enterprise Data Quality Dashboard Example

Meet the Authors – Jay Zaidi
• Enterprise Data Quality Program Lead, Fannie Mae
• 15+ years in Enterprise Data Management and Solution Architecture
• Specialized in Financial Services and Healthcare domains
• Contact: 202-590-3131 [email protected]

Meet the Authors – Bonnie O’Neil
• Technical Data Architect, Fannie Mae
• 20+ years as a Data Architect
• Author of 3 books; most recent: Business Metadata
• Author of over 50 articles and white papers

Data Quality Management – Challenges and Opportunities
Challenge → Opportunity
• Data Silos → “Holistic Data Quality (HDQ)”
• Data Volumes and Velocity → Data Optimization and Scalability
• Complex Data Architectures → Simplify Data Architecture
• Real Time Enterprise Requirements → Real Time Data Quality Monitoring
• Lack of Accountability → Strong Data Governance
• Reactive Mode → Proactive Data Quality Controls
• Lack of Straight Through Processing → Automated Controls and Monitoring
• Structured and Unstructured Data (email, video, logs, system events, etc.) → Leverage “Big Data” Solutions
A high level of maturity in Data Quality Management is required to address these operational challenges.

The Data Quality Maturity Journey
Step One: Foundation & Framework
• DQ Use Cases
• Solution Architecture
• Industry Tool Selection
• Consistent DQ Definitions
Step Two: Constructing the Railroad
• Tool Deployment
• Reporting Capabilities
• Training
• Communication
Step Three: Execution
• Change Management
• Awareness
• Proactive DQ Controls
• DQ Continuous Improvement
• DQ Services
Robust data quality management is required to support Regulatory Compliance, Risk Management, Accounting, Financial Reporting and other business functions.

The Data Architecture Spaghetti
[Diagram: transactional stores, operational data stores, data marts and data warehouses spread across Departments One, Two and Three. Diagram by Arnon Rotem-Gal-Oz, April 2007.]
How do you manage the quality of business critical data in a dynamic and highly complex environment?

The Information Supply Chain
Transparency into quality across the supply chain
[Diagram by George Marinos, from "The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes", April 2005.]
Each link of the information supply chain is dependent on the others, so strong controls are needed to manage business critical data.

Guiding Principles
• Identify and address data quality issues at the point of entry into the ecosystem
• Externalize data quality rules from code (rules engine, calculation libraries, derivation logic, etc., with governance and controls)
• Manage enterprise critical data at the enterprise level (enterprise DG and DQ groups) and line of business data at the local level (local DG and DQ)
• Measure the quality of data at systems of record and critical stores, compare it against thresholds and tolerances, and remediate proactively
• The EDQ team will monitor and manage

Data Quality Maturity

Data Quality Use Cases
• Process externally supplied data
• Reconcile data between data stores, or between a data store and files
• Certify the quality of data
• Score the quality of data
• Identify anomalies in data (databases, files, XML, etc.)

Data Quality Toolkit
• DQ Standards and Policies
• DQ Methodology
• DQ Dimensional Framework
• DQ Development and Support Model (roles, responsibilities and deliverables by team across the SDLC)
• DQ Best Practices
• Data Quality Requirements Template
• Data Quality Metrics Template
• DQ tasks inside the SDLC Methodology
• DQ Solution Architecture
• DQ Training Documentation
• DQ Business Case Deck with elevator speech
• Governance structure – custodians, trustees, stewards, business data lead
• Map of critical data, SORs, custodian, trustee, business data lead (BDL)
• Project plan activities related to a DQ project
• On-boarding documentation for tools, dashboards, etc.
• DQ Deployment Model (Centralized vs. Federated vs. Hybrid)
• Lessons Learned/Challenges you will hit
• Change Management Plan
• Stakeholder Communication Plan
• DQ Charter, Strategy, Approach, Sponsorship
• DQ Case Studies – business value add
• Synergy between DQ and DG
• Organizational structure

Conceptual Solution Architecture

Deployment Models
Central vs. Federated

Challenges You Will Face and Your Response

Typical Business Scenario
• Analyze data and conduct forensics (Data Quality Tool)
• Implement real-time data quality using DQ Services (Data Quality Tool)
• Identify anomalies and remediate issues (Data Quality Tool and EDQ Dashboard)
[Diagram components: Internally or Externally Supplied Data; Enterprise Applications; Enterprise Data Stores (Transactional, Operational, Marts and Warehouses); Reports & Executive Dashboards.]
The Enterprise Data Quality Platform provides the tools, methodologies and best practices to identify and remediate data quality issues.

Issue Logging and Resolution

Holistic Data Quality
Our focus should be on addressing systemic issues. This requires a switch from “reactive” to “proactive” approaches to data quality and quality that is not evaluated or managed in silos, but addressed using a holistic cross-silo approach. “Holistic Data Quality (HDQ)” is the term that I have coined to address this need. – Jay Zaidi
Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid to large-sized firms. If done right, the return on investment is manyfold.

Do Not Boil The Ocean
Narrowing the scope of the effort will ensure success
• 10,000 to 20,000: General population of data elements*
• 2,000 to 3,000: Critical data for a line of business* (“LOB Critical”)
• 400 to 500: Critical data for the enterprise* (“Enterprise Critical”)
Initial focus should be on “Enterprise Critical” data
* Estimates only
Enterprise level governance and quality efforts should focus on Enterprise Critical data.
Lines of business should govern and manage the quality of their business critical data.

Dimensions of Data Quality
The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray:
“To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria.” - David Loshin
Dimensions are facets or specific measurements of data quality that pertain to specific data elements.
The authors propose many variations, but the main ones that most agree on are:
– Accuracy
– Conformity
– Completeness
– Consistency/Duplication
– Timeliness (sometimes called Currency)
– Integrity
Data Quality Dimensions facilitate the consistent definition of data quality requirements and metrics across various organizations.

Data Quality Development and Support Model

Business Intelligence for Enterprise Data Quality
• Business intelligence tool (COTS)
• Data quality commercial-off-the-shelf (COTS) product
• Data quality data mart (custom)
• Data quality issue management system
• Extract, Transform and Load (ETL) product
• Enterprise Service Bus (SOA and Data Quality Services)
[Diagram, "Solution Components": Data Stores and Files; Data Quality Tool (Profiling/Rule Execution); Data Quality Rules; Data Quality Results; ETL; Data Quality Mart; Business Intelligence Tool; Enterprise Dashboard.]

Replace Paper Reports with Business Intelligence
• Operational Incidents
• Audit Findings
• Data Quality Issues Report
• Regulatory Compliance Issues
• Weekly Data Management Status Reports
Replace mounds of paper with a business intelligence solution and gain on-demand access to summary and detailed information on key quality indicators.

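The flow between the solution components (rule results from the data quality tool, an ETL load into the data quality mart, and a business intelligence query on top) can be sketched roughly as follows. This is only an illustration under stated assumptions: the table name `dq_result`, its columns, the sample rows and the use of an in-memory SQLite database are all hypothetical stand-ins, not the products or schema used by the authors.

```python
import sqlite3

# Hypothetical rule results, as a data quality tool might emit them after
# executing rules against a data store. All names and numbers are illustrative.
RULE_RESULTS = [
    # (run_date, data_store, data_element, dimension, rows_checked, rows_failed)
    ("2011-11-01", "loan_store", "loan_amount", "Completeness", 1000, 12),
    ("2011-11-01", "loan_store", "loan_amount", "Conformity", 1000, 3),
    ("2011-11-01", "borrower_store", "borrower_name", "Completeness", 500, 50),
]


def load_mart(conn, results):
    """ETL step: load rule results into a single (assumed) mart table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dq_result (
               run_date TEXT, data_store TEXT, data_element TEXT,
               dimension TEXT, rows_checked INTEGER, rows_failed INTEGER)"""
    )
    conn.executemany("INSERT INTO dq_result VALUES (?, ?, ?, ?, ?, ?)", results)
    conn.commit()


def pass_rate_by_element(conn):
    """BI-style query: percentage of passing rows per data element."""
    cursor = conn.execute(
        """SELECT data_element,
                  ROUND(100.0 * (SUM(rows_checked) - SUM(rows_failed))
                        / SUM(rows_checked), 2) AS pass_pct
           FROM dq_result
           GROUP BY data_element
           ORDER BY data_element"""
    )
    return dict(cursor.fetchall())


# An in-memory database stands in for the custom data quality mart.
conn = sqlite3.connect(":memory:")
load_mart(conn, RULE_RESULTS)
summary = pass_rate_by_element(conn)
print(summary)
```

Keeping results in a mart rather than in the rule engine itself is what lets a dashboard answer summary and detail questions on demand, which is the point of replacing the paper reports.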
Enterprise Data Quality Dashboard (Enterprise View)
[Dashboard panels: Overall Health; Health Indicators; Quality by Line of Business (Wholesale, Retail, Commercial); Data Quality Maturity; Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality (Product Data, Customer Data); Regional Trend; Quality Rating for Each Data Element.]

Enterprise Data Quality Dashboard (Retail Business View)
[Dashboard panels: Overall Health; Health Indicators; Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality (Borrower Data, Loan Data); Quality Rating for Each LOB Data Element; Data Store Trend; Data Quality Server Utilization.]

Continuously Measure and Improve Quality
Step 1 - Define
Define the scope, goal, budget, duration and the data quality problem to be addressed.
Step 2 - Measure
All relevant data quality statistics and measures important to the enterprise are collected at this stage.
Step 3 - Analyze and Improve
Analysis of the data collected in the previous phase is conducted and root cause(s) identified. Data remediation is implemented to improve the quality of data.
Step 4 - Control
Monitor the quality after remediation to ensure that data is defect free. If further changes are needed, the team makes them and measures the quality again.
The Enterprise Data Quality dashboard provides transparency into data quality hotspots that must be addressed proactively.

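As a rough illustration of how the dashboard's "overall health" and "trending" panels might be derived from repeated measurements (the Measure and Control steps), consider the sketch below. The score history, the element names, the simple averaging and the 0.5-point tolerance are all assumptions made for the example; the slides do not specify how the ratings were calculated.

```python
from statistics import mean

# Hypothetical weekly quality scores (percent of records passing all rules),
# oldest first, grouped by line of business and data element.
SCORES = {
    "Retail": {
        "borrower_name": [91.0, 93.5, 96.0],
        "loan_amount": [99.0, 99.1, 98.2],
    },
    "Wholesale": {
        "product_code": [97.0, 97.0, 97.2],
    },
}


def trend(history, tolerance=0.5):
    """Classify the change from the first to the latest measurement."""
    change = history[-1] - history[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "stable"


def dashboard_rows(scores):
    """Roll element-level scores up to one row per line of business."""
    rows = []
    for lob, elements in sorted(scores.items()):
        latest = [history[-1] for history in elements.values()]
        rows.append({
            "line_of_business": lob,
            # Assumed rollup: unweighted mean of the latest element scores.
            "overall_health": round(mean(latest), 2),
            "trends": {name: trend(h) for name, h in elements.items()},
        })
    return rows


rows = dashboard_rows(SCORES)
for row in rows:
    print(row)
```

A declining element such as the hypothetical `loan_amount` above is the kind of hotspot the Control step is meant to surface after remediation.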
Lessons Learned
• Changing behavior is hard, so use a carrot-and-stick approach to get people to change
• Recognize team members who display the expected behavior and highlight what they did
• Roll out the data quality platform (tools, methodologies, best practices) in a phased manner
• Educate team members at all levels of the enterprise on the value of strong governance and data quality
• Facilitate adoption of the tools and business intelligence offerings by providing them to all organizations free of cost or at a very low cost
• Highlight the fact that data is “owned” by the enterprise and not by a particular individual or line of business
• Hold people accountable by using operational metrics, data quality metrics and compliance metrics to make your case
• Measure the hard savings and business value added by the program and communicate up and down the chain on a regular basis (KPI Dashboard)

Summary
• Effective data management provides order out of chaos
• Implementing “Holistic Data Quality” provides transparency into data quality issues across the information supply chain and helps in identifying systemic issues
• Focus must be on “Enterprise Critical” data initially. Do not try to boil the ocean.
• The solution architecture’s core components are the data quality COTS product, a data quality Data Mart and a Business Intelligence tool
• Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents
• Implementing HDQ is a strategic initiative and requires C-level sponsorship and support

Questions!!

Typical Current State Data Flow
[Diagram: External Data Feeds, Transactional and Operational Stores, Data Warehouse and Data Marts, with potential data quality problems marked.]
The current siloed approach to data management is wasteful and doesn’t provide transparency into systemic issues.

Future State Data Flow: Continuous Data Quality Monitoring
[Diagram: External Data Feeds, Transactional and Operational Stores, Data Warehouse and Data Marts, with DQ Monitoring applied.]
Enterprise Data Architecture should enable straight through processing and offer operational efficiencies.

Key Process Steps
For each data element that you will monitor, do the following (use a template):
– Identify the trustee and custodian (DG/DQ)
– Identify the system of record (DG/DQ)
– Identify the dimensions of data quality that apply (Custodian/Trustee/DQ)
– Capture the data quality rules per dimension (Custodian/Trustee/DQ)
– Capture the frequency of rule execution (Custodian/Trustee)
– Capture the data quality thresholds and tolerances for Red/Yellow/Green status (Custodian/Trustee)
– Capture the key metrics that you wish to track (Custodian/Trustee)
– Conduct the logical to physical data mapping (to the data source) for the data element (DQ/Technology)

Dimensions of Data Quality - Explanation
– Accuracy: How closely does the data reflect the real world?
– Completeness: How much required data is missing?
– Conformity: How well does the data conform to formats and domain values?
– Duplication: Does the same data exist in multiple systems? If so, is it represented the same way?
– Integrity: Does the data conform to integrity rules? Are relationships between elements retained?
– Currency: How current is the data? When was it last entered or refreshed?
There are a dozen or more Data Quality Dimensions that can be defined, but organizations should pick the ones that best meet their needs.
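
The per-element template from the Key Process Steps, combined with two of the dimensions explained above, might be captured along these lines. Treat it as a minimal sketch under assumptions: the `MonitoredElement` fields, the sample ZIP code values, the five-digit format rule and the 99/75 percent thresholds are hypothetical and chosen only to show how Red/Yellow/Green status and alerts could follow from a pass rate.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MonitoredElement:
    """One row of the monitoring template (all field names are assumed)."""
    name: str
    trustee: str
    custodian: str
    system_of_record: str
    dimension: str
    rule: Callable[[Optional[str]], bool]  # True when a value passes
    frequency: str
    green_min: float   # pass rate (%) at or above this is Green
    yellow_min: float  # pass rate (%) at or above this is Yellow, else Red


def is_complete(value: Optional[str]) -> bool:
    """Completeness: a required value is present and not blank."""
    return value is not None and value.strip() != ""


def is_five_digit_zip(value: Optional[str]) -> bool:
    """Conformity: the value matches an assumed five-digit ZIP format."""
    return value is not None and re.fullmatch(r"\d{5}", value) is not None


def pass_rate(element: MonitoredElement, values: List[Optional[str]]) -> float:
    """Percentage of values that satisfy the element's rule."""
    if not values:
        raise ValueError("no values to check")
    passed = sum(1 for value in values if element.rule(value))
    return 100.0 * passed / len(values)


def status(element: MonitoredElement, rate: float) -> str:
    """Map a pass rate to Red/Yellow/Green using the element's thresholds."""
    if rate >= element.green_min:
        return "Green"
    if rate >= element.yellow_min:
        return "Yellow"
    return "Red"


# Illustrative sample: one value is too short and one is missing.
zip_values = ["33301", "3330", None, "33316", "33304"]

completeness = MonitoredElement(
    "property_zip", "trustee-a", "custodian-a", "loan_store",
    "Completeness", is_complete, "daily", green_min=99.0, yellow_min=75.0)
conformity = MonitoredElement(
    "property_zip", "trustee-a", "custodian-a", "loan_store",
    "Conformity", is_five_digit_zip, "daily", green_min=99.0, yellow_min=75.0)

report = {
    element.dimension: status(element, pass_rate(element, zip_values))
    for element in (completeness, conformity)
}
# Raise an alert for any dimension that is not Green.
alerts = [dimension for dimension, rag in report.items() if rag != "Green"]
print(report, alerts)
```

Holding the rule, thresholds and ownership together in one record keeps the rules externalized from application code, so custodians and trustees can adjust tolerances without a code change.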