Enterprise Data Validation Architecture (EDVA): Fixing Data Quality
Proceedings of the MIT 2007 Information Quality Industry Symposium
The MIT Information Quality Industry Symposium, 2007

Enterprise Data Validation Architecture (EDVA): Fixing Data Quality for Enterprise Interoperability
Cambridge, Massachusetts, USA
July 18-19, 2007
Bill McMullen
© 2007 The MITRE Corporation. All rights reserved

Outline
• Purpose
• Nature of the Problem
• Elements of a Solution
• Business Case Considerations
• Q&A

Business Enterprise Architecture (BEA) & Enterprise Transition Plan (ETP), September 2006
An Objective – “Transform the Department’s supply chain information environment by: 1) improving data integrity and visibility by defining, managing, and utilizing item, customer, and vendor master data; and 2) reducing complexity and minimizing variability on the supply chain business transactions by adopting standardized transaction and business rules.”

Department of Defense Enterprise Architecture Federation Strategy
Draft Version 1.0, 18 September 2006
Prepared by the DoD CIO

Purpose
• Purpose of Briefing
  – Provides an overview of an Enterprise Data Validation Engine (this may be an important management option for certain business cases)
• Purpose of an Enterprise Data Validation Engine
  – Aligns data across previously independent legacy system sources
  – Eliminates routine manual reconciliation efforts previously needed to coordinate legacy data sources
  – Improves decision making at the domain and/or enterprise levels through enhanced legacy data quality
  – Can be applied in many ways across the enterprise

Decision Making Capability for the Next Decade
• ERP promises highly accurate data and superior decision making ability, but a wait is involved
  – ECSS is expected to be deployed about 2011+
• It is not necessary to wait to increase data and decision integrity
  – Selected “legacy” systems, with minimal effort, can provide more integrity in the meantime
  – Global Logistics Support Capability (GLSC) requires reengineering both the business processes and their supporting systems
• It is possible to make improvements without distracting from ECSS-ERP
  – The proposed approach will actually facilitate the migration of valid data to ECSS

Nature of the Problem Regarding Enterprise Data Validation
• The Business Problem
  – Data from separate legacy systems frequently cannot reliably be combined at domain or enterprise levels
  – When combined, the data may not yield trustworthy results for quality enterprise decision making
  – Examples
    • Case #1: 29% mismatch between two requisition systems (re GLSC)
      – “Almost 30 percent of Air Force backorder data is inaccurate.”
      – “Forty-two percent of in-transit records … were invalid. This equates to 4,627 transactions, valued at $325M.”
    • Case #2: 98% failure to integrate databases across domain
    • Case #3: 25% mismatch (est.) among several vehicle systems
  – Manual reconciliation to compensate is not satisfactory in today's environment (AFSO21)
  – Airman resources no longer will be available to “fix” data (re GLSC)
• The Technical Issues
  – Synchronization of multiple source system data
  – Association of multiple source data

The Systems Gap: Misaligned Data Leads to Questions
• Source systems often describe different aspects of the same asset.
• Integration blends the data from the multiple source systems into a global “mismatched” view.
[Figure: Data for Vehicles Within 3 Systems. Records A through M are shown as held by Maintenance Activity (OLVIMS), Parts Inventory (SBSS), and Vehicle Inventory (AFEMS); not every record appears in every system.]
Issue – Time and Instance Integrity Across Systems
Note: The Synchronization Issue shows up in a database when data from multiple source systems does not correspond across the sources in a logical manner, or as expected, given the business rules.
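
The synchronization issue noted above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names and the vehicle identifiers are hypothetical, and the briefing does not say how its mismatch percentages were computed. The sketch counts a record as mismatched when at least one source system lacks it.

```python
from collections import defaultdict

def presence_by_key(sources):
    """Map each asset key to the set of source systems holding a record for it."""
    seen = defaultdict(set)
    for system, keys in sources.items():
        for key in keys:
            seen[key].add(system)
    return dict(seen)

def mismatch_report(sources):
    """Return (keys missing from at least one system, mismatch rate over all keys)."""
    seen = presence_by_key(sources)
    all_systems = set(sources)
    # For each incomplete key, list the systems that do not hold it.
    missing = {k: sorted(all_systems - s) for k, s in seen.items() if s != all_systems}
    rate = len(missing) / len(seen) if seen else 0.0
    return missing, rate

# Hypothetical vehicle identifiers; not taken from the briefing.
sources = {
    "OLVIMS": {"A", "B", "C", "D"},
    "SBSS": {"B", "C", "D", "E"},
    "AFEMS": {"D", "E", "F"},
}
missing, rate = mismatch_report(sources)
for key in sorted(missing):
    print(key, "missing from", missing[key])
print(f"mismatch rate: {rate:.0%}")
```

A presence check like this only finds instance mismatches; time integrity (records that exist everywhere but are out of date in one system) would need a comparison of timestamps or attribute values as well.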
Problem – Reporting Produces Inconsistencies (Q: How much can we trust this?)

Enforce Business Rules & Link Records
[Figure: records A through M from several sources are linked into combined records. Sources shown: Equipment Information (SBSS & AFEMS), Organizational Information (MPES), Aircraft Information (ForceTab), and Cross-Reference Information (GFM-DI).]
• Check against enterprise business rules
• Make corrections at this point or they usually won’t (can’t) get done later
Note: The Association Issue shows up in a federated database when invalid associations among objects are erroneously permitted. Validation against business rules must be performed before a source system allows such a transaction to update its own database.

Enterprise Decision Maker’s Ability to Use Data
Problems Compound as More Systems Are Considered
[Figure: notional chart. Vertical axis: ability to use data, from Very Good to Very Poor, with a CAUTION marker.]
* Only the topic of Object Association is addressed hereafter, because the solution to this single problem facilitates the solution of Attribute Synchronization (even though its solution can be attempted separately).
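
The Association Issue note on the "Enforce Business Rules & Link Records" slide says validation must happen before a source system commits a transaction. A minimal sketch of that ordering follows. The class, its fields, and the example object types are hypothetical illustrations, not the briefing's design; the only idea taken from the slide is "validate against enterprise rules first, update afterwards".

```python
from dataclasses import dataclass, field

@dataclass
class CrossReference:
    """Hypothetical associative cross-reference with simple type-pair rules."""
    objects: dict = field(default_factory=dict)   # object id -> object type
    allowed: set = field(default_factory=set)     # permitted (type_x, type_y) pairs
    links: set = field(default_factory=set)       # accepted (id_x, id_y) associations

    def validate(self, id_x, id_y):
        """Return a list of rule violations; an empty list means valid."""
        errors = [f"unknown object: {oid}" for oid in (id_x, id_y)
                  if oid not in self.objects]
        if not errors:
            pair = (self.objects[id_x], self.objects[id_y])
            if pair not in self.allowed:
                errors.append(f"association not permitted: {pair[0]} -> {pair[1]}")
        return errors

    def associate(self, id_x, id_y):
        """Record the association only if it passes validation."""
        errors = self.validate(id_x, id_y)
        if errors:
            return False, errors      # reject before any source database is updated
        self.links.add((id_x, id_y))
        return True, []

# Hypothetical identifiers and rule set.
xref = CrossReference(
    objects={"V-1": "vehicle", "ORG-9": "organization", "AC-3": "aircraft"},
    allowed={("vehicle", "organization")},
)
accepted = xref.associate("V-1", "ORG-9")
rejected = xref.associate("V-1", "AC-3")
unknown = xref.associate("V-2", "ORG-9")
```

Returning the violations instead of raising keeps the rejection reason available to the source system, which matters if rejected transactions are to be corrected at entry time instead of reconciled later.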
[Chart horizontal axis (Enterprise Decision Maker’s Ability to Use Data): Small, Medium, Large compounding of missing, mismatched, or unsynchronized data (due to improperly associated and serialized enterprise-wide data and enforcement of enterprise-wide business rules).]

Some Elements of a Solution for Using an Enterprise Data Validation Engine
Associations Across Systems
• Determine how different objects of two systems are associated.
• New business rules must be enforced by Systems A and B.
[Figure: System A (Object X) and System B (Object Y) are linked through the Validation Engine and its Associative Cross-Reference.]
Note: EII usually requires that objects in separate databases (i.e., Objects X and Y above) be logically related according to certain enterprise business rules.

Leveraging the Air Force Architecture
[Figure: architecture diagram with parallel Unclassified / NIPRNet and Classified / SIPRNet stacks. Labels shown: organizational hierarchy (HAF, MAJCOM, NAF, WING, OG, SG, SQN, FLT, CREW, BILLET); Cross-Reference for Associations; Data Services; BoM; BI Tools; Viewers (e.g., FSC); Enterprise Data Warehouse; AF GLSC Archive; AF Portal; Metadata; Discovery; Exchange Tools; GLSC Data Loading; GLSC Data Translator (Up to AOC and Up from Legacy); Enterprise Data Validation Engine for M, for A, for etc.; Adapter M; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.).]
[Figure, continued (lower layers): Change Identification & Notification for M, for A, for etc.; GCSS-AF Infrastructure; Guard; Adapter A; Adapter etc.; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Authoritative Sources, Unclassified and Classified (ES-S, GLSC, AFEMS, Others); Previous Snapshot for Non-Intrusion Systems; AOC; Joint Staff.]

Source System Architecture Using the Enterprise Data Validation Engine
• Leads to greatly enhanced data agreement across systems that currently lack common standards or interoperability
• Enterprise Data Validation Engine ensures data integrity as it trickles in from the various systems
• Enterprise Data Validation Architecture
  – Components include
    • Cross-Reference for Associations
    • Enterprise Data Validation Engine
    • Change Identification and Notification
    • Interoperability Protocols (7 possible choices)
    • [Database Snapshot for Non-Intrusion*]
* Can be non-intrusive to source system, but with critical tradeoffs
  • Performance
  • Data Quality
[Figure: Cross-Reference for Associations; Enterprise Data Validation Engine for M, for A, for etc.; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Adapter M, Adapter A, Adapter etc.; Change Identification & Notification for M, for A, for etc.; sources ES-S, AFEMS, etc.]
This architecture is an adaptation from “Protocols for Integrity Constraint Checking in Federated Databases”, P. Grefen and J.
Widom, 1997
[Figure label: Previous Snapshot for Non-Intrusion Systems*]

Data Entry Scenario Where Legacy Modifications Allowed
• Current Legacy Data Entry Process
  – All form fields (ID, model, serial number, address lines 1 to 3) are keyed in and saved to the Local Store
• Improved Legacy Data Entry Process
  1. Initial Entry Screen: the ID is entered
  – Check the Cross-Reference for existence of the ID
  – If already existing: populate form
  – If not already existing: continue as before
  2. Pre-Populate Screen: ID, model, serial number, and address lines are filled from the Cross-Reference
  3. Remaining Data Entry: finish data entry and save to the Local Store
[Figure: form mock-ups with placeholder IDs (xxxxxx-xxx, yyyyyy-yyy, zzzzzz-zzz); the Cross-Reference is also fed by the new ERP (“ERP-new”).]

Business Case Considerations
• Program-Centric Aspects
  – Development Effort and Time [Investment]
    • Added Effort
      – Systems Aspects (enhanced data integrity capabilities vs. system revisions)
      – Operational Aspects (elimination of reconciliation activities vs. data entry revisions)
    • Coordination Among Other Stakeholders (via a COI?)
  – Lifecycle Benefits (may accrue over time) [Return]
    • Improved Data Integrity
    • Improved Data Timeliness
    • Eliminate Redundant Data Entry
    • Eliminate Reconciliation Effort
• Enterprise-Centric Aspects (in addition to above)
  – Generic Solution Template
  – Prerequisite to ERPs (DEAMS, ECSS, DIMHRS) Migration
  – Improved Integrity of Enterprise-Level Decision Making
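
Returning to the Data Entry Scenario: the improved legacy data entry process (initial entry, cross-reference check, pre-population, remaining entry) could be sketched roughly as follows. The field names, function names, and the dictionary-based cross-reference and local store are assumptions made for illustration. In particular, the choice to let operators fill only fields that are still empty is an assumption, not something the slide states.

```python
FORM_FIELDS = ("id", "model", "serial_number",
               "address_line_1", "address_line_2", "address_line_3")

def start_entry(entered_id, cross_reference):
    """Steps 1 and 2: take the ID, then pre-populate the form if the ID is known."""
    form = dict.fromkeys(FORM_FIELDS, "")
    form["id"] = entered_id
    known = cross_reference.get(entered_id)
    if known is not None:
        # Already existing: copy shared values so they are not keyed in again.
        form.update({k: v for k, v in known.items() if k in form and k != "id"})
    return form, known is not None

def finish_entry(form, remaining, local_store):
    """Step 3: fill only still-empty fields, then save to the local store."""
    for name, value in remaining.items():
        if name in form and not form[name]:
            form[name] = value
    local_store[form["id"]] = form
    return form

# Hypothetical cross-reference content keyed by the slide's placeholder ID.
cross_reference = {"xxxxxx-xxx": {"model": "M1", "serial_number": "S1"}}
local_store = {}

form, found = start_entry("xxxxxx-xxx", cross_reference)
finish_entry(form, {"model": "retyped", "address_line_1": "1 Main St"}, local_store)
```

When the ID is not found, `start_entry` returns an empty form and the operator continues as before, which matches the "if not already existing" branch of the slide.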

Considerations for Use of Enterprise Data Validation
Where would the Enterprise Data Validation Engine be considered for use?
Note: The term “data” herein includes any data aggregation (element, record, file, cross-system composite, etc.).
After conducting Enterprise-Level Cross-System Data Quality assessments for given products, perform the following Data Fixing Strategy as applicable.

Situation (Rule)                 | 1          | 2          | 3               | 4                  | 5                 | 6                                  | 7
Can Data Be “Abandoned”?         | Yes        | No         | No              | No                 | No                | No                                 | No
Is Current Quality OK?           | -          | Yes        | No              | No                 | No                | No                                 | No
Feasible to Re-Enter?            | -          | -          | Yes             | No                 | No                | No                                 | No
Feasible to Reconcile?           | -          | -          | -               | Manually           | Automatically     | Automatically                      | Automatically
Can Apply Fix at Migration Time? | -          | -          | -               | -                  | Yes               | No                                 | No
Fixing Sources Cost Acceptable?  | -          | -          | -               | -                  | -                 | Yes*                               | No
Data Fixing Strategy             | Do Nothing | Do Nothing | Manual Re-Entry | Manually Reconcile | Do Data Cleansing | Enforce Enterprise Data Validation | ???

Data quality improvements increase from Rule 1 to Rule 7. A marker on the slide indicates inherent government responsibilities.
* If interfaces are being changed anyway, then the costs of using the EDVA template may be negligible.

This issue is similar to the Y2K problem in that you can’t wait until the last minute to think about it or act upon it. It has the additional characteristic that the sooner it is addressed, the better the data quality gets over time.

Questions and Answers
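
The Data Fixing Strategy rules from the "Considerations for Use of Enterprise Data Validation" slide can be read as a first-match cascade: each rule applies only when every earlier question was answered unfavorably. The sketch below encodes that reading. The `Assessment` type and its field names are hypothetical, and treating any value other than "manually" as automatic reconciliation is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    """Hypothetical answers to the slide's six questions for one data product."""
    can_abandon: bool
    quality_ok: bool
    can_reenter: bool
    reconcile: str            # "manually" or "automatically"
    fix_at_migration: bool
    source_fix_cost_ok: bool

def data_fixing_strategy(a):
    """Walk the rule columns left to right and return (rule number, strategy)."""
    if a.can_abandon:
        return 1, "Do Nothing"
    if a.quality_ok:
        return 2, "Do Nothing"
    if a.can_reenter:
        return 3, "Manual Re-Entry"
    if a.reconcile == "manually":
        return 4, "Manually Reconcile"
    if a.fix_at_migration:
        return 5, "Do Data Cleansing"
    if a.source_fix_cost_ok:
        return 6, "Enforce Enterprise Data Validation"
    return 7, "???"   # the slide leaves this case open

# Example: poor quality, automatic reconciliation, no migration-time fix,
# acceptable cost of fixing the sources.
example = data_fixing_strategy(
    Assessment(False, False, False, "automatically", False, True)
)
```

Under this reading, enterprise data validation is the chosen strategy only for Rule 6, where cheaper options have been ruled out and the cost of changing the sources is acceptable.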

Backup Slides

Associations Between Objects
• Federating among systems can require that different objects of two systems be associated.
• Tx means transaction.
• Thus, new synchronization business rules must be enforced on Systems A and B.
[Figure: System A (Object X) and System B (Object Y) are linked through the Validation Engine and its Associative Cross-Reference.]

Vision Architecture (using GLSC example)
Data Adaptors connect the authoritative sources to the ESB and provide various mediation services including format and protocol translation, mapping and translation.
[Figure: architecture diagram with parallel Unclassified / NIPRNet and Classified / SIPRNet stacks. Labels shown: organizational hierarchy (HAF, MAJCOM, NAF, WING, OG, SG, SQN, FLT, CREW, BILLET); Data Services; BoM; BI Tools; Viewers (e.g., FSC); Enterprise Data Warehouse; AF GLSC Archive; AF Portal; Metadata; Discovery; AF GLSC Exchange; GLSC Data Loading Tools; GLSC Data Translator (Up to AOC and Up from Legacy); Guard; GCSS-AF Infrastructure; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Authoritative Sources (Unclassified): ES-S, GLSC, AFEMS, Others; AOC; Joint Staff.]
[Figure, continued: Authoritative Sources (Classified): ES-S, GLSC, AFEMS, Others; AOC; Joint Staff.]

Source System Architecture (expertise)
• Cross-Enterprise (functional) SMEs (COI?) and Source System SMEs
• GCSS-AF (technical) SMEs (with AFKS mappings)
• Source System SMEs
[Figure: Cross-Reference for Associations; Enterprise Data Validation for M, for A, for etc.; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Adapter M, Adapter A, Adapter etc.; Change Identification & Notification for M, for A, for etc.; Authoritative Sources (ES-S, AFEMS, Others); Previous Snapshot for Non-Intrusion Systems. The SME groups above are shown against the layers they support.]

Source System Architecture (some needed pre-requisites)
• Common Data Model: Conceptual Model of “Common” Objects and Attributes
• Business Rules (i.e., Constraints): Enterprise-Level Business Rules for How Source System Relates to “Common” Objects and Attributes
[Figure: Cross-Reference for Associations; Enterprise Data Validation for M, for A, for etc.; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Adapter M, Adapter A, Adapter etc.; Change Identification & Notification for M, for A, for etc.; Authoritative Sources (ES-S, AFEMS, Others).]
• Data Mappings to Common Model (above): Source System Objects and Attributes Related to “Common” Ones
[Figure label: Previous Snapshot for Non-Intrusion Systems]

Source System Architecture Variant Where Source System A Is ESB-Enabled
• If System A is already publishing, then subscribe.
• However, existing messages from source system A to the ESB must be sufficient for enterprise-wide needs, both in their timing and in their elements.
[Figure: Cross-Reference for Associations; Enterprise Data Validation for M, for A, for etc.; Framework & Enterprise Services Bus (EAI, Web Services, XML, etc.); Adapter M, Adapter A, Adapter etc.; Change Identification & Notification for M, for A, for etc.; Authoritative Sources (ES-S, AFEMS, Others); Previous Snapshot for Non-Intrusion Systems.]

Source System Architecture Variant Where Source System A Is AFKS-Enabled
• Either “trap” Data Services changes, or subscribe.
• However, existing messages from source system A to the ESB must be sufficient for enterprise-wide needs, both in their timing and in their elements.
[Figure: as in the ESB-enabled variant, with the System A Database held in the Enterprise Data Warehouse and a second Change Identification & Notification for A attached to it.]
[Figure, continued: Change Identification & Notification for M and for etc.; “If Already Publishing” and “Or Subscribe” paths for System A; “These are either/or, not both.”; “Not part of these discussions.”; BI Tools; Viewers (e.g., FSC); Enterprise Data Warehouse with System A Database; AF GLSC Archive; Metadata; Discovery; AF GLSC Exchange; GLSC Data Loading Tools; GLSC Data Translator (Up to JS and Up from Legacy); Authoritative Sources (ES-S, AFEMS, Others); Previous Snapshot for Non-Intrusion Systems; “Core vehicle data copies with additional domain data”.]

AS-IS Data Sharing Example from Vehicle Domain
• Systems
  – AFEMS (Air Force Equipment Management System): vehicle records
  – OLVIMS (On-Line Vehicle Information Management System): vehicle records and vehicle operations management data; about 200 sites, distributed data, stand-alone operations
  – SBSS (Standard Base Supply System): vehicle records; summary financial reporting to $$$ systems
  – Systems of record are designated for Inventory and for Financial Data
• Numbered transaction (Tx) flows on the diagram
  – 1: Equipment Management Tx Types of Financial Interest and Equipment Management Interest
  – 2: Tx Types of Equipment Management Interest
  – 3: Tx Types of Financial Interest and Equipment Management Interest
• Observations
  – Interfaces with limited automation; mostly abandoned in favor of dual data entry
  – Rejected tx have limited notification processes and must each be cleared by workers
  – Tx are edited for syntax by SBSS
  – Tx are edited for syntax and semantics by AFEMS
  – Some tx are simply forwarded after syntax check (no other processing done by SBSS)
  – Rejected transactions require human intervention (noted at each of the three systems)

Vehicle Life Cycle Change Highlights
[Figure label: Authorized?]
Life Cycle Change              | Generating Parties  | Interested Parties                                              | Participating Systems
Wait for (SPR)                 | Acquisition         | Vehicle Management, Equipment Management, Inventory Management | OLVIMS, AFEMS, SBSS
Receive (REC)                  | Vehicle Management  | Vehicle Management, Equipment Management, Inventory Management | OLVIMS, AFEMS, SBSS
Operate (mission capable)      | Vehicle Operations  | Vehicle Management, Equipment Management                       | OLVIMS, AFEMS
Maintain (non-mission capable) | Vehicle Maintenance | Vehicle Management, Equipment Management                       | OLVIMS, AFEMS
Maintain (awaiting parts)      | Vehicle Maintenance | Vehicle Management                                             | OLVIMS
Turn-In (salvage)              | Vehicle Management  | Vehicle Management, Equipment Management                       | OLVIMS, AFEMS, SBSS

[Figure: life cycle graphic with the labels Need: One Pickup; Allowed?; GO; Acquire; Wait for Priority Buy List; $ APPROVED $; Receive; Operate and Maintain (Mission Capable, Mission in Progress, Needs Preventive Maintenance, Work Order in Progress); Turn-In; DRMO.]

Characteristics of The Situations
Two situations are shown, Attribute Synchronization Across an Object and Associations Among Objects. Each uses a Constraint Mgr. and an Associative Cross-Reference.

Situation with multiple object creation responsibility
• Cross-Reference contains
  − Object identifiers (with association rules)
  − Data common to the systems
• Systems can
  − Check object association validity
  − Eliminate cross-system data conflicts
  − Reduce redundant data entry
• Example: System A holds Object X (Account Information), System B holds Object Y (Customer Information), System C holds Object Z (Branch Information)

Situation where object creation is System A's responsibility
• Associative Cross-Reference contains
  − Object identifiers
  − Data common to the systems
• Systems can
  − Check object validity (B & C)
  − Eliminate conflicting data
  − Eliminate redundant data entry
• Example: System A holds Object X piece 1 (Acquire Data), System B holds Object X piece 2 (Finance Data), System C holds Object X piece 3 (Inventory Data)
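
For the situation above in which object creation is System A's responsibility, the roles of the cross-reference (object identifiers plus data common to the systems) might look like the following sketch. The class, method names, and sample attributes are hypothetical; the briefing describes the responsibilities but not an interface.

```python
class AssociativeCrossReference:
    """Hypothetical cross-reference: object identifiers plus shared attribute values."""

    def __init__(self):
        self._common = {}   # object id -> {attribute: value}

    def create(self, object_id, common_data):
        """Register a new object (System A's responsibility in this situation)."""
        if object_id in self._common:
            raise ValueError(f"object already exists: {object_id}")
        self._common[object_id] = dict(common_data)

    def exists(self, object_id):
        """Validity check for Systems B and C before they store their own piece."""
        return object_id in self._common

    def conflicts(self, object_id, local_data):
        """Return {attribute: (common value, local value)} for shared attributes that disagree."""
        common = self._common[object_id]
        return {k: (common[k], v) for k, v in local_data.items()
                if k in common and common[k] != v}

# Hypothetical data: System A creates the object; System B checks before storing.
xref = AssociativeCrossReference()
xref.create("X-1", {"model": "pickup", "serial_number": "123"})

known = xref.exists("X-1")
unknown = xref.exists("X-2")
clashes = xref.conflicts("X-1", {"model": "sedan", "cost": 10})
```

Attributes that only one system holds (such as `cost` here) are ignored by the conflict check, since the cross-reference is meant to carry only the data common to the systems.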