DWCMM: The Data Warehouse Capability Maturity Model

Thesis for the Degree of Master of Science
Author:
Catalina Sacu
Student Number: 3305260
Thesis Number: INF/SCR-09-86
Institute of Information and Computer Sciences
Utrecht University, The Netherlands
Supervisors:
Utrecht University: dr. Marco Spruit (first supervisor), dr. ir. Johan Versendaal (second supervisor)
Inergy: Frank Habers
Abstract
Data Warehouses (DWs) and Business Intelligence (BI) have been part of a very dynamic and popular
field of research in recent years, as they help organizations make better decisions and increase their
profitability. Unfortunately, many DW/BI solutions fail to bring the desired results; it is therefore
important to have an overview of the critical success factors. However, gaining such an overview is
usually very difficult, as a DW/BI project is a very complex endeavour. This research offers a solution to
this problem by
creating a Data Warehouse Capability Maturity Model (DWCMM) focused on the technical and
organizational aspects involved in developing a DW environment. Based on an extensive literature study,
the DWCMM consists of a maturity matrix and a maturity assessment questionnaire that analyze the main
categories and sub-categories necessary when implementing a DW/BI solution. The model and its
associated questionnaire can be used to help organizations assess their current DW solution and provide
them with guidelines for future improvements. In order to validate and enrich the theory created, the
DWCMM was evaluated empirically through five expert interviews and four case studies. Based on the
evaluation results, some minor changes were made to improve the model. The main conclusion of this
research is that the DWCMM can be successfully applied in practice and organizations can use it as a
starting point for improving their DW/BI solution.
Acknowledgements
Utrecht, August 2010
I would like to use this opportunity to thank some people who made a significant contribution to this
research.
First, I would like to express my gratitude to my supervisors at Utrecht University, dr. Marco Spruit and
dr. ir. Johan Versendaal, for their professional guidance and constructive feedback during the project.
Second, I would like to thank my external supervisor Frank Habers for offering me the opportunity to
perform my research during an internship at Inergy. He has been extremely helpful in providing prompt
access to all the resources I needed for the research and in offering me guidance and advice every time I
needed it. He has also helped me arrange several expert interviews for validating my model. I would also
like to thank other colleagues from Inergy, especially Rick Tijsen, for their support and enthusiasm during
my research there.
Furthermore, I would like to express my gratitude to all the experts and respondents who made room for
me in their busy schedules and agreed to review my model and fill in the assessment questionnaire,
respectively.
Last, but not least, I would like to thank my parents, my boyfriend and my friends for their constant love,
support and welcome distractions when needed.
Catalina Sacu
Table of Contents
1 Introduction
   1.1 Problem Definition & Research Motivation
   1.2 Research Questions
   1.3 Research Approach
2 What are Data Warehousing and Business Intelligence?
   2.1 The Intelligent Organization
   2.2 Data-Information-Knowledge
   2.3 The Origins of DW/BI
   2.4 DW/BI Definition
   2.5 DW/BI Business Value
   2.6 Inmon vs. Kimball
   2.7 Maturity Modelling
      2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners
   2.8 Summary
3 A Data Warehouse Capability Maturity Model
   3.1 From Nolan's Stages of Growth to the Data Warehouse Capability Maturity Model
   3.2 DWCMM
4 DW Technical Solution Maturity
   4.1 General Architecture and Infrastructure
      4.1.1 What is Architecture?
      4.1.2 Conceptual Architecture and Its Layers
      4.1.3 Infrastructure
      4.1.4 Metadata
      4.1.5 Security
      4.1.6 Business Rules for DW
      4.1.7 DW Performance Tuning
      4.1.8 DW Update Frequency
   4.2 Data Modelling
      4.2.1 Data Modelling Definition and Characteristics
      4.2.2 Data Models Classifications (Data Models Levels and Techniques)
      4.2.3 Dimensional Modelling
      4.2.4 Data Modelling Tool
      4.2.5 Data Modelling Standards
      4.2.6 Data Modelling Metadata Management
   4.3 Extract – Transform – Load (ETL)
      4.3.1 What is ETL?
      4.3.2 Extract
      4.3.3 Transform
      4.3.4 Load
      4.3.5 Manage
      4.3.6 ETL Tools
      4.3.7 ETL Metadata Management
      4.3.8 ETL Standards
   4.4 BI Applications
      4.4.1 What are BI Applications?
      4.4.2 Types of BI Applications
      4.4.3 BI Applications Delivery Method
      4.4.4 BI Applications Tools
      4.4.5 BI Applications Metadata Management
      4.4.6 BI Applications Standards
   4.5 Summary
5 DW Organization and Processes
   5.1 DW Development Processes
      5.1.1 DW Development Phases
      5.1.2 The DW/BI Sponsor
      5.1.3 The DW Project Team and Roles
      5.1.4 DW Quality Management
      5.1.5 Knowledge Management
   5.2 DW Service Processes
      5.2.1 From Maintenance and Monitoring to Providing a Service
      5.2.2 IT Service Frameworks
      5.2.3 DW Service Components
   5.3 Summary
6 Evaluation of the DWCMM
   6.1 Expert Validation
      6.1.1 Expert Review Results and Changes
   6.2 Multiple Case Studies
      6.2.1 Case Study Approach
      6.2.2 Case Overview
      6.2.3 Case Studies Results and Conclusions
   6.3 Summary
7 Conclusions and Further Research
   7.1 Conclusions
   7.2 Limitations and Further Research
8 References
Appendix A: DW Detailed Maturity Matrix
Appendix B: The DW Maturity Assessment Questionnaire (Final Version)
Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)
Appendix D: Expert Interview Protocol
Appendix E: Case Study Interview Protocol
Appendix F: Case Study Feedback Template
Appendix G: Paper
List of Figures
Figure 1: IS Research Framework (adapted from (Hevner et al., 2004)).
Figure 2: Information Gap (adapted from (Tijsen et al., 2009)).
Figure 3: The BI Cycle (adapted from (Thomas, 2001)).
Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from (Hey, 2004)).
Figure 5: Data Warehouse Capability Maturity Model (DWCMM).
Figure 6: DWCMM Condensed Maturity Matrix.
Figure 7: A Typical DW Architecture (adapted from (Chaudhuri & Dayal, 1997)).
Figure 8: DW Design Process Levels (adapted from (Husemann et al., 2000)).
Figure 9: Star Schema vs. Cube (adapted from (Chaudhuri & Dayal, 1997)).
Figure 10: Case Study Method (adapted from (Yin, 2009)).
Figure 11: Alignment Between Organization A's Maturity Scores.
Figure 12: Alignment Between Organization B's Maturity Scores.
Figure 13: Alignment Between Organization C's Maturity Scores.
Figure 14: Alignment Between Organization D's Maturity Scores.
Figure 15: Benchmarking for Organization A.
List of Tables
Table 1: Differences between operational databases and DWs (adapted from (Breitner, 1997)).
Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).
Table 3: Overview of Maturity Models.
Table 4: DW General Questions.
Table 5: DW Architecture Maturity Assessment Questions.
Table 6: Infrastructure Maturity Assessment Questions.
Table 7: Business Metadata vs. Technical Metadata (adapted from (Moss & Atre, 2003)).
Table 8: Metadata Management Maturity Assessment Question.
Table 9: Security Maturity Assessment Question.
Table 10: Business Rules Maturity Assessment Questions.
Table 11: Performance Tuning Maturity Assessment Question.
Table 12: Update Frequency Maturity Assessment Question.
Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.
Table 14: Dimensional Modelling Maturity Assessment Questions.
Table 15: Data Modelling Tool Maturity Assessment Questions.
Table 16: Data Modelling Standards Maturity Assessment Questions.
Table 17: Data Modelling Metadata Management Maturity Assessment Questions.
Table 18: Data Quality Maturity Assessment Questions.
Table 19: ETL Complexity Maturity Assessment Question.
Table 20: ETL Management and Monitoring Maturity Assessment Question.
Table 21: ETL Tools Maturity Assessment Question.
Table 22: ETL Metadata Management Maturity Assessment Question.
Table 23: ETL Standards Maturity Assessment Questions.
Table 24: BI Applications Maturity Assessment Question.
Table 25: BI Applications Delivery Method Maturity Assessment Question.
Table 26: BI Tools Maturity Assessment Question.
Table 27: BI Applications Metadata Management Maturity Assessment Question.
Table 28: BI Applications Standards Maturity Assessment Questions.
Table 29: DW Development Processes General Maturity Assessment Question.
Table 30: Project Management Maturity Assessment Question.
Table 31: Requirements Definition Maturity Assessment Question.
Table 32: Testing and Acceptance Maturity Assessment Question.
Table 33: Development/Testing/Acceptance/Production Maturity Assessment Questions.
Table 34: DW/BI Sponsorship Maturity Assessment Question.
Table 35: DW Project Team and Roles Maturity Assessment Question.
Table 36: DW Quality Management Maturity Assessment Question.
Table 37: Knowledge Management Maturity Assessment Question.
Table 38: Overview of IT Service Frameworks.
Table 39: ITIL's Core Components (adapted from (Cater-Steel, 2006)).
Table 40: IT Service CMM's Key Process Areas (adapted from (Paulk et al., 1995)).
Table 41: Maintenance and Monitoring Maturity Assessment Question.
Table 42: Service Quality Management Maturity Assessment Question.
Table 43: Service Level Management Maturity Assessment Question.
Table 44: Incident Management Maturity Assessment Question.
Table 45: Change Management Maturity Assessment Question.
Table 46: Incident Management Maturity Assessment Question.
Table 47: Availability Management Maturity Assessment Question.
Table 48: Release Management Maturity Assessment Question.
Table 49: Expert Overview.
Table 50: Rephrased or Changed Questions and Answers.
Table 51: Case and Respondent Overview.
Table 52: Technologies Usage Overview.
Table 53: Organizations' Maturity Scores.
Table 54: Maturity Scores Analysis.
1 Introduction
In today's economy, organizations are part of a very dynamic environment due to continuously changing
conditions and relationships. At the same time, the external environment is an important source of
information (Aldrich & Mindlin, 1978) that organizations have to gather and process very rapidly in
order to maintain their competitive advantage (Choo, 1995). Moreover, as (Kaye, 1996) notes,
"organizations must collect, process, use, and communicate information, both external and internal, in
order to plan, operate and take decisions." The ongoing quest for profits, increasing competition and
demanding customers all require organizations to make the best decisions as fast as possible (Vitt et al.,
2002). Hence, in order to survive, companies have to adapt to this new information environment by
shortening the time between acquiring information and obtaining the right results. One of the solutions
that can narrow this time gap and improve the decision-making process is the implementation of Data
Warehouses and Business Intelligence (BI) applications.
1.1 Problem Definition & Research Motivation
The most fundamental aspect of an organization in today's highly globalized market is the critical
decision-making capacity of its management, which influences the successful running of business
operations. Hence, it is very important for organizations to manage both transaction- and
non-transaction-oriented information in order to make timely decisions and react to changing business
circumstances (AbuSaleem, 2005). Moreover, in the last couple of years, enterprises have shifted their
business focus towards customer orientation in order to remain competitive. Accordingly, maintaining
relationships with clients and managing their data have emerged as top issues to be considered by global
companies. Also, many researchers have reported that the amount of data in a given organization doubles
every five years (AbuSaleem, 2005). In order to process this large amount of data and make the best
decisions as fast as possible, the information must be reliable, accurate, real-time and easy to access. For
such information, all the enterprise-related data should be integrated and appropriately analyzed from a
multi-dimensional point of view. The solution for this is a data warehouse (DW).
Over the years, DWs have become one of the foundations of the information systems used to support
decision-making initiatives. The new era of enterprise-wide systems integration and the growing need for
BI have both accelerated the development of DW solutions (AbuAli & Abu-Addose, 2010). Most large
companies have already established DW systems as a component of their information systems landscape.
According to (Gartner, 2007), BI and DWs are at the forefront of the use of IT to support management
decision-making. DWs can be thought of as the large-scale data infrastructure for decision support, while
BI can be viewed as the data analysis and presentation layer that sits between the DW and the executive
decision-makers (Arnott & Pervan, 2005). In this way, DW/BI solutions can transform raw data into
information and then into knowledge.
However, a DW is not only a software package. The adoption of DW technology requires massive capital
expenditure and a considerable amount of implementation time. DW projects are hence very expensive,
time-consuming and risky undertakings compared with other information technology initiatives, as noted
by prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Mukherjee & D'Souza, 2003;
Solomon, 2005). The typical project costs over $1 million in the first year alone (AbuAli & Abu-Addose,
2010), and it is estimated that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007).
Moreover, (Gartner, 2007) estimates that more than fifty percent of DW projects have limited acceptance
or fail. Therefore, it is crucial to have a thorough understanding of the critical success factors and
variables that determine the efficient implementation of a DW solution.
These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI. In
this master thesis, we will focus on the former, as we consider it to represent the foundation for a solid
DW solution that can achieve a high rate of usage and adoption. First, it is critical to properly design and
implement the databases that lie at the heart of the DW. The right architecture and design can ensure
performance today and scalability tomorrow. Second, all components of the data warehouse solution
(e.g., data repository, infrastructure, user interface) must be designed to work together in a flexible,
easy-to-use way. A third task is to develop a consistent data model and establish what source data will be
extracted and how. In addition to these factors, the DW needs to be created and developed quickly and
efficiently so that the organization can gain the business benefits as soon as possible (AbuAli &
Abu-Addose, 2010). As can be seen, a DW project can unquestionably be complex and challenging. This
is why it is important to gain insight into the technical and organizational variables that determine the
successful development of a DW solution and to assess these variables. The main goal of this master
thesis is therefore to do this by creating a Data Warehouse Capability Maturity Model (DWCMM) and
answering the following main research question:
How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?
1.2 Research Questions
As stated before, the main goal of this research is to develop a DWCMM that will help organizations
assess their current DW solution from both a technical and an organizational/process point of view. In
order to do this and address the main research question, several sub-questions have been formulated that
have to be answered.
First, we would like to give an overview of the field of BI and DW in order to gain a better
understanding of the BI/DW context, by answering the first sub-question: What are BI and DWs?
Then, we will elaborate on the second important element of our model, the maturity part. We will
identify the main characteristics of maturity models and the maturity models most representative for our
research by answering the following sub-question: What do maturity models represent and which are the
most representative ones for our research? Once we have a general overview of BI/DW and maturity
modelling, we can continue with presenting the stages of the DWCMM and the main characteristics of
each stage. We will in this way answer the next two sub-questions: What are the most important variables
and characteristics to be considered when building a data warehouse? and How can we design a
capability maturity model for a data warehouse assessment? Having created and presented the model, we
can then apply it as an assessment method at different organizations and see whether it is a viable source
of information and which changes need to be made. This will answer the last sub-question: To what
extent does the data warehouse capability maturity model result in a successful assessment and guideline
for the analyzed organizations?
To summarize, in order to deliver a valid DWCMM, our research aims to answer the following research
questions:
Main question:
How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?
Sub-questions:
1) What are business intelligence and data warehouses?
2) What do maturity models represent and which are the most representative ones for our research?
3) What are the most important variables and characteristics to be considered when building a data
warehouse?
4) How can we design a capability maturity model for a data warehouse assessment?
5) To what extent does the data warehouse capability maturity model result in a successful
assessment and guideline for the analyzed organizations?
1.3 Research Approach
Information systems (IS) are implemented within an organization for the purpose of improving the
efficiency and effectiveness of that organization. Hence, the main goal of research in this field is to create
"knowledge that enables the application of information technology for managerial and organizational
purposes" (Hevner et al., 2004). According to (Hevner et al., 2004), two main paradigms characterize
research in the IS discipline: behavioural science and design science. Behavioural science aims to
develop and verify theories that explain or predict human or organizational behaviour. The design science
paradigm, on the other hand, seeks to extend the boundaries of human and organizational capabilities by
creating new and innovative artifacts. As discussed above, the main goal of our research is to develop a
DWCMM that depicts the maturity stages of a DW project, which can be used to assist organizations in
identifying their current maturity stage and evolving to a higher level. For this purpose, a design research
approach is used, as its main philosophy is to generate scientific knowledge by building and validating a
previously designed artifact (Hevner et al., 2004). In this research, the artifact is the DWCMM, which is
developed according to the seven design science guidelines stated by (Hevner et al., 2004) and to the five
steps in developing design research artifacts as described by (Vaishnavi & Kuechler, 2008):
- awareness of problem – this can come from multiple research sources. In our case, awareness of the
problem area was raised in discussions with DW/BI practitioners and in a literature study on data
warehousing and maturity modelling.
- suggestion – this is essentially a creative step wherein new functionality is envisioned based on a
novel configuration of either existing or new elements. Before the actual development of the
DWCMM, we performed a thorough literature study, proposed ideas and received suggestions
from experts regarding the components of the model and the relationships between them. We also
designed an outline framework of the model.
- development – this involves the actual implementation of the model using various techniques
depending on the artifact to be constructed. This stage is highly related to the previous one. In our
research, it involves the actual creation and presentation of the DWCMM with all its analyzed
categories and maturity stages.
- evaluation – this consists of evaluating the constructed artifact according to criteria that are always
implicit and frequently made explicit in the awareness step. According to (Hevner et al., 2004),
case studies have proved to be an appropriate evaluation method in design research. Therefore,
the validation phase in our case consisted of five expert interviews and four case studies. We
received a lot of feedback and suggestions from the expert interviews for improving the model.
Then, once we had redefined the model, we continued its validation within four organizations
following the case study approach of (Yin, 2009).
- conclusion – this phase is the finale of a specific research effort, when results are summarized,
conclusions are drawn and suggestions for further research are discussed.
The way in which our research fits within the IS Research Framework designed by (Hevner et al., 2004)
is depicted in the figure below.
Figure 1: IS Research Framework (adapted from (Hevner et al., 2004)).
As we adopted a design science approach for our study, the following structure was chosen for this thesis
document. We first provide some background information on the main concepts of the study in chapter 2.
We then present an overview of the design artifacts of this research in chapter 3. Chapters 4 and 5 offer a
detailed analysis of the main components of the model we developed. In chapter 6, results are presented
for the two evaluation activities of the model – expert interviews and case studies. Finally, in chapter 7,
conclusions are drawn and the limitations of this research are discussed, along with some points for
further research.
2 What are Data Warehousing and Business Intelligence?
In this section, the key background concepts of the study – data warehousing, business intelligence and
maturity modelling – will be summarized, and related work will be explored.
2.1 The Intelligent Organization
The ubiquitous complexity, speed of change and uncertainty of the present economic environment
confront organizations with enormous challenges (Schwaninger, 2001). However, for a long time
organizations worked in closed settings and saw themselves as fortresses with walls and boundaries that
limited their activities and influence (Choo, 1995). Nowadays, this static representation of organizations
has become a relic. Today's organizations are complex, open systems that cannot function in isolation
from the surrounding dynamic environment. As already discussed in the introduction, the external
environment is an important source of information (Aldrich & Mindlin, 1978) that organizations have to
gather and process very rapidly in order to maintain their competitive advantage (Choo, 1995). However,
information is nowadays being generated at an ever-increasing rate, which makes it very difficult for
companies to manage. Decision makers often find themselves facing an information overload problem,
making it very hard for them to identify the right information for decision purposes in the available time
(O'Reilly, 1980). This causes a so-called "information gap": the need for fast decision making on the one
hand, and the longer time needed to acquire the right information on the other (Tijsen et al., 2009). This
requires decision makers to utilize information management systems and analysis to support their
business decisions (Turban et al., 2007). This is where BI/DW can help. As depicted in figure 2, BI helps
narrow the information gap by shortening the time required to obtain relevant data and by efficient
utilization of the available time to apply information (Tijsen et al., 2009).
Figure 2: Information Gap (adapted from (Tijsen et al., 2009)).
In this way, organizations are not only information consumers, but also creators of information and
knowledge (Choo, 1995). This can help them understand and adapt very fast to the changes in their
business environment and maintain their competitive advantage. According to (Porter, 1985), an effective
competitive strategy requires a deep understanding of the relationship between the firm and its
environment. This understanding can be obtained by applying DW/BI as a competitive differentiator:
using DW/BI not only to get to know your own organization and customers, but also your competitors,
and thereby gathering competitive intelligence. As J. D. Rockefeller once said, "Next to knowing all
about your own business, the best thing to know about is the other fellow's business."
As can be seen in figure 3, there is a whole BI cycle, which starts with planning based on corporate needs;
continues with ethically collecting reliable information from valid sources; and then analyzing the data to
form intelligence in conjunction with strategic planning and market research. Finally, in order for the
intelligence to have value, it must be disseminated in a form that is clear and understandable (Thomas,
2001). BI is a rigorous process in which sources of information, including published information as well
as human sources, play a vital role. The BI process existed long before the development of computers
and knowledge database software, but those tools have allowed BI to deliver much greater value in the
decision-making process and in the way organizations sustain their competitive advantage.
Figure 3: The BI Cycle (adapted from (Thomas, 2001)).
In this way, organizations can become "intelligent" and stay ahead of change, which according to
(Drucker, 1999) is the only way of coping with change effectively. The main characteristics that
distinguish intelligent organizations are (Schwaninger, 2001):
- to adapt to change as a function of external stimuli;
- to influence and shape their environment;
- to find a new milieu, if necessary, or to reconfigure themselves virtuously with their environment;
- to make a positive net contribution to the viability and development of the larger environments
into which they are embedded.
2.2 Data-Information-Knowledge
As the terms data, information and knowledge have been used in the previous paragraphs and will be
used further in this thesis, we would like to give a short overview of each of them and the differences
between them. In everyday writing the distinction between data and information is not clearly made, and
they are often used interchangeably; the same applies to information and knowledge. However, many
scientists claim that data, information and knowledge are part of a sequential order (Zins, 2007). Data are
the raw material for information, and information is the raw material for knowledge. A well-known
representation of the relationships between the three concepts is the "DIKW (Data, Information,
Knowledge, Wisdom) Hierarchy". One of the versions of the hierarchy depicts it as a linear chain (Hey,
2004), as can be seen in figure 4. Not all versions of the DIKW model reference all four components
(earlier versions not including data, later versions omitting or downplaying wisdom), but the main idea is
the same. We will only elaborate on the first three concepts here as they are the most used and
acknowledged ones.
Figure 4: The Data-Information-Knowledge-Wisdom Hierarchy (adapted from (Hey, 2004)).
The distinctions and relationships between data, information and knowledge are elaborated in the
remainder of this section.
Data
Data has been given a variety of definitions, largely depending on the context of use. For example,
Information Science defines data as unprocessed information, while other domains treat data as a
representation of objective facts (Hey, 2004). According to (Ackoff, 1989), data is raw: it simply exists
and has no significance beyond its existence. Data is acquired from the external world through our senses
in the form of signals and signs. Much neural processing has to take place between the reception of a
stimulus and its sensing as data by an agent (Kuhn, 1974). In an organizational context, data is usually
described as structured records of transactions which are stored in a technology system for different
departments such as finance, accounting and sales (Davenport & Prusak, 2000). Data says nothing about
its own importance or irrelevance; nor does it provide judgement, interpretation or a sustainable basis for
action. However, it is important to organizations because it is the essential raw material for the creation
of information.
Information
Information is data that has been processed, interpreted and given meaning (useful or not) by way of
relational connection. It provides answers to "who", "what", "where" and "when" questions. In computer
parlance, a relational database makes information from the data stored within it (Ackoff, 1989).
According to (Boisot & Canals, 2004), information constitutes those significant regularities residing in
the data that agents attempt to extract from it. In order to interpret data into information, a system needs
knowledge. The meaning of terms may be different for different people, and it is our knowledge about
particular domains - and the world in general - that enables us to get meaning out of these data strings.
Hence, for data to become information, interpretation and elaboration processes are required (Aamodt &
Nygård, 1995). Computers can assist in transforming data into information, but they cannot replace
humans. Today's managers believe that having more information technology will not necessarily improve
the state of information. To sum up, we could say that "information is data that makes a difference"
(Davenport & Prusak, 2000).
Knowledge
Knowledge is broader, deeper and richer than data and information. It is the appropriate collection of
information, such that it is placed in a certain context and its intent is to be useful. Knowledge is a
deterministic process and it provides answers to "how" questions (Ackoff, 1989). According to
(Davenport & Prusak, 2000), knowledge is "a fluid mix of framed experience, values, contextual
information and expert insight that provides a framework for evaluating and incorporating new
experiences and information". As can be seen, knowledge is not neat or simple to obtain. It can be
considered both process and stock, and its creation takes place within and between people. Knowledge
allows us to act more effectively than information and data, as it gives us the ability to predict future
outcomes. We could say that knowledge in a practical sense is "value-added information" (Jashapara,
2004), which helps us make better and faster decisions.
2.3 The Origins of DW/BI
While the term BI is relatively new (dating from the early 1990s), computer-based BI systems go back, in
one form or another, more than forty years (Gray & Negash, 2003). Approaches to BI have thus evolved
over decades of technological innovation and management experience with IT.
The history of BI systems begins in the mid-1960s when researchers began systematically studying the
use of computerized quantitative models to assist in decision making and planning (Power, 2003).
(Ferguson & Jones, 1969) reported the first experimental study using a computer-aided decision system
by investigating a production scheduling application. At the same time, organizations were beginning to
computerize many of the operational aspects of their business. Information systems were developed to
perform operational applications such as order processing, billing, inventory control and payroll (Arnott
& Pervan, 2005). Once the importance of the data from the operational processes was acknowledged for
the decision-making process, the first Management Information Systems (MIS) were developed. Another
turning point in this field was the work of Morton, who, together with Gorry, defined the concept of
"Decision Support Systems" (DSS) (Gorry & Morton, 1971). They constructed a framework for
improving MIS and conceived DSS as systems that support any managerial activity in decisions that are
semi-structured or unstructured (Arnott & Pervan, 2005). The aim of DSS was to create an environment
in which the human
decision maker and the IT-based system worked together in an interactive way to solve problems, the
human dealing with the complex unstructured parts of the problem, and the information system providing
assistance by automating the structured elements of the decision situation.
The oldest form of DSS was the personal DSS: small-scale systems that were normally developed for one
manager, or a small number of independent managers, for one decision task. They effectively replaced
MIS as the management support approach of choice, and for around a decade they were the only form of
DSS in practice. Starting in the 1980s, many activities associated with building and studying DSS
occurred in universities and organizations, which resulted in an expanding scope of DSS applications.
This determined a broad historical progression and development of DSS into the following main
categories (Arnott & Pervan, 2005):
- Group DSS – a group DSS consists of a set of technology and language components, procedures,
and communication and information processing facilities that support a group of people engaged in
a decision-related process (Huber, 1984; Kraemer & King, 1988).
- Negotiation DSS – a negotiation DSS also operates in a group context, but, as the name suggests,
it involves the application of computer technologies to facilitate negotiations (Rangaswamy &
Shell, 1997). As group DSS developed, the need to provide electronic support for groups involved
in negotiation problems and processes evolved as a focused sub-branch with different
conceptual foundations to support those needs.
- Executive information systems (EIS) – an EIS is a data-oriented DSS that provides reporting
about the nature of an organization to management (Fitzgerald, 1992). Despite the 'executive'
title, they are used by all levels of management. EIS were enabled by technology improvements
in the mid to late 1980s, and by the mid 1990s EIS had become mainstream and an integral
component of the IT portfolio of any reasonably sized organization.
As the 1990s unfolded, we saw the emergence of Data Warehousing (DW) and Business Intelligence
(BI), which replaced EIS. We will focus on these two terms in the next paragraph in order to get a
better overview of the field that inspired our research.
2.4 DW/BI Definition
The term BI was first introduced by (Luhn, 1958) in his article "A Business Intelligence System". In his
view, BI was defined as "the ability to apprehend the interrelationships of presented facts in such a way
as to guide actions towards a desired goal". However, the term BI was coined in its modern sense and
popularized in the early 1990s by Howard Dresner (Gartner Group analyst). He described BI as a set of
concepts and methods to improve business decision making by using fact-based support systems (Power,
2003). In the last couple of years, a lot of attention has been paid to BI, and therefore many definitions
can be found in the literature. Some of the most representative ones are given here.
According to (Golfarelli et al., 2004), BI can be defined as the process of turning data into information
and then into knowledge. A similar view on BI is that of (Eckerson, 2007), who believes that BI
represents "the tools, technologies and processes required to turn data into information and information
into knowledge and plans that optimize business actions." Furthermore, (Gray & Negash, 2003) consider
that BI systems "combine data gathering, data storage and knowledge management with analytical tools
to present complex and competitive information to planners and decision makers". We can see from these
definitions that BI helps the decision-making process by efficiently and effectively transforming data into
knowledge through the use of different analytical tools.
Moreover, the concept of the DW dates back to the late 1980s, when IBM researchers (Devlin & Murphy,
1988) published their article "An Architecture for a Business and Information System" and introduced
the term "business data warehouse". However, DW technology and development became popular in
the 1990s after (Inmon, 1992) published his seminal book "Building the Data Warehouse". Furthermore,
the bull market of the 1990s led to a plethora of mergers and acquisitions and an increasing globalization
of the world economy. Large organizations were therefore faced with significant challenges in
maintaining an integrated view of their business. This was the environment that determined the increase
in development and usage of DWs (Arnott & Pervan, 2005).
Similar to BI, a lot of definitions can be found for the DW, but all of them start from the ways that Inmon
and Kimball (the creators of the two main schools of thought and practice within data warehousing)
defined it. (Inmon, 1992) defines the DW as a "subject-oriented, integrated, time-varying, non-volatile
collection of data that is used primarily in organizational decision making". (Kimball, 1996) offers a
much simpler definition of a DW which provides less insight and depth than Inmon's, but is no less
accurate. In his opinion, a DW is "a copy of transaction data specifically structured for query and
analysis" (Kimball, 1996). DWs are therefore targeted at decision support, as they collect information
about one or more business processes involved in the whole organization. The DW can be seen as a
repository that stores data gathered from many operational databases, and from which the information
and knowledge needed to effectively manage the organization emerge.
Typically, the DW is maintained separately from the organization's operational databases, as it supports
online analytical processing (OLAP) through a variety of front-end tools such as query tools, report
writers, and data mining and analysis tools. OLAP functional and performance requirements are quite
different from those of the online transaction processing (OLTP) applications traditionally supported by
the operational databases. The main differences between operational databases and DWs can be seen in
the table below.
Characteristics | Operational Databases | DWs
Source of data | Operational data; OLTP systems are the original source of data | Consolidated data; data comes from various OLTP databases
Number of sources | Few | Many
Size of sources | Gigabyte | Gigabyte-Terabyte
Data content | Current values | Archived, derived, summarized
Purpose of data | Control and run fundamental business tasks | Help planning, problem solving, prediction and decision support
Complexity of transactions | Simple | Complex
Kind of transactions | Static, predefined | Dynamic, flexible
Actuality | Current-valued | Current-valued & historical
Number of users/Frequency | High | Medium/Low
Table 1: Differences between operational databases and DWs (adapted from (Breitner, 1997)).
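To make the contrast in Table 1 concrete, the short sketch below runs an OLTP-style lookup and an OLAP-style aggregation against the same toy data set. It is only an illustration under our own assumptions: the sales schema and its table and column names are invented for this example and do not come from the thesis or from (Breitner, 1997).

```python
import sqlite3

# Toy data set: a single denormalized sales table standing in for both
# an operational store and a (very small) warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sales (
    order_id  INTEGER PRIMARY KEY,
    product   TEXT,
    region    TEXT,
    sale_date TEXT,
    amount    REAL
);
INSERT INTO sales VALUES
    (1, 'Laptop', 'North', '2010-01-15', 1200.0),
    (2, 'Laptop', 'South', '2010-02-03',  950.0),
    (3, 'Phone',  'North', '2010-02-20',  400.0),
    (4, 'Phone',  'South', '2010-03-11',  420.0);
""")

# OLTP-style access: a simple, predefined lookup of one current record,
# the kind of transaction an operational database is tuned for.
cur.execute("SELECT * FROM sales WHERE order_id = ?", (2,))
print(cur.fetchone())

# OLAP-style access: a dynamic aggregation over the full history,
# summarizing sales per product and region - the multi-dimensional
# analysis pattern a DW is structured for.
cur.execute("""
    SELECT product, region, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM sales
    GROUP BY product, region
    ORDER BY total DESC
""")
for row in cur.fetchall():
    print(row)

conn.close()
```

The first query touches a single current record through its key; the second scans and summarizes historical data along two dimensions, which is exactly the difference in access patterns that Table 1 describes.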
Now that we have defined both the BI and DW terms, we can see that there is some overlap, but also
some differences, between them. In the literature there has also been a debate regarding these two
concepts. Some authors believe that BI is the overarching term, with the DW being the central data store
foundation, whereas others refer to data warehousing as the overall concept, with the DW databases and
BI layers as subset deliverables (Kimball et al., 2008). As (Kimball et al., 2008) and (Inmon, 2005), two
of the most renowned figures in this field, believe that the DW is the foundation of BI, we will proceed
with this approach in the remainder of this thesis.
2.5 DW/BI Business Value
Since the development of a DW/BI environment is usually a very expensive endeavour, an organization
considering such an initiative needs a BI strategy and a business justification that shows the balance
between the costs involved and the benefits gained. A DW/BI initiative provides numerous benefits: not
only tangible benefits such as increased sales volume and profits, but also intangible benefits such as an
enhanced organizational reputation. Many of these benefits, especially the intangible ones, are difficult to
quantify in terms of monetary value. The real benefit of a DW/BI solution occurs when the created
knowledge is actionable. This means that an organization cannot just provide the information factory; it
must also have methods for extracting value from that knowledge. This is not a technical issue but an
organizational one. Identifying actionable knowledge is one thing, but taking the proper action requires a
nimble organization with individuals empowered to act. Hence, before embarking on building a DW/BI
environment, every included DW/BI activity should be accompanied by a strategy to gain business value
(Loshin, 2003).
Moreover, although the general benefits of DW/BI initiatives are widely documented, they cannot justify
a DW/BI project unless these benefits can be tied to the organization's specific business problems and
strategic business goals (Moss & Atre, 2003). Justification for a DW/BI initiative must always be
business-driven and not technology-driven. It is also very important for such an initiative to have support
from top-level management in order to be successful. Therefore, the DW/BI initiative as a whole, and the
proposed BI application specifically, should support the strategic business goals. Each proposed BI
application must reduce measurable business problems (i.e., problems affecting the profitability or
efficiency of an organization) in order to justify building the application.
Furthermore, the business representative should be primarily responsible for determining the business
value of the proposed DW/BI application. The information technology (IT) department can become a
solution partner with the business representative and can help explore the business problems and define
the potential benefits and costs of the DW/BI solution. IT can also help clarify and coordinate the
different needs of the varied groups of business users in order to develop a solution that will have a higher
rate of adoption.
With the business representative leading the business case assessment effort, IT staff can assist with the
four business justification components (Moss & Atre, 2003):
- Business drivers – Identify the business drivers, strategic business goals, and DW/BI application objectives. Ensure that the DW/BI solution objectives support the strategic business goals.
- Business analysis issues – Define the business analysis issues and the information needed to meet the strategic business goals by stating the high-level information requirements for the business.
- Cost-benefit analysis – Estimate the benefits and costs for building and maintaining a successful BI decision-support environment. Determine the return on investment (ROI) by assigning monetary value to the tangible benefits and highlighting the positive impact the intangible benefits will have on the organization (a small illustrative calculation follows below).
- Risk assessment – Assess the risks in terms of technology, complexity, integration, organization, project team, user adoption and financial investment.
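To make the cost-benefit component concrete, the ROI computation mentioned above can be illustrated with a minimal sketch. All figures and category names below are hypothetical assumptions, not numbers from (Moss & Atre, 2003); only tangible benefits are monetized, while intangible benefits are kept aside for qualitative discussion.

```python
# Illustrative sketch: simple one-period ROI for a DW/BI business case.
# All figures are hypothetical.

tangible_benefits = {
    "increased sales volume": 250_000,    # estimated annual gain (EUR)
    "reduced reporting effort": 80_000,
}
costs = {
    "hardware and licenses": 120_000,
    "development and maintenance": 150_000,
}
intangible_benefits = ["enhanced reputation", "better decision quality"]

total_benefits = sum(tangible_benefits.values())
total_costs = sum(costs.values())
roi = (total_benefits - total_costs) / total_costs  # simple ROI formula

print(f"ROI: {roi:.1%}")  # 22.2% with the figures above
for item in intangible_benefits:
    print(f"Intangible (not monetized): {item}")
```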
As can be seen from this section, it is very important to have alignment between the business goals of an organization and the technical DW/BI solution. The business side has to be the driver for building the technical application. However, due to time constraints and the fact that there are previous DW/BI maturity models that already focus on the business side of the problem (Watson et al., 2001; Eckerson, 2004; Hostmann, 2007), in this thesis we will focus on the technical aspects and the organizational processes and roles involved in developing a DW/BI solution. Once the business goals and strategy are clearly defined, it all comes down to being able to develop and maintain a solid technical solution.
2.6 Inmon vs. Kimball
As mentioned in the sections above, there are two fundamentally different approaches to data warehousing: enterprise-level data warehouses (Inmon, 1992) and division- or department-level data marts (Kimball, 1996). Understanding the basics of the architecture and methodology of both models provides a good foundational knowledge of data warehousing. Based on this and an organization's specific needs, architects can then choose between Inmon's, Kimball's or a hybrid architectural model.
Inmon sees the DW as part of a much larger information environment, which he calls the Corporate Information Factory (CIF). To ensure that the DW fits well in this larger environment, he advocates the construction of both an atomic DW and departmental databases. Inmon's approach stresses top-down development using adaptations of proven database methods and tools. He proposes a three-level data model (Breslin, 2004). The first level is represented by entity relationship diagrams (ERDs); the second level establishes the data item set (DIS) for each department; and the third level is the physical model, created "by merely extending the mid-level data model to include keys and physical characteristics" (Inmon et al., 2005). As can be seen, Inmon's approach is evolutionary rather than revolutionary. His tools and methods can actively be used only by IT professionals, whereas end users have a more passive role in the DW development, mostly receiving the results generated by the IT professionals.
On the other hand, Kimball proposes a bottom-up approach: first building one data mart per business process and then creating the organization's DW as the sum of all data marts. The interoperability between the various data marts is ensured by the data bus, which requires that all data marts are modeled within consistent standards called conformed dimensions. Kimball proposes a unique four-step dimensional design process that consists of: selecting the business process; declaring the grain (i.e.: the level of detail) of the DW; choosing the dimensions; and identifying the facts. Fact tables contain metric data, while dimension tables provide the context of the facts and qualify the data. Dimensional modelling has a series of advantages such as understandability, query performance and extensibility to accommodate new data (Kimball et al., 2008). Dimensional modelling tools can be used by end users with some special training, which ensures the active involvement of end users in the development of the DW (Breslin, 2004).
Inmon's and Kimball's models are similar in some ways, such as the treatment of the time attribute or the extract-transform-load (ETL) process, but they are also very different regarding other aspects such as the development methodologies and architecture, data modelling and philosophy. A summary of these differences is depicted in table 2, adapted from (Breslin, 2004).
Methodology and Architecture | Inmon | Kimball
Overall approach | Top-down | Bottom-up
Architectural structure | Enterprise-wide (atomic) data warehouse is the foundation for data marts | Data marts model a single business process; enterprise consistency is achieved through the data bus and conformed dimensions
Complexity of the method | Quite complex | Fairly simple
Comparison with established development methodologies | Derived from the spiral methodology | Four-step process, inspired by relational database methods
Consideration of the physical design | Fairly thorough | Fairly light

Data Modelling | Inmon | Kimball
Data orientation | Subject- or data-driven | Process-oriented
Tools | Traditional (ERDs, DISs) | Dimensional modelling
End-user acceptability | Low | High

Philosophy | Inmon | Kimball
Primary audience | IT professionals | End users
Place in the organization | Integral part of the Corporate Information Factory (CIF) | Transformer and retainer of operational data
Objective | Deliver a sound technical solution based on proven database methods and technologies | Deliver a solution that makes it easy for end users to directly query the data and get good response times

Table 2: Comparison of Essential Features of Inmon's and Kimball's Data Warehouse Models (Breslin, 2004).
The model that we are developing in this thesis can be applied to both Inmon's and Kimball's conceptual views on DW development, but there are some specific aspects in the data modelling assessment that are limited to dimensional modelling. The reasons for this include both time constraints and the fact that most of the DWs developed in practice make use of this technique, especially for data marts and for models presented to users. For more information on the two data modelling techniques, see 4.2.2 and 4.2.3.
2.7 Maturity Modelling
As the main goal of our research is to develop a Data Warehouse Capability Maturity Model, we will now give an overview of the subject of maturity modelling and take a look at the maturity models that served as a source of inspiration for our endeavour.
In today's highly competitive environment, it is very important for organizations to be aware of their current situation and to know the steps they need to take for continuous improvement. This requires positioning the company with regard to its IT capabilities and the quality of its products and processes. This positioning usually involves a comparison with the company's goals, external requirements (e.g.: customer demands, laws or guidelines), or benchmarks. However, an objective assessment of a company's position often proves to be a difficult task. Maturity models can be helpful in this situation. They essentially describe the development of an entity over time, where the entity can be anything of interest: a human being, an organizational function, an organization, etc. (Klimko, 2001).
Maturity models can be used as an evaluative and comparative basis for organizational improvement (de Bruin et al., 2005), and to derive an informed approach for increasing the capability of a specific area within an organization (Hakes, 1996). They usually have a number of sequentially ordered levels, where the bottom stage stands for an initial state that can be characterized, for example, by an organization
having few capabilities in the domain under consideration. In contrast, the highest stage represents a concept of total maturity. Advancing on the evolution path between the two extremes involves a continuous progression of the organization's capabilities or process performance. The maturity model serves as an assessment of the position on this evolution path, as it offers a set of criteria and characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009).
During a maturity appraisal, which can be done with predetermined procedures such as questionnaires, a snapshot of the organization regarding the given criteria is taken (i.e.: a descriptive model). Based on the results of this as-is analysis, recommendations for improvement measures can be derived and prioritized in order to reach higher maturity levels (i.e.: a prescriptive model). Then, once the model has been applied in a wide range of organizations, similar practices across organizations can be compared in order to benchmark maturity across disparate industries (i.e.: a comparative model).
2.7.1 Data Warehouse Capability Maturity Model (DWCMM) Forerunners
Studies have shown that more than one hundred and fifty maturity models have been developed (de Bruin et al., 2005), but only some of them have managed to gain global acceptance. Also, there are several information technology and/or information system maturity models dealing with different aspects of maturity: technological, organizational and process maturity. The most important maturity models that served as a source of inspiration for our research are listed in table 3 and briefly presented in the following paragraphs.
Authors | Model | Focus
Nolan (1973) | Stages of Growth | IT Growth Inside an Organization
Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software Development Processes
Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data Warehousing
Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business Intelligence
The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business Intelligence
Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business Intelligence and Performance Management
Table 3: Overview of Maturity Models.
Nolan’s Stages of Growth
First, one of the most widely used concepts in organizational and IS research is the "stages of growth". The fundamental belief is that many things change over time in sequential, predictable ways. The stages of growth are commonly depicted graphically using an S-shaped curve, where the turnings of the curve mark important transitions. The number of stages varies with the phenomena under investigation, but most models have between three and six stages (Watson et al., 2001). One of the most famous "stages of growth" maturity models is Richard Nolan's, published in (Nolan, 1973). The model has been widely recognized and used by practitioners and researchers alike. It is based on companies' spending on electronic data processing, but it can be generalized to the overall adoption of IT in an organization. Nolan's initial model describes four distinct stages: initiation, expansion, formalization and maturity. In 1979, Nolan transformed the original four-stage model into a six-stage model by adding two new stages:
integration and data administration, placed between the stages of formalization and maturity. For a more detailed and also critical analysis of the Nolan curve, see (Galliers & Sutherland, 1991).
Capability Maturity Model (CMM)
The second classical maturity model is the Capability Maturity Model (CMM) developed at the end of the
eighties by Watts Humphrey and his team from the Software Engineering Institute (SEI) at Carnegie
Mellon University. The CMM is a framework that describes the key elements of an effective software
process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature,
disciplined one. It covers practices for planning, engineering and managing software development and
maintenance. The components of the CMM include (Paulk et al., 1995):
- five maturity levels – initial, repeatable, defined, managed and optimizing;
- process capabilities – describe the range of expected results that can be achieved by following a software process;
- key process areas – components of the maturity levels that identify a cluster of related activities that, when performed collectively, achieve a set of goals considered important for establishing process capability at that maturity level;
- goals – summarize the key practices of a key process area;
- common features – indicate whether the implementation and institutionalization of a key process area is effective, repeatable, and lasting;
- key practices – each key process area is described in terms of key practices that, when implemented, help to satisfy the goals of that key process area.
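As a reading aid, the hierarchical relationship between these components can be sketched as a simple data structure. The level and key-process-area names below follow (Paulk et al., 1995), while the structure itself is our own illustrative simplification, not part of the CMM specification.

```python
from dataclasses import dataclass, field

@dataclass
class KeyProcessArea:
    """A cluster of related activities within one maturity level."""
    name: str
    goals: list[str] = field(default_factory=list)
    key_practices: list[str] = field(default_factory=list)

@dataclass
class MaturityLevel:
    """One of the five CMM levels, holding its key process areas."""
    number: int
    name: str
    key_process_areas: list[KeyProcessArea] = field(default_factory=list)

# Level 2 ("repeatable") with one of its key process areas, as an example.
level2 = MaturityLevel(
    number=2,
    name="repeatable",
    key_process_areas=[
        KeyProcessArea(
            name="software project planning",
            goals=["estimates are documented for use in planning and tracking"],
        )
    ],
)
```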
Furthermore, a number of maturity models have been developed for assessing the maturity of BI and DW solutions.
Data Warehousing Stages of Growth
The "Data Warehousing Stages of Growth" model was adapted from Nolan's growth curve. It includes three stages that describe the evolution of DWs:

- initiation – the initial version of the warehouse;
- growth – the expansion of the warehouse;
- maturity – the warehouse becomes fully integrated into the company's operations;

and nine variables that describe the different stages: data, architecture, stability of the production environment, warehouse staff, users, impact on users' skills and jobs, applications, costs and benefits, and organizational impact (Watson et al., 2001). However, this model has its limitations: as it is a generalization, it does not perfectly describe every company's experiences. Also, the model is several years old, and new developments have occurred in the meantime that point to additional stages.
BI Maturity Model (biMM)
Another interesting model is the "Business Intelligence Maturity Model" developed by (Chamoni & Gluchowski, 2004); however, both the model and the paper are in German, which makes it rather difficult for non-German speakers to understand its content. It comprises five levels of evolutionary BI
development, analyzed from three perspectives: business content, technology and organizational impact. Different aspects of these perspectives are recorded and evaluated for each of the five stages. The model has been applied in different organizations in order to perform BI benchmarking in specific industrial sectors and offer general strategic recommendations.
TDWI’s BI Maturity Model
Another well-known BI maturity model is the one developed by The Data Warehousing Institute (TDWI) (Eckerson, 2004). It is a six-stage model that shows the trajectory that most organizations follow when evolving their BI infrastructure. The maturity stages are: prenatal, infant, child, teenager, adult and sage. They are defined by a number of characteristics including scope, analytic structure, executive perceptions, types of analytics, stewardship, funding, technology platform, change management and administration. In 2009, TDWI published a poster with a more complex BI maturity model, which can be considered a generalization of multiple BI projects and implementations indicating certain patterns of behaviour based on five different aspects: BI adoption, organization control and processes, usage, insight and return on investment (ROI). In order to add more value to the model, TDWI also created an assessment questionnaire with questions on funding, value, architecture, data, development and delivery that can be filled in by different organizations to enable BI benchmarking.
Gartner’s BI and Performance Management (PM) Maturity Model
The last model that we would like to present is the Gartner Group's BI and Performance Management (PM) Maturity Model (Hostmann, 2007). This model helps an organization understand its current position with regard to BI and what it needs to do to move to the next level. Gartner bases its maturity curve on the real-world phenomenon that organizational change is usually incremental over time and proposes five maturity stages: unaware, tactical, focused, strategic and pervasive. An important finding in their analysis is that one characteristic was more likely than any other to indicate whether an organization is capable of operating at the higher levels of BI/PM maturity: the implementation of a BI Competency Center (BICC), or the lack thereof. A BICC is a group of business, IT, and information analysts who work together to define BI strategies and requirements for the entire organization.
2.8 Summary
This chapter has presented the key background concepts for our thesis – data warehousing, business intelligence and maturity modelling. We first talked about the "intelligent organization" and how DW/BI solutions can help companies improve their performance. Then, we gave a short overview of DW/BI evolution and defined the two concepts. Emphasis was then put on the fact that a DW/BI initiative must always be business-driven and not technology-driven in order to be successful. We continued by presenting the two main conceptual approaches to data warehousing (i.e.: Inmon vs. Kimball). Finally, we provided some information on maturity modelling and the main maturity models that served as a foundation for the artifact we designed. We will continue with an overview of the model we developed in chapter 3.
3 DWCMM: The Data Warehouse Capability Maturity Model
This section describes in detail the deliverables proposed as a solution to the research problem.
3.1 From Nolan’s Stages of Growth to the Data Warehouse Capability Maturity Model
As presented in the previous sections, many maturity models have been developed for different fields, and several have been proposed for the field of DW/BI. Each of them has a different way of assessing maturity, but there are some elements common to all the models.
First of all, Nolan's "stages of growth" was a breakthrough in organizational and IS research (Nolan, 1973). It shows the growth and evolution of information technology (IT) in a business or similar organization from stage 1, called "initiation", to the last stage, called "maturity". The second maturity model, which was actually the starting point for this thesis, is the CMM (Paulk et al., 1995). It has become a recognized standard for rating software development organizations: a framework that describes the key elements of an effective software process and presents an evolutionary improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development, the CMM has become a universal model for assessing software process maturity. Therefore, we decided to use it as the main foundation for our model. However, the CMM has often been criticized for its complexity and difficulty of implementation. That is why we simplified it, keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process capabilities and the key process areas, which in our model translate to the chosen benchmark variables/categories for the DW maturity assessment.
As DW/BI is widely applied in practice, several maturity models have been developed especially for this field, as already presented. One of the most recent and well-known models is the one developed by the TDWI (Eckerson, 2004). Another interesting model is Gartner's BI and PM maturity model (Hostmann, 2007). They both show the trajectory that most organizations follow when evolving their BI or PM infrastructure. However, even though both models are interesting, they are not supported by scientific literature, and they focus more on the business side of BI implementation than on the technical aspects of a DW project. Furthermore, even though the other two models, the DW stages of growth (Watson et al., 2001) and the BI maturity model (Chamoni & Gluchowski, 2004), have more scientific roots, they have their deficiencies. As mentioned before, the latter is in German, whereas the former is several years old and new developments have occurred in the meantime that point to additional stages. Although both models cover more of the variables involved in DW/BI development, they do not go deep into analyzing the technical aspects of a DW solution.
Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many maturity models have been created, none actually focuses on the technical aspects of the DW/BI solution and the organizational processes that sustain them. Hence, this is the research gap we would like to fill by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the DW technical solution and DW organization and processes. A short overview of the model will be given in the next section, and more details on each component will be given in the upcoming chapters.
3.2 DWCMM
Using the CMM as the main foundation, together with the other maturity models described above and a thorough and extensive literature study, we developed the DWCMM, which is depicted in figure 5.
Figure 5: Data Warehouse Capability Maturity Model (DWCMM).
When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the
current moment in time. Therefore, in order to do a valuable assessment, it is important to include in the
maturity analysis the most representative dimensions involved in the development of a DW solution.
Several authors describe the main phases usually involved in a DW project lifecycle as (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements definition, design, development, testing and acceptance, deployment, and growth and maintenance. All of these phases and processes refer to the implementation and maintenance of the actual DW technical solution, which includes: general architecture and infrastructure, data modelling, ETL, and BI applications. These categories can be analyzed from many points of view, which will be reflected in our model and the maturity assessment we developed. Therefore, as the DWCMM is restricted to assessing the technical aspects, without taking into consideration DW/BI usage and adoption or DW/BI business value, it considers two main benchmark variables/categories for analysis, each of them having several sub-categories:
- DW Technical Solution
  - General Architecture and Infrastructure
  - Data Modelling
  - Extract-Transform-Load (ETL)
  - BI Applications
- DW Organization & Processes
  - Development Processes
  - Service Processes.
In order to be able to do the assessment for each of the chosen categories and sub-categories, we also developed a DW maturity assessment questionnaire. It is important to emphasize that the questionnaire does a high-level assessment of an organization's DW solution and is limited strictly to the DW technical aspects. Emphasis should also be put on the fact that the model assesses "what" and "whether" certain characteristics and processes are implemented, not "how" they are implemented. It is a practical instrument, as it takes less than an hour to fill in the questions that will be scored, and it is addressed to someone from the DW team who has knowledge and experience in all the categories included in the DWCMM (e.g.: DW technical architect, BI project manager, BI manager, BI consultant, etc.). However, although it may be tempting to use the scores from the assessment questionnaire as a definitive statement of the organization's DW maturity, this should be avoided. The maturity score is a rough gauge that merely scratches the surface of most DW projects. That is why the maturity assessment we developed should serve as a starting point: to truly assess the technical maturity and discover the areas of strength and weakness, organizations should perform a more thorough analysis for each benchmark category.
The DW maturity assessment questionnaire has 60 questions divided into the following three categories:

- DW General Questions (9 questions) – comprises several questions about the DW/BI solution that are not scored. Their purpose is to offer a better picture of the drivers for implementing the DW environment, the budget allocated for data warehousing and BI, the DW business value, end-user adoption, etc. This is useful in creating a complete picture of the current DW solution and its maturity. Also, once the questionnaire has been filled in by more organizations, this data will serve as input for statistical analysis and comparisons between organizations from the same industry or across industries. The questions from this category can be seen in the table below.
1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource - Tools to assist decision making
c) Mission-critical resource - A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) What percentage of the IT department is taking care of BI (i.e.: how many people from the total number of IT
employees)?
7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the
invoice)?
8) Which technologies do you use for developing the BI/DW solution in your organization (i.e.: for data
modelling, ETL, BI applications, database)?
9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized modelling, data vault, etc.)?
Table 4: DW General Questions.
- DW Technical Solution (32 questions) – comprises several scored questions for each of the following sub-categories:
  - General Architecture and Infrastructure (9 questions)
  - Data Modelling (9 questions)
  - ETL (7 questions)
  - BI Applications (7 questions). More details on this part will be given in chapter 4.
- DW Organization & Processes (19 questions) – comprises several scored questions for each of the following sub-categories:
  - Development Processes (11 questions)
  - Service Processes (8 questions). More details on this part will be given in chapter 5.
The whole DW maturity assessment questionnaire is shown in appendix B. Each question has five possible answers, scored from 1 to 5, where 1 characterizes the lowest maturity stage and 5 the highest. When an organization takes the survey, it will receive:

- a maturity score for each sub-category, computed as the average of the answer weightings (i.e.: sum of the weightings / number of questions);
- an overall score for each of the two main categories, computed as the average of the scores obtained for its sub-categories;
- an overall maturity score, computed following the same principle applied to the two main category scores.
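As an illustration of this scoring scheme, a minimal sketch is given below. The answer data is hypothetical, but the three averaging steps follow the description above, and the question counts per sub-category match the questionnaire.

```python
# Minimal sketch of the DWCMM scoring scheme; all answers are hypothetical.
# Each answer is a score from 1 (lowest maturity) to 5 (highest).

answers = {
    "DW Technical Solution": {
        "Architecture":    [3, 4, 2, 3, 4, 3, 2, 4, 3],
        "Data Modelling":  [2, 3, 3, 2, 4, 3, 3, 2, 3],
        "ETL":             [4, 3, 4, 4, 3, 4, 3],
        "BI Applications": [2, 2, 3, 3, 2, 3, 2],
    },
    "DW Organization & Processes": {
        "Development Processes": [3, 3, 4, 3, 2, 3, 4, 3, 3, 2, 3],
        "Service Processes":     [2, 2, 3, 2, 3, 2, 2, 3],
    },
}

def average(values):
    return sum(values) / len(values)

# Step 1: maturity score per sub-category (sum of weightings / nr. of questions).
sub_scores = {cat: {sub: average(vals) for sub, vals in subs.items()}
              for cat, subs in answers.items()}

# Step 2: overall score per main category (average of its sub-category scores).
cat_scores = {cat: average(list(subs.values())) for cat, subs in sub_scores.items()}

# Step 3: overall maturity score (average of the two main category scores).
overall = average(list(cat_scores.values()))

print(sub_scores, cat_scores, f"overall = {overall:.2f}", sep="\n")
```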
We believe that the maturity scores for the sub-categories give a good overview of the current DW solution implemented by the organization. This is the reason why, after computing the maturity scores for each sub-category, a radar graph such as the one depicted in figure 5 is drawn to show the alignment between these scores. In this way, the organization will have a clearer image of their current DW project and will know which sub-category is the strongest and which one lags behind.
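Such a radar graph can be drawn, for instance, with matplotlib; the scores below are hypothetical and the sketch is only meant to show the plotting idea, not the exact chart used in this research.

```python
import math
import matplotlib.pyplot as plt

# Hypothetical sub-category maturity scores (1-5).
labels = ["Architecture", "Data Modelling", "ETL",
          "BI Applications", "Development Processes", "Service Processes"]
scores = [3.1, 2.8, 3.6, 2.4, 3.0, 2.4]

# One angle per sub-category; repeat the first point to close the polygon.
angles = [n / len(labels) * 2 * math.pi for n in range(len(labels))]
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # scores range from 1 (initial) to 5 (optimized)
plt.show()
```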
An important point here is that the answer options are usually presented in a mixed order to obtain a more unbiased result. Some questions have their answers given in hierarchical order because, in order to reach a higher maturity level, the organization should already have implemented the requirements of the previous stages. As will be seen in the validation chapters, the model was tested in several organizations with all the answers mixed. However, this created confusion for some of the questions (especially the ones from the service processes part), and therefore we decided to keep some answers in hierarchical order, assuming that every respondent wants to get a fair result and will not give biased answers.
After reviewing the maturity scores and the answers given by a specific organization, some general feedback and advice for future improvements is provided. Each organization that takes the assessment receives a document with a short explanation of the scoring method, a table with their maturity scores and the radar graph, and then some general feedback that consists of:

- a general overview of the maturity scores;
- an analysis of the positive aspects already implemented in the DW solution;
- several steps that the organization should take in order to improve their current DW application.
A template of this document can be seen in appendix F. Moreover, as our model measures the maturity of a DW solution, we also created two maturity matrices – a condensed maturity matrix and a detailed one – each of them having five maturity stages, as inspired by the CMM:

- Initial (1)
- Repeatable (2)
- Defined (3)
- Managed (4)
- Optimized (5),
where the initial stage describes an incipient DW development and the optimized level shows a very mature solution, obtained by an organization with a lot of experience in the field, where everything is standardized and monitored. An organization is situated exactly on one of these stages if its score matches the number of the stage (i.e.: 1, 2, 3, 4 or 5), and somewhere in between otherwise. However, this mapping is only approximate.
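The mapping from a numeric maturity score to a stage label can be sketched as follows; the "in between" rule below is our illustrative reading of the description above, not a prescription of the model.

```python
STAGES = {1: "Initial", 2: "Repeatable", 3: "Defined", 4: "Managed", 5: "Optimized"}

def stage_label(score: float) -> str:
    """Map a maturity score (1-5) to a stage label.

    A whole-number score matches a stage exactly; otherwise the
    organization sits between two stages (illustrative rule only).
    """
    if score.is_integer():
        return STAGES[int(score)]
    lower, upper = int(score), int(score) + 1
    return f"between {STAGES[lower]} and {STAGES[upper]}"

print(stage_label(3.0))   # Defined
print(stage_label(3.4))   # between Defined and Managed
```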
The condensed DW maturity matrix gives a short overview of the most important characteristics of each sub-category at each maturity level. This offers a better image of the main goal of the DWCMM and of what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 6.
DW Technical Solution

Architecture:
- Initial (1) – Desktop data marts
- Repeatable (2) – Independent data marts
- Defined (3) – Independent data warehouses
- Managed (4) – Central DW with/without data marts
- Optimized (5) – DW/BI service that federates a central DW and other sources via a standard interface

Data Modelling:
- Initial (1) – No data model synchronization or standards
- Repeatable (2) – Manually synchronized data models
- Defined (3) – Manually or automatically synchronized data models
- Managed (4) – Automatic synchronization of most data models
- Optimized (5) – Enterprise-wide standards and automatic synchronization of all the data models

ETL:
- Initial (1) – Simple ETL with no standards that just extracts and loads data into the DW
- Repeatable (2) – Basic ETL with simple transformations
- Defined (3) – Advanced ETL (e.g.: slowly changing dimensions manager, data quality system, reusability, etc.)
- Managed (4) – More advanced ETL (e.g.: hierarchy manager, special dimensions manager, etc.)
- Optimized (5) – Optimized ETL for a real-time DW with all the standards defined

BI Applications:
- Initial (1) – Static and parameter-driven reports
- Repeatable (2) – Ad-hoc reporting; OLAP
- Defined (3) – Dashboards & scorecards
- Managed (4) – Predictive analytics; data & text mining
- Optimized (5) – Closed-loop & real-time BI applications

DW Organization & Processes

Development Processes:
- Initial (1) – Ad-hoc, non-standardized development processes with no defined phases
- Repeatable (2) – Some development process policies and procedures established, with some phases separated
- Defined (3) – Standardized development processes with all the phases separated and all the roles formalized
- Managed (4) – Quantitative development process management
- Optimized (5) – Continuous development process improvement

Service Processes:
- Initial (1) – Ad-hoc, non-standardized service processes
- Repeatable (2) – Some service process policies and procedures established
- Defined (3) – Standardized service processes with all the roles formalized
- Managed (4) – Quantitative service process management
- Optimized (5) – Continuous service process improvement

Figure 6: DWCMM Condensed Maturity Matrix.
However, as already mentioned, the detailed DW maturity matrix, which can be seen in appendix A, is more important. We will give a short overview of it in this section.
First, the characteristics for each maturity stage are usually obtained by mapping the corresponding answers of each question from the maturity assessment questionnaire (except for several characteristics, such as project management and testing and acceptance, whose answers are formulated in a different way). In this way, an organization is able to see its maturity stage by category (e.g.: architecture) and by main category characteristics (e.g.: metadata, standards, infrastructure, etc.). The matrix has two dimensions:
- columns – show each benchmark sub-category (i.e.: Architecture, Data Modelling, ETL, BI Applications; Development Processes, Service Processes) with their maturity stages from Initial (1) to Optimized (5);
- rows – show the main analyzed characteristics (e.g.: for Architecture – conceptual architecture, business rules, metadata, security, data sources, performance, infrastructure, update frequency) for each sub-category, divided by maturity stage.
The matrix can be interpreted in two ways:
1) Take each stage and see what the specific characteristics are for each sub-category at that particular stage.
2) Take each sub-category and see what its specific characteristics are for each stage or for a particular stage.
As the questionnaire does an assessment for each benchmark sub-category, a specific organization will most likely follow the second interpretation. They will probably want to know what steps to take in order to improve each sub-category and hence the overall maturity score, which leads to a higher maturity stage. It is also very unlikely that an organization will have, at the same moment in time, all the characteristics of all the sub-categories at the same maturity stage. Moreover, the first interpretation should not be followed too strictly: after all, this is only a model, and the mapping between theory and reality is not perfect. Therefore, if a company gets a maturity score of 3, this does not mean that all the characteristics of all the sub-categories are at stage three. Depending also on the
standard deviation and the answers themselves, we can find out more about the actual situation. This is why we believe that the second interpretation is more useful, and we will exemplify it below for general architecture and infrastructure.
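Before turning to that example, the remark about the standard deviation can be made concrete with a short sketch: a high spread among the answers within a sub-category signals that a single average hides very uneven maturity. The threshold used below is an illustrative assumption, not part of the model.

```python
import statistics

def describe(sub_category: str, answers: list[int]) -> str:
    """Summarize a sub-category by the mean and spread of its answers (1-5)."""
    mean = statistics.mean(answers)
    spread = statistics.stdev(answers)
    note = ("uneven maturity, inspect individual answers"
            if spread > 1.0 else "consistent answers")
    return f"{sub_category}: mean={mean:.2f}, stdev={spread:.2f} ({note})"

print(describe("Architecture", [1, 5, 2, 5, 1, 4, 2, 5, 3]))
print(describe("ETL", [3, 3, 4, 3, 3, 4, 3]))
```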
The main characteristics for general architecture and infrastructure evaluated in our model are: conceptual
architecture, business rules, metadata management, security, data sources, infrastructure, performance,
and update frequency.
The maturity stages for conceptual architecture have the following structure:
- Initial (1) – desktop data marts (e.g.: Excel sheets)
- Repeatable (2) – multiple independent data marts
- Defined (3) – multiple independent data warehouses
- Managed (4) – a single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
- Optimized (5) – a DW/BI service that federates a central DW and other data sources via a standard interface.
Therefore, if an organization scores 3 for this specific characteristic, we would advise it to reconsider its architecture and perhaps go one step further and implement a single, central DW. In this way, it could reach maturity stage four for this characteristic, which would be a first step towards a higher overall maturity score. The same interpretation applies to any other architecture characteristic or to any other benchmark category.
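This "advise the next step up the ladder" interpretation can itself be sketched in code. The stage descriptions below are taken from the list above, while the helper function is only an illustration of the reasoning, not a tool used in this research.

```python
# Maturity ladder for the "conceptual architecture" characteristic
# (stage descriptions taken from the list above).
CONCEPTUAL_ARCHITECTURE = {
    1: "desktop data marts (e.g.: Excel sheets)",
    2: "multiple independent data marts",
    3: "multiple independent data warehouses",
    4: "a single, central DW with multiple or conformed data marts",
    5: "a DW/BI service federating a central DW and other data sources",
}

def next_step(characteristic: dict[int, str], score: int) -> str:
    """Recommend the next stage up the ladder for one characteristic."""
    if score >= max(characteristic):
        return "Already at the highest stage for this characteristic."
    return (f"Current stage {score}: {characteristic[score]}. "
            f"Advised next step: {characteristic[score + 1]}.")

print(next_step(CONCEPTUAL_ARCHITECTURE, 3))
```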
At the same time, one could say that in order to be at maturity stage 3, an organization should have (more or less) the following characteristics implemented for architecture:

- Conceptual architecture – multiple independent data marts.
- Business rules – some business rules defined or implemented.
- Metadata management – central metadata repository separated by tools.
- Security – independent authorization for each tool, etc.
Now that we have offered an overview of the DWCMM, we will present the DWCMM, the DW maturity assessment questionnaire and the matrix more thoroughly in the next chapters. For this, we will elaborate on each category and sub-category of the DWCMM and present the characteristics and questions we chose in order to assess the maturity of each benchmark variable. In chapter 4 we focus on the DW Technical Solution maturity, and we continue in chapter 5 with the DW Organization & Processes part.
4 DW Technical Solution Maturity
The main elements of the DWCMM having been identified, it is now time to elaborate on each part of the maturity assessment questionnaire and present our arguments for the choice of questions for each sub-category of the DWCMM. We start with the components of the DW technical solution – general architecture and infrastructure, data modelling, ETL, BI applications – in this chapter and continue with the DW organization and processes in the next one.
4.1 General Architecture and Infrastructure
We already talked about what a DW is and the most common approaches to develop one in the previous
chapters. In this section, we would like to analyze the most important elements that need to be considered
when assessing the maturity of DW general architecture and infrastructure (this benchmark variable was
initially called ―architecture‖; see 6.1.1 for more details on changing the name). Depending on this, we
will also define the most representative questions for architecture included in the maturity assessment
questionnaire.
4.1.1 What is Architecture?
Architecture as a general term refers to a blueprint that allows communication, planning, maintenance,
learning, and reuse (Sen & Sinha, 2005). According to (Kimball et al., 2008), the architecture of a DW
consists of three major pieces: data architecture - organizes the data and defines the quality and
management standards for data and metadata; application architecture – the software framework that
controls the movement of data from source to user; and technical architecture - the underlying computing
infrastructure that enables the data and application architectures.
The whole architecture is divided into two parts (Kimball et al., 2008):

- the back room, where the data modelling and the ETL process take place, and
- the front room, which refers to the BI applications and services.
Besides these three main components (i.e.: data modelling, ETL, BI applications), architecture also
includes underlying elements such as infrastructure, metadata and security that support the flow of data
from the source systems to the end-users (Kimball et al., 2008; Chauduri & Dayal, 1997). At the same
time, architecture refers to the major data storage components – source systems, data staging area, data
warehouse database, operational data store, data marts – and the way they are assembled together
(Ponniah, 2001). This is connected to the conceptual approach of designing and building the DW (e.g.:
conformed data marts – Kimball or enterprise-wide DW – Inmon, etc.). Therefore, in this thesis we
consider architecture as a separate category for assessing maturity in which we include questions
regarding: conceptual architecture and its layers, infrastructure, metadata management, security
management, update frequency, business rules, performance optimization. We will elaborate on each of
these elements and, at the same time, present the questions related to these elements that we included in
our maturity questionnaire.
4.1.2 Conceptual Architecture and Its Layers
In this section we will present a typical DW architecture which usually contains several data storage
layers such as source systems, data staging area, data warehouse database, operational data store, data
marts. It is not mandatory for all these elements to be part of the architecture. A typical DW architecture
can be seen in figure 7.
Figure 7: A Typical DW Architecture (adapted from (Chaudhuri & Dayal, 1997)).
Source Systems
The first component of a DW is represented by the source systems without which there would be no data.
They provide the input into the solution and require detailed analysis at the beginning of the project. In
most cases, data must come from multiple systems built with multiple data stores hosted on multiple
platforms. The source systems usually include: Excel files, text files, XML files, relational databases,
enterprise resource planning (ERP) and customer relationship management (CRM) systems, etc. For a
broader view of these types of data sources, see (Kimball et al., 2008). Lately, organizations have begun implementing capabilities to include various types of unstructured data sources (e.g.: text documents, e-mail files, images or videos) and Web data sources in their DW. However, this requires new technologies such as content intelligence (i.e.: search, classification and discovery techniques) which are not yet very mature (Blumberg & Atre, 2003). Therefore, one could say that an organization able to extract data from these kinds of sources is a more mature one.
Data Staging Area
A data staging area is a temporary location in the back room of a DW where data from source systems is
copied. Occasionally, the implementation of the DW encounters environmental problems as it pulls data
from many source operational systems. Therefore, a separate staging area is needed to prepare data for the
DW, but it is not universally built. The copy of the data can be a one-to-one mapping of the source
systems' content, but in a more convenient environment. The data staging area is not accessible to end users and does not support query or presentation services. It acts as a surrogate for the source systems and offers several benefits (Walker, 2006):
- it is a good place to perform data quality profiling;
- it can be used as a point close to the source to perform data quality cleansing;
- it serves as a workbench for ETL, etc.
Data Marts
Data marts can be considered as subsets of the data volume from the whole organization specific to a
group of users or department. Therefore, they are limited to specific subject areas. For example, a data
mart for the marketing department would have subjects limited to customers, products, sales, etc.
(Chauduri & Dayal, 1997). The data from a data mart are usually aggregated to a certain level which can
sometimes provide rapid response to end-user requests. Data marts require less cost and effort to develop
and provide access to functional or private information to specific organizational units. They are suited
for businesses demanding a fast time to market, quick impact on the bottom line, and minimal
infrastructure changes (Murtaza, 1998). However, even if from a short-term perspective a data mart seems
a better investment than a DW, from a long-term perspective, the former is never a substitute for the
latter. The main reason for this is because many organizations misunderstand the concept of data marts
and develop independent solutions that propagate freely throughout the organization and become a
problem when attempting to integrate them (Kimball et al., 2008). Therefore, when developed, data marts
should be conformed and integrated, or derived from an enterprise wide DW.
Data Warehouse Database
As already presented, there are two main conceptual DW architectures: central DW with multiple data
marts (Inmon) or conformed data marts (Kimball). Of course, there are also hybrid approaches that
combine the enterprise wide DW and the conformed data marts technique. Here, we just refer to the DW
database or the separate repository that does the actual storage of data. The DW is no special technology
in itself as it is a relational or multidimensional data structure that is optimized for analysis and querying.
As the data structure and operations are different from the ones in the transactional systems, it is
important to have the DW environment separated from the operational ones (Chauduri & Dayal, 1997).
Operational Data Store
The Operational Data Store (ODS) is a database that provides a consolidated view of volatile
transactional data from multiple operational systems. According to Bill Inmon, the originator of the
concept, an ODS is ―a subject-oriented, integrated, volatile, current-valued, detailed-only collection of
data in support of an organization's need for up-to-the-second, operational, integrated, collective
information.‖ (Inmon, 1992). As can be seen, an ODS differs from a DW in that the ODS‘s contents are
updated in the course of business, whereas a data warehouse contains static data. Therefore, this
architecture is suitable for real time or near real time reporting and analysis that can be done without
impacting the performance of the production systems. Unfortunately, operational data is not designed for
decision support applications, and complex queries may result in long response times and heavy impact
on the transactional systems.
Maturity Assessment Question(s)
All this being said, we can now show the maturity assessment questions related to these elements:
1) What is the predominant architecture of your DW?
a) Level 1 – Desktop data marts (e.g.: Excel sheets)
b) Level 2 – Multiple independent data marts
c) Level 3 – Multiple independent data warehouses
d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Level 5 – A DW/BI service that federates a central enterprise DW and other data sources via a standard interface.
2) What types of data sources does your DW extract data from at the highest level?
a) Level 1 – CSV files
b) Level 2 – Operational databases
c) Level 3 – ERP and CRM systems; XML files
d) Level 4 – Unstructured data sources (e.g.: text documents, e-mail files)
e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources.
Table 5: DW Architecture Maturity Assessment Questions.
As can be seen, we focused our attention on the conceptual architecture and on the types of data sources
that the DW supports at the highest level as we considered them to be the most important high-level
elements that characterize the maturity of the conceptual architecture. The hierarchical order in which the
answers were organized was deduced from the information given beforehand on these elements and from
the literature study we had done.
4.1.3 Infrastructure
Infrastructure is a very important component of a DW as it provides the underlying foundation that
enables the DW architecture to be implemented. It is sometimes called technical architecture and it
includes several elements such as: hardware platforms and components (i.e.: disks, memory, CPUs,
DW/ETL/BI applications servers), operating systems (e.g.: UNIX), database platforms (e.g.: relational
engines or multidimensional/OLAP engines), connectivity and networking. Several factors influence the
implemented infrastructure: the business requirements, the technical and systems issues, the specific skills
and experience of the DW team, policy and other organizational issues, expected growth rates, etc.
(Kimball et al., 2008).
An important aspect here is the parallel processing hardware architecture used: symmetric multiprocessing (SMP), massively parallel processing (MPP) or non-uniform memory access (NUMA). These architectures differ in the way the processors work with the disk, memory and each other. It is important to gain sufficient insight into each option's features, benefits and limitations in order to select the proper server hardware; one cannot simply say that one architecture is more mature than another. For more information on parallel processing hardware architectures, see (Kimball et al., 2008; Ponniah, 2001).
As DWs contain large volumes of data with a different structure than operational databases, a specialized DW infrastructure can be critical for performance and better results. The most important aspect is to have different servers for the OLTP and DW systems. However, many organizations ignore this and use the same servers for both systems, which leads to low performance. Once this separation is in place, higher performance can be achieved by having separate servers for the DW, ETL and BI applications. Lately, a new hardware solution has been developed for increasing the performance of the DW system: a
specialized DW appliance. It consists of a small amount of proprietary hardware with an integrated set of servers, storage, operating system(s), DBMS and software pre-installed and pre-optimized for data warehousing. Though such appliances are expensive relative to regular hardware, the custom hardware they contain allows vendors to claim a 10-50 times improvement over existing database solutions (Madden, 2006). Another reason for buying such an appliance is simplicity: the appliance is delivered complete ("no assembly required") and installs rapidly. Finally, if there are any problems, they may require complex analysis, but only a single call to the appliance vendor is needed for a solution (Feinberg & Beyer, 2010).
Maturity Assessment Question(s)
From the information presented above, we decided that a representative question to assess the maturity of
the infrastructure refers to the specialization of infrastructure for a DW solution:
3) To what degree is your infrastructure specialized for a DW?
a) Very low – Desktop platform
b) Low – Shared OLTP systems and DW environment
c) Moderate – Separate OLTP systems and DW environment
d) High – Separate servers for OLTP systems, DW, ETL and BI applications
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata).
Table 6: Infrastructure Maturity Assessment Questions.
4.1.4 Metadata
Metadata is usually defined as "data about data" (Shankaranarayanan & Even, 2004). However, this definition does not give a clear image of what metadata actually is. Metadata can be seen as all the information that defines and describes the structures, operations and contents of the DW system in order to support the administration and effective exploitation of the DW. The DW/BI industry often refers to two main categories of metadata (Moss & Atre, 2003):

- Business metadata – provides business users with a roadmap for accessing the business data in the DW/BI decision-support environment. It describes the contents of the DW in more user-accessible terms: what data the user can find, where it comes from, what it means and what its relationship is to the other data in the DW.
- Technical metadata – supports the technicians and "power users" by providing them with technical information about the objects and processes that make up the DW/BI system.
Some differences between business metadata and technical metadata are highlighted in table 7.
Business Metadata | Technical Metadata
Provided by business people | Provided by technicians or tools
Documented in business terms on data models and in data dictionaries | Documented in technical terms in databases, files, programs, and tools
Used by business people | Used by technicians, "power users", databases, programs, and tools (e.g.: ETL, OLAP)
Names fully spelled out in business language | Abbreviated names with special characters, used in databases, files, and programs
Table 7: Business Metadata vs. Technical Metadata (adapted from (Moss & Atre, 2003)).
(Kimball et al., 2008) propose a third category of metadata:

- Process metadata – describes the results of various operations in the DW; it applies especially to the ETL and query processes. For example, in the ETL process, each task logs key data about its execution, such as start and end time, CPU seconds used, rows processed, etc. Similar process metadata is generated when users query the DW. This data is very important for performance monitoring and improvement (a minimal logging sketch follows below).
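A minimal way to capture such process metadata is to wrap each ETL task so that its start and end time, CPU seconds and row count are recorded. The sketch below is a generic illustration in Python, not tied to any specific ETL tool; the task name and the print-based "repository" are placeholders.

```python
import time
from datetime import datetime, timezone

def run_with_process_metadata(task_name, task):
    """Run an ETL task and record basic process metadata about its execution."""
    started = datetime.now(timezone.utc)
    cpu_start = time.process_time()
    rows_processed = task()  # convention here: the task returns its row count
    record = {
        "task": task_name,
        "started": started.isoformat(),
        "ended": datetime.now(timezone.utc).isoformat(),
        "cpu_seconds": time.process_time() - cpu_start,
        "rows_processed": rows_processed,
    }
    print(record)  # in practice this would be written to the metadata repository
    return record

# Hypothetical task that loads 3 rows.
run_with_process_metadata("load_customers", lambda: 3)
```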
Metadata can be considered the DNA of the DW as it defines its elements and how they work together. It
drives the DW and provides flexibility by buffering the various components of the system from each other
(Ponniah, 2001).
A very important aspect related to metadata is integration. Metadata is usually stored and maintained in repositories. These are structured storage and retrieval systems, typically built on top of a conventional DBMS. A repository is not simply a storage component; it also embodies the functionality necessary to handle the stored metadata. However, the reality is that most tools create and manage their own metadata repository, and therefore several metadata repositories end up scattered around the DW system. These repositories often use different storage types and may thus have overlapping content. It is this combination of multiple repositories that causes problems; hence, the best solution is a single, integrated metadata repository (Kimball et al., 2008). Implementing an integrated metadata repository can be very challenging, but if successful, it is valuable in several ways: it can help identify the impact of making a change to the DW system; it can serve as a source for auditing and documentation; it ensures metadata quality and synchronization, etc. As an organization usually supports tools from multiple vendors, it is rather difficult to create an integrated metadata repository due to lack of standardization. Despite all these challenges, a metadata repository is a mandatory component of every DW environment, and metadata should be gathered for all the components of the DW (i.e.: data modelling, ETL, BI applications, etc.) (Ponniah, 2001; Moss & Atre, 2003).
Another important aspect related to metadata is accessibility (Moss & Atre, 2003). In order to reach its goal, BI application metadata should always be available and easily accessible to end users for a better understanding and usage of the DW solution. Of course, the best solution would be a complete integration of metadata with the BI applications (i.e.: metadata can be accessed through one button push on the attributes, metrics, etc.); however, this is also the hardest to implement. In case the organization has a metadata repository implemented, another efficient way of accessing metadata is through a metadata management tool. Still, there are many organizations that do not pay much attention to business metadata and its accessibility, and therefore metadata is very often not available, or only available by sending documents to users on request.
Maturity Assessment Question(s)
As metadata is an underlying element in a DW and it has specific characteristics for each of the major
components – data modelling, ETL, BI applications – we will have one maturity question regarding
metadata in each of the mentioned categories. For architecture, we decided that the metadata maturity
question should refer to the general metadata management:
4) To what degree is your metadata management implemented?
a) Very low – No metadata management
b) Low – Non-integrated metadata by solution
c) Moderate – Central metadata repository separated by tools
d) High – Central up-to-date metadata repository
e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata.
Table 8: Metadata Management Maturity Assessment Question.
4.1.5 Security
A DW is a veritable gold mine of information, as all of the organization's critical information is readily
available in a format that is easy to retrieve and use. The DW system must publish data to those who need
to see it, while simultaneously protecting it. On the one hand, the DW team is judged by how easily
business users can access the data; on the other hand, the team is blamed if sensitive data gets into the
wrong hands or if data is lost. Therefore, security is very important for the success of the DW, even if
some organizations seem to ignore this fact. User access security is usually implemented through several
methods (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):
 Authentication – the process of identifying a person, usually based on a logon ID and password.
This process is meant to ensure that the person is who he or she claims to be. There are several
levels of authentication, depending on how sensitive the data is. The first level consists of a
simple, static password, followed by a system-enforced password pattern and periodically
required changes. An organization with a DW solution should at least have this security method
implemented.
 Role-based security – databases usually offer role-based security. A role is simply a grouping of
users with some common requirements for accessing the database. Once the roles are created,
users can be set up in the appropriate roles and access privileges may be granted at the level of a
role. A privilege is an authorization to perform a particular operation; without explicitly granted
privileges, a user cannot access any information in the database. While privileges let you restrict
the types of operations a user can perform, managing these privileges per user may be complex. To
address this complexity, database roles encapsulate one or more privileges that can be granted to
and revoked from users (see the sketch after this list).
 Tool-based security – tool-based security is usually not as flexible as role-based security at the
database level. Nevertheless, tool-based security can form some part of the security solution.
However, if the DW team is planning to use the DBMS itself for security protection, then tool-based
security may be considered redundant.
 Authorization – the process of determining what specific content a user is allowed to access. Once
users are authenticated, the authorization process defines the access policy. Authorization is a
more complex problem in the DW system than authentication, because limiting access can have
significant maintenance and computational overhead.
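As a minimal sketch of the role-based approach, assuming ANSI-style SQL and hypothetical object and user names (dw_analyst, sales_fact, customer_dim and jsmith are illustrations, not taken from the thesis):

    -- Create a role that groups users with common access requirements
    CREATE ROLE dw_analyst;

    -- Grant privileges once, at the level of the role
    GRANT SELECT ON sales_fact TO dw_analyst;
    GRANT SELECT ON customer_dim TO dw_analyst;

    -- Set up (and later remove) individual users in the role
    GRANT dw_analyst TO jsmith;
    REVOKE dw_analyst FROM jsmith;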
Regardless of the chosen security strategy, a very important and hard-to-achieve goal is to establish a
security policy for the DW that complies with the organizational security policy, and to implement and
integrate this security at a companywide level (Ponniah, 2001).
Maturity Assessment Question(s)
From the most important aspects related to security presented above – i.e. the way security is
implemented for the DW – we came up with the following maturity question for this DW component:
5) To what degree is security implemented in your DW architecture?
a) Very low – No security implemented
b) Low – Authentication security
c) Moderate – Independent tool-based security
d) High – Role-based security at database level
e) Very high – Integrated companywide authorization security
Table 9: Security Maturity Assessment Question.
4.1.6 Business Rules for DW
Business rules are abstractions of the policies and practices of a business organization. They reflect the
decisions needed to accomplish business policy and objectives of an organization (Kaula, 2009). Business
rules are used to capture and implement precise business logic in processes, procedures, and systems
(manual or automated). Therefore, business rules are an important aspect when implementing a DW.
Examples of business rules used in a DW are: different attributes, ranges, domains, operational records,
etc. Business rules can serve different purposes in the development of a DW (Ponniah, 2001):
 They are very important for data quality and integrity. In order to have the right data in the DW, it
is important that the values of each data item adhere to prescribed business rules. For example, in
an auction system, the sale price cannot be less than the reserve price. Many data quality
problems are caused by violations of such business rules; another example would be an
employee record in which the number of days (i.e.: days worked in a year plus vacation
days, holidays and sick days) exceeds 365 or 366 (see the sketch after this list).
 They are a source for business metadata.
 They should be taken into consideration when requirements are defined.
 They should be used for data modelling and applied during the extraction and transformation of data.
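As a minimal sketch of how the two example rules above could be enforced declaratively, with hypothetical table and column names (in practice, such rules are often checked in the ETL layer instead of as database constraints):

    -- Auction rule: the sale price cannot be less than the reserve price
    ALTER TABLE auction_sale
        ADD CONSTRAINT chk_sale_price CHECK (sale_price >= reserve_price);

    -- Employee rule: the total number of days in a year cannot exceed 366
    ALTER TABLE employee_year
        ADD CONSTRAINT chk_total_days
        CHECK (days_worked + vacation_days + holidays + sick_days <= 366);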
Maturity Assessment Question(s)
To sum up, an enterprise that properly documents and actually follows its business rules will have a better
DW and will also manage change better than one that ignores its rules. Because it is hard to assess at a
high level which business rules are defined and implemented, we decided to include the more general
assessment questions below.
6) To what degree have you defined and documented definitions and business rules for the necessary
transformations, key terms and metrics?
a) Very low – No business rules defined
a) (continued) b) Low – Few business rules defined and documented
c) Moderate – Some business rules defined and documented
d) High – Most business rules defined and documented
e) Very high – All business rules defined and documented.
7) To what degree have you implemented definitions and business rules for the necessary transformations, key terms and metrics?
a) Very low – No business rules implemented
b) Low – Few business rules implemented
c) Moderate – Some business rules implemented
d) High – Most business rules implemented
e) Very high – All business rules implemented.
Table 10: Business Rules Maturity Assessment Questions.
4.1.7 DW Performance Tuning
DWs usually contain large volumes of data. At the same time, they are query-centric systems and hence,
the need to process queries faster dominates. That is the reason why various methods are needed to
improve performance (Ponniah, 2001):
 Software performance improvement – the most often used methods are the following (Chauduri &
Dayal, 1997), illustrated in the sketch after this list:
- index management – indexes are database objects associated with database tables and created to
speed up access to data within the tables. Indexing techniques have existed for decades in
transactional systems, but in order to handle the large volumes of data and complex queries
common in DWs, new or modified techniques have to be implemented for indexing the DWs
(Vanichayobon & Gruenwald, 2004). The most used indexing techniques for data warehousing
are: B-tree index, bitmap index, projection index.
- data partitioning – typically, the DW holds some very large database tables. Loading these tables
can take excessive time; building indexes for large tables can also create problems sometimes.
Therefore, another solution for performance tuning is data partitioning, which means the
deliberate splitting of a table and its index into manageable parts.
- parallel processing – major performance improvement can be achieved if the processing is split
into components that are executed in parallel. The simultaneous concurrent executions will
produce the results faster. Parallel processing techniques work in conjunction with data
partitioning schemes. They are usually features of the DBMS used, and some physical options are
also critical for effective parallel processing.
- view materialization – many queries over DWs require summary data, and therefore use
aggregates. Hence, besides the detailed data, the DW needs to contain summary data.
Materializing summary data on different parameters can help accelerate many common queries
by significantly speeding up query processing.
 Hardware performance improvement – scale the DW server to match the query requirements, and
tune the DW computing platform (i.e.: the set of hardware components and the whole network).
 Specialized DW appliances or DW Cloud Computing – an overview on the former was given in
4.1.3. Cloud Computing is the latest trend in data warehousing/BI and it is not very mature yet.
Some of the advantages of Cloud Computing are: performance – better query and data load
performance; simplicity – rapid time to value and simple tools for agile provisioning and
simplified management; elasticity – scale on demand; low acquisition and maintenance costs –
pricing based on utilization.
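As a minimal sketch of the three software tuning methods, assuming Oracle-style syntax and hypothetical table names (the exact syntax varies per DBMS):

    -- Index management: a bitmap index on a low-cardinality column
    CREATE BITMAP INDEX idx_sales_region ON sales_fact (region_key);

    -- Data partitioning: split a large fact table into manageable parts by date
    CREATE TABLE sales_fact_part (
        date_key   INTEGER,
        region_key INTEGER,
        amount     NUMBER
    )
    PARTITION BY RANGE (date_key) (
        PARTITION p_2010_h1 VALUES LESS THAN (20100701),
        PARTITION p_2010_h2 VALUES LESS THAN (20110101)
    );

    -- View materialization: precompute a commonly requested aggregate
    CREATE MATERIALIZED VIEW mv_monthly_sales AS
        SELECT region_key, TRUNC(date_key / 100) AS month_key, SUM(amount) AS total_amount
        FROM   sales_fact_part
        GROUP  BY region_key, TRUNC(date_key / 100);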
Maturity Assessment Question(s)
An organization that has a DW in place usually starts its performance tuning with the first category (i.e.:
software tuning), and if this does not pay off, it continues with the second option (i.e.: hardware tuning).
However, organizations with a lot of experience in data warehousing understand that the best solution
to improve performance is to buy a specialized DW appliance or to resort to the latest trend, DW cloud
computing. Therefore, the maturity question for performance tuning is depicted in the table below.
8) To what degree do you use methods to increase the performance of your DW?
a) Very low – No methods to increase performance
b) Low – Software performance tuning (e.g.: index management, parallel processing and partitioning, view materialization)
c) Moderate – Hardware performance tuning (e.g.: DW server)
d) High – Software and hardware tuning
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing.
Table 11: Performance Tuning Maturity Assessment Question.
4.1.8 DW Update Frequency
The classical DW solutions were built for strategic and tactical BI that would help executives or line-of-business managers develop and assess progress in achieving long-term enterprise goals. This uses
historical data which is one day to a few months or even years old. However, this tradition has been
changing lately. With ever-increasing competition and rapidly changing customer needs and technologies,
enterprise decision makers are no longer satisfied with scheduled analytics reports, pre-configured KPIs
or fixed dashboards. They demand ad hoc queries to be answered quickly, they demand actionable
information from analytic applications using real-time business performance data, and they demand these
insights be accessible to the right people exactly when and where they need them (Azvine, 2005).
Therefore, real time processing is an increasingly common requirement in data warehousing, as more and
more business users expect the DW to be continuously updated throughout the day and grow impatient
with stale data. However, building a real time DW/BI system requires gathering a very precise
understanding of the true business requirements for real time data and identifying an appropriate ETL
architecture that incorporates a variety of technologies integrated with a solid platform.
Maturity Assessment Question(s)
In conclusion, one could say that an organization that does real-time data warehousing is a very mature
one, as it probably has optimized processes and ETL. Real-time data warehousing is, however, a very
complex activity, and it is hard to judge from a high-level point of view whether it is done successfully and
with high data quality. We tackle this problem by including here a maturity question regarding the
update frequency of the DW, and another question in the ETL part that will assess its complexity and
performance.
9) Which answer best describes the update frequency for your DW?
a) Level 1 – Monthly update or less often
b) Level 2 – Weekly update
c) Level 3 – Daily update
d) Level 4 – Intra-daily update (multiple updates per day)
e) Level 5 – Real-time update.
Table 12: Update Frequency Maturity Assessment Question.
4.2 Data Modelling
4.2.1 Data Modelling Definition and Characteristics
A data model is "a set of concepts that can be used to describe the structure of and operations on a
database" (Navathe, 1992). By structure of a database, (Navathe, 1992) refers to the data types,
relationships and constraints that define the "template" of that database. Hence, data modelling is the
process of creating a data model.
Furthermore, data modelling is very important for creating a successful information system as it defines
not only data elements, but also their structures and relationships between them. Data modelling
techniques are used to model data in a standard, consistent, predictable manner in order to manage it as a
resource. Some authors like (Simsion & Witt, 2005) consider the data model to be "the single most
important component of an information system's design" due to several reasons:
 leverage – a small change to the data model may have a major impact on the system as a whole.
Problems with data organization arise not only from failing to meet initial business requirements,
but also from expensive changes to the business after the database has been built.
 conciseness – a data model is a very powerful tool for expressing information systems
requirements and capabilities, whose value lies partly in its conciseness.
 data quality – a data model plays a key role in achieving good data quality by establishing a
common understanding of what is to be held in each table and column.
4.2.2 Data Models Classifications (Data Models Levels and Techniques)
Over time, many data models have been developed; they can be classified mainly along two
dimensions (Navathe, 1992):
a) the first dimension deals with the steps of the overall database design activity to which the model
applies. The classic database design process consists of mapping requirements of data and
applications successively through the following steps (Navathe, 1992): conceptual design, logical
design and physical design. (Golfarelli & Rizzi, 1998) and (Husemann et al., 2000) propose a
DW design approach similar to the traditional database design. Hence, we will consider the three
sequential phases/levels in figure 8 to serve as a reference for a complete DW design process
model: conceptual design, logical design, physical design.
Figure 8: DW Design Process Levels (adapted from (Husemann et al., 2000)).
Conceptual Design
Conceptual design translates user requirements into an abstract representation understandable to the user
that is independent of implementation issues, but is formal and complete, so that it can be transformed
into the next logical schema without ambiguities (Tryfona et al., 1999). The conceptual data model is
usually represented as a diagram with supporting documentation (Simsion & Witt, 2005) (e.g.: high level
model diagram as described by (Kimball et al., 2008) for dimensional modelling).
Logical Design
Logical design models data using constructs that are easy for users to follow, avoid physical details of
implementation, but typically depend on the kind of DBMS used in the implementation (e.g.: relational
data model, dimensional data model, etc.) (Navathe, 1992). It is the level most often implemented, and it
forms the connection between the conceptual design and the physical one. Logical design is still easily
understood by users, and it does not yet deal with the physical implementation details; it only deals with
defining the types of information that are needed.
Physical Design
Physical design incorporates any necessary changes to achieve adequate performance and consists of a
variety of choices for storage of data in terms of clustering, partitioning, indexing, directory structure,
access mechanisms, etc. (Navathe, 1992; Simsion & Witt, 2005). Some guidelines on developing
concepts for describing physical implementations along the lines of a data model can be found in (Batory,
1988).
b) the second dimension deals with the flexibility (i.e.: the ease with which a model can deal with
complex application situations) and expressiveness of the data model (i.e.: the ease with which a
model can bring out the different abstractions and relationships in an involved application) and it
includes mainly the following types of models: record-based data models, semantic data models
and object-based models. For an overview on these models, see (Navathe, 1992).
In this section we will briefly describe two of the most often used data modelling techniques in data
warehousing: entity-relationship data models, relational data models. We will have a separate paragraph
for dimensional modelling because, as mentioned before, we will focus on this data modelling technique
in our research and we will have several questions in the data modelling maturity assessment
questionnaire dedicated to dimensional modelling.
Entity-Relationship (ER) Data Models
The entity-relationship (ER) model, proposed by (Chen, 1975), is one of the most famous semantic data
models, and it has been a precursor for many subsequent variations. It is used mainly for conceptual
design and the basic constructs in the ER model are (Chen, 1975):
 entities – An entity is recognized as being capable of an independent existence which can be
uniquely identified. It is an abstraction from the complexities of a certain domain and it can be a
physical object, an event or a concept. Entities can be viewed as nouns.
 relationships – A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs, linking two or more nouns.
 attributes – An attribute expresses the information about an entity or a relationship which is
obtained by observation or measurement.
Moreover, entities, relationships and attributes are classified in sets and this is what ER models usually
show. However, the distinction between entities and relationships or entities and attributes can sometimes
be fuzzy and it should be clarified for each particular environment. In conclusion, the ER model is fairly
simple to use, has been formalized and has a reasonably unique interpretation with an easy diagrammatic
notation. It has remained a favourite means for conceptual design and an easy way of communication in
the early stages of database design. It is also used for conceptual DW design (for both Inmon's and
Kimball's views), but especially for enterprise-wide DWs when applying Inmon's view on developing
DWs.
Relational Data Models
The relational data model is a record-based data model proposed by (Codd, 1970). It became a landmark
development in this area because it provided a mathematical basis to the discipline of data modelling. The
fundamental assumption is that all data are represented as mathematical n-ary relations, an n-ary relation
being a subset of the Cartesian product of n domains given as sets (i.e.: S1, S2, …, Sn, not necessarily
distinct). A relation on n sets can also be defined as a set of n tuples each of which has its first element
from S1, its second element from S2, and so on (Codd, 1970).
These relations are organized in the form of tables which consist of tuples (rows) of information defined
over a set of attributes (columns). The attributes, in turn, are defined over a set of atomic domains of
values. The data from the model are operated upon by means of a relational algebra, which includes
operations of selection, projection, join as well as set operations of union, intersection, Cartesian product,
etc. Moreover, there are two types of constraints that apply for this model:
 the entity integrity constraint – guarantees the uniqueness of a table's key;
 the referential integrity constraint – guarantees that whenever a column in one table derives
values from a key of another table, those values must be consistent. Both constraints are
illustrated in the sketch below.
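As a minimal sketch of both constraints in SQL, with hypothetical tables:

    -- Entity integrity: the primary key guarantees a unique, non-null key per row
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    -- Referential integrity: orders may only reference existing customers
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      DECIMAL(10,2)
    );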
Due to its simplicity of modelling, the relational data model gained a wide popularity among business
applications developers. It is usually used to capture the microscopic relationships among data elements
and eliminate data redundancies. It is extremely beneficial for transaction processing because it makes
transaction loading and updating simple and fast. However, it is also used for DW design as the logical
model when following Inmon‘s view on developing DWs.
Maturity Assessment Question(s)
As we are not going to judge which data modelling technique is better for data warehousing, we
considered that two significant characteristics that could determine the maturity of this category for a DW
are: the synchronization (i.e.: establishing consistency among data from a source to a target data storage
and vice versa and the continuous harmonization of the data over time) between all the data models found
in the DW (i.e.: ETL source and target models, DW and data marts models, BI models); and the
differentiation between data models levels (i.e.: physical, logical and conceptual). Companies usually
ignore the conceptual level as, at first, they do not see any benefits from it. However, in time, some of
them realize that it is very important for a solid and consistent data modelling and start designing it.
1) Which answer best describes the degree of synchronization between the following data models that your
organization maintains and the mapping between them: ETL source and target models; DW and data marts
models; BI semantic or query object models?
a) Automatic synchronization of all of the data models
b) Manual synchronization of some of the data models
c) No synchronization between data models
d) Manual or automatic synchronization depending on the data models
e) Automatic synchronization of most of the data models.
2) To what degree do you differentiate between data models levels: physical, logical and conceptual?
a) No differentiation between data models levels
b) All data models have conceptual, logical and physical levels designed
c) Logical and physical levels designed for some data models
d) Conceptual level also designed for some data models
e) Logical and physical levels designed for all the data models.
Table 13: Data Model Synchronization and Levels Maturity Assessment Questions.
4.2.3 Dimensional Modelling
ER diagrams and relational modelling are popularly used for database design in OLTP environments, but
also in DWs. However, the database designs recommended by ER diagrams are considered by some
authors to be inappropriate for decision support systems where efficiency in querying and in loading data
is very important (Chauduri & Dayal, 1997). Relational (i.e.: normalized) modelling has some
characteristics that are appropriate for OLTP systems, but not for DWs:
 its structure is not easy for end users to understand and use. In OLTP systems this is not a
problem because, usually, end users interact with the database through a layer of software.
 data redundancy is minimized. This maximizes the efficiency of updates, but tends to penalize
retrievals. Data redundancy is not a problem in DWs because data is not updated on-line.
Dimensional modelling came as a solution to these problems. It was proposed by (Kimball, 1996) and has
been adopted as the predominant approach to designing DWs and data marts in practice (Moody &
Kortink, 2000). Dimensional modelling is a logical design technique for structuring data so that it is
intuitive to business users and delivers fast query performance (Kimball, 1996). The main advantages of
dimensional modelling are (Kimball et al., 2008; Ponniah, 2001): understandability, query performance
and flexibility.
Dimensional modelling divides the world into:
 measurements – Measurements are captured by the organization's business processes and their
supporting operational source systems. They are usually numeric values and are called facts.
 context – Facts are surrounded by largely textual context that is true at the moment the fact is
recorded. This context is intuitively divided into independent logical parts called dimensions.
Each of the organization's business processes can be represented by a dimensional model that consists of
a fact table containing the numeric measurements surrounded by several dimension tables containing the
textual context. This star-like structure is often called a star join (Kimball et al., 2008). Dimensional
models can be stored in:
 a relational database platform (i.e.: a ROLAP server) – they are typically referred to as star
schemas.
 multidimensional online analytical structures (i.e.: MOLAP servers) – they are typically called
cubes.
An example of a star schema and a cube can be seen in the figure below.
Figure 9: Star Schema vs. Cube (adapted from (Chauduri & Dayal, 1997)).
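To make the star-join structure concrete, the sketch below shows a minimal star schema in SQL for a hypothetical sales process; the table and column names are illustrative only:

    CREATE TABLE date_dim (
        date_key   INTEGER PRIMARY KEY,          -- surrogate key
        full_date  DATE,
        month_name VARCHAR(20)
    );

    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,        -- surrogate key
        product_name VARCHAR(100),
        category     VARCHAR(50)
    );

    -- The fact table holds the numeric measurements plus foreign keys only
    CREATE TABLE sales_fact (
        date_key    INTEGER REFERENCES date_dim(date_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        quantity    INTEGER,
        amount      DECIMAL(10,2),
        PRIMARY KEY (date_key, product_key)      -- multipart key from the dimensions
    );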
However, star schemas do not explicitly provide support for attribute hierarchies and sometimes,
snowflake schemas are used. They provide a refinement of star schemas where the dimensional hierarchy
is explicitly represented by normalizing the dimension tables. This leads to advantages in maintaining the
dimension tables. However, the denormalized structure of the dimensional tables in star schemas may be
more appropriate for browsing the dimensions. There are also other structures used for dimensional
modelling (e.g.: fact constellations), but the ones we presented are the most often implemented. For more
information on dimensional modelling, see (Kimball, 1996) and (Kimball et al., 2008).
Fact Tables
Fact tables store the performance measurements generated by the organization‘s business activities or
events. The value of a fact is usually not known in advance because it is variable, and the fact's valuation
occurs at the time of the measurement event. Two aspects are important when analyzing fact tables
(Kimball et al., 2008):
 Fact table keys – fact tables are characterized by a multipart key made up of foreign keys coming
from the intersection of the dimension tables involved in the business process. This shows that a
fact table always expresses a many-to-many relationship.
 Fact table granularity – this refers to the level of detail of the data stored in a fact table. High
granularity refers to data that is at or near the transaction level, referred to as atomic-level data.
Low granularity refers to data that is summarized or aggregated, usually from the atomic-level
data.
Dimension Tables
In contrast to the rigid qualities of fact tables consisting of only keys and numeric measurements,
dimension tables are filled with a lot of descriptive fields. In many ways, the power of the DW is
proportional to the quality and depth of the dimension attributes as robust dimensions translate into robust
querying and analysis capabilities. The most important aspects when analyzing dimension tables are
(Kimball et al., 2008):
 Dimension table keys – whereas fact tables have a multipart key, dimension rows are uniquely
identified by a single key field. It is recommended that surrogate keys be used rather than the
keys that were used in the source systems. These surrogate keys are meaningless and merely
serve as join fields between the fact and dimension tables. For practical reasons, they are usually
represented as simple integers assigned in sequence.
 Conformed dimensions – these are dimensions that adhere to the same structure and are shared
across the enterprise's DW environment, joining to multiple fact tables representing various
business processes. Conformed dimensions are either identical or strict mathematical subsets of
the most granular, detailed dimension. Dimension tables are not conformed if the attributes are
labeled differently or contain different values.
 Hierarchies – a hierarchy is a set of parent-child relationships between attributes within a
dimension. These hierarchy attributes, called levels, roll up from child to parent; for example,
Customer totals can roll up to Sub-region totals, which can further roll up to Region totals. Another
example: Daily sales roll up to Weekly sales, which roll up to Monthly, Quarterly and Yearly sales.
 Slowly changing dimensions – the dimensional model needs to track time-variant dimension
attributes as required by the business requirements. There are mainly three techniques for
handling slowly changing dimensions (SCDs): type 1 – overwrite one or more attributes in an
existing dimension row; type 2 – copy the previous version of the dimension row and create a
new row with a new surrogate key; type 3 – add and populate a new column of the dimension
table with the previous values and populate the original column with the new values. These
techniques are sometimes used in a hybrid approach for better management (see the sketch
after this list).
 Special dimensions – these are dimensions that are only sometimes needed, but they require
knowledge and experience to be built successfully: mini dimensions (i.e.: dimensions created from
the possible combinations of the frequently analyzed or frequently changed attributes of the
rapidly changing large dimensions); large dimensions (i.e.: dimensions with a very large number
of rows or with a large number of attributes); junk dimensions (i.e.: structures that provide a
convenient place to store junk attributes such as transactional codes, flags and/or text attributes
that are unrelated to any particular dimension), etc.
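As a minimal sketch of the type 2 technique in SQL, assuming a hypothetical customer dimension with effective-date columns and a surrogate-key sequence (sequence syntax varies per DBMS, e.g. NEXT VALUE FOR vs. seq.NEXTVAL):

    -- Expire the current version of the changed dimension row
    UPDATE customer_dim
    SET    row_end_date = CURRENT_DATE,
           current_flag = 'N'
    WHERE  customer_natural_key = 12345
    AND    current_flag = 'Y';

    -- Insert the new version with a freshly generated surrogate key
    INSERT INTO customer_dim
        (customer_key, customer_natural_key, city, row_start_date, row_end_date, current_flag)
    VALUES
        (NEXT VALUE FOR customer_key_seq, 12345, 'Utrecht', CURRENT_DATE, NULL, 'Y');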
Maturity Assessment Question(s)
The maturity assessment part for dimensional modelling includes three questions on the most important
characteristics of fact and dimension tables. The best approach for designing fact tables is to have a
very high percentage of the data at the lowest level of granularity, in order to be able to do analysis at any
level of aggregation. Regarding the dimension tables, the implementation of slowly changing dimensions
and special dimensions implies advanced knowledge and experience, and is therefore specific to
organizations at a higher maturity stage.
3) What percentage of all your fact tables has their granularity at the lowest level possible?
a) Very few fact tables have their granularity at the lowest level possible
b) Few fact tables have their granularity at the lowest level possible
c) Some fact tables have their granularity at the lowest level possible
d) Most fact tables have their granularity at the lowest level possible
e) All fact tables have their granularity at the lowest level possible.
4) To what degree do you design conformed dimensions in your data models?
a) No conformed dimensions
b) Conformed dimensions for few business processes
c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high
level design technique such as an enterprise bus matrix
d) Conformed dimensions for some business processes
e) Enterprise-wide standardized conformed dimensions for all business processes.
5) Which answer best describes the current state of your dimension tables modelling?
a) Few dimensions designed; no hierarchies or surrogate keys designed
b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Besides regular dimensions and slowly changing dimensions techniques, special dimensions are also
designed (e.g.: mini, monster, junk dimensions).
Table 14: Dimensional Modelling Maturity Assessment Questions.
4.2.4 Data Modelling Tool
Data models can be created by just drawing the models in different spreadsheets and documents.
However, the better solution is to use a data modelling tool. The main advantages of using a data
modelling tool are (Kimball et al., 2008):
 It makes the connection and transition between all the data model levels easier.
 It integrates the DW model with other corporate data models.
 It helps assure consistency in naming and definition.
 It creates good documentation in a variety of useful formats.
 It makes metadata management for data modelling easier.
However, the most important benefits of using a data modelling tool refer to making the design itself and
metadata management easier and more efficient.
Maturity Assessment Question(s)
As the usage of a data modelling tool can be a differentiator for an organization developing a DW
solution, we included in our assessment a maturity question derived from the information provided above:
6) Which answer best describes the usage of a data modelling tool in your organization?
a) Level 1 – No data modelling tool
b) Level 2 – Scattered data modelling tools used only for design
c) Level 3 – Scattered data modelling tools used also for maintenance
d) Level 4 – Standardized data modelling tool used only for design
e) Level 5 – Standardized data modelling tool used for design and maintaining metadata.
Table 15: Data Modelling Tool Maturity Assessment Questions.
4.2.5 Data Modelling Standards
DW Standards Overview
Standards in a DW environment are necessary and cover a wide range of objects, processes, and
procedures. Standards range from how to name the fields in the database to how to conduct interviews
with the user departments for requirements definition. Standards must not only be defined and
documented; it is very important to actually implement them and apply them consistently. The definition
of standards would also benefit if a person or a group in the DW team were designated to revise the
standards and keep them up-to-date. By consistently applying standards, it will be much easier for the
business users and developers to navigate the complex DW system. Standards also provide a consistent
means for communication. Effective communication must take place among the members of the project
and the users. Standards ensure consistency across the various areas leaving less room for ambiguity.
Therefore, one could say that the importance of standards cannot be overemphasized (Ponniah, 2001).
This is why many companies invest a lot of time and money to prescribe standards for their information
systems and implicitly, for their DW.
As can be seen, standards can be defined and implemented for every part of the DW architecture and
processes and this is why we will include questions regarding the definition and implementation of
standards for the maturity assessment of each of the major components – data modelling, ETL, BI
applications.
Data Modelling Standards
With regard to data modelling, standards are many and diverse. They can be applied to all the data models
levels (i.e.: conceptual, logical and physical) and most often standards like naming conventions for the
objects and attributes in the data models take on special significance. Other standards here refer to the
way one data model is derived from the other, the way metadata is documented or how data quality is
taken care of in this phase.
Maturity Assessment Question(s)
All the maturity assessment questions related to standards will address general aspects such as the
definition and documentation of standards and their actual implementation. The same principle applies for
data modelling. There is an important distinction between having some standards defined and written
down somewhere and actually following those standards.
7) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your
data models?
a) Very low – No standards defined for data models
b) Low – Solution-dependent standards defined for some of the data models
c) Moderate – Enterprise-wide standards defined for some of the data models
d) High – Enterprise-wide standards defined for most of the data models
e) Very high – Enterprise-wide standards defined for all the data models.
8) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data
models?
a) Very low – No standards implemented for data models
b) Low – Solution-dependent standards implemented for some of the data models
c) Moderate – Enterprise-wide standards implemented for some of the data models
d) High – Enterprise-wide standards implemented for most of the data models
e) Very high – Enterprise-wide standards implemented for all the data models.
Table 16: Data Modelling Standards Maturity Assessment Questions.
4.2.6 Data Modelling Metadata Management
Data models usually need a lot of metadata (business and technical) to be documented in order to create
consistency and understandability for both developers and users. A common subset of business metadata
components as they apply to data includes (Moss & Atre, 2003): data names, definitions, relationships,
identifiers, types, lengths, policies, ownership, etc. The standardization of the metadata documentation is
also critical for integration among data models. Hence, the maturity question depicted below.
Maturity Assessment Question(s)
9) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality, etc.) in your data models?
a) Very low – No documentation for any data models
b) Low – Non standardized documentation for some of the data models
c) Moderate – Standardized documentation for some of the data models
d) High – Standardized documentation for most of the data models
e) Very high – Standardized documentation for all the data models.
Table 17: Data Modelling Metadata Management Maturity Assessment Questions.
4.3 Extract – Transform – Load (ETL)
4.3.1 What is ETL?
The Extract-Transform-Load (ETL) process is part of the DW back room component. As the name shows,
the ETL process involves the following activities:
 extracting data from outside sources;
 transforming data to fit the target's requirements;
 loading data into the target database.
According to (Kimball et al., 2008), there is also a fourth component of the ETL system, called managing
the ETL environment. This component is very important: in order for the ETL processes to run
consistently to completion and be available when needed, they have to be managed and maintained. These
activities are also part of the DW maintenance and monitoring processes, but there are some important
technical components that need to be implemented and this is why we will also elaborate on it in this
paragraph. Moreover, (Kimball et al., 2008) propose 34 subsystems that form the ETL architecture and
divide them for every ETL main activity (i.e.: extract, transform, load and manage).
However, even if the name seems to be understood by everyone, it is hard to say why the ETL system is
so complex and resource-demanding (Kimball et al., 2008). Easily, 60 to 80 percent of the time and effort
of developing a DW project is devoted to the ETL system (Nagabhushana, 2006). Building an ETL
system is very challenging because many outside constraints put pressure on the ETL design: business
requirements, source data systems, budget, processing windows and available staff skills. Hence,
designing ETL processes is extremely complex, often prone to failure, and time consuming (Simitsis et
al., 2005). However, since it is extensively recognized that the design and maintenance of the ETL
processes are a key factor in the success of a DW project (March & Hevner, 2007; Solomon, 2005),
organizations put a lot of effort into implementing a powerful ETL system. In order to formulate the
chosen maturity questions for this category, we would like to first give a short overview on each ETL
component.
4.3.2 Extract
The extraction system is the first component of the ETL architecture. It addresses the issues of
understanding the source data, extracting the data and transferring it to the DW environment, where the
ETL system can operate on it independently of the operational systems (Kimball et al., 2008). Depending on the DW
architecture, the extracted data may go directly into the DW or into a data staging area. Extraction
essentially comes down to two questions (Loshin, 2003):
 What data should be extracted?
 How should that data be extracted?
The first question essentially depends on which results clients expect to see in their BI applications.
However, the answer is not that simple, as it also depends on what source data is available and on the data
model that the architects had previously developed. The answer to the second question may depend on
the scale of the project, the number and disparity of the data sources, and how far into the implementation
the developers are. Extraction can be as simple as a collection of simple SQL queries or as complex as to
require ad hoc, specially designed programs written in a proprietary programming language (Loshin,
2003). The other alternative is to use tools to help automate the process and obtain better results.
Depending on the organization and the data warehouse project, data can be extracted from various source
systems.
Moreover, according to (Kimball et al., 2008), there are three subsystems that support the extraction
process:
 Data profiling system – it performs the technical analysis of data to describe its content, consistency
and structure. It focuses on the instance analysis of individual attributes, providing information
such as data type, length, value range, uniqueness, occurrence of null values, typical string
patterns, etc. (Rahm & Hai Do, 2000). The profiling step protects the ETL team from dealing with
dirty data and gives them guidance to set expectations regarding realistic development
schedules, limitations in the source data and the need to invest in better source data capture
practices.
 Change data capture (CDC) system – it offers the capability to transfer only the source data
that has changed since the last load. This is not important for the first historic load, but it proves
very useful from that point forward. Implementing the CDC system is not an easy task; for
more information on how to capture source data changes, see (Kimball et al., 2008). A minimal
sketch of profiling and CDC queries is shown after this list.
 Extract system – this is a fundamental component of the ETL architecture and it refers to the data
extraction itself, whether it is done by writing scripts or by using a tool. Sometimes data has to be
extracted from only one system, but most of the time each source might be in a different system.
There are two primary methods for getting data from a source system: as a file or as a stream, the
latter effectively constructing the extract system as a single process. Two other important aspects
that need to be taken into consideration in the extraction phase are: data compression – important
when large amounts of data have to be transferred through a public network; and data encryption –
important for security reasons.
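As a minimal sketch of the first two subsystems in SQL, assuming hypothetical source tables with a last_modified audit column and an ETL load log (all names are illustrative):

    -- Data profiling: basic instance analysis of a single source attribute
    SELECT COUNT(*)                                        AS row_count,
           COUNT(DISTINCT email)                           AS distinct_values,
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_count,
           MIN(LENGTH(email))                              AS min_length,
           MAX(LENGTH(email))                              AS max_length
    FROM   src_customer;

    -- Change data capture: extract only the rows changed since the last load
    SELECT *
    FROM   src_customer
    WHERE  last_modified > (SELECT MAX(load_timestamp) FROM etl_load_log);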
4.3.3 Transform
The transformation step is where the ETL system adds value to the data through the changes it makes.
Usually, this phase includes cleaning and transforming the data according to the business rules and
standards that have been established for the DW.
Data Cleaning
Data cleaning, also called data cleansing or scrubbing, is part of the complex and important data quality
processes. It deals with detecting and removing errors and inconsistencies from data in order to improve
their quality (Rahm & Hai Do, 2000). As DWs are used for decision making, the correctness of their data
is very important to avoid wrong results. "Dirty data" (e.g.: duplicates, missing data) will produce
incorrect statistics, proving the saying "garbage in, garbage out". Hence, due to the wide range of
possible data inconsistencies and large data volume, data cleaning is considered to be one of the biggest
problems in data warehousing. However, many organizations do not cleanse their data and believe that
this is the responsibility of the source systems. High-quality, accurate data means that the data are (Kimball
& Caserta, 2004):
 correct – the values and descriptions in the data describe their associated objects truthfully and
faithfully;
 unambiguous – the values and descriptions in the data can be taken to have only one meaning;
 consistent – the values and descriptions in the data use one constant notational convention to
convey their meaning;
 complete – the individual values and descriptions in the data are defined (not null) for each instance,
and the aggregate number of records is complete.
Even if most often data cleansing is done manually or by low-level programs that are difficult to write and
maintain, data quality tools are available to enhance the quality of the data at several stages in the process
of developing a data warehouse. Cleansing tools can be useful in automating many of the activities that
are involved in cleansing the data: parsing, standardizing, correcting, matching and transforming.
A part of the data quality process is represented by quality screens or tests that act as diagnostic filters in
the data flow pipelines (Kimball et al., 2008). What is important here is the action taken when an error is
thrown: 1) halting the process; 2) sending the offending record(s) to a suspense file for later processing;
3) merely tagging the data and passing it through to the next step in the pipeline. The last choice
is of course the best one, as it offers the possibility of taking care of data quality without aborting the job.
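As a minimal sketch of a quality screen in SQL, reusing the employee-days rule from section 4.1.6 on a hypothetical staging table with an assumed dq_flag column; the screen tags offending records instead of aborting the job (the third option above):

    -- Tag records that violate the business rule
    UPDATE stg_employee_year
    SET    dq_flag = 'DAYS_RULE_VIOLATED'
    WHERE  days_worked + vacation_days + holidays + sick_days > 366;

    -- Alternatively, route the offending records to a suspense table
    INSERT INTO suspense_employee_year
    SELECT *
    FROM   stg_employee_year
    WHERE  dq_flag = 'DAYS_RULE_VIOLATED';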
Two other deliverables that can be of help in the data cleaning activities, and that are usually hard to
implement, are (Kimball & Caserta, 2004):
 the error-event schema – captures all error events that are vital inputs to data quality
improvement;
 the audit dimension assembler – attaches metadata to each fact table as a dimension. This
metadata is available to BI applications for visibility into data quality.
Maturity Assessment Question(s)
Data quality is very important for data warehousing: if users do not trust the data, they will not use the
DW environment, which will then be considered a failure. At the same time, data quality is also one of the
biggest DW challenges (Ponniah, 2001), as high data quality is very hard to achieve. Of course, when
taking a first look at a DW, it is difficult to assess the actual data quality. This is why we included a
question that checks whether a specific organization addresses data quality by identifying and solving
data quality issues. The usage of data quality tools is of course a strong point, and an organization that
uses them will definitely get better results.
1) Which answer best describes the data quality system implemented for your ETL?
a) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: no
b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
c) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: yes
d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes.
Table 18: Data Quality Maturity Assessment Question.
Data Transformation
Besides data cleaning, the transformation system literally transforms the data in accordance with the
business rules and standards that have been established for the DW. Typical transformations that are
implemented in a DW are (Nagabhushana, 2006):
 format changes – change data from different sources to a standard set of formats for the DW;
 de-duplication – compare records from multiple sources to identify duplicates and merge them
into a unified one;
 splitting-up fields/integrating fields – split up a data item from the source systems into one or
more fields in the DW, or integrate two or more fields from the operational systems into a DW field;
 derived values – compute derived values using agreed formulas (e.g.: averages, totals, etc.);
 aggregation – create aggregate records based on the atomic DW data;
 other transformations such as filtering, sorting, joining data from multiple sources, transposing or
pivoting, etc. A minimal sketch of a few of these transformations is shown after this list.
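As a minimal sketch of a few typical transformations in SQL, with hypothetical staging tables (a row_id column and window functions are assumed to be available):

    -- Format change, field integration and a derived value in one pass
    INSERT INTO stg_customer_clean (row_id, country_code, full_name, avg_monthly_revenue)
    SELECT row_id,
           UPPER(TRIM(country_code)),            -- format change
           first_name || ' ' || last_name,       -- integrating fields
           yearly_revenue / 12                   -- derived value
    FROM   src_customer;

    -- De-duplication: keep only the first row per (country_code, full_name)
    DELETE FROM stg_customer_clean
    WHERE  row_id IN (
        SELECT row_id FROM (
            SELECT row_id,
                   ROW_NUMBER() OVER (PARTITION BY country_code, full_name
                                      ORDER BY row_id) AS rn
            FROM   stg_customer_clean
        ) ranked
        WHERE  ranked.rn > 1
    );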
4.3.4 Load
The DW load system takes the load images created by the extraction and transformation subsystems and
loads these images directly into the DW. A good load system should be able to perform the following
activities (Kimball et al., 2008; Nagabhushana, 2006):
 generate surrogate keys – create standard keys for the DW separate from the source system keys;
 manage slowly changing dimensions (SCDs);
 handle late-arriving data – apply special modifications to the standard processing procedures to
deal with late-arriving fact and dimension data;
 drop indexes on the DW when new records are inserted;
 load dimension records;
 load fact records;
 compute aggregate records using base fact and dimension records;
 rebuild or regenerate indexes once all loads are complete;
 log all referential integrity violations during the load process.
A minimal sketch of the core loading steps is shown below.
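As a minimal sketch of the dimension and fact loading steps in SQL, with hypothetical staging tables and a surrogate-key sequence (sequence syntax varies per DBMS):

    -- Load new dimension records, generating surrogate keys from a sequence
    INSERT INTO product_dim (product_key, product_natural_key, product_name)
    SELECT NEXT VALUE FOR product_key_seq, s.product_id, s.product_name
    FROM   stg_product s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   product_dim d
                       WHERE  d.product_natural_key = s.product_id);

    -- Load fact records, resolving source keys to dimension surrogate keys
    INSERT INTO sales_fact (date_key, product_key, quantity, amount)
    SELECT d.date_key, p.product_key, s.quantity, s.amount
    FROM   stg_sales   s
    JOIN   date_dim    d ON d.full_date = s.sale_date
    JOIN   product_dim p ON p.product_natural_key = s.product_id;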
Maturity Assessment Question(s)
The maturity assessment question for this category aims to give an overview on the general complexity
and performance of ETL. Once again, we are not trying to judge how certain activities are done, but only
if they exist. As mentioned before, the latest trend in this field is real-time data warehousing which puts a
lot of pressure on ETL. Hence, the highest level of maturity for ETL involves real-time capabilities.
2) Which answer best describes the complexity of your ETL?
a) Simple ETL that just extracts and loads data into the data warehouse
b) Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new
calculated values, aggregation, etc., and a surrogate key generator
c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system,
de-duplication and matching system, data quality system
d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data
handler, hierarchy manager, special dimensions manager
e) Optimized ETL for a real time DW (real-time ETL capabilities).
Table 19: ETL Complexity Maturity Assessment Question.
4.3.5 Manage
In order for the DW project to be a success, the ETL processes need to be reliable, available and
manageable. This is the reason why (Kimball et al., 2008) consider the management subsystem to be the
fourth component of the ETL system. They propose 13 subsystems to be included in this ETL component.
Some of them can also be found in (Nagabhushana, 2006; Chauduri & Dayal, 1997), but they are not
grouped into a separate subsystem of the ETL process. The most important capabilities for a successful
management of the ETL system are:
 an ETL job scheduler;
 a backup system;
 a recovery and restart system – it can be manual or automatic;
 a workflow monitor – ensures that the ETL processes are operating efficiently and gathers
statistics regarding ETL execution or infrastructure performance;
 a version control and migration system – helps archive and recover all the logic and
metadata of the ETL process and then migrate this information to another environment (for
example, from development to test and on to production);
 a data lineage and dependency system – identifies the source of a data element and all
intermediate locations and transformations for that data element;
 a security system – security is an important consideration for the ETL system and the
recommended method is role-based security on all data and metadata in the ETL system;
 a metadata repository management system.
Maturity Assessment Question(s)
In order to assess the maturity of the management and monitoring of ETL, we separated the necessary
activities into two categories: simple monitoring, which is usually done first; and advanced monitoring,
which is usually implemented by an organization that already has some experience in this field. A critical
aspect of ETL that can really make the difference is the restart and recovery system. An organization
usually evolves from having no restart and recovery system at all to a completely automatic restart and
recovery system. However, the latter is very complex and prone to error and, therefore, very hard to
achieve.
3) Which answer best describes the management and monitoring of your ETL?
(Definitions:
 Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending,
running, completed and suspended jobs; MB processed per second; summaries of errors, etc.);
 Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU
usage, memory allocation, database performance, server utilization during ETL; job scheduler – time
or event based ETL execution, event notification; data lineage and dependency analyzer system).)
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes.
Table 20: ETL Management and Monitoring Maturity Assessment Question.
4.3.6 ETL Tools
There is a constant debate whether an organization should deploy custom-coded ETL solutions or should
buy an ETL tool suite (Kimball & Caserta, 2004). Hand-coded ETL sometimes proves helpful
because it offers: object-oriented techniques that can make all the transformations consistent for error
reporting, validation and metadata updates; more direct management of metadata; the availability of
in-house programmers; and unlimited flexibility. However, even if programmers can set up ETL
processes using hand-coded ETL, building such processes from scratch can become complex.
That is the reason why companies are buying ETL tools for this purpose more and more often. Buying an
ETL tool has several advantages: simpler, faster, cheaper development; users without
professional programming skills can use them effectively; an integrated metadata repository; automatically
generated metadata at every step of the ETL process; in-line encryption and compression capabilities;
good performance for very large data sets; possibility of augmenting the ETL tool with selected
processing modules hand coded in an underlying programming language.
Maturity Assessment Question(s)
As already mentioned, ETL can be built using a programming language or an ETL tool, the
latter being the better solution. A company that uses hand-coded ETL usually does not have a
very complex ETL process, which indicates a low level of maturity regarding ETL capabilities. However, in
both cases, some standard scripts are sometimes needed, which can increase the performance of ETL.
From the expert interviews we had and from the exploratory case study we did, we came up with another
possibility of generating ETL: complete ETL generated from metadata. This is rarely applied in practice
nowadays, but it is the desired solution for the future.
4) Which answer best describes the usage of an ETL tool in your organization?
a) Level 1 – Only hand-coded ETL
b) Level 2 – Hand-coded ETL and some standard scripts
c) Level 3 – ETL tool(s) for all the ETL design and generation
d) Level 4 – Standardized ETL tool and some standard scripts
e) Level 5 – Complete ETL generated from metadata.
Table 21: ETL Tools Maturity Assessment Question.
4.3.7 ETL Metadata Management
ETL is responsible for the creation and use of much of the metadata describing the DW environment.
Therefore, it is important to capture and manage all possible types of metadata for ETL: business,
technical and process metadata. Nevertheless, not many organizations manage to do this and thus, we
decided to include the following maturity question regarding ETL in our assessment.
Maturity Assessment Question(s)
5) To what degree is your metadata management implemented for your ETL?
a) Very low – No metadata management
b) Low – Business and technical metadata for some ETL
c) Moderate – Business and technical metadata for all ETL
d) High – Process metadata is also managed for some ETL
e) Very high – All types of metadata are managed for all ETL.
Table 22: ETL Metadata Management Maturity Assessment Question.
4.3.8 ETL Standards
A general overview on standards used in data warehousing was given in 4.2.5. Standards specific to ETL
are related to: naming conventions, set-up standards, recovery and restart system, etc. The maturity
questions and stages below are straightforward.
Maturity Assessment Question(s)
6) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards,
recovery process, etc.) for your ETL?
a) Very low – No standards defined
b) Low – Few standards defined for ETL
c) Moderate – Some standards defined for ETL
d) High – Most standards defined for ETL
e) Very high – All the standards defined for ETL.
7) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process,
etc.) for your ETL?
a) Very low – No standards implemented
b) Low – Few standards implemented for ETL
c) Moderate – Some standards implemented for ETL
d) High – Most standards implemented for ETL
e) Very high – All the standards implemented for ETL.
Table 23: ETL Standards Maturity Assessment Questions.
4.4 BI Applications
4.4.1 What are BI Applications?
BI applications are part of the front-room component of the DW architecture (Kimball et al., 2008) and are sometimes referred to as "front-end" tools (Chaudhuri & Dayal, 1997). They are what the end users see and, hence, are very important for a DW to be considered successful. According to (March & Hevner, 2007), a crucial point for achieving DW implementation success is the selection and implementation of appropriate end-user analysis tools, because the business benefits of BI are only gained when the system is adopted by its intended end users. This is why BI applications must meet several design requirements (Kimball et al., 2008): be correct – BI applications must provide accurate results; perform well – queries should have a satisfactory response time; be easy to use – BI applications should be customized for each category of users; have a nice interface – BI applications should be clear and have an attractive design; be a long-term investment – BI applications must be properly documented, maintained, enhanced and extended.
4.4.2 Types of BI Applications
Throughout time, BI applications have evolved from (simple) predefined reporting to (advanced) data-mining tools to fulfill users' analytical needs (Breitner, 1997). Also, according to (Azvine et al., 2006), traditional BI applications fall into the following categories, sorted by ascending complexity:
• report what has happened – standard reporting and query applications (i.e.: static/preformatted reports; interactive/parameter-driven reports);
• analyze and understand why it has happened – ad-hoc reporting and online analytical processing (OLAP); visualization applications (i.e.: dashboards, scorecards);
• predict what will happen – predictive analytics (i.e.: data and text mining).
However, in the last couple of years, due to the development of real-time data warehousing, a new
category of BI applications has developed called operational BI and closed-loop applications (Kimball et
al., 2008). As the complexity of the BI applications contributes to the maturity of a DW environment, we will include a maturity question regarding this aspect and, therefore, we will give a short overview of each type of BI application in the remainder of this paragraph.
Standard Reporting and Query Applications
This category of BI applications is usually considered to be the entry level BI tooling, providing end users
with a core set of information about what is happening in a particular area of the business (Kimball et al.,
2008). Standard reports are the reports the majority of non-technical business users look at every day.
They represent an easy-to-use means to get the needed information with a very short learning curve. As
presented above, two types of standard reporting can be distinguished, based on the level of data
interactivity:
• static/preformatted reporting – the most basic form of reporting, which can be seen as a repeatable, pre-calculated and non-interactive request for information. It is characterized by rigid evaluations of business facts presented in a standard format on a routine basis to a targeted audience (usually represented by casual users) (Eckerson, 2009).
• interactive/parameter-driven reporting – this kind of reporting offers the possibility of creating reports with dynamic content. End users now have some flexibility, as they can choose from a predefined set of parameters to filter report content to their individual preferences and needs (Turban et al., 2007). Once users get the view of the data they want, they can save the view as a report and schedule it to run on a regular basis. This allows report designers to create reports that can serve multiple end-user categories, as sketched after this list.
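As a minimal sketch of parameter-driven reporting in Python, the example below fixes the report logic and lets the end user vary only the parameter values; the table and column names are made up for illustration.

    import sqlite3

    def sales_report(conn, region, year):
        # The query is predefined; end users only choose the parameter values.
        return conn.execute(
            "SELECT product, SUM(amount) FROM sales "
            "WHERE region = ? AND year = ? GROUP BY product",
            (region, year),
        ).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, region TEXT, year INT, amount REAL)")
    conn.execute("INSERT INTO sales VALUES ('widget', 'EMEA', 2010, 100.0)")
    print(sales_report(conn, region="EMEA", year=2010))  # [('widget', 100.0)]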
Analytic Applications
Analytic applications are more complex than standard reports. Although the latter offer the possibility of creating reports of all shapes and detail levels, in many cases additional information is required (Varga & Vukovic, 2008). This places higher requirements on the DW architecture and also on the end users' IT and analytical skills. Analytic applications offer the possibility of ad-hoc (or online) data access and complex analysis through a user-friendly, interface-based system. In this way, users can formulate their own queries directly on the data without in-depth knowledge of SQL or other database query languages. Probably the best-known analytic technique is Online Analytical Processing (OLAP), a term coined by E.F. Codd in 1993.
OLAP interfaces provide a fairly simple, yet extremely flexible navigation and presentation environment
that enables end-users to gain insight into data through fast, dynamic, consistent, interactive access to a
wide variety of possible views of information. This is possible due to the fact that data is characterized by
multidimensionality, being structured as a cube, designed with dimensions and facts. OLAP users can
navigate through the data cube using several operations such as (Breitner, 1997): roll-up (increasing the
level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one
or more dimension hierarchies, slice and dice (selection and projection to a certain layer or sub-cube) or
pivot (re-orienting the multidimensional view of data). Although the data cube is a simple structure, the large number of alternatives – many numeric facts and dimensions, and many hierarchies or abstraction levels – combines to form an immense universe of queries that can be explored via an OLAP interface (Tremblay et al., 2007). Through OLAP, users can generate fast reports regardless of database size and complexity, and they can define new ad-hoc calculations in any desired way without having advanced knowledge of SQL.
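As an illustration of these OLAP operations, the sketch below applies roll-up, drill-down, slice and pivot to a tiny sales cube using the pandas library (assumed available); the data and dimension names are invented.

    import pandas as pd

    sales = pd.DataFrame({
        "year":    [2009, 2009, 2010, 2010],
        "country": ["NL", "DE", "NL", "DE"],
        "amount":  [100, 150, 120, 130],
    })

    # Roll-up: increase the level of aggregation (totals per year only).
    rollup = sales.groupby("year")["amount"].sum()

    # Drill-down: decrease aggregation / increase detail (year x country).
    drilldown = sales.groupby(["year", "country"])["amount"].sum()

    # Slice: select a single layer of the cube (only the year 2010).
    slice_2010 = sales[sales["year"] == 2010]

    # Pivot: re-orient the view (years as rows, countries as columns).
    pivoted = sales.pivot_table(index="year", columns="country",
                                values="amount", aggfunc="sum")
    print(pivoted)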
Visualization Applications
Due to the flood of data available from information systems, standard reporting and analytic applications
are often not enough for business analysts and decision-makers to make sense out of the knowledge they
contain. This is the reason why, especially when dealing with large amounts of data, visualization
techniques can be very useful to facilitate data analysis. Information visualization is defined by (Chung et al., 2005) as "a process of constructing a visual presentation of abstract quantitative data. The characteristics of visual perception enable humans to recognize patterns, trends and anomalies inherent in the data with little effort in a visual display." The main visualization applications usually used in BI are dashboards and scorecards. According to (Eckerson, 2006), there are three types of performance dashboards: operational dashboards – used to track core operational processes; tactical dashboards – used by managers and analysts to track and analyze departmental activities, processes and projects; and strategic dashboards – used by executives and staff to chart their progress toward achieving strategic objectives. (Eckerson, 2006) states that dashboards are part of the first two categories, whereas scorecards are used at the strategic level.
Furthermore, it can be said that dashboards present various key performance indicators (KPIs) (i.e.: key measures crucial to business strategy that must link to the organization's performance) in one screen view with intuitive displays of information (e.g.: tables, graphs, charts, dials, gauges, etc.), similar to an automobile control panel. Dashboards support status reporting and alert generation across multiple data sources at a high level, but also allow drilling down to more specific data (Kimball et al., 2008).
As said above, scorecards are actually dashboards developed at a strategic level. They help executives monitor their progress toward achieving strategic objectives. A scorecard can track an organization's performance by measuring business activity at a summarized level and comparing these values to predefined targets. In this way, executives can determine what actions should be taken in order to improve performance. There are several types of scorecards, but the most widely implemented one is the balanced scorecard defined by (Kaplan & Norton, 1992).
Data and Text Mining Applications (Predictive Analytics)
Data and text mining applications are sophisticated BI applications that involve advanced methods for data analysis. Data mining requires a lot of data, which needs to be in a reliable state before it can be subjected to the mining process. A newer technique is text mining, which refers to the process of deriving high-quality information from text. Data and text mining can also be found under the name of knowledge discovery or the newer term, predictive analytics. Data mining is defined by (Holsheimer & Siebes, 1994) as "the search for relationships and global patterns that exist in large databases, but are 'hidden' among the vast amount of data"; these relationships can then offer valuable knowledge about the database and the objects in the database. However, other researchers such as (Fayyad et al., 1996) consider that knowledge discovery actually refers to the overall process of discovering useful knowledge from data, whereas data mining refers to a particular step in this process that consists of "applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data" (Fayyad et al., 1996).
Data mining relies on known techniques from fields like machine learning, pattern recognition, and
statistics. It also uses a variety of methods such as (Fayyad et al., 1996): classification, regression,
clustering, summarization, dependency modelling, change and deviation detection.
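Purely as an illustration of one such method, the sketch below clusters a handful of invented customer profiles with scikit-learn (assumed installed); real data mining on a DW would of course involve far larger and carefully prepared data sets.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy customer profiles: [purchases per year, average basket value].
    customers = np.array([[12, 20.0], [15, 25.0], [80, 5.0], [95, 4.0]])

    # Search for two customer segments "hidden" in the data.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print(model.labels_)  # e.g. [0 0 1 1]: occasional vs. frequent buyers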
Operational BI and Closed-loop Applications
This category of applications is part of the real-time data warehousing requirement. It includes the use of
applications that are more sophisticated than typical operational reports, but leverage the rich historical
context across multiple business processes available in the DW to guide operational decision making.
These applications also frequently include transactional interfaces back to the source systems. The goal of
operational BI applications is to reduce the analysis latency – the time it takes to inform the person in
charge of data analysis that new data has to be analyzed, the time needed to choose appropriate analysis
models and the time to process the data and present the results (Seufert & Schiefer, 2005). Sometimes,
these applications may be produced by accessing live operational data. In other cases, when a certain
degree of data latency can be tolerated, the reports are produced using the information collected by the
(near) real-time DW. Hence, in order to get accurate operational results, activities and processes involved
in a DW project have to be optimized.
Maturity Assessment Question(s)
As can be seen from this short overview of BI applications, the types of BI applications supported by the DW environment are an important indicator of its maturity. For example, an organization that develops predictive analytics certainly has experience in developing less complex applications such as ad-hoc reports or visualization applications. The highest level of maturity refers to the development of closed-loop and operational (real-time) BI applications, as this is the latest trend in the field and not many organizations have the necessary skills and experience to develop them. As user requirements can change very often and the time to deliver the updated BI applications is rather short, a characteristic that can act as a differentiator is the usage of standardized objects (e.g.: KPIs, metrics, attributes, templates, etc.). This being said, the maturity questions and stages can be seen below.
1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Level 1 – Static and parameter-driven reports and query applications
b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)
c) Level 3 – Visualization techniques: dashboards and scorecards
d) Level 4 – Predictive analytics: data and text mining; alerts
e) Level 5 – Closed-loop BI applications; real-time BI applications.
2) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) used in your BI applications?
a) Very low – Objects defined for every BI application
b) Low – Some reusable objects for similar BI applications
c) Moderate – Some standard objects and templates for similar BI applications
d) High – Most similar BI applications use standard objects and templates
e) Very high – All similar BI applications use standard objects and templates.
Table 24: BI Applications Maturity Assessment Questions.
4.4.3 BI Applications Delivery Method
As end users are interested only in the results they get from the BI applications, the ease of accessing and delivering these results is critical for the success of the DW solution. The main BI applications delivery methods are:
• Physically (e.g.: on paper) or electronically (e.g.: by e-mail) delivered reports. Even if this method is easy to implement, it is the least mature and efficient way of delivering BI applications. Reports can be delivered manually or automatically.
• Direct tool-based interface. This is a more evolved delivery method, as it offers a better interface for users to access their reports. It involves developing a set of reports and providing them to the users directly using the standard data access tool interface (Kimball et al., 2008). However, there might be some integration or accessibility problems if an organization uses multiple BI tools.
• A BI portal. Lately, the Web has become a popular environment for BI applications. The result of this is the development of a new delivery method, the BI portal, which is also the most evolved and the most difficult to implement and maintain. A BI portal will give the users a well organized, useful, easily understood place to find the tools and information they need (Kimball et al., 2008; Ponniah, 2001). Besides the structured BI applications, the BI portal should also offer functions such as an information center and help, a discussion forum, alerting, a metadata browser, etc. A successful BI portal also needs to be highly interactive and always up-to-date.
Maturity Assessment Question(s)
From the information presented on BI applications delivery method, the maturity question we created for
assessing this characteristic is straightforward.
3) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Level 1 – Reports are delivered manually, physically (e.g.: on paper) or electronically (e.g.: by e-mail)
b) Level 2 – Reports are delivered automatically by email
c) Level 3 – Direct tool-based interface
d) Level 4 – A BI portal with basic functions: subscriptions, discussions forum, alerting
e) Level 5 – Highly interactive, business process oriented, up-to-date portal (no differentiation between
operational and BI portals).
Table 25: BI Applications Delivery Method Maturity Assessment Question.
4.4.4 BI Applications Tools
As we saw for data modelling and ETL, the usage of tools can really make the difference between organizations. This is the reason why we decided to also include a question regarding this aspect for BI applications. After the expert interviews, we decided that a very low maturity level is represented by the usage of a different BI tool for each data mart, whereas the highest maturity stage is reached when there is one standardized tool for mainstream BI applications (i.e.: reporting and visualization applications, which are developed most often) and one for specific BI applications (i.e.: data mining and financial analysis, which are harder to implement and usually specific to each department).
Maturity Assessment Question(s)
4) Which answer best describes your current BI tool usage?
a) Level 1 – BI tool related to the data mart
b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)
c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool
d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific BI applications (i.e.: data mining, financial analysis, etc.)
e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications.
Table 26: BI Tools Maturity Assessment Question.
4.4.5 BI Applications Metadata Management
As BI applications are what the end user sees, an important aspect is the accessibility of metadata. An overview of how this can be achieved was offered in 4.1.4. An organization can evolve from showing no metadata to users to completely integrating metadata with the BI applications (e.g.: metadata can be accessed through one button push on the attributes).
Maturity Assessment Question(s)
5) Which answer best describes the metadata accessibility to users?
a) Very low – No metadata available
b) Low – Some incomplete metadata documents that users ask for periodically
c) Moderate – Complete up-to-date metadata documents sent to users periodically or available on the intranet
d) High – Metadata is always available through a metadata management tool, different from the BI tool
e) Very high – Complete integration of metadata with the BI applications (e.g.: metadata can be accessed
through one button push on the attributes, etc.).
Table 27: BI Applications Metadata Management Maturity Assessment Question.
4.4.6 BI Applications Standards
A general overview on standards used in data warehousing was given in 4.2.5. Standards specific to BI
Applications include: naming conventions, generic transformations, logical structure of attributes and
measures, etc. Once again, we will not assess which standards are defined or implemented, but whether this is done. The maturity questions and stages below are straightforward.
Maturity Assessment Question(s)
6) To what degree have you defined and documented standards (e.g.: naming conventions, generic
transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications.
7) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical
structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications.
Table 28: BI Applications Standards Maturity Assessment Questions.
4.5 Summary
In this chapter we took a closer look at the DW technical solution category and its main sub-categories:
general architecture and infrastructure, data modelling, ETL and BI applications. For each of them we
identified the most important characteristics that might influence the maturity of the DW solution and we
introduced the maturity assessment questions. We will continue with the second category in our model –
the DW organization and processes – in the next chapter.
5 DW Organization and Processes
When assessing the maturity of a DW technical solution, the processes and roles involved in the project
also need to be analyzed. A good technical solution cannot be developed without the processes
surrounding it as there is a strong interconnection between the two parts. It is more probable that an
organization with standardized processes and formalized development roles will develop a better DW
solution. At the same time, an organization cannot improve its processes without having some experience
with previous DW projects. Therefore, in this chapter we will take a closer look at the second part of the
DW maturity assessment questionnaire, the one regarding the organizational roles and processes
necessary to develop and maintain a DW solution.
5.1 DW Development Processes
A DW solution can be considered a software engineering project with some specific characteristics and, therefore, like any software engineering project, it will go through several development stages (Moss & Atre, 2003). Several models or paradigms of software development have been defined in the literature and applied in practice. Some of the best known are: the waterfall model, spiral development, iterative and incremental development, agile development, etc. For an overview of these models, see (Sommerville, 2007).
Since DW/BI is an enterprise-wide, evolving environment that is continually improved and enhanced based on feedback from the business community, the best approach for its development is iterative and incremental development (Kimball et al., 2008; Ponniah, 2001). Due to its complexity, the approach for a DW project has to include iterative tasks going through cycles of refinement. (Kimball et al., 2008) also suggest that agile techniques fit best with the development of BI applications. Designing and developing the analytic reports and analyses involves unpredictable, rapidly changing requirements. The BI team members need to work in close proximity to the business, so that they can be readily available and responsive in order to release incremental BI functionality in a matter of weeks. However, one size seldom fits all, and therefore, it is important for organizations to be able to apply the right methodology to each DW layer.
Maturity Assessment Question(s)
As it is hard to judge which software development paradigm is better and more mature, the first maturity question on development processes is a more general one: it refers to how the DW development processes map to the CMM levels, i.e. whether they are performed ad-hoc or are standardized. If they are standardized, it is important to know whether they are measured against defined goals and continuously improved.
1) Which answer best describes the DW development processes in your organization?
a) Level 1 – Ad-hoc development processes; no clearly defined development phases (i.e.: planning,
requirements definition, design, construction, deployment, maintenance)
b) Level 2 – Repeatable development processes based on experience with similar projects; some development
phases clearly separated
c) Level 3 – Standard documented development processes; iterative and incremental development processes
with all the development phases clearly separated
d) Level 4 – Development processes continuously measured against well-defined and consistent goals
e) Level 5 – Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects.
Table 29: DW Development Processes General Maturity Assessment Question.
5.1.1 DW Development Phases
Regardless of the chosen DW development model, a lifecycle approach is needed in order to accomplish all the major objectives in the system development process (Ponniah, 2001). A DW system consists of numerous tasks, technologies, and team member roles. It is not enough to have the perfect data model or best-of-breed technology. The many facets of a DW project need to be coordinated, and the lifecycle approach can do that by breaking down the project complexity and enforcing orderliness and a systematic approach to building the DW (Kimball et al., 2008). However, a one-size-fits-all lifecycle approach will not work for a DW project. The lifecycle approach has to be adapted to the special needs of the organization's DW project. But regardless of the situational factors, the main high-level phases and tasks required for an effective DW implementation are (Kimball et al., 2008; Moss & Atre, 2003):
• Project planning and management
• Requirements definition
• Design
• Development
• Testing and acceptance
• Deployment/production
• Growth, maintenance and monitoring.
As the DW environment is continuously changing and improving, the first six phases are usually
considered to be project-based, whereas the maintenance and monitoring should be done on an ongoing
basis. However, many authors report that even today, software organizations do not have any defined
processes for their software maintenance activities (April et al., 2004). (van Bon, 2000) confirms the lack
of process management in software maintenance and that it is a mostly neglected area. Traditionally,
maintenance has been depicted as the final activity of the software development process (Schneidewind,
1987). (Bennett, 2000) has a historical view of this problem, tracing it back to the beginning of the
software industry when there was no difference between software development and software
maintenance. But, starting with the 1980s, software maintenance began to be treated as a sequence of
activities and not as the final stage of a software development project. Several standards were developed
especially for software maintenance and nowadays, many organizations make this distinction between the
development phases and the maintenance and monitoring processes. This is also the reason why we make this distinction, especially since, in a DW project, maintenance and monitoring activities take a lot of time and effort. Therefore, we will elaborate on the first six phases in this section and continue with the maintenance and monitoring activities in the DW service processes part.
5.1.1.1 Project Planning and Management
One of the reasons why so many DW projects fail is improper project planning and inadequate project
management (Ponniah, 2001). DW project planning is not a one-time activity. Since project plans are
usually based on estimates, they must be adjusted constantly. A solid and, at the same time, flexible DW
project plan could be the foundation for a successful DW initiative. Project planning usually consists of
several important activities (Lewis, 2001): create a work breakdown structure listing activities, tasks, and
subtasks; estimate time, cost and resource requirements; determine the critical path based on the task and
resource dependencies; create the detailed project plan. As a DW project is very complex and many risks
can affect its development, a very important step here is project risk management. It involves three main
activities: identify possible risks and threats; quantify threats and risks by assigning a risk priority
number; and develop contingency plans to deal with risks that cannot be ignored.
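A small sketch of quantifying risks with a risk priority number follows; the FMEA-style convention RPN = severity × occurrence × detectability (each rated 1-10) is an assumption here, as the thesis does not prescribe a particular scheme, and the risks listed are invented examples.

    # Hypothetical risk register; the 1-10 ratings are assumptions.
    risks = [
        {"risk": "key source system unavailable", "severity": 9, "occurrence": 3, "detection": 2},
        {"risk": "scope creep in requirements", "severity": 6, "occurrence": 7, "detection": 5},
    ]

    # Risk priority number: one common convention multiplies the three ratings.
    for r in risks:
        r["rpn"] = r["severity"] * r["occurrence"] * r["detection"]

    # Deal first with the risks that cannot be ignored (highest RPN).
    for r in sorted(risks, key=lambda r: r["rpn"], reverse=True):
        print(r["rpn"], "-", r["risk"])  # 210 - scope creep..., 54 - key source...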
However, just planning the project is not enough for a successful DW implementation. The project also
needs to be managed during its development. First, the DW project officially begins with the project
kickoff meeting in order to get the entire project team on the same page in terms of where the project
stands and where it plans to go (Kimball et al., 2008). Once the project has started, the project status must
be regularly monitored (Lewis, 2001). The DW project lifecycle requires the integration of numerous
resources and tasks that must be brought together at the right time to achieve success. Monitoring project
status is key to achieving this coordination. Another important problem in project management is the
management of scope changes. This is usually done by adopting issue tracking or change management
methodologies.
Of course, throughout the project, a lot of changes might happen, and this is why it is a good idea to maintain the project plan by updating and evaluating it periodically. Moreover, consolidated project documentation will help ease the burden of keeping pace with the unending nature of the DW project. Documenting project assumptions and decision points is also helpful in the event that the deliverables do not meet expectations. However, many organizations ignore the importance of documentation, and if time pressures mount, it will be the first item to be eliminated. Finally, in order to learn from previous mistakes, projects and project management should always be reviewed and evaluated. This will offer lessons learned that help avoid the same mistakes in the future (Lewis, 2001).
Maturity Assessment Question(s)
As explained in this section, project planning and management is crucial for a DW project success. This
is why we created a maturity question regarding this part of the development processes. We included in
the answers the most important aspects: project planning and scheduling; project risk management;
project tracking and control; documentation; and evaluation and assessment. Therefore, an organization
which does not have any of these activities implemented is on the first level of maturity, whereas one that
takes care of all these activities is on the highest level of maturity regarding project planning and
management.
2) Which answer best describes your DW project management?
a) Level 1 – Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Level 2 – Project planning and scheduling: yes; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Level 3 – Project planning and scheduling: yes; project risk management: no; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Level 4 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Level 5 – Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes; standard and efficient procedure and documentation, evaluation and assessment: yes.
Table 30: Project Management Maturity Assessment Question.
5.1.1.2 Requirements Definition
In a DW, users' business requirements represent the most powerful driving force (Ponniah, 2001), as they
impact virtually every aspect of the project. Also, as end users alone are able to define the business goals
of the DW systems correctly, they should be enabled to specify information requirements by themselves
(Hansen, 1997).
The DW environment is an information delivery system where the users themselves access the DW repository and create their own outputs. It is therefore extremely important that the DW contains the right elements of information in optimal formats for the users to get the results they want. Every task performed in every phase of the DW development is determined by the requirements, and every decision made during the design phase is influenced by them. Because requirements form the primary driving force for every phase of the development process, special attention needs to be paid to the requirements definition phase in order to make sure that it contains adequate details to support each phase.
Requirements are usually gathered from the user community using two basic interactive techniques (Kimball et al., 2008; Moss & Atre, 2003; Ponniah, 2001):
• interviews – conducted with individuals or small groups (i.e.: two or three persons at a time); they represent a good approach when details are intricate.
• facilitated sessions – larger group sessions of ten to twenty people led by a facilitator; they are more appropriate after getting a baseline understanding of the requirements.
Useful information can also be extracted from the review of existing documentation from the user and IT departments.
Another important aspect, which is often neglected in the requirements definition phase, is formal documentation (Kimball et al., 2008; Ponniah, 2001), which is essential for several reasons. First, the requirements definition document is the foundation for the next phases and it becomes the encyclopedia
of reference material as resources are added to the DW team. If project team members have to leave the
project for any reason at all, the project will not suffer from people walking away with the knowledge
they have gathered. Second, documentation helps the team to crystallize and better understand the
interview content. Finally, formal documentation will also validate the findings when reviewed with the
users.
Maturity Assessment Question(s)
As shown in this paragraph, the requirements definition phase is very important for the DW environment and special attention should be paid to it. A solid requirements definition follows a standard methodology and produces a formal requirements document. Also, even if not usually done, causal analysis meetings to identify common bottleneck causes in this step, and the subsequent elimination of these causes, could be very beneficial for the DW development process.
3) Which answer best describes the requirements definition phase for your DW project?
a) Level 1 – Ad-hoc requirements definition; no methodology used
b) Level 2 – Methodologies differ from project to project; interviews with business users for collecting the
requirements
c) Level 3 – Standard methodology for all the projects; interviews and group sessions with both business and
IT users for collecting the requirements
d) Level 4 – Level 3) + qualitative assessment and measurement of the phase; requirements document also
published
e) Level 5 – Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes.
Table 31: Requirements Definition Maturity Assessment Question.
5.1.1.3 Design/ Development/ Testing and Acceptance/ Deployment
Once the business requirements are gathered and defined, the DW team can continue with designing the data model and the physical database, and designing and developing the ETL and the BI applications. Then, the developed DW with all its components needs to be tested and accepted by both the technical and business sides, and finally, the system can be deployed or put into production. We will give a short overview of each of these phases further in this paragraph and then present the maturity questions regarding this part of the DW development processes.
Design
The design phase refers to designing the data models with all three levels (i.e.: conceptual, logical,
physical), the ETL and the BI applications. Most of the aspects related to the design phase were already
mentioned in the DW technical solution part where we elaborated on each technical component.
However, several things regarding the processes can be added here.
First, the data modelling process itself starts during the business requirements activity when the
preliminary requirements definition document is created. Based on this, the design team will first develop
a high-level conceptual model, and then continue with the logical and physical data models. What is important in this process is to remember that data modelling is iterative and to have a preparation period beforehand, which includes activities such as: identifying the roles and participants required, reviewing the business requirements document, setting up the modelling environment, developing standards, and obtaining appropriate facilities and supplies. The design of ETL and BI applications also involves several activities in order to be successful: creating a plan and documentation, doing resource planning, and developing default strategies and standards (Kimball et al., 2008).
Development
The development phase includes the building of the physical databases and the actual implementation of
ETL and BI applications (Kimball et al., 2008). The physical databases are built when the data definition
language (DDL) is run against the database management system (DBMS). ETL programs must be
developed for the two sets of load processes: the one-time historic load and the incremental load. If a DBMS load
utility is used to populate the BI target databases, then only the extract and transformation programs need
to be written, including the programs that create the final load files. If an ETL tool is used, the
instructions (i.e.: technical metadata) for the ETL tool must be created. BI applications development
involves using the front end tool building environment and writing the programs and scripts for the
reports, queries, front-end interface, and online help function (Moss & Atre, 2003).
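The difference between the two load processes can be sketched as follows; the watermark-based incremental extract is a common pattern rather than the thesis's prescribed method, and the table and column names are made up.

    import sqlite3

    def historic_load(conn):
        # One-time historic load: take everything from the source.
        return conn.execute("SELECT * FROM src_orders").fetchall()

    def incremental_load(conn, watermark):
        # Incremental load: only rows changed since the last successful load.
        return conn.execute(
            "SELECT * FROM src_orders WHERE modified_at > ?", (watermark,)
        ).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INT, modified_at TEXT)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?)",
                     [(1, "2010-01-01"), (2, "2010-03-15")])
    print(len(historic_load(conn)))                   # 2 rows in total
    print(len(incremental_load(conn, "2010-02-01")))  # 1 new/changed row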
Testing and Acceptance
The DW system is a complex software project that needs to be tested extensively before being put into production. However, even though testing is critical for DW success, many organizations underestimate its importance and the time needed for these tasks. The most important activities during this step are (Golfarelli & Rizzi, 2009; Kimball et al., 2008; Moss & Atre, 2003):
• unit testing – all ETL modules and BI applications must be unit tested to prove that they compile without errors, but also to verify that they perform their functions correctly, trap all potential errors, and produce the right results. It is also recommended that a different role than the developer performs this unit testing.
• integration and regression testing – once all the individual ETL modules and BI applications have been unit tested, the entire system needs to be tested. This is done with integration testing on the first release and with regression testing on subsequent releases. In this way, it can be verified whether the completely integrated system meets its requirements or not. Regression testing focuses on finding defects after a major change has occurred and uncovers all the test results that deviate from the correct answers.
• performance testing – a performance test indicates whether the system performs well both for loads and for queries and reports.
• acceptance testing – acceptance tests are done by the users of the DW in order to verify that the system meets the mutually agreed-upon requirements. The acceptance tests include the validation of the ETL process but, more importantly for the end users, they should determine the overall usability of the BI applications and whether the returned results are the desired ones. In order for these tests to be effective, user training is usually done beforehand.
Besides doing these activities, it is also important to formalize and follow a standard procedure for the
testing and acceptance phase. In this way, it would be much easier to keep track of the tests and their
results and, at the same time, evaluate the testing and acceptance phase (Kimball et al., 2008).
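As a minimal sketch of such unit testing, the example below tests a single (made-up) ETL transformation with Python's standard unittest module.

    import unittest

    def clean_customer_name(raw):
        # Example ETL transformation: trim, collapse whitespace, normalise case.
        return " ".join(raw.strip().split()).title()

    class CleanCustomerNameTest(unittest.TestCase):
        def test_trims_and_normalises(self):
            self.assertEqual(clean_customer_name("  jan   jansen "), "Jan Jansen")

        def test_handles_empty_input(self):
            self.assertEqual(clean_customer_name("   "), "")

    if __name__ == "__main__":
        unittest.main()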
Deployment (Production)
The last step in order to finish the DW implementation is to deploy it by transferring the DW from testing
to production. The first deployment is the easiest one. After this, things get a little bit more complicated
as any modifications to the system should be accomplished with minimal disruption to the business user
community. For more details on DW deployment techniques, see (Kimball et al., 2008).
Maturity Assessment Question(s)
As many aspects of the design phase were already analyzed in the DW technical part, and it is difficult to do a high-level assessment of the development and deployment phases, we decided to assess the testing and acceptance phase, which is a critical one for DW success. The question lists the main activities involved in this phase and offers the organization the possibility to choose the ones it has implemented. The question will be scored through normalization, as further explained in the expert evaluation chapter.
4) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance.
Table 32: Testing and Acceptance Maturity Assessment Question.
Development/ Testing/ Acceptance/ Production Environments
To support all the phases presented in this paragraph, organizations usually set up different environments for different purposes (Moss & Atre, 2003):
• The development environment, where the programs and scripts are written and tested by the developers.
• The testing environment, where the DW system with all its components is tested.
• The acceptance environment, where the users do acceptance tests.
• The production environment, where the DW actually runs after being rolled out.
The implemented environments can influence the quality and performance of the DW. While smaller
organizations may have only two environments (i.e.: development and production), others usually have at
least three different environments. Another important aspect is the way the migration between environments is done: manually or automatically, the latter of course being the better option.
Maturity Assessment Question(s)
The maturity question chosen for this aspect is straightforward and self-explanatory after reviewing the arguments mentioned above. As standards are crucial for data warehousing, we also did an assessment of the definition and implementation of standards for developing, testing and deploying DW functionalities (see questions 6 and 7 below).
5) To what degree is there a separation between the development/test/acceptance/deployment environments in
your organization?
a) Very low – no separation between environments
b) Low – two separate environments (i.e.: usually development and production) with manual transfer between
them
c) Moderate – some separation between environments (i.e.: at least three environments) with manual transfer
between them
d) High – some separation between environments (i.e.: at least two environments) with automatic transfer
between them
e) Very high – all the environments are distinct with automatic transfer between them.
6) To what degree has your organization defined and documented standards for developing, testing and deploying
DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards defined
b) Low – few standards defined
c) Moderate – some standards defined
d) High – a lot of the standards defined
e) Very high – a comprehensive set of standards defined.
7) To what degree has your organization implemented standards for developing, testing and deploying DW
functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards implemented
b) Low – few standards implemented
c) Moderate – some standards implemented
d) High – a lot of the standards implemented
e) Very high – a comprehensive set of standards implemented.
Table 33: Development/ Testing/ Acceptance/ Production Maturity Assessment Questions.
5.1.2 The DW/BI Sponsor
As already mentioned, strong support and sponsorship from senior business management is critical for a
successful DW initiative. However, many organizations seem to overlook this aspect and ignore its
importance. No other venture unifies the information view of the entire corporation as the corporation‘s
DW does. The entire organization is involved and positioned for strategic advantage. Therefore, it is
important to have sponsorship from the highest levels of management to keep focus and satisfy
conflicting requirements (Ponniah, 2001). The DW sponsor needs to be more than an IT project manager
or IT director. Effective business sponsors share several characteristics (Kimball et al., 2008). First, they
have a vision for the potential impact of a DW/BI solution and can visualize how improved access to
information will result in incremental value to the business. Second, strong business sponsors are
influential leaders within the organization and they are demanding, but at the same time, realistic and
supportive. It is important that they have a basic understanding of DW/BI concepts, including the iterative
development cycle, to avoid unrealistic expectations. Effective sponsors are able to deal with short-term
problems and project setbacks and they are willing to compromise.
Maturity Assessment Question(s)
All this being said, some conclusions can be drawn:
• The DW project sponsor needs to be from the business department.
• It is better to have multiple strong sponsors within the organization.
• The best sponsorship involves business-driven, cross-departmental sponsorship including top-level management. In this case, the DW/BI initiative is integrated in the company's strategy and processes with continuous support and budget.
Therefore, the maturity question derived from these conclusions is:
8) Which answer best describes the sponsor for your DW project?
a) Level 1 – No project sponsor
b) Level 2 – Chief information officer (CIO) or an IT director
c) Level 3 – Single sponsor from a business unit or department
d) Level 4 – Multiple individual sponsors from multiple business units or departments
e) Level 5 – Multiple levels of business-driven, cross-departmental sponsorship including top-level management sponsorship (BI/DW is integrated in the company process with continuous budget).
Table 34: DW/BI Sponsorship Maturity Assessment Question.
5.1.3 The DW Project Team and Roles
As in any type of project, the success of a DW project also depends on the project team. A DW project is
similar to other software projects in that it is human-intensive. It takes several trained and especially
skilled persons to form the project team. Two of the factors that can break a project are: complexity
overload and responsibility ambiguity. But the bad influence of these factors can be overcome by putting
the right person in the right job (Ponniah, 2001). Therefore, organizing the project team for a DW project
has to do with matching diverse roles and responsibilities with proper skills and levels of experience. A
DW project requires a number of different roles and skills from both the business and IT communities
during its lifecycle. The main roles refer to (Kimball et al., 2008; Ponniah, 2001):
• Sponsorship and management (e.g.: business sponsor, project manager, etc.)
• Development roles (e.g.: business analyst, data steward, data quality analyst, data modeler, metadata manager, ETL architect, ETL developer, BI architect, BI developer, technical architect, security manager, DW tester, etc.)
• Monitoring and maintenance roles (e.g.: help desk, operations manager, etc.)
However, there is seldom a one-to-one relationship between roles and individuals. It does not really matter whether a person fills multiple roles on the DW project. What really matters is to have these roles and responsibilities formalized and actually implemented. It is also important to do periodic evaluations and assessments of the performance of roles in order to check for training requirements and solve skill-role mismatches (Humphries et al., 1999; Nagabhushana, 2006).
Maturity Assessment Question(s)
As it is difficult to say whether a team with more roles is more mature than one with fewer roles, we will assess here whether role definition and implementation have been done. Besides this, a company at a higher level of maturity would also do periodic assessment and evaluation of roles.
9) Which answer best describes the role division for the DW development process?
a) Level 1 – No formal roles defined
b) Level 2 – Defined roles, but not technically implemented
c) Level 3 – Formalized and implemented roles and responsibilities
d) Level 4 – Level 3) + periodic peer reviews (i.e.: review of each other's work)
e) Level 5 – Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles
and match the needed roles with responsibilities and tasks).
Table 35: DW Project Team and Roles Maturity Assessment Question.
5.1.4 DW Quality Management
The purpose of DW Quality Management is to provide management with appropriate visibility into the development process being used by the DW project and into the products being built. Organizations usually
start by doing DW development quality assurance. This involves reviewing and auditing the data
warehousing products and activities to verify that they comply with the applicable procedures and
standards and providing the project and other appropriate managers with the results of these reviews and
audits. In time, organizations learn how to manage this and implement DW quality management. This
involves defining quality goals for the DW products and processes, establishing plans to achieve these
goals, and monitoring and adjusting the plans, products, activities, and quality goals to satisfy the needs
and desires of the customer and end user (Paulk et al., 1995).
Maturity Assessment Question(s)
The maturity assessment question and the characteristics specific to each stage are depicted in the table below.
10) Which answer best describes the DW quality management?
a) Level 1 – No quality assurance activities
b) Level 2 – Ad-hoc quality assurance activities
c) Level 3 – Standardized and documented quality assurance activities done for all the development phases
d) Level 4 – Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality,
reliability, maintainability, usability)
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination
of these causes; service quality management certification.
Table 36: DW Quality Management Maturity Assessment Question.
5.1.5 Knowledge Management
Knowledge management (KM) is an emerging discipline that promises to capitalize on an organization's intellectual capital. KM implementation and use have rapidly increased since the 1990s, as more and more companies understood the importance of the knowledge each individual possesses and can systematically share within an organization (Rus & Lindvall, 2002). KM is "the practice of adding actionable value to information by capturing tacit knowledge and converting it to explicit knowledge; by filtering, storing, retrieving and disseminating explicit knowledge; and by creating and testing new knowledge" (Nemati et al., 2002). Explicit knowledge, also known as codified knowledge, is expressed knowledge. It corresponds to the information and skills that employees can easily communicate and document, such as processes, templates and data. Tacit knowledge is personal knowledge that employees gain through experience; this can be hard to express and is largely influenced by their beliefs, perspectives and values (Nonaka, 1991).
DW development is a quickly changing, knowledge-intensive process involving people working in
different phases and activities. Therefore, knowledge in data warehousing is diverse and an improved use
of this knowledge is the basic motivation for KM in this field. KM is equally important for both the DW
development processes and service processes. The general knowledge evolution cycle, which defines the phases of organizational knowledge, can also be applied to the specific field of DW (Agresti, 2000):
• originate / create knowledge – members of the DW team develop knowledge through learning, problem solving, innovation, creativity, and importation from outside sources.
• create / acquire knowledge – members acquire and capture information about knowledge in explicit forms.
• transform / organize knowledge – knowledge is organized, transformed or included in written material and knowledge bases.
• deploy / access knowledge – knowledge is distributed through education, training and mentoring programmes, automated knowledge-based systems or expert networks.
• apply knowledge – the organization's ultimate goal is applying the knowledge; this is the most important part of the life cycle. KM aims to make knowledge available whenever it is needed.
In order to implement these phases systematically and successfully, it is very important for organizations
to have a centralized KM strategy in place and not do everything ad-hoc (Rus & Lindvall, 2002).
Maturity Assessment Question(s)
(Klimko, 2001) proposed a KM maturity model based on CMM. By summarizing the characteristics
provided by him for each maturity stage and also the information from the knowledge evolution cycle, we
came up with the following maturity assessment question. The same maturity assessment is also done for the implementation of KM for the service processes.
11) Which answer best describes the knowledge management in your organization for the DW development
processes?
a) Level 1 – Ad-hoc knowledge gathering and sharing
b) Level 2 – Organized knowledge sharing through written documentation and technology (e.g.: knowledge
databases, intranets, wikis, etc.)
c) Level 3 – Knowledge management is standardized; knowledge creation and sharing through brainstorming,
training and mentoring programs, and also through the use of technology
d) Level 4 – Central business unit knowledge management; quantitative knowledge management control and
periodic knowledge gap analysis
e) Level 5 – Continuously improving inter-organizational knowledge management.
Table 37: Knowledge Management Maturity Assessment Question.
5.2 DW Service Processes
As already mentioned in the previous paragraph, in the last two decades, software maintenance began to
be treated as a sequence of activities and not as the final stage of a software development project (April et
al., 2004). Several standards and models have been developed especially for software maintenance and
nowadays, more and more organizations make this distinction between the development phases and the
maintenance and monitoring processes. These processes are very important after a DW has been deployed
in order to keep the system up and running and to manage all the necessary changes. Software
maintenance is defined as (IEEE, 1990): "The process of modifying a software system or component after
delivery to correct faults, improve performance or other attributes, or adapt to a changed environment".
5.2.1 From Maintenance and Monitoring to Providing a Service
In the last couple of years, IT organizations have made a transition from being pure technology providers to
being service providers. This requires taking a different perspective on IT management, called IT Service
Management (ITSM). ITSM puts the services delivered by IT at the center of IT management and it is
commonly defined as (Young, 2004): "a set of processes that cooperate to ensure the quality of live IT
services, according to the levels of service agreed to by the customer."
This service-oriented perspective on IT organizations can be best applied to the software maintenance
field, as maintenance is an ongoing activity, as opposed to software development, which is more project-based.
Therefore, software maintenance can be seen as providing a service, whereas software development is
concerned with the delivery of products (Niessink & van Vliet, 2000). Consequently, customers will
judge the quality of software maintenance differently from that of software development. In particular,
service quality is assessed on two dimensions: the technical quality – what the result of the service is –
and the functional quality – how the service is delivered. This means that in order to provide high-quality
software maintenance, different and additional processes are needed beyond those provided by a high-quality
software development organization (Niessink & van Vliet, 2000).
In order to get a clearer image of what a "service" means, we can take a look at the service marketing
literature, where a wide range of definitions exists of what a service entails. Usually, a service is defined
as an essentially intangible set of benefits or activities that are sold by one party to another (Grönroos,
1990). The main differences between products and services are intangibility, heterogeneity, simultaneous
production and consumption, and perishability (Zeithaml, 1996; van Bon, 2007). However, the
difference between products and services is not clear-cut and they can sometimes be intertwined.
If we turn to the software engineering domain, we see that a major difference between software
development and software maintenance is the fact that software development results in a product, whereas
software maintenance results in a service being delivered to the customer. All types of maintenance are
concerned with activities aimed at keeping the system usable and valuable for the organization. Hence,
software maintenance has more service-like aspects than software development, because the value of
software maintenance is in the activities that result in benefits for the customers, such as corrected faults
and new features. This is in contrast with software development, where the development activities do not
provide benefits for the customer, but instead it is the resulting software system that provides the benefits
(Niessink & van Vliet, 2000). As a DW project can also be considered a software engineering project, the same
concepts can be applied here as well. Also, as said above, the difference between products and services is
not clear-cut, and consequently, the same goes for software development and software maintenance.
5.2.2 IT Service Frameworks
Over the years, various IT service frameworks have been proposed: Information Technology
Infrastructure Library (ITIL), BS 15000, HP ITSM Reference Model, Microsoft Operations Framework
(MOF), and IBM's Systems Management Solution Lifecycle (SMSL). However, in the ITSM landscape, ITIL
acts as the de-facto standard for the definition of best practices and processes that pertain to the
disciplines of service support and service delivery (Salle, 2004). BS 15000 extends ITIL and is at the same
time tightly integrated with it. The other frameworks extend and refine ITIL, sometimes with
guidelines specific to the referenced technologies. Therefore, we will consider the service components
from ITIL as a starting point for our analysis of the DW Service Processes part. Moreover, two maturity
models related to IT maintenance and service also served as a foundation for developing this part of our
DW maturity model: the Software Maintenance Maturity Model and the IT Service CMM. Inspired by
other maturity models, they include several maturity stages and key process areas. An overview is
depicted in table 38. A more detailed description of the three models is provided further in this paragraph.
Authors: Central Computer and Telecommunications Agency (CCTA) (1989); Model: Information Technology Infrastructure Library (ITIL); Main Idea: service delivery processes and service support processes and functions
Authors: Niessink, Clerc & van Vliet (2002); Model: IT Service CMM; Main Idea: key practices intended to cover the activities needed to reach a certain level of service maturity while preserving a structure similar to CMM
Authors: April, Hayes, Abran & Dumke (2004); Model: Software Maintenance Maturity Model (SMmm); Main Idea: unique activities of software maintenance while preserving a structure similar to that of the CMMi
Table 38: Overview of IT Service Frameworks.
ITIL
ITIL was established in 1989 by the United Kingdom's former Central Computer and
Telecommunications Agency (CCTA) to improve its IT organization. ITIL consists of an inter-related set
of best practices for lowering the cost, while improving the quality, of IT services delivered to users. It is
organized around seven key areas: service support, service delivery, business perspective, application
management, infrastructure management, security management and planning to implement service
management. However, the core of ITIL comprises five service delivery processes, five service
support processes and one service support function (the service desk). Service support processes apply to the
operational level of the organization (i.e.: all aspects associated with the daily activities of IT service and
maintaining the related processes), whereas the service delivery processes are tactical in nature (i.e.: the
processes required for planning and delivering quality services over the long term, with a goal of
continual improvement of those services). An overview of ITIL's core components can be viewed in the
table below. For more information about ITIL, see (Colin, 2004).
Service Support: Service Desk (function), Incident Management, Problem Management, Change Management, Release Management, Configuration Management
Service Delivery: Service Level Management, Financial Management, Capacity Management, IT Service Continuity Management, Availability Management
Table 39: ITIL's Core Components (adapted from (Cater-Steel, 2006)).
Software Maintenance Maturity Model (SMmm)
The SMmm was designed as a customer-focused reference model for either:
• Auditing the software maintenance capability of a software maintenance service supplier or outsourcer, or
• Improving internal software maintenance organizations.
The model has been developed from a customer perspective, as experienced in a competitive, commercial
environment. A higher capability in the SMmm context means better services delivered to customer
organizations and increased service performance for the maintenance organizations.
The SMmm is based on the Capability Maturity Model Integration (CMMi), version 1.1 [sei02] and
Camélia model [Cam94]. The model must be viewed as a complement to the CMMi, especially for the
processes that are common to developers and maintainers. The architecture of the SMmm differs slightly
from that of the CMMi version. The most significant difference is the inclusion of:


A roadmap category to further define the key process areas (KPAs);
Detailed references to papers and examples on how to implement the practices.
The SMmm includes four process domains (i.e.: software maintenance process management, software
maintenance request management, software evolution engineering, and support to software evolution
engineering), several KPAs, roadmaps and practices. While some KPAs are unique to maintenance, others
were derived from the CMMi and other models, and have been modified slightly to map more closely to
daily maintenance characteristics. For more details on the SMmm, see (April et al., 2004).
IT Service CMM
The IT Service CMM is based on the CMM, but adapted to service processes. The model consists
of five maturity levels which contain KPAs. For an organization to reside on a certain maturity level, it
needs to implement all of the KPAs for that level and those of lower levels. An overview of the KPAs
assigned to each maturity level can be seen in table 40.
Level: Initial – Key Process Areas: Ad hoc processes
Level: Repeatable – Key Process Areas: Service Commitment Management; Service Tracking and Oversight; Subcontract Management; Service Delivery Planning; Event Management; Configuration Management; Service Quality Assurance
Level: Defined – Key Process Areas: Organization Service Definition; Organization Process Definition; Organization Process Focus; Integrated Service Management; Service Delivery; Resource Management; Training Programme; Intergroup Coordination; Problem Management
Level: Managed – Key Process Areas: Quantitative Process Management; Service Quality Management
Level: Optimizing – Key Process Areas: Process Change Management; Technology Change Management; Problem Prevention
Table 40: IT Service CMM's Key Process Areas (adapted from (Paulk et al., 1995)).
The objective of the IT Service CMM is twofold:
• to enable IT service providers to assess their capabilities with respect to the delivery of IT services, and
• to provide IT service providers with directions and steps for further improvement of their service capability.
There are a number of characteristics of the IT Service CMM that are important for understanding its
nature: the main focus of the model is the complete service organization; the scope of the model
encompasses all service delivery activities (i.e.: those activities which are key to improving the service
delivery capability of service organizations); the model is strictly ordered (i.e.: the key process areas are
assigned to different maturity levels in such a way that lower level processes provide a foundation for the
higher level processes); and the model is a minimal model in different senses (i.e.: the model only
prescribes the key processes and activities that are needed to reach a certain maturity level and it does not
show how to implement them, what organization structure to use, etc.). For a broader image of the IT
Service CMM, see (Niessink & van Vliet, 1999).
5.2.3 DW Service Components
Now that we have given a short overview of the most important frameworks related to IT service, we can
present the main elements we chose for our DW service processes maturity assessment. Once the DW
project has been deployed, ongoing maintenance and monitoring work is required to keep the DW system
operating in good shape. As the scope of the maintenance and monitoring activities in the DW extends
over many features and functions, it is important to have a plan and perform these activities in a formalized
manner. The results of this phase offer the data needed to plan for growth and to improve performance.
The most important activities involved in DW maintenance and monitoring are (Kimball et al., 2008;
Ponniah, 2001):
• collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory management, physical disk storage space utilization, processor usage, report usage, number of completed queries by time slots during the day, time each user stays online with the data warehouse, total number of distinct users per day, etc.) – see the sketch after this list
• user support
• BI applications maintenance and monitoring
• security administration
• performance monitoring and tuning
• data reconciliation and data growth management
• ETL monitoring and management
• resource monitoring and management
• infrastructure management
• backup and recovery management, etc.
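To make the first activity more concrete, the following minimal sketch (in Python) derives two of the usage statistics mentioned above – completed queries by time slot and distinct users per day – from a query log. The log structure and field names are illustrative assumptions, not something prescribed by the DWCMM; in practice these records would come from the DW's own audit or log tables.

from collections import Counter
from datetime import datetime

# Hypothetical query-log records; a real DW would read these from its audit tables.
query_log = [
    {"user": "alice", "finished_at": "2010-08-02 09:15:00"},
    {"user": "bob", "finished_at": "2010-08-02 09:40:00"},
    {"user": "alice", "finished_at": "2010-08-02 14:05:00"},
]

completed_per_slot = Counter()  # completed queries by time slot during the day
distinct_users_per_day = {}     # distinct users per day

for record in query_log:
    ts = datetime.strptime(record["finished_at"], "%Y-%m-%d %H:%M:%S")
    slot = "%02d:00-%02d:59" % (ts.hour, ts.hour)  # one-hour time slots
    completed_per_slot[slot] += 1
    distinct_users_per_day.setdefault(ts.date(), set()).add(record["user"])

print(completed_per_slot)
print({day: len(users) for day, users in distinct_users_per_day.items()})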
Maturity Assessment Question(s)
As can be seen, DW software maintenance and monitoring involve many activities, and it is critical
to include at least the most important ones. This is why we developed a high-level maturity
question regarding the DW software maintenance and monitoring processes. The question is a multiple-choice
one whose answers are the main activities included in this part of the DW solution. It will be
scored through normalization, similar to the question for the testing and acceptance phase; a scoring sketch
is given after the question below. However, this question was not included in the questionnaire for the
first case study and thus will not be taken into consideration when scoring the case studies.
1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?
a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory
management, physical disk storage space utilization, processor usage, BI applications usage, number of
completed queries by time slots during the day, time each user stays online with the data warehouse, total
number of distinct users per day, etc.)
b) BI applications maintenance and monitoring
c) User support
d) ETL monitoring and management
e) Data reconciliation and data growth management
f) Security administration
g) Resource monitoring and management
h) Infrastructure management
i) Backup and recovery management
j) Performance monitoring and tuning.
Table 41: Maintenance and Monitoring Maturity Assessment Question.
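As a minimal sketch of the normalization idea, the number of selected activities can be scaled onto the 1-5 maturity range used by the other questions. The linear mapping below is an illustrative assumption, not the exact scoring formula used in the case studies.

# Hypothetical normalization: map the number of selected activities (out of the
# ten listed in the question) onto the 1..5 maturity scale.
def normalized_score(selected, total=10):
    return 1 + 4 * (float(selected) / total)

print(normalized_score(4))   # 2.6 - four of the ten activities are in place
print(normalized_score(10))  # 5.0 - all activities are in place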
As already presented in the previous paragraphs, DW maintenance and monitoring are more and more
often considered to be service processes as they are offered on an ongoing basis to the customers. From
the presented IT service frameworks, it can be seen that some elements appear in more than one model or
some elements from one model can be mapped to elements from another one. Taking into
consideration the changing nature of a DW and all the aspects that DW maintenance and monitoring
processes entail, we decided to consider the following components when assessing the maturity of DW
service processes: service quality management, service level management, incident management, change
management, technical resource management, availability management, release management and
knowledge management. Each of these elements and the corresponding maturity assessment question(s)
will be further elaborated on in this paragraph.
5.2.3.1 Service Quality Management
The purpose of Service Quality Management is to provide management with the appropriate visibility into
the processes being used and the services being delivered. This process entails service quality assurance
activities which involve the reviewing and auditing of working procedures, DW service delivery activities
and work products to see that they comply with applicable standards and procedures. Management and
relevant groups are provided with the results of the reviews and audits. An organization with experience
in service processes also develops a quantitative understanding of the quality of the services delivered in
order to achieve specific quality goals (Niessink & van Vliet, 1999). If these goals are not reached, causal
analysis meetings should be held to identify the defect causes and subsequently eliminate them. In order
to get better results, many organizations with a high maturity in DW service delivery also try to obtain
external service quality certification (e.g.: ISO certification, etc.).
Maturity Assessment Question(s)
Therefore, an organization on the first maturity stage will not have any service quality management
activities, whereas one on the highest maturity level will have not only a standard procedure, but also
quantitative service quality evaluation and causal analysis meetings. The purpose of these meetings is to
identify common defect causes and eliminate them in the future; in this way, continuous service quality
management improvement is achieved.
2) Which answer best describes the DW service quality management in your organization?
a) Level 1 – No service quality management activities
b) Level 2 – Ad-hoc service quality management
c) Level 3 – Proactive service quality management including a standard procedure
d) Level 4 – Level 3) + service quality measurements periodically compared to the established goals to
determine the deviations and their causes
e) Level 5 – Level 4) + causal analysis meetings to identify common defect causes and subsequent
elimination of these causes; service quality management certification.
Table 42: Service Quality Management Maturity Assessment Question.
5.2.3.2 Service Level Management
Service Level Management ensures continual identification, monitoring and reviewing of the optimally
agreed levels of IT services as required by the business. It negotiates service level agreements (SLAs)
with the suppliers and customers and ensures that they are met (Cater-Steel, 2006). It is responsible for
ensuring that all DW service management processes, operational level agreements and underpinning
contracts are appropriate for the agreed service level targets. This is done in close cooperation between
the DW service providers and the customers. Some examples of SLA performance criteria for a DW are:
• 50 concurrent queries processed with an average query time of no more than five minutes.
• Less than four hours of planned downtime per week.
• Less than six hours of unplanned downtime per month.
• Data refreshed weekly.
The high level activities for Service Level Management are: documenting customer service needs; implementing
SLAs; reviewing SLAs with the customer/supplier on a periodic or event-driven basis; and continuously
monitoring and evaluating the actual service delivery with the customer/supplier (SLAs with penalties). A
sketch of how SLA criteria such as the ones above can be checked automatically is given below.
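As a minimal sketch of how such criteria could be monitored, the Python fragment below checks measured DW statistics against SLA limits. The metric names and values are hypothetical assumptions mirroring the examples above; in practice the measurements would come from the monitoring environment.

# Illustrative SLA limits (see the example criteria above).
sla_limits = {
    "avg_query_time_min": 5.0,          # average query time, in minutes
    "planned_downtime_h_week": 4.0,     # planned downtime per week, in hours
    "unplanned_downtime_h_month": 6.0,  # unplanned downtime per month, in hours
}

# Hypothetical measurements collected by the monitoring processes.
measured = {
    "avg_query_time_min": 3.2,
    "planned_downtime_h_week": 2.5,
    "unplanned_downtime_h_month": 7.0,
}

for metric, limit in sla_limits.items():
    status = "OK" if measured[metric] <= limit else "VIOLATION"
    print("%s: measured %.1f (limit %.1f) -> %s" % (metric, measured[metric], limit, status))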
Maturity Assessment Question(s)
From the high level activities, one could say that usually an organization evolves from documenting all
customer/supplier service needs in an ad-hoc manner to using a standard procedure and continuously
monitoring, evaluating and improving the actual service delivery.
3) Which answer best describes the DW service level management in your organization?
a) Level 1 – Customer/supplier service needs documented in an ad-hoc manner; no service catalogue
compiled
b) Level 2 – Some customer/supplier service needs documented and formalized based on previous experience
c) Level 3 – All the customer/supplier service needs documented and formalized according to a standard
procedure into service level agreements (SLAs)
d) Level 4 – SLAs reviewed with the customer/supplier on both a periodic and event-driven basis
e) Level 5 – Actual service delivery continuously monitored and evaluated with the customer/supplier on both
a periodic and event-driven basis for continuous improvement (SLAs including penalties).
Table 43: Service Level Management Maturity Assessment Question.
5.2.3.3 Incident Management
ITIL defines an incident as a deviation from the (expected) standard operation of a system or a service,
and a problem as a condition that has been defined and identified from one large incident or from many
incidents exhibiting common symptoms, for which the cause is unknown (Salle, 2004). As a DW is a very
complex system, many incidents and problems can occur, and this process is therefore very important.
Incidents and problems can arise on the side of the users or on that of the system, and given the frequency
of changes in a DW, complex problems are likely to occur often. The objective of Incident and Problem
Management is to provide continuity by restoring the service as quickly as possible, by whatever means
necessary, and to prevent and minimize the impact of incidents. This is why it is critical to have solid
Incident and Problem Management that is also closely related to Change Management. The high level
activities for Incident Management are: detection, recording, classification, investigation, diagnosis,
resolution and recovery.
Maturity Assessment Question(s)
4) Which answer best describes the DW incident management in your organization?
a) Level 1 – Incident management is done ad-hoc with no specialized ticket handling system or service desk
to assess and classify them prior to referring them to a specialist
b) Level 2 – A ticket handling system is used for incident management and some procedures are followed, but
nothing is standardized or documented
c) Level 3 – A service desk is the recognized point of contact for all the customer queries; incidents
assessment and classification is done following a standard procedure
d) Level 4 – Level 3) + standard reports concerning the incident status including measurements and goals
(e.g.: response time) are regularly produced for all the involved teams and customers; an incident
management database is established as a repository for the event records
e) Level 5 – Level 4) + trend analysis in incident occurrence and also in customer satisfaction and value
perception of the services provided to them.
Table 44: Incident Management Maturity Assessment Question.
5.2.3.4 Change Management
Change Management is described as a regular task for the immediate and efficient handling of changes that
might occur in a DW environment. The main input to the Change Management process is a request for
change (RFC) (Salle, 2004). An RFC can result from a process relating to Incident and Problem Management
or from extending the service through Service Level Management. The objective of Change Management
is to ensure that standardized methods and techniques are used for efficient and immediate handling of all
the changes to the DW system while minimizing change-related incidents. The changes that can frequently
occur in a DW environment concern:
• changes in the contents of the DW;
• changes in the functionality of BI applications;
• changes in a source system with direct implications for ETL, etc.
The high level activities for Change Management are: acceptance and classification, assessment and
planning, authorization of changes, control and coordination, and evaluation.
Maturity Assessment Question(s)
As in the case of Incident Management, at first an organization handles change requests in an ad-hoc
manner. Then, an electronic change management system is usually introduced for storing and solving the
requests for change, and some policies and procedures for change management begin to be
established. Once a standard procedure for approving, verifying, prioritizing and scheduling changes is
put in place, organizations start moving towards the high end of the maturity development, and some of
them manage to reach the last maturity stage of continuously improving Change Management.
5) Which answer best describes the DW change management in your organization?
a) Level 1 – Change requests are made and solved in an ad-hoc manner
b) Level 2 – A change management system is used for storing and solving the requests for change; some
policies and procedures for change management established, but nothing is standardized
c) Level 3 – A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) Level 4 – Standard reports concerning the change status including measurements and goals (e.g.: response
time) are regularly produced for all the involved teams and customers; standards established for
documenting changes
e) Level 5 – Trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and
value perception of the services provided to them.
Table 45: Change Management Maturity Assessment Question.
5.2.3.5 Technical Resource Management
The purpose of Resource Management is to maintain control of the hardware and software
resources needed to deliver the agreed DW service level targets (Niessink & van Vliet, 1999). Before
commitments are made to customers, resources are checked. If not enough resources are available, either
the commitments are adapted or extra resources are installed. It also involves monitoring the ETL and BI
applications in order to see whether the current resources are sufficient for the desired DW performance.
Maturity Assessment Question(s)
Similar to the other service processes, DW Technical Resource Management also evolves from ad-hoc
activities to resource trend analysis and monitoring to determine the most common bottlenecks and make
sure that there is sufficient capacity to support planned services. The intermediate phases can be derived
from the answers to the maturity assessment question below.
6) Which answer best describes the DW technical resource management in your organization?
a) Level 1 – Ad-hoc resource management activities (only when there is a problem)
b) Level 2 – Resource management is done following some procedures, but nothing is standardized or
documented
c) Level 3 – Resource management is done constantly following a standardized documented procedure
d) Level 4 – Level 3) + standard reports concerning performance and resource management including
measurements and goals are done on a regular basis
e) Level 5 – Level 4) + resource management trend analysis and monitoring to make sure that there is
sufficient capacity to support planned services.
Table 46: Technical Resource Management Maturity Assessment Question.
5.2.3.6 Availability Management
Availability Management allows organizations to ensure that all DW infrastructure, processes, tools and
roles are in line with the SLAs by using appropriate means and techniques. It should also manage risks
that could seriously impact DW services by reducing the risks to an acceptable level and planning for the
recovery of DW services. Availability Management also tries to proactively manage continual
improvement efforts by measuring and tracking metrics for availability, reliability, maintainability,
serviceability, and security (Colin, 2004). In order to achieve better results, Availability Management
also needs to be in close collaboration with Resource Management.
Maturity Assessment Question(s)
The maturity assessment question for Availability Management follows the same structure as the one for
Resource Management, as these activities are in close collaboration, but there is a very important
characteristic of the former which can really make a difference: risk assessment. An organization that
follows a standardized procedure for availability management and that also pays serious attention to risk
assessment has a very high chance of delivering the agreed service level targets. The maturity question
for this aspect can be seen below.
7) Which answer best describes the availability management in your organization?
a) Level 1 – Ad-hoc availability management
b) Level 2 – Availability management is done following some procedures, but nothing is standardized or
documented
c) Level 3 – Availability management documented and done using a standardized procedure (all elements are
monitored)
d) Level 4 – Level 3) + risk assessment to determine the critical elements and possible problems
e) Level 5 – Level 4) + availability management trend analysis and planning to make sure that all the elements
are available for the agreed service level targets.
Table 47: Availability Management Maturity Assessment Question.
5.2.3.7 Release Management
As a DW is continuously changing and evolving over time, organizations need to embrace the release
concept: any incomplete functionality or necessary change is bundled in future releases. Therefore,
the objective of Release Management is to ensure that only authorized and correct versions of the DW are
made available for operation (Salle, 2004). A release can be seen as a collection of hardware, software,
documentation, processes or other components required to implement approved changes to a DW
(Cater-Steel, 2006). In order to have successful Release Management, it is very important to have solid
release planning and to document and follow a standardized procedure for this process. Solid Release
Management also implies standardized release naming and numbering conventions; assigned roles and
responsibilities; and a release database with master copies of previous DW versions. A minimal sketch of
such a naming convention is given below.
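As an illustration of what a standardized naming and numbering convention could look like, the Python fragment below validates release identifiers. The format itself ("DW-<year>.<month>.<sequence>") is a hypothetical assumption; the model only requires that some convention exists and is followed.

import re

# Assumed convention: "DW-<year>.<month>.<sequence>", e.g. "DW-2010.08.1".
RELEASE_PATTERN = re.compile(r"^DW-(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<seq>\d+)$")

def is_valid_release_id(release_id):
    """Check a release identifier against the assumed convention."""
    match = RELEASE_PATTERN.match(release_id)
    return match is not None and 1 <= int(match.group("month")) <= 12

print(is_valid_release_id("DW-2010.08.1"))   # True
print(is_valid_release_id("release-final"))  # False - violates the convention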
Maturity Assessment Question(s)
Therefore, the maturity assessment question for this component of the DW service processes is
straightforward and can be seen below.
8) Which answer best describes the release management in your organization?
a) Level 1 – Ad-hoc changes solving and implementation; no release naming and numbering conventions
b) Level 2 – Release management is done following some procedures, but nothing is standardized or
documented; release naming and numbering conventions
c) Level 3 – Release management is documented and done following a standardized procedure; assigned
release management roles and responsibilities
d) Level 4 – Level 3) + standard reports concerning release management including measurements and goals
are done on a regular basis; master copies of all software in a release secured in a release database
e) Level 5 – Level 4) + release management trend analysis, statistics and planning.
Table 48: Release Management Maturity Assessment Question.
5.3 Summary
This chapter has offered a detailed image of the DW organization and processes benchmark variable and
its main sub-categories: DW development processes and DW service processes. Just like in the previous
chapter, we identified the main characteristics of each sub-category at each maturity stage and we
presented the underlying maturity assessment questions.
Now that the DWCMM with its main components and maturity assessment questionnaire has been
presented, we will continue with the results of the evaluation phase of our research process in the next
chapter.
6 Evaluation of the DWCMM
This section presents the results of two activities aimed at evaluating the model presented in the previous
chapters. Chapter 6.1 reports the review of the model by five DW/BI experts from practice and
emphasizes the validity of the model. Chapter 6.2 describes an assessment of the case study results obtained
by testing the model in four organizations.
6.1 Expert Validation
To evaluate the utility of and further revise the DWCMM, expert validation was applied. An "expert" is
defined by (Hoffman et al., 1995) as a person "highly regarded by peers, whose judgements are
uncommonly accurate and reliable, whose performance shows consummate skill and economy of effort,
and who can deal effectively with rare or tough cases. Also, an expert is one who has special skills or
knowledge derived from extensive experience with subdomains." Therefore, eliciting knowledge from
experts is very important and useful and can be done using several methods, among them structured
and unstructured interviews (Hoffman et al., 1995). More information on interview techniques is given in
6.2.1.
Five experts in data warehousing and BI were interviewed and asked to give their opinions
about the content of the model we have developed. The interviews were structured, but consisted of open
questions, in order to capture the knowledge of the respondents. This enabled the experts to freely state
their opinions and ideas for improvement. The expert panel consists of five experts from practice, each
having at least 10 years of experience in the DW/BI field. All of them are DW/BI consultants at different
organizations in The Netherlands (local or multinational). An overview of the experts and their affiliations
(figures are taken from 2009 annual reports) is depicted in table 49. The expert interview protocol and
questions can be seen in appendix D.
Respondent 1 – Job Position: CI/BI consultant; Affiliation Industry: DW/BI Consulting; Market: B2B; Employees: ≈ 45
Respondent 2 – Job Position: Principal consultant / Thought leader BI/CRM; Affiliation Industry: IT Services; Market: B2B; Employees: ≈ 49000
Respondent 3 – Job Position: BI consultant; Affiliation Industry: BI Consulting; Market: B2B; Employees: ≈ 35
Respondent 4 – Job Position: Principal consultant BI; Affiliation Industry: IT Services; Market: B2B; Employees: ≈ 38000
Respondent 5 – Job Position: BI consultant; Affiliation Industry: DW Consulting; Market: B2B; Employees: ≈ 1
Table 49: Expert Overview.
6.1.1 Expert Review Results and Changes
DWCMM
First, the experts were asked to propose some categories that they would find important when assessing
the maturity of a DW solution. Among the proposed categories we can mention: data structure, data
architecture, metadata, masterdata, hardware, infrastructure, report architecture, security, management
and maintenance, and traceability within the DW. One expert said that other important aspects to be analyzed
are whether the organization is doing ETL or ELT, and whether it is using real-time data
warehousing. Another critical point for success was considered to be the alignment between business and
IT. As can be seen, some of the categories proposed by the experts can be found in, or are included in, the
categories from our model. The others (i.e.: data architecture, data governance, masterdata, traceability)
can be considered for further research in the future.
Furthermore, all reviewers gave positive feedback on their first impression of the DWCMM, said it made
sense and that it could be applied for assessing an organization's current DW solution. One of the experts
noticed that the main sub-categories from the DW technical solution part were not on the same level,
because "architecture" is usually a superset that includes data modelling, ETL and BI applications. Some
experts said that in general the model seemed to be complete, but that, of course, small changes
could probably be made or new categories/sub-categories added.
Three of five reviewers stated that "infrastructure" should also be added as a sub-category for the DW
technical solution or should replace "architecture". However, as already explained in the previous
chapters, in literature, infrastructure usually refers to the hardware and software supporting the architecture.
Also, architecture usually refers to the logical architecture (i.e.: the data storage layers), application
architecture (i.e.: ETL, BI applications), data architecture and technical architecture (i.e.: infrastructure). Our
sub-category refers more to the logical architecture and some other elements such as metadata, security,
infrastructure, etc. Therefore, we agree that the name "architecture" could be a little confusing
and we decided to change it to "General Architecture and Infrastructure" for the final model.
One expert suggested that "data modelling" should be changed to "data management", a broader
category which includes data modelling, data quality and data governance. However, due to time
constraints and the fact that we do assess data modelling, and to some extent data quality, in our current
model, we leave the data governance part to future research.
The last comments regarding the structure of the DWCMM were related to ETL. One of the experts
suggested that this sub-category should be called "data logistics" as it could involve ETL or ELT. But, as
we believe that ETL is the more common name and easier to understand for the respondents who
would take the assessment, we decided to leave it unchanged. Another expert proposed that a new sub-category
called "ETL Workflow" should be added. It would include the way fault tolerance is addressed,
the ETL technical administration and, generally, how ETL processes are managed. We consider that this
sub-category would not be on the same level as the other ones, and some of its elements are already addressed
in the ETL sub-category; therefore, we decided not to include it in our model for the time being.
The DWCMM Condensed Maturity Matrix
All reviewers said they got a good first impression of the DWCMM Condensed Maturity Matrix and that
it gives a good overview of the main goal of the model. Some experts pointed out that the characterization
of "architecture" for the highest maturity stage was not on the same level as the previous ones. After also
doing the test case studies, we decided to change the fifth stage of maturity to "DW/BI service that
federates a central enterprise DW and other data sources via a standard interface". More comments on this
change are given in the case study results paragraph.
Moreover, several suggestions were made regarding the ETL characterization for each stage. One expert
suggested that more information should be given for each stage of ETL. Another one proposed that the
characterization of ETL for the last level of maturity should be changed, as "real-time ETL" does not seem
to be on the same level as the ETL characterization done for the previous stages (i.e.: basic ETL, advanced
ETL, etc.). The matrix as redefined after the expert interviews is depicted in figure 6.
DW Maturity Assessment Questionnaire
As with the previous two deliverables, all reviewers gave positive feedback on their first impression of
the DW maturity assessment questionnaire. Some of the experts pointed out that, even if the chosen
characteristics and questions are representative of the problem we want to address, they might not be
enough to do an in-depth assessment of a specific DW environment. Therefore, most of the experts
suggested that, when testing the model in practice, it would be very important to clearly state that the
main goal of the model is a high-level DW maturity assessment and that the focus of the
questionnaire is on the technical aspects of a DW solution.
Furthermore, each expert had his own view on data warehousing and BI, and hence, it was difficult to sum
up all their comments and integrate them into useful changes for our maturity questionnaire. Finally, we
decided to split their feedback into two categories: proposed changes that, due to time constraints and
scope limitations, were not implemented in the final version of the model, but should be considered for
future research; and implemented improvement suggestions that involved some rephrasing or the complete
changing of questions and answers. We will give a short overview of the former further in this paragraph.
The actually implemented changes can be seen in the redefined questionnaire in appendix C. The
questions and answers that underwent changes are written in red so that the differences from the first
version of the questionnaire can be better seen. The main questions and answers that were
redefined can also be seen in the table below.
Data Modelling – Rephrased questions: 2, 4 (question 4 split in two); Questions whose answers were rephrased or changed: 1
ETL – Rephrased questions: 2, 5 (question 5 split in two); Questions whose answers were rephrased or changed: 1, 8
Architecture – Rephrased questions: 3 (split in two); Questions whose answers were rephrased or changed: 1, 2, 4
BI Applications – Rephrased questions: 3 (split in two); Questions whose answers were rephrased or changed: 2, 5, 8
Development Processes – no rephrased or changed questions listed
Service Processes – Questions whose answers were rephrased or changed: 3, 4, 5, 6, 7, 8
Table 50: Rephrased or Changed Questions and Answers.
Moreover, here are the main changes that were proposed by the experts, but were not implemented. All
the experts suggested a version for the DW architecture found on the highest maturity level. We
combined all the opinions and came up with the answer as shown later in this chapter. Three of five
experts suggested that more attention should be paid to data quality, as it is a very important issue
nowadays, and that more questions should be added for tackling this problem. We have a question
regarding data quality in the ETL section, but apparently, data quality should also be taken into
consideration in the data modelling part. Therefore, due to time constraints and to the high-level nature of
our assessment, we leave this topic open for future research. One expert suggested that we should dive a
little bit into cloud computing for data warehousing. We find this topic very interesting and important for
the future of data warehousing and BI. However, due to time constraints, we could not find an efficient
way to include this in our current model and we will leave it to future research.
One expert said that we should analyze in more detail how the actual monitoring and management of
ETL is done, not just ask whether it is done. But, as already mentioned, our assessment is a high-level one and
it tries to capture what is done, not how it is done. Hence, we decided not to implement this suggested
change. Other proposals that we found interesting for assessing the maturity of a DW, but very hard to
include in a model and questionnaire like ours, refer to: judging how mature the organization is in adapting
to new situations; addressing the right DW development methodology (e.g.: waterfall, iterative and
incremental development, agile development, etc.) for the right category; or having a good strategy for tool
management (e.g.: aspects related to pricing, licensing, etc.) and understanding that there are multiple tools,
but an integrated view is needed to be successful.
The last important suggestion was that a question for problem management should be added to the DW
service processes category. We must admit that we also thought about this when developing the model, but
we finally decided not to include it in our questionnaire. A problem is usually defined in ITIL as "a
condition that has been defined, identified from one large incident or many incidents exhibiting common
symptoms for which the cause is unknown". Therefore, we believe that problem management is positively
correlated with incident and change management, and it can be included in these processes. That is why,
for the time being, we will leave this question out.
Besides the questions regarding the DWCMM, we also asked the experts whether weighting coefficients
should be considered for computing the maturity scores. Two of them said that no general weights should
be used, as this would make the scoring rather difficult, and that weighting coefficients should be
situational, depending on each organization. One expert suggested that it would be interesting to have
weights for both individual questions and sub-categories/categories. One expert believed that weighting
factors should be used for the main sub-categories/categories. The last expert was not really sure about
this aspect, saying that it would be interesting to have weights for each question, but that the scoring would
become very complicated. Therefore, due to the lack of unanimity regarding the addition of weighting
coefficients to the questionnaire, and for the clarity of scoring, we decided to leave weights out for the
time being; a sketch of the resulting unweighted scoring is given below.
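As a minimal sketch of the resulting unweighted scheme (the sub-category names and answer values below are illustrative assumptions, not actual case study data), each sub-category score is the plain average of its question scores, and the overall score is the plain average of the sub-category scores:

# Unweighted maturity scoring: every question counts equally within its
# sub-category, and every sub-category counts equally in the overall score.
answers = {
    "Data Modelling": [3, 4, 2],
    "ETL": [2, 3],
    "Service Processes": [1, 2, 2, 3],
}

def average(scores):
    return sum(scores) / float(len(scores))

subcategory_scores = {name: average(scores) for name, scores in answers.items()}
overall_score = average(list(subcategory_scores.values()))

print(subcategory_scores)       # e.g. {'Data Modelling': 3.0, 'ETL': 2.5, ...}
print(round(overall_score, 2))  # 2.5 - the unweighted overall maturity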
6.2 Multiple Case Studies
Depending on the nature of a research topic and the goal of a researcher, different research methods are
appropriate (Benbasat et al., 1987; Yin, 2009). One of the most commonly used ways to
classify research methods is the distinction between qualitative and quantitative research methods. The
research method applied here is case study research, a qualitative one. It is the most widely used
qualitative research method in information systems (IS) research and is well suited to understanding the
interactions between information technology (IT)-related innovations and organizational contexts (Darke
et al., 1998). (Runeson & Host, 2009) suggest that case study research is also a suitable research
methodology for software engineering since it studies contemporary phenomena in its natural context.
Therefore, as research in data warehousing can be considered to be at the border between IS and software
engineering, case study research is also appropriate here. According to (Yin, 2009), "the essence of a case
study is that it tries to illuminate a decision or set of decisions: why they were taken, how they were
implemented, and with what result." Hence, the case study method allows investigators to retain the
holistic and meaningful characteristics of real-life events, such as organizational and managerial
processes, for example. (Benbasat et al., 1987) consider that a case study examines a phenomenon in its
natural setting, employing multiple methods of data collection to gather information from one or a few
entities (i.e.: people, groups or organizations). As our research is developing a DWCMM, it is part of the
IS/software engineering field and it is suited for both technical and organizational issues. Therefore, case
study research seems an appropriate choice that will help us capture knowledge from practitioners, and test
and validate the created models and theories. In order to enrich and validate our model in practice, four
organizations were chosen to take our DW maturity assessment; the results are further presented
in this chapter.
6.2.1 Case Study Approach
Case study research can be used to achieve various research aims: to provide descriptions of phenomena,
develop theory, and test theory (Darke et al., 1998). But, regardless of its final goal, preliminary theory
development as part of the case study design phase is essential (Yin, 2009). In our research, we will use it
to test theory, which in this case is the DWCMM we developed. The use of case study research to
test theory requires the specification of theoretical propositions derived from an existing theory. The
results of case study data collection and analysis are used to compare the case study findings with the
expected outcomes predicted by the propositions (Cavaye, 1996). The theory is either validated or found
to be inadequate in some way, and may then be further refined on the basis of the case study findings.
Case study research may adopt single or multiple case designs. A single case study is appropriate where it
represents a critical case (i.e.: it meets all the necessary conditions for testing a theory), where it is an
extreme or unique case, or it is a revelatory case (Yin, 2009). Multiple case designs allow cross-case
analysis and comparison, and the investigation of a particular phenomenon in diverse settings. Multiple
case studies may also be selected to predict similar results (i.e.: literal replication) or to produce
contrasting results for predictable reasons (i.e.: theoretical replication) (Yin, 2009). As, according to
(Benbasat et al., 1987) and (Yin, 2009), multiple case designs are preferred over single case
designs to get better results and analytic conclusions, we decided to conduct multiple case study
research following the (Yin, 2009) case study approach, as depicted in figure 10.
Figure 10: Case Study Method (adapted from (Yin, 2009)).
Therefore, the main steps in case study research are (Runeson & Host, 2009; Yin, 2009):
• Case study design – research objectives are defined and the case study is planned. This is also where theoretical development is done, as described in chapters 3-5.
• Preparation for data collection – procedures and protocols for data collection are defined. This is also where cases are found and selected to evaluate and test the theory. The main criterion used in the search for suitable organizations was that all approached organizations had a professional DW/BI system in place whose maturity could be assessed by applying the DWCMM. Furthermore, an important criterion for the selection of respondents per case was that the interviewed respondents had an overall view of the technical and organizational aspects of the DW/BI solution implemented in their organization. As (Yin, 2009) suggests that at least three case studies should be used, four test organizations were found that agreed to cooperate in our research and take the maturity assessment (an overview is provided in paragraph 6.2.2). While selecting the case studies, the case study and data collection protocols were also defined. The protocol contains the instrument, but also the procedures and general rules to be followed. It is essential when doing a multiple-case study to increase the reliability of the case study research and to guide the investigator in carrying out the data collection.
• Collecting evidence – execution with data collection on the studied case. Typically, multiple data collection methods are employed in case study research to increase the validity of the results. Also, multiple sources of evidence are used, such as (Yin, 2009): documentation – various written material; archival records – organizational charts, service, personnel and financial records; interviews – open, structured or semi-structured interviews; direct observation – observing and absorbing details, actions or subtleties of the environment; physical artifacts – devices, tools, instruments. In this research, the data collected in the cases consists of four interviews and documentation. Each interview lasted around 1.5 hours and consisted mainly of the maturity assessment questionnaire itself, but also included a few open questions. This allows the researcher to guide and control the interview, while leaving some room for discussion in order to capture the suggestions and questions that the respondents might have. The purpose was not only for organizations to take the maturity assessment, but also to use the knowledge of the respondents to improve the questionnaire. For analysis purposes, the interviews have been digitally recorded, transcribed and validated by the respondents. For purposes of consistency, the interview protocol is enclosed in appendix E. The interview consisted of three main parts:
  – general questions about the organization and the respondent's role in the DW/BI project;
  – the DW maturity assessment questionnaire (i.e.: DW General Questions; DW Technical Solution; DW Organization and Processes);
  – final questions and suggestions.
• Analysis of collected data – data analysis can be quantitative or qualitative. In this research, we perform a qualitative data analysis to derive conclusions from the interviews and improve our model. The remainder of this chapter discusses the overall findings of the case studies, including a short description per case and an analysis of the results. Although all individual cases are interesting, we will focus on the overall results.
• Reporting – the report communicates the findings of the study, but it is also the main source of information for judging the quality of the study. Therefore, the master thesis document itself will serve as the case study report.
6.2.2 Case Overview
The case studies have been conducted at four organizations of different sizes, operating in several types of
industries and offering a wide variety of products and services. An overview of the case study
organizations (figures are taken from 2009 annual reports) and respondents is depicted in table 51. As the
technologies used for developing each component of the DW can help us shape a better image of the DW
solution and its maturity, an overview of these technologies is also offered in table 52. For each case, a
short description is provided in the following subparagraphs. A short analysis of the maturity scores each
organization obtained after taking the assessment is also given further in this chapter. However, due to
confidentiality reasons, the individual answers for each question and the feedback given to each
organization are not published in the official version.
Organization A – Industry: Retail; Market: B2C; Revenue: 19.94 billion €; Employees: ≈ 138000; Respondent Function: BI consultant
Organization B – Industry: Insurance; Market: B2B & B2C; Revenue: 4.87 billion €; Employees: ≈ 4500; Respondent Function: DW/BI technical architect
Organization C – Industry: Retail; Market: B2C; Revenue: 780 million €; Employees: ≈ 3660; Respondent Function: BI manager
Organization D – Industry: Maintenance & Servicing; Market: B2B; Revenue: NA; Employees: ≈ 3500; Respondent Function: BI consultant & DW lead architect
Table 51: Case and Respondent Overview.
Organization A – Data Modelling: NA; ETL: IBM InfoSphere DataStage; BI Applications: Microstrategy & Business Objects; Database: IBM DB2
Organization B – Data Modelling: Power Designer; ETL: IBM InfoSphere DataStage; BI Applications: Cognos & SAS, QlikView; Database: Oracle DB
Organization C – Data Modelling: SAP; ETL: SAP; BI Applications: in-house Business Objects; Database: IBM DB2
Organization D – Data Modelling: Visio, Word, PowerPoint; ETL: Oracle Warehouse Builder; BI Applications: Oracle BI Enterprise Edition; Database: Oracle DB
Table 52: Technologies Usage Overview.
6.2.2.1 Organization A
Organization A is an international food retailer headquartered in Western Europe. It has leading positions
in food retailing in key markets. These positions are built through strong regional companies going to
market in a variety of food store formats. The operating companies benefit from the Group‘s global
strength and best practices. Their strategy remains organized around the three pillars of profitable top-line
growth, the pursuit of excellence in execution and corporate citizenship. Organization A considers that in
a high-volume industry characterized by low margins such as food retail, excellent execution offers a true
competitive advantage. This is the reason why sophisticated tools, state-of-the-art systems and
streamlined processes are implemented to serve as the foundation for profitable growth and good returns.
Connecting and converging tools, systems, processes and people help the operating companies to address
both current and future challenges with cost-effective and integrated solutions.
DW General Information
The main drivers for developing a DW at organization A were to improve managerial decisions and
increase profit. For a supermarket, it is very important to store data at a high granularity in the DW; in this
way, operations can be closely monitored and different types of BI applications can be developed. The
main activities performed using the DW/BI project are reporting and dashboarding on the main KPIs regarding
profit margins, store usage, store losses, etc. Also, some data mining is done to determine which products
are most often sold together; in this way, better product placement and promotion decisions can be achieved.
Organization A has been using DW/BI for almost 10 years and executives perceive the DW/BI
environment as a tactical resource (i.e.: a tool to assist decision making); their goal is for the DW/BI
to become a competitive differentiator (i.e.: key to gaining or keeping customers and/or market share) in
the future. In general, the returns obtained from the DW are higher than its costs, and data quality is high,
which can also be seen in the relatively high end-user adoption. As the DW environment is considered an
important factor for success, the budget owner is the business side and a relatively high percentage of the
annual IT budget is allocated to the DW/BI department.
6.2.2.2 Organization B
Organization B, situated in Western Europe, is a major player in the insurance market. Established in the
eighteenth century, it has a centuries-long tradition and experience in this field. It offers both private
individuals and companies a wide array of life, non-life, medical and disability insurances, and also
mortgage, savings and investment products. The distribution is made via several channels: brokers,
consultants working on commission, banks, independent intermediaries and direct contact.
DW General Information
The main driver for developing a DW at organization B was that business analysts and controllers needed
integrated data in order to make their own reports, analyses, etc. Another requirement came from the
consumer intelligence department, which wanted to have a broader view of the whole company portfolio
and of each customer portfolio.
Organization B has been using DW/BI for almost 10 years and the DW solution is perceived as an
operational cost center (i.e.: an IT system needed to run the business) and, in certain situations, as a
tactical resource. Data quality is considered to be a business responsibility, and therefore the "garbage in,
garbage out" principle is applied. However, a lot of attention is paid to software and development
process quality in the technical department. The DW solution is not a big success at organization B, as
end-user adoption is not high due to distrust in the data and in the DW solution itself. DW/BI has
decentralized budget owners, as business lines have their own budgets and each of them decides what
to spend on BI, but generally around 5% of the IT budget is allocated to this department.
6.2.2.3 Organization C
Organization C is a supermarket chain with local activities in a Western European country. They are a
unique and ambitious organization due to their cooperative nature: an intense cooperation between the
organization and its members. They are a flat organization with short communication lines, a complete
and modern logistics system and a low cost structure. The organization offers a wide diversity of
food-related products and their goal is optimal service to customers. They try to optimize their business
outcomes by bundling purchasing volumes and ongoing cost control.
DW General Information
The main driver for developing a DW at organization C was to enable management to make better
decisions based on the right data. As organization C is a supermarket, some of the drivers are the same as
those for organization A. A DW/BI solution is very important for food retailers, as they have many and
diverse products and a lot of transactions take place every day. At organization C, the DW is considered
the only viable solution for reporting and data analysis.
Organization C has been using DW/BI for almost 15 years and its solution has evolved a lot since it was
first implemented. Executives perceive the DW/BI environment as a tactical resource, and that is why data
quality is considered very important. It can be said that the returns are higher than the costs: due to the
high data quality and "one version of the truth", end-user adoption is good and executives are able to
make better and faster decisions. The budget owner of the DW is the Chief Financial Officer (CFO) and
usually 10% of the annual IT budget is allocated to the DW/BI environment for continuous improvement.
6.2.2.4 Organization D
Organization D is one of the leading providers of rolling stock maintenance in Western Europe. They
provide rolling stock availability and reliability for numerous passenger and freight carriers from across
Europe. In addition to short-term maintenance, organization D also offers routine servicing. This covers
minor repairs as well as the cleaning of interiors and exteriors, including the removal of graffiti. Customer
and performance focus play an essential part in the business partnership between organization D and its
customers. That is why they are closely involved in all stages of a customer's project in order to avoid
unnecessary work being carried out. Another important aspect for organization D is innovation. They
invest in high-technology workshops and state-of-the-art equipment. Only by keeping up with the latest
technology can they offer specialized services to customers at the forefront of the rapidly changing rail
transport market.
DW General Information
The main driver for developing a DW at organization D was the need for high data quality and
consistency in order for the business (especially middle and higher management) to make the right
decisions. In the beginning, the focus was on the operational side, but now the main focus is on the
financial one; the main goal for this year is to integrate the two solutions into a single DW.
Organization D has been using a DW for 3.5 years, and it started out as a tactical resource. Nowadays,
executives perceive it as a mission-critical resource, and the goal for the near future is for the DW to
become a strategic resource. Therefore, the DW solution and the way it is perceived in the organization
have developed a lot since it was first implemented. In general, there is a positive net result when
comparing the returns and costs of the DW. The main benefits include high data quality and end-user
adoption. From this point of view, the DW/BI solution has achieved its goal. The budget owner of the
DW is the Chief Financial Officer (CFO) and usually less than 5% of the annual IT budget is allocated to
the DW/BI environment.
6.2.2.5 Case Study Analysis
In this section, a short analysis of the results obtained by the organizations after filling in the assessment
questionnaire is given. The maturity scores regarding the implemented DW solutions obtained by the
organizations can be seen in the table below.
Benchmark Category       Organization A   Organization B   Organization C   Organization D
Architecture                  2.67             2.56             3.89             3.55
Data Modelling                2.17             3.44             3.00             4.11
ETL                           3.14             3.29             3.71             2.86
BI Applications               2.71             2.71             3.43             3.57
Development Processes         2.90             3.19             3.66             3.02
Service Processes             2.63             3.00             2.87             3.12
Table 53: Organizations' Maturity Scores.
As shown in the picture depicting our model, a better way to see the alignment between the maturity
scores for the six categories is to draw a radar graph. The radar graphs for all the organizations can
be seen in the figures below.
[Radar graphs plotting the maturity scores for the six benchmark categories (Architecture, Data Modelling, ETL, BI Applications, Development Processes, Service Processes; scale 0-5), each comparing one organization's scores with the ideal situation.]
Figure 11: Alignment Between Organization A's Maturity Scores.
Figure 12: Alignment Between Organization B's Maturity Scores.
Figure 13: Alignment Between Organization C's Maturity Scores.
Figure 14: Alignment Between Organization D's Maturity Scores.
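For readers who wish to reproduce such a radar graph, the following is a minimal sketch in Python with matplotlib, using Organization A's scores from table 53; the library choice and styling are our own illustration and not part of the model:

    import numpy as np
    import matplotlib.pyplot as plt

    categories = ["Architecture", "Data Modelling", "ETL",
                  "BI Applications", "Development Processes", "Service Processes"]
    org_a = [2.67, 2.17, 3.14, 2.71, 2.90, 2.63]   # table 53
    ideal = [5.0] * len(categories)

    # One axis per category; repeat the first point to close each polygon.
    angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
    angles += angles[:1]

    fig, ax = plt.subplots(subplot_kw={"polar": True})
    for label, scores in [("Organization A", org_a), ("Ideal Situation", ideal)]:
        ax.plot(angles, scores + scores[:1], label=label)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories)
    ax.set_ylim(0, 5)
    ax.legend()
    plt.show()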
Some more information regarding the maturity scores for all four case studies can be seen in the table
below.
Maturity Score                                         A                      B                      C                         D
Total Maturity Score for DW Technical Solution         2.67                   3.00                   3.51                      3.52
Total Maturity Score for DW Organization & Processes   2.77                   3.10                   3.26                      3.07
Overall Maturity Score                                 2.72                   3.05                   3.38                      3.29
Highest Score                                          ETL - 3.14             Data Modelling - 3.44  Architecture - 3.89       Data Modelling - 4.11
Lowest Score                                           Data Modelling - 2.17  Architecture - 2.56    Service Processes - 2.87  ETL - 2.86
Table 54: Maturity Scores Analysis.
As can be seen from table 53, the maturity scores for each sub-category usually lie between 2 and 4, with one
exception: organization D scored 4.11 for Data Modelling. Consequently, the overall maturity scores and the
total scores per category also ranged between 2 and 4, which shows that most organizations are probably
somewhere between the second and fourth stage of maturity. The highest maturity score was obtained by
organization C, and the lowest one by organization A. Apparently, an overall score close to 4 or 5 is quite
difficult to achieve. This is normal in maturity assessments, as in practice organizations rarely come close to
the ideal situation. It will be interesting to see the range of scores once the questionnaire has been filled in
by a large number of organizations.
From table 54 it can be seen that the categories with the highest and lowest scores differ per organization.
For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most
mature variable for organization D. Interesting conclusions can also be drawn by comparing the scores for
organizations A and C, as they are part of the same industry. The former is an international food retailer
with more experience in this industry, whereas the latter is a local one with less experience. However,
organization A obtained a rather low DW maturity score. Thus, experience in the industry does not
necessarily translate into maturity in data warehousing. Of course, more factors can influence this
difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the
percentage of the IT budget allocated to BI, etc.
As presented in the previous chapters, the goal of our model is not only to give a maturity score to a
specific organization, but also to provide it with feedback and the steps necessary for reaching a
higher maturity stage. For example, the overall maturity score for organization A is 2.72, which leaves a
lot of room for improvement. Moreover, as its lowest score is for Data Modelling, this category would be
a good starting point. For confidentiality reasons, more details regarding the maturity scores and
feedback cannot be offered here. The template used for giving feedback to the case studies can be seen in
appendix F.
6.2.2.6 Benchmarking
As already mentioned in the previous chapters, the DWCMM can serve as a benchmarking tool for
organizations. The DW maturity assessment questionnaire provides a quick way for organizations to
assess their DW maturity and, at the same time, compare themselves in an objective way against others in
the same industry or across industries. Of course, better benchmarking results will be achieved once
more organizations have taken the maturity assessment. However, in order to give a better impression of
what such a benchmarking graph looks like, we give an example here using the data from the case
studies we performed. A bar chart comparing organization A's scores with the best practice and with
the average maturity score is shown below.
[Horizontal bar chart per benchmark category (scale 0-5), comparing Organization A's scores with the average score and the best practice.]
Figure 15: Benchmarking for Organization A.
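The two benchmark series in this chart can be derived directly from table 53. A minimal sketch of that derivation follows, assuming "best practice" is taken as the highest score observed per category and the average is computed over the four participating organizations; the computation is our own illustration, not a prescribed part of the model:

    from statistics import mean

    # Per-category scores for organizations A, B, C and D (table 53).
    scores = {
        "Architecture":          [2.67, 2.56, 3.89, 3.55],
        "Data Modelling":        [2.17, 3.44, 3.00, 4.11],
        "ETL":                   [3.14, 3.29, 3.71, 2.86],
        "BI Applications":       [2.71, 2.71, 3.43, 3.57],
        "Development Processes": [2.90, 3.19, 3.66, 3.02],
        "Service Processes":     [2.63, 3.00, 2.87, 3.12],
    }
    for category, values in scores.items():
        print(f"{category}: org A = {values[0]:.2f}, "
              f"average = {mean(values):.2f}, best practice = {max(values):.2f}")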
6.2.3 Case Studies Results and Conclusions
From the results obtained from the case studies, it can be said that the DWCMM can be successfully
applied in practice. However, this part of the validation process had multiple goals:
• First, we wanted to see whether organizations can understand the questions and whether the answers
match their specific situation.
• Second, we wanted to see whether the scoring method works and whether we can offer specific feedback
for each organization to achieve a higher maturity level.
• Last, but not least, we wanted to receive feedback from them regarding the questions and their
answers.
Therefore, based on the suggestions of each interviewee, we made the following changes and drew
some conclusions. An overview of these changes and conclusions is given in the remainder of this section.
The final version of the questionnaire is shown in appendix B.
The main changes were made after the first case study. These changes proved to be successful, as the same
problems were not encountered again in the following case studies. The first interviewee suggested that, in
order to judge the maturity of DW/BI in an organization, it is also critical to see how strongly it is embedded
in the organizational culture and how important it is considered for the organization. As this is very hard to
assess, a first step was to add the following question to the DW General Questions: What percentage of
the IT department is taking care of BI? (i.e.: how many people out of the total number of IT employees?).
Moreover, the answers to questions 3 and 4 regarding ETL underwent some minor changes, as it was hard
for the respondent to choose the most appropriate answer for his organization. Some confusion
was also created by the answers to questions 1 (i.e.: Which types of BI applications best describe the
highest level purpose of your DW environment?) and 6 (i.e.: Which BI applications delivery method best
describes the highest level purpose of your DW environment?) from the BI applications part. This is
why we decided to arrange the answers in a hierarchical order, so that it would be clear that even if
several answers match the company's current situation, only the one with the highest complexity is
scored.
Several questions from the DW Organization and Processes part also underwent minor changes. For
example, answer d) of question 2 regarding the DW development processes changed from "some
separation between environments (i.e.: at least three environments) with automatic transfer between
them" to "some separation between environments (i.e.: at least two environments) with automatic transfer
between them". An important change was made to the last question from the DW development processes
regarding the testing and acceptance phase. As it proved difficult for the interviewee to match an
answer to the current organizational situation, we decided to change the layout of the question to a
multiple-choice one. We therefore consider that there are seven main elements that determine the
maturity of the testing and acceptance phase: unit testing by another person, system integration testing,
regression testing, user training, acceptance testing, standard procedure and documentation, and external
assessments and reviews. Respondents can now choose all the elements characteristic of their
organization, and we give a normalized score between 1 and 5 in order to match the overall scoring
method.
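As an illustration, such a checklist answer could be normalized onto the questionnaire's 1-5 scale as follows. The linear mapping shown is our own assumption; the thesis only specifies that the score is normalized to match the overall scoring method:

    def normalized_score(selected, total=7):
        """Map 0..total ticked elements linearly onto the 1-5 maturity scale."""
        return 1 + 4 * selected / total

    # E.g., an organization ticking four of the seven testing elements:
    print(round(normalized_score(4), 2))  # 3.29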
The last change we made after the first case study was to arrange the answers to the DW service
processes questions in a hierarchical order, as organizations usually need to fulfill the requirements of a
lower level in order to reach a higher one. Apparently, the mixed answers created some confusion for
the respondent. The risk of the hierarchical order was that respondents might give a biased response,
but when tested in the other three case studies this did not seem to happen, as we obtained diverse
scores depending on the question and the organization. Of course, more feedback regarding this aspect
will be received once the questionnaire is tested in more organizations.
As already mentioned, most of the changes were made after the first case study. However, after receiving
the feedback from all four respondents, we decided that further changes are needed to improve the
DW maturity assessment questionnaire.
First, the answer we proposed for the highest level of maturity of the first question regarding the
predominant architecture of the DW, "a virtual integrated DW", will be changed to "a DW/BI service that
federates a central enterprise DW and other data sources via a standard interface". To further accelerate
development and adapt quickly to changing business needs, mature organizations can redistribute some
development tasks to the business units and departments. However, a central DW is needed as a
repository for information shared across business units. Distributed groups are only allowed to build their
own applications within a framework of established standards, often maintained by a center of excellence.
In order to be successful, DW/BI solutions have to first be centralized and later federated (Eckerson,
2004). Another way to accelerate the development of BI-enabled solutions is for organizations to use a
service-oriented architecture (SOA). By wrapping BI functionality and query object models with Web
services interfaces, developers can make DW/BI capabilities available to any application regardless of the
platform it runs on or the programming language it uses. As the previous answer was not very clear to the
interviewees, we believe that the latter will convey the right meaning to future respondents.
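As a small sketch of this wrapping idea, a DW query could be exposed as an HTTP/JSON service, for instance in Python with Flask. The endpoint, query and helper below are hypothetical illustrations, not part of the DWCMM:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def run_dw_query(sql, params):
        # Hypothetical stand-in for a real warehouse connection and driver.
        return [{"month": "2010-01", "revenue": 12500.0}]

    @app.route("/dw/sales-by-store")
    def sales_by_store():
        # Wrapping a parameterized DW query in a Web service interface makes
        # it consumable by any application, regardless of platform or language.
        store = request.args.get("store", type=int)
        rows = run_dw_query("SELECT month, SUM(revenue) AS revenue "
                            "FROM sales_fact WHERE store_id = ? "
                            "GROUP BY month", [store])
        return jsonify(rows)

    if __name__ == "__main__":
        app.run()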
Furthermore, question 2 from Data Modelling is rather complex, as it involves the synchronization
between a wide range of data models; in the future it might be better to separate it into several
questions. However, so far we have not reached agreement on what a better question and set of answers
would look like. Minor changes were made to the answers on the second stage of maturity for questions 4,
5 and 6 regarding the standards and metadata for data modelling. For questions 4 and 5, we decided to
change the characteristic for the second level of maturity from "solution dependent standards
implemented for some of the data models" to "solution dependent standards" to make the distinction
between this maturity stage and the next one even stronger, as the next stage of maturity already involves
enterprise-wide (or team-wide) standards for some data models. A similar argument stands for changing
the characteristic on the second maturity stage for question 6 (regarding metadata management for data
models) from "non standardized documentation for some of the data models" to "non standardized
documentation".
Another question whose answers created some confusion was the one related to metadata
management for ETL. From the literature study we performed, the conclusion was drawn that
organizations usually manage the business and technical metadata for some or all ETL, and usually the
ones with broad experience in this field also manage the process metadata. However, one of the respondents
answered that they manage process and technical metadata for all ETL and business metadata only for
some ETL. Therefore, we consider that the answers to the question "To what degree is your metadata
management implemented for your ETL?" may be something like the ones proposed here: a) no metadata
management; b) only one type of metadata managed (i.e.: business, technical or process); c) two types of
metadata managed (i.e.: any combination of business, technical and process); d) all three types of
metadata (i.e.: business, technical and process) managed for some ETL; e) all three types of
metadata managed for all ETL. However, these new characteristics need to be further tested in practice to
be validated.
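Encoded as a scoring table, and assuming each proposed answer maps onto one level of the questionnaire's 1-5 scale (our assumption, pending the validation mentioned above), the new answers would look like this:

    # Hypothetical mapping of the proposed answers a)-e) to maturity scores.
    etl_metadata_scores = {
        "a": 1,  # no metadata management
        "b": 2,  # one type managed (business, technical or process)
        "c": 3,  # two types managed (any combination)
        "d": 4,  # all three types managed for some ETL
        "e": 5,  # all three types managed for all ETL
    }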
Moreover, another change to be considered is for question 7 from the DW development processes,
regarding DW project management. There are usually five main elements that determine the maturity
of DW project management: project planning and scheduling, project tracking and control, project risk
management, standard procedure and documentation, and evaluation and assessment. Therefore, we
believe that a better layout and scoring for this question would be one similar to the one proposed for the
testing and acceptance phase.
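Under the same hypothetical linear normalization sketched for the testing and acceptance question, a respondent ticking, say, three of these five elements would receive a score of 1 + 4 * 3/5 = 3.4 on the 1-5 scale.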
While doing the case studies, we came up with a general question regarding the DW service processes
that covers the main activities from this phase. The concept of this question is the same as that of the one
proposed for the testing and acceptance phase from the DW development processes. We had the
opportunity to test and score it in the last three case studies, but we did not include its score in the final
result in order to be able to compare all four case studies on the same level. The question seems to
work in practice, although, as with the other questions with a similar layout (i.e.: the ones for testing and
acceptance, project management, etc.), it is quite difficult to judge which characteristics should be on
which maturity stage.
A last remark is related to the questions on the defined, documented and implemented standards. As one
of the experts suggested, we divided the questions related to standards into two separate ones: the first
regarding the definition and documentation of standards, and the second regarding the actual
implementation and following of standards. After testing the model, we saw that some organizations
consider these two aspects synonymous and sometimes fill in the same answers. However, as we believe
that this distinction should be made and as we cannot generalize how other organizations would see this
issue, we will leave the questions separated for the time being.
To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We
generally received positive feedback regarding the questions and their answers from the case study
interviewees. In this way, we could test whether the questions and their answers are representative for
assessing the current DW solution of a specific organization and whether they can be mapped to any
organization depending on the situational factors. We also had the chance to apply the scoring method
and give appropriate feedback for each case study. Finally, we combined all the feedback received from
the case studies and made some small but valuable changes to some questions and answers, which
improved our DWCMM as a whole.
6.3 Summary
This chapter presented the results of the two main activities performed to evaluate our model: five expert
interviews and four case studies. We started by giving an overview of the experts and their affiliations,
and then we showed the main changes that resulted from the five expert interviews. We continued by
presenting the four case studies and the underlying respondents. Finally, we analyzed the maturity scores
obtained by the cases and illustrated the changes made to the questionnaire following the case studies.
In the next chapter, we conclude our research and present some discussion points and possible future
work.
7 Conclusions and Further Research
In this section, the main conclusions of this study are presented. Subsequently, a critical analysis of
the results is given and, finally, recommendations for future research are made.
7.1 Conclusions
This research was triggered by the estimate made by Gartner (2007) and other researchers that
more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data
Warehouse Capability Maturity Model (DWCMM) to help organizations assess their current DW
solution and provide them with guidelines for future improvements. The main elements that usually influence
the success of a DW environment are: technical components, end-user adoption and usage, and business
value. However, we limited our research to the technical components, due to time constraints and the fact
that a solid technical solution is usually the foundation for the other two elements to be successful. In this
way we attempted to answer the main research question of our study:
How can the maturity of a company's data warehouse technical aspects be assessed and acted upon?
The main conclusion of our study is that, even though our maturity model can help organizations improve
their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The
DWCMM provides a quick way for organizations to assess their DW/BI maturity and compare
themselves in an objective way against others in the same industry or across industries. It received
positive feedback from the five experts who reviewed and validated it, and it also resonated well with the
respondents in our four case studies. However, it is critical to emphasize that the model performs only
a high-level assessment. In order to truly assess the maturity of their DW/BI solutions and discover
the strong and weak variables, organizations should use our assessment as a starting point for a more
thorough analysis. Our research also showed that the model can be applied to a wide diversity of
organizations from different industries, but the results and guidelines for future improvement depend on
situational factors specific to each organization. According to the experts who validated our model,
some important situational factors are: whether data warehousing and BI can act as a differentiator in
the specific industry, the size of the organization, the budget (especially the one for DW/BI), the
organizational culture regarding DW/BI, etc.
Moreover, our main research question was split into several more specific sub-questions:
1) What are data warehouses and business intelligence?
2) What do maturity models represent and which are the most representative ones for our research?
The answers to these two questions are the foundation for our research and for the main "artifact" we have
delivered, the DWCMM. In this way, we presented some theoretical background on the key concepts of
the study (data warehousing, business intelligence, maturity modelling) and identified the research gap
that our model could fill: the lack of a maturity model that would help organizations take a snapshot
of their current DW/BI technical solution and provide them with systematic steps for achieving a higher
maturity stage, and thus a DW/BI environment that delivers better results.
3) What are the most important variables and characteristics to be considered when building a data
warehouse?
This question is addressed by the DWCMM and its components as presented in chapter 3. As already
mentioned, the model we developed is limited to the technical aspects and it considers two main
benchmark variables/categories for analysis, each of them having several sub-categories (i.e.: DW
Technical Solution – Architecture, Data Modelling, ETL, BI Applications; and DW Organization and
Processes – Development Processes, Service Processes).
4) How can we design a capability maturity model for a data warehouse assessment?
The answer to this question is offered by the DW Maturity Assessment Questionnaire and the underlying
Maturity Scoring and Maturity Matrices. The questionnaire includes several questions for each
benchmark category and sub-category. In the end, a maturity score is given for each sub-category and
category, and the end result is an overall maturity score. Based on this, an indicative maturity stage can
be pointed out for a specific organization and some general feedback regarding the maturity scores and
future steps for improvement is outlined.
5) To what extent does the data warehouse capability maturity model result in a successful assessment
and guideline for the analyzed organizations?
Once we developed the model, an evaluation phase was necessary to test its validity and, depending on the
results, make the necessary adjustments. The DWCMM with all its components was initially reviewed by
five notable experts in this field and then tested in four organizations to see whether it can achieve its
goal or not. The model generally received positive feedback from the experts, and several minor changes
were made, as can be seen in appendix C. Furthermore, the experts pointed out that even if the model
succeeds in emphasizing the most important aspects involved in the development of a DW/BI project, it
might not be complete; other benchmark categories and sub-categories should be added in the future.
Also, the DWCMM can serve as a high-level technical assessment, but more questions and a more thorough
analysis are needed to dig deeper into the strengths and weaknesses of the DW/BI environment. The four
case studies offered us the possibility to test the model in practice. Generally speaking, the model seemed
to deliver the desired results. The respondents recognized the categories and sub-categories from the model,
and the questions and answers were usually well understood. Based on their comments, we made
several adjustments, so that the assessment would be clearer and better understood by future
respondents. Moreover, the scoring method seemed to work well, and we were also able to offer feedback
to our respondents. Of course, we believe that more valuable feedback could be given in the future by
someone with more experience in this field. An observation at this point is that we cannot track what the
organizations are going to do with the results of this assessment and whether they are actually going to
take action to improve their DW/BI solution.
7.2 Limitations and Further Research
For every scientific research project, it is important to elaborate on its objectivity and limitations. First of
all, a limitation of this study is that it is based on design science research, which answers research
questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity might
arise. Hence, a certain influence of the experiences, opinions and feelings of the researcher on the analysis
is possible. In our study, the main deliverables were developed by doing a thorough literature study, but
also in collaboration with a Dutch organization, as described in the acknowledgements. Therefore, some
slight lack of impartiality might have slipped into the initial structure of the model. However,
this weakness was minimized by validating the model with several experts in the field.
Another limitation is the fact that our model was evaluated by conducting case study research. The
DWCMM was tested in four organizations, where the position of the respondents in the organization and
their viewpoints might have biased the validation. For future reference, it would probably be advisable to
have at least two respondents from each organization take the assessment, as there is a higher chance that
the results would be more objective. Also, because the model was tested in only four cases, it is
not possible to generalize the findings to any given similar situation. Therefore, for further research, it
would be interesting to validate the model using quantitative research methods. An example would be to
have the assessment questionnaire filled in by a large number of organizations in order to be able to
perform statistical analysis on the data, more valuable benchmarking and improvements to the whole
structure of the model. Another interesting approach would be to interview more experts from different
organizations in order to come up with a different structure for the model, new benchmark categories and
sub-categories and, of course, new maturity questions and answers. Moreover, as suggested by the experts,
new elements that could be analyzed further in the future are data quality, which is currently one of the
most important reasons for DW/BI failure, and data governance. These two elements could both be part of
a bigger category called data management.
An important aspect to mention here is the fact that our research is limited to the technical aspects of a
DW/BI project. Therefore, a point for future research would be to extend the model to the analysis of
DW/BI end-user adoption and business value. New benchmark categories and maturity assessment
questions could be added regarding these two problems. Another future extension that would increase the
value of the model could include questions and analysis for other types of data modelling (e.g.:
normalized modelling, data vault, etc.) because, as stated earlier in this thesis, we limited our maturity
assessment to dimensional modelling only. Last, but not least, as already mentioned, our model is a
high-level one. In the future, several questions could be added for a more detailed analysis of the current
DW/BI environment, and more valuable feedback could be offered to organizations.
To sum up, this study can be seen as a contribution to understanding the main categories and elements
that determine the maturity of a DW/BI project. The developed model serves as an assessment of the
current DW/BI solution of a specific organization and offers guidelines for future improvements. As
shown, the model received positive feedback when validated, but there is always room for improvement.
And, given the current economic situation, data warehousing and BI could really make a difference.
According to Gartner (2009), in the near future organizations will have high expectations of their BI
and performance management initiatives to help transform and significantly improve their business. As
can be seen, data warehousing, BI and performance management are becoming more and more valuable
to organizations, and many developments could and will take place in this field and, as a consequence, in
our model.
8 References
Aamodt, A., & Nygård, M. (1995). Different Roles and Mutual Dependencies of Data, Information and Knowledge.
Data and Knowledge Engineering, 16 , 191-222.
AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific
Research, 42 , (2), 326-335.
AbuSaleem, M. (2005). The Critical Success Factors of Data Warehousing. Retrieved June 24, 2010, from Master's
Degree Programme in Advanced Financial Information Systems: http://www.pafis.shh.fi/graduates/majabu03.pdf
Ackoff, R. (1989). From Data to Wisdom. Journal of Applied Systems Analysis, 16 , 3-9.
Agresti, W. (2000). Knowledge Management. In M. Zelkowitz, Advances in Computers (pp. 171-283). London:
Academic Press.
Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik,
Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.
April, A., Hayes, J., Abran, A., & Dumke, R. (2004). Software Maintenance Maturity Model: the Software
Maintenance Process Model. Journal of Software Maintenance and Evolution: Research and Practice, 17 , (3), 197
- 223.
Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information
Technology, 20 , (2), 67-87.
Azvine, B. C. (2005). Towards Real-Time Business Intelligence. BT Technology Journal, 23 , (3), 214-225.
Batory, D. (1988). Concepts for a Database System Synthesizer. Proceedings of International Conference on
Principles of Database Systems. Paris.
Becker, J., Knackstedt, R., & Poppelbus, J. (2009). Developing Maturity Models for IT Management: A Procedure
Model and its Application. Business & Information Systems Engineering, 1 , (3), 213-222.
Benbasat, I., Goldstein, D., & Mead, M. (1987). The Case Research Strategy in Studies of Information Systems. MIS
Quarterly, 11 , (3), 369-386.
Bennett, K. (2000). Software Maintenance: a Tutorial. In M. Dorfman, & R. Thayer, Software Engineering (pp. 289-303). Los Alamitos: IEEE Computer Society Press.
Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information
Management: http://www.information-management.com/issues/20030201/6287-1.html
Boehm, B. (1988). A Spiral Model for Software Development and Enhancement. IEEE Computer, 21 , (5), 61-72.
Boisot, M., & Canals, A. (2004). Data, Information and Knowledge: Have We Got It Right? Journal of Evolutionary
Economics, 14 , (1), 43-67.
Breitner, C. (1997). Data Warehousing and OLAP: Delivering Just-in-Time Information for Decision Support.
Proceeding of the 6th International Workshop for Oeconometrics. Karlsruhe, Germany.
Breslin, M. (2004). Data Warehousing - Battle of the Giants: Comparing the Basics of the Kimball and Inmon
Models. Business Intelligence Journal, 9 , (1), 6-20.
Bruckner, R., List, B., & Schiefer, J. (2002). Striving Towards Near Real-Time Data Integration for Data
Warehouses. In Lecture Notes in Computer Science (pp. 173-182). Berlin: Springer.
Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th
Australasian Conference on Information Systems. Adelaide, Australia.
Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information
Systems Journal, 6 , 227-242.
Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen, Empirische
Untersuchung auf Basis des Business Intelligence Maturity Model. Wirtschaftsinformatik, 46 , (2), 119-128.
Chaudhuri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM Sigmod
Record, 26 , (1), 65-74.
Chen, P. (1975). The Entity-Relationship Model — Toward a Unified View of Data. Proceedings of the
International Conference on Very Large Data Bases, (pp. 9-36). Framingham, Massachusetts, USA.
Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.
Chung, W., Chen, H., & Nunamaker Jr., J. (2005). A Visual Framework for Knowledge Discovery on the Web: An
Empirical Study of Business Intelligence Exploration. Journal of Management, 21 , (4), 57-84.
Codd, E. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13 , (6), 377-387.
Colin, R. (2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.
Darke, P., Shanks, G., & Broadbent, M. (1998). Successfully Completing Case Study Research: Combining Rigour,
Relevance and Pragmatism. Information Systems Journal, 8 , (4), 273-289.
Davenport, T., & Prusak, L. (2000). Working Knowledge: How Organizations Manage What They Know. Harvard:
Harvard Business Press.
Dayal, U., Castellanos, M., Simitsis, A., & Wilkinson, K. (2009). Data Integration Flows for Business Intelligence.
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database
Technology (pp. 1-11). Saint Petersburg, Russia: ACM.
de Bruin, T., Freezey, R., Kulkarniz, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a
Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney,
Australia.
Devlin, B., & Murphy, P. (1988). An Architecture for a Business and Information System. IBM Systems Journal,
27 , (1).
Drucker, P. (1999). Management Challenges for the 21st Century. Oxford: Butterworh-Heinemann.
Eckerson, W. (2009). Delivering Insights with Next Generation Analytics. Retrieved April 23, 2010, from The Data
Warehousing Institute: http://tdwi.org/research/2009/07/beyond-reporting-delivering-insights-with-nextgenerationanalytics.aspx?tc=page0
Eckerson, W. (2004). Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing
Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2
Eckerson, W. (2006). Performance Dashboards. New Jersey: John Wiley & Sons, Inc.
Eckerson, W. (2007). Predictive Analytics: Extending the Values of Your Data Warehousing Investment. Retrieved
June 30, 2010, from SAS: http://www.sas.com/feature/analytics/102892_0107.pdf
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. The AI
Magazine, 17 , (3), 37-54.
Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved
July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-MagicQuadrant-for-Datawarehouse-Systems-2010.pdf
Ferguson, R., & Jones, C. (1969). A Computer Aided Decision System. Management Science, 15 , (10), B550-B561.
Fitzgerald, G. (1992). Executive Information Systems and Their Development in the U.K.: A Research Study.
International Information Systems, 1 , (2),1-35.
Galliers, R., & Sutherland, A. (1991). Information Systems Management and Strategy Formulation: the "Stages of
Growth". Information Systems Journal, 1 , (2), 89-114.
Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda . Retrieved June 24, 2010, from
Gartner: http://www.gartner.com/DisplayDocument?id=500835
Gartner. (2009). Gartner Reveals Five Business Intelligence Predictions for 2009 and Beyond. Retrieved August 6,
2010, from Gartner: http://www.gartner.com/it/page.jsp?id=856714
Golfarelli, M., & Rizzi, S. (2009). A Comprehensive Approach to Data Warehouse Testing. Proceedings of the ACM
Twelfth International Workshop on Data Warehousing and OLAP, (pp. 17-24). Hong Kong.
Golfarelli, M., & Rizzi, S. (1998). A Methodological Framework for DW Design. Proceedings ACM First
International Workshop on Data Warehousing and OLAP (DOLAP). Washington, D.C., USA.
Golfarelli, M., & Rizzi, S. (1999). Designing the Data Warehouse: Key Steps and Crucial Issues. Journal of
Computer Science and Information Management, 2 , (3).
Golfarelli, M., Rizzi, S., & Cella, I. (2004). Beyond Data Warehousing - What's Next in Business Intelligence?
Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, (pp. 1-6). Washington, D.C.,
USA.
Gorry, A., & Morton, S. (1971). A Framework for Information Systems. Sloan Management Review, 13 , 56-79.
Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information
Systems, (pp. 3190-3199). Tampa, Florida, USA.
Grönroos, C. (1990). Service Management and Marketing - Managing the Moments of Truth in Service Competition.
Lexington: Lexington Books.
Hakes, C. (1996). The Corporate Self Assessment Handbook, 3rd edition. London: Chapman & Hall.
Hansen, W. (1997). Vorgehensmodell zur Entwicklung einer Data Warehouse Lösung. In H. Mucksch, & W.
Behme, Das Data Warehouse Konzept (pp. 311-328). Wiesbaden: Gabler.
Hayen, R., Rutashobya, C., & Vetter, D. (2007). An Investigation of the Factors Affecting Data Warehousing
Success. Issues in Information Systems, 8 , (2), 547-553.
Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management
Information Systems Quarterly, 28 , (1), 75-106.
Hey, J. (2004). The Data, Information, Knowledge, Wisdom Chain: the Metaphorical Link. Retrieved June 27, 2010,
from http://best.berkeley.edu/~jhey03/files/reports/IS290_Finalpaper_HEY.pdf
Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological
Analysis. Organizational Behaviour and Human Decision Processes, 62 , (2), 129-158.
Holsheimer, M., & Siebes, A. (1994). Data Mining: the Search for Knowledge in Databases (9406). Amsterdam:
Centrum voor Wiskunde en Informatica.
Hostmann, B. (2007). BI Competency Centers: Bringing Intelligence to the Business. Retrieved July 3, 2010, from
Business Performance Management: http://bpmmag.net/mag/bi_competency_centers_intelligence_1107/index2.html
Huber, G. (1984). Issues in the Design of Group Decision Support Systems. MIS Quarterly, 8 , (3), 195-204.
Humphries, M., Hawkins, M., & Dy, M. (1999). Data Warehousing: Architecture and Implementation. New Jersey:
Prentice Hall PTR.
Husemann, B., Lechtenborger, J., & Vossen, G. (2000). Conceptual Data Warehouse Design. Proceedings of the
International Workshop on Design and Management of Data Warehouses. Stockholm, Sweden.
Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse
Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37 , 1-21.
IEEE. (1990). Standard Glossary of Software Engineering Terminology (IEEE STD 610.12). New York: Institute of
Electrical and Electronics Engineers, Inc.
Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.
Inmon, W. (2005). Building the Data Warehouse, 4th edition. Indianapolis: Wiley Publishing, Inc.
Jashapara, A. (2004). Knowledge Management: An Integrated Approach. Harlow: Finance Times Prentice Hall.
Kaplan, R., & Norton, D. (1992). The Balanced Scorecard - Measure that Drive Performance. Harvard Business
Review, 70 , (1), 71-79.
Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5 , 58-66.
Kaye, D. (1996). An Information Model of Organization. Managing Information, 3 , (6),19-21.
Kimball, R. (1996). The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data
Warehouses. New York: John Wiley & Sons, Inc.
Kimball, R., & Caserta, J. (2004). The Data Warehouse ETL Toolkit. Indianapolis: Wiley Publishing, Inc.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit,
2nd Edition. Indianapolis: Wiley Publishing, Inc.
Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding . Proceedings
of the 2nd European Conference on Knowledge Management, (pp. 269-278). Bled, Slovenia.
Kraemer, K., & King, J. (1988). Computer-Based Systems for Cooperative Work and Group Decision Making. ACM
Computing Surveys , (2), 115-146.
Kuhn, T. (1974). Second Thoughts on Paradigms. In F. Suppe, The Structure of Scientific Theories. Urbana: The
University of Illinois Press.
Lewis, J. (2001). Project Planning, Scheduling and Control, 3rd Edition. New York: McGraw-Hill.
Loshin, D. (2003). Business Intelligence: The Savvy Manager's Guide. San Francisco: Morgan Kaufmann Publishers.
Luhn, H. (1958). A Business Intelligence System. IBM Journal of Research and Development, 2 , (4), 314-319.
Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management:
http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1
March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision
Support Systems, 43 , (3), 1031-1043.
Moody, D., & Kortink, M. (2000). From Enterprise Models to Dimensional Models: A Methodology for Data
Warehouse and Data Mart Design. Proceedings of the International Workshop on Design and Management of Data
Warehouses, (pp. 1-12). Stockholm.
Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support
Applications. Boston: Addison Wesley.
Mukherjee, D., & D'Souza, D. (2003). Think Phased Implementation for Successful Data Warehousing. Information
Management, 20 , (2), 82-90.
Munoz, L., Mazon, J., Pardillo, J., & Trujillo, J. (2008). Modelling ETL Processes of Data Warehouses with UML
Activity Diagrams. Proceedings of the OTM Workshops (pp. 44-53). Monterrey, Mexico: Springer.
Murtaza, A. (1998). A Framework for Developing Enterprise Data Warehouses. Information Systems Management,
15 , (4), 21-26.
Nagabhushana, S. (2006). Data Warehousing, OLAP and Data Mining. New Delhi: New Age International Limited.
Navathe, S. B. (1992). Evolution of Data Modelling for Databases. Communications of the ACM, 35 , (9), 112-123.
Negash, S., & Gray, P. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information
Systems, (pp. 3190-3199). Tampa, Florida, USA.
Niessink, F., & van Vliet, H. (2000). Software Maintenance from a Service Perspective. Journal of Software
Maintenance: Research and Practice, 12 , (2), 103-120.
Niessink, F., & van Vliet, H. (1999). The IT Service Capability Maturity Model (IR-463). Amsterdam: Division of
Mathematics and Computing Science, Vrije Universiteit.
Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16 , (7),
399-405.
Nonaka, I. (1991). The Knowledge-Creating Company. Harvard Business Review, 79 , (6), 96-104.
O'Reilly, C. (1980). Individuals and Information Overload in Organizations: Is More Necessarily Better? Academy
of Management Journal, 23 , (4), 684-696.
Paulk, M., Weber, C., Curtis, B., & Chrissis, M. (1995). The Capability Maturity Model: Guidelines for Improving
the Software Process. Boston: MA: Addison-Wesley.
Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc.
Porter, M. (1985). Competitive Advantage. New York: The Free Press.
Power, D. (2003). A Brief History of Decision Support Systems. Retrieved June 30, 2010, from Decision Support
Systems Resources: http://dssresources.com/history/dsshistoryv28.html
Prakash, N., & Gosain, A. (2008). An Approach to Engineering the Requirements of Data Warehouses.
Requirements Engineering, 13 , (1), 49-72.
Rahm, E., & Hai Do, H. (2000). Data Cleaning: Problems and Current Approaches. Bulletin of the Technical
Committee on Data Engineering, 23 , (4), 3-13.
Rangaswamy, A., & Shell, G. (1997). Using Computers to Realize Joint Gains in Negotiations: Toward an
'Electronic Bargaining Table'. Management Science, 43 , (8), 1147-1163.
Royce, W. (1970). Managing the Development of Large Software Systems. Proceedings of the Western Electronic
Show and Convention (WesCon). Los Angeles.
Runeson, P., & Host, M. (2009). Guidelines for Conducting and Reporting Case Study Research in Software
Engineering. Empirical Software Engineering, 14 , (2), 131-164.
Rus, I., & Lindvall, M. (2002). Knowledge Management in Software Engineering. IEEE Software, 19 , (3), 26-38.
Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative. Retrieved July 16, 2010, from
HP Technical Reports: http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf
Schneidewind, N. (1987). The State of Maintenance. IEEE Transactions on Software Engineering, 13 , (3), 303-310.
Schwaninger, M. (2001). Intelligent Organizations: An Integrative Framework. Systems Research and Behavioral
Science, 18 , 137-158.
Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM, 48 ,
(3), 79-84.
Seufert, A., & Schiefer, J. (2005). Enhanced Business Intelligence - Supporting Business Processes with Real-Time
Business Analytics. Proceedings of the 16th International Workshop on Database and Expert Systems Applications,
(pp. 919-925). Copenhagen, Denmark.
Shankaranarayanan, G., & Even, A. (2004). Managing Metadata in Data Warehouses: Pitfalls and Possibilities.
Communications of the Association for Information Systems, 14 , 247-274.
Simitsis, A. (2004). Modelling and Optimization of ETL Processes in Data Warehouse Environments. Athens:
National Technical University of Athens.
Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). Optimizing ETL Processes in Data Warehouses. Proceedings of the
21st International Conference on Data Engineering (pp. 564-575). Tokyo, Japan: IEEE Computer Science.
Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). State-Space Optimization of ETL Workflows. IEEE Transactions
on Knowledge and Data Engineering, 17 , (10), 1404-1419.
Simsion, G. C., & Witt, G. C. (2005). Data Modeling Essentials, 3rd Edition. San Francisco: Morgan Kaufmann
Publishers.
Solomon, M. (2005). Ensuring a Successful Data Warehouse Initiative. Information Systems Management, 22 , (1),
26-36.
Sommerville, I. (2007). Software Engineering, 8th Edition. Harlow: Addison-Wesley.
Thomas, J. (2001). Business Intelligence - Why? eAI Journal , 47-49.
Tijsen, R., Spruit, M., van Raaij, B., & van de Ridder, M. (2009). BI-FIT: The Fit between Business Intelligence,
End-Users, Tasks and Technologies. Utrecht: Utrecht University.
Tremblay, M., Fuller, R., Berndt, D., & Studnicki, J. (2007). Doing More with More Information: Changing
Healthcare Planning. Decision Support Systems, 43 , 1305-1320.
Tryfona, N., Busborg, F., & Christiansen, J. (1999). Data Warehousing and OLAP. Proceedings of the 2nd ACM
International Workshop on Data Warehousing and OLAP, (pp. 3-8). Kansas City, Missouri, United States.
Turban, E., Aronson, J., Liang, T., & Sharda, R. (2007). Business Intelligence and Decision Support Systems. New
Jersey: Pearson Education International.
Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patterns: Innovating Information and
Communication Technology. Boca Raton, Florida: Auerbach Publications Taylor & Francis Group.
van Bon, J. (2007). IT Service Management: An Introduction. Zaltbommel: Van Haren Publishing.
van Bon, J. (2000). World Class IT Service Management Guide . The Hague: ten Hagen & Stam Publishers.
Vanichayobon, S., & Gruenwald, L. (2004). Indexing Techniques for Data Warehouses' Queries. Retrieved July 3,
2010, from University of Oklahoma Database: http://www.cs.ou.edu/~database/documents/vg99.pdf
Varga, M., & Vukovic, M. (2008). Feasibility of Investment in Business Analytics. Journal of Information and
Organizational Sciences, 31 , (2), 50-62.
Vitt, E., & Luckevich, M. (2002). Business Intelligence: Making Better Decisions Faster. Redmond: Microsoft
Press.
Walker, D. (2006). Overview Architecture for Enterprise Data Warehouses. Retrieved July 23, 2010, from Data
Management & Warehousing : http://www.datamgmt.com/
Watson, H., Ariyachandra, T., & Matyska, R. (2001). Data Warehousing Stages of Growth. Information Systems
Management, 18 , (3), 42-50.
Winter, R., & Strauch, B. (2003). A Method for Demand-driven Information Requirements Analysis in Data
Warehousing Projects. Proceedings of the 36th Hawaii International Conference on System Sciences. Big Island:
IEEE Computer Society.
Wixom, B., & Watson, H. (2001). An Empirical Investigation of the Factors Affecting Data Warehousing Success.
MIS Quarterly, 25 , (1).
Yin, R. (2009). Case Study Research Design and Methods. Thousand Oaks, California: SAGE Inc.
Young, C. (2004). An Introduction to IT Service Management. Research Note, COM-10-8287 .
Zeithaml, V., & Bitner, M. (1996). Service Marketing. New York: McGraw-Hill.
Zins, C. (2007). Conceptual Approaches for Defining Data, Information, and Knowledge. Journal of the American
Society for Information Science and Technology, 58 , (4), 479-493.
Appendix A: DW Detailed Maturity Matrix
Maturity levels: Initial (1), Repeatable (2), Defined (3), Managed (4), Optimized (5).

DW Technical Solution

Architecture
Predominant architecture: (1) Desktop data marts (e.g.: Excel sheets); (2) Multiple independent data marts; (3) Multiple independent data warehouses; (4) A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball); (5) A DW/BI service that federates a central enterprise DW and other data sources via a standard interface.
Business rules: (1) No business rules defined or implemented; (2) Few business rules defined or implemented; (3) Some business rules defined or implemented; (4) Most business rules defined or implemented; (5) All business rules defined or implemented.
Metadata management: (1) No metadata management; (2) Non-integrated metadata by solution; (3) Central metadata repository separated by tools; (4) Central up-to-date metadata repository; (5) Web-accessed central metadata repository with integrated, standardized, up-to-date metadata.
Security: (1) No security implemented; (2) Authentication security; (3) Independent authorization for each tool; (4) Role-level security at database level; (5) Integrated company-wide authorization security.
Data sources: (1) CSV files; (2) Operational databases; (3) ERP and CRM systems; XML files; (4) Unstructured data sources (e.g.: text or documents); (5) Various types of unstructured data sources (e.g.: images, videos) and Web data sources.
Performance tuning: (1) No methods to increase performance; (2) Software performance tuning (e.g.: index management, parallelizing and partitioning system, views materialization); (3) Hardware performance tuning (e.g.: DW server); (4) Software and hardware tuning; (5) Specialized DW appliances.
Environment separation: (1) Desktop platform; (2) Shared OLTP systems and DW environment; (3) Separate OLTP systems and DW environment; (4) Separate servers for OLTP systems, DW, ETL and BI applications; (5) Specialized DW appliances.
Update frequency: (1) Monthly update or less often; (2) Weekly update; (3) Daily update; (4) Inter-daily update; (5) Real-time update.
Data Modelling
Data modelling tools: (1) No data modelling tool; (2) Data modelling tools used only for design; (3) Data modelling tools used also for maintenance; (4) Standardized data modelling tool used for design; (5) Standardized data modelling tool used for design and maintaining metadata.
Data model synchronization: (1) No synchronization between data models; (2) Manual synchronization of some of the data models; (3) Manual or automatic synchronization depending on the data models; (4) Automatic synchronization of most of the data models; (5) Automatic synchronization of all the data models.
Data model levels: (1) No differentiation between data model levels; (2) Logical and physical levels designed for some data models; (3) Logical and physical levels designed for all the data models; (4) Conceptual level also designed for some data models; (5) All data models have conceptual, logical and physical levels designed.
Data modelling standards: (1) No standards defined or implemented for data models; (2) Solution-dependent standards defined or implemented for some of the data models; (3) Enterprise-wide standards defined or implemented for some of the data models; (4) Enterprise-wide standards defined or implemented for most of the data models; (5) Enterprise-wide standards defined or implemented for all the data models.
Data model documentation: (1) No documentation for any data models; (2) Non standardized documentation for some of the data models; (3) Standardized documentation for some of the data models; (4) Standardized documentation for most of the data models; (5) Standardized documentation for all the data models.
Fact table granularity: (1) Very few fact tables have their granularity at the lowest level possible; (2) Few fact tables have their granularity at the lowest level possible; (3) Some fact tables have their granularity at the lowest level possible; (4) Most fact tables have their granularity at the lowest level possible; (5) All fact tables have their granularity at the lowest level possible.
Conformed dimensions: (1) No conformed dimensions; (2) Conformed dimensions for few business processes; (3) Conformed dimensions for some business processes; (4) Enterprise-wide standardized conformed dimensions for most business processes, also making use of a high-level design technique such as an enterprise bus matrix; (5) Enterprise-wide standardized conformed dimensions for all business processes.
Dimension design: (1) Few dimensions designed; no hierarchies or surrogate keys designed; (2) Some dimensions designed with surrogate keys and basic hierarchies; (3) Most dimensions designed with surrogate keys and complex hierarchies; (4) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed; (5) Besides regular dimensions and slowly changing dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions).
Initial (1)
Repeatable (2)
Managed (4)
Optimized (5)
No documentation for
any data models
ETL
Defined (3)
ETL tool
Initial (1): Only hand-coded ETL
Repeatable (2): Hand-coded ETL and some standard scripts
Defined (3): ETL tool(s) for all the ETL design and generation
Managed (4): Standardized ETL tool and some standard scripts for better performance
Optimized (5): Complete ETL generated from metadata

ETL complexity
Initial (1): Simple ETL that just extracts and loads data into the data warehouse
Repeatable (2): Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
Defined (3): Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system, de-duplication and matching system, data quality system
Managed (4): More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data handler, hierarchy manager, special dimensions manager
Optimized (5): Optimized ETL for a real-time DW (real-time ETL capabilities)

Data quality system
Initial (1): Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
Repeatable (2): Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
Defined (3): Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: no
Managed (4): Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes; Solving data quality issues: yes
Optimized (5): Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes

ETL management and monitoring
Initial (1): Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
Repeatable (2): Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
Defined (3): Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: no
Managed (4): Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: no
Optimized (5): Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes

ETL standards
Initial (1): No standards
Repeatable (2): Few standards defined or implemented for ETL
Defined (3): Some standards defined or implemented for ETL
Managed (4): Most standards defined or implemented for ETL
Optimized (5): All the standards defined or implemented for ETL

ETL metadata management
Initial (1): No metadata management
Repeatable (2): Business and technical metadata for some ETL
Defined (3): Business and technical metadata for all ETL
Managed (4): Process metadata is also managed for some ETL
Optimized (5): All types of metadata are managed for all ETL

BI Applications
Types of BI applications
Initial (1): Static and parameter-driven reports and query applications
Repeatable (2): Ad-hoc reporting; online analytical processing (OLAP)
Defined (3): Visualization techniques: dashboards and scorecards
Managed (4): Predictive analytics: data and text mining; alerts
Optimized (5): Closed-loop BI applications; real-time BI applications

BI tool usage
Initial (1): BI tool related to the data mart
Repeatable (2): More than two tools for main stream BI (i.e.: reporting and visualization applications)
Defined (3): One tool recommended for main stream BI, but each department can use their own tool
Managed (4): One tool for main stream BI, but each department can use their own tool for specific BI applications (e.g.: data mining, financial analysis, etc.)
Optimized (5): One tool for main stream BI and one tool for specific BI applications

BI standards
Initial (1): No standards
Repeatable (2): Few standards defined or implemented for BI applications
Defined (3): Some standards defined or implemented for BI applications
Managed (4): Most standards defined or implemented for BI applications
Optimized (5): All the standards defined or implemented for BI applications

Standardized objects
Initial (1): Objects defined for every BI application
Repeatable (2): Some reusable objects for similar BI applications
Defined (3): Some standard objects and templates for similar BI applications
Managed (4): Most similar BI applications use standard objects and templates
Optimized (5): All similar BI applications use standard objects and templates

Delivery method
Initial (1): Reports are delivered manually on paper or by email
Repeatable (2): Reports are delivered automatically by email
Defined (3): Direct tool-based interface
Managed (4): A BI portal with basic functions: subscriptions, discussions forum, alerting
Optimized (5): Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and BI portals)

Metadata accessibility
Initial (1): No metadata available
Repeatable (2): Some incomplete metadata documents that users ask for periodically
Defined (3): Complete up-to-date metadata documents sent to users periodically or available on the intranet
Managed (4): Metadata is always available through a metadata management tool, different from the BI tool
Optimized (5): Complete integration of metadata with the BI applications (metadata can be accessed through one button push on the attributes, etc.)

DW Organization & Processes

Development Processes
Development processes
Initial (1): Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements definition, design, construction, deployment, maintenance)
Repeatable (2): Repeatable development processes based on experience with similar projects; some development phases clearly separated
Defined (3): Standard documented development processes; iterative and incremental development processes with all the development phases clearly separated
Managed (4): Development processes continuously measured against well-defined and consistent goals
Optimized (5): Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects

Environments
Initial (1): No separation between environments
Repeatable (2): Two separate environments (i.e.: usually development and production) with manual transfer between them
Defined (3): Some separation between environments (i.e.: at least three environments) with manual transfer between them
Managed (4): Some separation between environments (i.e.: at least two environments) with automatic transfer between them
Optimized (5): All the environments are distinct with automatic transfer between them

Development standards
Initial (1): No standards defined or implemented
Repeatable (2): Few standards defined or implemented
Defined (3): Some standards defined or implemented
Managed (4): A lot of the standards defined or implemented
Optimized (5): A comprehensive set of standards defined or implemented

Quality management
Initial (1): No quality assurance activities
Repeatable (2): Ad-hoc quality assurance activities
Defined (3): Standardized and documented quality assurance activities done for all the development phases
Managed (4): Level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability, maintainability, usability)
Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Sponsorship
Initial (1): No project sponsor
Repeatable (2): Single sponsor from a business unit or department
Defined (3): Chief information officer (CIO) or an IT director
Managed (4): Multiple individual sponsors from multiple business units or departments
Optimized (5): Multiple levels of business-driven, cross-departmental sponsorship including top level management sponsorship (BI/DW is integrated in the company process with continuous budget)

Project management
Initial (1): No project management activities
Repeatable (2): Project planning and scheduling
Defined (3): Some of the main project management activities (project planning and scheduling; project risk management; project tracking and control)
Managed (4): Some project management activities; standard and efficient procedure and documentation
Optimized (5): Project planning and scheduling; project risk management; project tracking and control; standard and efficient procedure and documentation; evaluation and assessment

Roles
Initial (1): No formal roles defined
Repeatable (2): Defined roles, but not technically implemented
Defined (3): Formalized and implemented roles and responsibilities
Managed (4): Level 3) + periodic peer reviews (i.e.: review of each other's work)
Optimized (5): Level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the needed roles with responsibilities and tasks)

Knowledge management
Initial (1): Ad-hoc knowledge gathering and sharing
Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.), and also through training and mentoring programs
Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs, and also through the use of technology
Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
Optimized (5): Continuously improving inter-organizational knowledge sharing

Requirements definition
Initial (1): Ad-hoc requirements definition; no methodology used
Repeatable (2): Methodologies differ from project to project; interviews with business users for collecting the requirements
Defined (3): Standard methodology for all the projects; interviews and group sessions with both business and IT users for collecting the requirements
Managed (4): Level 3) + qualitative assessment and measurement of the phase; requirements document also published
Optimized (5): Level 4) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these causes

Testing and acceptance
Initial (1): Only unit testing is done; no standards or documentation
Repeatable (2): Other types of testing are beginning to be done (some of the following: unit testing by another person; system integration testing; regression testing; acceptance testing)
Defined (3): Diverse types of testing; some standards
Managed (4): Diverse types of testing; standard procedure and documentation
Optimized (5): All the main types of testing (unit testing by another person; system integration testing; regression testing; acceptance testing); user training; standard procedure and documentation; external assessments and reviews

Service Processes
Service quality management
Initial (1): No service quality management activities
Repeatable (2): Ad-hoc service quality management
Defined (3): Proactive service quality management including a standard procedure
Managed (4): Level 3) + service quality measurements periodically compared to the established goals to determine the deviations and their causes
Optimized (5): Level 4) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes; service quality management certification

Knowledge management
Initial (1): Ad-hoc knowledge gathering and sharing
Repeatable (2): Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases, intranets, wikis, etc.)
Defined (3): Knowledge management is standardized; knowledge creation and sharing through brainstorming, training and mentoring programs
Managed (4): Central business unit knowledge management; quantitative knowledge management control and periodic knowledge gap analysis
Optimized (5): Continuously improving inter-organizational knowledge management

Service level management
Initial (1): Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
Repeatable (2): Some customer and supplier service needs documented and formalized based on previous experience
Defined (3): All the customer and supplier service needs documented and formalized according to a standard procedure into service level agreements (SLAs)
Managed (4): SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
Optimized (5): Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)

Incident management
Initial (1): Incident management is done ad-hoc, with no specialized ticket handling system or service desk to assess and classify incidents prior to referring them to a specialist
Repeatable (2): A ticket handling system is used for incident management and some procedures are followed, but nothing is standardized or documented
Defined (3): A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
Managed (4): Level 3) + standard reports concerning the incident status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; an incident management database is established as a repository for the event records
Optimized (5): Level 4) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the services provided to them

Change management
Initial (1): Change requests are made and solved in an ad-hoc manner
Repeatable (2): A ticket handling system is used for storing and solving the requests for change and some procedures are followed, but nothing is standardized or documented
Defined (3): A standard procedure is used for approving, verifying, prioritizing and scheduling changes
Managed (4): Level 3) + standard reports concerning the change status, including measurements and goals (e.g.: response time), are regularly produced for all the involved teams and customers; standards established for documenting changes
Optimized (5): Level 4) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value perception of the services provided to them

Resource management
Initial (1): Ad-hoc resource management activities (only when there is a problem)
Repeatable (2): Resource management is done following some procedures, but nothing is standardized or documented
Defined (3): Resource management is done constantly following a standardized documented procedure
Managed (4): Level 3) + standard reports concerning performance and resource management, including measurements and goals, are produced on a regular basis
Optimized (5): Level 4) + resource management trend analysis and monitoring to determine the most common bottlenecks and make sure that there is sufficient capacity to support planned services

Availability management
Initial (1): Ad-hoc availability management
Repeatable (2): Availability management is done following some procedures, but nothing is standardized or documented
Defined (3): Availability management documented and done using a standardized procedure (all elements are monitored)
Managed (4): Level 3) + risk assessment to determine the critical elements and possible problems
Optimized (5): Level 4) + availability management trend analysis and planning to determine the most common bottlenecks and make sure that all the elements are available for the agreed service level targets

Release management
Initial (1): Ad-hoc change solving and implementation; no release naming and numbering conventions
Repeatable (2): Release management is done following some procedures, but nothing is standardized or documented; release naming and numbering conventions are in place
Defined (3): Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
Managed (4): Level 3) + standard reports concerning release management, including measurements and goals, are produced on a regular basis; master copies of all software in a release are secured in a release database
Optimized (5): Level 4) + release management trend analysis, statistics and planning
Appendix B: The DW Maturity Assessment Questionnaire (Final Version)
Data Warehouse (DW) Maturity Assessment Questionnaire
Filling in the questionnaire will take approximately 50 minutes; at the end, a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions in the first part of the questionnaire (i.e.: DW General Questions) are not scored; their answers serve as input for shaping a better picture of the DW solution's maturity. The questions in the second and third parts of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and are multiple-choice questions with only one possible answer (except questions 3.1 – 11 and 3.2 – 1, where multiple answers may be circled).
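To make the scoring concrete, the sketch below shows one way the answers could be aggregated, assuming a simple average of the 1-5 question scores per sub-category and an average of the sub-category scores for the overall result; the averaging rule, sub-category names and answer scores are illustrative assumptions, not the model's prescribed formula.

    # Minimal sketch of the questionnaire scoring, assuming simple averaging.
    # The sub-category names and the answer scores below are hypothetical.

    def average(scores):
        """Mean of a list of 1-5 question scores."""
        return sum(scores) / len(scores)

    answers = {
        "Architecture": [3, 4, 2, 3, 3, 4, 3, 3],
        "Data Modelling": [3, 2, 3, 3, 4, 3, 2, 3],
        "ETL": [4, 3, 2, 3, 3, 3],
        "BI Applications": [2, 3, 3, 3, 2, 3, 2],
    }

    sub_scores = {name: average(qs) for name, qs in answers.items()}
    overall = average(list(sub_scores.values()))

    for name, score in sub_scores.items():
        print(f"{name}: {score:.1f}")
    print(f"Overall maturity: {overall:.1f}")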
1 DW General Questions
1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource - Tools to assist decision making
c) Mission-critical resource - A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) What percentage of the IT department is dedicated to BI (i.e.: how many people out of the total number of IT employees)?
7) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the
invoice)?
8) Which technologies do you use for developing the BI/DW solution in your organization?
Developing Category              Technology
Data Modelling
Extract/Transform/Load (ETL)
BI Applications
Database
9) What data modelling technique do you use for your BI/DW solution (e.g.: dimensional modelling, normalized
modelling, data vault, etc.)?
2 DW Technical Solution
2.1 General Architecture and Infrastructure
1) What is the predominant architecture of your DW?
a) Multiple independent data marts
b) A virtual integrated DW or real-time DW
c) Multiple independent data warehouses
d) A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Desktop data marts (e.g.: Excel sheets)
2) To what degree have you defined and documented definitions and business rules for the necessary
transformations, key terms and metrics?
a) No business rules defined
b) Most of the business rules defined and documented
c) Few business rules defined and documented
d) All business rules defined and documented
e) Some business rules defined and documented
3) To what degree have you implemented definitions and business rules for the necessary transformations, key
terms and metrics?
a) No business rules implemented
b) Most of the business rules implemented
c) Few business rules implemented
d) All business rules implemented
e) Some business rules implemented
4) To what degree is your metadata management implemented?
a) Web-accessed central metadata repository with integrated, standardized, up-to-date metadata
b) Non-integrated metadata by solution
c) Central up-to-date metadata repository
d) No metadata management
e) Central metadata repository separated by tools
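As an illustration of what an entry in a central metadata repository could hold, the following minimal sketch stores agreed definitions and business rules for key terms; the fields and the example term are assumptions made for illustration, not part of the model.

    # Illustrative sketch of a central metadata repository entry. The fields
    # (definition, business rule, source, last update) are assumptions chosen
    # to mirror the metadata aspects discussed in this questionnaire.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class MetadataEntry:
        term: str           # key term or metric name
        definition: str     # agreed business definition
        business_rule: str  # transformation/derivation rule
        source_system: str  # where the data originates
        last_updated: date  # supports the "up-to-date" criterion

    repository = {}  # one central, tool-independent store

    entry = MetadataEntry("net_revenue",
                          "Gross revenue minus returns and discounts",
                          "gross_revenue - returns - discounts",
                          "ERP", date(2010, 8, 1))
    repository[entry.term] = entry
    print(repository["net_revenue"].definition)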
5) To what degree is security implemented in your DW architecture?
a) No security implemented
b) Integrated company wide security
c) Independent authorization for each tool
d) Authentication security
e) Role-level security at database level
6) What types of data sources does your DW support at the highest level?
a) CSVs files
b) Operational databases
c) ERP and CRM systems; XML files
d) Unstructured data sources (e.g.: text or documents)
e) Various types of unstructured data sources (e.g.: images, videos) and Web data sources
7) To what degree do you use methods to increase the performance of your DW?
a) Specialized DW appliances (e.g.: Netezza, Teradata) or cloud computing
b) No methods to increase performance
c) Software performance tuning (e.g.: index management, parallelizing and partitioning system, views
materialization)
d) Hardware performance tuning (e.g.: DW server)
e) Software and hardware tuning
8) To what degree is your infrastructure specialized for a DW?
a) Desktop platform
b) Specialized DW appliances (e.g.: Netezza, Teradata)
c) Separate OLTP systems and DW environment
d) Separate servers for OLTP systems, DW, ETL and BI applications
e) Shared OLTP systems and DW environment
9) Which answer best describes the update frequency for your DW?
a) Daily update
b) Monthly update or less often
c) Real-time update
d) Inter-daily update
e) Weekly update
2.2 Data Modelling
1) Which answer best describes the usage of a data modelling tool in your organization?
a) No data modelling tool
b) Scattered data modelling tools used only for design
c) Standardized data modelling tool used for design and maintaining metadata
d) Standardized data modelling tool used only for design
e) Scattered data modelling tools used also for maintenance
2) Which answer best describes the degree of synchronization between the following data models that your
organization maintains and the mapping between them: ETL source and target models; DW and data marts
models; BI semantic or query object models?
a) Automatic synchronization of all of the data models
b) Manual synchronization of some of the data models
c) No synchronization between data models
d) Manual or automatic synchronization depending on the data models
e) Automatic synchronization of most of the data models
3) To what degree do you differentiate between data models levels: physical, logical and conceptual?
a) No differentiation between data models levels
b) All data models have conceptual, logical and physical levels designed
c) Logical and physical levels designed for some data models
d) Conceptual level also designed for some data models
e) Logical and physical levels designed for all the data models
4) To what degree have you defined and documented standards (e.g.: naming conventions, metadata, etc.) for your
data models?
a) No standards defined for data models
b) Enterprise-wide standards defined for some of the data models
c) Enterprise-wide standards defined for most of the data models
d) Solution-dependent standards defined for some of the data models
e) Enterprise-wide standards defined for all the data models
5) To what degree have you implemented standards (e.g.: naming conventions, metadata, etc.) for your data
models?
a) No standards implemented for data models
b) Enterprise-wide standards implemented for some of the data models
c) Enterprise-wide standards implemented for most of the data models
d) Solution-dependent standards implemented for some of the data models
e) Enterprise-wide standards implemented for all the data models
6) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality,
etc.) in your data models?
a) No documentation for any data models
b) Standardized documentation for some of the data models
c) Standardized documentation for all the data models
d) Non standardized documentation for some of the data models
e) Standardized documentation for most of the data models
If you use dimensional modelling, please answer the following three questions:
7) What percentage of all your fact tables has their granularity at the lowest level possible?
a) Very few fact tables have their granularity at the lowest level possible
b) Few fact tables have their granularity at the lowest level possible
c) Some fact tables have their granularity at the lowest level possible
d) Most fact tables have their granularity at the lowest level possible
e) All fact tables have their granularity at the lowest level possible
8) To what degree do you design conformed dimensions in your data models?
a) No conformed dimensions
b) Conformed dimensions for few business processes
c) Enterprise-wide standardized conformed dimensions for most business processes; also making use of a high
level design technique such as an enterprise bus matrix
d) Conformed dimensions for some business processes
e) Enterprise-wide standardized conformed dimensions for all business processes
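The enterprise bus matrix mentioned in option c) is essentially a grid of business processes against conformed dimensions; a minimal sketch, with hypothetical processes and dimensions, might look as follows.

    # Minimal sketch of an enterprise bus matrix: business processes (rows)
    # against conformed dimensions (columns). The process and dimension names
    # are hypothetical examples, not prescribed by the model.
    dimensions = ["Date", "Customer", "Product", "Store"]
    bus_matrix = {
        "Sales":     {"Date", "Customer", "Product", "Store"},
        "Inventory": {"Date", "Product", "Store"},
        "Returns":   {"Date", "Customer", "Product"},
    }

    # Print the grid: an 'X' marks a conformed dimension shared by a process.
    print("Process".ljust(12) + " ".join(d.ljust(9) for d in dimensions))
    for process, dims in bus_matrix.items():
        row = process.ljust(12)
        row += " ".join(("X" if d in dims else "-").ljust(9) for d in dimensions)
        print(row)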
9) Which answer best describes the current state of your dimension tables modelling?
a) Few dimensions designed; no hierarchies or surrogate keys designed
b) Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Besides regular dimensions, special dimensions are also designed (e.g.: mini, monster, junk dimensions)
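To illustrate the slowly changing dimensions technique referred to in options d) and e), the sketch below applies a type 2 change: the current dimension row is end-dated and a new row with a fresh surrogate key preserves history. The table layout and field names are illustrative assumptions.

    # Sketch of a slowly changing dimension type 2 update: instead of
    # overwriting an attribute, the current row is end-dated and a new row
    # with a new surrogate key keeps the history. Field names are illustrative.
    from datetime import date

    customer_dim = [
        {"sk": 1, "customer_id": "C42", "city": "Utrecht",
         "valid_from": date(2009, 1, 1), "valid_to": None, "is_current": True},
    ]

    def scd2_update(dim, customer_id, new_city, change_date):
        """Close the current row and append a new version (SCD type 2)."""
        for row in dim:
            if row["customer_id"] == customer_id and row["is_current"]:
                row["valid_to"] = change_date
                row["is_current"] = False
        new_sk = max(r["sk"] for r in dim) + 1  # simple surrogate key generator
        dim.append({"sk": new_sk, "customer_id": customer_id, "city": new_city,
                    "valid_from": change_date, "valid_to": None,
                    "is_current": True})

    scd2_update(customer_dim, "C42", "Rotterdam", date(2010, 6, 1))
    print(customer_dim)  # two rows: the historical and the current version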
2.3 ETL
1) Which answer best describes the usage of an ETL tool in your organization?
a) Only hand-coded ETL
b) Complete ETL generated from metadata
c) Hand-coded ETL and some standard scripts
d) Standardized ETL tool and some standard scripts
e) ETL tool(s) for all the ETL design and generation
2) Which answer best describes the complexity of your ETL?
a) Simple ETL that just extracts and loads data into the data warehouse
b) Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
c) Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data capture system,
de-duplication and matching system, data quality system
d) More advanced ETL capabilities: error event table creation, audit dimension creation, late arriving data
handler, hierarchy manager, special dimensions manager
e) Optimized ETL for a real-time DW (real-time ETL capabilities)
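As an illustration of the "basic ETL" level in option b), the following sketch combines a filter, a format change, a derived value and a simple surrogate key generator; the input records and transformation rules are hypothetical.

    # Sketch of "basic ETL": extract rows, apply simple transformations
    # (filtering, format change, a derived value) and assign surrogate keys.
    # The input records and rules are hypothetical examples.
    from itertools import count

    surrogate_key = count(start=1)  # simple surrogate key generator

    source_rows = [
        {"order_id": "A-1", "amount": "100.0", "status": "OK"},
        {"order_id": "A-2", "amount": "250.5", "status": "CANCELLED"},
    ]

    def transform(row):
        return {
            "sk": next(surrogate_key),       # surrogate key assignment
            "order_id": row["order_id"],
            "amount": float(row["amount"]),  # format change: text -> number
            "amount_cents": int(float(row["amount"]) * 100),  # derived value
        }

    fact_rows = [transform(r) for r in source_rows
                 if r["status"] == "OK"]     # filtering
    print(fact_rows)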
3) Which answer best describes the data quality system implemented for your ETL?
a) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: no
b) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving data quality issues: no
c) Daily automation: yes / no; Specific data quality tools: yes / no; Identifying data quality issues: yes; Solving data quality issues: yes
d) Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no; Solving data quality issues: no
e) Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes; Solving data quality issues: yes
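The distinction between identifying and solving data quality issues can be pictured with a minimal sketch: one routine only flags problems, while a second applies corrections where a rule exists. The checks shown are hypothetical examples.

    # Sketch of the identify-vs-solve distinction: a check that flags issues
    # (identification) and a correction step (solving). Rules are hypothetical.
    rows = [{"customer_id": "C1", "email": "a@b.com"},
            {"customer_id": "",   "email": "X"}]

    def identify_issues(row):
        """Identification only: report problems, change nothing."""
        issues = []
        if not row["customer_id"]:
            issues.append("missing customer_id")
        if "@" not in row["email"]:
            issues.append("malformed email")
        return issues

    def solve_issues(row):
        """Solving: apply corrections or defaults where a rule exists."""
        if "@" not in row["email"]:
            row["email"] = None  # quarantine the bad value
        return row

    for row in rows:
        found = identify_issues(row)
        if found:
            print(row, "->", found)
            solve_issues(row)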
4) Which answer best describes the management and monitoring of your ETL?
(Definitions:
 Simple monitoring (i.e.: ETL workflow monitor – statistics regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per second; summaries of errors, etc.);
 Advanced monitoring (i.e.: ETL workflow monitor – statistics on infrastructure performance like CPU usage, memory allocation, database performance, server utilization during ETL; job scheduler – time or event based ETL execution, events notification; data lineage and analyzer system))
a) Restart and recovery system: no; Simple monitoring: no; Advanced monitoring: no; Real-time monitoring: no
b) Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time monitoring: no
c) Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
d) Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes / no; Real-time monitoring: no
e) Completely automatic restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes
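The "simple monitoring" statistics defined above could, for example, be derived from an ETL job log as in the following sketch; the log records and field names are hypothetical.

    # Sketch of "simple monitoring": summarizing ETL job states, throughput
    # and errors from a job log. The log records and fields are hypothetical.
    from collections import Counter

    job_log = [
        {"job": "load_sales", "state": "completed", "mb": 120, "seconds": 60, "errors": 0},
        {"job": "load_stock", "state": "running",   "mb": 40,  "seconds": 30, "errors": 0},
        {"job": "load_crm",   "state": "suspended", "mb": 0,   "seconds": 0,  "errors": 3},
    ]

    states = Counter(j["state"] for j in job_log)        # pending/running/...
    done = [j for j in job_log if j["state"] == "completed"]
    throughput = sum(j["mb"] for j in done) / max(sum(j["seconds"] for j in done), 1)
    total_errors = sum(j["errors"] for j in job_log)     # summary of errors

    print(dict(states))
    print(f"MB/s over completed jobs: {throughput:.2f}")
    print(f"Errors: {total_errors}")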
5) To what degree have you defined and documented standards (e.g.: naming conventions, set-up standards,
recovery process, etc.) for your ETL?
a) No standards defined
b) Few standards defined for ETL
c) Some standards defined for ETL
d) Most standards defined for ETL
e) All the standards defined for ETL
6) To what degree have you implemented standards (e.g.: naming conventions, set-up standards, recovery process,
etc.) for your ETL?
a) No standards implemented
b) Few standards implemented for ETL
c) Some standards implemented for ETL
d) Most standards implemented for ETL
e) All the standards implemented for ETL
7) To what degree is your metadata management implemented for your ETL?
a) No metadata management
b) Business and technical metadata for some ETL
c) All types of metadata (i.e.: business, technical, process) are managed for all ETL
d) Process metadata is also managed for some ETL
e) Business and technical metadata for all ETL
2.4 BI Applications
1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Static and parameter-driven reports and query applications
b) Ad-hoc reporting; online analytical processing (OLAP)
c) Visualization techniques: dashboards and scorecards
d) Predictive analytics: data and text mining; alerts
e) Closed-loop BI applications; real-time BI applications
2) Which answer best describes your current BI tool usage?
(Definitions:
 main stream BI applications (i.e.: reporting and visualization applications);
 specific BI applications (i.e.: data mining, financial analysis, etc.))
a) One standardized tool for main stream BI and one standardized tool for specific BI applications
b) BI tool related to the data mart
c) One tool recommended for main stream BI, but each department can use their own tool
d) More than two tools for main stream BI
e) One standardized tool for main stream BI, but each department can use their own tool for specific BI applications
3) To what degree have you defined and documented standards (e.g.: naming conventions, generic
transformations, logical structure of attributes and measures) for your BI applications?
a) No standards defined
b) Few standards defined for BI applications
c) Some standards defined for BI applications
d) Most standards defined for BI applications
e) All the standards defined for BI applications
4) To what degree have you implemented standards (e.g.: naming conventions, generic transformations, logical
structure of attributes and measures) for your BI applications?
a) No standards implemented
b) Few standards implemented for BI applications
c) Some standards implemented for BI applications
d) Most standards implemented for BI applications
e) All the standards implemented for BI applications
5) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) implemented in your BI
applications?
a) Objects defined for every BI application
b) All similar BI applications use standard objects and templates
c) Some reusable objects for similar BI applications
d) Most similar BI applications use standard objects and templates
e) Some standard objects and templates for similar BI applications
6) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Reports are delivered manually on paper or by email
b) Reports are delivered automatically by email
c) Direct tool-based interface
d) A BI portal with basic functions: subscriptions, discussions forum, alerting
e) Highly interactive, business process oriented, up-to-date portal (no differentiation between operational and
BI portals)
7) Which answer best describes the metadata accessibility to users?
a) No metadata available
b) Some incomplete metadata documents that users ask for periodically
c) Complete integration of metadata with the BI applications (metadata can be accessed through one button
push on the attributes, etc.)
d) Complete up-to-date metadata documents sent to users periodically or available on the intranet
e) Metadata is always available through a metadata management tool, different from the BI tool
3 DW Organization and Processes
3.1 Development Processes
1) Which answer best describes the DW development processes in your organization?
a) Ad-hoc development processes; no clearly defined development phases (i.e.: planning, requirements
definition, design, construction, deployment, maintenance)
b) Repeatable development processes based on experience with similar projects; some development phases
clearly separated
c) Standard documented development processes; iterative and incremental development processes with all the
development phases clearly separated
d) Development processes continuously measured against well-defined and consistent goals
e) Continuous development process improvement by identifying weaknesses and strengthening the process proactively, with the goal of preventing the occurrence of defects
2) To what degree is there a separation between the development/test/acceptance/deployment environments in
your organization?
a) No separation between environments
b) Two separate environments (i.e.: usually development and production) with manual transfer between them
c) All the environments are distinct with automatic transfer between them
d) Some separation between environments (i.e.: at least two environments) with automatic transfer between
them
e) Some separation between environments (i.e.: at least three environments) with manual transfer between
them
3) To what degree has your organization defined and documented standards for developing, testing and deploying
DW functionalities (i.e.: ETL and BI applications)?
a) No standards defined
b) Few standards defined
c) Some standards defined
d) A lot of the standards defined
e) A comprehensive set of standards defined
4) To what degree has your organization implemented standards for developing, testing and deploying DW
functionalities (i.e.: ETL and BI applications)?
a) No standards implemented
b) Few standards implemented
c) Some standards implemented
d) A lot of the standards implemented
e) A comprehensive set of standards implemented
5) Which answer best describes the DW quality management?
a) No quality assurance activities
b) Ad-hoc quality assurance activities
c) Standardized and documented quality assurance activities done for all the development phases
d) c) + measurable and prioritized goals for managing the DW quality (e.g.: functionality, reliability,
maintainability, usability)
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these
causes; service quality management certification
6) Which answer best describes the sponsor for your DW project?
a) Multiple levels of business-driven, cross-departmental sponsorship including top level management
sponsorship (BI/DW is integrated in the company process with continuous budget)
b) No project sponsor
c) Single sponsor from a business unit or department
d) Chief information officer (CIO) or an IT director
e) Multiple individual sponsors from multiple business units or departments
7) Which answer best describes your DW project management?
(Definitions:
 project planning and scheduling (i.e.: work breakdown structure, time, costs and resources estimates, planning and scheduling);
 project tracking and control (i.e.: milestone tracking, change control))
a) Project planning and scheduling: no; project risk management: no; project tracking and control: no; standard and efficient procedure and documentation, evaluation and assessment: no
b) Project planning and scheduling: yes; project risk management: no; project tracking and control: no;
standard and efficient procedure and documentation, evaluation and assessment: no
c) Project planning and scheduling: yes; project risk management: no; project tracking and control: yes;
standard and efficient procedure and documentation, evaluation and assessment: no
d) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes;
standard and efficient procedure and documentation, evaluation and assessment: no
e) Project planning and scheduling: yes; project risk management: yes; project tracking and control: yes;
standard and efficient procedure and documentation, evaluation and assessment: yes
8) Which answer best describes the role division for the DW development process?
a) No formal roles defined
b) Defined roles, but not technically implemented
c) Formalized and implemented roles and responsibilities
d) c) + periodic peer reviews (i.e.: review of each other's work)
e) d) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles and match the
needed roles with responsibilities and tasks)
9) Which answer best describes the knowledge management in your organization for the DW development
processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases,
intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training
and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic
knowledge gap analysis
e) Continuously improving inter-organizational knowledge management
10) Which answer best describes the requirements definition phase for your DW project?
a) Ad-hoc requirements definition; no methodology used
b) Methodologies differ from project to project; interviews with business or IT users for collecting the
requirements
c) Standard methodology for all the projects; interviews and group sessions with both business and IT users
for collecting the requirements
d) c) + qualitative assessment and measurement of the phase; requirements document also published
e) d) + causal analysis meetings to identify common bottleneck causes and subsequent elimination of these
causes
11) Which of the following activities are included in the testing and acceptance phase for your DW project?
a) Unit testing by another person
b) System integration testing
c) Regression testing
d) User training
e) Acceptance testing
f) Standard procedure and documentation for testing and acceptance
g) External assessments and reviews of testing and acceptance
3.2 Service Processes (Maintenance and Monitoring Processes)
1) Which of the following activities are included in the maintenance and monitoring phase for your DW project?
a) Collection of statistics regarding the utilization of the hardware and software resources (e.g.: memory
management, physical disk storage space utilization, processor usage, BI applications usage, number of
completed queries by time slots during the day, time each user stays online with the data warehouse, total
number of distinct users per day, etc.)
b) BI applications maintenance and monitoring
c) User support
d) ETL monitoring and management
e) Data reconciliation and data growth management
f) Security administration
g) Resource monitoring and management
h) Infrastructure management
i) Backup and recovery management
j) Performance monitoring and tuning
2) Which answer best describes the DW service quality management in your organization?
a) No service quality management activities
b) Ad-hoc service quality management
c) Proactive service quality management including a standard procedure
d) c) + service quality measurements periodically compared to the established goals to determine the
deviations and their causes
e) d) + causal analysis meetings to identify common defect causes and subsequent elimination of these causes;
service quality management certification
3) Which answer best describes the knowledge management in your organization for the DW service processes?
a) Ad-hoc knowledge gathering and sharing
b) Organized knowledge sharing through written documentation and technology (e.g.: knowledge databases,
intranets, wikis, etc.)
c) Knowledge management is standardized; knowledge creation and sharing through brainstorming, training
and mentoring programs, and also through the use of technology
d) Central business unit knowledge management; quantitative knowledge management control and periodic
knowledge gap analysis
e) Continuously improving inter-organizational knowledge management
4) Which answer best describes the DW service level management in your organization?
a) Customer and supplier service needs documented in an ad-hoc manner; no service catalogue compiled
b) Some customer and supplier service needs documented and formalized based on previous experience
c) All the customer and supplier service needs documented and formalized according to a standard procedure
into service level agreements (SLAs)
d) SLAs reviewed with the customer and supplier on both a periodic and event-driven basis
e) Actual service delivery continuously monitored and evaluated with the customer on both a periodic and event-driven basis for continuous improvement (SLAs including penalties)
5) Which answer best describes the DW incident management in your organization?
a) Incident management is done ad-hoc with no specialized ticket handling system or service desk to assess
and classify them prior to referring them to a specialist
b) A ticket handling system is used for incident management and some procedures are followed, but nothing is
standardized or documented
c) A service desk is the recognized point of contact for all the customer queries; incident assessment and classification is done following a standard procedure
d) c) + standard reports concerning the incident status including measurements and goals (e.g.: response time)
are regularly produced for all the involved teams and customers; an incident management database is
established as a repository for the event records
e) d) + trend analysis in incident occurrence and also in customer satisfaction and value perception of the
services provided to them
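The standard reports with measurements and goals mentioned in option d) can be reduced to comparing measured values against an agreed target; a minimal sketch, with hypothetical incident data and a hypothetical four-hour response-time goal, could look like this.

    # Sketch of an incident-status measurement as in option d): average
    # response time compared against an SLA goal. The incident data and
    # the 4-hour target are hypothetical.
    incidents = [
        {"id": 101, "response_hours": 2.0},
        {"id": 102, "response_hours": 6.5},
        {"id": 103, "response_hours": 3.0},
    ]
    SLA_GOAL_HOURS = 4.0

    avg_response = sum(i["response_hours"] for i in incidents) / len(incidents)
    breaches = [i["id"] for i in incidents
                if i["response_hours"] > SLA_GOAL_HOURS]

    print(f"Average response time: {avg_response:.1f}h (goal: {SLA_GOAL_HOURS}h)")
    print(f"Incidents breaching the goal: {breaches}")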
6) Which answer best describes the DW change management in your organization?
a) Change requests are made and solved in an ad-hoc manner
b) A ticket handling system is used for storing and solving the requests for change and some procedures are
followed, but nothing is standardized or documented
c) A standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) c) + standard reports concerning the change status including measurements and goals (e.g.: response time)
are regularly produced for all the involved teams and customers; standards established for documenting
changes
e) d) + trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and value
perception of the services provided to them
7) Which answer best describes the DW technical resource management in your organization?
a) Ad-hoc resource management activities (only when there is a problem)
b) Resource management is done following some procedures, but nothing is standardized or documented
c) Resource management is done constantly following a standardized documented procedure
d) c) + standard reports concerning performance and resource management including measurements and goals
are done on a regular basis
e) d) + resource management trend analysis and monitoring to determine the most common bottlenecks and
make sure that there is sufficient capacity to support planned services
8) Which answer best describes the availability management in your organization?
a) Ad-hoc availability management
b) Availability management is done following some procedures, but nothing is standardized or documented
c) Availability management documented and done using a standardized procedure (all elements are
monitored)
d) c) + risk assessment to determine the critical elements and possible problems
e) d) + availability management trend analysis and planning to determine the most common bottlenecks and
make sure that all the elements are available for the agreed service level targets
9) Which answer best describes the release management in your organization?
a) Ad-hoc changes solving and implementation; no release naming and numbering conventions
b) Release management is done following some procedures, but nothing is standardized or documented;
release naming and numbering conventions
c) Release management is documented and done following a standardized procedure; assigned release management roles and responsibilities
d) c) + standard reports concerning release management including measurements and goals are done on a
regular basis; master copies of all software in a release secured in a release database
e) d) + release management trend analysis, statistics and planning
Appendix C: DW Maturity Assessment Questionnaire (Redefined Version)
Data Warehouse (DW) Maturity Assessment Questionnaire
Filling in the questionnaire will take approximately 50 minutes; at the end, a maturity score for each benchmark category/sub-category and an overall maturity score will be provided. The questions in the first part of the questionnaire (i.e.: DW General Questions) are not scored; their answers serve as input for shaping a better picture of the DW solution's maturity. The questions in the second and third parts of the questionnaire (i.e.: DW technical solution; and DW organization and processes) are scored from 1 to 5 and are multiple-choice questions with only one possible answer (except questions 3.1 – 11 and 3.2 – 1, where multiple answers may be circled).
1 DW General Questions
1) Could you elaborate on the main drivers for implementing a BI/DW solution in your organization?
2) How long has your organization been using BI/DW?
3) Could you elaborate on the success of the BI/DW solution in your organization, in terms of:
a) Returns vs. Costs
b) Time (Intended vs. Actual)
c) Quality
d) End-user adoption.
4) Which answer best describes how executives perceive the purpose of your organization's BI/DW environment?
a) Operational cost center – An IT system needed to run the business
b) Tactical resource - Tools to assist decision making
c) Mission-critical resource - A system that is critical to running business operations
d) Strategic resource – Key to achieving performance objectives and goals
e) Competitive differentiator – Key to gaining or keeping customers and/or market share.
5) What percentage of the annual IT budget for your organization does the BI/DW budget represent?
6) Who is the budget owner of the BI/DW solution in your organization (i.e.: who is responsible for paying the
invoice)?
7) Which technologies do you use for developing the BI/DW solution in your organization?
Developing Category              Technology
Data Modelling
Extract/Transform/Load (ETL)
BI Applications
Database
8) What data modelling technique do you use for your BI/DW solution?
2 DW Technical Solution
2.1 Architecture / General Architecture and Infrastructure
1) What is the predominant architecture of your DW?
a) Level 1 – Desktop data marts (e.g.: Excel sheets)
b) Level 2 – Multiple independent data marts
c) Level 3 – Multiple independent data warehouses
d) Level 4 – A single, central DW with multiple data marts (Inmon) or conformed data marts (Kimball)
e) Level 5 – A virtual integrated DW or real-time DW
2) To what degree have you defined, documented and implemented definitions and business rules for the necessary
transformations, key terms and metrics?
a) Very low – No business rules defined
b) Low – Few business rules defined and implemented
c) Moderate – Some business rules defined and implemented
d) High – Most of the business rules defined and implemented
e) Very high – All business rules defined and implemented
3) To what degree is your metadata management implemented?
a) Very low – No metadata management
b) Low – Non-integrated metadata by solution
c) Moderate – Central metadata repository separated by tools
d) High – Central up-to-date metadata repository
e) Very high – Web-accessed central metadata repository with integrated, standardized, up-to-date metadata
4) To what degree is security implemented in your DW architecture?
a) Very low – No security implemented
b) Low – Authentication security
c) Moderate – Independent authorization for each tool / Target audience authorization
d) High – Role-level security at database level
e) Very high – Integrated companywide authorization security
5) What types of data sources does your DW support at the highest level?
a) Level 1 – CSVs files
b) Level 2 – Operational databases
c) Level 3 – ERP and CRM systems; XML files
d) Level 4 – Unstructured data sources (e.g.: text or documents)
e) Level 5 – Various types of unstructured data sources (e.g.: images, videos) and Web data sources
6) To what degree do you use methods to increase the performance of your DW?
a) Very low – No methods to increase performance
b) Low – Software performance tuning (e.g.: index management, parallelizing and partitioning system, views
materialization)
c) Moderate – Hardware performance tuning (e.g.: DW server)
d) High – Software and hardware tuning
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata) / cloud computing
7) To what degree is your infrastructure specialized for a DW?
a) Very low – Desktop platform
b) Low – Shared OLTP systems and DW environment
c) Moderate – Separate OLTP systems and DW environment
d) High – Separate servers for OLTP systems, DW, ETL and BI applications
e) Very high – Specialized DW appliances (e.g.: Netezza, Teradata)
8) Which answer best describes the update frequency for your DW?
a) Level 1 – Monthly update or less often
b) Level 2 – Weekly update
c) Level 3 – Daily update
d) Level 4 – Inter-daily update
e) Level 5 – Real-time update
2.2 Data Modelling
1) Which answer best describes the usage of a data modelling tool in your organization?
a) Level 1 – No data modelling tool
b) Level 2 – Scattered data modelling tools used only for design
c) Level 3 – Scattered data modelling tools used also for maintenance
d) Level 4 – Standardized data modelling tool used only for design
e) Level 5 – Standardized data modelling tool used for design and maintaining metadata
2) Which answer best describes the degree of synchronization between the following data models that your
organization maintains and the mapping between them: ETL source and target models; DW and data marts
models; BI semantic or query object models?
a) Level 1 – No synchronization between data models
b) Level 2 – Manual synchronization of some of the data models
c) Level 3 – Manual or automatic synchronization depending on the data models
d) Level 4 – Automatic synchronization of most of the data models
e) Level 5 – Automatic synchronization of all of the data models
3) To what degree do you differentiate between data models levels: physical, logical and conceptual?
a) Very low – No differentiation between data models levels
b) Low – Logical and physical levels designed for some data models
c) Moderate – Logical and physical levels designed for all the data models
d) High – Conceptual level also designed for some data models
e) Very high – All data models have conceptual, logical and physical levels designed
4) To what degree have you defined and implemented standards (e.g.: naming conventions, metadata, etc.) for
your data models?
a) Very low – No standards defined for data models
b) Low – Solution-dependent standards defined for some of the data models
c) Moderate – Solution-dependent standards defined for most of the data models / Enterprise-wide standards
defined for some of the data models
d) High – Enterprise-wide standards defined for most of the data models
e) Very high – Enterprise-wide standards defined for all the data models
5) To what degree have you documented the metadata (e.g.: definitions, business rules, main values, data quality,
etc.) in your data models?
a) Very low – No documentation for any data models
b) Low – Non standardized documentation for some of the data models
c) Moderate – Standardized documentation for some of the data models
d) High – Standardized documentation for most of the data models
e) Very high – Standardized documentation for all the data models
6) What percentage of all your fact tables has their granularity at the lowest level possible?
a) Very low – Very few fact tables have their granularity at the lowest level possible
b) Low – Few fact tables have their granularity at the lowest level possible
c) Moderate – Some fact tables have their granularity at the lowest level possible
d) High – Most fact tables have their granularity at the lowest level possible
e) Very high – All fact tables have their granularity at the lowest level possible
7) To what degree do you design conformed dimensions in your data models?
a) Very low – No conformed dimensions
b) Low – Conformed dimensions for few business processes
c) Moderate – Conformed dimensions for some business processes
d) High – Enterprise-wide standardized conformed dimensions for most business processes; also making use
of a high level design technique such as an enterprise bus matrix
e) Very high – Enterprise-wide standardized conformed dimensions for all business processes
8) Which answer best describes the current state of your dimension tables modelling?
a) Level 1 – Few dimensions designed; no hierarchies or surrogate keys designed
b) Level 2 – Some dimensions designed with surrogate keys and basic hierarchies (if needed)
c) Level 3 – Most dimensions designed with surrogate keys and basic/complex hierarchies (if needed)
d) Level 4 – Slowly changing dimensions techniques (i.e.: type 2, 3 and more) also designed
e) Level 5 – Besides regular dimensions and slowly changing dimensions technique, special dimensions are
also designed (e.g.: mini, monster, junk dimensions)
2.3 ETL
1) Which answer best describes the usage of an ETL tool in your organization?
a) Level 1 – Only hand-coded ETL
b) Level 2 – Hand-coded ETL and some standard scripts
c) Level 3 – ETL tool(s) for all the ETL design and generation
d) Level 4 – Standardized ETL tool and some standard scripts for better performance
e) Level 5 – Complete ETL generated from metadata
2) Which answer best describes the complexity of your ETL?
a) Level 1 – Simple ETL that just extracts and loads data into the data warehouse
b) Level 2 – Basic ETL with simple transformations such as: format changes, sorting, filtering, joining, deriving new calculated values, aggregation, etc., and a surrogate key generator
c) Level 3 – Advanced ETL capabilities: slowly changing dimensions manager, reusability, change data
capture system, de-duplication and matching system, data quality system
d) Level 4 – More advanced ETL capabilities: error event table creation, audit dimension creation, late
arriving data handler, hierarchy manager, special dimensions manager
e) Level 5 – Real-time ETL capabilities (optimization of ETL) / optimized ETL for an agile DW (real-time ETL capabilities)
3) Which answer best describes the data quality system implemented for your ETL?
a) Very low – Daily automation: no; Specific data quality tools: no; Identifying data quality issues: no;
Solving data quality issues: no
b) Low – Daily automation: no; Specific data quality tools: no; Identifying data quality issues: yes; Solving
data quality issues: no
c) Moderate – Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues:
yes; Solving data quality issues: no
d) High – Daily automation: yes/no; Specific data quality tools: yes/no; Identifying data quality issues: yes;
Solving data quality issues: yes
e) Very high – Daily automation: yes; Specific data quality tools: yes; Identifying data quality issues: yes;
Solving data quality issues: yes
4) Which answer best describes the management and monitoring of your ETL?
a) Level 1 – Restart and recovery system: no; Simple monitoring (i.e.: ETL workflow monitor – statistics
regarding ETL execution such as pending, running, completed and suspended jobs; MB processed per
second; summaries of errors, etc.): no; Advanced monitoring (ETL workflow monitor – statistics on
infrastructure performance like CPU usage, memory allocation, database performance, server utilization
during ETL; job scheduler – time- or event-based ETL execution, event notifications; data lineage and
analyzer system): no; Real-time monitoring: no
b) Level 2 – Restart and recovery system: no; Simple monitoring: yes; Advanced monitoring: no; Real-time
monitoring: no
c) Level 3 – Manual restart and recovery system: yes; Simple monitoring: yes; Advanced monitoring: yes;
Real-time monitoring: no
d) Level 4 – Manual and automatic restart and recovery system: yes; Simple monitoring: yes; Advanced
monitoring: yes; Real-time monitoring: no
e) Level 5 – Completely automatic restart and recovery system (manual restart and recovery as needed):
yes; Simple monitoring: yes; Advanced monitoring: yes; Real-time monitoring: yes
5) To what degree have you defined and implemented standards (e.g.: naming conventions, set-up standards,
recovery process, etc.) for your ETL?
a) Very low – No standards defined
b) Low – Few standards defined for ETL
c) Moderate – Some standards defined for ETL
d) High – Most standards defined for ETL
e) Very high – All the standards defined for ETL
6) To what degree is your metadata management implemented for your ETL?
a) Very low – No metadata management
b) Low – Business and technical metadata for some ETL
c) Moderate – Business and technical metadata for all ETL
d) High – Process metadata is also managed for some ETL
e) Very high – All types of metadata are managed for all ETL
2.4 BI Applications
1) Which types of BI applications best describe the highest level purpose of your DW environment?
a) Level 1 – Static and parameter-driven reports and query applications
b) Level 2 – Ad-hoc reporting; online analytical processing (OLAP)
c) Level 3 – Visualization techniques: dashboards and scorecards
d) Level 4 – Predictive analytics: data and text mining; alerts
e) Level 5 – Closed-loop BI applications; real-time BI applications
2) Which answer best describes your current BI tool usage?
a) Level 1 – BI tool related to the data mart
b) Level 2 – More than two tools for mainstream BI (i.e.: reporting and visualization applications)
c) Level 3 – One tool recommended for mainstream BI, but each department can use its own tool
d) Level 4 – One standardized tool for mainstream BI, but each department can use its own tool for specific
BI applications (i.e.: data mining, financial analysis, etc.)
e) Level 5 – One standardized tool for mainstream BI and one standardized tool for specific BI applications
3) To what degree have you defined and implemented standards (e.g.: naming conventions, generic
transformations, logical structure of attributes and measures) for your BI applications?
a) Very low – No standards defined
b) Low – Few standards defined for BI applications
c) Moderate – Some standards defined for BI applications
d) High – Most standards defined for BI applications
e) Very high – All the standards defined for BI applications
4) To what degree are standardized objects (e.g.: KPIs, metrics, attributes, templates) and generic components
implemented in your BI applications?
a) Very low – Objects defined for every BI application
b) Low – Some reusable objects for similar BI applications
c) Moderate – Some standard objects and templates for similar BI applications
d) High – Most similar BI applications use standard objects and templates
e) Very high – All similar BI applications use standard objects and templates
5) Which BI applications delivery method best describes the highest level purpose of your DW?
a) Level 1 – Reports are delivered manually on paper or by email
b) Level 2 – Reports are delivered automatically by email
c) Level 3 – Direct tool-based interface
d) Level 4 – A BI portal with basic functions: subscriptions, discussion forums, alerting
e) Level 5 – Highly interactive, business process oriented, up-to-date portal (no differentiation between
operational and BI portals)
6) Which answer best describes the metadata accessibility to users?
a) Very low – No metadata available
b) Low – Some incomplete metadata documents that users ask for periodically
c) Moderate – Complete up-to-date metadata documents sent to users periodically or available on the intranet
d) High – Metadata is always available through a metadata management tool, different from the BI tool
e) Very high – Complete integration of metadata with the BI applications (metadata can be accessed through
one button push on the attributes, etc.)
3 DW Organization and Processes
3.1 Development Processes
1) Which answer best describes the DW development processes in your organization?
a) Level 1 – ad-hoc development processes; no clearly defined development phases (i.e.: planning,
requirements definition, design, construction, deployment, maintenance)
b) Level 2 – repeatable development processes based on experience with similar projects; some development
phases clearly separated
c) Level 3 – standard documented development processes; iterative and incremental development processes
with all the development phases clearly separated
d) Level 4 – development processes continuously measured against well-defined and consistent goals
e) Level 5 – continuous development process improvement by identifying weaknesses and strengthening the
process proactively, with the goal of preventing the occurrence of defects
2) To what degree is there a separation between the development/test/acceptance/deployment environments in
your organization?
a) Very low – no separation between environments
b) Low – two separate environments (i.e.: usually development and production) with manual transfer between
them
c) Moderate – some separation between environments (i.e.: at least three environments) with manual transfer
between them
d) High – some separation between environments (i.e.: at least two environments) with automatic transfer
between them
e) Very high – all the environments are distinct with automatic transfer between them
3) To what degree has your organization defined, documented and implemented standards for developing, testing
and deploying DW functionalities (i.e.: ETL and BI applications)?
a) Very low – no standards defined
b) Low – few standards defined
c) Moderate – some standards defined
d) High – most of the standards defined
e) Very high – a comprehensive set of standards defined
4) Which answer best describes the DW quality management?
a) Level 1 – no quality assurance activities
b) Level 2 – ad-hoc quality assurance activities
c) Level 3 – standardized and documented quality assurance activities done for all the development phases
d) Level 4 – level 3) + measurable and prioritized goals for managing the DW quality (e.g.: functionality,
reliability, maintainability, usability)
e) Level 5 – level 4) + causal analysis meetings to identify common defect causes and subsequent elimination
of these causes; service quality management certification
5) Which answer best describes the sponsor for your DW project?
a) Level 1 – no project sponsor
b) Level 2 – chief information officer (CIO) or an IT director
c) Level 3 – single sponsor from a business unit or department
d) Level 4 – multiple individual sponsors from multiple business units or departments
e) Level 5 – multiple levels of business-driven, cross-departmental sponsorship including top level
management sponsorship (BI/DW is integrated in the company process with continuous budget)
6) Which answer best describes your DW project management?
a) Level 1 – project planning and scheduling (i.e.: work breakdown structure, time, costs and resources
estimates, planning and scheduling): no; project risk management: no; project tracking and control (i.e.:
milestone tracking, change control): no; standard and efficient procedure and documentation, evaluation
and assessment: no
b) Level 2 – project planning and scheduling: yes; project risk management: no; project tracking and control:
no; standard and efficient procedure and documentation, evaluation and assessment: no
c) Level 3 – project planning and scheduling: yes; project risk management: no; project tracking and control:
yes; standard and efficient procedure and documentation, evaluation and assessment: no
d) Level 4 – project planning and scheduling: yes; project risk management: yes; project tracking and control:
yes; standard and efficient procedure and documentation, evaluation and assessment: no
e) Level 5 – project planning and scheduling: yes; project risk management: yes; project tracking and control:
yes; standard and efficient procedure and documentation, evaluation and assessment: yes
7) Which answer best describes the role division for the DW development process?
a) Level 1 – no formal roles defined
b) Level 2 – defined roles, but not technically implemented
c) Level 3 – formalized and implemented roles and responsibilities
d) Level 4 – level 3) + periodic peer reviews (i.e.: review of each other's work)
e) Level 5 – level 4) + periodic evaluation and assessment of roles (i.e.: assess the performance of the roles
and match the needed roles with responsibilities and tasks)
8) Which answer best describes the knowledge management in your organization for the DW development
processes?
a) Level 1 – ad-hoc knowledge gathering and sharing
b) Level 2 – organized knowledge sharing through written documentation and technology (e.g.: knowledge
databases, intranets, wikis, etc.), and also through training and mentoring programs
c) Level 3 – knowledge management is standardized; knowledge creation and sharing through brainstorming,
training and mentoring programs
d) Level 4 – central business unit knowledge management; quantitative knowledge management control and
periodic knowledge gap analysis
e) Level 5 – continuously improving inter-organizational knowledge management
9) Which answer best describes the requirements definition phase for your DW project?
a) Level 1 – ad-hoc requirements definition; no methodology used
b) Level 2 – methodologies differ from project to project; interviews with business users for collecting the
requirements
c) Level 3 – standard methodology for all the projects; interviews and group sessions with both business and
IT users for collecting the requirements
d) Level 4 – level 3) + qualitative assessment and measurement of the phase; requirements document also
published
e) Level 5 – level 4) + causal analysis meetings to identify common bottleneck causes and subsequent
elimination of these causes
10) Which answer best describes the testing and acceptance phase for your DW project?
a) Level 1 – unit testing by another person: yes; system integration testing: no; user training: no; acceptance
testing: no; standard procedure and documentation for testing and acceptance: no; external assessments and
reviews of testing and acceptance: no
b) Level 2 – unit testing by another person: yes; system integration testing: no; user training: yes; acceptance
testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments
and reviews of testing and acceptance: no
c) Level 3 – unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance
testing: yes; standard procedure and documentation for testing and acceptance: no; external assessments
and reviews of testing and acceptance: no
d) Level 4 – unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance
testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments
and reviews of testing and acceptance: no
e) Level 5 – unit testing by another person: yes; system integration testing: yes; user training: yes; acceptance
testing: yes; standard procedure and documentation for testing and acceptance: yes; external assessments
and reviews of testing and acceptance: yes.
3.2 Service Processes (Maintenance and Monitoring Processes)
1) Which answer best describes the DW service quality management in your organization?
a) Level 1 – no service quality management activities
b) Level 2 – ad-hoc service quality management
c) Level 3 – proactive service quality management including a standard procedure
d) Level 4 – level 3) + service quality measurements periodically compared to the established goals to
determine the deviations and their causes
e) Level 5 – level 4) + causal analysis meetings to identify common defect causes and subsequent
elimination of these causes; service quality management certification
2) Which answer best describes the knowledge management in your organization for the DW service processes?
a) Level 1 – ad-hoc knowledge gathering and sharing
b) Level 2 – organized knowledge sharing through written documentation and technology (e.g.: knowledge
databases, intranets, wikis, etc.), and also through training and mentoring programs
c) Level 3 – knowledge management is standardized; knowledge creation and sharing through brainstorming,
training and mentoring programs
d) Level 4 – central business unit knowledge management; quantitative knowledge management control and
periodic knowledge gap analysis
e) Level 5 – continuously improving inter-organizational knowledge management
3) Which answer best describes the DW service level management in your organization?
a) Level 1 – customer service needs documented in an ad-hoc manner; no service catalogue compiled
b) Level 2 – some customer service needs documented and formalized based on previous experience
c) Level 3 – all the customer service needs documented and formalized according to a standard procedure into
service level agreements (SLAs)
d) Level 4 – SLAs reviewed with the customer on both a periodic and event-driven basis
e) Level 5 – actual service delivery continuously monitored and evaluated with the customer on both a
periodic and event-driven basis for continuous improvement (SLAs including penalties)
4) Which answer best describes the DW incident management in your organization?
a) Level 1 – incident management is done ad-hoc with no specialized ticket handling system or service desk to
assess and classify them prior to referring them to a specialist
b) Level 2 – a ticket handling system is used for incident management; some policies and procedures for
incident management are established, but nothing is standardized
c) Level 3 – a service desk is the recognized point of contact for all the customer queries; incident
assessment and classification is done following a standard procedure
d) Level 4 – standard reports concerning the incident status including measurements and goals (e.g.: response
time) are regularly produced for all the involved teams and customers; an incident management database is
established as a repository for the event records
e) Level 5 – trend analysis in incident occurrence and also in customer satisfaction and value perception of the
services provided to them
5) Which answer best describes the DW change management in your organization?
a) Level 1 – change requests are made and solved in an ad-hoc manner
b) Level 2 – a change management system is used for storing and solving the requests for change; some
policies and procedures for change management established, but nothing is standardized
c) Level 3 – a standard procedure is used for approving, verifying, prioritizing and scheduling changes
d) Level 4 – standard reports concerning the change status including measurements and goals (e.g.: response
time) are regularly produced for all the involved teams and customers; standards established for
documenting changes
e) Level 5 – trend analysis and statistics regarding change occurrence, success rate, customer satisfaction and
value perception of the services provided to them
6) Which answer best describes the DW technical resource management in your organization?
a) Level 1 – ad-hoc resource management activities (only when there is a problem)
b) Level 2 – resource management is done following some procedures, but nothing is standardized or
documented
c) Level 3 – resource management is done constantly following a standardized documented procedure
d) Level 4 – standard reports concerning performance and resource management including measurements and
goals are done on a regular basis
e) Level 5 – resource management trend analysis and monitoring to make sure that there is sufficient capacity
to support planned services
7) Which answer best describes the availability management in your organization?
a) Level 1 – ad-hoc availability management
b) Level 2 – availability management is done following some procedures, but nothing is standardized or
documented
c) Level 3 – availability management documented and done using a standardized procedure (all elements are
monitored)
d) Level 4 – risk assessment to determine the critical elements and possible problems
e) Level 5 – availability management trend analysis and planning to make sure that all the elements are
available for the agreed service level targets
8) Which answer best describes the release management in your organization?
a) Level 1 – ad-hoc changes solving and implementation; no release naming and numbering conventions
b) Level 2 – release management is done following some procedures, but nothing is standardized or
documented; release naming and numbering conventions
c) Level 3 – release management is documented and done following a standardized procedure; assigned
release management roles and responsibilities
d) Level 4 – standard reports concerning release management including measurements and goals are done on a
regular basis; master copies of all software in a release secured in a release database
e) Level 5 – release management trend analysis, statistics and planning.
Appendix D: Expert Interview Protocol
Interviewee:
Organization:
Place:
Date:
Start Time:
End Time:
Interviewer Instructions
• Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check
if the recorder works correctly!
• Start with the following introduction and continue with the questions:
General information:
As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of
Business Informatics programme at Utrecht University. I am currently writing my thesis under the supervision of
dr. M.R. Spruit and dr. J.M. Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main
goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from
both a technical and an organizational and process point of view.
Research:
In today's economy, organizations have to gather and process a lot of information in order to make the best
decisions as fast as possible. One of the solutions that can improve the decision-making process is the usage of
Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in
order to turn data into information and information into knowledge that can optimize business actions. However,
even though organizations spend a lot of money on developing these solutions, more than 50 percent of the BI/DW
projects fail to deliver the promised results (Gartner Group, 2007). This was the trigger for my research, which aims
at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables
that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we
will be able to offer some guidelines on future DW improvements that will lead to a better organizational
performance.
Goal:
As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This
interview is part of my research and its main objective is to get some expert validation for the model I have
developed from theory and the case study done at Inergy. The interview will contain questions regarding the
following aspects:
• Your organization and role
• The Data Warehouse Capability Maturity Model.
Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end
of my research, you will have the chance to see the results and the final model. The interview will last for about two
hours. Before we start, are there any questions? OK, let's start! Start recorder!
Questions:
Organization and Role
1. Could you give a short introduction to your organization (including products, markets, customers)?
2. Could you explain your role in the organization (including your experience in BI)? On a scale from 1 to 5,
how would you judge your knowledge on BI (Business vs. Technical)?
The Data Warehouse Capability Maturity Model
1. In my model, I consider several benchmark variables/categories that have to be taken into consideration
and assessed when analyzing the maturity of an organization's DW. Which categories would you
recommend?
• Show and explain the DW Capability Maturity Model (with all its components).
2. Do you think the chosen categories are representative and if not, what changes would you make?
• Let's take a look at each category and the questions I chose in order to do the assessment.
3. Do you think the chosen questions are representative and if not, what changes would you make?
• Let's take a closer look at two categories you prefer.
4. Do you think the chosen answers are representative and if not, what changes would you make?
5. In my model, I consider each question to have five possible answers weighted from 1 to 5. Each answer is
also specific to one of the five possible maturity stages. In this way, after getting all the answers, we can
sum up all the weightings for each category and divide them by the number of questions per category (e.g.:
a score for architecture, one for data modelling, etc.). In the end, an overall score can be obtained by
summing up the scores for all the categories and dividing them by six (the number of categories). What is
your opinion on the scoring method? Should we add weightings for each category (e.g.: architecture – 0.2;
data modelling – 0.3; etc.)? What other changes would you make?
Final Questions:
1. What are the current trends in DW in your opinion?
2. What are the situational factors (if any) in your opinion that can influence the development of a DW and
hence, the applicability of the DW Capability Maturity Model?
• Thank you for your time and cooperation! Are there any additional comments or questions? Turn off
recorder!
Appendix E: Case Study Interview Protocol
Interviewee:
Organization:
Place:
Date:
Start Time:
End Time:
Interviewer Instructions
• Ask for recording permission, for processing purposes. Recordings will be deleted after processing. Check
if the recorder works correctly!
• Start with the following introduction and continue with the questions:
General information:
As briefly explained in our e-mail contact, my name is Catalina Sacu and I am following the two-year Master of
Business Informatics programme at Utrecht University. I am currently writing my thesis under the supervision of
dr. M.R. Spruit and dr. J.M. Versendaal, aiming to develop a Data Warehouse Capability Maturity Model. The main
goal of my thesis is to create a model that helps organizations assess their current data warehouse solution from
both a technical and an organizational and process point of view.
Research:
In today's economy, organizations have to gather and process a lot of information in order to make the best
decisions as fast as possible. One of the solutions that can improve the decision-making process is the usage of
Business Intelligence (BI)/Data Warehouse (DW) solutions. They combine tools, technologies and processes in
order to turn data into information and information into knowledge that can optimize business actions. However,
even though organizations spend a lot of money on developing these solutions, more than 50 percent of the BI/DW
projects fail to deliver the promised results (Gartner Group, 2007). This was the trigger for my research, which aims
at creating a DW Capability Maturity Model. In this way, we will be able to assess and score the different variables
that influence the quality of a DW and determine the current situation of an organization's DW solution. Then, we
will be able to offer some guidelines on future DW improvements that will lead to a better organizational
performance.
Goal:
As said before, the main goal of my research is to develop a Data Warehouse Capability Maturity Model. This
interview is part of my research and its main objective is to test the model in an organization to see if it works in
practice and get some feedback for future improvements of the model. The interview will contain questions
regarding the following aspects:
• Your organization and role
• The Data Warehouse Maturity Assessment.
Data collected during the interview will only be used for my thesis and will be processed anonymously. At the end
of my research, you will have the chance to see the results and the final model. The interview will last for about 1.5
hours. Before we start, are there any questions? OK, let's start! Start recorder!
Questions:
Organization and Role
1. Could you give a short introduction to your organization (including products, markets, customers)?
2. Could you explain your role in the organization (including your experience in BI/DW) and the BI/DW
project? On a scale from 1 to 5, how would you judge your knowledge on BI (Business vs. Technical)?
The Data Warehouse Maturity Assessment Questionnaire
• Show and explain the Data Warehouse Capability Maturity Model (with all its components).
• Please see the attached questionnaire and fill in the answers.
• Thank you for your time and cooperation! Are there any additional comments or questions? Turn off
recorder!
Appendix F: Case Study Feedback Template
1. Maturity Scores
• Short overview on the maturity assessment questionnaire.
• Tables with maturity scores and radar graph.
2. Feedback
• Strengths regarding the current DW solution
• Feedback regarding the DW technical solution
• Feedback regarding the DW organization & processes.
Appendix G: Paper
Paper will be submitted to the Journal of Database Management.
DWCMM: The Data Warehouse Capability Maturity Model
Catalina Sacu1, Marco Spruit1, Frank Habers2
1 Institute of Information and Computing Sciences, Utrecht University, 3508 TC, Utrecht, The Netherlands.
2 Inergy, 3447 GW Woerden, The Netherlands.
Abstract: Data Warehouses and Business Intelligence have been part of a very dynamic and
popular field of research in recent years, as they help organizations make better decisions and
increase their profitability. This paper aims at creating a Data Warehouse Capability Maturity
Model (DWCMM) focused on the technical and organizational aspects involved in developing a
data warehouse environment. This model and its associated maturity assessment questionnaire can
be used to help organizations assess their current DW solution and provide them with guidelines
for future improvements. The DWCMM was evaluated empirically through multiple expert
interviews and case studies to enrich and validate the theory we have developed.
Keywords: Data Warehousing, Business Intelligence, Maturity Modelling.
Introduction and Problem Definition
In today's economy, organizations are part of a very dynamic environment due to continuously changing
conditions and relationships. As Kaye (1996) notes, "organizations must collect, process, use, and
communicate information, both external and internal, in order to plan, operate and take decisions" (p. 20).
The ongoing quest for profits, increasing competition and demanding customers all require
organizations to make the best decisions as fast as possible (Vitt et al., 2002). One of the solutions that can
narrow down the period of time between acquiring the information and getting the right results to improve
the decision-making process is the implementation of Data Warehouses and Business Intelligence (BI)
applications.
Over the years, data warehouses (DWs) and BI solutions have become one of the fundamentals of the
information systems used to support decision-making initiatives. Most large companies have already
established DW systems as a component of their information systems landscape. According to Gartner
(2007), BI and DWs are at the forefront of the use of IT to support management decision-making. DWs can
be thought of as the large-scale data infrastructure for decision support. BI can be viewed as the data
analysis and presentation layer that sits between the DW and the executive decision-makers (Arnott &
Pervan, 2005). In this way, DW/BI solutions can transform raw data into information and then into
knowledge.
However, a DW is not only a software package. The adoption of DW technology requires massive capital
expenditure and a considerable amount of implementation time. DW projects are hence very expensive,
time-consuming and risky undertakings compared with other information technology initiatives, as cited by
prior researchers (Wixom & Watson, 2001; Hwang et al., 2004; Solomon, 2005). Moreover, it is often
believed that one-half to two-thirds of all initial DW efforts fail (Hayen et al., 2007). Gartner (2007)
estimates that more than fifty percent of DW projects have limited acceptance or fail. Therefore, it is
crucial to have a thorough understanding of the critical success factors and variables that determine the
efficient implementation of a DW solution.
These factors can refer to the development of the DW/BI solution or to the usage and adoption of BI. In
this research, we will focus on the former as we consider that it represents the foundation for a solid DW
solution that can have a high rate of usage and adoption. First, it is critical to properly design and
implement the databases that lie at the heart of the DW. The right architecture and design can ensure
performance today and scalability tomorrow. Second, all components of the DW solution (e.g.: data
repository, infrastructure, user interface) must be designed to work together in a flexible, easy-to-use way.
A third task is to develop a consistent data model and establish what and how source data will be
extracted. In addition to these factors, the DW needs to be created and developed quickly and efficiently
so that the organization can gain the business benefits as soon as possible (AbuAli & Abu-Addose, 2010).
As can be seen, a DW project can unquestionably be complex and challenging, and there is usually not a
single successful solution that can be applied to all organizations. Therefore, it is very important for
organizations to be aware of their current situation and know the steps they need to take for continuous
improvement. However, an objective assessment often proves to be a difficult task.
Maturity models can be helpful in this situation. They essentially describe the development of an entity
over time, where the entity can be anything of interest: a human being, an organizational function, an
organization, etc. (Klimko, 2001). Maturity models have a number of sequentially ordered levels, where
the bottom stage stands for an initial state than can be, for example, characterized by an organization
having little capabilities in the domain under consideration. In contrast, the highest stage represents a
conception of total maturity. Advancing on the evolution path between the two extremes involves a
continuous progression regarding the organization‘s capabilities or process performance. The maturity
model serves as an assessment of the position on the evolution path, as it offers a set of criteria and
characteristics that need to be fulfilled in order to reach a particular maturity level (Becker et al., 2009).
With the help of maturity modelling, we will gain some insight into the technical and organizational
variables that determine the successful development of a DW solution and analyze these variables.
Therefore, in order to make an assessment of the most important aspects that influence a DW project, this
paper develops a Data Warehouse Capability Maturity Model (DWCMM) which provides an answer to
the following research question:
How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?
Research Methodology
The main goal of this research is to develop a DWCMM that depicts the maturity stages of a DW project.
For this purpose, a design research approach is used as its main philosophy is to generate scientific
knowledge by building and validating a previously designed artifact (Hevner et al., 2004). In this
research, the artifact is the DWCMM, which is developed according to the five steps in developing design
research artifacts described by Vaishnavi and Kuechler (2008): problem awareness, suggestion,
development, evaluation and conclusion. Awareness of the problem was raised in discussions with DW/BI
practitioners and a literature study on data warehousing and maturity modelling. A detailed problem
description was provided in the previous section. Based on this, it has become clear that DW projects often
fail or do not bring the expected results and that organizations sometimes need guidelines for
improvement. As a solution to this problem, we developed the DWCMM, which can be used to assist
organizations in doing a maturity assessment of the DW technical aspects and in providing guidelines for
future improvements. First, an overview of the model and its main components will be presented. Then,
the results of the evaluation phase are presented. The DWCMM has been evaluated by carrying out five
expert interviews and multiple case studies within four organizations, following Yin's (2009) case study
approach. Finally, the last section contains conclusions regarding our model and an agenda for future
research.
DWCMM: The Data Warehouse Capability Maturity Model
In the literature, many maturity models have been developed (de Bruin et al., 2005), but only some of them
have managed to gain global acceptance. There are also several information technology and/or information
system maturity models dealing with different aspects of maturity: technological, organizational and
process maturity. Some of them are specific to the data warehousing/BI field. The most important
maturity models that served as a source of inspiration for our research can be seen in table 1.
Authors | Model | Focus
Nolan (1973) | Stages of Growth | IT Growth Inside an Organization
Software Engineering Institute (SEI) (1993) | Capability Maturity Model (CMM) | Software Development Processes
Watson, Ariyachandra & Matyska (2001) | Data Warehousing Stages of Growth | Data Warehousing
Chamoni & Gluchowski (2004) | Business Intelligence Maturity Model | Business Intelligence
The Data Warehousing Institute (TDWI) (2004) | Business Intelligence Maturity Model | Business Intelligence
Gartner – Hostmann (2007) | Business Intelligence and Performance Management Maturity Model | Business Intelligence and Performance Management

Table 1: Overview of Maturity Models.
Each of these models has a different way of assessing maturity, but there are some common elements
across all of them. All the models have interesting elements, but also weak points that could be improved.
Moreover, the models developed for the data warehousing/BI field cover several variables involved in
such a project, but they do not go deep into analyzing the technical aspects.
The maturity model which served as the main foundation for this research is the CMM (Paulk et al.,
1995). It has become a recognized standard for rating software development organizations. The CMM is a
framework that describes the key elements of an effective software process and presents an evolutionary
improvement path from an ad-hoc, immature process to a mature, disciplined one. Since its development,
CMM has become a universal model for assessing software process maturity. However, the CMM has
often been criticized for its complexity and difficulty of implementation. That is why we simplified it by
keeping the five maturity levels (i.e.: initial, repeatable, defined, managed and optimizing), the process
capabilities and the key process areas, which in our model would translate to the chosen benchmark
variables/categories for doing the DW maturity assessment.
Therefore, it can be seen that even though DW/BI solutions are often implemented in practice and many
maturity models have been created, none actually focuses on the technical aspects of the DW/BI
solution and the organizational processes that sustain them. Hence, this is the research gap we would like
to fill by developing a Data Warehouse Capability Maturity Model (DWCMM) that focuses on the
DW technical solution and DW organization and processes. The DWCMM is depicted in figure 1. A
short overview of the model and its components is provided in the next paragraphs.
When analyzing the maturity of a DW solution, we are actually taking a snapshot of an organization at the
current moment in time. Therefore, in order to do a valuable assessment, it is important to include in the
maturity analysis the most representative dimensions involved in the development of a DW solution.
Several authors describe that the main phases usually involved in a DW project lifecycle are (Kimball et
al., 2008; Moss & Atre, 2003; Ponniah, 2001): project planning and management, requirements
definition, design, development, testing and acceptance, deployment, growth and maintenance. All of
these phases and processes refer to the implementation and maintenance of the actual DW technical
solution which includes: the general architecture and infrastructure, data modelling, ETL, BI applications.
These categories can be analyzed from many points of view, which will be depicted in our model and the
maturity assessment we developed. Therefore, the DWCMM is restricted to assessing
the technical aspects, without taking into consideration the DW/BI usage and adoption or the DW/BI
business value. It will consider two main benchmark variables/categories for analysis, each of them
having several sub-categories. Firstly, the DW Technical Solution consists of the following four
components: General Architecture and Infrastructure, Data Modelling, Extract-Transform-Load (ETL)
and BI Applications. Secondly, the DW Organization & Processes dimension comprises the following
two aspects: Development Processes and Service Processes.
Figure 1: Data Warehouse Capability Maturity Model (DWCMM).
As can be seen from figure 1, the DWCMM does a maturity assessment which will provide a maturity
score for each benchmark sub-category. In order to create a complete image on the current DW solution
for an organization, the DWCMM has several components:
• A DW maturity assessment questionnaire:
The whole DW maturity assessment questionnaire has been published in (Sacu et al., 2010). Emphasis
should be put on two aspects regarding the DW maturity assessment questionnaire. Firstly, it does a
high-level assessment of an organization's DW solution and is limited strictly to the DW technical aspects.
Secondly, the model will assess “what” and “if” certain characteristics and processes are implemented
and not “how” they are implemented. The DW maturity assessment questionnaire has 60 questions
divided into the following three categories:
• DW General Questions (9 questions) – comprises several questions about the DW/BI solution
which are not scored. Their purpose is to offer a better image of the drivers for implementing the
DW environment, the budget allocated for data warehousing and BI, the DW business value,
end-user adoption, etc. This will be useful in creating a complete picture of the current DW
solution and its maturity. Also, once the questionnaire is filled in by more organizations, this
data will serve as input for statistical analysis and comparisons between organizations from the
same industry or across industries.
• DW Technical Solution (32 questions) – comprises several scored questions for each of the
following sub-categories:
   – General Architecture and Infrastructure (9 questions)
   – Data Modelling (9 questions)
   – ETL (7 questions)
   – BI Applications (7 questions). More details on this part will be given in the next sections.
• DW Organization & Processes (19 questions) – comprises several scored questions for each
of the following sub-categories:
   – Development Processes (11 questions)
   – Service Processes (8 questions). More details on this part will be given in the next sections.
Each question from the questionnaire will have five possible answers which are scored from 1 to 5, 1
being characteristic for the lowest maturity stage and 5 for the highest one. When an organization takes
the survey, it will first receive a maturity score for each sub-category by computing the average value of
the weightings (i.e.: sum of the weightings / number of questions); then, an overall score for each of the
two main categories will be given by computing the average value of the scores obtained for each
sub-category; and finally, an overall maturity score is computed following the same principle applied to
the scores of the two main categories.
We believe that the maturity scores for the sub-categories can give a good overview of the current DW
solution implemented by the organization. This is the reason why, after computing the maturity scores for
each sub-category, a radar graph like the one depicted in figure 1 is drawn to show the alignment
between these scores. In this way, the organization will have a clearer image of their current DW project
and will know which sub-category is the strongest and which one lags behind.
Moreover, after reviewing the maturity scores and the answers given by a specific organization, some
general feedback and advice for future improvements will be provided. Each organization that takes the
assessment will receive a document with a short explanation of the scoring method, a table with their
maturity scores and the radar graph, and some general feedback that consists of: a general
overview of the maturity scores; an analysis of the positive aspects already implemented in the DW
solution; and several steps that the organization should take in order to improve their current DW
solution.
• A condensed DW maturity matrix:
As our model measures the maturity of a DW solution, we also created two maturity matrices – a
condensed maturity matrix and a detailed one – each of them having five maturity stages as inspired by
the CMM: Initial (1); Repeatable (2); Defined (3); Managed (4); Optimized (5). The initial stage
describes an incipient DW development, while the optimized level shows a very mature solution,
obtained by an organization with a lot of experience in the field, where everything is standardized and
monitored. An organization will usually be situated on different stages of maturity for each sub-category,
which together determine the overall maturity level.
The condensed DW maturity matrix gives a short overview of the most important characteristics for each
sub-category for each maturity level. This will offer a better image on the main goal of the DWCMM and
on what the detailed maturity matrix entails. The condensed maturity matrix can be seen in figure 2.
Stages: Initial (1); Repeatable (2); Defined (3); Managed (4); Optimized (5).

DW Technical Solution

Architecture:
  Initial (1) – Desktop data marts
  Repeatable (2) – Independent data marts
  Defined (3) – Independent data warehouses
  Managed (4) – Central DW with/without data marts
  Optimized (5) – DW/BI service that federates a central DW and other sources via standard interface

Data Modelling:
  Initial (1) – No data models synchronization or standards
  Repeatable (2) – Manually synchronized data models
  Defined (3) – Manually or automatically synchronized data models
  Managed (4) – Automatic synchronization of most data models
  Optimized (5) – Enterprise-wide standards and automatic synchronization of all the data models

ETL:
  Initial (1) – Simple ETL with no standards that just extracts and loads data into the DW
  Repeatable (2) – Basic ETL with simple transformations
  Defined (3) – Advanced ETL (e.g. slowly changing dimensions manager, data quality system, reusability, etc.)
  Managed (4) – More advanced ETL (e.g. hierarchy manager, special dimensions manager, etc.)
  Optimized (5) – Optimized ETL for real-time DW with all the standards defined

BI Applications:
  Initial (1) – Static and parameter-driven reports
  Repeatable (2) – Ad-hoc reporting; OLAP
  Defined (3) – Dashboards & scorecards
  Managed (4) – Predictive analytics; data & text mining
  Optimized (5) – Closed-loop & real-time BI applications

DW Organization & Processes

Development Processes:
  Initial (1) – Ad-hoc, non-standardized development processes; no defined phases
  Repeatable (2) – Some development processes policies and procedures established, with some phases separated
  Defined (3) – Standardized development processes with all the phases separated and all the roles formalized
  Managed (4) – Quantitative development processes management
  Optimized (5) – Continuous development processes improvement

Service Processes:
  Initial (1) – Ad-hoc, non-standardized service processes
  Repeatable (2) – Some service processes policies and procedures established
  Defined (3) – Standardized service processes with all the roles formalized
  Managed (4) – Quantitative service processes management
  Optimized (5) – Continuous service processes improvement

Figure 2: DWCMM Condensed Maturity Matrix.
• A detailed DW maturity matrix:
We will give a short overview on the detailed DW maturity matrix in this paragraph. First, the
characteristics for each maturity stage are usually obtained by mapping the correspondent answers of each
question from the maturity assessment questionnaire (except for several characteristics such as: project
management, testing and acceptance, whose answers are formulated in a different way). In this way, an
organization will be able to see their maturity stage by category (e.g.: General Architecture and
Infrastructure) and by main category characteristics (e.g.: metadata, standards, infrastructure, etc.). The
matrix has two dimensions:
• columns – show each benchmark sub-category (i.e.: General Architecture and Infrastructure,
Data Modelling, ETL, BI Applications; Development Processes, Service Processes) with their
maturity stages from Initial (1) to Optimized (5);
• rows – show the main analyzed characteristics (e.g.: for General Architecture and Infrastructure
– conceptual architecture, business rules, metadata, security, data sources, performance,
infrastructure, update frequency) for each sub-category divided by maturity stage.
Moreover, the matrix can be interpreted in two ways. First, one could take each stage and see what the
specific characteristics of each sub-category are for that particular stage. Second, one could take each
sub-category and see what its specific characteristics are for each stage or for a particular stage.
As the developed questionnaire does an assessment for each benchmark sub-category, a specific
organization will most likely follow the second interpretation. They would probably like to know what
steps to take to improve each sub-category and hence, the overall maturity score, which will lead to a
higher maturity stage. It is also very unlikely that an organization will have all the characteristics for all
the sub-categories on the same maturity stage at the same moment in time. Therefore, if a company gets a
maturity score of 3, this does not mean that all the characteristics for all the sub-categories are on stage
three. Depending also on the standard deviation and the answers themselves, we can find out more
information about the actual situation.
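As a small numerical illustration of this point (our own addition, using hypothetical scores), the spread of
the sub-category scores can be summarized with the sample standard deviation: two organizations with
the same overall score of 3 can have very different maturity profiles.

    # Same mean, different spread: the standard deviation flags an uneven profile.
    from statistics import mean, stdev

    even_profile   = [3.0, 3.1, 2.9, 3.0, 3.0, 3.0]  # uniformly "Defined"
    uneven_profile = [4.5, 4.0, 3.5, 3.0, 1.5, 1.5]  # strong technique, weak processes

    for profile in (even_profile, uneven_profile):
        print(f"mean = {mean(profile):.2f}, stdev = {stdev(profile):.2f}")
    # Both profiles average to 3.0, but only the second one reveals
    # sub-categories that are lagging far behind.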
Now that the main components of the DWCMM have been identified, we will continue by taking a closer
look at the main categories and sub-categories of the model and their analyzed characteristics. These can
be depicted in the maturity assessment questionnaire and detailed maturity matrix. We will start with the
DW technical solution and continue with the DW organization and processes.
DW Technical Solution Maturity
As mentioned earlier, the main components that need to be analyzed when doing an assessment of the
DW technical solution are: general architecture and infrastructure, data modelling, ETL and BI
applications.
General Architecture and Infrastructure
DW architecture includes: three main components (i.e.: data modelling, ETL, BI applications), several
data storage components (e.g.: source systems, data staging area, DW database, operational data store,
data marts) and the way they are assembled together (Ponniah, 2001), and underlying elements such as
infrastructure, metadata and security that support the flow of data from the source systems to the end-users (Kimball et al., 2008; Chaudhuri & Dayal, 1997). This is connected to the conceptual approach of
designing and building the DW (e.g.: conformed data marts – Kimball or enterprise-wide DW – Inmon,
etc.). Therefore, in this research we consider architecture and infrastructure as a separate sub-category for
assessing maturity and for which the main characteristics will be further analyzed.
Conceptual architecture and its layers (question 1) – encompasses the conceptual approach of designing
and building the DW with all its data storage layers.
DW data sources (question 6) – the types of data sources that the DW extracts data from (e.g.: Excel files,
text files, relational databases, ERP & CRM systems, unstructured data: text documents, e-mails, images,
videos, Web data sources).
Infrastructure (question 8) – it provides the underlying foundation that enables the DW architecture to be
implemented (Ponniah, 2001), and it includes elements such as: hardware platforms and components,
operating systems, database platforms, connectivity and networking (Kimball et al., 2008).
Metadata management (question 4) – metadata can be seen as all the information that defines and
describes the structures, operations and contents of the DW system in order to support the administration
and effective exploitation of the DW. The main elements that influence its maturity are: the types of
implemented metadata (i.e.: business, technical or process) and the integration of metadata repositories
(Moss & Atre, 2003; Kimball et al., 2008).
Security management (question 5) – user access security is usually implemented through several methods,
presented here in hierarchical order of difficulty of implementation (Kimball et al., 2008; Moss & Atre,
2003; Ponniah, 2001): authentication, tool-based security, role-based security, authorization.
Business rules (questions 2 & 3) – they are abstractions of the policies and practices of a business
organization (Kaula, 2009), and are used to capture and implement precise business logic in processes,
procedures, and systems (manual or automated).
Performance optimization (question 7) – encompasses the various methods needed to improve DW
performance (Ponniah, 2001): software performance improvement (e.g.: index management, data
partitioning, parallel processing, view materialization); hardware performance improvement; and specialized
DW appliances or cloud computing, which are characteristic of a very high stage of maturity.
Update frequency (question 9) – it is one of the characteristics that differentiate classical DW solutions
built for strategic and tactical BI from the newer DWs that process data in real time.
Data Modelling
Data modelling is the process of creating a data model. A data model is "a set of concepts that can be used
to describe the structure of and operations on a database" (Navathe, 1992, pp. 112-113). Data modelling is
very important for creating a successful information system as it defines not only data elements, but also
their structures and relationships between them. The most important characteristics which should be taken
into consideration when assessing the maturity of data modelling are described below.
Synchronization between all the data models found in the DW (question 2) – establishing consistency
among data from a source to a target data storage and vice versa and the continuous harmonization of the
data over time.
Design levels (question 3) – encompasses all the data model design levels: conceptual design, logical
design and physical design.
Tool (question 1) – data models can be created by just drawing the models in different spreadsheets and
documents. However, the more mature solution is to use a data modelling tool that can make the design
itself and metadata management easier and more efficient.
Standards (questions 4 & 5) – standards in a DW environment are necessary and cover a wide range of
objects, processes, and procedures. All the maturity assessments related to standards will address general
aspects such as the definition and documentation of standards and their actual implementation. Most
often, standards related to data modelling refer to naming conventions for the objects and attributes in the
data models.
Metadata management (question 6) – encompasses the common subset of business and technical
metadata components as they apply to data (Moss & Atre, 2003): data names, definitions, relationships,
identifiers, types, lengths, policies, ownership, etc.
Dimensional modelling (questions 7, 8 & 9) – there are several data modelling techniques that can be
applied for data warehousing: relational (or normalized), dimensional, data vault, etc. In this research we
focused on dimensional modelling. For more information on dimensional modelling, see (Kimball, 1996).
Extract-Transform-Load (ETL)
As the name shows, the Extract-Transform-Load (ETL) process mainly involves the following activities:
extracting data from outside sources; transforming data to fit the target's requirements; loading data into
the target database. The ETL system is very complex and resource demanding (Kimball et al., 2008), and
hence, 60 to 80 percent of the time and effort of developing a DW project is devoted to the ETL system
(Nagabhushana, 2006). The main characteristics that we included in our ETL maturity assessment are
further described in this paragraph.
Complexity (question 2) – this refers to the maturity and performance of each ETL component (i.e.:
extract, transform, load). For example, the extraction phase should include a data profiling system, a
change data capture system and the extract system itself. The transformation step usually includes
cleaning and transforming data according to the business rules and standards that have been established
for the DW. The DW load system takes the load images created by the extraction and transformation
subsystems and loads these images directly into the DW.
Data quality system (question 3) – data quality is critical for the success of a DW. Therefore, we decided
to include a question that depicts its main characteristics for each maturity stage: daily automation,
specific data quality tools, identifying data quality issues and actually solving them.
Management and monitoring (question 4) – encompasses all the necessary capabilities for the ETL
processes to run consistently to completion and be available when needed (e.g.: an ETL job scheduler; a
backup system; a recovery and restart system – it can be manual or automatic; a workflow monitor, etc.)
Tool (question 1) – there is a constant debate about whether an organization should deploy custom-coded
ETL solutions or should buy an ETL tool suite (Kimball & Caserta, 2004). A company that uses hand-coded
ETL usually does not have a very complex ETL process, which indicates a low level of maturity regarding
ETL capabilities.
Metadata management (question 7) – ETL is responsible for the creation and use of much of the metadata
describing the DW environment. Therefore, it is important to capture and manage all possible types of
metadata for ETL: business, technical and process metadata.
Standards (questions 5 & 6) – includes ETL specific standards that are related to: naming conventions,
set-up standards, recovery and restart system, etc.
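As announced above, the following sketch illustrates the extract-transform-load flow together with a rudimentary data quality check (cf. questions 2 and 3). It is a hand-coded toy example with assumed source data and business rules, deliberately of the simple kind that the Tool characteristic associates with lower maturity, and not an excerpt from any assessed ETL system.

```python
# Hypothetical hand-coded ETL sketch: extract from a source list,
# apply one business rule and a basic data quality check, then load.

def extract(source_rows):
    """Extract: read raw records from an (here: in-memory) source."""
    return list(source_rows)

def transform(rows):
    """Transform: apply cleaning and a simple data quality rule."""
    clean, rejected = [], []
    for row in rows:
        if row.get("amount") is None or row["amount"] < 0:
            rejected.append(row)  # flag data quality issues for follow-up
        else:
            # Example business rule: derive a converted amount (assumed rate).
            row["amount_eur"] = round(row["amount"] * 0.88, 2)
            clean.append(row)
    return clean, rejected

def load(rows, target):
    """Load: append the transformed load images into the target store."""
    target.extend(rows)

source = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": None}]
warehouse = []
clean, rejected = transform(extract(source))
load(clean, warehouse)
print(len(warehouse), "rows loaded;", len(rejected), "rejected for data quality")
```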
BI Applications
BI applications, sometimes referred to as "front-end" tools (Chauduri & Dayal, 1997), are what the end-users see and hence are very important for a DW to be considered successful. According to March
& Hevner (2007), a crucial point for achieving DW implementation success is the selection and
implementation of appropriate end-user analysis tools, because business benefits of BI are only gained
when the system is adopted by its intended end-users. The main aspects that determine the maturity of BI
applications are analyzed further in this paragraph.
Types of BI applications (question 1) – encompasses the main types of BI applications, whose complexity
contributes to the maturity of a DW environment. According to Azvine et al. (2006), traditional BI applications fall
into the following categories sorted by ascending complexity: report what has happened – standard
reporting and query applications; analyze and understand why it has happened – ad-hoc reporting and
online analytical processing (OLAP); visualization applications (i.e.: dashboards, scorecards); predict
what will happen – predictive analytics (i.e.: data and text mining). In the last couple of years, due to the
development of real-time data warehousing, a new category of BI applications has developed called
operational BI and closed-loop applications (Kimball et al., 2008).
Delivery method (question 6) – it includes the main BI application delivery methods. As end users are
interested only in the results they get from the BI applications, the ease of accessing and delivering
these results is critical for the success of the DW solution.
Tool (question 2) – defines the usage of BI application tools, which can make a real difference for the
DW solution.
Metadata management (question 7) – encompasses the main metadata accessibility methods. As BI
applications are what the end user sees, this is an important aspect for DW success (Moss & Atre, 2003).
Standards (questions 3 & 4) – it includes standards specific to BI Applications such as: naming
conventions, generic transformations, logical structure of attributes and measures, etc.
DW Organization and Processes Maturity
When assessing the maturity of a DW technical solution, the processes and roles involved in the project
also need to be analyzed. A good technical solution cannot be developed without the processes
surrounding it as there is a strong interconnection between the two parts. The necessary processes for a
DW project are: development processes and service processes.
DW Development Processes
A DW solution can be considered a software engineering project with some specific characteristics.
Therefore, like any software engineering project, it will go through several development stages (Moss &
Atre, 2003). Since DW/BI is an enterprise-wide evolving environment that is continually improved and
enhanced based on feedback from the business community, the best approach for its development is
iterative and incremental development, with agile techniques for the development of BI applications
(Kimball et al., 2008; Ponniah, 2001). The high level phases and tasks required for an effective DW
implementation are (Kimball et al., 2008; Moss & Atre, 2003): project planning and management;
requirements definition; design; development; testing and acceptance; deployment/production. The main
characteristics which might influence the maturity of DW development processes can be seen below.
CMM levels (question 1) – as it is hard to judge which software development paradigm is better and more
mature, the first maturity question on development processes is a more general one and it refers to how
the DW development processes map to the CMM levels.
Project planning and management (question 7) – encompasses the main elements that determine the
maturity of this characteristic (Lewis, 2001): project planning and scheduling; project risk management;
project tracking and control; standard procedure and documentation; and evaluation and assessment.
DW/BI sponsor (question 6) – defines the extent of organizational support and sponsorship for the DW
environment. Strong support and sponsorship from senior business management is critical for a successful
DW initiative (Ponniah, 2001).
DW project team and roles (question 8) – encompasses how DW project roles and responsibilities are
formalized and implemented to solve skill-role mismatches (Humphries et al., 1999; Nagabhushana,
2006).
Requirements definition (question 10) – encompasses how requirements definition is done. In a DW,
users' business requirements represent the most powerful driving force (Ponniah, 2001), as they impact
virtually every aspect of the project.
Testing and acceptance (question 11) – this is a critical phase for DW success as it includes several
important activities which are not always implemented. The degree of implementation influences the
success of a DW project and hence, its maturity.
Development/testing/acceptance/production environments (question 2) – encompasses the way
organizations set up different environments for different purposes to support all the development phases
(Moss & Atre, 2003).
DW quality management (question 5) – its purpose is to provide management with appropriate visibility
into the development process being used by the DW project and the products being built (Paulk et al.,
1995).
Knowledge management (question 9) – encompasses all the knowledge management activities and the
way they are implemented.
Standards (questions 3 & 4) – analyzes the standards used for successfully developing, testing and
deploying DW functionalities.
DW Service Processes
In the last two decades, software maintenance began to be treated as a sequence of activities and not as
the final stage of a software development project (April et al., 2004). These processes are very important
after a DW has been deployed in order to keep the system up and running and to manage all the necessary
changes. Lately, IT organizations have made a transition from being pure technology providers to being
service providers. This service-oriented perspective on IT organizations applies best to the software
maintenance field, as maintenance is an ongoing activity, in contrast to software development, which is
more project-based (Niessink & van Vliet, 2000). Over the years, various IT service frameworks have been
proposed, but one that acts as the de-facto standard for the definition of best practices and processes for
service support and service delivery is the Information Technology Infrastructure Library (ITIL) (Salle,
2004). Therefore, we will consider the service components from ITIL as a starting point for our analysis
of the DW service processes part. Moreover, two maturity models related to IT maintenance and service
also served as a foundation for this part of our DW maturity model: the Software Maintenance Maturity
Model (April et al., 2004) and the IT Service CMM (Niessink et al., 2002). Taking into consideration
these models and the changing nature of a DW, we considered the following components when assessing
the maturity of DW service processes.
Service quality management (question 2) – this is similar to the DW quality management, but applied to
the service processes.
Knowledge management (question 3) – this is similar to knowledge management in the DW development
processes, but in the context of service processes.
Service level management (question 4) – it negotiates service level agreements (SLAs) with the suppliers
and customers and ensures that they are met by continual monitoring and reviewing (Cater-Steel, 2006).
Incident management (question 5) – its main objective is to provide continuity by restoring the service in
the quickest way possible by whatever means necessary (Salle, 2004).
Change management (question 6) – it is described as a regular task for immediate and efficient handling
of changes that might occur in a DW environment.
Technical resource management (question 7) – the purpose of resource management is to maintain
control of the necessary hardware and software resources needed to deliver the agreed DW services level
targets (Niessink & van Vliet, 1999).
Availability management (question 8) – manages risks and ensures that all DW infrastructure, processes,
tools and roles conform to the SLAs by using appropriate means and techniques (Colin, 2004).
Release management (question 9) – as a DW is continuously changing and evolving over time, the
objective of release management is to ensure that only authorized and correct versions of the DW are made
available for operation (Salle, 2004).
Evaluation of the DWCMM
In order to validate the DWCMM, two methods were chosen – expert validation and multiple case studies
– on which we will elaborate in this section.
Expert Validation
To evaluate the utility of and further revise the DWCMM, expert validation was applied. An "expert" is
defined by Hoffman et al. (1995) as a person "highly regarded by peers, whose judgements are
uncommonly accurate and reliable and who can deal effectively with rare or tough cases. Also, an expert
is one who has special skills or knowledge derived from extensive experience with subdomains" (p. 132).
Eliciting knowledge from experts is therefore very important and useful, and can be done using several
methods, among them structured and unstructured interviews (Hoffman et al., 1995).
Accordingly, five experts in data warehousing and BI were interviewed and asked to give their opinions
about the content of the model we developed. The interviews were structured, but consisted of open
questions in order to capture the knowledge of the respondents. This enabled the experts to state their
opinions and ideas for improvement freely. The expert panel consists of five
experts from practice, each of them having at least 10 years of experience in the DW/BI field. An
overview of the experts and their affiliations is depicted in table 2. All of them are DW/BI consultants at
different organizations in The Netherlands (local or multinational).
Respondent ID | Job Position                       | Industry         | Market | Employees
1             | CI/BI consultant                   | DW/BI Consulting | B2B    | ≈ 45
2             | Principal consultant/BI consultant | IT Services      | B2B    | ≈ 49000
3             | Thought leader BI/CRM              | BI Consulting    | B2B    | ≈ 38000
4             | Principal consultant BI            | IT Services      | B2B    | ≈ 1
5             | BI consultant                      | DW Consulting    | B2B    | ≈ 35
Table 2: Expert Overview.
The experts were asked to give their opinions regarding the DWCMM structure, the DWCMM condensed
maturity matrix and the DW maturity assessment questionnaire. All reviewers gave positive feedback on
their first impression of all three deliverables, said they made sense, and agreed that the model could be
applied for assessing an organization's current DW solution. Valuable insights and criticism were provided
that resulted in several (mostly minor) improvements. For instance, the category "Architecture" was renamed
"General Architecture and Infrastructure" as the former created some confusion among the interviewees.
Some adjustments were also made to the ETL characterization for each stage of the DWCMM condensed
maturity matrix. However, most feedback concerned the maturity assessment questionnaire. This resulted
in two categories of changes: proposed changes that, due to time constraints and scope limitations, were
not implemented in the final version of the model but should be considered for future research; and
implemented suggestions that involved rephrasing some questions and rephrasing or changing some
answers.
Multiple Case Studies
Depending on the nature of a research topic and the goal of a researcher, different research methods
(qualitative and quantitative) are appropriate to be used (Benbasat et al., 1987; Yin, 2009). One of the
most widely used qualitative research methods in information systems (IS) research is case study
research. It can be used to achieve various research aims: provide descriptions of phenomena, develop
theory, and test theory (Darke et al., 1998). In our research, we use it to test theory, which in this case
is the DWCMM we developed. The theory is usually either validated or found to be inadequate in some
way, and may then be further refined on the basis of the case study findings. Case study research may
adopt single or multiple case designs.
As multiple case studies are preferred over single ones to obtain better results and analytic conclusions
(Benbasat et al., 1987; Yin, 2009), we decided to conduct a multiple case study research following Yin's
(2009) case study approach. In this way, we can achieve a twofold goal: test the model in practice to see
whether the chosen benchmark variables/categories and the maturity assessment questions and answers
match the organizations' specific solutions; and receive feedback and knowledge from respondents
regarding the DWCMM in order to make future improvements. Although all individual cases are
interesting, this section focuses on the overall results.
Case Overview
The case studies have been conducted at four organizations of different sizes, operating in several types of
industries and offering a wide variety of products and services. An overview of the case study
organizations (figures are taken from 2009 annual reports) and respondents is depicted in table 3. The
main criterion used in the search for suitable organizations was that all approached organizations had a
professional DW/BI system in place whose maturity could be assessed by applying the DWCMM.
Furthermore, an important criterion for the selection of respondents per case was that the interviewed
respondents had an overall view of the technical and organizational aspects of the DW/BI solution
implemented in their organization. A short analysis of the maturity scores each organization obtained
after taking the assessment is also given further in this section.
Organization | Industry                | Market    | Revenue         | Employees | Respondent Function
A            | Retail                  | B2C       | 19.94 billion € | ≈ 138000  | BI consultant
B            | Insurance               | B2B & B2C | 4.87 billion €  | ≈ 4500    | DW/BI technical architect
C            | Retail                  | B2C       | 780 million €   | ≈ 3660    | BI manager
D            | Maintenance & Servicing | B2B       | NA              | ≈ 3500    | BI consultant & DW lead architect
Table 3: Case and Respondent Overview.
Case Study Analysis
In this section, a short analysis of the results obtained by all the organizations after filling in the
assessment questionnaire is given. The maturity scores for the implemented DW solutions can be seen in
the table below.
Benchmark Category    | Organization A | Organization B | Organization C | Organization D
Architecture          | 2.67           | 2.56           | 3.89           | 3.55
Data Modelling        | 2.17           | 3.44           | 3.00           | 4.11
ETL                   | 3.14           | 3.29           | 3.71           | 2.86
BI Applications       | 2.71           | 2.71           | 3.43           | 3.57
Development Processes | 2.90           | 3.19           | 3.66           | 3.02
Service Processes     | 2.63           | 3.00           | 2.87           | 3.12
Table 4: Organizations' Maturity Scores.
As shown in the picture depicting our model, a better way to see the alignment between the maturity
scores for the six categories is to draw a radar graph. We show here the radar graph for organization A
as an example.
[Radar graph on a 0-5 scale with one axis per benchmark category (Architecture, Data Modelling, ETL, BI Applications, Development Processes, Service Processes), plotting Organization A's scores against the Ideal Situation.]
Figure 3: Alignment Between Organization A's Maturity Scores.
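A radar graph of this kind can be reproduced with standard plotting libraries. The sketch below, assuming matplotlib is available and using organization A's category scores from table 4, plots the organization against the ideal score of 5 on every axis.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ["Architecture", "Data Modelling", "ETL",
              "BI Applications", "Development Processes", "Service Processes"]
org_a = [2.67, 2.17, 3.14, 2.71, 2.90, 2.63]   # scores from table 4
ideal = [5] * len(categories)

# One angle per category; repeat the first point to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

ax = plt.subplot(polar=True)
for scores, label in [(org_a, "Organization A"), (ideal, "Ideal Situation")]:
    values = scores + scores[:1]
    ax.plot(angles, values, label=label)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories, fontsize=8)
ax.set_ylim(0, 5)
ax.legend(loc="lower right")
plt.show()
```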
Some more information regarding the maturity scores for all four case studies is provided in the table
below.
Maturity Score                                       | A                     | B                     | C                        | D
Total Maturity Score for DW Technical Solution       | 2.67                  | 3.00                  | 3.51                     | 3.52
Total Maturity Score for DW Organization & Processes | 2.77                  | 3.10                  | 3.26                     | 3.07
Overall Maturity Score                               | 2.72                  | 3.05                  | 3.38                     | 3.29
Highest Score                                        | ETL - 3.14            | Data Modelling - 3.44 | Architecture - 3.89      | Data Modelling - 4.11
Lowest Score                                         | Data Modelling - 2.17 | Architecture - 2.56   | Service Processes - 2.87 | ETL - 2.86
Table 5: Maturity Scores Analysis.
As can be seen from table 4, the maturity scores for each sub-category usually lie between 2 and 4, with
one exception: organization D scored 4.11 for Data Modelling. The overall maturity scores and the total
scores per category also ranged between 2 and 4, which shows that most organizations are probably
somewhere between the second and fourth stage of maturity. The highest overall maturity score was
obtained by organization C, and the lowest by organization A. Apparently, an overall score close to 4 or 5
is quite difficult to achieve. This is normal in maturity assessments, as in practice nobody is that close to
the ideal situation. It will be interesting to see the range of scores once the questionnaire has been filled
in by a large number of organizations.
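The aggregation behind tables 4 and 5 appears to be simple averaging: the technical total averages the four technical categories, the organization and processes total averages the two process categories, and the overall score averages these two totals. The sketch below reconstructs organization A's totals under that assumption; the averaging scheme is inferred from the published figures rather than quoted from the scoring method itself.

```python
# Reconstructing organization A's totals from its category scores in table 4.
# The simple-average aggregation scheme is inferred from the published totals.
scores = {
    "Architecture": 2.67, "Data Modelling": 2.17, "ETL": 3.14,
    "BI Applications": 2.71,
    "Development Processes": 2.90, "Service Processes": 2.63,
}
technical = ["Architecture", "Data Modelling", "ETL", "BI Applications"]
processes = ["Development Processes", "Service Processes"]

tech_total = sum(scores[c] for c in technical) / len(technical)
proc_total = sum(scores[c] for c in processes) / len(processes)
overall = (tech_total + proc_total) / 2

print(f"technical={tech_total:.4f} processes={proc_total:.4f} overall={overall:.4f}")
# Compare with table 5 for organization A: 2.67, 2.77 and 2.72 after rounding.
```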
From table 5 it can be seen that the categories with the highest and lowest scores differ per organization.
For example, organization A scored lowest for Data Modelling, whereas Data Modelling was the most
mature variable for organization D. Interesting conclusions can also be drawn when comparing the scores
of organizations A and C, as they are part of the same industry. The former is an international food
retailer with more experience in this industry, whereas the latter is a local one with less experience.
Nevertheless, organization A obtained a rather low DW maturity score. Thus, experience in the industry
does not necessarily mean maturity in data warehousing. Of course, more factors can influence this
difference in scores: size, the way data warehousing/BI is embedded in the organizational culture, the
percentage of the IT budget allocated to BI, etc.
However, the goal of our model is not only to give a maturity score to a specific organization, but also to
provide it with feedback and the necessary steps for reaching a higher maturity stage. For example, the
overall maturity score for organization A is 2.72, which leaves a lot of room for improvement. Moreover,
as its lowest score is for Data Modelling, this category would be a good starting point. For confidentiality
reasons, more details regarding the maturity scores and feedback cannot be offered here.
Benchmarking
As already mentioned in the previous sections, the DWCMM can serve as a benchmarking tool for
organizations. The DW maturity assessment questionnaire provides a quick way for organizations to
assess their DW maturity and, at the same time, compare themselves in an objective way against others in
the same industry or across industries. Of course, better benchmarking results will be achieved once more
organizations have taken the maturity assessment. However, in order to give a better picture of what
such a benchmarking graph looks like, we provide here an example for organization A using the data
from the case studies we performed. The bar chart is depicted below.
[Horizontal bar chart on a 0-5 scale showing, for each benchmark category from Architecture to Service Processes, Organization A's score next to the Average Score and Best Practice benchmark series.]
Figure 4: Benchmarking for Organization A.
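A chart of this kind can be sketched as follows, reusing the per-category scores of all four case organizations from table 4. Note two assumptions: "Average Score" is read here as the per-category mean over the four organizations and "Best Practice" as the per-category maximum observed score; the text does not spell out how these benchmark series were computed.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ["Architecture", "Data Modelling", "ETL",
              "BI Applications", "Development Processes", "Service Processes"]
# Per-category scores for organizations A-D, taken from table 4.
all_scores = np.array([
    [2.67, 2.17, 3.14, 2.71, 2.90, 2.63],   # A
    [2.56, 3.44, 3.29, 2.71, 3.19, 3.00],   # B
    [3.89, 3.00, 3.71, 3.43, 3.66, 2.87],   # C
    [3.55, 4.11, 2.86, 3.57, 3.02, 3.12],   # D
])
org_a = all_scores[0]
average = all_scores.mean(axis=0)   # assumed reading of "Average Score"
best = all_scores.max(axis=0)       # assumed reading of "Best Practice"

y = np.arange(len(categories))
h = 0.25                            # height of each horizontal bar
plt.barh(y + h, org_a, height=h, label="Organization A")
plt.barh(y, best, height=h, label="Best Practice")
plt.barh(y - h, average, height=h, label="Average Score")
plt.yticks(y, categories)
plt.xlim(0, 5)
plt.legend()
plt.show()
```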
To sum up, the DW maturity assessment questionnaire can be successfully applied in practice. We
generally received positive feedback regarding the questionnaire from the case study interviewees. In this
way, we could test whether the questions and their answers are representative for assessing the current
DW solution of a specific organization and whether they can be mapped to any organization depending on
the situational factors. Respondents usually had no problems recognizing the proposed benchmark
categories and understanding the questions and answers from the survey. We also had the chance to apply
the scoring method and give appropriate feedback for each case study. Finally, we combined all the
feedback received from the case studies and made some minor but valuable improvements to several
questions and answers, making them more representative of the analyzed characteristics and a better fit
for the maturity stages.
Conclusions and Further Research
This research has been triggered by the estimates made by Gartner (2007) and other researchers that
more than fifty percent of DW projects have limited acceptance or fail. Therefore, we developed a Data
Warehouse Capability Maturity Model (DWCMM) that would help organizations assess the technical
aspects of their current DW solution and provide guidelines for future improvements. In this way we
attempted to answer the main research question for our study:
How can the maturity of a company’s data warehouse technical aspects be assessed and acted upon?
The main conclusion from our study is that, even though our maturity model can help organizations improve
their DW solutions, there is no "silver bullet" for the successful development of DW/BI solutions. The
DWCMM provides a quick way for organizations to assess their DW/BI maturity and compare
themselves in an objective way against others in the same industry or across industries. It received
positive feedback from the five experts who reviewed and validated it, and it also resonated well with the
respondents from our four case studies. Several (mostly minor) improvements were made after the
validation process.
However, our model is not without limitations. First of all, it is critical to emphasize that the model
performs only a high-level assessment. In order to truly assess the maturity of their DW/BI solutions
and discover the strong and weak variables, organizations should use our assessment as a starting point
for a more thorough analysis. In the future, several questions could be added to our model for a more
detailed analysis of the current DW/BI environment and for more valuable feedback to organizations.
Second, a limitation of this study is that it is based on design science research, which answers research
questions in the form of design artifacts. As this is a qualitative research method, a risk to objectivity
might arise. Another limitation is related to the validation process for our model. Due to time constraints
and the difficulty of finding suitable experts, the model was reviewed by only five of them. Therefore,
more experts should be interviewed in the future to enrich the structure and content of the model. Also,
because the model was tested in only four cases, it is not possible to generalize the findings to any given
similar situation. For further research, it would be interesting to validate the model using quantitative
research methods. In this way, we would be able to perform statistical analyses on the data, do more
valuable benchmarking and improve the whole structure of the model. Another future extension that
would increase the value of the model could include questions and analyses for other types of data
modelling (e.g. normalized modelling, data vault, etc.) because, as stated earlier in this paper, we limited
our maturity assessment to dimensional modelling. Last, but not least, more work is also needed to
extend our model to the analysis of DW/BI end-user adoption and business value. New benchmark
categories and maturity assessment questions could be added regarding these two aspects.
References
AbuAli, A., & Abu-Addose, H. (2010). Data Warehouse Critical Success Factors. European Journal of Scientific
Research, 42(2), 326-335.
Aldrich, H., & Mindlin, S. (1978). Uncertainty and Dependence: Two Perspectives on Environment. In L. Karpik,
Organization and Environment: Theories, Issues and Reality (pp. 149-170). London: Sage Publications Inc.
Arnott, D., & Pervan, G. (2005). A Critical Analysis of Decision Support Systems Research. Journal of Information
Technology, 20(2), 67-87.
Blumberg, R., & Atre, S. (2003). The Problem with Unstructured Data. Retrieved July 23, 2010, from Information
Management: http://www.information-management.com/issues/20030201/6287-1.html
Cater-Steel, A. (2006). Transforming IT Service Management - the ITIL Impact. Proceedings of the 17th
Australasian Conference on Information Systems. Adelaide, Australia.
Cavaye, A. (1996). Case Study Research: A Multifaceted Research Approach for Information Systems. Information
Systems Journal, 6, 227-242.
Chamoni, P., & Gluchowski, P. (2004). Integrationstrends bei Business-Intelligence-Systemen, Empirische
Untersuchung auf Basis des Business Intelligence Maturity Model. Wirtschaftsinformatik, 46(2), 119-128.
Chauduri, S., & Dayal, U. (1997). An Overview of Data Warehousing and OLAP Technology. ACM Sigmod
Record, 26(1), 65-74.
Choo, C. (1995). Information Management for the Intelligent Organization. Medford, NJ: Information Today, Inc.
Colin, R. (2004). An Introductory Overview of ITIL. Reading, United Kingdom: itSMF Publications.
de Bruin, T., Freeze, R., Kulkarni, U., & Rosemann, M. (2005). Understanding the Main Phases of Developing a
Maturity Assessment Model. Proceedings of the 16th Australasian Conference on Information Systems. Sydney,
Australia.
Eckerson, W. (2004). Gauge Your Data Warehousing Maturity. Retrieved July 3, 2010, from The Data Warehousing
Institute: http://tdwi.org/Articles/2004/10/19/Gauge-Your-Data-Warehousing-Maturity.aspx?Page=2
Feinberg, D., & Beyer, M. (2010). Magic Quadrant for Data Warehouse Database Management Systems. Retrieved
July 21, 2010, from Business Intelligence: http://www.businessintelligence.info/docs/estudios/Gartner-MagicQuadrant-for-Datawarehouse-Systems-2010.pdf
Gartner. (2007, February 1). Creating Enterprise Leverage: The 2007 CIO Agenda. Retrieved June 24, 2010, from
Gartner: http://www.gartner.com/DisplayDocument?id=500835
Gray, P., & Negash, S. (2003). Business Intelligence. Proceedings of the 9th Americas Conference on Information
Systems (pp. 3190-3199). Tampa, Florida, USA.
Hakes, C. (1996). The Corporate Self Assessment Handbook, 3rd edition. London: Chapman & Hall.
Hevner, A., March, S., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. Management
Information Systems Quarterly, 28(1), 75-106.
Hoffman, R., Shadbolt, N., Burton, A., & Klein, G. (1995). Eliciting Knowledge from Experts: A Methodological
Analysis. Organizational Behaviour and Human Decision Processes, 62(2), 129-158.
Hwang, H., Ku, C., Yen, D., & Cheng, C. (2005). Critical Factors Influencing the Adoption of Data Warehouse
Technology: A Study of the Banking Industry in Taiwan. Decision Support Systems, 37, 1-21.
Inmon, W. (1992). Building the Data Warehouse. Indianapolis: John Wiley and Sons, Inc.
Kaula, R. (2009). Business Rules for Data Warehouse. International Journal of Information Technology, 5, 58-66.
Kaye, D. (1996). An Information Model of Organization. Managing Information, 3(6), 19-21.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The Data Warehouse Lifecycle Toolkit,
2nd Edition. Indianapolis: Wiley Publishing, Inc.
Klimko, G. (2001). Knowledge Management and Maturity Models: Building Common Understanding. Proceedings
of the 2nd European Conference on Knowledge Management (pp. 269-278). Bled, Slovenia.
Lewis, J. (2001). Project Planning, Scheduling and Control, 3rd Edition. New York: McGraw-Hill.
Madden, S. (2006). Rethinking Database Appliances. Retrieved July 21, 2010, from Information Management:
http://www.information-management.com/specialreports/20061024/1066827-1.html?pg=1
March, S., & Hevner, A. (2007). Integrated Decision Support Systems: A Data Warehousing Perspective. Decision
Support Systems, 43(3), 1031-1043.
Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support
Applications. Boston: Addison Wesley.
Nagabhushana, S. (2006). Data Warehousing: OLAP and Data Mining. New Delhi: New Age International Limited.
Navathe, S. B. (1992). Evolution of Data Modelling for Databases. Communications of the ACM, 35(9), 112-123.
Nolan, R. (1973). Managing the Computer Resource: A Stage Hypothesis. Communications of the ACM, 16(7),
399-405.
Ponniah, P. (2001). Data Warehousing Fundamentals. New York: John Wiley & Sons, Inc.
Salle, M. (2004). IT Service Management and IT Governance: Review, Comparative Analysis and their Impact on
Utility Computing. Retrieved July 16, 2010, from HP Technical Reports:
http://www.hpl.hp.com/techreports/2004/HPL-2004-98.pdf
Sacu, C., Spruit, M., & Habers, F. (2010). Data Warehouse (DW) Maturity Assessment Questionnaire. Utrecht:
Utrecht University.
Sen, A., & Sinha, A. (2005). A Comparison of Data Warehousing Methodologies. Communications of the ACM,
48(3), 79-84.
Vaishnavi, V., & Kuechler, W. (2008). Design Science Research Methods and Patterns: Innovating Information and
Communication Technology. Boca Raton, Florida: Auerbach Publications Taylor & Francis Group.
Yin, R. (2009). Case Study Research: Design and Methods. Thousand Oaks, California: SAGE Inc.