Copyright © 2014 Mosaic Data Science. All rights reserved.

Transcription

Copyright © 2014 Mosaic Data Science. All rights reserved.
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
Why Enterprise Data Warehouse
Projects Fail, and What to do About it
Introduction
Mosaic Data Science, January 2014
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
warehouse (EDW) projects are unusually risky. One paper on the subject begins, “D
Introduction
warehouse projects are notoriously difficult to manage, and many of them end in
failure.”1 A book on EDW project management reports that “the most experienced
Everyone knows data warehouses
are risky.
It is an with
IT truism
enterprise
databecause “estimating on
project managers
[struggle]
EDWthat
projects,”
in part
warehouse (EDW) projectswarehouse
are unusually
risky.isOne
on the
subject
“Data
projects
verypaper
difficult
[since]
eachbegins,
data warehouse
project can be so
2
warehouse projects are notoriously
difficult
manage,
and EDW
many of
in
Manytoarticles
about
riskthem
cite end
the laundry
lists of EDW failure and r
different.”
project
reports
failure.”1 A book on EDWtypes
in management
Chapter 4 “Risks”
ofthat
the “the
samemost
book,experienced
arguing that EDW project riskiness deri
project managers [struggle]from
withthe
EDW
projects,”
in
part
because
“estimating
multitude of risk factors inherent in EDWon
projects. Likewise, experts such a
warehouse projects is very Ralph
difficult
[since]offer
each numerous
data warehouse
project
be so EDW project risk.3
Kimball
guidelines
forcan
decreasing
different.”2 Many articles about EDW risk cite the laundry lists of EDW failure and risk
types in Chapter 4 “Risks” One
of therisk
same
book,stands
arguing
thatWe
EDW
project
riskiness
derives
factor
out.
agree
that EDW
projects
are complex, and have ma
from the multitude of risk factors
inherent
EDW
projects.
experts such
as are more difficult to
risk factors.
Weinalso
agree
that inLikewise,
general, complex
projects
3
Ralph Kimball offer numerous
guidelines
forsaid
decreasing
EDWdecades
project of
risk.
manage.
Having
that, several
experience with EDW projects lead us
focus on one particular pattern of EDW project failure that we believe is largely
One risk factor stands out.
We agreefor
thatEDW
EDWprojects’
projectsreputation
are complex,
responsible
for and
highhave
risk.many
This white paper describes th
risk factors. We also agreefailure
that in pattern,
general,explains
complexthe
projects
are more
difficult
to
risk factor
behind
the pattern,
and suggests how to avoid
manage. Having said that, risk
several
decades
of experience
with EDW
projectsthroughout
lead us to the project lifespan.
while
delivering
value as quickly
as possible
focus on one particular pattern of EDW project failure that we believe is largely
responsible for EDW projects’ reputation for high risk. This white paper describes the
failure pattern, explains theThe
risk Failure
factor behind
the pattern, and suggests how to avoid the
Pattern
risk while delivering value as quickly as possible throughout the project lifespan.
The pattern proper. The failure pattern is surprisingly mundane:
The Failure Pattern
1. The EDW project’s business sponsors agree to fund the project.
The pattern proper. The failure
is surprisingly
mundane:
2. pattern
In exchange
for sponsorship,
the sponsors pressure the project team to deliver
substantial business value early—typically within three months of project ons
1. The EDW project’s business
sponsors
agreethe
to fund
the project.
More
generally,
sponsors
require a project roadmap that promises frequen
delivery of major new increments of functionality.
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
substantial business value
within
threethe
months
project
onset.
3. early—typically
Eager for a chance
to build
EDW,ofthe
project
team negotiates and commit
More generally, the sponsors
require adelivery
project schedule,
roadmap that
promises
frequent of some high-visibility dat
a frequent
starting
with delivery
delivery of major new increments
reports of
at functionality.
the end of the first project phase.
3. Eager for a chance to build the EDW, the project team negotiates and commits to
1
Robert M. starting
Bruckner,with
Beatedelivery
List, and of
Josef
Schiefer,
“Risk Management
a frequent delivery schedule,
some
high-visibility
data orfor Data Warehouse System
Lecture
Notes
in
Computer
Science
Vol.
2114,
pp.
219-229
(Springer
Verlag, 2001).
reports at the end of2 the first project phase.
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management (Pearson, 2000), pp. 1-26.
3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
1
Robert M. Bruckner, Beate List,visited
and Josef
Schiefer,
“Risk Management for Data Warehouse Systems.”
January
30, 2014.
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management (Pearson, 2000), pp. 1-26.
3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing/,
visited January 30, 2014.
1
1
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
4. The project team works
very hard through the first project phase, but still does not
Introduction
deliver its initial commitments on time.
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
5. As time progresses the
inevitable
frequent
scopeare
changes
(thatrisky.
partlyOne
make
EDW
warehouse
(EDW)
projects
unusually
paper
on the subject begins, “D
projects infamous) accrue.
These
divert
the
project
team’s
attention
from
its
warehouse projects are notoriously difficult to manage, and many of them end in
original commitments.
The 1team
mayon
renegotiate
the delivery
schedule,
butthat “the most experienced
A book
EDW project
management
reports
failure.”
nevertheless continues
to
fall
ever
more
behind.
project managers [struggle] with EDW projects,” in part because “estimating on
warehouse projects is very difficult [since] each data warehouse project can be so
2
6. Eventually the project’s
schedule
dilation
ruinsabout
the project
team’s
Many
articles
EDW risk
citecredibility.
the laundry lists of EDW failure and r
different.”
Management replaces
oneinteam
of EDW
developers
management
types
Chapter
4 “Risks”
of thewith
sameanother,
book, arguing
that EDW project riskiness deri
itself gets replaced, from
or thethe
project
is
scrapped.
multitude of risk factors inherent in EDW projects. Likewise, experts such a
Ralph Kimball offer numerous guidelines for decreasing EDW project risk.3
Here’s the remarkable thing about this pattern: even experienced EDW technologists
well able to anticipate scopeOne
changes
find themselves
in the
The scope
risk factor
stands out.
Wesame
agreebind.
that EDW
projects are complex, and have ma
changes per se are not the root
of
the
problem.
Nor
is
the
problem
just
a
failure
to
risk factors. We also agree that in general, complex projects
are more difficult to
renegotiate schedule commitments;
these
renegotiations
are
a
typical
part
of
the
pattern.
manage. Having said that, several decades of experience with EDW projects lead us
Rather, the problem lies with
the on
project
team’s failure
to estimate
requirements
focus
one particular
pattern
of EDW time
project
failure that we believe is largely
accurately—even time requirements
for
re-negotiated
priorities—especially
earlyThis white paper describes th
responsible for EDW projects’ reputation for highinrisk.
project phases. The key question
is
why
this
occurs.
To
answer
this
question,
we
must
failure pattern, explains the risk factor behind the pattern,
and suggests how to avoid
detour into data architecturerisk
enough
grasp a few
of its
concepts,
notably
the
whiletodelivering
value
as basic
quickly
as possible
throughout
the project lifespan.
entity type.
Data Architecture 101. An
entity
typePattern
is a class of things that a database represents.
The
Failure
Common business examples of entity types include customer, product, and address. A
typical business database represents
on the
order of
100
entitypattern
types. is surprisingly mundane:
The pattern
proper.
The
failure
Data architects organize entity 1.
types
into
dataproject’s
models. business
A logicalsponsors
data model
The
EDW
agreerepresents
to fund the project.
entity types in the abstract. A physical data model realizes a logical model in a given
types of model
account forthe
thesponsors
logical relations
physical database.4 In principle,
2. both
In exchange
for sponsorship,
pressure the project team to deliver
among entity types.5 However, thesubstantial
data architect
must
decide
which
relations
a data three months of project ons
business value early—typically within
model actually captures. A good logical
model is fairly
exhaustive
abouta project
these relations,
More generally,
the sponsors
require
roadmap that promises frequen
but a physical model typically represents
a proper
subset
those in the
logical model.6
delivery
of major
new of
increments
of functionality.
3.as Eager
for aphysical
chancemodels
to build
the one
EDW,
project
team negotiates and commit
Logical models are often represented
if they were
having
of thethe
normal
forms
a
frequent
delivery
schedule,
starting
with
delivery
of some high-visibility dat
initially defined by the database researchers Codd and Boyce in the 1979s. See
http://en.wikipedia.org/wiki/Database_normalization#Normal_forms
the project
list of normal
forms.
reports at the end of thefor
first
phase.
5
Data models are similar in principle to class hierarchies in object-oriented programming. Data models
differ from class hierarchies in that they do not emphasize the inheritance relation. Object-relational
programming attempts to reconcile
the relational
and object-oriented
either
through
a middle for Data Warehouse System
1
Robert
M. Bruckner,
Beate List, and paradigms,
Josef Schiefer,
“Risk
Management
tier between application code andLecture
database
that
supposedly
eliminates
the
“impedance
mismatch”
between
Notes in Computer Science Vol. 2114, pp. 219-229 (Springer
Verlag, 2001).
2
them, or through an object-relational
database
management
system
that
has
both
relational
and
objectSid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
oriented features. See for example
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/ObjectManagement
(Pearson, 2000), pp. 1-26.
3
relational_database.html.
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
6
In a physical model, most relations
among
entity
visited
January
30,types
2014.appear as foreign keys. Here’s the short course on
database keys. A primary key is a unique identifier for an entity, that is, an instance of an entity type. A
natural primary key is a unique identifier that comes from outside of a database. (Someone enters it
manually, or a data-integration process imports it.) A surrogate primary key is a primary key generated
by the database, usually an integer. A foreign key is a reference in one table to a surrogate primary key in
another table.
4
2
2
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
WhyScience.
Enterprise
Copyright © 2014 Mosaic Data
AllData
rightsWarehouse
reserved. Projects Fail, and What to do About it
Mosaic Data Science, January 2014
Precedence relations in logical
models. Now, suppose that within an EDW’s logical
Introduction
model, two entity types have some relation. For example, the logical model might
represent the customer and Everyone
purchase-transaction
entity
types, with
customer
knows data
warehouses
areeach
risky.
It is an IT truism that enterprise data
7,8
optionally making several purchases.
In
such
cases
one
of
the
entity
types
mustpaper on the subject begins, “D
warehouse (EDW) projects are unusually risky. One
precede the other in two senses,
both
important
to
the
EDW
development
process:
warehouse projects are notoriously difficult to manage, and many of them end in
failure.”1 A book on EDW project management reports that “the most experienced
Physical-implementation
One entity
project precedence:
managers [struggle]
withtype’s
EDW physical
projects,”model
in partmust
because “estimating on
be implemented first.
Here,
the
customer
(dimension)
table
can
be
implemented
warehouse projects is very difficult [since] each data warehouse project can be so
2
physically, regardless
of the existence
of a purchase
But
purchase
Many articles
about (fact)
EDW table.
risk cite
thethe
laundry
lists of EDW failure and r
different.”
table must have a foreign
key
referencing
the customer
(or customer
group)
types in
Chapter
4 “Risks”
of the same
book, arguing
that EDW project riskiness deri
dimension table’s surrogate
key
to implement
theprojects.
physical Likewise, experts such a
from
theprimary
multitude
ofcolumn,
risk factors
inherent infully
EDW
9
model of a purchase.Ralph
Kimball offer numerous guidelines for decreasing EDW project risk.3
ETL-implementation
One entity
extract-transform-load
Oneprecedence:
risk factor stands
out. type’s
We agree
that EDW projects are complex, and have ma
(ETL) process mustrisk
be implemented
first.
In
our
example
one
loadprojects
a
factors. We also agree that in general, could
complex
are more difficult to
customer without having
loaded
any
particular
purchase,
but
one
cannot
load
a
manage. Having said that, several decades of experience with
EDW projects lead us
specific purchase without
first
having
loaded
the customer
(or
customer
group)
focus
on
one
particular
pattern
of
EDW
project
failure
that
we
believe is largely
10
making the purchase.
responsible for EDW projects’ reputation for high risk. This white paper describes th
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
These two precedence relations
always
run in thevalue
sameasdirection,
so possible
logicallythroughout
speaking the project lifespan.
risk while
delivering
quickly as
they can combine into a single precedence relation.
A natural visual representation
the entity-type
Theof
Failure
Pattern precedence relations in a logical model
is a directed graph, where the arcs (arrows) represent precedence. For example, Figure 1
below represents customer The
preceding
purchase:
pattern
proper. The failure pattern is surprisingly mundane:
1. The EDW project’s business sponsors agree to fund the project.
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
substantial business value early—typically within three months of project ons
7
If several customers can make one purchase
together,
the relation
between customer
purchase
is many that promises frequen
More
generally,
the sponsors
require and
a project
roadmap
(customers) to many (purchases); if not, it is one (customer) to many (purchases). Likewise, if a customer
delivery of major new increments of functionality.
need not have a purchase, the relation is optional on the customer side; otherwise it is required. These
aspects of a logical relation are termed the relation’s cardinality.
3. Eager
forbea achance
to build
EDW,
the project
team negotiates and commit
In the present example a customer would
probably
dimension,
and athe
purchase
a fact,
in the EDW’s
physical model. The analysis applies regardless,
but isdelivery
more subtle,
in other cases—for
example,
if bothof some high-visibility dat
a frequent
schedule,
starting with
delivery
entity types are dimensions, or if one is a reports
dimension
other
attribute
of thephase.
dimension. For
atand
thethe
end
of is
theanfirst
project
brevity’s sake we gloss this distinction.
9
Partial representations can skirt the problem at the cost of creating substantial technical debt. Hard
experience has taught us to implement
an entire entity type’s physical model all at once (and likewise an
1
M. aBruckner,
Beate
Josef Schiefer,
for Data Warehouse System
entity type’s entire ETL process, atRobert
least for
given source
or List,
set ofand
sources).
We have“Risk
neverManagement
found
Lecture
Notes
inthe
Computer
Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
implementing partial physical entity
types
worth
cost.
2
10
Sid Adelman
and Larissa
Moss,using
“Introduction
Data termed
Warehousing,”
It is possible to load missing dimension
data after
a fact isT.loaded,
a design to
pattern
a late- Data Warehouse Project
Management
(Pearson,
2000),
pp.
1-26.
arriving dimension. See http://www.kimballgroup.com/data-warehouse-business-intelligence3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
resources/kimball-techniques/dimensional-modeling-techniques/late-arriving-dimension/.
Late-arriving
visited
2014.
dimensions don’t really avoid the issue, January
because 30,
even
when one uses late-arriving dimensions, one still
8
must implement ETL to load the dimension foreign key for the fact whenever the foreign key’s value is
available. In a strong sense, late-arriving dimensions are a workaround for a data-quality or datagovernance problem. They are not the dimension’s happy-path ETL process.
3
3
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic Data Science, January 2014
Copyright © 2014 Mosaic Data Science. All rights reserved.
Introduction
Customer
Purchase
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
warehouse (EDW) projects are unusually risky. One paper on the subject begins, “D
warehouse
projectsPrecedence
are notoriously
difficult to manage, and many of them end in
Figure
1: Example
Graph
failure.”1 A book on EDW project management reports that “the most experienced
project
managers
[struggle]
with EDW
projects,”
Let’s call this sort of diagram
the logical
model’s
precedence
digraph
(PDG).in11part because “estimating on
warehouse projects is very difficult [since] each data warehouse project can be so
2
A PDG has three interestingdifferent.”
properties. Many articles about EDW risk cite the laundry lists of EDW failure and r
types in Chapter 4 “Risks” of the same book, arguing that EDW project riskiness deri
theismultitude
of risk connected,
factors inherent
in EDW
PDGs are acyclic: from
A PDG
not necessarily
but each
of its projects.
connectedLikewise, experts such a
3
Ralph Kimball
numerous
guidelines
for decreasing
components is necessarily
acyclic. offer
A cycle
would imply
that some
entity typeEDW project risk.
12
precedes itself.
One risk factor stands out. We agree that EDW projects are complex, and have ma
agree acyclic
that in general,
complex
projects
PDGs have stages: risk
As factors.
is true forWe
anyalso
directed
graph, the
nodes can
be are more difficult to
said
that,
several
decades of experience
with
arranged into stages.manage.
A node Having
is in stage
0 if
it has
no predecessors.
It is in stage
n EDW projects lead us
focus
on
one
particular
pattern
of
EDW
project
failure
that
we
(for n > 0) if all of its predecessors are lower-numbered stages, and at least one of believe is largely
reputation
for 10
high
risk. 13This white paper describes th
its predecessors is inresponsible
stage n – 1.forAEDW
typicalprojects’
EDW PDG
has about
stages.
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
risk iswhile
delivering
value as stages:
quickly as
throughout
Most of the good stuff
in the
high-numbered
Thepossible
entity types
most the project lifespan.
useful to a business generally appear in a PDG’s late (high numbered) stages. For
example, analyzing early-stage entity types such as dates, locations, and products
The Failure
Pattern value per se. In contrast, purchases
does not typically provide
much business
may involve products, customers, contracts, discounts, locations, dates and times,
pattern
proper.
failure
is purchases
surprisingly
salespeople, etc. SoThe
all of
these other
entityThe
types
mustpattern
precede
in mundane:
the
PDG, forcing the purchase entity type to appear in a late stage. And analyzing
purchase patterns is a high-value
EDWproject’s
activity. business sponsors agree to fund the project.
1. The EDW
Entity-type precedence and EDW
scheduling.
We canthe
now
explainpressure
the failure
2. Inproject
exchange
for sponsorship,
sponsors
the project team to deliver
pattern we describe at the start of this
paper, inbusiness
terms ofvalue
PDGs.early—typically
Here are the key
insights:
substantial
within
three months of project ons
More generally, the sponsors require a project roadmap that promises frequen
1. EDW project sponsors wantdelivery
to reportofon
and analyze
late-stageofentity
types, early
major
new increments
functionality.
in the project.
3. Eager for a chance to build the EDW, the project team negotiates and commit
2. When EDW project teams promise
delivery
of late-stage
entity types,
a frequent
delivery
schedule, starting
with they
delivery of some high-visibility dat
habitually fail to recognize that
they
are
also
committing
to
deliver
all
of
reports at the end of the first project phase. the sofar unimplemented entity types required by the promised late-stage entity types.
Early in an EDW project, most of a late-stage entity type’s PDG predecessors are
Robert
Beate List, and
Josef Schiefer,
“Risk Management
for Data Warehouse System
unimplemented. So1the
riskM.ofBruckner,
over-commitment
is greatest
at project
onset.
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management
(Pearson, 2000), pp. 1-26.
11
You don’t need to generate the3graph
manually. Instead, use the open-source utility graphviz to translate
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
a text file listing the graphs nodesvisited
and arcs
into a30,
nicely
formatted graph that you can print in any size.
January
2014.
12
Graphing a logical model’s PDG and checking it for cycles is a great way to discover structural problems
in the logical model. The Unix/Linux tsort (topological sort) utility makes checking for cycles trivial.
13
Batch ETL can be architected according to these stages. The data is loaded one stage at a time, and all
entity types in a given stage can be loaded in parallel. This approach provably maximizes the parallelism in
the ETL process, which is another good reason to construct and study the PDG of an EDW’s logical model.
4
4
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
Notice that the risk is likelyIntroduction
to be present each time a project team re-negotiates a project
schedule. This is true because the re-negotiation is usually a response to shifting sponsor
priorities, not to the projectEveryone
team’s discovering
the warehouses
extent of its over-commitment.
knows data
are risky. It is an IT truism that enterprise data
warehouse (EDW) projects are unusually risky. One paper on the subject begins, “D
Outside of our own work, we
have never
seen EDW
project teams
consider
formally
how
warehouse
projects
are notoriously
difficult
to manage,
and
many of them end in
1
entity type precedence relations
bear on
project
Inmanagement
particular, wereports
have never
A book
onscheduling.
EDW project
that “the most experienced
failure.”
seen an EDW team construct
a PDG.
Nor to [struggle]
our knowledge
has anyone
developed
project
managers
with EDW
projects,”
in partand
because “estimating on
published an EDW project-scheduling
methodology
based
on
the
PDG.
warehouse projects is very difficult [since] each data warehouse project can be so
different.”2 Many articles about EDW risk cite the laundry lists of EDW failure and r
types in Chapter 4 “Risks” of the same book, arguing that EDW project riskiness deri
The Remedy: EDW Project-Schedule
Optimization
from the multitude
of risk factors inherent in EDW projects. Likewise, experts such a
Ralph Kimball offer numerous guidelines for decreasing EDW project risk.3
In our experience the best way to avoid EDW project over-commitment is to maintain a
PDG for the EDW’s logicalOne
model,
to use
the PDG
inform
riskand
factor
stands
out. toWe
agreethe
thatproject’s
EDW projects are complex, and have ma
implementation schedule. In
what
follows
we
enlarge
this
recommendation
intoprojects
a formalare more difficult to
risk factors. We also agree that in general, complex
model of optimal EDW project
scheduling.
Some
of
the
model’s
elements
will
be
manage. Having said that, several decades of experience with EDW projects lead us
familiar to anyone with EDW
experience.
focus
on one particular pattern of EDW project failure that we believe is largely
responsible for EDW projects’ reputation for high risk. This white paper describes th
Enterprise ontology. The failure
scheduling
model
requires
enterprise
ontology,
whichand
is a suggests how to avoid
pattern,
explains
thean
risk
factor behind
the pattern,
logical model that lists eachrisk
entity
type’s
data
sources
(databases,
OLTP
applications,
while delivering value as quickly as possible throughout the project lifespan.
etc.) with their volumetrics, and that records other metadata relevant to EDW
implementation. For technical reasons the ontology should limit itself to entity types that
will be represented by a single
in the
physical model. For example, a person entity
Thetable
Failure
Pattern
type is too abstract, because an EDW will generally have several tables representing
different kinds of people: employees,
supplier
contacts,
customer
contacts,
etc.
The pattern
proper.
The failure
pattern
is surprisingly
mundane:
Functional areas. A functional
a collection
entity types
that satisfy
thefund the project.
1. area
The is
EDW
project’sofbusiness
sponsors
agree to
reporting requirements of a given business function. (One entity type may appear in
many functional areas.) For example,
a product-definition
functional
area might
require
2. In exchange
for sponsorship,
the sponsors
pressure
the project team to deliver
four entity types: manufacturer product,
supplier
product,
internally
developed
product,
substantial business value early—typically within three months of project ons
and unit of measure. This functional
areagenerally,
might serve
product-management
More
the the
sponsors
require a project roadmap that promises frequen
organization. A typical EDW project
will have
on the
order
of 100 functional
areas.
delivery
of major
new
increments
of functionality.
Each functional area should list3.theEager
reports
corresponding
for required
a chanceby
to the
build
the EDW, theorganization.
project team negotiates and commit
(Several organizations may requireathe
same report.)
reportstarting
shouldwith
be mapped
frequent
delivery Each
schedule,
deliverytoof some high-visibility dat
the entity types it queries. Each report
should
also
be
assigned
a
utility
weight,
reports at the end of the first project phase. that is, a
dimensionless positive number representing the report’s relative importance to the
business. We suggest these weights be integers between one and 10.
1
Robert M. Bruckner, Beate List, and Josef Schiefer, “Risk Management for Data Warehouse System
Lecture
Notes in
Science
Vol. 2114,
pp. 219-229
Verlag, 2001).
Project schedule. A project
schedule
isComputer
a sequence
of project
stages.
Each(Springer
functional
2
14 Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Sid Adelman
and
area belongs to exactly one project
stage.
Consistency with the PDG is a necessary
Management (Pearson, 2000), pp. 1-26.
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
14
We could define a project schedule more directly as a sequence of entity types and reports. Our
definition in terms of project stages means we search a subset of the set of all possible sequences of entity
types and reports. As a result, an optimal project schedule may not optimize over this larger set. However,
the optimization algorithm is free to optimize the ordering of previously unimplemented entity types and
reports within a given stage’s functional areas. So depending on the load-leveling constraint, our definition
3
5
5
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
Introduction
condition for feasibility. More
formally, a schedule is infeasible if the following
conditions hold:
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
warehouse
(EDW) area
projects
1. Functional area A precedes
functional
B inare
theunusually
schedule. risky. One paper on the subject begins, “D
warehouse
arex notoriously
2. There exist entity types
x and yprojects
such that
is in A and ydifficult
is in B.to manage, and many of them end in
on PDG.
EDW project management reports that “the most experienced
failure.”1 A
3. Entity type y is an antecedent
of book
x in the
project managers [struggle] with EDW projects,” in part because “estimating on
warehouse
projects
is very
[since]
each
data warehouse
A feasible schedule thus traverses
the PDG
(visits
eachdifficult
node once)
while
visiting
a given project can be so
2
Many articles
about EDW
risk citearea
the in
laundry
different.”
node only after visiting all of
its predecessors.
Thus placing
a functional
a givenlists of EDW failure and r
types
inimplement
Chapter 4 “Risks”
the same
book, arguing
that EDW project riskiness deri
project stage means the team
must
all of theofso-far
unimplemented
entity
types required by the functional
includingofallrisk
of factors
the antecedents
the
from area,
the multitude
inherent required
in EDW by
projects.
Likewise, experts such a
functional area’s so-far unimplemented
reports,
in that project
stage. for decreasing EDW project risk.3
Ralph Kimball
offer numerous
guidelines
It is usually desirable to have
therisk
project
team’s
sizeout.
remain
over
the projects
project are complex, and have ma
One
factor
stands
We constant
agree that
EDW
lifespan. One can express this
by allowing
each
to require
at most
some are more difficult to
risk naturally
factors. We
also agree
thatstage
in general,
complex
projects
fixed number of person hours.
If a project
such decades
a load-leveling
constraint,
manage.
Havingteam
saidimposes
that, several
of experience
with EDW projects lead us
it becomes another necessary
condition
schedule
feasibility.
focus
on onefor
particular
pattern
of EDW project failure that we believe is largely
responsible for EDW projects’ reputation for high risk. This white paper describes th
Workload model. The project
team
should
construct
workload
model attributing
failure
pattern,
explains
thearisk
factor behind
the pattern,aand suggests how to avoid
number of person hours to implementing
each entity
physical
model and
ETL, andthe project lifespan.
risk while delivering
valuetype’s
as quickly
as possible
throughout
to implementing each report. For example, it is common to categorize each data source
as having low, medium, or high complexity, and to assume that all source tables having a
given complexity level require
same Pattern
amount of time (same goes for reports).15 The
Thethe
Failure
workload model should also include one-time activities such as hardware and software
configuration. The workload
model
mustproper.
let the EDW
team compute
project’s total
The
pattern
The failure
pattern isthe
surprisingly
mundane:
and remaining person-hour requirements, given the current project schedule.
1. The EDW project’s business sponsors agree to fund the project.
Scheduling algorithm. Finally, the model should include a scheduling algorithm. The
algorithm must output a feasible
schedule
optimizes the
remaining
project
utility.
2. project
In exchange
forthat
sponsorship,
sponsors
pressure
the project team to deliver
The optimization may be stepwise substantial
optimal (greedy)
or globally
optimal. Ordinarily
business
value early—typically
within three months of project ons
project priorities (represented by the
reports’
utilitythe
weights)
are require
likely toa change
More
generally,
sponsors
project roadmap that promises frequen
frequently over the project lifespan,delivery
so the optimization
be greedy,
to ensure the
of major newshould
increments
of functionality.
project realizes known high utilities in the short term. Global optimization is only
appropriate when all stakeholders
unusually
confident
that the
project
priorities
will team
not negotiates and commit
3. are
Eager
for a chance
to build
EDW,
the project
change, which typically implies that
all requirements
known starting
and wellwith
understood.
a frequent
deliveryare
schedule,
delivery of some high-visibility dat
reports at the end of the first project phase.
It is a practical impossibility for a human to construct a (feasible) project schedule that is
near optimal. There are simply too many possible schedules. For example, if a project
1
87
List, and JoseftoSchiefer,
“Risk
includes 63 functional areas,Robert
there M.
areBruckner,
63! ≈ 10Beate
permutations
consider
(forManagement
feasibilityfor Data Warehouse System
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
may not strongly constrain the search.
We choose
the definition
based
Management
(Pearson,
2000), pp.
1-26.on project stages and functional
areas because the business world3 generally
requires
that
EDW
projects
be structured in this fashion, with a
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
stage typically lasting between three
weeks
and three
months.
visited
January
30,
2014.
15
For source tables, we suggest using as a complexity metric the product of column count and base-10 log
of row count. The assumption is that replication effort is linear in the number of columns, while datacleansing effort increases very sub-linearly with record quantity. Then one can categorize a source table as
low, medium, or high complexity according to the metric’s value. We likewise suggest a report-complexity
metric linear in the number of entity types the report queries.
6
6
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
and optimality). Fortunately,
there is a well-established class of algorithms for solving
Introduction
16
this kind of problem.
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
Sample results. We have used
a greedy
algorithm
for are
an EDW
project
having
warehouse
(EDW)
projects
unusually
risky.
Onethe
paper on the subject begins, “D
following volumetrics:
warehouse projects are notoriously difficult to manage, and many of them end in
failure.”1 A book on EDW project management reports that “the most experienced
• 116 entity types project managers [struggle] with EDW projects,” in part because “estimating on
• 372 entity-type precedence
relations
warehouse
projects is very difficult [since] each data warehouse project can be so
• 674 source tables different.”2 Many articles about EDW risk cite the laundry lists of EDW failure and r
types in Chapter
“Risks”
of the same book, arguing that EDW project riskiness deri
• 1,267 source-table requirements
(for 4entity
tables)
from the multitude of risk factors inherent in EDW projects. Likewise, experts such a
• 325 reports
Ralph Kimball
numerous guidelines for decreasing EDW project risk.3
• 1,380 entity-type requirements
(foroffer
reports)
• 63 functional areas.
One risk factor stands out. We agree that EDW projects are complex, and have ma
risk factors.
We
also agree
in general,
complex
projects
are more difficult to
Setting the work-leveling ceiling
to 5,000
person
hoursthat
yielded
a project
schedule
with 19
manage.
Having
said
that,
several
decades
of
experience
with
stages. Table 1 below lists the stage metrics. The original utilities are between one and EDW projects lead us
one
particular
failure that we believe is largely
10, but the utility-per-hour focus
figuresonare
scaled
up bypattern
10,000of
forEDW
ease project
of comparison:
responsible for EDW projects’ reputation for high risk. This white paper describes th
failure pattern,
the risk factorUtility
behind the pattern, and suggests how to avoid
Stage Functional
TotalexplainsSource
risk
while
delivering
value
as
quickly
as
possible throughout the project lifespan.
Area Count
Hours
Tables per Hour
1
6
4,488
134
20
2
1
2,068
62
10
The
Pattern
3
1 Failure
2,706
74
7
4
2
3,762
89
27
The pattern proper. The failure pattern is surprisingly mundane:
5
1
1,782
46
28
6
1
4,400
85
16
1. The EDW project’s business sponsors agree to fund the project.
7
18
4,664
117
259
8
2
22
1
6818
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
9
1
22 business value
1 early—typically
4546
substantial
within three months of project ons
10
10
418
12
1507 a project roadmap that promises frequen
More generally, the sponsors require
11
3
220 of major new 9increments
1046
delivery
of functionality.
12
1
110
2
909
13
3 3. Eager176
2
568 the project team negotiates and commit
for a chance to build
the EDW,
14
1
154 delivery schedule,
4
455 with delivery of some high-visibility dat
a frequent
starting
15
6
616
16
552phase.
reports at the end of the first project
16
2
66
2
1515
17
1
176
8
398
1
18
1Robert M. Bruckner,
88 Beate List, and
4 Josef Schiefer,
341 “Risk Management for Data Warehouse System
Lecture
Notes
in
Computer
Science
Vol.
2114,
pp.
219-229 (Springer Verlag, 2001).
19
396
6
328
22
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management (Pearson, 2000), pp. 1-26.
3
Table
1: Sample Greedy Schedule
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
16
That is, a network optimization problem with path-dependent utilities and costs. See for example Dimitri
P. Bertsekas, Network Optimization: Continuous and Discrete Models (Athena Scientific, 1998).
7
7
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
If we consolidate stages 8-19
into a single final stage, preserving functional-area order
Introduction
within the final stage, we end up with an eight-stage schedule. Assuming an eight-person
team working 180 person hours
per month,
longest
stage requires
aboutItthree
Everyone
knowsthe
data
warehouses
are risky.
is ancalendar
IT truism that enterprise data
months; the shortest, about warehouse
five weeks.(EDW)
See Table
2:
projects are unusually risky. One paper on the subject begins, “D
warehouse projects are notoriously difficult to manage, and many of them end in
on EDW project
failure.”1 A book
Stage
Functional
Total
Sourcemanagement
Utilityreports that “the most experienced
Area
Hours
Tables
per
Hour in part because “estimating on
project managers [struggle] with EDW projects,”
warehouse projects is very difficult [since] each data warehouse project can be so
Counts
articles about134
EDW risk cite20
the laundry lists of EDW failure and r
different.”
1
6 2 Many
4,488
types
in
Chapter
4
“Risks”
of
the
same
book,
arguing
that EDW project riskiness deri
2
1
2,068
62
10
from the
of risk factors 74
inherent in EDW
3
1 multitude
2,706
7 projects. Likewise, experts such a
Ralph 2Kimball offer
for decreasing
EDW project risk.3
4
3,762numerous guidelines
89
27
5
1
1,782
46
28
One risk
stands out. We 85
agree that EDW
6
1 factor4,400
16 projects are complex, and have ma
risk factors.
We4,664
also agree that in
general, complex
7
18
117
259 projects are more difficult to
manage.
Having
said
that,
several
decades
of
experience with EDW projects lead us
8
33
2,464
67
832
focus on one particular pattern of EDW project failure that we believe is largely
for EDW
projects’
reputation
for high risk. This white paper describes th
Table 2:responsible
Consolidated
Sample
Greedy
Schedule
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
while
delivering
value
as quickly
possible
throughout the project lifespan.
The business analyst for therisk
same
project
attempted
repeatedly
toas
order
the functional
areas intuitively. He settled on a schedule that yielded the schedule structure in Table 3:
The Failure
Pattern
Stage
Source
Tables
The pattern proper.
The
failure
1
252 pattern is surprisingly mundane:
2
102
1. The EDW project’s business sponsors agree to fund the project.
3
55
4
70
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
5
60
substantial business value early—typically within three months of project ons
6
41
More generally, the sponsors require a project roadmap that promises frequen
7
delivery of major new33
increments of functionality.
8
8
9
13
3. Eager for a chance to build the EDW, the project team negotiates and commit
10 delivery schedule,
17
a frequent
starting with delivery of some high-visibility dat
11
6
reports at the end of the first project phase.
12
9
1
Table
3: Manually
Schedule
Robert
M. Bruckner,Generated
Beate List, and
Josef Schiefer, “Risk Management for Data Warehouse System
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid eight
Adelman
and Larissa
T. Moss,
“Introduction
to Data
Warehousing,”
Data Warehouse Project
Consolidating the tail to obtain
stages
as before
yields
the schedule
in Table
4:
Management (Pearson, 2000), pp. 1-26.
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
3
8
8
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic Data Science, January 2014
Copyright © 2014 Mosaic Data Science. All rights reserved.
Introduction
Stage
Source
Tables
Everyone knows
data
warehouses
are risky. It is an IT truism that enterprise data
1
252
warehouse (EDW)
are unusually risky. One paper on the subject begins, “D
2 projects 102
warehouse projects
are
notoriously
difficult to manage, and many of them end in
3
55
on
EDW
project
management
reports that “the most experienced
failure.”1 A book
4
70
project managers
5 [struggle] with
60 EDW projects,” in part because “estimating on
warehouse projects
is
very
difficult
[since] each data warehouse project can be so
6
41
articles
about
EDW
risk cite the laundry lists of EDW failure and r
different.”2 Many
7
33
types in Chapter8 4 “Risks” of53
the same book, arguing that EDW project riskiness deri
from the multitude of risk factors inherent in EDW projects. Likewise, experts such a
Ralph
KimballGenerated
offer numerous
guidelinesSchedule
for decreasing EDW project risk.3
Table 4: Sample
Manually
Consolidated
One risk
factorisstands
out.
that EDW
projects are complex, and have ma
The steady drop in source tables
per stage
striking,
as isWe
theagree
difference
in absolute
risk factors.
We also
agreeAfter
that consolidation,
in general, complex
projects are more difficult to
deviations between consecutive
source-table
counts.
the largest
manage.
Having
said
that,
several
decades
of
experience
with EDW projects lead us
stage of the greedy algorithm's schedule contains 2.9 times as many source tables as the
focus
on
one
particular
pattern
of
EDW
project
failure
that
smallest; but the largest manual stage requires 7.6 times as many source tables as the we believe is largely
responsible for EDW projects’ reputation for high risk. This white paper describes th
smallest. See Figure 2 below:
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
risk while delivering value as quickly as possible throughout the project lifespan.
Source Tables per Stage
300
The Failure Pattern
Source Tables
250
The pattern proper. The failure pattern is surprisingly mundane:
200
150
1. The EDW project’s business sponsors agree to
100
project.
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
substantial business value early—typically within three months of project ons
More
generally,
the sponsors
require
a8project roadmap that promises frequen
3
4
5
6
7
delivery
of
major
new
increments
of
functionality.
Schedule Stage
50
0
1
Greedy
Manual
fund
the
Ideal
2
3. Eager for a chance to build the EDW, the project team negotiates and commit
frequent
schedule,
starting with delivery of some high-visibility dat
Figure 2: SourceaTables
vs.delivery
Stage for
Both Schedules
reports at the end of the first project phase.
Figure 3 below presents both schedules' source-table burn rates against an ideal
(constant) rate:
1
Robert M. Bruckner, Beate List, and Josef Schiefer, “Risk Management for Data Warehouse System
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management (Pearson, 2000), pp. 1-26.
3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
9
9
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
Why Enterprise Data Warehouse Projects Fail, and What to do About it
Mosaic
Data Science,
January
2014
Copyright © 2014 Mosaic Data
Science.
All rights
reserved.
IntroductionBurn Rates
800
Source Tables
700
600
500
400
300
200
100
0
1
2
Everyone knows data warehouses are risky. It is an IT truism that enterprise data
warehouse (EDW) projects are unusually risky. One paper on the subject begins, “D
warehouse projects are notoriously difficult to manage, and many of them end in
that “the most experienced
failure.”1 A book on EDW project management reports Greedy
project managers [struggle] with EDW projects,” in partManual
because “estimating on
Ideal
warehouse projects is very difficult [since] each data warehouse
project can be so
different.”2 Many articles about EDW risk cite the laundry lists of EDW failure and r
types in Chapter 4 “Risks” of the same book, arguing that EDW project riskiness deri
from the multitude of risk factors inherent in EDW projects. Likewise, experts such a
3
4
6 guidelines
7
8
Ralph
Kimball
offer5numerous
for decreasing
EDW project risk.3
Schedule Stage
One risk factor stands out. We agree that EDW projects are complex, and have ma
risk factors. We also agree that in general, complex projects are more difficult to
Figure 3: Burn Rates (Greedy vs. Manual vs. Ideal Schedule)
manage. Having said that, several decades of experience with EDW projects lead us
focus on one particular pattern of EDW project failure that we believe is largely
The greedy algorithm achieves far greater load leveling, while choosing at each stage the
responsible for EDW projects’ reputation for high risk. This white paper describes th
functional areas that will deliver the most utility possible in an acceptable amount of
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
time.
risk while delivering value as quickly as possible throughout the project lifespan.
Using the PDG and scheduling algorithm to negotiate project schedules. The key
benefit of using the PDG and scheduling algorithm is not to optimize the project
The Failure Pattern
schedule. After all, once the project is complete, the business will probably continue to
benefit from the EDW’s data and reports for years, while the project may only last for
The pattern proper. The failure pattern is surprisingly mundane:
between six and eighteen months.
1. The EDW project’s business sponsors agree to fund the project.
Rather, the main benefits are twofold:
•
•
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
The project team can identify
all of thebusiness
time requirements
implicit in a within
proposed
substantial
value early—typically
three months of project ons
project schedule. This complete
understanding
of
time
requirements,
together
More generally, the sponsors require a project roadmap that promises frequen
with a realistic load-leveling
constraint,
lets the
project
team avoid
early overdelivery
of major
new
increments
of functionality.
commitment and the resulting loss of credibility and risk of project failure.
3. Eager for a chance to build the EDW, the project team negotiates and commit
The project team can use the
PDG to help
the business
team
a frequent
delivery
schedule,sponsors
starting and
withproject
delivery
of some high-visibility dat
develop a shared understanding
of
the
scheduling
constraints
that
the
EDW’s
reports at the end of the first project phase.
subject matter imposes on the project. This shared understanding can help the
business sponsors accept realistic schedule requirements, recognizing that they
1
cannot have everything
theyM.want
threeBeate
months
the Schiefer,
project starts.
Robert
Bruckner,
List, after
and Josef
“Risk Management for Data Warehouse System
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman
Larissa
T. Moss,
to Data
We encourage the project team
to print aand
large
copy
of the“Introduction
PDG, and use
it inWarehousing,”
schedule Data Warehouse Project
Management
(Pearson,
2000),
pp.
1-26.
negotiations with project sponsors
to illustrate scheduling realities by
3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
•
circling all of the entity types required by a popular functional area,
•
marking the entity types required by some high-utility reports (to illustrate that
they tend to lie in the PDG’s late stages), and
10
10
Copyright © 2014 Mosaic Data Science. All rights reserved.
A MOSAIC DATA SCIENCE WHITE PAPER
WhyScience.
Enterprise
Copyright © 2014 Mosaic Data
AllData
rightsWarehouse
reserved. Projects Fail, and What to do About it
Mosaic Data Science, January 2014
challenging businessIntroduction
sponsors to construct a project schedule consistent with the
PDG and load-leveling constraint that delivers total utility over the project
warehouses
are risky.
It is an
truism that enterprise data
lifespan comparableEveryone
to the totalknows
utilitydata
delivered
by the project
schedule
thatITthe
warehouse
optimization algorithm
outputs.(EDW) projects are unusually risky. One paper on the subject begins, “D
warehouse projects are notoriously difficult to manage, and many of them end in
1
A book
on EDW
project
management
reports
that “the most experienced
failure.”
In our experience, this approach
quickly
persuades
business
sponsors
that they
can only
managers
[struggle]
with EDW
projects,”
partto
because “estimating on
get so much so fast. In fact,project
the approach
frees
the sponsors
and project
teaminalike
is verystage,
difficult
[since]
each datatowarehouse project can be so
change the schedule freely warehouse
at the end ofprojects
each project
using
the algorithm
2
Many articles
about EDW
risk cite
the laundry
lists of EDW failure and r
different.”
recompute an optimal schedule
after changing
report utilities
to reflect
changing
business
types
in
Chapter
4
“Risks”
of
the
same
book,
arguing
that
EDW
project riskiness deri
priorities. Paradoxically, this more formal approach to project scheduling ends up
from
multitude
of risk
factors inherent
in EDW
projects.
helping the project be as agile
asthe
possible,
as well
as minimizing
the risk
that the
project Likewise, experts such a
3
will miss its schedule.17 Ralph Kimball offer numerous guidelines for decreasing EDW project risk.
•
One risk factor stands out. We agree that EDW projects are complex, and have ma
risk factors. We also agree that in general, complex projects are more difficult to
manage. Having said that, several decades of experience with EDW projects lead us
focus on one particular pattern of EDW project failure that we believe is largely
responsible for EDW projects’ reputation for high risk. This white paper describes th
failure pattern, explains the risk factor behind the pattern, and suggests how to avoid
risk while delivering value as quickly as possible throughout the project lifespan.
The Failure Pattern
The pattern proper. The failure pattern is surprisingly mundane:
1. The EDW project’s business sponsors agree to fund the project.
2. In exchange for sponsorship, the sponsors pressure the project team to deliver
substantial business value early—typically within three months of project ons
More generally, the sponsors require a project roadmap that promises frequen
delivery of major new increments of functionality.
3. Eager for a chance to build the EDW, the project team negotiates and commit
a frequent delivery schedule, starting with delivery of some high-visibility dat
reports at the end of the first project phase.
1
Robert M. Bruckner, Beate List, and Josef Schiefer, “Risk Management for Data Warehouse System
Lecture Notes in Computer Science Vol. 2114, pp. 219-229 (Springer Verlag, 2001).
2
Sid Adelman and Larissa T. Moss, “Introduction to Data Warehousing,” Data Warehouse Project
Management (Pearson, 2000), pp. 1-26.
3
http://www.kimballgroup.com/2009/04/26/eight-guidelines-for-low-risk-enterprise-data-warehousing
visited January 30, 2014.
17
In particular, it becomes natural to treat a project phase as an agile/scrum iteration.
11
11