Applying Business Intelligence to Software Engineering

OSCAR GÖTHBERG

Master of Science Thesis
Stockholm, Sweden 2007
Master’s Thesis in Computer Science (20 credits)
at the School of Computer Science and Engineering
Royal Institute of Technology year 2007
Supervisor at CSC was Stefan Arnborg
Examiner was Stefan Arnborg
TRITA-CSC-E 2007:003
ISRN-KTH/CSC/E--07/003--SE
ISSN-1653-5715
Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.csc.kth.se
Abstract
A major problem with software development is a lack of insight into how
well the project is going. Every year, a vast majority of finished software
projects in the US are either scrapped, or completed without meeting
cost and time constraints. At the same time, analysis of performance
metrics is an area that is often underutilized in software engineering
because of a lack of expertise and tools.
However, this is not so in the business world, where data is modeled
and analyzed along different dimensions to help make critical business
decisions that can determine profits or loss. These techniques fall under the umbrella of “business intelligence”. In my thesis, I have evaluated using these techniques for the gathering of metrics related to the
progress of a software project.
With a list of metrics to provide as the project aim, I used the business intelligence platform developed by LucidEra to develop four analytics applications, each using a different engineering support system (a bug database, a testing framework, a continuous integration system and a revision control system) as its data source.
Through this approach, I was able to provide a majority, although
not all, of the metrics from my list in an automated and relatively
source-independent fashion. Based on my results, I conclude that using a business intelligence platform is a viable approach to the problem
of providing better analytics related to software engineering.
Referat

Applying Business Intelligence to Software Engineering

A major problem in software development is a lack of insight into how well the project is progressing. Every year, a large majority of the software projects concluded in the US are either scrapped or finished without meeting their time and cost constraints. At the same time, analysis of performance indicators is an area that is often underutilized in software development because of a lack of expertise and tools.
This is not the case in the business world, however, where data is modeled and analyzed along different dimensions to support critical business decisions that determine profit or loss. These techniques fall under the term "business intelligence". In my degree project I have evaluated the possibility of using these techniques to facilitate the collection of metrics related to the progress of a software project.
With a list of metrics to provide as the goal of the project, I have used the business intelligence platform being developed by LucidEra to develop four analytics applications, each based on one of the development team's support systems (a bug database, a test framework, a continuous integration system and a revision control system) as its data source.
Through this approach I was able to provide most, though not all, of the metrics from my list in an automated and relatively source-independent way. Based on my results, I conclude that the use of a business intelligence platform is a viable way of providing better metrics for software development.
Preface
The project forming the basis of this thesis was carried out at LucidEra in San Mateo,
California, between March and September 2006. I would like to thank Boris Chen
for inviting me and for his invaluable support throughout the project.
I would also like to thank everyone at LucidEra for their help and input. A
special thanks to Fan Zhang for contributing some of his Sed&Awk magic.
Most of all, I would like to thank my parents for always supporting me.
Contents

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Method
  1.4 Scope
  1.5 Report outline

2 Background
  2.1 The concept of Business Intelligence
  2.2 The potential of engineering analytics
  2.3 Challenges
    2.3.1 Productivity is hard to measure
    2.3.2 Software vendors get away with it

3 Aim
  3.1 Defect tracking
    3.1.1 Open bugs over time
    3.1.2 Cumulative open/closed plot
    3.1.3 Open/close activity charts
    3.1.4 Bug report quality
    3.1.5 Average resolution time
  3.2 Test result tracking
    3.2.1 Total test results over time
    3.2.2 Performance comparisons
    3.2.3 Test execution reports
    3.2.4 Test development reports
  3.3 Continuous integration tracking
    3.3.1 Number of runs per day
    3.3.2 Time to fix
    3.3.3 Number of new submissions per build
    3.3.4 Number of changed files per build
    3.3.5 Number of line changes per build
  3.4 Revision control tracking
    3.4.1 Number of checkins
    3.4.2 Number of files changed per submission
    3.4.3 Number of lines changed per submission
    3.4.4 Number of integrations

4 Tools
  4.1 Introduction to OLAP concepts
    4.1.1 An example
    4.1.2 Measures
    4.1.3 Dimensions
    4.1.4 Members
    4.1.5 Hierarchies
    4.1.6 Cubes
    4.1.7 Slicing
    4.1.8 MDX
    4.1.9 Star schema design
  4.2 The platform
    4.2.1 Clearview
    4.2.2 Mondrian
    4.2.3 LucidDB
    4.2.4 Data adaptors
  4.3 Analytics application development tools
    4.3.1 The Workbench
    4.3.2 Application Lifecycle Service
  4.4 Data sources
    4.4.1 Blackhawk
    4.4.2 Cruisecontrol
    4.4.3 Jira
    4.4.4 Perforce

5 Method
  5.1 Introduction to application design
    5.1.1 Extraction
    5.1.2 Transformation
    5.1.3 Star schema
  5.2 Defect tracking - Jira
    5.2.1 Extraction
    5.2.2 Transformation
    5.2.3 Star schema
  5.3 Test result tracking - Blackhawk
    5.3.1 Extraction
    5.3.2 Transformation
    5.3.3 Star schema
  5.4 Continuous integration tracking - Cruisecontrol
    5.4.1 Extraction
    5.4.2 Transformation
    5.4.3 Star schema
  5.5 Revision control tracking - Perforce
    5.5.1 Extraction
    5.5.2 Transformation
    5.5.3 Star schema

6 Results
  6.1 Example reports
    6.1.1 Performance comparison between virtual java machines
    6.1.2 Test success monitoring for recent builds
  6.2 Compared to the original aim
    6.2.1 Metrics that were easy to get
    6.2.2 Metrics that would require more work
  6.3 Next steps
    6.3.1 Possible with additional data sources
    6.3.2 Possible with better cross-referencing

7 Conclusions

Bibliography

List of Figures

1.1 Manual metrics gathering in a spreadsheet
4.1 Hierarchies example
4.2 Star schema example
4.3 The LucidEra application platform and ETL layer
5.1 ETL layer for the Jira-based application
5.2 Example of normalization in the Jira source schema
5.3 ETL layer for the Blackhawk-based application
5.4 ETL layer for the Cruisecontrol-based application
5.5 ETL layer for the Perforce-based application
6.1 Report on execution time for different virtual java machines
6.2 Report on the percentage of passed tests for the ten most recent builds

List of Tables

5.1 Measures and dimensions acquired for defect tracking
5.2 Measures and dimensions acquired for test result tracking
5.3 Measures and dimensions acquired for continuous integration tracking
5.4 Measures and dimensions acquired for revision control tracking
6.1 Metrics that are easy to get from the source systems
6.2 Metrics that would require extra work to retrieve from source systems
Chapter 1
Introduction
1.1 Background
A major problem with software development is a lack of insight into how well the
project is going. The first problem is determining how to measure a project, the
second is being able to obtain those measurements, and the third is how to analyze
them.
Analysis of metrics is an area that is often underutilized in software engineering
because of a lack of expertise and tools. However, this is not so in the business
world, where data is modeled and analyzed along different dimensions to help
make critical business decisions that can determine profits or loss. These techniques
fall under the umbrella of “business intelligence”.
1.2 Problem
As my project for this thesis I will investigate the feasibility of applying a “traditional” business intelligence platform to the problem of analyzing the software
engineering process.
Today, many of the metrics gathered for analyzing the progress of software
projects are collected by hand (example in figure 1.1) and compiled through manual work in tools such as Microsoft Excel. Therefore, I will investigate the feasibility
of automating and enhancing this metrics collection process through the use of
business intelligence techniques.
1.3 Method
LucidEra, Inc. is developing a business intelligence software-as-a-service offering for midsize organizations that collects data from a company's internal systems as well as external ones; combines, aggregates and analyzes these; and finally presents them in an accessible way.
Figure 1.1. Manual metrics gathering in a spreadsheet
I will try to apply the service being developed by LucidEra to the gathering
and presentation of data relevant to software engineering, from sources such as bug
databases, testing frameworks and revision control systems, in order to support the
management of software engineering projects.
As data sources I will use the tools used in LucidEra's internal development
infrastructure, including Jira (for bug tracking), Perforce (for revision control),
Cruisecontrol (for continuous integration) and Blackhawk (for organizing and running recurring test suites).
For the design of the applications themselves, I will use standard data warehousing, relational database and OLAP (on-line analytical processing) techniques, such
as ETL (extract, transform, load), as well as SQL/MED connectivity features of the
LucidDB database.
1.4 Scope
I will primarily study what can be done just by “plugging in” an analytics platform to
the tools and workflows already being used in a software development organization,
and see what can be had “for free” or at least “for cheap”.
The reason for this is to lower the barrier to entry by keeping to a minimum the
work needed to adapt an organization's current way of doing things. Evangelical
texts along the lines of “we could improve a lot by totally changing the way we do
things” already exist.
In my conclusions I will, however, discuss the small incremental changes, the
low-hanging fruit, to engineering practices and conventions that could give big improvements in performance analysis.
Also outside of the scope of this thesis is lower-level source code analysis, such as
the generating of code complexity metrics. Although this would make an interesting
addition to the performance metrics I do cover, it has been investigated extensively
by others (including previous KTH master’s theses[16][19]), and it would detract
from the focus of this thesis.
1.5 Report outline
In chapter 2, I introduce the concept of business intelligence along with a background as to why better analytics is needed in the software industry, while chapter 3
gives a detailed specification of the analytics I wanted to realize. Chapter 4
introduces the tools and techniques used, and chapter 5 provides in-depth examples
of how these were actually used in the project. In chapter 6 I report what metrics
and analytics I was actually able to extract using this approach, and compare that
to the original aim along with some suggestions as to which next steps are possible.
Finally, in chapter 7 I present my conclusions and my recommendations for further
development in the area.
Chapter 2
Background
2.1 The concept of Business Intelligence
“Business Intelligence” is a common term for the gathering and analysis of an organization’s data in the pursuit of better business decisions[31]. Being somewhat
loosely defined, the term spans many technical disciplines from data warehousing
and customer relationship management, to high-end technical data analysis and
data mining. Big vendors in the field are SAS, Cognos, SAP and Business Objects.
Businesses in financial investment and trading have used business intelligence-related analytics in their competitive strategies for decades, but in recent years
these techniques have also started to spread into other industries such as consumer
finance, retail and travel[7].
2.2 The potential of engineering analytics
In the past decades, software has become an essential part of business and society.
In 2006, the revenues from software and software related services for the industry’s
500 biggest companies totalled over $380 billion[8].
Since 1994, the Massachusetts-based Standish Group has been tracking the outcomes of software projects in the US. Their first study[27], from 1994, reported
that only 16% of software projects were delivered on-time and on-budget, with
53% over-budget and over-time, and 31% terminated without completion. In 2000
the situation had improved[12] somewhat, with 28% of the projects within budget
and time constraints, 49% over-budget and over-time, and 23% scrapped.
Even so, it can be concluded that predictability in the software engineering process is still sorely lacking, especially considering the values at stake in the industry. There is great potential in having better ways of telling how well a software
project is progressing, and one possible path could be through the use of business
intelligence techniques. Having better predictability in the software engineering
process would lead to improvements in many areas, including:
• higher accuracy in cost and time estimates.
• better utilization, and wiser allocation, of project resources.
• closer understanding of business processes and organizations.
2.3 Challenges
Like any other context, however, the software engineering organization has certain
characteristics that are important to consider when applying analytics strategies.
2.3.1 Productivity is hard to measure
An area where business intelligence is widely used today is in sales organizations;
the value of individual sales representatives, their managers, and so on, are often
measured in terms of generated revenue. Selling is the major responsibility of sales
persons, and revenue is the point of selling. Thus, since revenue easily can be
expressed in actual numbers, sales person performance can be measured rather
well.
The problem with doing a direct transfer from this situation to that of, say,
a programmer, is that the responsibilities of a programmer are harder to measure by numbers. And worse, using bad metrics would drive the wrong type of
performance[5].
A major problem with measuring performance on a per-individual level is that it
encourages competition and discourages teamwork[5]. This may be fine in a situation where a large amount of competitiveness is a good thing, but in a teamwork-dependent organization, such as an engineering group, maximizing individual performance metrics may actually lead to worse performance for the team as a whole.
Therefore, this thesis will primarily evaluate business intelligence techniques for
performance analysis for a project or team as a whole rather than on a per-person
basis.
2.3.2 Software vendors get away with it
Microsoft announced[10] in 2004 that it would have to cut major features from the
coming revision to its Windows operating system in order to make the 2006 deadline. Still, in 2006, the update was delayed further[3] into 2007, with Microsoft
leaving PC manufacturers and resellers without the expected new Windows version
in the important holiday shopping season. However, even though this is the flagship
product of a company with a yearly revenue of $45 billion, the reaction to the delay
has been one of sighs and shrugs, at most, suggesting that this sort of production
hiccup is to be expected in the computer industry.
By comparison, when the new Airbus A380 passenger jet project hit its most
recent delay, pushing back initial deliveries by 6 months (October 2006 to April
2007), it caused a shareholder class action lawsuit that forced the resignations of
both Airbus’ and parent company EADS’ CEOs[2].
One possible explanation for why analytics has not caught on in the software
industry could be that people simply are more willing to accept problems related to
computers than they are in the case of other types of technology[21], and that this
has led to software vendors getting comfortable with the situation.
Considering, however, that the added cost of inadequate software testing infrastructure has been estimated[28] at $59.5 billion per year in the United
States alone, the upside of this state of affairs is that there is a significant competitive advantage to be seized by a company that manages to handle this problem
better.
Chapter 3
Aim
The specific aim of this thesis is to investigate how many of the metrics and analytics currently being gathered by hand for the management of a software project can
be acquired automatically through the use of common data warehousing techniques
and a business intelligence platform such as the one being developed by LucidEra.
I will start with a requirements specification; a wish list of the kind of metrics
and analytics we would like to have for the development organization at LucidEra.
Providing the metrics described below in an automated and accessible fashion will
be the goal for this project.
It is worth noting that this list of metrics was subject to change throughout the
actual implementation effort, due to discoveries of previously unknown limitations
as well as possibilities in the source data. This is to be expected during the development of an ETL (extract, transform, load) process[6].
I have grouped the metrics by different areas of analysis. To a large extent these
groups also correspond to the different data sources to be used:
• Defect tracking (bug report database) - section 3.1
• Test result tracking (test suite framework) - section 3.2
• Continuous integration tracking (automated build system) - section 3.3
• Revision control tracking (revision control system) - section 3.4
3.1 Defect tracking
For tracking defects, the plan was to connect to a bug tracking system. These bug
tracking services are usually centered around the filing of issues, concerning some
part of the product, and the assigning of these to someone for fixing. An issue can
be either open (not yet resolved) or closed (resolved), it can be of higher or lower
priority, and so on.
For my analysis purposes, I would want to extract metadata to allow me to break
down and/or filter these issues along at least the following dimensions:
priority: distinguish critical problems from minor issues
component: which part of the project is affected
reporter: who reported the bug
assignee: who is assigned to fix it
fixer: if fixed, who fixed it
3.1.1 Open bugs over time
Track trends over time to gauge whether we are nearing “readiness” for release.
When we are ready, we should see a downward slope in open bugs.
3.1.2 Cumulative open/closed plot
Also used to track trends: for a given day, how many bugs have been opened up to this point, and how many have been closed (plotted as two lines). In this case, we should see a smooth
convergence. Danger signs are[4]:
1. when lines do not converge and continue to increase (Endless bug discovery)
2. if we plateau but do not converge (Bug deficit)
3. if we see plateaus and inclines alternating (blurred project vision, may indicate inconsistency in bug filing/closing)
3.1.3 Open/close activity charts
See “open” and “close” actions per day. Towards the end of the release, the find rate
should decrease. Close rate should also decrease (since fixing/closing bugs also
may introduce new bugs).
3.1.4 Bug report quality
Breaking down the number of resolved bugs per their resolution (“fixed”, “will not
fix”, “invalid”, “can not reproduce”), etc, gives us an idea of the quality of the bug
reports.
A high percentage of “invalid” or “will not fix” could indicate that there is a
perception mismatch between developers and testers regarding what is a bug and
what is not. A high number of “can not reproduce” may indicate that the level of
detail in bug descriptions needs to improve, and so on.
3.1.5 Average resolution time
Measure the trend in the ability to resolve bugs promptly. If resolution time is lengthening, then either the bugs are getting harder, capacity is being exceeded, or there is some inefficiency in handling them.
3.2 Test result tracking
For an organization doing continuous automated testing, it will be valuable to be
able to analyze the results from these tests.
A central part of test results gathering is the management of the configuration
matrix (the possible environments that the project will be tested on), and since that
data is inherently multidimensional, it lends itself well to business intelligence-style dimensional analysis.
For test results, we will want to be able to break down and filter data along the
following dimensions:
build number: the number uniquely identifying a build
component: what software component was tested
results: success, failed, skipped, etc
configuration dimensions: operating system, JVM used, etc
3.2.1 Total test results over time
This provides simple high-level aggregate numbers of test results over time, and
should give the project team an estimate of the quality of the product. An
increase in tests indicates better coverage, and an increase in the success percentage
indicates higher known quality.
Ideally, at the end of a release we want to have both increasing test coverage
and an increasing success rate. A decrease in coverage combined with an increase in success rate is no comfort.
An increase in tests together with a decrease in success rate is normally expected, and over time
the success rate should then increase (if an increase in tests does not result in a decrease
in success rate, this may indicate that ineffective tests are being developed).
3.2.2 Performance comparisons
Though functional tests are not performance tests, they can be analyzed for performance differences between different configuration values.
For instance, show tests run on Solaris vs. tests run on Linux, or the top 10 disparities
(i.e. where Solaris is much faster, or conversely much slower).
Another common problem is finding out why a test run is taking so long, and for
this purpose it would be useful to be able to find the top 10 longest running tests.
3.2.3 Test execution reports
Test management reports are for tracking how many tests we want to run, and how
close we are to our goal.
This requires a test management system to be put in place, which includes the
ability to define an expected set of tests, and match them to configurations which
they should be run with, and at what frequency they should be run. You may for
instance want to have all tests run for a given milestone, but you may not require
that all tests run for all milestones. Some tests may be run once per release. Some
may be run every day.
3.2.4 Test development reports
Test development reports are for tracking what tests should be developed and when.
Tests will usually be defined in a test specification, and then be tried out manually
before being automated.
If tests executed manually are to be tracked, then development is in three
stages: 1) definition, 2) implementation for manual testing (this sometimes requires
some simple scripting or some dependency; if not, definition and implementation are the same thing), and 3) implementation for automated testing.
An important milestone may be to have some percentage of the expected tests
implemented by some date, and all of them complete by another. When a
test set is frozen, then success rate should climb to an acceptable level that satisfies
release criteria.
3.3 Continuous integration tracking
Doing builds on a regular schedule, nightly, daily or otherwise, is a common practice in software development. Analyzing data from such builds could provide important information about the complexity and maturity of the product. Interesting
things to break down by would be:
success: whether the build attempt was successful or not
last submitter(s): who made submissions since the last successful build
date: when the build was made
3.3.1 Number of runs per day
How many build attempts were made. This gives us an indicator of how busy the continuous integration system is. If the load is heavy and there is a high percentage of
failures, it might be an indicator that the code base should be branched.
3.3.2 Time to fix
Whenever we have a broken build, it is useful to measure how long it takes to get a
successful one. Broken builds slow down development progress, so it is important
to have a quick turnaround on build failures.
3.3.3 Number of new submissions per build
How many submissions were made to the revision control system in between one
build and the next. A high number of new submissions per build could be the cause
of a lower build success rate.
3.3.4 Number of changed files per build
How many files were changed between one build and the next. Compared to metric 3.3.3, this gives a better (albeit still not very good) indicator of how much code
was actually changed in between two builds.
3.3.5 Number of line changes per build
How many lines of code (where applicable) were changed between one build and
the next. Compared to metric 3.3.3 or 3.3.4, this gives a better indicator of how
much code mass was actually changed in between two builds.
3.4 Revision control tracking
The revision control system manages all the changes (bug fixes, new features) submitted by developers. It is a fundamental part of any software development organization. Relevant breakdowns of metrics gathered from the revision control system
would be:
change number: the number identifying the change
submitter: who made the submission
affected component(s): software component(s) affected by the changes
change type: type of change (add, edit, delete, etc)
date: date the change was made
3.4.1 Number of checkins
The number of checkins would give an estimate of developer activity on a “logical”
change level; one checkin often corresponds to a new feature, a new bug fix, and
so on.
3.4.2 Number of files changed per submission
The number of files changed per submission gives insight into the project’s file structure; how tightly coupled different parts of the repository are, how many files need
to be edited to make a certain kind of change. If the number is very high, we might
have a case of spaghetti-code.
3.4.3 Number of lines changed per submission
The number of lines changed can be used as a measure for how much actual work is
required to make a certain kind of change. If simple changes require big amounts of
code to be edited, it could be an indicator that the code structure and development
environment are not very powerful for the type of development in question.
3.4.4 Number of integrations
With multiple code branches, the number of integrations (points of merging between the codelines) is important to keep track of. Having integrations happen too
far apart increases the risk of having developers work with out-dated code.
Also, if the number of changes per integration is very small for an extended
period of time, it might be an indicator that the reason for splitting code into these
two branches is no longer applicable and that it could be worthwhile to permanently
merge them into one.
Chapter 4
Tools
4.1 Introduction to OLAP concepts
OLAP, or Online Analytical Processing[29], is an approach to the structuring of data
that makes sense to analyze by dimensions. Fundamental concepts in OLAP, and the
relations between them, are best illustrated by an example:
4.1.1 An example
Let’s say we want to analyze data describing the bug tracking database for a software project. A bug report has several properties, out of which a few could be:
reporter: someone reported the bug
assignee: someone is assigned to fix it
create date: at which point in time it was reported
priority: if the bug is critical or not
4.1.2 Measures
Among things interesting to analyze here is average priority, to see if the bugs in
our database, on average, are bad ones or not. In order to do this, we take the
“priority” property and couple that with an aggregation rule to calculate the average
for several bug reports. We now have a measure called “average priority”.
4.1.3 Dimensions
That gives us an average for how severe all of our bugs are, but what we don’t see
yet is whether this measure varies depending on the other properties. For example,
we could imagine that a plot of how the average priority measure changes over time
could be useful.
For that, we need dimensions. The “average priority over time” report can be
achieved by taking the “create date” metadata and using that as the basis for a
dimension.
In our example, we can also create “reporter” and “assignee” dimensions out of
those respective metadata, allowing reports like “average priority for bugs reported
by Amy and assigned to Bob, over time”.
4.1.4 Members
The dimensions are populated by members, essentially all the possible values that
the dimensions can take. For example, the report above suggests that “Amy” is a
member of the “reporter” dimension, and that “Bob” is a member of “assignee”.
4.1.5 Hierarchies
Many dimensions that we want to analyze data by are hierarchical by nature. Such
is the case with “create date”, for example; we might want to analyze per year, per
month, or per day. Thus, we define “create year”, “create month” and “create day”
as different levels of the “create date” dimension.
Figure 4.1. Hierarchies example: members of the "create day" level (such as 2006-10-24 and 2006-11-12) are children of members of the "create month" level (October 2006, November 2006), which in turn are children of members of the "create year" level (2006); the figure also marks the parent/child, ancestor/descendant, sibling and cousin relationships.
As shown in figure 4.1, members of the different levels are connected through
parent/child relationships: for example, “2006-10-24” of the “create day” level is
the child of “October 2006” of the “create month” level, while “2006” of the “create
year” dimension is the parent of “October 2006”. Both “2006-10-24” and “October
2006” are descendants of “2006”, and the other way around, “2006” is their ancestor.
Continuing the family analogy, two members on the same dimension level sharing the same parent are called siblings, while two members on the same level having
different parents but sharing the same grandparent are called cousins.
4.1.6 Cubes
For every combination of assignee, reporter and create date, it is possible for the
“average priority” measure to take on a different value.
This relation between measures and dimensions can be seen as a cube, with our
three dimensions as the axes X, Y and Z. For each cell (each possible combination
of values for X, Y and Z) this OLAP cube can hold different values for the different
measures.
4.1.7 Slicing
When looking at our cube, by deciding to keep the value of one of the dimensions
fixed (such as only analyzing bug reports reported by Amy) we limit the number of
remaining combinations to analyze.
In effect, we’re no longer looking at the whole cube, but rather just a slice. The
activity of breaking down results by different dimensions is therefore called slicing.
The cube analogy gets difficult to picture geometrically as the number of dimensions increases, so these concepts are easiest to visualize with no more than two or
three dimensions.
4.1.8 MDX
Multidimensional Expressions, or MDX, is a language specially geared towards
querying OLAP cubes, like SQL is in the case of relational databases. MDX was
invented at Microsoft in 1997 as part of Microsoft SQL Server 7.0[30], and has
since been adopted by most major business intelligence vendors[32].
SELECT [Measures].[Average priority] ON COLUMNS,
       [Create date].[2006].Children ON ROWS
FROM BugReportCube
WHERE ([Reporter].[Amy], [Assignee].[Bob])
Above is an example query for retrieving the average priority for all bugs filed
by Amy in 2006 that are assigned to Bob.
Simple MDX queries can look quite similar to SQL statements, but while SQL is
adamant about seeing precisely everything as a table, MDX has powerful features
for actually navigating the spatial structures of an OLAP cube, which makes many
things that are very hard to express in SQL easy to do in MDX.
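For comparison, a rough SQL counterpart of the MDX query above could look as follows, written against the star schema tables introduced in the next section (figure 4.2). This is only an illustrative sketch, not code from the actual applications; note how the month breakdown that MDX gets for free from the "create date" hierarchy has to be spelled out explicitly:

-- Average priority per month of 2006 for bugs reported by Amy and assigned
-- to Bob, expressed against the star schema of figure 4.2.
select c."MONTH",
       avg(f."PRIORITY") as "AVERAGE_PRIORITY"
from "BUGREPORT_FACT" f
  join "REPORTER_DIM" r on f."REPORTER_KEY" = r."ID"
  join "ASSIGNEE_DIM" a on f."ASSIGNEE_KEY" = a."ID"
  join "CALENDAR_DIM" c on f."CREATE_DATE_KEY" = c."ID"
where r."NAME" = 'Amy'
  and a."NAME" = 'Bob'
  and c."YEAR" = '2006'
group by c."MONTH"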
4.1.9 Star schema design
Normally, the multidimensional data model of an OLAP cube is an abstraction, realized through a relational database schema.
The two prevalent data warehousing schema designs for such cubes are snowflake
schemas and star schemas [29]. In this project I will use star schemas exclusively,
and will therefore give a brief introduction to these.
Star schemas consist of a fact table connected to separate dimension tables. In
practice, each cell in the cube is represented by one or more rows in the fact table,
and these fact table rows, or simply facts, are associated to dimension members
using foreign keys mapping to surrogate keys in the dimension tables. See figure
4.2 for what this would look like in our example. Note that measures are stored as
their own columns in the fact table and not in separate tables.
BUGREPORT_FACT (ID: integer, PRIORITY: integer, ASSIGNEE_KEY: integer, REPORTER_KEY: integer, CREATE_DATE_KEY: integer)
REPORTER_DIM (ID: integer, NAME: varchar(255))
ASSIGNEE_DIM (ID: integer, NAME: varchar(255))
CALENDAR_DIM (ID: integer, YEAR: varchar(4), MONTH: varchar(11), DAY: varchar(11))
Figure 4.2. Star schema example
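As an illustration of what this design amounts to in plain SQL, the schema of figure 4.2 could be declared roughly as below. The column types are taken from the figure; this is only a sketch, not the DDL used in the actual applications:

create table "REPORTER_DIM" (
    "ID" integer primary key,
    "NAME" varchar(255)
);
create table "ASSIGNEE_DIM" (
    "ID" integer primary key,
    "NAME" varchar(255)
);
create table "CALENDAR_DIM" (
    "ID" integer primary key,
    "YEAR" varchar(4),
    "MONTH" varchar(11),
    "DAY" varchar(11)
);
create table "BUGREPORT_FACT" (
    "ID" integer primary key,
    -- measure column, stored directly in the fact table
    "PRIORITY" integer,
    -- foreign keys pointing at the surrogate keys of the dimension tables
    "ASSIGNEE_KEY" integer references "ASSIGNEE_DIM",
    "REPORTER_KEY" integer references "REPORTER_DIM",
    "CREATE_DATE_KEY" integer references "CALENDAR_DIM"
);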
4.2 The platform
The LucidEra platform for analytics applications consists of a number of components, out of which major parts (Mondrian, LucidDB) are open-source. The numbered steps in figure 4.3 illustrate how the different parts of the platform interact
when responding to a report request from a user. In this section I will briefly introduce the different parts of the platform.
4.2.1 Clearview
Clearview is the Ajax-based (Asynchronous JavaScript and XML) graphical user interface for the LucidEra platform and
the point of interaction with the end user. It puts advanced analytics in the hands
of users without prior knowledge of databases or MDX, by allowing them to create
queries through the dragging and dropping of graphical elements.
4.2.2 Mondrian
Mondrian[26] is an on-line analytical processing (OLAP) server. In the LucidEra
platform it essentially takes the role of translator between MDX and SQL; it allows
us to define an OLAP cube as a schema in a regular relational database.

Figure 4.3. The LucidEra application platform and ETL layer: a report request from the user goes to Clearview, which sends MDX queries to Mondrian; Mondrian translates these into SQL queries against LucidDB and returns MDX results, from which the final report is produced. The star schema in LucidDB is populated by the ETL layer, which extracts data from sources such as MySQL, Oracle and CSV files through their respective adapters, transforms it, and loads it.
4.2.3 LucidDB
LucidDB[20] is a relational database tailored for use in business intelligence applications. It is built from the Eigenbase platform (Eigenbase Foundation, http://www.eigenbase.org) for modular data management systems, and is released as open source under the GNU Public License (http://www.opensource.org/licenses/gpl-license.php).
4.2.4 Data adaptors
LucidDB implements parts of the SQL/MED standard[11] for external connectivity,
which gives application developers a convenient way of accessing various kinds of
structured data using plain SQL.
Data from foreign servers, such as structured text files, or relational databases
such as Oracle, MySQL or Microsoft SQL Server, are connected to using SQL/MED
foreign data wrappers, which make them show up as foreign tables. These foreign
tables can then be accessed by regular SQL statements just like any other table.
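As an illustration of the general SQL/MED pattern (the wrapper name and option keys below are placeholders, not LucidDB's actual identifiers), connecting to a foreign database and querying one of its tables follows this shape:

-- Declare a server backed by a foreign data wrapper (placeholder names).
create server jira_oracle_server
    foreign data wrapper placeholder_oracle_wrapper
    options (url 'jdbc:oracle:thin:@dbhost:1521:jiradb');

-- Expose one of the remote tables as a foreign table.
create foreign table "JIRAISSUE_EXT"
    server jira_oracle_server
    options (schema_name 'JIRA', table_name 'JIRAISSUE');

-- The foreign table can then be queried like any local table.
select count(*) from "JIRAISSUE_EXT";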
4.3 Analytics application development tools
The above components make up the platform for the hosting and the delivery of
a business intelligence application. However, for the development of the analytics
applications themselves, I have used a couple of additional LucidEra in-house tools,
the Workbench and ALS.
4.3.1 The Workbench
The Workbench is the development environment for applications running on the
platform, and is based on the Eclipse platform (an open-source development platform, http://www.eclipse.org). Creating an application entails
creating views, tables, queries and foreign data connectors, and the Workbench
allows me to do all this from a single environment.
4.3.2 Application Lifecycle Service
The Application Lifecycle Service (ALS) server manages the contact between the
Workbench and the delivery platform described in section 4.2, where the app gets
deployed. An application is created in the Workbench, and when the user decides
to deploy it to the platform, it is the ALS server that does the work.
4.4 Data sources
For each of the analysis areas (test results, continuous integration, defect tracking,
revision control), there are many different frameworks available on the market. For
simplicity, I decided to go with the ones already in use at LucidEra. I describe these
in this section.
However, other comparable frameworks would provide more or less the same
functionalities, and the applications I have developed in this project would be reasonably adaptable to other data sources of the same kind. It should not be too hard
to replace Jira with Bugzilla (http://www.bugzilla.org/), for example, or Perforce with Subversion (http://subversion.tigris.org/).
4.4.1 Blackhawk
Blackhawk[9] is a pluggable test harness that allows for dynamic composition of
test suites based on test meta-data, parameterization of test cases, and multi-process
management. It supports JUnit (a unit testing framework for the Java programming language) and HTTP protocol testing, and has a pluggable architecture for connecting to other test harnesses. Blackhawk is open-sourced under the Apache License Version 2.0 (http://www.opensource.org/licenses/apache2.0.php) as part of the Apache Beehive project.
LucidEra uses Blackhawk for automating the nightly execution of unit test suites,
in order to see that features work as intended and to make sure that fixed bugs don’t
reappear.
4.4.2 Cruisecontrol
Cruisecontrol[25] is a framework for continuous builds and integrations. It triggers
builds to be performed at certain events (such as changes in the revision control
system), and then notifies various listeners of the results thereof. Cruisecontrol is
open-sourced under a BSD-style license (BSD License: http://www.opensource.org/licenses/bsd-license.php; CruiseControl license: http://cruisecontrol.sourceforge.net/license.html).
LucidEra uses Cruisecontrol for automating the build process. Whenever a build
dependency (such as a source file) has changed, Cruisecontrol schedules builds for
its dependents.
4.4.3 Jira
Jira[1] is a bug tracking, issue tracking, and project management application developed by Australia-based Atlassian Software. Jira is proprietary to Atlassian Software.
LucidEra uses Jira to track bugs, feature requests, tasks, to-do items, and so on.
4.4.4 Perforce
Perforce[23] is a revision control system, developed by Perforce Software, Inc. The
Perforce system is based on a server/client architecture, with the server performing
version management for the individual files in a central repository, to which the
clients submit changes. Perforce is proprietary to Perforce Software, Inc.
LucidEra uses Perforce for managing all its source code.
Chapter 5
Method
5.1 Introduction to application design
Developing an analytics application for the LucidEra application stack is essentially
about solving the problem of taking information from various sources and structuring it in a database schema that lends itself well to answering interesting questions
using the data.
This poses several challenges:
1. How to access the raw data to be analyzed.
Data can come from all sorts of sources and in all sorts of formats, but I need
to access it without requiring too much human assistance.
2. How to design the resulting database schema.
I am building an analytics application to answer questions using data. The
user asking the questions does this using a graphical interface (essentially an
MDX query generator), which has a limited scope as to which questions are
possible to ask.
Thus, it is not enough that the data is there in the schema, the schema must
also be designed in such a way that the answers are made available through
these limited queries.
3. How to transform the raw data into this schema.
When the data sources to be analyzed are identified, and a schema that would
provide the user with answers to the questions we are interested in is defined,
I need to figure out how to get from raw data to well-ordered schema.
Next, I will give a general introduction to how I have performed data extraction
and transformation for my applications, and after that I will give brief descriptions
of how these were constructed in each specific case of the four analysis applications.
5.1.1 Extraction
By using SQL/MED data adaptors (see section 4.2.4), LucidDB can access a variety
of data sources as regular relational database tables. For the apps implemented in
this project, I used data adaptors for Oracle (in the cases of Jira and Blackhawk) and
CSV (comma separated value) flatfiles (in the cases of Cruisecontrol and Perforce).
This allowed me to keep the transformation layer (see below) strictly in SQL,
even though the data comes from different kinds of sources.
5.1.2 Transformation
Once the connections to the data sources to be used for analysis are set up, the ETL
(Extract, Transform, Load) layer takes over. The ETL layer is the set of rules for
transforming and loading the raw data into an ordered schema suited for analysis.
Creating an ETL layer is basically answering the question of how to get from
point A to point B, with point A being the raw data to be analyzed, and point B
being a schema that meets a number of requirements. This proved to be the most
time consuming part of analytics application development, by far.
Two typical patterns I used when creating the ETL layer were normalization, for example when creating dimensions out of a central table, and de-normalization, for example when creating measures out of a normalized relational database. I will give
real-world examples of both these patterns being used in the cubes described below.
5.1.3 Star schema
The result of the ETL layer is the star schema. For each of the data sources used, I
will list which measures and dimensions I was able to extract.
5.2 Defect tracking - Jira
Jira stores its information in a relational database; the particular instance used at
LucidEra runs on an Oracle server. For extracting data from the database I used
LucidDB’s ready-made Oracle adapter.
5.2.1 Extraction
The Jira database schema is a fairly normalized one, with a table JIRAISSUE, where
each row describes an issue, as the hub. The structure of the schema is already
geared towards analysis, with columns in the JIRAISSUE table containing keys to
rows in tables that can be used as OLAP dimensions without too much need for
alteration.
5.2.2 Transformation
Using a traditionally normalized relational database schema such as the one in Jira
as a data source for a Mondrian OLAP cube makes creating dimensions easy (since
the structure is already that of a central hub table containing keys to data in other
tables), but getting the data for the actual measures in place required me to do denormalization in the cases where that data is stored in separate tables. The resulting
data flow for the Jira application’s ETL layer is illustrated by figure 5.1.
Figure 5.1. ETL layer for the Jira-based application: the normalized Jira tables are extracted from Oracle through the Oracle adapter into LucidDB, transformed, and loaded into the star schema (fact table plus dimension tables).
To give a concrete example of when denormalization was used: I wanted a measure
’average time to resolve’, so for each issue (each row in the JIRAISSUE table) I
had to get the date when the status of that issue was set to “Resolved”. However,
that data is not there in the hub JIRAISSUE table, but is sitting in a table called
CHANGEITEM, where each row belongs to one row in the CHANGEGROUP table,
where, in turn, each row belongs to one row in the JIRAISSUE table. See figure 5.2.
Also, a single issue may have been set to “Resolved” several times (it might have
been re-opened if an attempted fix didn’t work), so I only want to get the latest
resolution date.
JIRAISSUE (ID: integer, PKEY: varchar(255), PROJECT: integer, REPORTER: varchar(255), ...)
CHANGEGROUP (ID: integer, ISSUEID: integer, AUTHOR: varchar(255), CREATED: date)
CHANGEITEM (ID: integer, GROUPID: integer, FIELDTYPE: varchar(255), FIELD: varchar(255), ...)
Figure 5.2. Example of normalization in the Jira source schema
First, I join the CHANGEITEM table to CHANGEGROUP, putting all changes into one and the same view. This could be done more simply just for the case of the latest resolution date, but this way I can reuse the view for other similar de-normalizations:
create view "JIRACHANGES_VIEW" as
select "CHANGEITEM"."ID",
       "ISSUEID",
       "CREATED",
       "FIELDTYPE",
       "FIELD"
from "CHANGEGROUP" left join "CHANGEITEM"
  on "CHANGEGROUP"."ID" = "CHANGEITEM"."GROUPID"
Second, I find out which is the latest resolution change for each ISSUEID:
create view "LATEST_RESOLUTION_DATE_VIEW" as
select "ISSUEID",
       max("CREATED") as "RESOLUTION_DATE"
from "JIRACHANGES_VIEW"
where "FIELDTYPE" = 'jira' and "FIELD" = 'resolution'
group by "ISSUEID"
Third, to use this extracted resolution date as a measure, I need to get it into the
fact table, which calls for another join:
create view "LE_FACT_JIRA" as
select "JIRAISSUE"."ID",
       "JIRAISSUE"."REPORTER",
       "JIRAISSUE"."ASSIGNEE",
       "JIRAISSUE"."PRIORITY",
       "LATEST_RESOLUTION_DATE_VIEW"."RESOLUTION_DATE"
from "JIRAISSUE" left join "LATEST_RESOLUTION_DATE_VIEW"
  on "JIRAISSUE"."ID" = "LATEST_RESOLUTION_DATE_VIEW"."ISSUEID"
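Count-style measures such as the open and closed issue counts listed in section 5.2.3 can then be realized on top of a fact view like this by adding indicator columns for the OLAP cube to sum up. The view below is only a sketch of that general pattern, not the project's actual view definition; in particular, treating a missing resolution date as "open" is a simplifying assumption, since the real data also carries an explicit issue status:

create view "LE_FACT_JIRA_COUNTS" as
select "ID",
       -- every issue counts once towards the total
       1 as "ISSUE_COUNT",
       -- issues without a resolution date are treated as still open
       case when "RESOLUTION_DATE" is null then 1 else 0 end as "OPEN_COUNT",
       case when "RESOLUTION_DATE" is not null then 1 else 0 end as "CLOSED_COUNT"
from "LE_FACT_JIRA"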
5.2.3 Star schema
With the Jira relational database schema as the data source, I was able to construct
an OLAP cube with the measures and dimensions listed in table 5.1.
Measures
  Avg. days to resolve: Avg. number of days to resolve.
  Avg. priority: Avg. issue priority.
  Issue count (all): Number of issues.
  Issue count (open): Number of open issues.
  Issue count (closed): Number of closed issues.
  Closed % of total: Percentage of issues closed.
Dimensions
  Issue type: Type of issue (bug, task, etc).
  Reporter: Person who reported the issue.
  Assignee: Person assigned to resolve the issue.
  Issuestatus: Issue status (open, resolved, etc).
  Priority: Bug priority.
  Resolution: Kind of resolution (fixed, invalid, etc).
  Issue name: Name of issue.
  Project: Which project affected.
  Component: Which subcomponent affected.
  Creation date: Date of creation.
  Resolution date: Date of resolution.
  Modify date: Date of last modification.
Table 5.1. Measures and dimensions acquired for defect tracking
5.3 Test result tracking - Blackhawk
Blackhawk is used for automatically running big sets of test cases. They can be unit
tests, testing some specific low-level functionality, or higher-level tests testing larger
parts of the platform.
Individual tests are organized in suites, and each set of suites is then run with
different configurations (affecting operating systems, JVMs, optimization levels).
For each test run, the results are gathered in a report covering pass/fail status and
duration for each individual test and configuration. This report is then stored in a
database.
5.3.1 Extraction
The Blackhawk server at LucidEra stores its results in an Oracle database. This
made the extraction straightforward: all that needed to be done was to again utilize
LucidDB's Oracle adapter to connect to this foreign server using SQL/MED and then
copy the data using normal SQL INSERTs.
5.3.2 Transformation
The test report data in the Oracle server is stored in one single table. To create
dimensions and make a star schema out of this, I had to move dimension data into
separate tables, so a certain amount of normalization was needed.
Through this normalization process I use SQL SELECTs and INSERTs to transform the big all-in-one table into a star schema. Data is moved into separate tables,
connected to the fact table using foreign keys, while some data, that which is to be
used as measures, is left in the fact table. The resulting ETL process can be seen in
figure 5.3.
Figure 5.3. ETL layer for the Blackhawk-based application: the single NIGHTLY_RESULTS table (columns such as OPENCN, PLATFORMCN, SUCCESS, DURATION, TESTNAME, FOREIGNDB) is extracted from Oracle through the Oracle adapter, normalized into dimension tables and a fact table, and loaded into the star schema.
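The normalization itself can be sketched in plain SQL along the following lines. The staging and dimension table names are made up for illustration, only a few of the NIGHTLY_RESULTS columns are shown, and the dimension table is assumed to generate its own surrogate key, so this is not the application's actual ETL code:

-- 1. Copy the foreign Oracle table into a local staging table.
insert into "NIGHTLY_RESULTS_STAGING"
select * from "BLACKHAWK_ORACLE"."NIGHTLY_RESULTS";

-- 2. Build a dimension table from the distinct values of one source column.
insert into "TESTNAME_DIM" ("NAME")
select distinct "TESTNAME" from "NIGHTLY_RESULTS_STAGING";

-- 3. Build the fact table: replace the raw column with a foreign key to the
--    dimension's surrogate key, and keep the measure columns as they are.
insert into "TESTRESULT_FACT" ("TESTNAME_KEY", "SUCCESS", "DURATION")
select d."ID", s."SUCCESS", s."DURATION"
from "NIGHTLY_RESULTS_STAGING" s
  join "TESTNAME_DIM" d on s."TESTNAME" = d."NAME";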
5.3.3 Star schema
With the Blackhawk results table as the data source, I was able to construct an OLAP
cube with the measures and dimensions in table 5.2.
Measures
  Tests Run: Number of tests run.
  Success Count: Number of successful tests.
  Duration: How long it took.
  Fail Count: Number of failed tests.
  Success %: Percentage of tests successful.
Dimensions
  Open Changeno: Open source components build label.
  Platform Changeno: Closed source components build label.
  Known failure: To filter out things known to be broken.
  Foreign DB: Which foreign database was used.
  Test Suite: Which suite of tests.
  Test Name: Name of the test.
  CPU: CPU architecture used.
  JVM: JVM used, Sun Hotspot or BEA JRockit.
  Optimization: Compiler optimization level.
  Operating System: Operating system used.
  Test date: Date of test run.
Table 5.2. Measures and dimensions acquired for test result tracking
5.4 Continuous integration tracking - Cruisecontrol
Cruisecontrol is used for scheduling and performing automatic builds whenever the
code base changes. When a build attempt is made, Cruisecontrol keeps track of the
changelists submitted since the last successful build. If the new build is successful,
it is given a unique ID equal to the revision control system submission number of
the last submission included. The logs from each build attempt are stored as XML
files on the build server.
An example chain of events could be where developer Amy submits change 3441
to the revision control system. Cruisecontrol notices that build dependencies have
changed since the last successful build, and schedules a new build attempt to be
made. Before the build attempt starts, however, Bob submits change 3442. This
will have Cruisecontrol include both changes 3441 and 3442 in the build attempt.
The build is successful, so Cruisecontrol labels the build as number 3442, saves the
log from this attempt, and waits for the next submission to be made.
5.4.1 Extraction
Cruisecontrol logs are stored as XML files:
<?xml version="1.0" encoding="UTF-8"?>
<cruisecontrol>
  <modifications>
    <modification type="p4">
      <date>10/11/2006 14:06:07</date>
      <user>ogothberg</user>
      <revision>4072</revision>
      <email>ogothberg@lucidera.com</email>
      <client>ogothberg.wabeeno.lucidera</client>
    </modification>
    <modification type="p4">
      <date>10/11/2006 14:14:40</date>
      <user>ogothberg</user>
      <revision>4073</revision>
      <email>ogothberg@lucidera.com</email>
      <client>ogothberg.wabeeno.lucidera</client>
    </modification>
  </modifications>
  <info>
    <property name="projectname" value="platform"/>
    <property name="builddate" value="10/11/2006 14:55:46"/>
    <property name="label" value="4074"/>
    ...
  </info>
  <build time="46 min 11 sec">
    ...
  </build>
</cruisecontrol>
LucidDB currently doesn't have a ready-made adapter for XML files, so in order
to make the Cruisecontrol logs importable, I created an XSL (Extensible Stylesheet
Language) script to convert them into comma-separated value (CSV) files:
"LABEL","PROJECT","BUILDDATE","DURATION","ERROR","CHANGENO","CHANGEDATE","USERNAM..
"4074","platform","10/11/2006 14:55:46","46 min 11 sec","","4073","10/11/2006 14:..
"4074","platform","10/11/2006 14:55:46","46 min 11 sec","","4072","10/11/2006 14:..
As can be seen in this example piece of log data, each row in the table corresponds to a revision control system submission, and there can be several such
submissions for each build.
[Figure: Cruisecontrol XML log files are converted to CSV with XSL, extracted into LucidDB through the flatfile adapter, transformed, and loaded into a star schema.]
Figure 5.4. ETL layer for the Cruisecontrol-based application
5.4.2 Transformation
Once the CSV files are imported as tables in the database, the transformation process for Cruisecontrol data from source table to star schema, as illustrated by figure
5.4, is very similar to that for Blackhawk. That is, all data is stored in a single table,
which requires normalization in order to create dimension tables.
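As a sketch of the import step, assuming LucidDB's flatfile adapter exposes a directory of CSV files as foreign tables (the wrapper name, options and identifiers below are assumptions for illustration):

    -- expose the directory of XSL-generated CSV files as a foreign server
    CREATE SERVER cc_logs
    FOREIGN DATA WRAPPER "FLATFILE_ADAPTER"
    OPTIONS (DIRECTORY '/var/cruisecontrol/csv/',
             FILE_EXTENSION 'csv',
             WITH_HEADER 'yes');

    -- copy one log file's rows into a local staging table
    INSERT INTO etl.cruisecontrol_builds
    SELECT * FROM cc_logs.csv_files."platform_builds";

From the staging table, the normalization into dimension and fact tables follows the same SELECT/INSERT pattern as sketched for Blackhawk in section 5.3.2.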
5.4.3 Star schema
Using Cruisecontrol logs as the data source enables the construction of an OLAP cube with
the measures and dimensions listed in table 5.3.
Measures
Build count (all)          Number of builds made.
Build count (successful)   Number of successful builds.
Build count (failed)       Number of failed builds.
Success %                  Percentage of builds successful.
Avg. number of changes     Avg. number of included changelists.
Avg. duration              Avg. time to do a build.

Dimensions
Label                      Build label.
Project                    Which project.
Build timestamp            Time of build.
Success                    If the build was successful or not.
Change timestamp           Time of submitted Perforce change.
Change author              Author of Perforce change.

Table 5.3. Measures and dimensions acquired for continuous integration tracking
5.5 Revision control tracking - Perforce
With Perforce, I chose to interface directly through the changelogs as generated by
its end-user command-line utility. The advantage of this approach is that anyone
with a Perforce account will be able to use the analytics application, as it doesn't
require direct access to the central repository server.
5.5.1 Extraction
A submission to the Perforce server consists of a changelist (one or more files added,
edited or deleted) along with metadata such as a change number that uniquely identifies
the change, the client submitting the change, a description, and a timestamp. Perforce
allows clients to retrieve a description for a specific changelist using the p4
describe command:
Change 7688 by ogothberg@ogothberg.wabeeno.eigenbase on 2006/09/18 01:33:35

        LUCIDDB: new UDP for calculating statistics for everything in a schema LER-1820

Affected files ...

... //open/lu/dev/luciddb/initsql/installApplib.ref#21 edit
... //open/lu/dev/luciddb/initsql/installApplib.sql#18 edit
... //open/lu/dev/luciddb/test/sql/udr/udp/test.xml#9 edit
Data in these change descriptions is nicely structured, but to be able to import
it into LucidDB we have to transform it into a comma-separated-value (CSV) file:
CHANGELIST_NUMBER,FILE,REVISION_NUMBER,ACTION,AUTHOR,WHEN
7688,//open/lu/dev/luciddb/initsql/installApplib.ref,21,edit,ogothberg,2006-09-18..
7688,//open/lu/dev/luciddb/initsql/installApplib.sql,18,edit,ogothberg,2006-09-18..
7688,//open/lu/dev/luciddb/test/sql/udr/udp/test.xml,9,edit,ogothberg,2006-09-18..
The process of extracting descriptions from Perforce and transforming them into
CSV files like the one above is automated using sed1, awk2, regular expressions and shell
scripting.
5.5.2 Transformation
As the CSV file format used for import suggests, the data from Perforce, like that
from Blackhawk and Cruisecontrol but in contrast to that from Jira, had to be
normalized to fit the star schema required for OLAP analysis. The data flow is shown
in figure 5.5.
[Figure: Perforce changelog text files are converted to CSV with sed/awk, extracted into LucidDB through the flatfile adapter, transformed, and loaded into a star schema.]
Figure 5.5. ETL layer for the Perforce-based application
5.5.3 Star schema
With Perforce changelogs as the data source, I was able to construct an OLAP cube
with the measures and dimensions listed in table 5.4.
1 Sed, http://en.wikipedia.org/wiki/Sed
2 Awk, http://en.wikipedia.org/wiki/Awk
Measures
Change count (all)         Number of file changes.
Submission count (all)     Number of submissions.
Avg. revision number       Avg. revision number for changed files.

Dimensions
Changelist ID              Unique ID of submission.
Action                     Action performed (add, edit, etc).
Author                     Submission author.
Project                    Affected project.
Component                  Affected project subcomponent.
Submit date                Date of submission.

Table 5.4. Measures and dimensions acquired for revision control tracking
Chapter 6
Results
To give the reader a better feel for what analytics applications on the LucidEra
stack actually look like, and a better understanding of the real-world usage of
OLAP measures and dimensions, I will start out this results chapter with a couple
of screenshots from example reports, in section 6.1. The screenshots are from a
pre-beta version of Clearview (LucidEra’s graphical MDX-generating user interface,
see section 4.2.1).
Regarding the actual aim of this thesis, a sizeable part of the analytics from my
wish list in chapter 3 is fairly easy to obtain, while a few would require more work.
In section 6.2, I will compare the original aim to what I was actually able to acquire
through the approach described in chapter 5.
With the metrics in section 6.2 as a springboard, there are a number of
possible "next steps": metrics that should be possible to get but that would require
alterations or additions to existing systems or workflows. I present a few
suggestions in section 6.3.
6.1 Example reports
6.1.1 Performance comparison between Java virtual machines
With the test result tracking application developed in section 5.3, I was able to compare how long the same test suite takes to run in different server environments (number
3.2.2 in my metrics list).
Figure 6.1 shows a report comparing the differences in execution time between
two Java virtual machines. From an end-user perspective, the report was created by
dragging and dropping measures and dimensions from the left-hand panel to the
larger right-hand report area. The report shows one measure, “Duration”, and two
visible dimensions: “JVM Name” on columns, and hierarchical dimension “Test” on
rows. Note that dimension “Test” has two levels: “Test Suite” and “Test Name”.
Two additional dimensions are used for filtering to create this report: “Operating
System” (for filtering out only tests run on the RedHat 4 operating system) and
“Platform Changeno” (for filtering out only the latest build available).
Figure 6.1. Report on execution time for different Java virtual machines.
6.1.2 Test success monitoring for recent builds
Again from the test result tracking application, figure 6.2 shows a bar chart of
the percentage of successful automated tests for the ten most recent builds, and
the differences between two operating systems (RedHat 4 and Windows
2003). It uses the measure "Success %" on the X-axis and the two dimensions "Platform
Changeno" and "Operating System" on the Y-axis.
Both dimensions shown in this chart are also used as filters for the data in it,
“Platform Changeno” for filtering out only the 10 most recent build numbers, and
“Operating System” for filtering out only RedHat 4 and Windows 2003.
6.2 Compared to the original aim
In this section, I will compare the metrics I was able to get from my applications to
what I initially aimed for with my list of metrics in chapter 3.
6.2.1 Metrics that were easy to get
Many simple but useful metrics are easy to retrieve from the source systems, requiring only fairly simple ETL processes and no changes to infrastructure or workflows.
Basically, these are what a software engineering team should be able to expect "for
free" out of their in-place systems.
Figure 6.2. Report on the percentage of passed tests for the ten most recent builds.
Comparing the kind of measures and dimensions I was able to extract from the
source systems (see tables 5.1, 5.2, 5.3, 5.4) to the specific metrics wished for in
chapter 3, table 6.1 lists those that can be deemed easy to get.
3.1 Defect tracking
3.1.3   Open/close activity charts.
3.1.4   Bug report quality.
3.1.5   Average resolution time.

3.2 Test result tracking
3.2.1   Total test results over time.
3.2.2   Performance comparisons.

3.3 Continuous integration tracking
3.3.1   Number of runs per day.
3.3.2   Time to fix.
3.3.3   Number of new submissions per build.
3.3.4   Number of changed files per build.
3.3.5   Number of line changes per build.

3.4 Revision control tracking
3.4.1   Number of checkins.
3.4.2   Number of files changed per submission.
3.4.3   Number of lines changed per submission.
3.4.4   Number of integrations.

Table 6.1. Metrics that are easy to get from the source systems.
It should be noted that a couple of the metrics listed, 3.3.4 and 3.3.5, although
they could not be extracted from continuous integration tracking data alone, would
be easy to get through a bit of cross-referencing with the revision control system,
and are therefore well within the “easy” group.
6.2.2 Metrics that would require more work
Some of the metrics (shown in table 6.2) we hoped to have are currently not possible to retrieve because the data simply is not there (3.2.3 and 3.2.4), while others
proved to be harder to get than expected due to limitations in the analytics
platform (3.1.1 and 3.1.2). Implementing the changes to infrastructure
and/or workflows required to obtain these would be some work, but not out of
reach.
3.1 Defect tracking
3.1.1   Open bugs over time.
3.1.2   Cumulative open/close plot.

3.2 Test result tracking
3.2.3   Test execution reports.
3.2.4   Test development reports.

Table 6.2. Metrics that would require extra work to retrieve from source systems.
Problems with cumulative measures
Due to the way that queries are executed against the star schema, reports such
as “Open bugs over time” (3.1.1) or “Cumulative open/close plot” (3.1.2) would
require workarounds to get working.
The reason for this is that even though each row in the star schema's fact table
has a "create date" and a "close date", which does enable metrics such as "Open/close
activity charts" (3.1.3), the OLAP server does not allow joins on conditions
other than "equals". That is, asking the schema "I would like the number of facts
opened on the date D" is possible, but asking "I would like the number of facts
opened before, but closed after, the date D" is currently not.
A possible, but ugly, workaround could be to have a stored procedure in the
database "explode" the fact table temporarily for this kind of query (creating a
duplicate row for each value in the interval) to allow such metrics even when only
joining on "equals".
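To make the idea concrete, such a procedure could populate a helper fact table with one row per bug and per day the bug was open, after which "open bugs over time" becomes an ordinary equi-join and group-by. The sketch below uses illustrative table and column names and assumes the calendar dimension carries a real DATE column in addition to its surrogate key:

    -- "explode": one row per (bug, day) for every day between create and close date
    INSERT INTO star.bug_open_day_fact (bug_id, date_key)
    SELECT b.id, d.id
    FROM star.bugreport_fact b
    JOIN star.calendar_dim cr ON cr.id = b.create_date_key
    LEFT JOIN star.calendar_dim cl ON cl.id = b.close_date_key
    JOIN star.calendar_dim d
      ON d.calendar_date >= cr.calendar_date
     AND (cl.calendar_date IS NULL OR d.calendar_date <= cl.calendar_date);

    -- "open bugs over time" then needs nothing but equi-joins
    SELECT c.year, c.month, c.day, COUNT(*) AS open_bugs
    FROM star.bug_open_day_fact f
    JOIN star.calendar_dim c ON c.id = f.date_key
    GROUP BY c.year, c.month, c.day;

The obvious drawback is the size of the exploded table, which is why this remains a workaround rather than a recommended design.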
Metrics that would require infrastructure improvements
Metrics such as "Test execution reports" (3.2.3) and "Test development reports"
(3.2.4) are technically simple to retrieve from the source data, but since they are
based on comparisons with target data (the number of tests expected to be running,
the number of tests expected to be developed) supplied by users, they would require
infrastructure (extensions to the software) that allows users to set these targets.
6.3 Next steps
With the metrics I was able to provide using my analytics applications, there are
a number of possible next steps. In this section, I will cover a few suggestions for
further development.
6.3.1 Possible with additional data sources
Test development effectiveness
By incorporating data from code coverage1 systems such as Emma2, it would be
possible to measure code coverage as a function of test cases, to get a metric for the
effectiveness of the engineering team’s test development efforts.
Code complexity measures
By combining revision control data with code complexity measures (such as cyclomatic complexity), one could try to correlate changes in code complexity with
project events.
For example, do jumps in complexity correlate with large bugfixes? Or
do large bugfixes lower complexity (perhaps large fixes indicate people simplifying
logic to fix problems)?
Another question: does a higher rate of change result in worse complexity
than the same amount of change spread over a longer period of time (i.e. when the team is in
a hectic "crunch" period of high workload, how much worse is the code produced)?
6.3.2 Possible with better cross-referencing
Issues affected per submission
Keeping track of how submissions relate to issues in the bug database would make
for interesting analysis of the team’s effectiveness. For example, how long it takes
from the point when a bug is filed until a first attempt to fix it is made, or how many
submissions are required to fix a bug.
Given a reliable way to link revision control system submissions to issues in
the bug database, a metric like "Known issues affected per submission" should
be possible to get. The simplest way would be to have developers always enter the
bug database issue ID in the description of every submission they make; the problem
is that having this done consistently enough to base analysis on would require a very
disciplined group of developers.
However, there is software available on the market to make this kind of linking
happen more reliably, such as the Perforce plugin3 for Jira.
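If such a link existed, the analysis itself would be simple. The sketch below assumes that the ETL has already parsed an issue ID (for example "LER-1820") out of each submission description into an ISSUE_ID column, and that it matches the issue key in the bug report fact table; all names are illustrative:

    -- per bug: how many submissions referenced it, and when the first one was made
    SELECT b.issue_key                      AS bug,
           COUNT(DISTINCT p.changelist_id)  AS submissions,
           MIN(c.calendar_date)             AS first_fix_attempt
    FROM star.bugreport_fact b
    JOIN star.perforce_fact  p ON p.issue_id = b.issue_key
    JOIN star.calendar_dim   c ON c.id = p.submit_date_key
    GROUP BY b.issue_key;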
1 Code coverage is a measure of the degree to which the source code is tested.
2 Emma is an open source tool for Java code coverage.
3 Perforce plugin for Jira, http://confluence.atlassian.com/display/JIRAEXT/JIRA+Perforce+Plugin
File fragility
Getting a metric for how likely it is to break things by editing a certain file could
be useful for identifying weak points in the source dependencies, basically a file
fragility index.
By cross-referencing submission statistics from revision control data and build
breakage data from continuous integration tracking, this should be possible to get.
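A sketch of such a cross-reference, assuming a build fact table at (build, included changelist) grain as in the Cruisecontrol CSV, joinable to Perforce data on the changelist number, with build success stored as a 0/1 flag (all names are illustrative):

    -- per file: how many submissions touched it, and how many of the builds that
    -- included those submissions failed
    SELECT p.file,
           COUNT(DISTINCT p.changelist_id)                            AS submissions,
           COUNT(DISTINCT CASE WHEN b.success = 0 THEN b.label END)   AS broken_builds
    FROM star.perforce_fact     p
    JOIN star.build_change_fact b ON b.changeno = p.changelist_id
    GROUP BY p.file;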
Chapter 7
Conclusions
This thesis has investigated the viability of using common business intelligence techniques in order to analyze the software engineering process.
Using multiple software-based tools, such as revision control systems and bug
databases, to manage engineering projects is already a widespread practice. Therefore,
introducing another one should be possible, provided it can add substantial
support to managerial decisions with only a relatively low barrier to entry. What
this would require, however, is analytics applications that are
modular enough to be hooked up to existing systems without too much tailoring.
In this thesis I have shown that developing such applications using business
intelligence techniques and platforms should be possible, and that this approach
indeed can automate the gathering of many of the analytics that today are gathered
manually by engineering management.