Quality Metrics for Maintainability of Standard Software

Master Thesis
Dipl.-Ing. Oleksandr Panchenko
Matr.Nr. 724084
Mentors:
Dr.-Ing. Bernhard Gröne, Hasso-Plattner-Institute for IT Systems Engineering
Dr. Albert Becker, SAP AG, Systems Applications Products in Data Processing
23.02.2006, Potsdam, Germany
Abstract
The handover of software from development to the support department is accompanied by many tests and checks, which prove the maturity of the product and its readiness to go to market. However, these quality gates are not able to assess the complexity of the entire product or to predict the maintenance effort. This work aims at researching metric-based quality indicators in order to assess the most important maintainability aspects of standard software. Static source code analysis is selected as the method for mining information about the complexity. The research of this thesis is restricted to the ABAP and Java environments. The quality model used is derived from the Goal Question Metric approach and extends it for the purposes of this thesis. After a literature survey, the quality model was extended with standard metrics and some specially developed new metrics. The selected metrics were validated theoretically against numerical properties using Zuse's software measurement framework and practically, using experiments, against their ability to predict maintainability. After experiments with several SAP projects, some metrics were recognized as reliable indicators of maintainability. Some other metrics can be used to find non-maintainable code and to provide additional metric-based audits. For semi-automated analysis, a few tools were suggested and an XSLT converter was developed in order to process the measurement data and prepare reports. This thesis should prepare the basis for the further implementation and usage of the metrics.
Zusammenfassung
Before software is handed over from development to maintenance, numerous tests and checks are performed to verify whether the product is mature enough to go to market. Although these quality checks are very extensive, the overall software complexity and the effort for future maintenance have so far hardly been taken into account. This thesis therefore aims to investigate metric-based quality indicators that assess the most important aspects of the maintainability of standard software. Static source code analysis was chosen as the method for analyzing complexity. The investigation is restricted to the ABAP and Java environments. The quality model is derived from the Goal Question Metric approach and adapted to the requirements of this thesis. After an extensive literature survey, the quality model was extended with existing and newly developed metrics. The numerical properties of the selected metrics were validated theoretically using Zuse's measurement framework. Practical studies were carried out to assess the predictive power of the metrics. Experiments with selected SAP projects confirmed some metrics as reliable maintainability indicators. Other metrics can be used to find non-maintainable code and to provide additional metric-based audits. For a semi-automated procedure, several tools were selected and an XSLT transformation was additionally developed to aggregate measurement data and prepare reports. This thesis is intended to serve both as a basis for further research and for future implementations.
Abbreviations
A – Abstractness
ABAP – Advanced Business Application Programming (Language)
AMC – Average Method Complexity
AST – Abstract Syntax Tree
Ca – Afferent Coupling
CBO – Coupling between Objects
CDEm – Class Definition Entropy (modified)
Ce – Efferent Coupling
COBISOME – Complexity Based Independent Software Metrics
CLON – Clonicity
CQM – Code Quality Management
CR – Comments Rate
CYC – Cyclic Dependencies
D – Distance from Main Sequence
D2IMS – Development to IMS
DCD – Degree of Cohesion (direct)
DCI – Degree of Cohesion (indirect)
DD – Defect Density
DIT – Depth of Inheritance Tree
DOCU – Documentation Rate
FP – Function Points
FPM – Functions Point Method
GQM – Goal Question Metric
GVAR – Number of Global Variables
H – Entropy
I – Information
IF – Inheritance Factor
IMS – Installed Base Maintenance & Support
In – Instability
ISO – International Standards Organization
KPI – Key Performance Indicators
LC – Lack of Comments
LCC – Loose Class Cohesion
LCOM – Lack of Cohesion of Methods
LOC – Lines Of Code
LOCm – Average LOC in methods
m – Structure Entropy
MCC – McCabe Cyclomatic Complexity
MEDEA – Metric Definition Approach
MI – Maintainability Index
MTTM – Mean Time To Maintain
NAC – Number of Ancestor Classes
NDC – Number of Descendent Classes
NOC – Number Of Children in inheritance tree
NOD – Number Of Developers
NOM – Number of Methods
NOO – Number Of Objects
NOS – Number of Statements
OO-D – OO-Degree
PIL – Product Innovation Lifecycle
RFC – Response For a Class
SAP – Systems, Applications and Products in Data Processing
SMI – Software Maturity Index
TCC – Tight Class Cohesion
U – Reuse Factor
UML – Unified Modeling Language
XML – eXtensible Markup Language
XSLT – eXtensible Stylesheet Language Transformation
V – Halstead volume
WMC – Weighted Methods per Class
ZCC – ZIP-Coefficient of Compression
Table of content

1. Introduction
2. Research problem description
   Different methods for source analysis
   Metrics vs. audits
   Classification of the metrics
   Types of maintenance
   Goal of the work
3. Related works and projects
   Maintainability index (MI)
   Functions point method (FPM)
   Key performance indicators (KPI)
   Maintainability assessment
   Abstract syntax tree (AST)
   Complexity based independent software metrics (COBISOME)
   Kaizen
   ISO/IEC 9126 quality model
4. Quality model – goals and questions
   Goal Question Metric approach
   Quality model
   Size-dependent and quality-dependent metrics
5. Software quality metrics overview
   Model: Lexical model
      Metric: LOC – Lines Of Code
      Metric: CR – Comments Rate, LC – Lack of Comments
      Metric: CLON – Clonicity
      Short introduction into information theory and Metric: CDEm – Class Definition Entropy (modified)
   Model: Flow-graph
      Metric: MCC – McCabe Cyclomatic Complexity
   Model: Inheritance hierarchy
      Metric: NAC – Number of Ancestor Classes
      Metric: NDC – Number of Descendant Classes
      Geometry of Inheritance Tree
      Metric: IF – Inheritance Factor
   Model: Structure tree
      Metric: CBO – Coupling Between Objects
      Metric: RFC – Response For a Class
      Metric: m – Structure entropy
      Metric: LCOM – Lack of Cohesion Of Methods
      Metric: D – Distance from main sequence
      Metric: CYC – Cyclic dependencies
      Metric: NOM – Number Of Methods and WMC – Weighted Methods per Class
   Model: Structure chart
      Metric: FAN-IN and FAN-OUT
      Metric: GVAR – Number of Global Variables
   Other models
      Metric: DOCU – Documentation Rate
      Metric: OO-D – OO-Degree
      Metric: SMI – Software Maturity Index
      Metric: NOD – Number Of Developers
   Correlation between metrics
   Metrics selected for further investigation
   Size-dependent metrics and additional metrics
6. Theoretical validation of the selected metrics
   Problem of misinterpretation of metrics
   Types of scale
   Types of metrics
   Conversion of the metrics
   Other desirable properties of the metrics
   Visualization
7. Tools
   ABAP-tools
      Transaction SE28
      Z_ASSESSMENT
      CheckMan, CodeInspector
      AUDITOR
   Java-tools
      Borland Together Developer 2006 for Eclipse
      Code Quality Management (CQM)
      CloneAnalyzer
      Tools for dependencies analyze
      JLin
      Free tools: Metrics and JMetrics
   Framework for GQM-approach
8. Results
   Overview of the code examples to be analyzed
   Experiments
   Admissible values for the metrics
   Interpretation of the results
   Measurement procedure
9. Conclusion
10. Outlook
References
Appendix
1. Introduction
The Product Innovation Lifecycle (PIL) of SAP is divided into five phases with a set of milestones. A brief overview of the PIL can be seen in figure 1.1. Consider the milestone (a so-called Quality-Gate) "Development to IMS" (D2IMS) in more detail. The Installed-Base Maintenance and Support department (IMS) receives the next release through such a Quality-Gate with the start of the Main Stream phase and will support it for the rest of its lifecycle. The Quality-Gate is a formal decision to hand over the release responsibility to IMS and is based on a readiness check, which proves the quality of the release by a set of checks. However, this check aims to establish the overall quality and the absence of errors and is not intended to determine how easy it is to maintain the release.
For correct planning of maintenance resources, IMS needs additional information about those attributes of the software that impact maintainability. This information will not influence the decision of the Quality-Gate, but it will help IMS developers plan resources and analyze the release. Such information can also support code reviews and allows earlier feedback to development.
This thesis aims at filling this gap by providing a set of indicators that describe the release from the viewpoint of its maintainability, together with instructions on how they should be interpreted. The second goal is a set of indicators that can locate badly maintainable code.
Detailed descriptions of the PIL concept can be found at SAPNet Quicklink /pil or in
[SAP04].
Figure 1.1: Goals of Quality-Gate “Development to IMS”. [SAP05 p.43]
Maintainability here denotes how easily and rapidly the software can be maintained. High maintainability means smooth and well-structured software that can be maintained easily. Other definitions of maintainability, such as "likelihood of errors", are out of scope of this work.
At the time of the Quality-Gate D2IMS, the product has already been completely developed and tested; thus the complete source code is accessible. However, the product is only about to go to market, and no data about customer messages or errors is available yet. Consequently, only internal static properties of the software can be analyzed at this point in time.
One way of approaching this problem is to investigate the dependency between the maintainability of the software and its design, with the goal of finding design properties that can be used as maintainability indicators. Since standard software is usually very large and no manual analysis is feasible, such findings must be produced automatically and must be objective. Thus only objective measures can be used.
The subject of this thesis is the complexity of the software, which often leads to badly maintainable code. Metrics provide a mathematical means of purposefully describing certain properties of the object under study. After comprehending the basis of maintainability and finding the design peculiarities that impact it, this thesis proposes a way to describe these design properties using metrics. Consequently, several selected metrics should be able to indicate the most important aspects of maintainability and the overall quality of the software.
Moreover, it is commonly accepted that bad code or a lack of design is much easier to discover than good code. Therefore it should not be a big challenge to find code that could cause problems for maintenance.
All in all, solving this task allows: a deep understanding of the essence of maintainability and its factors, estimating the quality of the product from the viewpoint of maintainability, appropriately planning the resources for maintenance, and providing earlier feedback to development.
A more detailed problem description and the goals of this thesis are presented in
chapter 2.
This thesis is structured in the following way: chapter 3 gives an overview of related work. The quality model, which is used to determine the essence of maintainability, is discussed in chapter 4. Chapter 5 provides short descriptions of the metrics that are candidates for extending the quality model. Chapter 6 supplements the metric descriptions with theoretical validation. Tools that can be used for the software measurement are discussed in chapter 7. Experiments and results are discussed in chapter 8. Conclusions are given in chapter 9, and a short outlook in chapter 10 finishes this thesis.
2. Research Problem Description
Different Methods for Source Code Analysis
All methods for the analysis of source code can be divided into two groups: static and dynamic methods. Static methods work with the source code directly and do not require running the product. This property allows static methods to be used in earlier phases of the lifecycle and is one of the important requirements for this thesis. Metrics and audits belong to the static methods.
Metrics, tests, and Ramp-Up belong to the dynamic methods. Dynamic methods can also consider the dynamic complexity of the product, for example not only how the connections between modules are organized, but also how often they are actually used. Here and in the remainder of this thesis, a module means any compilable unit: a class in Java and a program, a class, a function module, etc. in ABAP. Noteworthy, the dynamic methods can show different results in different environments. Above all, this is important for applications that provide only a generic framework, where the customer composes their own product using the provided functions (for example, solutions for business warehousing).
Metrics for dynamic complexity are usually based on a UML specification of the project, for instance state diagrams, and are analyzed using colored Petri nets. Another possibility is to collect statistical information about the running application. Several experts argue that improving only a few modules (those used most often) can significantly improve the quality of the entire system. Noteworthy, modules that are often used by other modules are more sensitive to changes and should have better quality.
All methods except metrics are relatively well investigated at SAP and are also supported by tools. The author believes that the main determinants of a program's maintainability lie directly in the code, and indicators can be extracted from the static code without supplementary dynamic analysis. The three main activities of maintenance are: analyzing the source code, changing the program, and testing the changes. Therefore, the maintainer also spends most of the time working with the static code.
Metrics vs. Audits
Two main types of static code analysis are distinguished: metrics and audits. Metrics provide a description of the measured code, that is, a homomorphic mapping from empirical objects to real numbers that describes certain properties of those objects. The homomorphism in this case means "a mapping from the empirical relational system to the formal relational system which preserves all relations and structures between the considered objects" [ZUSE98 p.641]. Empirical objects can have different relations between them, for example: one program is larger than another, or one program is more understandable than another. Of course, the researcher wants the metric to preserve such relations. Such a mapping also means that metrics should be considered only in the context of some relation between empirical objects. A common example of a homomorphic mapping is presented in figure 2.1. For more on the theoretical framework of software measurement see [ZUSE98, in particular pp. 103-130]. An example of a metric is LOC (Lines Of Code); this metric preserves the relation "Analyzability", since smaller programs are in general easier to understand than larger ones. Another metric, NOC (Number Of Children in the inheritance tree), preserves the relation "Changeability", since a class with many subclasses is more difficult to change than a class with few or no subclasses.
Figure 2.1: Metric – a mapping between empirical and numerical objects which preserves all relations
According to Zuse's measurement framework [ZUSE98], specifying a metric includes the following steps:
Identify attributes for real world entities
Identify empirical relations for such attributes
Identify numerical relations corresponding to each empirical relation
Define mapping from real world entities to numbers
Check that numerical relations preserve and are preserved by empirical relations
In contrast to metrics, audits are simply a verification of adherence to certain rules or development standards. Usually an audit is a simple count of violations of these rules or patterns. SAP uses a wide range of audit-based tools for ABAP (CHECKMAN, Code Inspector, Advanced Syntax Check, SLIN) and for Java (JLin, Borland Together).
Audits help developers find and fix code that potentially contains errors and increase awareness of quality in development. However, audits are bad predictors of maintainability, because even though an application conforms to the development standards, it can still be poorly maintainable. Moreover, audits give concrete recommendations to developers but are not able to characterize the quality of the product in general. The second reason for rejecting audits is the absence of complexity analysis, the main part of maintainability analysis.
Further in this work only metrics will be considered. Approaches for finding appropriate metrics are discussed in chapter 4. The numerical properties of metrics are discussed in chapter 6.
Based on the metric definition, the following usage scenarios are conceivable:
Comparing certain attributes of two or several software systems
Formally describing certain attributes of the software
Prediction: if a strong correlation between two metrics is found, the value of one metric can be predicted based on the values of the other. For example, if a relation between some complexity metric (product metric) and the fault probability (process metric) is found, one can predict the probability of a fault in a certain module based on its complexity
Keeping track of the evolution of the software: comparing different versions of the same software allows drawing conclusions about the evolution and trend of the product
Classification of the Metrics
In [DUMK96, pp. 4, 8] Dumke considers three improvement (measurement) areas of software development: software products, software processes, and resources, and gives a metrics classification for each area.
Product metrics describe properties of the product (system) itself and thus depend on internal qualities of the product only. Examples of product metrics are the number of lines of code or the comments rate.
Process metrics describe an interaction process between the product and its environment; the environment can also be the people who develop or maintain the product. Examples of process metrics are the number of problems closed during a month or the mean time to maintain (MTTM). Obviously, maintainability can be measured directly in the process of maintenance using process metrics like MTTM. However, this maintainability assessment should be made before the maintenance begins and before these metrics are available. Thus this thesis tries to predict maintainability in earlier phases of the lifecycle using product metrics only.
Resource metrics describe properties of the environment. Examples of resource metrics are the number of developers in a team or the amount of available memory on a server.
This work concentrates purely on software product measurement. However, process metrics can also be used for the empirical validation of product metrics, because they measure maintainability directly and can confirm the prediction made by the product metrics. Once an appropriate correlation between product metrics and process metrics is established, one can speak of empirically validated product metrics.
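To illustrate how such an empirical validation could look, the minimal sketch below computes the Pearson correlation coefficient between a product metric (per-module complexity values) and a process metric (per-module MTTM). The sample numbers are invented for illustration, and the choice of Pearson correlation is an assumption; the thesis does not prescribe a particular correlation measure at this point.

```java
// Minimal sketch: Pearson correlation between a product metric and a process
// metric measured for the same set of modules. Sample values are invented.
public class MetricCorrelation {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumX2 - sumX * sumX)
                           * Math.sqrt(n * sumY2 - sumY * sumY);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical per-module values: a complexity metric (product metric)
        // and the mean time to maintain in hours (process metric).
        double[] complexity = {12, 45, 7, 30, 22, 60};
        double[] mttm       = {1.5, 6.0, 1.0, 4.0, 3.0, 8.5};
        System.out.printf("Pearson correlation: %.3f%n", pearson(complexity, mttm));
    }
}
```

A strong positive value in such an experiment would support the claim that the product metric predicts the maintenance effort; a weak value would argue against it.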
Types of Maintenance
There are three main types of maintenance (based on [RUTH]):
Corrective – making it right (also called repairs)
o To correct residual faults: specification, design, implementation, documentation, or any other type of fault
o Time consuming, because each repair must go through the full development lifecycle
o On average, ~20% of the overall maintenance time (however, at IMS it reaches 60%, and with the beginning of the Extended Maintenance even up to 100%)
Adaptive – making it different (functional changes)
o Responses to changes in the environment in which the product operates
o Changed hardware
o Changed connecting software, e.g. a new database system
o Changed data, e.g. new phone dialing codes
o On average, ~20% of the maintenance time (at IMS 30%, over time 10%)
Perfective – making it better
o Changing the software to improve it, usually requested by the client
o Add functionality
o Improve efficiency – for example performance (also called polishing)
o Improve maintainability (also called preventative maintenance)
o On average, ~60% of the maintenance time (at IMS only 10-20%)
For IMS, the most important and time-consuming type is corrective maintenance. However, this thesis does not distinguish between the different types of maintenance, because the general process is the same for all of them. Nevertheless, the results of this analysis can especially be used for planning preventative maintenance.
Goal of the Work
This thesis is going to answer the question: what are metrics able to do?
In order to wrap this question into a more practical task, the following formulation will be used: a set of metric-based indicators for the maintainability of standard software should be found in order to assess or predict the maintainability, based on the internal qualities of the software, in earlier phases of the lifecycle.
No single metric can adequately and exhaustively evaluate the software, and too many metrics may lead to informational overload. Thus a well-chosen subset of about 12 measures should be selected and analyzed. For each metric, admissible boundaries and recommendable values should be defined. Moreover, possible interpretations of the results and their meaning for maintainability should be prepared. Since the measurement is going to be performed automatically, an overview of the most suitable tools should be provided and the measurement process should be described. A detailed description, implementation hints, and examples should be prepared for each metric. In order to fulfill all these requirements, a theoretical and empirical validation of the selected metrics should also be done.
The approach must not use additional information sources (except the source code), such as requirements, business scenarios, documentation, etc. In the current work only metric-based static code analysis is considered.
3. Related Works and Projects
In this chapter several relevant projects are introduced. This description should give an idea of what has been done in this field so far. Besides the selected projects, a wide range of articles and books has been written researching individual metrics and measurement frameworks. These are not included in this chapter, but are mentioned or referenced further in this thesis.
Maintainability Index (MI)
Hybrid measures are not measured directly from the software, but are a combination of other metrics. The most popular form of combination is a polynomial, but other forms exist as well. Such a combination is used in order to obtain one resulting number for the whole evaluation. However, this desire leads to the problem of hiding information: hybrid measures show the attributes of the empirical object incompletely and imperfectly.
One attempt to present the maintainability as a single hybrid measure is the Maintainability Index from Welker and Oman [WELK97], which comprises several models with various member metrics and coefficients. One of them is the improved, four-metric MI model:
MI = 171 – 5.2 ln(Ave-V) – 0.23 Ave-MCC – 16.2 ln(Ave-LOC) + 50 sin(√(2.4 Ave-CR))
where:
Ave-V is the average Halstead volume per module,
Ave-MCC is the average McCabe Cyclomatic Complexity per module,
Ave-LOC is the average number of lines of code per module,
Ave-CR is the average percentage of comments per module.
Many of the metrics used here are discussed in chapter 5.
The research in [WELK97] gives the following indications on the meaning of the MI values:
MI < 65: poor maintainability
65 < MI < 85: fair maintainability
MI > 85: excellent maintainability
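As an illustration, the sketch below evaluates the four-metric MI formula quoted above for one system and maps the result to the categories just listed. The input averages are invented, and the treatment of Ave-CR (here passed as a plain number) is an assumption, since different MI formulations use slightly different conventions for the comments rate.

```java
// Minimal sketch: evaluating the four-metric Maintainability Index formula
// for one hypothetical system and mapping the result to the categories above.
public class MaintainabilityIndex {

    // MI = 171 - 5.2 ln(aveV) - 0.23 aveMcc - 16.2 ln(aveLoc) + 50 sin(sqrt(2.4 aveCr))
    // Assumption: aveCr is passed directly; the unit convention for the
    // comments rate varies between MI formulations.
    static double mi(double aveV, double aveMcc, double aveLoc, double aveCr) {
        return 171.0
                - 5.2 * Math.log(aveV)      // Math.log is the natural logarithm
                - 0.23 * aveMcc
                - 16.2 * Math.log(aveLoc)
                + 50.0 * Math.sin(Math.sqrt(2.4 * aveCr));
    }

    static String interpret(double mi) {
        if (mi < 65) return "poor maintainability";
        if (mi < 85) return "fair maintainability";
        return "excellent maintainability";
    }

    public static void main(String[] args) {
        // Hypothetical averages per module: Halstead volume, cyclomatic
        // complexity, lines of code, and comments rate.
        double value = mi(850.0, 6.5, 120.0, 0.22);
        System.out.printf("MI = %.1f (%s)%n", value, interpret(value));
    }
}
```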
Nevertheless, all metrics used here are intra-modular and do not take inter-modular dependencies into account, although these highly impact the maintainability; thus the Maintainability Index was rejected from further investigation.
However, using this approach led to an interesting observation: MI was measured at two different points in time for the modules of the same system, and it was shown that less maintainable modules became more difficult to maintain over time, while well maintainable modules kept their good quality.
Functions Point Method (FPM)
The Functions Point Method suggests assigning to each module, class, input form, etc. a certain number of function points (FP) depending on its complexity. The sum of all points predicts the development or maintenance effort for the whole application. Assuming that a developer can implement a certain number of FP per day on average, a manager can predict the number of developers and the time needed. FPM is perfectly applicable in early project phases and allows predicting the development and maintenance effort when source code is not yet available. It also suits the strongly data-oriented concept of SAP applications. Nevertheless, in the case of this work the source code is already available, and it could be difficult to calculate backwards the number of FPs that were implemented. This is especially difficult in the case of a product that has been bought from outside, where no project or design documentation is available.
To make matters worse, FPs are subjective and do not suit the requested objective model. Also, these measures were designed for cost estimation (before source code is available) rather than for measurement. Thus it is best to collect information from the source code directly, rather than using FPM as an additional layer of abstraction. For readers who are interested in FPM, the following sources are recommended: [AHN03], [ABRA04b].
Key Performance Indicators (KPIs)
The goal of this project is the definition, measurement, and interpretation of basic data and KPIs for the quality of the final product. For the assessment of product quality, several (ca. 30) direct indicators were selected, which means that the data is collected directly in the process of maintenance. Examples of the selected indicators are:
Number of messages – number of customer messages incoming per quarter
Changed coding – sum of inserted and deleted lines in all notes with category "program error" divided by the total number of lines of coding (counted per quarter)
Callrate – number of weekly incoming messages per number of active installations
Defect Density (DD) – defined as the number of defects (weighted by severity and latency) identified in a product divided by the size and complexity of the product
Nevertheless, the earliest possible availability of such indicators is at the end of the Ramp-Up phase. Thus, in the context of the current thesis, it is only possible to use these KPIs for validating the developed metrics. For more details about KPIs see [SAP05c].
Maintainability Assessment
This project aims at assessing the maintainability of SAP products shortly before the handover to IMS. Thus the goal is nearly the same as that of the current thesis. However, the assessment chosen in this project is audit-based. Several aspects of maintainability are inspected and a list of questions is prepared. An expert has to analyze the product manually, answer the questions, and fill out a special form. After that, the final conclusion about the maintainability can be reported automatically. The main drawbacks of the suggested method are the manual character of the assessment and the single resulting value, which is difficult to interpret. This project also suggests some primitive metrics, like lines of code and comments rate, and provides a tool supporting these metrics.
Abstract Syntax Tree (AST)
In this project, ABAP code is analyzed and a method for building an abstract syntax tree is suggested. A plug-in for Eclipse was also developed in order to automate this method. The plug-in allows saving the AST as an XML document and analyzing it. Based on this technique, some metrics for ABAP can be implemented. Another way to use the AST is to find clones – copied fragments of code.
Complexity Based Independent Software Metrics
This is a master thesis about Complexity Based Independent Software Metrics (COBISOME). The main point of this work is to find an algorithm for converting a set of correlated metrics into a set of independent metrics. Such a conversion creates a list of virtual independent (orthogonal) metrics, which allows examining different aspects of the software independently and thus more effectively. Nevertheless, the complicated transformations and the aggregation of several metrics into one make the analysis more difficult at the same time. For more details see [SAP05b].
Kaizen
The objective of the Kaizen project is to analyze selected SAP code in order to understand it better and to look for ways to continually improve it.
Three possible objectives of the code improvement are:
Improve readability and general maintainability
Reduce cost of service enabling
Enable future enhancements in functionality (when well understood)
Kaizen will focus on objectives #1 and #2 as applicable to most SAP code. One of the first steps of the project is the analysis of maintainability metrics.
ISO 9126 – Standard Quality Model
The ISO 9126 quality model was proposed as an international standard for software quality measurement in 1992. It is a derivation of the McCall model (see appendix A). This model hierarchically associates attributes and sub-characteristics of the software with one of several areas (so-called characteristics). For the area "Maintainability" the following attributes are arranged: analyzability, changeability, stability, testability, and compliance. Although these attributes exist, measuring the quality is still not easy. This model is customizable, but it is not very flexible and in many cases not applicable. Hence this model is not commonly accepted and only few tools are based on the ISO model.
4. Quality Model – Goals and Questions
Goal Question Metric Approach
A quality model is a model explaining quality from a certain point of view. The object of a quality model can be products, processes, or projects. Most models suggest a decomposition principle, where a more general characteristic is decomposed into several sub-characteristics and further into metrics.
Various metric definition approaches (MEDEAs) have been developed. Most effective are hierarchical quality models organized in a top-down fashion: the model must be focused, based on goals and models, and at the same time provide appropriate detail. "A bottom-up approach will not work because there are many observable characteristics in software, but which metrics one uses and how one interprets them it is not clear without the appropriate models and goals to define the context" [BASI94 p.2]. Nevertheless, a bottom-up approach is useful for metrics validation, once the metrics are already selected.
The most flexible and very intuitive approach is the Goal Question Metric MEDEA (GQM), which suggests a hierarchical top-down model for selecting the appropriate metrics. This model has at least three levels:
Conceptual level (goals): This level presents a measurement goal, which is derived from business goals. In the case of this thesis the measurement goal would be "well maintainable software". However, in order to facilitate formalizing the top goal, GQM specifies a template, which includes a purpose, an object, a quality issue, a viewpoint, and a context. The formalized goal is given in the next section (see p. 21). Since the top goal is usually very complex, it can be broken down into several sub-goals in order to make interfacing with the underlying levels easier
Operational level (questions): As the goals are presented on the very abstract conceptual level, each goal should be refined into several quantifiable questions, which introduce a more operational level and hence are more suitable for interpretation. Answers to these questions have to determine whether the corresponding goal is being met. "Questions try to characterize the object of measurement with respect to a selected quality issue and to determine its quality from the selected viewpoint" [BASI94 p. 3]. Hence, questions help to understand the essence of the measurement object and to find the most appropriate indicators for it. These indicators could be explicitly formalized within an optional Formal level
Quantitative level (metrics): Metrics placed on this level should provide all quantitative information needed to adequately answer the questions. Hence, metrics are a refinement of the questions into quantitative product measures. The metrics should provide sufficient information to answer the questions. The same metric can be used to answer multiple questions
An optional Tools level can be included in the model in order to show the tool assignment for the metrics
An abstract example of the GQM model is illustrated in figure 4.1. A more detailed description of GQM and a step-by-step procedure for using it are given in [SOLI99].
"GQM is useful because it facilitates identifying not only the precise measures required, but also the reasons why the data are being collected" [PARK96, p. 53].
It is possible to rank the impact of metrics on questions using weight coefficients, in order to make clear which metric is more important. However, the model used in this thesis does not aim to describe which weights the metrics have. The author believes that the best way is to give the analyst full freedom in this decision. The analyst can decide, depending on the situation, which indicator is more important.
Figure 4.1: The Goal Question Metric approach (abstract example)
The measurement process using the GQM approach includes four main steps:
Definition of the top goal and the goal hierarchy
Definition of the list of questions, which explain the goals
Selection of the appropriate metric set; theoretical and empirical analysis of each metric; selection of the measurement tools
Collecting measurement data and interpreting the results
The first three steps are intended for the definition of the GQM quality model; the last step is the actual measurement and interpretation and can be repeated many times.
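To make the goal-question-metric structure concrete, the sketch below models a tiny GQM hierarchy as plain Java objects and prints it. The goal, question, and metric names are illustrative placeholders only, not the full model from appendix B.

```java
import java.util.List;

// Minimal sketch of a GQM hierarchy: a goal is refined into questions,
// and each question is answered by one or more metrics. The concrete
// names below are illustrative placeholders only.
public class GqmModel {

    record Metric(String name) { }
    record Question(String text, List<Metric> metrics) { }
    record Goal(String description, List<Question> questions) { }

    public static void main(String[] args) {
        Goal maintainability = new Goal(
            "Assess maintainability of standard software from IMS's viewpoint",
            List.of(
                new Question("How easy is it to analyze a module?",
                        List.of(new Metric("Ave-LOC"), new Metric("CR"))),
                new Question("How redundant is the code base?",
                        List.of(new Metric("CLON")))));

        // Print the hierarchy: goal -> question -> metric.
        System.out.println("Goal: " + maintainability.description());
        for (Question q : maintainability.questions()) {
            System.out.println("  Question: " + q.text());
            for (Metric m : q.metrics()) {
                System.out.println("    Metric: " + m.name());
            }
        }
    }
}
```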
Quality Model
According to the GQM goal specification, the major goal for the maintainability purpose is: to assess (purpose) maintainability (quality issue) of standard software (object) from IMS's viewpoint (viewpoint) in order to manage it and find possible ways to improve it (purpose) in the ABAP and Java environment (context). The question for the major goal could be "How easy is the locating and fixing of an error in the software?", but this question is very vague and can only be answered with process metrics like MTTM. As mentioned before, measuring such process metrics is only possible during the maintenance and is thus inappropriate for the purposes of this work. Let us call such goals external goals, because the degree of goal achievement also depends on some external factor. The degree of achievement of internal goals depends only on internal properties of the software and hence can be described relatively early in the lifecycle.
This major goal is highly complex and it is difficult to create appropriate questions for it; thus a complex hierarchy of goals should be used, including a top goal, goals, and sub-goals. Moreover, only internal goals should be placed at the bottom of the hierarchy, so that questions are addressed only to the internal goals. The goal hierarchy is depicted in figure 4.2, where blue boxes represent external goals. Such a decomposition allows a sensible selection of questions and the necessary granularity. The full model is presented in appendix B.
Figure 4.2: Mapping of external and internal goals
The quality model used here is based on several validated and acknowledged quality models: the ISO 9126 standard quality model, the McCall quality model, Boehm's software quality characteristics tree, and Fenton's decomposition of maintainability. The corresponding parts of these models can be found in appendix A or in [KHAS04]. Several sub-goals and metrics were also taken from [MISR03]. After examination of these quality models, theoretical reasoning, and research of the literature in this field, the following areas (goals) were recognized as important for the maintainability of software:
Maturity
Clonicity
Analyzability
Changeability
Testability
The goals Maturity and Clonicity are described together with the corresponding metrics in chapter 5 (see p. 55 and p. 26, respectively). Next, the aspects Analyzability, Changeability, and Testability are discussed.
Analyzability is probably the most important factor of maintainability. Nearly all metrics used in the model are also present in the Analyzability area. Initially, the author also wanted to include a goal Localizing in the model, which would characterize how easy it is to localize (find) a fault in the software. It was later found, however, that most of the metrics for this goal are already included in Analyzability, and Localizing was therefore removed from the model.
The following sub-goals should be fulfilled in order to create easily comprehensible software:
Algorithm Complexity – keeping the internal (algorithmic) complexity low
Selfdescriptiveness – providing appropriate internal documentation (naming conventions and comments)
Modularity – keeping the modules small and encapsulating functionality appropriately into modules (cohesiveness)
Structuredness – proper organization of the modules in the overall structure
Consistency – keeping the development process simple and well organized. There is a lot of research trying to determine whether a well organized development process leads to good product quality, but no clear relation has been found. Nevertheless, the maintainer is sometimes confused if he sees that a module was changed many times by different developers. Consistency in this context means a clear distribution of tasks between developers.
For changeability (the ease of making changes in the software) it is important to have a proper design of the software, which allows maintenance without side effects. The quality model includes the goals Structuredness, Modularity, and Packaging in this area, whereas structuredness has several different aspects:
Coupling describes the connectivity between classes
Cohesiveness describes the functional unity of a class
Inheritance describes the properties of inheritance trees
Testability means the ease of testing and of maintaining test cases. Bruntink in [BRUN04] investigates testability from the perspective of unit testing and distinguishes between two categories of source code factors: factors that influence the number of test cases required to test the system (let us call the goal for these factors "Value"), and factors that influence the effort required to develop each individual test case (let us call the goal for these factors "Simplicity"). Noteworthy, "Value" means the number of necessary test cases, not the number of actually available test cases. Consequently, for high maintainability it is important to keep the "Value" small. Nevertheless, most efforts in the field of test coverage concentrate on the low procedure level, for example the percentage of tested statements within a class.
The quality model includes several metrics for testability validated in [BRUN04].
In the SAP system an important part of the complexity lies in the parameters for customization; however, the experts argue that most of the customization complexity is already contained in the source code, where the parameters are read and processed.
The impact of individual metrics on the maintainability is discussed in greater detail in chapter 5.
Size-dependent and Quality-dependent Metrics
Before the individual metrics can be discussed, one important property of a metric, namely size dependency, should be introduced. Some metrics are highly sensitive to the project size. That means such metrics show higher values whenever the software grows. Such metrics are size-dependent. Other metrics are quality-dependent and measure quality purely, independent of size. That means larger software can have smaller values of such metrics. A good example of a size-dependent metric is LOC (Lines Of Code), because it continuously grows with each new statement. The metric Ave-LOC (average number of LOC per module or method) is, on the contrary, independent of the size and conveys an important characteristic of the quality – the modularity.
In order to be able to compare software of very different sizes, the use of quality-dependent metrics is preferable. Many size-dependent metrics convey qualitative attributes of software as well, but they are too sensitive to size and need to be converted before use in order to reinforce the quality component of the metric.
Moreover, a few size-dependent metrics should be included in the quality model in order to gain some insight into the size of the considered system. For this purpose the metrics Total-LOC (total lines of code) and Total-NOO (number of all objects, i.e. modules) are suggested.
"Although size and complexity are truly two different aspects of software, traditionally various size metrics have been used to help indicate complexity" [ETZK02, p. 1]. Consequently, a metric that assesses the code complexity of a software component by means of a size analysis alone will never provide a complete view of complexity.
5. Software Quality Metrics Overview
In this chapter all metrics that are candidates for the quality model are discussed in more detail.
Many metrics are complex and difficult to measure directly; thus it is usual to build some abstraction of the system, called a model, and to measure the attributes of this model. There are five main models that suit software product measurement in the context of this thesis. Since the properties of a metric depend on the model in which the metric is measured, all metrics are grouped in sets according to the model they belong to.
In the literature, two major classes of software measures can be found. They are based on modules and on entire software systems and are called intra-modular and inter-modular measures, respectively. Metrics based on the lexical model and the flow graph are intra-modular; metrics based on the inheritance hierarchy model, the structure tree, and the structure chart are usually inter-modular.
Model: Lexical Model
This model is intended for intra-modular measures and consists of the plain text in the programming language. It is also possible to partition the text into simple tokens and analyze the frequencies of usage of these tokens.
Metric: LOC – Lines Of Code
The metric LOC counts the number of lines of code excluding blank lines and comments that take a whole line.
The total LOC of the system has a quantitative meaning and shows first of all the size of the system. It has no qualitative meaning, because both small and huge systems can be maintainable or not. In a qualitative sense, the metric Ave-LOC, the average amount of LOC per module or class, can be used. This metric shows how well the system is split into parts. It is widely accepted that small modules are in general easier to understand, change, and test than bigger ones. However, a system with a large number of small modules has a large number of relations between them and is complex as well. See the chapter "Correlation between metrics" for more details.
If you really want to compare code written by different people, you might want to adjust for different styles. One simple way of doing this is to count only opening braces and semicolons (or full stops for ABAP); this works fairly well for both ABAP and Java. From this point of view the metric NOS (Number of Statements) is more universal. However, in large systems both metrics are strongly correlated because of the mixture of different programming styles, and they have the same empirical and numerical properties.
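A rough sketch of both counting strategies for a Java source is shown below: LOC as non-blank lines that are not pure comments, and NOS approximated by counting semicolons and opening braces. This is a simplification (string literals and partially commented lines are ignored) and is meant purely as an illustration, not as the counting rules used by the tools discussed later.

```java
import java.util.List;

// Rough sketch: counting LOC (non-blank lines that are not pure comments)
// and approximating NOS by counting statement terminators and opening braces.
// Simplification: string literals and partially commented lines are ignored.
public class LineCounter {

    static int countLoc(List<String> lines) {
        int loc = 0;
        boolean inBlockComment = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlockComment) {
                if (line.contains("*/")) inBlockComment = false;
                continue;                        // whole line belongs to a comment
            }
            if (line.isEmpty()) continue;        // blank line
            if (line.startsWith("//")) continue; // full-line comment
            if (line.startsWith("/*")) {
                if (!line.contains("*/")) inBlockComment = true;
                continue;
            }
            loc++;
        }
        return loc;
    }

    static int countNos(List<String> lines) {
        int nos = 0;
        for (String line : lines) {
            for (char c : line.toCharArray()) {
                if (c == ';' || c == '{') nos++; // crude statement count
            }
        }
        return nos;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "// a tiny example",
                "int x = 1;",
                "",
                "if (x > 0) {",
                "    x++;",
                "}");
        System.out.println("LOC = " + countLoc(sample)); // 4
        System.out.println("NOS = " + countNos(sample)); // 3
    }
}
```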
Noteworthy, Java has a more compact syntax. In [WOLL03, p.5] it is shown that a program written in Java has about 1.4 times more functionality than an ABAP program of equal length. This should be considered when estimating admissible values for LOC.
LOC is probably the most important metric, because many other metrics correlate with it. Therefore the approximate value of other, more complicated metrics can easily be estimated from LOC. For example, figure 5.14 depicts the correlation between LOC and WMC (Weighted Methods per Class).
Metrics: CR – Comments Rate, LC – Lack of Comments
It is obvious that comments in code help to understand it. Hence the metric CR (Comments Rate) is a good candidate for analyzability. CR is the ratio of the sum of all kinds of comments (full-line comments, part-line comments, JavaDoc, etc.) to LOC (Lines Of Code). CR is easy to calculate and interpret. However, many comments are created automatically and do not provide any additional information about the functionality. Noteworthy, these comments help to lay out the source code better and make it more readable, but they do not help the maintainer in understanding the code. Additionally, a piece of code can be commented out and will then be counted as a comment. Despite modern versioning systems, many developers leave such fragments in the code. In this case CR can reach 70-80%, which is much overstated.
A metric that takes such "comments" and automatically generated comments into account is no longer trivial. Therefore CR should be considered critically, and the maintainer should understand that CR can be overstated.
During the experiments it was detected that interfaces and abstract classes have a very high amount of comments and only few LOC. Hence many interfaces and abstract classes increase the overall percentage of comments.
Noteworthy, CR is the only metric in the quality model whose values become better as they increase. All other metrics should be minimized. Thus one new metric is suggested: LC (Lack of Comments) indicates the deficiency of CR with respect to the optimal value and is calculated as LC = 100 – Median-CR. Since CR is a percentage measure, the arithmetic mean must not be used.
The difference should be calculated for the aggregated value for the entire system. This substitution will not worsen the numerical properties of the metric, since CR already has relatively bad numerical properties (see chapter 6). Now all metrics in the quality model should be minimized.
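The following sketch computes CR per module and then the system-level value LC = 100 – Median-CR, as described above. The per-module comment and code counts are invented for illustration.

```java
import java.util.Arrays;

// Minimal sketch: per-module comments rate CR (in percent) and the derived
// system-level metric LC = 100 - Median-CR. Input counts are invented.
public class CommentsRate {

    // CR = comment lines / lines of code, expressed as a percentage.
    static double cr(int commentLines, int loc) {
        return 100.0 * commentLines / loc;
    }

    // Median of the per-module CR values (the arithmetic mean is avoided,
    // since CR is a percentage measure).
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical modules: {comment lines, LOC}.
        int[][] modules = { {20, 100}, {5, 200}, {60, 150}, {10, 80} };
        double[] crValues = new double[modules.length];
        for (int i = 0; i < modules.length; i++) {
            crValues[i] = cr(modules[i][0], modules[i][1]);
        }
        double medianCr = median(crValues);
        double lc = 100.0 - medianCr;          // Lack of Comments
        System.out.printf("Median-CR = %.1f%%, LC = %.1f%n", medianCr, lc);
    }
}
```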
Metric: CLON – Clonicity
Code cloning, the act of copying code fragments, is a widespread implementation technique, and its role in accelerating development should not be underestimated. But cloning is also a well known problem for maintenance. Clones increase the work and cognitive load of maintainers for many reasons [RIEG05]:
The amount of code that has to be maintained is increased
When maintaining or enhancing a piece of code, duplication multiplies the work to be done
Since usually no record of the duplications exists, one cannot be sure that a defect has been eliminated from the entire system without performing a clone analysis
If large pieces of software are copied, parts of the code may be unnecessary in the new context. Lacking a thorough analysis of the code, they may however not be identified as such. It may also be the case that they are not removable without a major refactoring of the code. This may, firstly, result in dead code, which is never executed, and, secondly, such code increases the cognitive load of future maintainers
Larger sequences repeated multiple times within a single function make the code unreadable, hiding what is actually different in the mass of code. Code is then also likely to be on different levels of detail, slowing down the process of understanding
If all copies are to be enhanced collectively at one point, the necessary enhancements may require varying measures in cases where copies have evolved differently. As an extreme case, one can imagine that a fix introduced in the original code actually breaks the copy
Exact and parameterized clones are distinguished. Finding exact clones is easier and is language independent. Parameterized clones are more difficult to find, but considerably more helpful, because clones are often slightly changed already while being copied. In [RYSS] various techniques for clone finding are classified. These techniques can be roughly divided into three categories:
string-based, i.e. the program is divided into a number of strings (typically lines) and these strings are compared against each other to find sequences of duplicated strings
token-based, i.e. a lexer tool divides the program into a stream of tokens and then searches for series of similar tokens
parse-tree based, i.e. after building a complete parse tree one performs pattern matching on the tree to search for similar sub-trees. The parse-tree based technique was also considered during the AST project
The choice of technique should be made according to the goal of the measurement. Finding all possible clones for subsequent audits would preferably use the token-based or parse-tree-based technique. In the context of this thesis it is more interesting to know only the approximate number of clones, and thus the simple and quick string-based technique can be used. Another important property of the string-based technique is language independence, since both the ABAP and Java environments are considered.
As the most important indicator, the metric CLON (Clonicity) is suggested, which is the ratio of LOC in all detected clones to Total-LOC. This metric should give an idea about the usage of copy-paste in the development process and consequently about the redundancy of the final product.
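A very simplified string-based sketch is shown below: it trims lines, fingerprints every window of a few consecutive lines, marks lines belonging to a window that occurs more than once, and reports the fraction of such lines as an approximation of CLON. A real detector, as classified in [RYSS], would be considerably more sophisticated; the window size and the normalization are assumptions made only for this illustration.

```java
import java.util.*;

// Very simplified string-based clone detection: every window of MIN_LINES
// consecutive (trimmed) lines is fingerprinted; lines covered by a window
// that occurs more than once count as cloned. CLON = cloned lines / total lines.
// Window size and normalization are illustrative assumptions.
public class CloneDetector {

    static final int MIN_LINES = 3;

    static double clonicity(List<String> lines) {
        List<String> norm = new ArrayList<>();
        for (String l : lines) norm.add(l.trim());

        // Count how often each window of MIN_LINES lines occurs.
        Map<String, Integer> windowCount = new HashMap<>();
        for (int i = 0; i + MIN_LINES <= norm.size(); i++) {
            String key = String.join("\n", norm.subList(i, i + MIN_LINES));
            windowCount.merge(key, 1, Integer::sum);
        }

        // Mark all lines that belong to a duplicated window.
        boolean[] cloned = new boolean[norm.size()];
        for (int i = 0; i + MIN_LINES <= norm.size(); i++) {
            String key = String.join("\n", norm.subList(i, i + MIN_LINES));
            if (windowCount.get(key) > 1) {
                for (int j = i; j < i + MIN_LINES; j++) cloned[j] = true;
            }
        }
        long clonedLines = 0;
        for (boolean b : cloned) if (b) clonedLines++;
        return norm.isEmpty() ? 0.0 : (double) clonedLines / norm.size();
    }

    public static void main(String[] args) {
        List<String> code = List.of(
                "int a = read();", "a = a * 2;", "write(a);",
                "int b = read();",
                "int a = read();", "a = a * 2;", "write(a);");
        System.out.printf("CLON = %.2f%n", clonicity(code));  // 6 of 7 lines cloned
    }
}
```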
Short Introduction into Information Theory and Metric: CDEm – Class Definition Entropy (modified)
Methods for describing complexity
There are many methods that allow describing the complexity of a system. Only a few of them are listed below (partially taken from [CART03]):
Human observation and (subjective) rating. The weakness of such an evaluation is its subjective character and the required human involvement
Number of parts or distinct elements. Nevertheless, size and complexity are truly two different aspects of software. Despite this fact, various size metrics have traditionally been used to help indicate complexity. However, many such metrics are size-dependent and do not allow comparing systems of different size. It is also not always clear what should be counted as a distinct part
Number of parameters controlling the system. The same comments as for the number of parts apply here
Minimal description in some model/language, which represents some kind of abstraction. Obviously a system that has a smaller minimal description is simpler than a system with a larger minimal description. In this method a model (a description) includes only relevant information; thus redundant information, which increases size without increasing complexity, is avoided
Information content (how is information defined/measured?)
Minimal generator/constructor (what machines/methods can be used?)
Minimum energy/time to construct. Several experts argue that a system that needs more time to be designed (implemented) is more complex
Obviously, the study of complex systems demands that the analyst uses some kind of statistical method. Next, after a short introduction into information theory, entropy-based metrics supporting some of the above mentioned methods are discussed.
Information
Remark: all of the following is considered in terms of probability theory.
Consider the process of reading a random text, where it is assumed that the alphabet is initially known to the reader. Reading each next symbol can be seen as an event. The probability of this event depends on the symbol and its place within the text. Consider a measure related to how surprising or unexpected an observation or event is, and let us call this measure information. Thus the information obtained from each new symbol is, in this context, the amount of new knowledge that the reader gets from this symbol. It is obvious that information is inversely related to the probability of the event: if the probability of the occurrence of "i" is small, the reader would be quite surprised if the outcome actually was "i". Conversely, if the probability of a certain symbol is high (for example, the probability of the occurrence of the symbol "i" after "t" in the word "information" tends to 1), the reader will not get much information from this symbol.
Let us describe the information measure more formally. For that, Shannon proposed four axioms:
Information is a non-negative quantity: I(p) >= 0
If two independent events occur (whose joint probability is the product of their individual probabilities), then the information the reader gets from observing the events is the sum of the two informations: I(p1*p2) = I(p1) + I(p2)
I(p) is a continuous and monotonic function of the probability (slight changes in probability should result in slight changes in information)
If an event has probability 1, the reader gets no information from the occurrence of the event: I(1) = 0
Deriving from these axioms, one can obtain the definition of information in terms of probability: I(p) = −log2(p). A more detailed description of this derivation can be found in [CART03] or [FELD02]. The index 2 reflects the binary character of the events; in this case the unit of information is the bit. However, other bases are also possible.
Entropy
Each symbol in the text carries a different amount of information. Of interest is the average amount of information within the text. For this purpose the term entropy is introduced. After a simple transformation, the following expression for the entropy of a probability distribution P = (p1, ..., pn) can be derived:
H(P) = − Σ pi · log2(pi), summed over i = 1, ..., n
Noteworthy, H(P) is not a function of X; it is a function of the probability distribution of the random variable X. Entropy has the following important property: 0 <= H(P) <= log2(n). H(P) = 0 when exactly one of the probabilities is one and all the rest are zero (only one symbol is possible). H(P) = log2(n) only when all of the events have the same probability 1/n. H is maximized by a uniform distribution: when everything is equally likely to occur, one cannot get much more uncertain than that.
Since the maximal possible entropy is known, the normalized entropy can be introduced:
Hnorm(P) = H(P) / log2(n)
This is important, since entropy depends on project size. Remarkably, entropy depends logarithmically on size: doubling the size increases the maximal entropy by one point.
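The sketch below computes H(P) and the normalized entropy for a small frequency distribution; the symbol frequencies are arbitrary example values.

```java
// Minimal sketch: entropy H(P) = -sum(p_i * log2(p_i)) and the normalized
// entropy H(P) / log2(n) for a frequency distribution. Frequencies are
// arbitrary example values.
public class Entropy {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    static double entropy(int[] frequencies) {
        int total = 0;
        for (int f : frequencies) total += f;
        double h = 0.0;
        for (int f : frequencies) {
            if (f == 0) continue;            // 0 * log(0) is taken as 0
            double p = (double) f / total;
            h -= p * log2(p);
        }
        return h;
    }

    public static void main(String[] args) {
        int[] frequencies = {8, 4, 2, 2};    // e.g. occurrence counts of four symbols
        double h = entropy(frequencies);
        double hMax = log2(frequencies.length);
        System.out.printf("H = %.3f bits, normalized H = %.3f%n", h, h / hMax);
    }
}
```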
Next, some possible interpretations of entropy are listed:
The entropy of a probability distribution is just the expected value of the information of the distribution
Entropy is also related to how difficult it is to guess the value of a random variable X [FELD02, p.p. 5-7]. One can show that H(X) <= average number of yes-no questions needed to determine X <= H(X) + 1
Entropy indicates the best possible compression for the distribution, i.e. the average number of bits needed to store the value of the random variable X. Noteworthy, entropy provides only the theoretical bound; some practical algorithm has to be used for the actual coding (for example Huffman codes).
Next, some applications of entropy to software measurement are discussed.
Average Information Content Classification
In [ETZK02, p. 295] the work of Harrison is mentioned. Harrison and other scientists proposed to extend Halstead's counting of operators and to measure the distribution of the different operators and operands within a program. This should allow assessing the analyzability of a single chunk of code. However, such a method is not very useful, since the main complexity is contained within the user-defined strings.
Remarkably, the syntactical rules of programming languages decrease entropy. For example, it is not possible to have two operands without an operator in between, and the compiler takes care of this. Hence the probabilities of the occurrence of the operands in a syntactically correct program text depend on the syntactical rules. Consequently, the entropy of a syntactically correct program will never reach the maximum, and normalizing with respect to the syntactical rules becomes much more difficult.
Metric: CDEm - Class Definition Entropy (modified)
This metric reduces the alphabet of the text to the user-defined strings used in a class, because these contain the largest part of the complexity. Examples of user-defined strings are:
Names of classes, attributes and methods
Package and class names within an import section
Types of public attributes and return types of methods
Types of parameters
Method calls, etc.
By such a restriction one obtains another level of granularity. Let's illustrate this metric with an example. Consider a maintainer searching through the source code. How surprised would the maintainer be upon seeing a reference to another class?
Suppose that:
Maintainers work easily if they are confronted with the same object again and again
Maintainers work with difficulty if they often have to analyze new, unknown objects
Consider the two programs presented in figure 5.1. Assume that the maintainer has to fix two faults in the modules B and C. Both modules use functionality provided by the modules A and E. In the first program the module A plays the role of an interface and the maintainer can work easily, because he has to keep in mind only one pattern of collaboration. In the second program the modules B and C have references to different modules; such a model is more multifarious and more difficult to comprehend.
Figure 5.2 shows different patterns for the frequency of occurrence of module names in an abstract program. The frequently used modules play the role of an interface for their containing package. The entropy of the frequency distribution is an indicator of the evidence of interfaces. Notably, P1 has a lower entropy value than P2. Note that other metrics will show that P2 is much easier to comprehend: less coupling between modules means less complexity.
Consequently, a high entropy of the distribution of the user-defined strings indicates text that is difficult to comprehend.
Different variants of this metric have been proposed. A very simple and intuitive variant is an analysis of the import sections only and a calculation of the distribution of occurrences of class names in the import sections. The classes which occur most often in the import sections are also often used from outside of the package where they are defined, and thus form the interface of this package. Clear (small) package interfaces are an indicator of good design. This metric is called CDEm – Class Definition Entropy (modified). Readers interested in other implementations of this kind of metric are referred to [ETZK99], [ETZK02] and [Yi04]. Incidentally, some entropy-based metrics also use semantic analysis to improve their expressiveness.
[Figure content: two module communication patterns P1 (H(P1) = 2,37) and P2 (H(P2) = 2,5) over the modules A to F]
Figure 5.1: The uniform (left) and the multifarious pattern for communication between modules
[Figure content: frequency diagrams of user-defined strings (module names A to Z); a system with pronounced interfaces vs. a system with ulterior interfaces]
Figure 5.2: The evidence of classes, which play the role of interface for the packages
For the calculation of CDEm two programs were developed. The class Entropy.java prepares a list of all classes in the project and then searches the source code in order to find references to classes from this list. Next, the list is filled with frequency data and the entropy is calculated.
The class EntropyImport.java does not use a prepared list of classes; this tool scans the source files and calculates the entropy of the import clauses. In this case the list of user-defined strings (import clauses) is built dynamically.
It is argued that both tools measure the same aspect, since their results are correlated. Since the entropy based on the analysis of the import sections is easier to compute, it is used for the further research.
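A minimal sketch of the import-based variant could look as follows. It is only an illustration of the idea (not the Entropy.java or EntropyImport.java tools mentioned above): it counts import clauses across all Java files below a given directory and computes the entropy of their frequency distribution, deliberately ignoring comments and multi-line statements.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImportEntropySketch {

    public static void main(String[] args) throws IOException {
        // usage: java ImportEntropySketch <source-root>
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(Paths.get(args[0]))) {
            javaFiles = walk.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        Map<String, Integer> importCounts = new HashMap<>();
        for (Path file : javaFiles) {
            for (String line : Files.readAllLines(file)) {
                String t = line.trim();
                if (t.startsWith("import ")) {
                    String imported = t.substring("import ".length()).replace(";", "").trim();
                    importCounts.merge(imported, 1, Integer::sum); // wildcard ("*") imports stay imprecise
                }
            }
        }
        // entropy of the import frequency distribution (in bits)
        double total = importCounts.values().stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int count : importCounts.values()) {
            double p = count / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        System.out.printf("distinct imports: %d, entropy: %.2f bits%n", importCounts.size(), h);
    }
}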
As a very rough indicator of entropy the compression ratio (for example of a ZIP archive) can be taken. The author believes that ZIP practically implements an algorithm whose compression ratio, the ZIP coefficient of compression (ZCC), tends towards the best possible compression defined by means of entropy. However, ZIP works with symbols, while CDEm works with tokens.
ZCC = size of the ZIP archive / size of the project before the compression. Thus, a high ZCC indicates high complexity of the project; a low ZCC indicates a simple project with high redundancy. A high CDEm indicates a complex design, a low CDEm indicates a simple design. The next simple experiment tries to find out whether these two metrics are correlated, i.e. whether there is a dependency between CDEm and the ZIP coefficient. Figure 5.3 shows the relation between the ZIP compression coefficient and the import-based CDEm. The input for this experiment were the code examples described in chapter 8.
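A rough sketch of how such a ZIP coefficient could be computed with the standard java.util.zip classes is shown below (the class names are illustrative; a real measurement would, as discussed below, first strip the comments):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZccSketch {

    // ZCC = size of the ZIP archive / size of the project before compression
    public static double zcc(Path projectRoot) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(projectRoot)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long originalBytes = 0;
        CountingOutputStream counter = new CountingOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(counter)) {
            for (Path file : files) {
                originalBytes += Files.size(file);
                zip.putNextEntry(new ZipEntry(projectRoot.relativize(file).toString()));
                zip.write(Files.readAllBytes(file));
                zip.closeEntry();
            }
        }
        return originalBytes == 0 ? 0.0 : (double) counter.count / originalBytes;
    }

    // minimal helper that only counts the bytes produced by the ZIP stream
    static class CountingOutputStream extends OutputStream {
        long count;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] b, int off, int len) { count += len; }
    }
}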
[Scatter plot: import-based CDEm (%) over ZCC for the measured project versions; the data points correspond to the values in table 5.1]
Figure 5.3: ZCC and CDEm do not have evident dependence
During this experiment four pairs of projects were analyzed, where each pair consists of two versions of the same project, an old and a new one. Each newer version is supposed to have better values than the older one. In figure 5.3 arrows connecting two measurement points indicate the trend of the values within one project. Since the directions of the arrows differ considerably, one can speak of an absence of any connection between these metrics. An overview of all considered projects is also shown in table 5.1, where the trend of the metrics (improvement or degradation) is indicated by arrows. According to the experts' opinion all newer versions should show an improvement; however, the metrics often show the opposite.
Nevertheless, ZCC measures not only the pure entropy of the code but also the entropy of the comments, most of which are generated or consist of the same predicates. Consequently, ZCC shows lower values than the entropy actually is. A more accurate experiment should exclude the comments before compression. Besides, a high CLON can cause low ZIP-coefficient values as well.
Table 5.1: Dependence between ZCC and CDEm

Project              Size on disk   Size of ZIP archive   ZCC    CDEm (%)
ObjMgr old           788202         144018                0,18   82,4
ObjMgr new           833213         209549                0,25   83,3
SLDClient old        1916940        409760                0,21   82,5
SLDClient new        1454570        300455                0,21   83,0
JLin 630 old         1896690        522499                0,28   85,9
JLin dev new         1822725        590507                0,32   84,2
Mobile Client 7.0    722159         216222                0,30   87,1
Mobile Client 7.1    1725178        503508                0,29   88,6
The analysis of the examples also shows that many developers use "*" in the import sections. Such imprecise declarations lead to an inexact CDEm calculation.
Hence the contradiction between ZCC and CDEm is most probably caused by improper computation of the metrics. The author argues that more accurate experiments should be made in order to ascertain the ability of these metrics to predict the maintainability.
Noteworthy, some peculiar properties of the software design can influence CDEm. For example, the project Ant (www.apache.org) has a very low value of CDEm, because almost every class uses the following classes: BuildException, Project, Task and some others. Such a distribution of the user-defined strings leads to an underestimation of the entropy.
Complexity of the development process
An interesting approach is suggested by Hassan in [HASS03]. He argues that if developers have to modify many files intensively at the same time, this increases the cognitive load and can also cause problems for managers. Such a strategy can also lead to bad product quality. An entropy-based process metric was suggested as a measurement of the chaos of software development. The input data for the measurement is the history of code development. Time is divided into periods, and for each period the frequency of changes is calculated for each source file (see the illustration in figure 5.4). Due to the main property of entropy, it is maximal for a uniform distribution. Thus high entropy of the distribution of source code changes indicates a situation where many files are changed very actively. Low entropy indicates a normal development process, where only a few files are edited actively and the rest is kept untouched or is changed very insignificantly. The evolution of the entropy during the development process is illustrated in figure 5.5.
Hence high entropy can warn the project manager about an insufficient organization of the development process.
The next entropy-based metric is discussed in the sub-chapter "Metric: m – Structure Entropy" after the introduction of an appropriate model.
As a short conclusion on the usage of information theory in software measurement one can say: it is a powerful non-counting method for describing the semantic properties of software, but before it can be used, more experiments with exact and perceptive tools should be made.
Figure 5.4: The Entropy of a Period of Development [HASS03, p. 3]
Figure 5.5: The Evolution of the Entropy of Development [HASS03, p.4]
Model: Flow-graph
The flow-graph model represents the intra-modular control complexity in the form of a graph. The flow-graph consists of edges and nodes, where nodes represent operators and edges represent possible control steps between the operators. A flow-chart is a kind of flow-graph where decision nodes are marked with a different symbol. Figure 5.6 provides an example of both notations. A region is an area within a graph that is completely bounded by nodes and edges.
Figure 5.6: Example of Flow-graph and corresponding flow-chart [ALTU06, p. 15]
Metric: MCC – McCabe Cyclomatic Complexity
The cyclomatic number from graph theory represents the number of regions in a graph, or the number of linearly independent paths through the graph. Initially McCabe suggested using the cyclomatic number to assess the number of test cases needed for sufficient testing of a module, and called this metric MCC (McCabe Cyclomatic Complexity). Since all independent paths through the module should be tested independently, it is advisable to have at least one test case for each path within the module. Thus MCC represents the minimal number of test cases for sufficient test coverage.
However, later this metric was also suggested for the assessment of comprehension complexity, and now MCC is also used as a recommendation for modularity in the development process. Empirical research has shown that the probability of faults increases in modules with MCC > 10. Thus it is advisable to split modules with MCC > 10 into several modules. Many experts argue that a large number of decision operators (IF, FOR, CASE, etc.) increases the algorithmic complexity of a module. It is obvious that a program in which all operations are executed sequentially is easy to understand, independent of its size.
Consequently, MCC is included in the quality model in the areas Analyzability and Testability.
One possible way to calculate MCC is: MCC = E − N + 2, where E is the number of edges and N is the number of nodes. It has also been shown that for a program with binary decisions only (all nodes have out-degree <= 2), MCC = P + 1, where P is the number of predicate (decision) nodes (operators: if, case, while, for, do, etc.).
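A simplified illustration of the second counting rule is sketched below: it estimates MCC = P + 1 for a single method body by counting decision keywords with a regular expression. This is only a sketch; it ignores comments, string literals and the conditional operator, and whether the short-circuit operators && and || are counted depends on the MCC variant used.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MccSketch {

    // decision tokens that add a predicate node (a deliberately simplified token set)
    private static final Pattern DECISIONS =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    // MCC = P + 1 for a method with binary decisions only
    public static int mcc(String methodBody) {
        Matcher m = DECISIONS.matcher(methodBody);
        int predicates = 0;
        while (m.find()) {
            predicates++;
        }
        return predicates + 1;
    }

    public static void main(String[] args) {
        String body = "if (x > 0 && y > 0) { for (int i = 0; i < x; i++) { sum += i; } }";
        System.out.println("MCC = " + mcc(body)); // 1 (if) + 1 (&&) + 1 (for) + 1 = 4
    }
}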
Usage of MCC in the object-oriented environment
This intra-modular metric can be used both in a procedural and in an object-oriented context. However, the usage of this popular metric in the object-oriented context has some peculiarities. Typical object-oriented programs show understated values of MCC, because up to 90% of the methods may have MCC = 1. In [SER05] it is hypothesized that part of the complexity is hidden behind object-oriented mechanisms such as inheritance, polymorphism or overloading. These mechanisms are in fact hidden decision nodes. A good illustration of this phenomenon applied to overloading is the following example:
Listing 5.1: Illustration of the hidden decision node in case of the overloading
class A {
    void method1(int arg) { }
    void method1(String arg) { }
}
…
public A a;
a.method1(b);  // hidden decision node: the overload is selected depending on the type of b
The decision node hidden in the last statement could be represented in a procedural way by checking the type of the argument and calling the corresponding method. Additional decision nodes for polymorphism and inheritance could be represented similarly. The hypothesis is: the fewer OO mechanisms are used, the more complex the methods should be. The experiment described in [SER05] tried to find an inverse correlation between an inheritance factor (Depth in Inheritance Tree) and MCC, but did not show significant results. Nevertheless, polymorphism or overloading could be a better factor to correlate with. Additional experiments are needed.
Since MCC can be calculated only for a single chunk of code, in the OO environment a further metric is introduced in order to aggregate the values into a metric for the entire class; see the metric WMC (Weighted Methods per Class) for more details.
Model: Inheritance Hierarchy
For the next group of metrics different models of an inheritance hierarchy can be used. The empirical objects of all these models are classes and interfaces, which are connected into hierarchies using the "extends" or "implements" relation:
In the Simple Inheritance Hierarchy nodes represent classes and edges represent inheritance connections between classes, whereby only single inheritance is possible
The Extended Inheritance Hierarchy can have more than one root and allows multiple inheritance. Additional edges represent "implements" connections between an interface and the class which implements this interface. The Extended Inheritance Hierarchy is a directed acyclic graph
The Advanced Inheritance Hierarchy supplements the Extended Inheritance Hierarchy by adding the attributes and methods of each class
Because interfaces are widely used in ABAP and Java and have a great impact on the analyzability and changeability, the Simple Inheritance Hierarchy was rejected. On the other hand, the very detailed level of granularity provided by the Advanced Inheritance Hierarchy is not very useful in the context of maintainability. Consequently, the Extended Inheritance Hierarchy is chosen as the most appropriate basis for the maintainability metrics. For this model the following metrics were proposed:
Chidamber and Kemerer proposed the Depth of Inheritance Tree (DIT) metric, which is the length of the longest path from a class to the root of the inheritance hierarchy [CHID93, p.p. 14-18], and the Number of Children (NOC) metric, which is the number of classes that directly inherit from a given class [CHID93, p.p. 18-20].
Later, Li suggested two substitution metrics: the Number of Ancestor Classes (NAC) metric, which measures how many classes may potentially affect the design of a class because of inheritance, and the Number of Descendent Classes (NDC) metric, which measures how many descendent classes a class may affect because of inheritance. These two metrics are good candidates for the quality model in the areas Analyzability and Changeability respectively and will be discussed in more detail.
Metric: NAC – Number of Ancestor Classes
This metric indicates the analyzability from the viewpoint of inheritance. In general the following holds: the deeper a class is placed in the hierarchy, the more ancestors the class has and the more additional classes have to be analyzed and understood by the developer in order to understand the given class. It can also be shown that a class with a high NAC implements more complicated behavior. Several guides recommend avoiding classes with a DIT of more than 3 or a NAC of more than 6.
Metric: NDC - Number of Descendent Classes
This metric indicates the changeability of a class and expresses how many classes could be affected when the developer changes the given class.
Noteworthy, experiments with the Chidamber and Kemerer metrics suite [BASI95] show that larger NOC values correlate with a smaller defect probability. This can be explained by the fact that classes with many subclasses are the subject of much testing, and most errors are found during the implementation of the subclasses.
"Inheritance introduces significant tight coupling between super classes and their subclasses" [ROSE, p.5]. Thus the importance of NAC and NDC is high.
Geometry of Inheritance Hierarchy
Above, the metrics NAC and NDC were introduced and it was shown that they are good descriptors of a single class. In this subsection the author tries to use these metrics to describe the entire inheritance hierarchy.
Let's try to classify inheritance hierarchies into subtypes based on geometrical properties. The most important geometric characteristics are the width and the weight distribution. The width is the ratio of super-classes to the total number of classes. An indicator of width is the U metric (Reuse factor), where U = super-classes / classes = (CLS − LEAFS) / CLS. A super-class is a class that is not a leaf class. U measures reuse via inheritance. A high U indicates a deep class hierarchy with high reuse. The reuse ratio varies in the range 0 <= U < 1.
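For illustration, U can be computed from a simple class-to-parent map as in the following sketch (a hypothetical representation of the hierarchy; interfaces and multiple inheritance are ignored here):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReuseFactorSketch {

    // U = (CLS - LEAFS) / CLS, where a super-class is any class with at least one child
    public static double reuseFactor(Map<String, String> classToParent) {
        Set<String> parents = new HashSet<>(classToParent.values());
        parents.remove(null); // root classes have no parent
        int cls = classToParent.size();
        int superClasses = 0;
        for (String cl : classToParent.keySet()) {
            if (parents.contains(cl)) {
                superClasses++; // this class is extended by at least one other class
            }
        }
        return cls == 0 ? 0.0 : (double) superClasses / cls;
    }

    public static void main(String[] args) {
        // hypothetical hierarchy: B and C extend A, D extends B
        Map<String, String> hierarchy = new HashMap<>();
        hierarchy.put("A", null);
        hierarchy.put("B", "A");
        hierarchy.put("C", "A");
        hierarchy.put("D", "B");
        System.out.println("U = " + reuseFactor(hierarchy)); // 2 super-classes of 4 classes -> 0.5
    }
}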
The weight distribution describes where the main functionality tends to be implemented. However, there is no appropriate single metric for the weight distribution. The best way to indicate the weight distribution is a histogram, where the vertical axis represents DIT and the horizontal axis represents the number of classes, the number of methods or the sum of WMC. Figure 5.8 depicts an example of a top-heavy hierarchy; the metric WMC is selected as the functionality indicator.
Different designs of inheritance hierarchies are presented in figure 5.7. The next experiment tries to estimate the best geometry of an inheritance hierarchy from the viewpoint of maintainability using the metrics NAC and NDC.
Figure 5.7: Types of Inheritance Hierarchies.
First of all, the values of the metrics are calculated for each class and then aggregated using the arithmetic mean. Let's try to estimate the analyzability and changeability for each type of hierarchy based on the average values. The comments can also be seen in figure 5.7.
38
DIT
Distribution of Average WMC in Levels of Inheritance Tree
6
15,12
5
21,21
4
27,11
3
69,12
2
75,43
1
44,53
0
10
20
30
40
50
60
70 Ave-WMC
Figure 5.8: Weight distribution
Top-heavy hierarchies may not take advantage of the reuse potential. Ultimately, the design is discussed here from the viewpoint of maintainability. Top-heavy means that the classes with the main functionality are placed near the root; hence such a hierarchy should be easy to understand, because the classes have a small number of ancestors. However, if classes have a large number of descendents, they are difficult to change.
A bottom-heavy hierarchy is easy to change, because many classes have no children. Narrow bottom-heavy designs are difficult to understand because of many unnecessary levels of abstraction.
Nevertheless, this consideration has several problems:
Though the metrics NAC and NDC seem to be comprehensive, the mean values of these metrics are fungible and yield the same numerical values. Ave-NAC can be calculated as the number of descendent-ancestor relations divided by the number of classes. Ave-NDC can be calculated as the number of ancestor-descendent relations divided by the number of classes. Because each descendent-ancestor relation is the reversed ancestor-descendent relation, Ave-NAC = Ave-NDC. The numbers in figure 5.7 confirm this. Noteworthy, the metrics DIT and NOC have the same property when applied to a simple inheritance hierarchy: Ave-DIT = Ave-NOC. Therefore the aggregated values of these metrics are redundant
In some cases the metrics NAC and NDC cannot distinguish between different types of hierarchies; in the given example a top-heavy narrow hierarchy has approximately the same values as a bottom-heavy wide hierarchy with the same number of classes. To distinguish different designs an additional metric is needed
In general it is not possible to assess the maintainability based on the geometrical properties of the hierarchy, because it is more important to know how the inheritance is used
Some experts suggest that the inheritance hierarchy should be optimized, for example using balanced trees. However, the theory of balanced trees is intended for search or change operations; such a tree will always be wide and bottom-heavy, because 50% of the nodes are leaves. Thus such an optimization is misleading for the goals of maintainability.
Many experts agree that inheritance is a very important attribute of software, which also has an impact on maintainability. However, there are different points of view: some experts recommend using a deep hierarchy, others prefer a wide hierarchy. Nevertheless, the author does not see any possibility to assess the entire inheritance hierarchy from the maintainability point of view using these metrics.
Consequently, the suggestion is to use the metrics NAC and NDC for finding classes which could be difficult to maintain because of an erroneous usage of inheritance. A simple example of such an audit is a report which includes all classes with more than 3 super-classes or more than 10 sub-classes.
Metric: IF – Inheritance Factor
The metric IF (Inheritance Factor) shows the percentage of classes that belong to an inheritance hierarchy. A stand-alone class does not belong to any inheritance hierarchy and thus has neither ancestor nor descendant classes. Localizing faults in large stand-alone classes is difficult, because such a class implements complete functionality and is large. Classes which belong to an inheritance hierarchy provide only fractional functionality, and thus it is relatively easy to find which part has caused a fault, irrespective of the size. Additionally, classes within an inheritance tree can be maintained using the inheritance concept, so new functionality can be added while preserving the old functionality.
Model: Structure Tree
This model is represented by a directed graph composed of two different types of nodes, leaf nodes and interior nodes, and two different types of edges, structural edges and dependency edges.
A leaf node corresponds to a function module, global variable, program or form in ABAP; to a method or public attribute in Java.
An interior node corresponds to either:
an aggregate of leaf nodes (function pool, program or class)
an aggregate of other interior nodes (directory or package)
Structural edges, attached to interior and leaf nodes, create a tree that corresponds to the package and file structure of the source. Note that a structural edge may connect two interior nodes, or an interior node with a leaf node, but may never connect two leaf nodes. Figure 5.9 (see p. 44) shows an example of the structure tree. In this example the system has a package A, which contains two classes (B and C). Points marked with small letters (leaves) are methods or attributes. Dotted edges between leaves are dependency edges and represent calls.
Metric: CBO - Coupling Between Objects
Coupling is a quality which characterizes the number and strength of connections between modules. In the scope of maintainability, the software elements A and B are interdependent if:
Some change to A requires a change to B to maintain correctness, or
Some change elsewhere requires both A and B to be changed
Obviously, the first case is much easier to find.
In general, objects can be coupled in many different ways. The following list presents several important types of coupling, resulting from theoretical considerations of the author and partially taken from [ZUSE98]:
With content coupling one module directly references the code of another module. This type of coupling is very strong, because almost any change in the referred module will affect the referring module. In Java this type of coupling is impossible; in ABAP it is implemented through the INCLUDE directive
With common coupling two modules share a global data structure. In ABAP it is most commonly realized via the DATA DICTIONARY. Such coupling is not very dangerous, because data structures are changed very seldom
With external coupling two modules share a global variable. This coupling deserves attention, because excessive usage of global variables can lead to maintenance problems. To handle the external coupling, the metric GVAR (number of global variables) is suggested. However, this metric is duplicated by the metrics FAN-IN and FAN-OUT and is therefore rejected from further investigation (see p. 54)
Data coupling is the most common and unavoidable type. In his work Yourdon stated that any program can be written using only data coupling [ZUSE98, p. 524]. Two modules are data coupled if one calls the other. In the object-oriented environment there are even more possibilities to use data coupling:
o Class A has a method with a local variable of type B
o Class A has a method with return type B
o Class A has a method with an argument of type B
o Class A has an attribute of type B
There are several metrics for data coupling: FAN-IN and FAN-OUT for the procedural and RFC (Response For a Class) and CBO for the object-oriented environment. For the metrics FAN-IN and FAN-OUT see the section "Structure Chart" (p. 54)
Inheritance coupling appears in an inheritance hierarchy between classes or interfaces. The metrics for this type of coupling have been discussed in the previous section
Structural coupling appears between all units which are combined together in a container. For example, all methods within a class are structurally coupled into the class; all classes within a package are coupled into the package. In order to qualify such coupling, the term cohesion is introduced in one of the next sections (p. 44)
Logical coupling is an unusual type of coupling, because the modules are not coupled physically, yet changing one will cause changes in the other. Since there is no representation of such coupling in the source code, logical coupling is very difficult to detect. Readers interested in this type of coupling can find a reference to research on logical coupling at the end of chapter 10 (p. 91)
Indirect coupling. If class A has direct references to A1, A2, …, An, then class A has indirect references to those classes directly and indirectly referenced by A1, A2, …, An. In this thesis (except for inheritance) only direct coupling is considered
Content, common, logical and indirect coupling are not considered in this thesis; structural coupling in the form of cohesion of methods is discussed in one of the next sections. The metrics for inheritance coupling have been discussed in the previous section.
Coupling Between Objects (CBO) is the most important metric for data coupling in the object-oriented paradigm. "CBO for a class is a count of the number of other classes to which it is coupled" [CHID93, p. 20]. However, it would be more precise to call this metric Coupling Between Classes, because at the time of the measurement no objects have been created yet.
"In order to improve modularity and promote encapsulation, inter-object class couples should be kept to a minimum. The larger the number of couples, the higher the sensitivity to changes in other parts of the design, and therefore maintenance is more difficult" [CHID93, p. 20]. Also, a class with many relations to other classes is difficult to test. Hence CBO impacts the Changeability and the Testability.
Nevertheless, CBO can indicate the Analyzability as well, but the RFC metric indicates it more precisely.
Metric: RFC - Response For a Class
The response of a class is the number of methods that can potentially be executed in response to a message received by an object of that class. It can be expressed as the number of public methods of the class plus the number of methods called by the methods of the given class. An example of the calculation of RFC is shown in listing 5.3. For more details see [CHID93, p.p. 22-24]. However, some implementations of RFC (for example Borland Together) count private methods as well.
A class can implicitly call the methods of its ancestors, for example in the constructor; but in this case the constructor will not be called from outside, thus only explicit calls of foreign methods are counted in this metric.
This metric indicates the analyzability of the class:
A class with more methods is more difficult to understand than a class with fewer methods
A method which calls many other methods is more difficult to understand than a method calling fewer foreign methods
In Java it is possible to use nested method calls, as shown in the following listing:
Listing 5.2: Example of nested method calls
a = b.getFactory().getBridge().getWidth(c.getXYZ(), 15);
Such chained calls repeatedly hamper the understanding of the program and should be counted as separate method calls. Noteworthy, this is not possible in ABAP.
Listing 5.3: Example of calculation of the metric RFC
public class RFCexample {
    public ClassC c = new ClassC();  // constructor is not counted: RFC = 0
    public int meth1() {             // RFC = 1
        int temp = 0;
        temp += c.D();               // RFC = 2
        temp += c.D();               // duplicate call: RFC = 2
        return c.getClassB()         // RFC = 3
                .D() +               // RFC = 4
                meth2();             // RFC = 5
    }
    private int meth2() {            // private methods are counted: RFC = 6
        return c.D();                // duplicate calls, which appear
                                     // in different methods, are counted: RFC = 7
    }
}
"If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes more complicated since it requires a greater level of understanding required on the part of the tester" [CHID93, p. 22].
From the definition of RFC it is clear that it consists of two parts: the number of methods within a class and the number of calls of other methods. Hence RFC correlates with NOM and FAN-OUT; this has been shown in [BRUN04, p. 9].
RFC is an OO metric and corresponds to FAN-OUT in the procedural context.
Metric: m – Structure Entropy
An interesting metric was proposed by Hewlett-Packard Laboratories in [SNID01]; a simplified version follows. The main question discussed here is: "how can you measure the degree of conformance of a large software system to the principles of maximal cohesion and minimal coupling?"
The input to the model is the source code of the system to be measured. The output is a numeric measure of the degree of conformance.
Before the model is created, the following assumptions are made:
Since engineers work with source code when modifying a system, it is interesting to analyze the structure of the application at the lexical level
It is more interesting to analyze the global relationships than the local ones
The more dependencies a module has to other parts of the system, the harder it is to modify
"Remote" dependencies are more expensive (in terms of comprehensibility) than "local" dependencies (a restatement of the cohesion and coupling principle)
An example of the model is depicted in figure 5.9. Here some calls are short (within one class), others are middle (between classes) or long (between packages). In agreement with the assumptions, a system with many short calls and only a few long ones has a good design.
Let's find an optimal method for describing the character of the calls. Initially each dotted edge can be described by a pair of numbers: start leaf and end leaf. Therefore each leaf needs log2F bits, where F equals the number of leaves, and for the description of each call 2*log2F bits are needed. However, it is possible to reduce the number of bits by indicating a relative path to the end leaf. Hence short calls need a shorter description and long calls a longer one. If one describes all calls of the system in such a way and calculates the average number of bits needed per call, one can draw conclusions about the design of the system. A higher number of bits needed for the description of an average relation indicates a poorer design.
[Figure content: structure trees of System I and System II, each with packages, classes, and methods/attributes a to e]
Figure 5.9: Example of structure tree
Nevertheless, the analyst does not need to actually encode all these calls. Information theory says that one can easily estimate the average number of needed bits based on the entropy. As the probability basis for the entropy one can use the frequencies of the call lengths. In addition, long calls can be penalized by coefficients. For the entropy background see the section "Introduction into Information Theory". A more detailed description of this metric can be found in [SNID01, p.p. 7-9]. Here just one simple example is given in order to illustrate the ability of this metric.
Consider the two small systems depicted in figure 5.9. Both systems have an equal number of classes, methods (F = 5) and calls (E = 4). However, most of the calls in the first system are long; this disadvantage was fixed in the second system by better encapsulation: method c provides an interface to the attributes d and e in its class. Thus it is supposed that the second system is more maintainable because of its simpler and better structured design.
According to formulas given in [SNID01, p.p. 7-9], the structure entropies of the given
systems are:
m(I) = - (3/5*log2(3/5) + 1/5*log2(1/5) + 0 + 1/5*log2(1/5))
+ 4/5 (¼*log2(5* 8/20) + ¾*log2(5* 12/20))
≈ 2,52
m(II) = - (2/5*log2(2/5) + 2/5*log2(2/5) + 1/5*log2(1/5))
+ 4/5 (¾*log2(5* 8/20) + ¼*log2(5* 12/20))
≈ 2,44
Hence, the second system needs fewer bits for its description and has fewer long calls. Consequently, the metric m (Structure Entropy) can indicate the tendency of a system to have short or long calls.
Metric: LCOM - Lack of Cohesion Of Methods
Cohesion is one of the structural properties of a class. Cohesion is the degree to which the elements in a class are logically related. Most often it is estimated by the degree of similarity of the functions provided by the methods. With respect to object-oriented design, a class should consist only of methods and attributes which have common functionality. If a class can be split into parts without breaking intra-modular calls, as shown in figure 5.10, the class is considered not cohesive.
Figure 5.10: A non-cohesive class can be divided into parts.
However, "coupling and cohesion are also interesting because they have been applied to procedural programming languages as well as OO languages" [DARC05, p.28]. In the case of the procedural paradigm, the procedures and functions of a module should implement a single logical function. Consider an example with a function pool in ABAP. It is an analogue of a class: it has internal global data (attributes) and functions (methods). Noteworthy, when one function of the pool is called, the entire function group is loaded into memory. Consequently, if you create new function modules, you should deliberate how they will be organized into function groups. In one function group you should combine only function modules which use common components of this function group, so that loading it into memory is not useless (translation from [KELL01, p. 256]). Thus low cohesion can also indicate potential performance problems. For the maintenance, low cohesion means that the maintainer has to understand additional code not related to the main part, which may be badly structured. This fact has an impact on the analyzability. Additionally, a low-cohesive component, which implements several different functionalities, will be more affected by the maintenance, because changing one logical part of the component can break other parts. "Components with low cohesion are modified more often since they implement multiple functions. Such components are also more difficult to modify, because a modification of one functionality may affect other functionalities. Thus, low cohesion implies lower maintainability. In contrast, components with high cohesion are modified less often and are also easier to modify. Thus, high cohesion implies higher maintainability" [NAND99]. This fact has an impact on the changeability.
High cohesion indicates a good class subdivision. The cohesion degree of a component is high if it implements a single logical function. Objects with high cohesiveness cannot be split apart.
Lack of cohesion or low cohesion increases complexity, thereby increasing the effort to comprehend unnecessary parts of a component. Classes with low cohesion could probably be subdivided into two or more subclasses with increased cohesion. It is widely recognized that highly cohesive components tend to have high maintainability and reusability. The cohesion of a component allows the measurement of its structural quality.
"There are at least two different ways of measuring cohesion:
1. Calculate for each attribute in a class what percentage of the methods use that attribute. Average the percentages, then subtract from 100%. Lower percentages mean greater cohesion of data and methods in the class.
2. Methods are more similar if they operate on the same attributes. Count the number of disjoint sets produced from the intersection of the sets of attributes used by the methods" [ROSE, p.4].
In [BADR03] the most used metrics for cohesion are briefly described; see the definitions in table 5.2.
Metrics for cohesion are not applicable to abstract classes, nor to classes and interfaces with:
no attributes
one or no methods
only attributes with get and set methods for them (data-container classes)
numerous attributes for describing internal states, together with an equally large number of methods for individually manipulating these attributes
multiple methods that share no variables but perform related functionality; such a situation can appear because of the usage of several patterns
Classes where the calculation of cohesion is not possible are accepted as cohesive.
To overcome these limitations, various implementations of the LCOM metric are possible:
Including inherited attributes and/or methods in the calculation or not
Including the constructor in the calculation or not
Including only public methods or all methods in the calculation
Including get and set methods or not
These implementations are independent of which definition is used. According to the recommendations from [ETZK97], [LAKS99] and [KABA] and theoretical considerations, the following options were selected:
Inherited attributes and methods are excluded from the calculation
Constructors are excluded from the calculation
Get and set methods are excluded from the calculation
Methods with all types of visibility are included in the calculation
It is also possible to find and remove all data-container classes from the cohesion analysis. This can easily be done with the additional metric NOM (Number Of Methods): in the case of a data-container class, NOM = WMC.
Table 5.2: The major existing cohesion metrics [BADR, p. 2]

LCOM1: The number of pairs of methods in a class using no attribute in common.

LCOM2: Let P be the pairs of methods without shared instance variables, and Q be the pairs of methods with shared instance variables. Then LCOM2 = |P| − |Q|, if |P| > |Q|. If this difference is negative, LCOM2 is set to zero.

LCOM3: The Li and Henry definition of LCOM. Consider an undirected graph G, where the vertices are the methods of a class, and there is an edge between two vertices if the corresponding methods share at least one instance variable. Then LCOM3 = |connected components of G|.

LCOM4: Like LCOM3, where graph G additionally has an edge between the vertices representing methods Mi and Mj, if Mi invokes Mj or vice versa.

Co: Connectivity. Let V be the vertices of graph G from LCOM4, and E its edges. Then Co expresses how far the number of edges |E| exceeds the minimum needed to connect all vertices, normalized to the range [0,1].

LCOM5: Consider a set of methods {Mi} (i = 1, …, m) accessing a set of instance variables {Aj} (j = 1, …, a). Let µ(Aj) be the number of methods that reference Aj. Then LCOM5 = ((1/a) Σ µ(Aj) − m) / (1 − m).

Coh: Cohesiveness is a variation on LCOM5.

TCC: Tight Class Cohesion. Consider a class with N public methods. Let NP be the maximum number of public method pairs: NP = [N*(N − 1)]/2. Let NDC be the number of direct connections between public methods. Then TCC is defined as the relative number of directly connected public methods: TCC = NDC / NP.

LCC: Loose Class Cohesion. Let NIC be the number of direct or indirect connections between public methods. Then LCC is defined as the relative number of directly or indirectly connected public methods: LCC = NIC / NP.

DCD: Degree of Cohesion (direct) is like TCC, but taking into account the Methods Invocation Criterion as well. DCD gives the percentage of method pairs which are directly related.

DCI: Degree of Cohesion (indirect) is like LCC, but taking into account the Methods Invocation Criterion as well.
In [LAKS99] and [ETZK97] various implementations of the cohesion metrics (LCOM2 and LCOM3) are compared on C++ example classes. The best results are shown by the following variants:
LCOM3, which did not include inherited variables, but did include the constructor function in the calculations [ETZK97], [LAKS99]
LCOM3 with consideration of inheritance and the constructor [LAKS99]
The metrics LCOM5 and Coh are not robust and are rejected from further investigation. The next simple example, presented in table 5.3, shows this.
Table 5.3: Example for LCOM5 and Coh (a class with five methods M1…M5 and five attributes A1…A5, where each attribute Aj is referenced by exactly two methods: µ(Aj) = 2 for all j, so Σ µ(Aj) = 10)
Obviously, the class is relatively cohesive, since its methods share common variables, but the metrics show the opposite:
LCOM5 = ((1/a) Σ µ(Aj) − m) / (1 − m) = (10/5 − 5) / (1 − 5) = 0,75
Coh = Σ µ(Aj) / (m*a) = 10 / (5*5) = 0,4
In [BADR03] it is argued that methods can be connected in different ways:
Attributes Usage Criterion: two methods are connected if they use at least one attribute in common
Methods Invocation Criterion: two methods are connected if one calls the other
Only three metrics (LCOM4, DCD and DCI) consider both types of connections; all other metrics consider only the attribute connection.
The metrics have different empirical meanings:
Number of pairs of methods (LCOM1, LCOM2)
Number of connected components (LCOM3, LCOM4, Co)
Relative number of connections (TCC, LCC, DCD, DCI)
Most logical and interesting for the goals of this thesis is the number of connected components; it can be interpreted as the number of parts into which the class could be split.
Noteworthy, the values of the normalized metrics (TCC, LCC, DCD, DCI, Co) are difficult to aggregate into a result for the entire system, because averaging percentages leads to a value with bad numerical and empirical properties. For more precise results a weighted mean should be used. In the case of the size-dependent metrics (LCOM1, LCOM2, LCOM3, LCOM4) a simple average value can be used.
Hence LCOM4 is the most appropriate metric. Basically, it is the well-established metric LCOM3 extended by the methods invocation criterion.
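The following sketch illustrates how LCOM4 could be computed as the number of connected components, using both the attributes usage and the methods invocation criterion. It is a simplified illustration that does not yet apply the exclusions (constructors, get and set methods, inherited members) selected above, and the input maps are hypothetical representations of a parsed class.

import java.util.*;

public class Lcom4Sketch {

    // LCOM4 = number of connected components of the method graph.
    // methodAttributes: for each method, the attributes of the class it uses.
    // methodCalls: for each method, the methods of the same class it invokes.
    public static int lcom4(Map<String, Set<String>> methodAttributes,
                            Map<String, Set<String>> methodCalls) {
        List<String> methods = new ArrayList<>(methodAttributes.keySet());
        Map<String, String> parent = new HashMap<>();
        methods.forEach(m -> parent.put(m, m));

        // Attributes Usage Criterion: connect methods sharing at least one attribute
        for (int i = 0; i < methods.size(); i++) {
            for (int j = i + 1; j < methods.size(); j++) {
                Set<String> shared = new HashSet<>(methodAttributes.get(methods.get(i)));
                shared.retainAll(methodAttributes.get(methods.get(j)));
                if (!shared.isEmpty()) {
                    union(parent, methods.get(i), methods.get(j));
                }
            }
        }
        // Methods Invocation Criterion: connect caller and callee
        methodCalls.forEach((caller, callees) -> callees.forEach(callee -> {
            if (parent.containsKey(caller) && parent.containsKey(callee)) {
                union(parent, caller, callee);
            }
        }));

        Set<String> roots = new HashSet<>();
        methods.forEach(m -> roots.add(find(parent, m)));
        return roots.size();
    }

    private static String find(Map<String, String> parent, String m) {
        while (!parent.get(m).equals(m)) {
            m = parent.get(m);
        }
        return m;
    }

    private static void union(Map<String, String> parent, String a, String b) {
        parent.put(find(parent, a), find(parent, b));
    }
}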
“A non-cohesive class means that its components tend to support different tasks.
According to common wisdom, this kind of class has more interactions with the rest of
the system than classes encapsulating one single functionality. Thus, the coupling of
this class with the rest of the system will be higher than the average coupling of the
classes of the system. This relationship between cohesion and coupling means that a
non-cohesive class should have a high coupling value” [KABA, p.2].
However, in [KABA, p.6] it is shown by means of an experiment that "in general, there is no relationship between these (LCC, LCOM) cohesion metrics and coupling metrics (CBO, RFC)". Also, one cannot say that less cohesive classes are more coupled to other classes.
In [DARC05] Darcy argues that metrics for coupling and cohesion should only be used together and expects that "for more highly coupled programs, higher levels of cohesion increase comprehension performance". He motivated this conception by the following thought experiment (figure 5.11).
Figure 5.11: Interaction of coupling and cohesion (according to [DARC05, p. 17])
“If a programmer needs to comprehend program unit 1, then the programmer must
also have some understanding of the program units to which program unit 1 is coupled.
In the simplest case, program unit 1 would not be coupled to any of the other program
units. In that case, the programmer need only comprehend a single chunk (given that
program unit 1 is highly cohesive). In the second case, if program unit 1 is coupled to
program unit 2, then just 1 more chunk needs to be comprehended (given that program
unit 2 also shows high cohesion). If program unit 1 is also coupled to program unit 3,
then it can be expected that Short-Term Memory (STM) may fill up much more quickly
because program unit 3 shows low cohesion and thus represents several chunks. But,
the primary driver of what needs to be comprehended is the extent to which program
unit 1 is coupled to other units. If coupling is evident, it is only then that the extent of
cohesion becomes a comprehension issue." Next, Darcy confirmed his hypotheses with an experiment on the maintenance of a test application. However, the very artificial nature of the experiment means that this hypothesis should not be applied without further experiments.
Other types of cohesion – Functional Cohesion.
Zuse [ZUSE98, p. 525] distinguishes seven types of cohesion:
"Functional Cohesion: A functionally cohesive module contains elements that all
contribute to the execution of one and only one problem related task
Sequential Cohesion: A sequentially cohesive module is one whose elements are
involved in activities such that output data from one activity serves as input data
to the next
Communicational Cohesion: A communicational cohesive module is one whose
elements contribute to activities that use the same input or output data
Procedural Cohesion: As we reach procedural cohesion, we cross the boundary
from the easily maintainable modules to the higher levels of cohesion to the less
easily maintainable modules of the middle levels of cohesion. A procedurally
cohesive module is one whose elements are involved in different and possibly
unrelated activities in which control flows from each activity to the next
Temporal Cohesion: A temporally cohesive module is one whose elements are
involved in activities that are related in time.
Logical Cohesion: A logically cohesive module is one whose elements contribute
to activities of the same general category in which the activity or activities to be
executed are selected from outside the module.
Coincidental Cohesion: A coincidentally cohesive module is one whose elements
contribute to activities with no meaningful relationship to one another”.
Since functional cohesion is the most desirable, some researchers ([BIEM94]) tried to develop a metric to measure it.
Nevertheless, "Functional Cohesion is actually an attribute of individual procedures or functions, rather than an attribute of a separately compilable program unit or module" [BIEM94, p.1] and is out of the scope of this work. Inter-modular metrics are more important, since they are better indicators of the maintainability. Intra-modular cohesion seems to be too complicated in the calculation and too weak in predicting the maintainability of the entire system.
The next type of cohesion is package cohesion, i.e. the partitioning of classes into packages. This kind of cohesion is also important, but difficult to analyze. Hence package cohesion is a topic for separate research.
LCOM Essential:
It is the degree of relatedness of the methods within a class
Cohesion can be used in the procedural as well as in the object-oriented model
It has an impact on the analyzability and the changeability
Cohesion may be considered together with coupling
LCOM4 seems to be the most appropriate metric from the theoretical point of view; additional experiments are needed
Metric: D – Distance from Main Sequence
This set of metrics was suggested by Martin in [MART95] and measures the responsibility, independency and stability of packages. Martin proposes to consider the ratio between the number of abstract classes within a package and its stability.
A package is responsible if a large number of classes outside this package depend upon classes within this package. This number is called Afferent Coupling (Ca). A package is independent if it depends on only a small number of classes outside the package. This number is called Efferent Coupling (Ce). A responsible and independent package is stable; such a package has no reason to change, and lots of reasons not to change.
For measuring the stability of a package Martin suggests the Instability metric: In = Ce / (Ca + Ce). This metric has the range [0,1]. In = 0 indicates a maximally stable package, In = 1 a maximally unstable package. The special case of a package coupled to no external classes (not mentioned by Martin) is considered to have an instability of 0 [REIS].
If all the packages in a system were maximally stable, the system would be unchangeable. In fact, the designer wants portions of the design to be flexible enough to withstand a significant amount of change. A package should therefore contain a sufficient number of classes that are flexible enough to be extended without requiring modification: abstract classes.
To measure this, Martin suggests the Abstractness metric: A = number of abstract classes in the package / total number of classes in the package. This metric also has the range [0,1]: 0 means a completely concrete and 1 a completely abstract package.
The more stable a package is, the more abstract classes it should have in order to remain extensible. These metrics are presented graphically in figure 5.12.
Each dot in the coordinate frame represents one package with two characteristics: stability and abstractness. Packages placed in area A are highly stable and concrete. Such packages are not desirable because they are rigid: they cannot be extended because they are not abstract, and they are very difficult to change because of their high stability. Packages in area C are also undesirable, because they are maximally abstract and yet have no dependencies.
Packages in area B are partially extensible, because they are partially abstract. Moreover, they are partially stable, so that extensions are not subject to maximal instability. Such a category seems to be "balanced": its stability is in balance with its abstractness. The size of a dot in figure 5.12 indicates the size of the corresponding package.
As the final metric the distance from the dot representing the package to the line A + In = 1 was suggested. Because of its similarity to the graph used in astronomy, Martin calls this line the Main Sequence.
The perpendicular distance of a package from the main sequence is D = (A + In − 1)/√2. This metric ranges over [0, ~0.707]. One can normalize it to the range [0, 1] by using the simpler form D = |A + In − 1|.
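The following small sketch summarizes the calculation of In, A and D for a single package (the input values are hypothetical and not taken from the analyzed projects):

public class PackageDistanceSketch {

    // instability In = Ce / (Ca + Ce); a package with no external coupling is treated as stable (In = 0)
    static double instability(int ca, int ce) {
        return (ca + ce) == 0 ? 0.0 : (double) ce / (ca + ce);
    }

    // abstractness A = abstract classes / total classes
    static double abstractness(int abstractClasses, int totalClasses) {
        return totalClasses == 0 ? 0.0 : (double) abstractClasses / totalClasses;
    }

    // normalized distance from the main sequence: D = |A + In - 1|
    static double distance(int ca, int ce, int abstractClasses, int totalClasses) {
        return Math.abs(abstractness(abstractClasses, totalClasses) + instability(ca, ce) - 1.0);
    }

    public static void main(String[] args) {
        // hypothetical package: 8 incoming dependencies, 2 outgoing, 6 abstract of 10 classes
        System.out.printf("In = %.2f, A = %.2f, D = %.2f%n",
                instability(8, 2), abstractness(6, 10), distance(8, 2, 6, 10));
        // for this package: In = 0,2, A = 0,6, D = 0,2
    }
}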
A large distance from the main sequence does not necessarily mean that the package is badly designed; it also depends on the place of the package in the architecture. Packages working with a database or offering tools usually have high afferent coupling and low efferent coupling, and are therefore highly stable and difficult to change. Thus it is useful to have more abstract classes here in order to be able to extend them and in this way maintain the packages.
Packages for the user interface depend on many other packages, thus they have low afferent coupling and high efferent coupling and are mostly unstable. Hence designers do not need many abstract classes here, because these packages can easily be changed. This statement should be empirically proven.
In figure 5.12 the analysis of the project Mobile Client 7.1 (described in detail in chapter 8) is presented. As can be seen, the packages are evenly distributed over the whole square, and it is impossible to conclude whether the entire system has a good or bad design. The same situation can be seen in all other analyzed projects. Hence the D metric is a bad indicator of the maintainability of an entire project. But one can notice that single packages from the areas A and C may possibly be difficult to maintain. Thus the D metric is suggested for metric-based audits. However, experiments and discussions with the designers show that audits based on the D metric can find only evident design errors (for example unused abstract classes). Consequently, the D metric is rejected from the quality model.
Figure 5.12: Demonstration of the analysis of Martin on project Mobile Client 7.1
Metric: CYC - Cyclic Dependencies
The metric CYC determines the number of mutual coupling dependencies between packages, that is, the number of other packages the package depends upon and which in turn depend on that package. Cyclic dependencies are difficult to maintain and indicate potential code for refactoring, since cyclically dependent packages are not only harder to comprehend and compile individually, but also cannot be packaged, versioned and distributed independently. Thus they violate the idea that a package is the unit of release. Unfortunately, this metric is project-size dependent and it is impossible to compare two projects based on it. Consequently, audits based on this metric can be useful to catch cyclic package dependencies before they make it into a software baseline.
Metric: NOM - Number of Methods and WMC - Weighted Methods per Class
Consider a class with n methods. Let c1, ..., cn be the complexities of the methods. Then:
WMC = c1 + c2 + ... + cn
If all method complexities are considered to be unity (equal to 1), then WMC = NOM = n, the number of methods. However, in most cases the complexity of the methods is estimated by MCC.
The metric WMC was introduced by Chidamber and Kemerer [CHID93, p. 12] and criticized by Churcher, Shepperd and Etzkorn. In particular, Etzkorn has suggested a new metric for the complexity of the methods [ETZK99], the Average Method Complexity (AMC). He argued that WMC has overstated values for classes with many simple methods. For example, a class with 10 attributes has 10 get methods, 10 set methods and the constructor, thus WMC = 21, which is a very high value for such a primitive class. AMC, on the contrary, has understated values for classes with a few really complex methods (MCC > 100) and many simple methods. "Thus, AMC is not intended primarily as a replacement for the WMC metric, but rather as an additional way to examine particular classes for complexity" [ETZK99, p. 12].
In this thesis it is preferable to use WMC instead of AMC, because WMC is class-size dependent but independent of the project size. Additionally, it has a very clear meaning: the number of all decision statements in the class plus the number of methods. Consequently, WMC is a good metric for estimating the overall algorithmic complexity of a class.
For data-container classes NOM = WMC, because such classes have only get and set methods, which have MCC = 1. Thus NOM can be used as an additional metric for finding data-container classes. This is important for excluding the data-container classes from the cohesion analysis.
Structure Chart
This model describes the communication between modules in the procedural environment and is suitable for illustrating processes in non-OO ABAP programs. An example of a structure chart is depicted in figure 5.13. Boxes represent modules (function modules, programs, includes, etc.), circles represent global variables and arcs represent calls, whereby the parameters of a call can also be depicted. The direction of the arrows distinguishes between importing and exporting parameters.
Figure 5.13: Example of structure chart
Metric: FAN-IN and FAN-OUT
These metrics describe the data and external coupling in the procedural environment.
FAN-IN and FAN-OUT describe opposite directions of coupling (see the sketch after this list):
Parameters passed by value count toward FAN-IN
External variables used before being modified count toward FAN-IN
External variables modified in the block count toward FAN-OUT
Return values count toward FAN-OUT
Parameters passed by reference depend on their use
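The following hedged Java sketch applies these counting rules to one invented function module: each piece of information flowing into or out of the block is classified and counted toward FAN-IN or FAN-OUT. The module description is an assumption made up for the example.

    import java.util.*;

    // Sketch: counting FAN-IN and FAN-OUT of a single module according to the rules above.
    class FanExample {
        enum Kind { VALUE_PARAM, GLOBAL_READ, GLOBAL_WRITE, RETURN_VALUE }

        public static void main(String[] args) {
            // pieces of information flowing into or out of one (invented) function module
            List<Kind> flows = List.of(
                    Kind.VALUE_PARAM,   // importing parameter passed by value
                    Kind.VALUE_PARAM,
                    Kind.GLOBAL_READ,   // global variable used before being modified
                    Kind.GLOBAL_WRITE,  // global variable modified in the block
                    Kind.RETURN_VALUE); // exporting/return value

            int fanIn = 0, fanOut = 0;
            for (Kind k : flows) {
                if (k == Kind.VALUE_PARAM || k == Kind.GLOBAL_READ) fanIn++;
                else fanOut++;
            }
            System.out.println("FAN-IN = " + fanIn + ", FAN-OUT = " + fanOut);
        }
    }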
A drawback of these metrics is the assumption that all pieces of information have
the same size; distinguishing the complexity of procedure calls would require much
more detailed analysis. All in all, these metrics can impart a quite thorough idea of the
coupling.
In the ABAP environment the “where-used” and “usage” functions can be used for
reporting FAN-IN and FAN-OUT respectively.
Based on these metrics, a wide range of hybrid metrics was suggested in order to aggregate
them to one single value for the entire system. One example is D-INFO =
(SUM(FAN-IN · FAN-OUT))², where SUM means the sum over all modules (see [ZUSE98] for
more details). However, these derived metrics are project size dependent and have less
meaning.
Metric: GVAR - Number of Global Variables
This metric presents the number of global variables used in a system. Usually, to
overcome the size dependency of this metric, the number of global
variables is divided by the number of modules. Nevertheless, this metric is indirectly
included in FAN-IN and FAN-OUT, hence it is senseless to include this redundant metric
in the quality model despite its simplicity.
Other Models
Here some simple metrics, which do not fit any of the previously introduced models, are
discussed.
Metric: DOCU – Documentation Rate
This quantitative metric indicates the percentage of modules that have external
documentation: DOCU for ABAP or JavaDoc for Java. However, the quality of the
documentation itself is not considered and is very difficult to assess automatically at all.
Moreover, this metric is a part of the Maintainability Assessment. Thus this metric is
excluded from the model.
Metric: OO-D – OO-Degree
This is an additional metric for ABAP (for Java applications it is always 100%). It shows
the percentage of compilable units created using the object-oriented paradigm (classes or
interfaces) relative to the total number of compilable units. This metric, like the other
additional metrics, does not have any qualitative meaning, but indicates the importance of
the OO-metrics: if only a small part of a system is created using the OO paradigm, the
analyst will pay less attention to OO-metrics.
Metric: SMI – Software Maturity Index
It is possible that the customer changes some modules in order to customize his system.
Before the maintainers can start with the analysis and updating of the customer’s
system, they should make sure that the modules to be maintained are not
affected by the customization. It is important to know how different from the standard
release the system actually is. For this reason a list of newly created, changed or deleted
objects should be written. As the metric for the degree of modification, the metric SMI is
suggested. This is the rate of newly created, changed or deleted objects with respect to the
total number of objects, whereas it is unimportant who has made the changes: IMS or
the customer.
SMI = (M − (Ma + Mc + Md)) / M
where M is the total number of objects and Ma, Mc and Md are the numbers of added, changed and deleted objects respectively.
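A minimal hedged sketch of applying this formula is given below; the counts of objects are invented for illustration.

    // Sketch: applying the SMI formula to invented counts of objects.
    class SmiExample {
        public static void main(String[] args) {
            int total = 400;    // M:  total number of objects in the release
            int added = 12;     // Ma: newly created objects
            int changed = 30;   // Mc: changed objects
            int deleted = 3;    // Md: deleted objects

            double smi = (double) (total - (added + changed + deleted)) / total;
            System.out.printf("SMI = %.3f%n", smi);   // 0.888, i.e. about 89 % of the objects are untouched
        }
    }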
If SMI is less than 1, it is very likely that the maintainers should compare the current
customer version with the standard release before the update. The SMI approaches 1 as the
product begins to stabilize.
The empirical meaning of SMI is the percentage of modules that have not been changed
with respect to the last standard release.
This metric is of type Percentage and thus has poor numerical properties; see the next chapter
for more details.
For the ABAP environment the number of changed LOC can be calculated by the Note
Assistant.
Noteworthy, in the ABAP environment only a small part of the system cannot be
changed by the customer; in the Java environment the unchangeable part is much bigger.
Metric: NOD – Number Of Developers
This metric shows the average number of developers who have ever touched an
object. The author believes that modules which were changed many times by different
developers have complicated behavior and are hard to modify. Moreover, such
modules likely mix very different styles and naming conventions. All these factors
decrease the maintainability. The interpretation of the values depends on the development
methodology used. For example, eXtreme Programming does not have individual code
ownership, in which case the metric NOD is meaningless.
Correlation between Metrics
Various methods can be used in order to prove whether one metric depends on another.
Depending on the numerical properties (see next chapter), several methods are possible.
The Pearson correlation coefficient is applicable to measures on a ratio scale, whereas
Spearman’s and Kendall’s rank correlation coefficients are used when the measure has an ordinal scale.
For a small amount of data the sample covariance can be used:
Cov(x, y) = SUM((x − ave(x)) · (y − ave(y))) / (n − 1)
Several experts argue that a positive result of the correlation does not necessarily imply
the presence of a causal relationship between the correlated metrics. In this thesis the
relation between the metrics is proved using both correlation and deduction.
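As a hedged sketch of these statistical operations, the following Java fragment computes the sample covariance and the Pearson correlation coefficient for two metric series; the LOC and WMC values are invented for illustration.

    // Sketch: sample covariance and Pearson correlation between two metric series.
    class CorrelationExample {
        static double mean(double[] v) {
            double s = 0;
            for (double x : v) s += x;
            return s / v.length;
        }

        public static void main(String[] args) {
            double[] loc = {120, 340, 80, 560, 210};   // invented per-class LOC values
            double[] wmc = {10, 35, 6, 60, 18};        // invented per-class WMC values

            double mx = mean(loc), my = mean(wmc);
            double cov = 0, varX = 0, varY = 0;
            for (int i = 0; i < loc.length; i++) {
                cov  += (loc[i] - mx) * (wmc[i] - my);
                varX += (loc[i] - mx) * (loc[i] - mx);
                varY += (wmc[i] - my) * (wmc[i] - my);
            }
            cov /= loc.length - 1;                                      // sample covariance
            double pearson = cov / Math.sqrt(varX / (loc.length - 1))
                                 / Math.sqrt(varY / (loc.length - 1));  // Pearson r
            System.out.println("cov = " + cov + ", r = " + pearson);
        }
    }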
This procedure is used in order to find correlated metrics in the quality model and reject
the less important metrics, which do not provide additional information. The second
scenario is an empirical validation of metrics; in this case it is proved whether the
selected product metrics are correlated with process metrics. Unfortunately, in this
thesis such a study is impossible because of the lack of data for process metrics.
For illustration and to stress the important properties of a correlation, a
diagram can be used. The example depicted in figure 5.14 presents the correlation
between LOC and WMC. The area marked with “1” contains several generated message
classes, which have a lot of LOC, but no methods and thus no WMC. The area marked
with “2” contains interfaces and abstract classes, which have few methods and
approximately as few LOC.
Figure 5.14: Correlation between LOC and WMC
The next possible relation between metrics is not so obvious. Each product has a
minimal inherent complexity, which depends only on the problem statement. If the
complexity of one perspective is reduced, the complexity of another perspective will increase.
For example, reducing a high intra-modular complexity by increasing the total
number of classes will increase the inter-modular complexity. An example of
such a relation is depicted in figure 5.15. The numbers “1” and “2” mark two releases with
the same functionality. It can be seen that decreasing the average
MCC leads to an increase of the total number of classes in the second release.
Figure 5.15: Example of dependency between MCC and NOO
Metrics Selected for Further Investigation
Some parts of the quality model were rejected already during the creation of the model and
its expansion with the metrics:
The questions “Does the system have data flow anomalies?” and “Is code that is
unreachable or that does not affect the program avoided?” were removed from
the quality model, because they have no great impact on the maintainability and
are difficult to calculate.
For the question “Are the naming conventions followed?” no appropriate simple
metric was found and this question was removed from the quality
model. All in all, checking the naming conventions is not a trivial task.
The question “How complex is the DB-schema?” is not urgent for the SAP
architecture and was removed.
The question about the complexity of the data types was removed because of the
lack of metrics.
However, the quality model is still redundant: different metrics yield the same qualitative
statement. Therefore, the metrics which cannot provide new qualitative information or are
difficult to calculate should be discarded. These metrics may be put on a waiting list for
implementation in the future.
For the selection of the most important metrics, three additional criteria for each metric
were provided: the importance, the estimation or judgment in the literature and the ease of
implementation. In the following, all rejected metrics are enumerated with an
indication of the reason for rejection:
A (Abstractness) – is used also in D, the aggregated value doesn't provide
appreciable qualitative meaning
CN (Control Nesting) – is included in and correlates with MCC
CR (Comments Rate) – is replaced by LC (Lack of Comments)
CYC (Cyclic Dependencies) – is size-dependent; optionally can be used for
additional audits
D (Distance from Main Sequence) – its aggregated value doesn't provide
appreciable qualitative meaning
DIT (Depth of Inheritance Tree) – is extended and replaced by NAC
DOCU (Documentation Rate) – is difficult to analyze, is a part of the
Maintainability Assessment
D-INFO – is size-dependent and included in FAN-IN and FAN-OUT
GVAR (Number of Global Variables) – is included in FAN-IN and FAN-OUT
NOC (Number of Children) – is extended by NDC
NOF (Number of Fields) – correlates with LOC
NOM (Number of Methods) – correlates with LOC
NOS (Number of Statements) – correlates with LOC and MCC
U (Reuse Factor) – is not very important for the maintenance
After the truncation of the quality model the following metrics were proposed as
maintainability indicators and thus selected for further research:
CBO - Coupling between objects
CDEm - Class Definition Entropy (Modified)
CLON - Clonicity
LC – Lack of Comments
FAN-IN (substitutes CBO in non OO ABAP environment)
FAN-OUT (substitutes RFC in non OO ABAP environment)
LCOM - Lack of Cohesion of Methods
LOC - Lines Of Code
LOCm – Average LOC in methods
m – Structure Entropy
MCC - McCabe Cyclomatic Complexity (substitutes WMC in non OO ABAP
environment)
NOD - Number of Developers
RFC - Response For a Class
SMI - Software Maturity Index
WMC - Weighted Methods per Class
The metrics NAC (Number of Ancestor Classes) and NDC (Number of Descendent
Classes) are suggested to support metric-based audits.
The selected metrics are expected to describe the maintainability of the software and
should cover the following aspects:
Incoming and outgoing connections between programming objects
Quantity of the internal code documentation
Cohesion of programming objects
Degree of conformance to the principle high cohesion - low coupling
Modularity
Algorithmic complexity
Number of developers
Usage of the inheritance
Maturity
Clonicity
Size-dependent Metrics and Additional Metrics
The size-dependent metrics do not provide any qualitative statement about a system,
but can give some idea about the size of the system concerned. The metrics Total-LOC
(Total Lines of Code) and NOO (Number of Objects) are suggested.
Additional metrics help to get an idea about the importance of some other metrics. The
metrics OO-D and IF are suggested. OO-D has an effect only for ABAP, because for
Java OO-D is always 1. OO-D designates whether OO-metrics (like NAC, NDC, RFC,
CBO, D and others) should be taken into the maintainability assessment or not. If less
than 15% of the entire system is made using the OO approach, OO-metrics have no evident
impact on the maintainability.
IF (Inheritance Factor) designates whether the inheritance metrics (NAC and NDC) should
be taken into the assessment or not. Nevertheless, IF imparts some qualitative statement as
well, because when changing stand-alone classes no OO methods can be used, hence
such classes are more difficult to change.
6. Theoretical Validation of the Selected
Metrics
Problem of Misinterpretation of Metrics
After the calculation of the metric values for concrete source code one can analyze,
compare or transform them. Nevertheless, in order to exclude misinterpretation, the
application of the metrics should only take place after the metrics have been shown to
be theoretically valid, in the sense that their numerical properties are well known and
all operations after the results collection are deliberate. The next example shows a
misleading usage of metrics. Maintenance can be seen as the process of exchanging
one component of the system with another (newer) component. Figure 6.1 presents a
simple illustration of this process.
Figure 6.1: The maintenance as the process of exchanging components.
Obviously the new component M3’ is somehow better than the substituted component
M3, and one of the metrics should show it. In this example it would be the Defect
Density (DD). However, an improvement of a single part (or even of each part) of the
system does not always lead to an improvement of the entire system. Often this is not a
problem of the improvement, but of its description. Thus the right metrics for the
estimation and the right operations on the metrics should be used.
The next table (6.1) is taken from [ZUSE98, p. 47] and shows two versions of one system
with five modules. Each module in the newer version is better than the corresponding
module in the old version – it has a smaller DD. However, the overall DD for the system
becomes worse. This can happen because DD is a percentage measure and thus
depends on the size of the module. The overall DD also depends on the distribution of size
between the modules, and the analyst must interpret such metrics very carefully.
It is helpful to follow a number of steps to ensure the reliability of the proposed
metrics. Some approaches were found to check the numerical properties of metrics in
order to find admissible transformations and prepare hints for the analyst on how to handle
the metrics. One of them is the axiomatic approach proposed by Weyuker [WEYU88], which
provides a framework based on a set of nine axioms.
Table 6.1: Trend of DD for two versions of the system [ZUSE98, p. 47]

Module | Version 1: # of errors | LOC   | DD     | Version 2: # of errors | LOC  | DD     | Trend of DD
1      | 3                      | 55    | 0.0545 | 12                     | 777  | 0.0154 | improvement
2      | 6                      | 110   | 0.0545 | 5                      | 110  | 0.0454 | improvement
3      | 3                      | 110   | 0.0272 | 2                      | 110  | 0.0181 | improvement
4      | 70                     | 10000 | 0.0070 | 6                      | 1000 | 0.0060 | improvement
5      | 4                      | 110   | 0.0363 | 3                      | 110  | 0.0272 | improvement
SUM    | 86                     | 10385 | 0.0082 | 28                     | 2107 | 0.0132 | degradation!
Zuse’s framework for software measurement also provides a set of axioms, the so-called
extensive structure. Depending on their fulfillment, one can conclude the
type of scale of a metric and hence the admissible transformations which can be
applied to it. This framework was used in this thesis for the examination of the
selected metrics, as it is more competent, commonly accepted and simple.
Types of Scale
Admissible transformations, and hence types of scale, are probably the most important
properties of metrics, because other properties follow from the type of scale. In the early
1940s Stevens introduced a hierarchy of measurement scales and classified statistical
procedures according to the scales for which they are “permissible.” A brief
description and criticism can be found in [VELL]. Here some basics are introduced.
All types of scale are summarized in table 6.2.
The first, very primitive scale is the nominal scale. The values of metrics on this scale have no
qualitative meaning and characterize just the belonging to one or another class. The only
possible operation is equality – one can define whether two values belong to the same
class. Examples of a nominal scale would be labels or classifications such as:
f(P) = 1, if Program is written in ABAP
f(P) = 2, if Program is written in Java
The ordinal scale introduces a qualitative relation between values, thus they can
be compared. The experts’ notes are on an ordinal scale and one can use the empirical
operation “more maintainable”.
Values on the interval scale have equal distances between values.
The ratio scale allows comparing ratios between the values.
The absolute scale is a special case of a ratio scale and presents the actual count.
The metrics used in this thesis are on the ordinal and ratio scales. One can see that
higher scales (ratio) provide more possibilities for interpretation (a wide range of
empirical and statistical operations), but are more sensitive to admissible transformations.
Using an inappropriate transformation lowers the type of scale of the resulting
value or even leads to wrong conclusions.
The main idea of analyzing the types of scale is to help choosing the appropriate model
and correctly analyzing the results by using the appropriate operations.
Table 6.2: Types of Scales (partially taken from [PARK96, p.9])

Nominal scale
Admissible transformation: any one-to-one transformation
Basic empirical operations: determination of equality
Statistical operations: mode, histograms, non-parametric statistics (frequency counts, …)
Examples: labels or classifications such as: f(P) = 1, if Program is written in ABAP; f(P) = 2, if Program is written in Java; activities (analyzing, designing, coding, testing); problem types; numbering of football players

Ordinal scale
Admissible transformation: y2 > y1 iff x2 > x1 (strictly monotone increasing transformation)
Basic empirical operations: the above, plus determination of greater or less
Statistical operations: the above, plus rank order statistics (Spearman and Kendall Tau correlation coefficients, Median), Maximum, Minimum
Examples: rankings or orderings such as severity and priority assignments; f(P) = 1, if Program is easy to read; f(P) = 2, if Program is not hard to read; NAC; NDC; CR; LC; OO-D; SMI; D; NOD; DOCU; IF; m; CDEm; GVAR; CLON; DD

Interval scale
Admissible transformation: y = ax + b, a > 0 (positive linear transformation)
Basic empirical operations: the above, plus determination of the equality of intervals or differences
Statistical operations: the above, plus comparisons of arithmetic means, the Pearson correlation coefficient
Examples: the absolute time when an event occurred; calendar date; temperature in degrees Fahrenheit or Celsius; intelligence scores (“standard scores”)

Ratio scale
Admissible transformation: y = ax, a > 0 (similarity transformation)
Basic empirical operations: the above, plus determination of the equality of ratios
Statistical operations: the above, plus comparison of percentage calculations, Variance
Examples: time intervals; cost, effort (staff-hours), length, weight and height; temperature in degrees Kelvin; LOC; LCOM; CBO; RFC; WMC; FAN-IN; FAN-OUT; NOM

Absolute scale
Admissible transformation: y = x (identity)
Basic empirical operations: the above, plus determination of equality with values obtained from other scales of the same type
Statistical operations: the above
Examples: counting; probability
Types of Metrics
Measures can be divided into different groups regarding the way the information for the
metric is obtained. See [ZUSE98, pp. 242 – 246] for the full list. Below only the types of
measure used in this thesis are presented:
Counting – simple counting of objects or their artifacts. The following operations can
be applied: Range, Sum (for additive metrics), Average (for additive metrics, carefully),
Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very carefully).
Examples are: LOC, WMC, RFC, CBO, LCOM4, FAN-IN, FAN-OUT, NOM, MCC,
NOD.
Density – one metric value is divided by another independent metric value. Examples
are: GVAR, DD. The following operations can be applied: Range (very carefully),
Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very carefully).
Percentage – a metric expressed as the ratio of one part of the empirical objects or their artifacts
to their total number. Examples are: CR, OO-D, SMI, DOCU. The following
operations can be applied: Range (very carefully), Weighted Mean, Median, Standard
Deviation, Graphic, Aggregation (very carefully). In particular this means that
percentage metrics must not be averaged arithmetically. For example, if one module
has CR(P1) = 50% and another module CR(P2) = 10%, one must not average these to 30%.
Depending on the size of the modules, the real CR(P1 + P2) could be anywhere between 10 and 50%.
Especially for CR the weighted mean will be smaller than the arithmetic mean,
because smaller classes usually have a larger CR and the weighted mean weights small
classes with smaller coefficients. The distribution of LOC and CR, using data
from the project ObjMgr (new) as an example, is shown in figure 6.2.
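The following hedged Java sketch makes the point about percentage metrics concrete: it compares the naive arithmetic mean of two CR values with the CR of the combined modules. The module sizes and comment counts are invented for illustration.

    // Sketch: arithmetic mean of CR versus the CR of the combined modules.
    class CrExample {
        public static void main(String[] args) {
            int loc1 = 100, comments1 = 50;    // CR(P1) = 50 %
            int loc2 = 900, comments2 = 90;    // CR(P2) = 10 %

            double naiveAverage = (50.0 + 10.0) / 2;                            // 30 %
            double combined = 100.0 * (comments1 + comments2) / (loc1 + loc2);  // 14 %
            System.out.println("arithmetic mean: " + naiveAverage
                    + " %, real CR(P1+P2): " + combined + " %");
        }
    }

Because the large module dominates, the real combined CR (14 %) lies far below the naive 30 %, which is exactly the effect described above.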
Figure 6.2: Distribution between LOC and CR, smaller classes usually have larger CR.
Minimum, Maximum – the minimal or maximal value of a population (the metric set of each
empirical object). Only the Range operation can be used.
A hybrid metric is a metric which is composed of other metrics using addition
or multiplication. Examples are: LC, m, CDEm, D, MI. A hybrid metric inherits the lowest
numerical properties of its components. Hence, usually such metrics have relatively
poor numerical properties and only few operations can be applied.
The concatenation operation for inheritance hierarchies is undefined, because new nodes
can be added at any place in the hierarchy. Thus all metrics based on this model have only
an ordinal scale. This is discussed in detail in [ZUSE98, pp. 273 – 335].
Conversion of Metrics
The conversion of metrics is a numerical operation on one or more metrics in order to
get metrics with new numerical or qualitative properties.
The first type of conversion is the aggregation. Some metrics like SMI, CLON or OO-D
are calculated for the project as a whole and don’t need to be aggregated, but many
other metrics describe one single class or even method and need to be aggregated in
some way to one single value that indicates the entire group of empirical objects (package,
inheritance hierarchy or whole system). Depending on the type of scale, different methods
are possible.
The first method is the range: only the maximal (minimal) value is taken.
However, one extreme value is a bad indicator for the entire system and can only
be used together with other methods. The range can be used for metrics on an ordinal or higher
scale.
The second method is averaging using the arithmetic mean: the values of all modules are
summed and divided by the number of modules. Keep in mind that this method can be
applied only to metrics on interval and higher scales. This is probably the simplest
and most popular way of averaging. Nevertheless, it will change the empirical statement of
the resulting metric and its properties. Consider some features of the arithmetic mean
applied to the inter-modular metrics, taking FAN-IN and FAN-OUT as an example.
The metrics FAN-IN and FAN-OUT are good indicators of the analyzability
and changeability of a single module. But how can the values of single modules be
combined into one common indicator for the entire system? Consider the example system
in figure 6.3 to illustrate this.
Module | LOC | FAN-IN | FAN-OUT
A      | 200 | 0      | 2
B      | 100 | 0      | 2
C      | 100 | 2      | 2
D      | 150 | 1      | 2
E      |  50 | 2      | 2
F      |  50 | 3      | 3
G      | 200 | 1      | 3
H      |  50 | 2      | 0
I      |  50 | 2      | 1
J      |  50 | 4      | 0
Figure 6.3: Example system for weighted mean
Average values of FAN-IN and FAN-OUT are:
Ave-FAN-IN = (0+0+2+1+2+3+1+2+2+4)/10 = 1,7
Ave-FAN-OUT = (2+2+2+2+2+3+3+0+1+0)/10 = 1,7
This is not a singular coincidence. For any closed system the average values are equal, because
each relation is directional and is counted twice: once in FAN-IN and once in FAN-OUT,
divided by the constant number of modules.
Thus for the arithmetic mean in closed systems the simple formula Ave-FAN-IN = Ave-FAN-OUT =
Number of Relations / Number of Objects can be used.
The problem is that all modules have equal weight, thus all empirical objects have equal
impact on the average value. In the real world large and complex objects have more impact
on an attribute of the whole system, but averaging gives complex and
simple objects equal rights. It is therefore quite reasonable to calculate the weighted mean value,
which characterizes the quality attribute of the entire system more precisely.
Different weighting schemes can be used. Suppose that the probability of changing
a larger module is higher than that of a smaller one, because it has more LOC which could
be changed. Consider again the example in figure 6.3; notice that the darkness of the
rectangles represents the size of the module.
Let us calculate the LOC-weighted mean values of FAN-IN and FAN-OUT.
Mean-FAN-IN = 0*0,2 + 0*0,1 + 2*0,1 + 1*0,15 + 2*0,05
+ 3*0,05 + 1*0,2 + 2*0,05 + 2*0,05 + 4*0,05 = 1,2
Mean-FAN-OUT = 2*0,2 + 2*0,1 + 2*0,1 + 2*0,15 + 2*0,05
+ 3*0,05 + 3*0,2 + 0*0,05 + 1*0,05 + 0*0,05 = 2
The results show that, on average, when analyzing the system the developer should
analyze 2 modules, and when changing it should keep 1,2 modules stable. That means the
system is relatively stable, but difficult to analyze. Hence the weighted mean allows not
only a more precise calculation of aggregated values, but also distinguishing systems
with a tendency to predominance of one or the other direction of relations. Noteworthy,
predominance in this case means the probability of having to analyze relations in this
direction.
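The following hedged Java sketch reproduces these calculations for the example system of figure 6.3; the module data is taken from the figure and the weighting is by LOC as described above.

    // Sketch: arithmetic and LOC-weighted means of FAN-IN and FAN-OUT
    // for the example system of figure 6.3 (modules A..J).
    class WeightedMeanExample {
        public static void main(String[] args) {
            String[] name   = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"};
            int[]    loc    = {200, 100, 100, 150,  50,  50, 200,  50,  50,  50};
            int[]    fanIn  = {  0,   0,   2,   1,   2,   3,   1,   2,   2,   4};
            int[]    fanOut = {  2,   2,   2,   2,   2,   3,   3,   0,   1,   0};

            int totalLoc = 0, sumIn = 0, sumOut = 0;
            double wIn = 0, wOut = 0;
            for (int i = 0; i < name.length; i++) {
                totalLoc += loc[i];
                sumIn  += fanIn[i];
                sumOut += fanOut[i];
                wIn  += fanIn[i]  * (double) loc[i];   // weight each module by its size
                wOut += fanOut[i] * (double) loc[i];
            }
            System.out.println("Ave-FAN-IN  = " + (double) sumIn  / name.length);  // 1.7
            System.out.println("Ave-FAN-OUT = " + (double) sumOut / name.length);  // 1.7
            System.out.println("Weighted FAN-IN  = " + wIn  / totalLoc);           // 1.2
            System.out.println("Weighted FAN-OUT = " + wOut / totalLoc);           // 2.0
        }
    }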
Nevertheless, in general a large module uses more other modules than a smaller one.
That means the weighted mean for FAN-OUT tends to be higher than the weighted mean for
FAN-IN. Figure 6.4 shows that usually large classes (larger LOC) have more
connections with other classes (larger RFC). This observation is based on 8 SAP Java
projects. Hence the weighted FAN-OUT is expected to be larger than the weighted FAN-IN.
Since WMC is a good indicator of class complexity and should be correlated with the
fault probability, this metric can also be used for weighting. In this case the weighted
mean for FAN-OUT can be interpreted as the average number of modules the
developer has to analyze when localizing a fault.
The second type of conversion is normalizing. Metrics have different ranges of values,
and for comparison or presentation it is useful to have all metrics in the same
range (usually the interval [0; 1]). This can be achieved by normalizing. An example
of this conversion is the normalizing of entropy: because the maximal entropy is known,
the normalizing is easy. Since the entropy metric is on an ordinal scale anyway, the
normalizing will not worsen its numerical properties.
Figure 6.4: Correlation between LOC and RFC
The third type of conversion is the composition of a set of metrics into one hybrid metric.
Polynomial composition is very popular, but other types are also possible. An
example is the Maintainability Index.
The fourth type of conversion is the percentage grouping. All modules are grouped into
three groups based on the metric values (normal values, high but still acceptable,
inadmissible); after that the percentage of modules in each group is presented using a
pie diagram as shown in figure 6.5. Such diagrams can also give an idea about the
distribution of values within the system and complement the aggregated value well.
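A minimal hedged sketch of such a grouping is shown below; the thresholds (200 and 500 LOC) and the class sizes are assumptions made up for the example, not the values used in the experiments.

    // Sketch: percentage grouping of classes by LOC.
    class PercentageGrouping {
        public static void main(String[] args) {
            int[] classLoc = {90, 150, 230, 480, 520, 60, 700, 180, 300, 120};  // invented class sizes

            int green = 0, yellow = 0, red = 0;
            for (int loc : classLoc) {
                if (loc <= 200) green++;            // normal
                else if (loc <= 500) yellow++;      // high but still acceptable
                else red++;                         // inadmissible
            }
            int n = classLoc.length;
            System.out.printf("green %d%%, yellow %d%%, red %d%%%n",
                    100 * green / n, 100 * yellow / n, 100 * red / n);
        }
    }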
[Pie charts, percentage grouping for LOC: Mobile Client 7.0 – green: 99 (58%), yellow: 10 (6%), red: 62 (36%); Mobile Client 7.1 – green: 240 (62%), yellow: 32 (8%), red: 118 (30%)]
Figure 6.5: Example of percentage grouping. Comparison of two versions of Mobile Client
The percentage grouping can also be done at a more detailed level of granularity, namely
in LOC. For that it is necessary to recalculate all values in LOC and present the
distribution of LOC, as shown in figure 8.6 (see p. 82). Such a detailed description
allows not only analyzing the distribution of modules into the normal, high and
inadmissible areas, but also analyzing how many LOC the inadmissible modules
have.
Aggregation and normalizing are also used to convert size-dependent metrics into
quality-dependent metrics. Usually, average values of a size-dependent metric are no
longer size-dependent. Nevertheless, any conversion should be used very carefully,
because such a transformation can change the numerical properties and the qualitative
meaning of the metric.
Other Desirable Properties of the Metrics
Meta-metrics or properties of the metrics are discussed in [MUST05]:
Compliance: The ability to cover all aspects of quality factors and the design
characteristics
Orthogonality: The ability to represent different aspects of the system under
measurement. This property is studied in detail in the COBISOME project
Formality: The ability to get the same value for the same systems for different
people at different times through precise, objective and unambiguous
specification
Minimality: The ability to be used with the minimum number of metrics
Implementability/Usability: The implementation technology independent ability
Accuracy: A quantitative measure of the magnitude of error, preferably
expressed as a function of relative error
Validity refers to the degree to which a study accurately reflects or assesses the
specific concept that the researcher is attempting to measure
Reliability: The probability of failure free software operation for a specified
period of time in a specified environment
Interpretability: The ease with which the user may understand and properly use
and analyze the metrics results
Visualisation
A very popular method for the presentation of metric results is the Kiviat diagram.
Such a diagram for the maintainability dimensions is presented in figure 6.6.
These diagrams are very pictorial and present information simply and intuitively.
However, this intuition can be misleading. Usually all dimensions are on an ordinal
scale and any ratios between the numbers of two compared systems are meaningless.
But the numbers are graphically presented on each dimension using rational
intervals, which can lead to the situation that the analyst tries to interpret the ratio
between intervals on the Kiviat diagram as a ratio between metrics on an ordinal scale.
This is a misunderstanding. The same holds for other column diagrams as well.
The next drawback is the hiding of information: behind each dimension several metrics
are hidden, and after aggregation it is not clear which metric caused the deviation. It is
also not clear which weights should be taken for each metric.
Figure 6.6: The Kiviat-diagram for the maintainability dimensions
Consequently, the best possibility for presenting the multidimensional information
remains a table in which all used metrics with their aggregated values are listed. Additionally,
some color markers accent indicators with high or impermissible values. When comparing
multiple releases of one system it is possible to indicate the trends of the values with arrows.
An example of such a presentation is given in table 6.3.
Table 6.3: Example of output table (Mobile Client 7.0 vs. 7.1)
The next possibility is the usage of a Business Warehousing system. Here different reports
can be prepared and the history can be saved. However, at this point in time this is not
possible and can be planned for the remote future. In this work simple tables will be
used.
It is interesting to inspect how the values of a metric are distributed between modules. A
possibility to take a deeper look into the essence of a metric is the distribution graph. One of
them is depicted in figure 6.7. It can be seen that most methods have few LOC
and only a few methods are very large. Thus averaging LOC can lead to
underestimation.
Figure 6.7: Distribution of LOC in Methods (small part of the project Mobile Client 7.1)
The class blueprint graphically presents information in the form of a chart in which simple
metrics and trivial class diagrams are combined. The elements of a class are graphically
represented as boxes. Their shape, size and color reflect semantic information. The two
dimensions of a box are given by two selected metrics. An example is given in figure 6.8.
Three-dimensional diagrams are also possible.
Figure 6.8: Example of visualization of class diagram.
Other class graphs for software metrics visualization are discussed in detail in
[LANZ99]. For his work Lanza used the tool CodeCrawler, a language-independent
reverse engineering tool which combines metrics and software
visualization. See also http://www.iam.unibe.ch/~scg/Research/CodeCrawler/ .
7. Tools
In this chapter several tools for software measurement and analysis are discussed.
In the first section tools for ABAP are presented, in the second section Java tools are
introduced, and in the third section tools for the automation of the GQM-approach and the
integration of several tools in order to automate the experiments are discussed.
ABAP-tools
Transaction SE28
The transaction SE28, which uses the package SCME, can be used to calculate metrics
and visualize them in the form of a hierarchy. The program SAPRCODD can be used for
calculating a set of metrics for a single program or for a set of programs. In this
case the mask “*” should be used as the parameter object name. SE28 is not a
standard tool that can be found in every standard installation of the basis. It is
available for example in the BC0 and B20 systems. For parsing the source code the ABAP
command SCAN ABAP-SOURCE INTO TOKENS is used. However, only few metrics
are implemented: LOC, MCC, Comparative complexity (the number of ORs and
ANDs within IF statements), DIF – Halstead Difficulty, Number of comments, etc.
Unfortunately, it is impossible to show more than 4 metrics at the same time.
Noteworthy, this transaction only presents the results; the actual calculation is
made by the job EU_PUT.
Additionally, some standard ABAP tools can be used for the measurement. The
transaction DECO and its components can be used for calculating the FAN-IN and
FAN-OUT metrics. See the tables BUMF_DECO and TADIR, which contain the entire structure
of the system, the package SEWA and the functions MRMY_ENVIR_CHECK,
RS_EU_CROSSREF and REPOSITORY_ENVINRONMENT_CHECK. These functions
return a list with “where-used” and “uses” objects. Before these functions can be used, a
special job should be started in order to fill the tables. Questions refer to Martin
Runte or Andreas Borchardt.
Z_ASSESSMENT
This small report calculates the metrics needed for the Maintainability Assessment project,
namely: total number of forms, forms with > 150 lines, percentage with > 150 lines and
the ratio of comments to code. It is a very small but useful report. Please contact Alpesh Patel
for further questions.
CheckMan, CodeInspector
These tools increase the awareness of quality in development by checking code
against a great many rules (audits). In contrast to CheckMan, CodeInspector also provides
the possibility of counting some software attributes and hence allows implementing
metrics. However, until now no metric has been implemented in this way. ABAP
Test Cockpit is the successor of CodeInspector and CheckMan. It is also suited to counting
metrics, but its productive start is only planned. For more details about CheckMan see
[SAP03], for more details about CodeInspector see [SAP03b].
AUDITOR
This third-party tool (www.caseconsult.de) is intended for audits and is not suitable
for the calculation of sums. Each ABAP program is processed separately and the
result is stored in an HTML file. Thus, if one wants to have information about the whole
system, or some of the inter-modular metrics, these HTML files have to be additionally
processed in order to collect this information.
The second problem is that AUDITOR can read only flat files, thus the entire system
has to be exported before the measurement. Additional audits can be implemented in the
form of C++ libraries. All listed problems make this tool awkward for the current task.
At this point in time the transaction SE28, the report Z_ASSESSMENT and some standard
tools can be used. Nevertheless, the ABAP environment has a deficit of tools for the
following metrics: CDEm, m, SMI, LCOM4, RFC, CBO, NAC and NDC. Other metrics
can be measured directly or with a small work-around. For the metric CLON the tool
CloneAnalyzer can be used.
Java-tools
Some tools for Java need binary code or compilable source code, because they have
to parse the classes first and only then calculate the metrics. Some other tools
don’t need compilable code. This allows calculating metrics for code which
contains syntax errors or references to missing classes, and makes it possible to use
such a tool for analyzing only a part of the application. Nevertheless, in this case the
results can be slightly garbled and the analyst should draw conclusions very
guardedly.
Borland Together Developer 2006 for Eclipse
Among other features, Borland Together Developer also provides functionality for
metrics measurement. This tool collects measurement data at source code level, which is
important for the analysis of only a part of the software. However, not all selected
metrics are provided by Together. This tool is used for the initial research with some metrics
substituted. For further usage the missing metrics should be implemented. Next,
several specialties important for the experiments are introduced.
LOC is the number of lines of code in a method or a class. LOC counts all rows within the
class body only. Comments, the package specification and imports before the class body are
not counted. There are two options for LOC:
Documentation and implementation comments as well as blank lines and full-row
comments can optionally be interpreted as code
Empty strings and full-row comments are rejected. Noteworthy, if several
statements are written in one line, they are counted separately. This option has
been chosen for the experiments
The metric for CR is called TCR and counts the ratio of the documentation and
implementation comments to the Total-LOC. Comments inside a file which contains
the class but outside the class (before the class body) are not counted.
The metric for WMC is called WMPC1. The metric for DIT is called DOIH.
For NOM (number of methods) the metric NOO – Number of Operations (except
constructors) is used.
The tool provides several metrics for cohesion. LCOM is calculated using attribute-method
coupling only. The coupling between methods and inherited attributes is not
considered. Thus no proper metric for cohesion was found.
Borland Together allows processing classes with missing imports, however this can
lead to erroneous results. For example, DIT will be equal to zero if the parent class is
missing. The same can be said about many other inter-modular metrics. Thus, after the
calculation, selected results should be verified manually.
Code Quality Management (CQM)
The CQM supports the entire development process by providing a landscape for
measurement and analysis. Measurement data is stored in a repository. After that,
web-based quality reports can be prepared. Other tools (like Together or JDepend) can
be integrated into the CQM using adapters. For more details see [SAP05d].
CloneAnalyzer
CloneAnalyzer is a free Eclipse plug-in for software quality analysis. It allows
finding, displaying and inspecting clones, which are fragments of duplicated source
code resulting from a lack of proper reuse. Noteworthy, this tool finds only exact clones.
That means the tool is language independent and can be used both for the Java and for the
ABAP environment. Nevertheless, the CloneAnalyzer works with flat files only. Thus
the corresponding part of the ABAP system should be exported and the file filter should
be set in the options. In the options it is also possible to set the minimal size of the clones to be
found: in the context of this thesis clones with a length of 15 LOC and larger were searched for. The
found clones can be saved in CSV format, specifying the number of clones in a
clone set and the length of a clone. This data allows the calculation of the total LOC in the
clones and thus the Clonicity. For more details see
http://cloneanalyzer.sourceforge.net/ .
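The following hedged Java sketch shows how Clonicity could be derived from such exported clone-set data; the clone sets (number of clones per set and clone length in LOC) and the Total-LOC are invented, and the exact CSV columns of CloneAnalyzer are not reproduced here.

    // Sketch: computing Clonicity from exported clone-set data.
    class ClonicityExample {
        public static void main(String[] args) {
            // {number of clones in the set, length of one clone in LOC} - invented values
            int[][] cloneSets = { {2, 40}, {3, 25}, {2, 18} };
            int totalLoc = 27_000;   // Total-LOC of the analyzed component (invented)

            int clonedLoc = 0;
            for (int[] set : cloneSets) clonedLoc += set[0] * set[1];   // LOC covered by clones
            double clonicity = 100.0 * clonedLoc / totalLoc;
            System.out.printf("cloned LOC = %d, Clonicity = %.1f%%%n", clonedLoc, clonicity);
        }
    }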
Tools for Dependency Analysis
All tools discussed in this subsection calculate metrics for the packaging, including:
Afferent Couplings (Ca), Efferent Couplings (Ce), Abstractness (A), Instability (In) and
Distance from the Main Sequence (D).
JDepend is a free stand-alone tool and is also available as a plug-in for Eclipse. Only
some basic packaging metrics are provided. For more information see
http://www.clarkware.com/software/JDepend.html and
http://andrei.gmxhome.de/jdepend4eclipse/ .
OptimalAdvisor 4.0 is a commercial tool which graphically presents results for each
package, whereas if a package has sub-packages, those classes are also taken into account.
Despite the good graphical representation, no possibility to save the information is provided.
For more details see http://javacentral.compuware.com/products/optimaladvisor/ .
Code Analysis Plug-in (CAP) is a free plug-in for Eclipse. It provides a handy interface
for browsing classes and showing dependencies. Unfortunately, it is not possible to
save the results. This plug-in doesn’t take into account the standard Java libraries.
CodePro is a commercial plug-in for Eclipse and, parallel to the dependency analysis, also
provides a few other metrics and other functionality. The CodePro metrics take into account
dependencies to the standard Java libraries. For more information see
http://www.instantiations.com/codepro/ .
The problem of many tools is the aggregation: all tools do it in different ways and the
methods used for the aggregation are not even documented. For example, CodePro just
takes the value of the top package (for example com or org). Since this package usually
has no responsibility, the D-metric will be equal to the abstractness, which is quite
confusing. The best way to get the average value for D (or another metric) is to calculate it
ourselves using XSLT.
JLin
SAP-internal tool JLin performs static tests. Possible applications include
Identification of potential error sources
Enforcing of code conventions
Evaluation of metrics and statistics
Enforcing of architectural patterns
Monitoring
In order to fulfill these requirements, JLin can be used in the following environments:
As an Eclipse plug-in
Within the SAP make process (which currently uses ant, in the near future the
Component Build Server)
via an API
Nevertheless, only few metrics are implemented. For more details see [JLIN05].
Free tools: Metrics and JMetrics
Metrics 1.3.6 is a free plug-in for Eclipse. The calculation is based on binary code, thus the
source code has to be compilable. The most important metrics like Number of Classes,
Number of Children (subclasses) of a class, DIT, NOM, LOC, Total-LOC, LOCm, WMC,
LCOM1 and LCOM2 are provided. It is possible to export the measurement data in XML
format. For more details see http://metrics.sourceforge.net/ .
JMetric is a free stand-alone tool for metrics collection and analysis. JMetric collects
information from Java source files and compiles a metrics model. This model is then
populated and extended with metrics such as LOC, Total-LOC, LOC in Methods,
Number Of Classes, Lack of Cohesion Of Methods, WMC, Number Of Children classes,
DIT and NOM. JMetric also provides a few analysis methods in the form of drill-down and
cross-section tables, charts, and raw metrics text. For more details see
http://www.it.swin.edu.au/projects/jmetric/ .
Despite the wide range of tools, several metrics for Java still have to be
implemented: CDEm, m, SMI, LCOM4, NAC, NDC and NOD.
Framework for GQM-approach
In the current thesis the quality model is presented in MS Visio. However,
there are tools for the automated support of model building, measurement and
interpretation which could be integrated into the development landscape. GQM-DIVA,
GQMaspect and MetriFlame are some of these. For more details on GQM-supporting
tools readers are referred to [LAVA00].
However, these tools are not handy and don’t support an appropriate input source of
information. Thus, for the goals of the experiments presented in this thesis, some XML- and
XSLT-documents were prepared in order to provide automatic generation of the
reports. In general the measurement process works in the following way: third-party tools
provide the measurement data in the form of XML files. The quality model is also saved as an
XML file. All these XML files are the input for an XSL transformation, which
generates the output report corresponding to the GQM-model. See figure 7.1 for more
architectural details. An example of usage is given on p. 85.
Figure 7.1: Architecture of the metric report generator
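The transformation step of this architecture can be driven from Java with the standard JAXP API; the following minimal sketch is not the converter developed for this thesis, and the file names (quality-model.xsl, measurements.xml, report.html) are assumptions for illustration.

    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;
    import java.io.File;

    // Sketch: applying an XSL transformation to measurement data, as in figure 7.1.
    class ReportGenerator {
        public static void main(String[] args) throws TransformerException {
            // the stylesheet encodes how measurement values map to the GQM report
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("quality-model.xsl")));
            // transform the exported measurement data into the output report
            t.transform(new StreamSource(new File("measurements.xml")),
                        new StreamResult(new File("report.html")));
        }
    }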
In the experiments the trial version of Borland Together Developer 2006 for Eclipse, the trial
version of CodePro 4.2 and Clone Analyzer 0.0.2 were used. This selection is explained
by the wide number of provided metrics and the proper output format. However, provided
that the analyzed project doesn’t contain compiler errors, free tools like
Metrics 1.3.6 or JMetric can be used in order to save costs.
8. Results
For the empirical validation of the selected metrics several experiments were made. In this
chapter the description of the experiments, their results and the conclusions drawn are
discussed.
Overview of the Code Examples to be Analyzed
For the empirical validation of the selected metrics several SAP projects were selected.
An overview is given in table 8.1.
Table 8.1: List of the projects analyzed during this work

Java components:
Two versions of Object Manager (SLD):
  perforce3008:\\sdt\com.sap.lcr\620_SP_COR\src\java\com\sap\lcr\objmgr\... (old)
  perforce3301:\\engine\j2ee.lm.sld\dev\src\tc~sld~sldcimom_lib\_tc~sld~sldcimom_lib\java\com\sap\lcr\objmgr\... (new)
  Contact person: Thorsten Himmel
Two versions of SLD-Client (WBEM API):
  perforce3301:\\base\common.sld\dev\src\tc~sld~lcrclient_lib\_tc~sld~lcrclient_lib\java\com\sap\lcr\api\cim\... and \cimclient\... (old)
  perforce3301:\\base\common.sld\dev\src\tc~sld~sldclient_lib\_tc~sld~sldclient_lib\java\com\sap\sld\api\wbem\... (new)
  Contact person: Thorsten Himmel
JLin/ATX project comparison: "perforce3002\tc\jtools\630_SP_COR" vs. "perforce3227\buildenv\BE.JLin\dev"
  Contact person: Georg Lokowandt
Mobile Client routine comparison: 7.0 (2.1) vs. 7.1 (2.5)
  Contact person: Janosz Smilek

ABAP components:
Package CRM_DNO
  Contact person: Richard Himmelsbach
Package ME
  Contact person: Joachim Seidel
For the assessment of the actual maintainability of the selected projects either process metrics
or an expert’s opinion is needed. The process metrics could be easily extracted from the
system for customer message management. Examples are MTTM or the Backlog
Management Index (number of problems closed during the month / number of problem
arrivals during the month). Further examples of available process metrics can be
found in [SAP05c].
However, for the new projects no information about the maintenance is available yet, and
for the other projects it was a bureaucratic problem to get the data. Hence the alternative
option of using the expert’s opinion has to be used.
A very popular method for the evaluation of software metrics is finding the correlation
between the expert’s estimation and the automatically calculated metrics. As statistical method
the convergence or other similar methods can be used. However, this works only if
enough estimated sources are available. Furthermore, it is desirable to have all
estimations made by one expert in order to ensure the uniformity of the evaluation. In the
context of the current experiment this is impossible.
Hence, for the initial evaluation a simplified procedure is suggested. Several experiments
will be made. In each experiment only two releases of one software component will be
compared. The older release is supposed to have an improper design, which is improved in the
later release. It is assumed that the releases have nearly the same functionality and thus the
same minimal complexity to be achieved.
Comparing the metric values for the two releases should give an idea whether the selected
metrics are robust enough to indicate the maintainability improvement.
This methodology is simple, but powerful. One of the experts remarked that he could not
characterize any of the provided examples as a well or badly maintainable one, but the
ranking among the examples is obvious.
Because of the lack of tools for ABAP, only Java projects will participate in the experiments.
However, a short description of the selected ABAP projects is presented here, which can
be used for further research.
The first example comes from Richard Himmelsbach. The package CRM_DNO presents a
classical usage scenario of ABAP and OO-ABAP. The example can be found in the
system TSL 001. This package contains several reports for the DNO monitor and also
includes the used classes and functions. The advantages of the design are: the clear structure and
readability, the function encapsulation, modularity, exception handling, comments
and naming conventions, and customizability because of the used parameters.
The second example was suggested by Joachim Seidel and includes several objects from
the package ME. The following objects are worthy of notice, because they have been
continuously changed by many notes in different releases; the history of changes in
release 4.6C is the longest one:
the function module ME_READ_LAST_GR in the function group EINR
the function module ME_CONFIRMATION_MAINTAIN_AVIS and the include
LEINBF0Q in the function group EINB
the includes LMEDRUCKF17, LMEDRUCKF06 and the function module
ME_READ_PO_FOR_PRINTING in the function group MEDRUCK
the program SAPMM06E and the include MM06EF0B_BUCHEN
the function group MEPO
Such a high number of faults denotes a bad design of the earlier releases; however, no
opinion about the maintainability of the new version of ME is available.
Experiments
The overview of all analyzed projects can be found in table 8.2.
The arrows in the cells of the newer versions indicate improvement or degradation.
It can be seen that most pairs of metric values (old vs. new version) show
improvement.
Because of the lack of tools, the fulfillment of the following goals was not checked: Consistency,
Maturity and Packaging.
Table 8.2: Overview of analyzed projects

Metric    | ObjMgr old | ObjMgr new | SLDClient old | SLDClient new | JLin/ATX 630 | JLin/ATX dev | Mobile Client 7.0 | Mobile Client 7.1
LOC       | 378,8      | 224,2      | 268,3         | 328,9         | 132,9        | 108,1        | 157,9             | 153,0
LOCm      | 14,9       | 8,9        | 4,2           | 8,0           | 9,5          | 7,7          | 6,3               | 6,2
WMC       | 50,1       | 19,4       | 35,4          | 25,7          | 15,9         | 13,8         | 16,8              | 16,6
RFC       | 87,0       | 30,1       | 54,1          | 45,3          | 33,0         | 30,2         | 24,3              | 23,7
CBO       | 7,4        | 6,9        | 9,9           | 8,8           | 7,0          | 6,4          | 4,7               | 5,1
LC        | 77,5       | 81,0       | 70,0          | 43,5          | 54,0         | 40,0         | 35,0              | 31,0
CLON      | 11%        | 1%         | 2%            | 1%            | 7,2%         | 2,2%         | 6%                | 3%
CDEm      | 0,82       | 0,833      | 0,825         | 0,830         | 0,859        | 0,842        | 0,871             | 0,886
Total-LOC | 20,455     | 24,888     | 48,824        | 48,020        | 50,249       | 44,116       | 27,000            | 59,664
Total-NOC | 54         | 111        | 182           | 146           | 378          | 408          | 171               | 390
Project: ObjMgr
Figure 8.1: Evolution of the project ObjMgr
The measurement data for the project ObjMgr is presented in figure 8.1. Red columns
(the left columns in each pair) represent the old version, yellow columns the new one. The two
metrics on the right side are additional and represent the size of the system in Total-LOC and
Total-NOC. The new version has twice as many classes, but the total amount of code in LOC rose
insignificantly. This caused the reduction of the average number of LOC per class. In spite of
that, the inter-modular metrics also improved. The old version had a redundancy of
complexity, which is shown by the metrics WMC and CLON. In the new version the significant
reduction of the number of clones leads to a decrease of the intra-modular complexity. The
metric WMC improved greatly from 50,1 in the old version to 19,4 in the new version. Such a
large difference could be explained by the distribution of the complexity over twice as many
classes. However, the total amount of complexity was also reduced: in the old
version the sum of WMC over all classes is equal to 2700, in the newer version about 2150.
It is assumed that about 10% of the complexity was removed by reducing the clonicity and that
the residual difference is caused by the proper design.
Insignificant degradation is shown only by the metrics LC and CDEm.
ObjMgr (new) is a newly developed version of ObjMgr (old). During the development the
design of the application was simplified considerably. This led to a handier usage of the
API and therefore to higher maintainability.
Project: SLDClient
The same can be said about the evolution of the SLDClient, whereas here the new version
provides additional functionality. The new version probably seems to be more
complex because of the new features; however, from the viewpoint of maintainability the
old version was very poor, because changes of one part often caused faults in
another part. The redundancy was also reduced in the newer version.
Noteworthy, the extensive usage of patterns, in particular Visitor, leads to an increase
of LOC in the classes of the newer version.
Figure 8.2: Evolution of the project SLDClient
Project: JLin/ATX
This project is special, because all metrics without exception show an improvement of the
maintainability and thus completely meet the expert’s opinion.
In particular one can see that the newer version has less Total-LOC, although it
provides more functionality. Moreover, Total-NOO was slightly increased and the clonicity
was reduced; all this leads to a decrease of the average LOC and WMC in classes and
methods.
Noteworthy, despite the increase of the total number of classes, the inter-modular
metrics (RFC, CBO) also show improvement. See figure 8.3 for the overview.
Figure 8.3: Evolution of the project JLin/ATX
Project: Mobile Client
This project will be considered in more detail. In this sub-section the short code names
7.0 and 7.1 will be used. 7.1 (the new version of the Mobile Client) provides a lot of new
functionality, which is also reflected by the doubling of Total-LOC and Total-NOO.
Nevertheless, the inter-modular metrics remain in the recommendable area.
Algorithm Complexity
The algorithmic complexity is relatively high, but still acceptable in both releases. The
average WMC has been slightly reduced in 7.1:
Ave-WMC(7.0) = 16,754
Ave-WMC(7.1) = 16,592
Figure 8.4: Evolution of the project Mobile Client
Selfdescriptiveness
In general 7.1 is better commented than 7.0.
LC(7.0) = 35. This means 65 comment lines for each 100 LOC.
LC(7.1) = 31. This means 69 comment lines for each 100 LOC.
However, the manual examination shows that many comments are automatically
generated (JavaDoc), not very meaningful, or are just code commented out.
Noteworthy, 7.1 has more interfaces, abstract classes and data-container classes,
which have a very high comments/code ratio. In this case such small changes of LC are
difficult to interpret. This metric deserves attention only in case of inadmissible values.
Modularity
The most important metrics for modularity are presented in table 8.3. 7.1 is twice as large
as 7.0 in terms of size on disk, number of classes and LOC. 7.1 has on average smaller
classes (153 lines of code). A typical class of 7.1 is smaller than a typical class of 7.0 (see the
medians in table 8.3). This means that in 7.1 only few classes are large, while 7.0 has more
large classes. See appendix D with the lists of complex classes and complex methods for
more details. 7.1 has slightly smaller methods as well: only 22 methods are larger than
80 LOC, whereas in 7.0 40 methods have more than 80 LOC. Based on these facts the
modularization of 7.1 is better.
Table 8.3: Modularity analysis

                                                | 7.0     | 7.1
Size on disk                                    | 772 159 | 1 725 178
NOC (Number Of Classes)                         | 139     | 326
NOC (Number of all Classes, including internal) | 171     | 390
LOC (Lines Of Code)                             | 27 000  | 59 664
Median-LOC                                      | 92      | 83,5
Ave-LOC                                         | 157,89  | 152,98
Ave-LOCm (of methods)                           | 6,33    | 6,22
Structuredness
For coupling two metrics were selected: RFC and CBO. 7.1 has a smaller RFC but a bigger
CBO. For 7.0 this means that on average classes are coupled to a smaller number of other
classes, but use these relations more actively. For a qualitative statement about the
coupling the experts’ input is required.
Ave-RFC(7.0) = 24,292
Ave-RFC(7.1) = 23,667
Ave-CBO(7.0) = 4,737
Ave-CBO(7.1) = 5,15
For assessing the cohesiveness the metric LCOM4 is proposed. However, Borland
Together is not able to calculate cohesion in the intended form, thus four other
available metrics were analyzed; their average values are presented in table 8.4.
The arrows in the cells of the newer version indicate improvement or degradation.
Table 8.4: Comparison of different cohesion metrics for all classes

Component           LCOM1   LCOM2   LCOM3    TCC
Mobile Client 7.0   31,94   39,74   34,33   11,45
Mobile Client 7.1   51,35   38,91   30,36    8,73
Thus, based on 3 of 4 metrics, 7.1 is appreciably more cohesive.
A more detailed analysis showed that many classes which play the role of data storage are not cohesive because of their get and set methods. Nevertheless, such classes do not have to be cohesive in the sense of intersecting attribute usage. It is noteworthy that 7.1 has more data containers than 7.0: 148 of 390 classes against only 52 of 171.
Cohesion in 7.1 is relatively high, which is an indicator of good design. However, in order to give concrete suggestions for improvement, a new implementation of the cohesion metric (LCOM4) is needed.
For assessing the usage of inheritance, the metrics NAC (Number of Ascendant Classes) and NDC (Number of Descendant Classes) are suggested. However, Together does not calculate these metrics, and the metrics DIT (Depth in Inheritance Tree) and NOC (Number Of direct Children) are used instead.
Since these metrics are intended for a single class, any aggregation leads to results that are hard to interpret; thus these metrics are used in the form of audits. Percentage grouping is used in order to aggregate the values for the entire system (as sketched below); any other method of aggregation requires additional human input.
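To illustrate this percentage grouping, the following minimal sketch (illustrative only, not part of the thesis tool chain; class and method names are assumed) counts the classes that violate the audit DIT > 3 and prints the red/green shares used in figures 8.5 and 8.6:

import java.util.Map;

// Illustrative sketch: percentage grouping for a per-class audit (here DIT > 3).
// Assumes the per-class DIT values have already been exported from the metric tool.
public class PercentageGrouping {
    public static void report(Map<String, Integer> ditPerClass, int threshold) {
        long red = ditPerClass.values().stream().filter(d -> d > threshold).count();
        long total = ditPerClass.size();
        long green = total - red;
        System.out.printf("red (DIT > %d): %d (%.0f%%)%n", threshold, red, 100.0 * red / total);
        System.out.printf("green:          %d (%.0f%%)%n", green, 100.0 * green / total);
    }
}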
[Pie charts, classes with DIT > 3: 7.0 – red 13 (8%), green 158 (92%); 7.1 – red 24 (6%), green 366 (94%).]
Figure 8.5: Percentage of classes with more than 3 parents.
[Pie charts, LOC in classes with DIT > 3: 7.0 – red 2 224 (8%), green 24 776 (92%); 7.1 – red 5 662 (9%), green 54 002 (91%).]
Figure 8.6: Percentage of LOC in classes with more than 3 parents.
Figure 8.5 shows the percentage of classes with more than 3 parents; the list of such classes for 7.1 can be found in appendix D. However, a check at a more detailed level of granularity, namely in LOC, shows that the share of LOC in classes with DIT > 3 has slightly increased from 8% to 9%, as shown in figure 8.6.
7.0 has a higher IF and thus fewer stand-alone classes:
IF(7.0) = 0,77
IF(7.1) = 0,68
The list of complex stand-alone classes, which should probably be broken into small hierarchies, can be found in appendix D.
Clonicity
7.0 has 4 clone sets with a total of 1636 LOC; thus the clonicity is 6,1%.
7.1 has 8 clone sets with a total of 1850 LOC; thus the clonicity is 3,1%.
Both components have quite a small clonicity, and the difference is insignificant. The list of clones for 7.1 can be found in appendix D.
Entropy
The compression rate of an archive can be used as a very primitive indicator of entropy – the average amount of information within the text. Based on the compression rate, both components have approximately equal entropy of the source code.
The import-based CDEm shows that 7.1 has a slightly higher entropy of package-name usage and is thus expected to require more cognitive load from the maintainer. Nevertheless, 7.0 has only 21 packages while 7.1 has 44 packages, so 7.1 has many more possibilities for composing its import sections. This fact, together with the imprecise calculation of the CDEm, demands additional research.
Table 8.5: Comparison of compression coefficients

                                        7.0         7.1
Size on disk                        772 159   1 725 178
Size of ZIP-archive                 216 556     501 998
Coefficient of compression (ZIP)      0,280       0,291
Normalized CDEm (import-based)         0,85        0,89
CDEm (import-based)                    4,48        6,63
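To make the import-based entropy values more tangible, the following sketch shows one plausible way to compute such a figure: it counts how often each package name occurs in the import sections of all source files and returns the Shannon entropy of that distribution, normalized by the maximum possible entropy log2(n). This only illustrates the normalized-entropy idea with assumed class and method names; the actual CDEm calculation is the one performed by the EntropyImport tool described in the measurement procedure below.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: normalized Shannon entropy of the package names used in import statements.
// Not the CDEm implementation of this thesis; it only demonstrates the underlying idea.
public class ImportEntropySketch {
    public static double normalizedEntropy(Iterable<String> importedPackages) {
        Map<String, Integer> frequency = new HashMap<>();
        int total = 0;
        for (String pkg : importedPackages) {            // e.g. "com.sap.tc.mobile.cfs.utils"
            frequency.merge(pkg, 1, Integer::sum);
            total++;
        }
        if (total == 0) {
            return 0.0;
        }
        double entropy = 0.0;
        for (int count : frequency.values()) {
            double p = (double) count / total;
            entropy -= p * (Math.log(p) / Math.log(2));  // Shannon entropy in bits
        }
        double maxEntropy = Math.log(frequency.size()) / Math.log(2); // log2 of distinct packages
        return maxEntropy == 0.0 ? 0.0 : entropy / maxEntropy;        // normalized to [0, 1]
    }
}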
Value
7.1 has slightly smaller average values for the metrics LOC and WMC, which affect the number of test cases. Thus it is expected that 7.1 needs, on average, slightly fewer test cases per class.
Simplicity
7.1 has slightly smaller average values for the metrics LOC and RFC, which affect the simplicity of the test cases. Thus it is expected that the test cases for 7.1 are, on average, slightly easier.
Summary
Based on the metrics investigation, Mobile Client 7.1 is less complex than Mobile Client 7.0. In most investigated aspects 7.1 is more maintainable than 7.0. This conclusion is based on the analysis of the set of selected metrics.
Only CDEm (the entropy metric) and CBO (Coupling Between Objects) have shown a degradation in the newer release; here additional research is needed. Given that 7.1 is twice as large as 7.0 and also provides new functionality, such a small degradation is insignificant and expected.
Admissible Values for the Metrics
This section establishes the recommendable and admissible values for the validated metrics in the following way: two boundary values partition all possible values of each metric into three areas – the recommendable area (green), the admissible area (yellow) and the inadmissible area (red). Inadmissible values should attract the attention of the analyst and indicate possible problems during maintenance.
Multiple studies derive an optimal value for LOC from the defect density of classes with different LOC. These studies show that small classes usually have a high DD because of the low number of LOC, and large classes have a high DD because of their high complexity and thus a high probability of faults. Consequently, from the viewpoint of DD, the optimal size for a class is approximately 100 LOC.
A similar method can be used for most other metrics. Nevertheless, in the current experiment no process data (such as DD) is available, hence a method based on expert judgment is used. This method tries to set the boundaries between the areas such that the distinction between the old and new projects is maximized and the values of the more maintainable project fall into the better areas. Table 8.6 presents the boundary values for the validated metrics. The jump of a metric from a less desirable area into a more desirable area is called a confirmed improvement of the release. The best possible result would be 32 confirmed improvements (8 metrics, 4 projects); with the chosen boundary values the metrics indicate 21 improvements, which is quite a good recognition rate.
Table 8.6: The admissible values for Java projects

Metric   G/Y boundary   Y/R boundary
LOC           120            155
LOCm            9             12
WMC            15             26
RFC            24             32
CBO             6            9,5
LC             32             53
CLON           5%            10%
CDEm         0,85            0,9
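The boundary values of table 8.6 can be applied mechanically. The following minimal sketch rests on an assumption about how the two boundaries are meant to be read – values up to the first boundary are recommendable, values up to the second boundary admissible, larger values inadmissible; class and method names are illustrative:

// Illustrative sketch: classifying a measured value into the three areas of table 8.6.
// Assumes "smaller is better" and that the two boundaries separate green/yellow and yellow/red.
public class AdmissibleArea {
    enum Area { GREEN, YELLOW, RED }

    static Area classify(double value, double greenYellowBoundary, double yellowRedBoundary) {
        if (value <= greenYellowBoundary) return Area.GREEN;   // recommendable area
        if (value <= yellowRedBoundary)   return Area.YELLOW;  // admissible area
        return Area.RED;                                       // inadmissible area
    }

    public static void main(String[] args) {
        // Example: Ave-LOC of Mobile Client 7.1 (152,98) against the LOC boundaries 120 / 155:
        System.out.println(classify(152.98, 120, 155));        // prints YELLOW (admissible)
    }
}

With this reading, the Ave-LOC of Mobile Client 7.1 falls into the admissible (yellow) area, which matches the interpretation given in the next section.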
It is desirable to have equal boundaries for all projects. However, small differences can be found between different types of projects, and especially between ABAP and Java projects. For example, methods in classes are expected to be smaller than procedures because of the different kind of encapsulation and inheritance. In [WOLL03, p. 5] it has also been shown that a Java program is expected to provide more functionality than an ABAP program of equal size; this peculiarity results from the more compact syntax of Java.
Interpretation of the Results
As already mentioned in chapter 4, the quality model can be used not only for defining and validating the metrics, but also for interpreting the measurement results. This section gives several instructions on how to interpret the results. For this purpose the model is analyzed in a bottom-up manner, starting at the metric layer and aggregating the measurement data into the higher layers. After the metrics are calculated and the admissibility of the values is determined, the corresponding questions can be answered. Reaching the recommendable value of a metric signifies a positive answer. Depending on the answers to the corresponding questions, the achievement of the goals can be estimated.
In the following, a very short excerpt of the interpretation for the project Mobile Client 7.1 is given:
The metrics LOC and WMC are in the admissible area; thus the goal “Low algorithm complexity” is only partially achieved. However, LOCm is in the recommendable area, and hence, in spite of the high LOC, one can conclude that the goal “Modularity” is completely achieved. The goal “Structuredness” is also completely achieved because of proper values for RFC and CBO; however, there is a lack of design regarding the package interfaces, which is shown by the somewhat higher value of CDEm. The goal “Low test effort” is achieved only partially, because the high LOC and WMC indicate a high number of complicated test cases. The goals “Selfdescriptiveness” and “Clonicity” are completely achieved. All in all, one can conclude that this project has a relatively high maintainability.
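The roll-up from metric areas to goal achievement used in this excerpt can also be expressed as a simple rule of thumb. The following sketch is only one possible heuristic, not a rule stated in this thesis (which leaves the final judgment to the expert): a goal counts as achieved if all of its metrics are in the recommendable area, as partially achieved if none is inadmissible, and otherwise as not achieved:

import java.util.List;

// Illustrative heuristic for rolling up metric areas into goal achievement.
// Not a rule from the thesis; the final decision remains with the human expert.
public class GoalRollUp {
    enum Area { GREEN, YELLOW, RED }
    enum Achievement { ACHIEVED, PARTIALLY_ACHIEVED, NOT_ACHIEVED }

    static Achievement assess(List<Area> metricAreas) {
        if (metricAreas.stream().allMatch(a -> a == Area.GREEN)) return Achievement.ACHIEVED;
        if (metricAreas.stream().noneMatch(a -> a == Area.RED))  return Achievement.PARTIALLY_ACHIEVED;
        return Achievement.NOT_ACHIEVED;
    }
}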
Measurement Procedure
For a reader who is interested in repeating this experiment with his own data, the following instructions might be helpful.
For the measurement it is recommendable to use Borland Together Developer 2006 for Eclipse, because this tool provides most of the needed metrics and allows saving the results in XML format. It is possible to use the trial version in order to save costs. Visit http://www.borland.com/us/products/together/ to download and install the tool. After the installation, a new entry “Quality Assurance” should appear in the context menu of a project in your Eclipse platform. Go to the Java perspective and choose “Quality Assurance” -> ”Metrics” from the context menu of your project. Before the calculation of the metrics can start, some options have to be set. Choose “Option” and select the following metrics from the list: LOC, NOO, LCOM1, LCOM2, LCOM3, TCC, RFC, WMPC1 (in the current work this metric is called WMC), CBO, DOIH (DIT), NOCC (NOC), TCR (CR). Additional settings can be made for each metric; however, it is recommendable to leave all settings at their default values, because Together does not calculate the average values properly. Confirm your choice with “OK” and start the calculation. After that the measurement data appears in hierarchical form in a new Eclipse view called “Metric”.
For the analysis it is useful to present the metric data in tabular form as shown in table 6.3. This can be done automatically using an XSLT transformation. Before the automatic filling, export the measurement data from Together to a file in XML format. Put your XML file into the directory with the XSLT files and start the XSLT transformation using the following two commands:
java XslTransformator myproject.xml MMtogether2xml_average.xsl MetricXml.xml
    XslTransformator            – Java class for the transformation
    myproject.xml               – your saved XML file
    MMtogether2xml_average.xsl  – XSLT adapter for Together
    MetricXml.xml               – temporary file

java XslTransformator MMGQM.xml MMGQM2table.xsl MetricTable.html
    MMGQM.xml                   – GQM quality model in XML format
    MMGQM2table.xsl             – XSLT for the output table
    MetricTable.html            – output file
Make sure that your CLASSPATH includes a JAXP-compliant XSLT processor, for example “xerces”.
The output table will be saved in the file given as the third parameter of the second transformation (in the example above it is “MetricTable.html”).
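The XslTransformator class used here is provided on the CD of this thesis. For readers without the CD, a minimal reimplementation on top of the standard JAXP API could look as follows (an illustrative sketch only, not the original source; it assumes the same parameter order of input XML, stylesheet and output file):

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

// Minimal JAXP-based XSLT runner: java XslTransformator <input.xml> <stylesheet.xsl> <output file>
// Illustrative sketch; the original XslTransformator is provided on the thesis CD.
public class XslTransformator {
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: java XslTransformator input.xml stylesheet.xsl outputFile");
            return;
        }
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File(args[1])));  // the XSLT stylesheet
        transformer.transform(new StreamSource(new File(args[0])),     // the exported metric data
                              new StreamResult(new File(args[2])));    // the generated report
    }
}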
After this procedure several metrics are still missing in the output table. To obtain these values, some other tools are suggested.
One of them is CloneAnalyzer, discussed in chapter 7. Download and install this Eclipse plug-in; a new menu element called “CloneAnalyzer” should appear. Select “CloneAnalyzer” -> “Build” and the new Eclipse view “CloneTreeViewer” should come into sight, in which all clones are listed together with the size of each clone and the source file where it was found. Note that CloneAnalyzer searches all open projects, so close unnecessary projects beforehand. Unfortunately, this tool does not provide the metric CLON; the number of LOC in all clones has to be calculated manually and divided by Total-LOC in order to get the aggregated value. After that, the metric CLON can be included in the output table.
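As a worked example with the data of this thesis: the clone report for Mobile Client 7.1 in appendix D lists 1850 LOC in all clone instances, and the project has 59 664 LOC in total, so CLON = 1850 / 59 664 ≈ 3,1% – the clonicity value reported for 7.1 above.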
For the metric CDEm a special Java class was developed. Start it with the following command:

java com.sap.olek.EntropyImport "C:\Program Files\workspace\objmgr" import_objmgr_old.txt
    first parameter  – path to the directory with the source code
    second parameter – output file

After executing this command two files are generated: the output file and a statistics file. At the end of the output file, find the value for “Norm. Entropy” and put it into the output table as the CDEm value.
It is also possible to automatically prepare the metric-based audits using the command:

java XslTransformator myproject.xml MMtogether2xml_list_reports.xsl MetricTable.html
    myproject.xml                    – your saved XML file
    MMtogether2xml_list_reports.xsl  – XSLT for the audits
    MetricTable.html                 – output file
The output file is an HTML report listing the classes that violate one of the following audits (an example of this report for Mobile Client 7.1 is given in appendix D):
The list of complex methods (LOC > 80)
The list of classes with DIT > 3
The list of classes with NOC > 10
The list of complex stand-alone classes (WMC > 50)
The list of large classes (LOC > 500)
The files used for the transformations can be found on the CD of this master thesis.
It is also possible to use free tools in order to save costs. The most appropriate candidate is the tool Metrics (see the corresponding section in the chapter “Tools”). Nevertheless, XSLT adapters for new tools would have to be implemented.
9. Conclusion
The experiments have shown that most of the selected and validated metrics can be used as reliable maintainability indicators.
Nevertheless, many metrics provided by the available tools are implemented differently than initially assumed. Such a deviation is acceptable for the initial examination; for further usage, however, the metric implementations should be corrected.
After analyzing their ability to assess the maintainability, the following groups of metrics have been distinguished:
1. Metrics – possible indicators of the maintainability
These metrics consistently correspond with the experts’ opinions. This means that in most experiments they indicated the improvement of the newer (better) version. Nevertheless, the difference between the values is sometimes not evident (only 3-5%); hence human input is needed to draw the conclusion. Additionally, all these metrics can be used in the form of audits for finding potentially badly maintainable code.
Metrics: WMC (Weighted Method Complexity), LOC (Lines Of Code), RFC (Response For a Class), CBO (Coupling Between Objects), LC (Lack of Comments), CLON (Clonicity), MCC (McCabe Cyclomatic Complexity)
Decision: admit into the quality model for the maintainability assessment
2. Metrics – candidates for audits or code reviews
The metrics NAC (Number of Ancestor Classes) and NDC (Number of Descendant Classes) are very good descriptors for a single class and can thus indicate the classes which are likely to cause problems for maintenance, but their aggregated values are not representative. The experiments show that the aggregated values of these metrics often differ from the experts’ opinions; thus the inheritance metrics are poor indicators of the maintainability of the entire application. Therefore, these metrics are kept in the quality model, but only as an optional component for code reviews.
3. Metrics which did not show appropriate results
The packaging metrics are able to find only evident design errors (for example, unused abstract classes) and do not suit even the audits. As expected, the metrics A (Abstractness), In (Instability) and D (Distance from the main sequence) are bad indicators of maintainability. These metrics were rejected from the quality model.
The metric CDEm (Class Definition Entropy – modified) also did not show a high correlation with the experts’ opinions. Nevertheless, the bad results are probably caused by the inexact calculation and by the many “*” wildcards in the import sections. More experiments with this metric are needed.
4. Metrics that did not participate in the experiments, but are supposed to be good indicators of the maintainability
The metrics m (Entropy), LCOM (Lack of Cohesion Of Methods), NOD (Number Of Developers), FAN-IN, FAN-OUT and SMI (Software Maturity Index) did not participate in the experiments because of a lack of tools or data. These metrics can be admitted into the quality model only with restrictions.
The result of this thesis is a deeper knowledge about maintainability, essentially formalized in the form of the quality model. Based on this model it is possible to understand the substance of maintainability and also to measure its most important indicators.
Based on the theoretical considerations and the experiments presented above, the following conclusions can be made:
- It is possible to describe the different maintainability-related aspects of the software using metric-based indicators. Several metrics chosen in this study appear to be useful in predicting the maintainability.
- Since only limited aggregation is possible and the output of this research is a list of maintainability indicators, only a semi-automated process is possible. The metrics can provide only a description of the system; the final decision has to be made by a human.
- Because thousands of metrics exist, it should not be a problem to find appropriate metrics among them. Nevertheless, during this thesis two new metrics were suggested.
- Metrics have different levels of granularity: some of them describe a single class or method, others can characterize the entire system. Since the final indicators have to describe the system, all metric values have to be aggregated.
- Most problems occur when aggregating the data in order to characterize the entire system: the metrics are good indicators for a single module, but aggregation causes data garbling, information hiding or inadmissible operations. Therefore the aggregation should be done very carefully.
- Since poor-quality code can be found much more easily than well-designed code, metrics can be used in the form of audits. Metric-based audits are a good support for code reviews.
- In this work the admissible values for each practically investigated metric were determined. On the other hand, these values depend on the programming paradigm and language used.
- Metrics are able to show the trend (improvement or degradation) within the list of releases of one component.
- Metrics are of limited use for comparing different components.
Nevertheless, metrics are just one possibility to describe certain properties of the software. Whether such a description means well or badly maintainable code depends on the design and the goals. The final conclusion about the maintainability of software can only be made by a human.
10. Outlook
The current thesis gives an idea of the abilities of metrics from the viewpoint of maintainability assessment. In parallel with the theoretical introduction into software measurement, a practical example of usage has also been produced. Hence the results of this work can already be used in practice. Nevertheless, before successful usage several open issues have to be settled. The most important issue is finding an appropriate measurement tool. In chapter 7 several tools are discussed, but none of them provides all required metrics. Several metrics for Java would have to be additionally implemented; the ABAP environment has an even larger deficit in tool support.
Also, several metrics were not validated because of a lack of data or tools. The author believes that all selected metrics are reliable, but additional experiments are needed to confirm the results.
The next important step is to research how the usage of patterns impacts the values of the metrics. For example, the usage of the pattern “Visitor” can lead to an increase of LOC in the Visitor class, which would not be undesirable. The exact impacts of different patterns and their consequences for the metrics have to be researched. In [GARZ02] some metrics for patterns are discussed. In [KHOS04] an assessment of 23 patterns from the viewpoint of simplicity, modularity, understandability etc. is provided. Khosravi argues that patterns should be used very carefully; for example, the pattern Proxy makes debugging much harder and increases the number of classes.
In the second part of the outlook, several interesting approaches are mentioned which make a further expansion of the metric-based quality mechanisms possible.
One of the problems of integrating the measurement tools and automating the measurement procedure is the handling of a heterogeneous and changing environment, because various tools can be used during the lifecycle. In [AUER02] a simple metric data exchange format and a data exchange protocol for communicating the metric data are proposed. This approach aims at filling the gap between frameworks and tools by offering detailed instructions on how to implement metric data collection, yet as an open and simple standard which allows easy integration of existing tools and their data handling processes.
A flexible, easy and fast implementation of new metrics is also important. In [MARI] a Static Analysis Interrogative Language is introduced. This language is dedicated to the aforementioned type of static analyses of source code and allows the implementation of various metrics in a homogeneous manner. After parsing the source code, simple but powerful queries can be written in order to obtain information about certain properties of the code and to calculate the metrics.
In this thesis the quality model aims only at the collection and presentation of metric data to an expert, who should make decisions about the maintainability of the product based on the measurement data and his experience. However, the processing of the metric data can also be fully automated, for example by using fuzzy logic or neural networks. In [THWI] Thwin uses neural networks to demonstrate the ability of object-oriented metrics to predict the number of software defects and the maintenance effort.
The next approach is not metric-based, but nonetheless very interesting and useful. In [GALL02] Gall uses the CVS history for detecting non-obvious logical relations between classes. Classes which are often changed together by a single change request may have a logical relation that is not necessarily reflected in physical relations. After studying the change history, a list of the classes which were often changed together by one change request is generated, and hints about such relations can be given to the maintainer.
References
[ABRA04] Alain Abran, Miguel Lopez, Naji Habra, An Analysis of the McCabe Cyclomatic Complexity Number, in 14th International Workshop on Software Measurement (IWSM) IWSM-Metrikon 2004, Königs Wusterhausen, Magdeburg, Germany, Shaker-Verlag, 2004, pp. 391-405.
[ABRA04b] Alain Abran, Olga Ormandjieva, Manar Abu Talib, Information Theory-based Functional Complexity Measures and Functional Size with COSMIC-FFP, 2004
[AHN03] Yunsik Ahn, Jungseok Suh, Seungryeol Kim and Hyunsoo Kim, The software
maintenance project effort estimation model based on function points, J. Softw. Maint.
Evol.: Res. Pract. 2003; 15:71–85
[ALTU06] Yusuf Altunel, Component-Based Software Engineering, Chapter 9:
Component-Based SW Testing, Lecture Notes, 26.01.2006
[AUER02] Martin Auer, Measuring the Whole Software Process: A Simple Metric Data
Exchange Format and Protocol, 2002
[BADR03] Linda Badri and Mourad Badri, A New Class Cohesion Criterion: An
empirical study on several systems, 7th ECOOP Workshop on Quantitative Approaches
in Object-Oriented Software Engineering, (QAOOSE'2003), July 22nd, 2003
[BASI94] Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach, The Goal Question
Metric Approach, 1994
[BASI95] Victor R. Basili, Lionel Briand and Walcélio L. Melo, A validation of object-oriented design metrics as quality indicators, Technical Report, Univ. of Maryland, Dep.
of Computer Science, College Park, MD, 20742 USA. April 1995.
[BIEM94] James M. Bieman and Linda M. Ott, Measuring Functional Cohesion, IEEE
Transactions on software engineering, Vol. 20, No. 8. August 1994, p.p. 644 – 657
[BRUN04] Magiel Bruntink, Arie van Deursen, Predicting Class Testability using
Object-Oriented Metrics, 2004
[CART03] Tom Carter, An introduction to information theory and entropy, Complex
Systems Summer School, June, 2003
[CHID93] Shyam R. Chidamber, Chris F. Kemerer, A metrics suite for object-oriented
design, M.I.T. Sloan School of Management, Revised December 1993
[DARC05] David P. Darcy, Chris F. Kemerer, Sandra A. Slaughter, The Structural
Complexity of Software: Testing the Interaction of Coupling and Cohesion, January 22,
2005
[DOSP03] Jana Dospisil, Measuring Code Complexity in Projects Designed with
Aspect/J, Informing Science InSITE - “Where Parallels Intersect” June 2003
92
[DUMK96] Reiner R. Dumke, Erik Foltin, Metrics-based Evaluation of Object-Oriented
Software Development Methods, 1996
[ETZK97] Letha Etzkorn, Carl Davis, and Wei Li, "A Statistical Comparison of Various
Definitions of the LCOM Metric," Technical Report TR-UAH-CS-1997-02, Computer
Science Dept., Univ. Alabama in Huntsville, 1997
[ETZK99] Letha Etzkorn, Jagdish Bansiya, and Carl Davis, Design and Code
Complexity Metrics for OO Classes, Journal of Object Oriented Programming 1999;
12(1):35–40
[ETZK02] Letha H. Etzkorn, Sampson Gholston and William E. Hughes, A semantic
entropy metric, J. Softw. Maint. Evol.: Res. Pract. 2002; 14:293–310
[FELD02] David Feldman, A Brief Introduction to: Information Theory, Excess Entropy
and Computational Mechanics, April 1998 (Revised October 2002)
[GALL02] Harald Gall, Mehdi Jazayeri and Jacek Krajewski, CVS Release History Data
for Detecting Logical Couplings, Technical University of Vienna, Distributed Systems
Group, Proceedings of the Sixth International Workshop on Principles of Software
Evolution (IWPSE’03)
[GARZ02] Javier Garzás and Mario Piattini, Analyzability and Changeability in Design
Patterns, SugarloafPLoP 2002 Conference
[HASS03] Ahmed E. Hassan and Richard C. Holt, The Chaos of Software Development,
2003
[JLIN05] SAP-internal documentation. See in SAPnet:
http://bis.wdf.sap.corp:1080/twiki/bin/view/Techdev/JavaTestTools -> JLin
[KABA] Hind Kabaili, Rudolf K. Keller, François Lustman and Guy Saint-Denis, Class
Cohesion Revisited: An Empirical Study on Industrial Systems
[KAJK] Mira Kajko-Mattsson, Software Evolution and Maintenance
[KELL01] Horst Keller, Sascha Krüger, ABAP Objects, Einführung in die SAP-Programmierung, 2001, SAP PRESS
[KHOS04] Khashayar Khosravi, Yann-Gael Gueheneuc, A Quality Model for Design
Patterns, Summer 2004
[LAKS99] Anuradha Lakshminarayana and Timothy S. Newman, "Principal
Component Analysis of Lack of Cohesion in Methods (LCOM) metrics," Technical
Report TR-UAH-CS-1999-01, Computer Science, Dept., Univ. Alabama in Huntsville,
1999
[LANZ99] Michele Lanza, Combining Metrics and Graphs for Object Oriented Reverse
Engineering, 1999
[LAVA00] Luigi Lavazza, Providing Automated Support for the GQM Measurement
Process, IEEE SOFTWARE May/June 2000, p.p. 56-62
93
[MARI] Cristina Marinescu, Radu Marinescu, Tudor Gîrba, A Dedicated Language for
Object-Oriented Design Analyses
[MART95] Robert Martin: OO Design Quality Metrics - An Analysis of Dependencies,
August 14, 1994 (revised June 20, 1995)
[MISR03] SUBHAS C. MISRA, VIRENDRAKUMAR C. BHAVSAR, Measures of
Software System Difficulty, SQP VOL. 5, NO. 4/2003, ASQ
[MUST05] K. Mustafa and R. A. Khan, Quality Metric Development Framework
(qMDF), Journal of Computer Science 1 (3): 437-444, 2005
[NAND99] Jagadeesh Nandigam, Arun Lakhotia and Claude G. Cech, Experimental
Evaluation of Agreement among Programmers in Applying the Rules of Cohesion,
Journal of Software Maintenance: Research and Practice, J. Softw. Maint: Res. Pract. 11,
35–53 (1999)
[PARK96] Robert E. Park, Wolfhart B. Goethert, William A. Florac, Goal-Driven
Software Measurement — A Guidebook, August 1996, Software Engineering Institute
[PIAT] Mario Piattini and Antonio Martínez, Measuring for Database Programs
Maintainability
[REIS] Ralf Reißing, Towards a Model for Object-Oriented Design Measurement
[RIEG05] Matthias Rieger, Effective Clone Detection Without Language Barriers,
Inauguraldissertation der Philosophisch-naturwissenschaftlichen Fakultät der Universität Bern, 10.06.2005
[ROSE] Dr. Linda H. Rosenberg and Lawrence E. Hyatt, Software Quality Metrics for
Object-Oriented Environments, NASA
[RUTH] Ian Ruthven, Maintenance
[RYSS] Filip Van Rysselberghe, Serge Demeyer, Evaluating Clone Detection Techniques
[SAP03] Cüneyt Çam, W. Hagen Thümmel, Philip J. Zhang, Essentials of CheckMan,
SAP AG 2003
[SAP03b] Randolf Eilenberger and Andreas Simon Schmitt, Evaluating the Quality of
Your ABAP Programs and Other Repository Objects with the Code Inspector, SAP
Professional Journal, 2003
[SAP04] Product Innovation Lifecycle, From Ideas to Customer Value, Whitepaper
Version 1.1, July 2004, Mat. Nr. 500 70 026
[SAP05] Dr. Eckart Spitzberg, Process Description: Quality Gates, Version 4.1
31.03.2005
[SAP05b] Pieter Bloemendaal, SAP Code Quality Management Newsflash - June 15,
2005
94
[SAP05c] Thomas Haertlein, Ulrich Weber, Neelakantan Padmanabhan, Horst Pax,
Project ‚Quality Indicators‘, 23 May 2005
[SAP05d] Pieter Bloemendaal, Code Quality Management (CQM), 2005 SAP SI AG
[SERO05] Gregory Seront, Miguel Lopez, Valerie Paulus, Naji Habra: On the
Relationship between Cyclomatic Complexity and the Degree of Object Orientation,
2005
[SHEL02] Frederick T. Sheldon, Kshamta Jerath and Hong Chung, Metrics for
maintainability of class inheritance hierarchies, J. Softw. Maint. Evol.: Res. Pract. 2002;
14:147–160
[SNID01] Greg Snider, Measuring the Entropy of Large Software Systems, HP
Laboratories Palo Alto, HPL-2001-221, September 10th, 2001
[SOLI99] Rini van Solingen and Egon Berghout, The Goal/Question/Metric Method: a
practical guide for quality improvement of software development, 1999, McGraw-Hill
Publishing Company, London
[THWI] Mie Mie Thet Thwin, Tong-Seng Quah, Application of Neural Networks for
Software Quality Prediction Using Object-Oriented Metrics
[VELL] Paul Velleman and Leland Wilkinson, Nominal, Ordinal, Interval, and Ratio
Typologies are Misleading
[WELK97] Kurt D. Welker, Paul W. Oman And Gerald G. Atkinson, Development And
Application Of An Automated Source Code Maintainability Index, Software
Maintenance: Research And Practice, Vol. 9, 127–159 (1997)
[WEYU88] Weyuker, E. J., Evaluating Software Complexity Measures. IEEE
Transactions on Software Engineering, Volume: 14, No. 9, pp. 1357 – 1365. 1988.
[WOLL03] Björn Wolle, Analyse von ABAP- und Java-Anwendungen im Hinblick auf Software-Wartung, CC GmbH, Wiesbaden, published in MetriKon 2003 „Software-Messung in der Praxis“
[Yi04] Tong Yi, Fangjun Wu, Empirical Analysis of Entropy Distance Metric for UML Class Diagrams, ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 5, September 2004
[ZUSE98] Horst Zuse: A Framework of Software Measurement, Walter de Gruyter,
Berlin, 1998, 755 pages. ISBN: 3-11-015587-7.
Concluding Declaration
I hereby declare that I have written this master thesis independently, without inadmissible help from third parties and without using any aids other than those stated. Thoughts taken directly or indirectly from external sources are marked as such.
Potsdam, 23 February 2006
Appendix A. Software Quality Models
(Taken from [KAJK])
McCall Quality Model, 1977 (Part)
Fenton’s decomposition of the maintainability
Software Quality Characteristics Tree (Boehm, 1978)
Appendix B. GQM – Quality Model
[Fold-out diagram of the GQM quality model: the external goal Maintainability is refined over the goals Analysability (Understandability), Changeability (Modifiability), Testability, Clonicity and Maturity into the internal goals Algorithm Complexity, Selfdescriptiveness, Modularity, Structuredness, Consistency, Packaging, Value and Simplicity, each with its questions and assigned metrics (WMC, MCC, LOC, LOCm, LC, LCOM, NOD, NAC, NDC, RFC, CBO, FAN-IN, FAN-OUT, CDEm, m, IF, I, A, D, CLON, SMI). The legend distinguishes intra-modular, inter-modular, non-counting (entropy), additional/process and rejected metrics as well as metrics used for audits. The same model is given in XML form in appendix C.]
Appendix C. Quality Model in XML format
<?xml version="1.0" encoding="UTF-8"?>
<Model>
<Goal name="Maintainability">
<Question text="How easy is to maintain the SW?">
<Goal name="Analyzability">
<Question text="How easy is to comprehend the SW?">
<Goal name="Algorithm Complexity">
<Question text="How complex is intra-modular algorithm complexity?">
<Metric name="Weigthed Method Complexity" shortname="WMC" />
<Metric name="" shortname="MCC" />
</Question>
</Goal>
<Goal name="Selfdescriptiveness">
<Question text="Does the code have sufficient comments?">
<Metric name="" shortname="CR" />
</Question>
</Goal>
<Goal name="Modularity">
<Question text="How is the code divided in parts?"></Question>
<Question text="Should developer understand large chunks of code at a time?">
<Question text="Is the System in all levels good splitted in parts?">
<Metric name="" shortname="LOC" />
<Metric name="" shortname="LOCm" />
</Question>
</Question>
<Question text="Should developer analyze not relevant code?">
<Question text="Is cohesion high enough?">
<Metric name="" shortname="LCOMm" />
</Question>
</Question>
</Goal>
<Goal name="Structuredness">
<Goal name="Consistency">
<Question text="How many developers have touched an object?">
<Metric name="" shortname="NOD" />
</Question>
</Goal>
<Question text="How affect parts on each other?"></Question>
<Question text="Should developer understand other parts of the system?">
<Question text="How many other objects have to understand developer in
order to completely comprehend given object?">
<Metric name="" shortname="NAC" />
<Metric name="" shortname="RFC" />
<Metric name="" shortname="FAN-OUT" />
</Question>
</Question>
<Question text="How often does developer face to new unknown object?">
<Metric name="" shortname="CDEm" />
</Question>
</Goal>
</Question>
</Goal>
<Goal name="Changeability">
<Question text="How easy is to change the SW?">
<Goal name="Structuredness">
<Question text="Should developer change other objects? Or check either these
have to be corrected?">
<Question text="What is the number of modules changed per change cause?">
</Question>
<Question text="Is coupling low enough?">
<Question text="How many relation are between objects?">
<Question text="How many global variables there are?"></Question>
<Metric name="" shortname="NDC" />
<Metric name="" shortname="CBO" />
<Metric name="" shortname="FAN-IN" />
</Question></Question>
<Question text="Should developer analyze not relevant code?">
<Metric name="" shortname="LCOM" />
</Question>
<Question text="How high is degree of conformance of system to principles of
maximal cohesion and minimal coupling?">
<Metric name="" shortname="m" />
</Question>
</Question>
</Goal>
<Goal name="Modularity">
<Question text="Should developer change large chinks of code at a time?">
<Question text="Is the code sufficient divided in parts?">
<Metric name="" shortname="LOC" />
</Question>
</Question>
</Goal>
<Goal name="Packaging">
<Question text="Are reusable elements isolated from non-reusable elements?">
<Metric name="" shortname="I" />
</Question>
<Question text="How full are abstract classes used?">
<Metric name="" shortname="A" />
</Question>
<Metric name="" shortname="D" />
</Goal>
</Question>
</Goal>
<Goal name="Testability">
<Question text="How easy is to test the SW?">
<Goal name="Value">
<Question text="How many test-cases should be changed/proved?">
<Metric name="" shortname="LOC" />
<Metric name="" shortname="WMC" />
<Metric name="" shortname="MCC" />
</Question>
</Goal>
<Goal name="Simplicity">
<Question text="How easy are test-cases to maintain?">
<Metric name="" shortname="FAN-OUT" />
<Metric name="" shortname="LOC" />
<Metric name="" shortname="RFC" />
</Question>
</Goal>
</Question>
</Goal>
<Goal name="Clonicity">
<Question text="Does the system have clones?">
<Metric name="" shortname="CLON" />
</Question>
</Goal>
<Goal name="Maturity">
<Question text="Should the SW be compared with original release because of
changes?">
<Question text="How significant was the system changed?">
<Question text="How many new, changed, deleted objects does the system
have?">
<Metric name="" shortname="SMI" />
</Question>
</Question>
</Question>
</Goal>
</Question>
</Goal>
<Additional>
<Metric name="" shortname="TotalLOC" />
<Metric name="" shortname="TotalNOC" />
</Additional>
</Model>
Appendix D. Metric-based quality report
for Mobile Client 7.1
The list of complex methods (LOC > 80)
Method name                                                                            LOC
com.sap.tc.mobile.cfs.console.MgmtConsole.run()                                        121
com.sap.tc.mobile.cfs.utils.MutableString.compareTo()                                  101
com.sap.tc.mobile.cfs.utils.AbstractSorter.sortIndexed()                                83
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importRelations()                  106
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importNodes()                      112
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importAttrs()                      123
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.importDDIC()                       131
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl.getISOLanguage()                   129
com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.flushPersistentObjects()    119
com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.deletePersistent()          153
com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager.makePersistent()             88
com.sap.tc.mobile.cfs.meta.ddic.DdicJcoImporterStruct.onlineImportDdic()                92
com.sap.tc.mobile.cfs.meta.ddic.DdicJcoImporterStruct.linkSimpleWithValueEnum()         81
com.sap.tc.mobile.cfs.meta.ddic.DdicModelDataProviderImpl.main()                        89
com.sap.tc.mobile.cfs.pers.spi.query.QueryParser.parseQueryInternal()                  274
com.sap.tc.mobile.cfs.pers.spi.query.QueryParser.parseConditionPrim()                   94
com.sap.tc.mobile.cfs.meta.mi25io.FieldContentHandler.createAttributeDescriptor()      116
com.sap.tc.mobile.cfs.meta.io.TypeContentProducer.writeType()                          115
com.sap.tc.mobile.cfs.utils.io.SplitMessageFile.SplitMessageFile()                      91
com.sap.tc.mobile.logging.impl.FileLogger.log()                                        131
MIXMLParser.ElementParser.parse()                                                      111
com.sap.tc.mobile.cfs.xml.api.MIXMLParser.parseEntity()                                 98
Number of complex methods: 22
Total number of methods: 3396
The list of classes with DIT > 3
DIT - Depth in Inheritance Tree

Classname                                                                  DIT
com.sap.tc.mobile.cfs.utils.ReadOnlyProperties                               4
com.sap.tc.mobile.cfs.utils.ChainedException                                 4
com.sap.tc.mobile.cfs.pers.spi.DuplicateKeyException                         6
com.sap.tc.mobile.cfs.pers.spi.ObjectNotFoundException                       6
com.sap.tc.mobile.cfs.pers.spi.DBException                                   5
com.sap.tc.mobile.cfs.pers.spi.DBFatalException                              6
com.sap.tc.mobile.cfs.pers.spi.ObjectDirtyException                          6
com.sap.tc.mobile.exception.standard.SAPNumberFormatException                6
com.sap.tc.mobile.exception.standard.SAPUnsupportedOperationException        5
com.sap.tc.mobile.exception.standard.SAPIllegalAccessException               4
com.sap.tc.mobile.exception.standard.SAPIllegalStateException                5
com.sap.tc.mobile.exception.standard.SAPNullPointerException                 5
com.sap.tc.mobile.exception.standard.SAPIllegalArgumentException             5
com.sap.tc.mobile.exception.standard.SAPIOException                          4
com.sap.tc.mobile.cfs.pers.cache.JavaHashSet                                 4
com.sap.tc.mobile.cfs.pers.cache.DefaultCacheHandle                          4
com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager                   4
com.sap.tc.mobile.cfs.meta.mi25io.TopStructureContentHandler                 4
com.sap.tc.mobile.cfs.meta.mi25io.ChildStructureContentHandler               4
com.sap.tc.mobile.cfs.pers.impl.spi.util.DBChainedException                  6
com.sap.tc.mobile.cfs.xml.api.MIParseException                               5
com.sap.tc.mobile.session.SessionChainedException                            5
com.sap.tc.mobile.exception.BaseRuntimeException                             4
com.sap.tc.mobile.cfs.utils.config.ConfigException                           5
Number of classes with DIT > 3: 24
Total number of classes: 390
The list of complex stand-alone classes (WMC > 50)
WMC - Weighted Methods of Class

Classname                                                   WMC    LOC
com.sap.tc.mobile.cfs.utils.FastStringBuffer                  95    490
com.sap.tc.mobile.cfs.pers.query.QueryLocalCandidateImpl      55    562
com.sap.tc.mobile.cfs.pers.spi.query.QueryParser             206   1330
com.sap.tc.mobile.cfs.xml.api.MIXMLParser                     62   1024
Total number of classes: 390
The list of classes with NOC > 10
NOC - Number Of direct Children in inheritance tree

Classname                                                           NOC
com.sap.tc.mobile.cfs.meta.api.StorageTypeDescriptor                 15
com.sap.tc.mobile.cfs.meta.api.AbstractDescriptor                    34
ClassDescriptorSPI.Instantiator                                      15
com.sap.tc.mobile.cfs.meta.spi.AbstractDescriptorSPI                 23
com.sap.tc.mobile.cfs.utils.FastObjectHashEntry                      19
com.sap.tc.mobile.cfs.pers.spi.Persistable                           16
com.sap.tc.mobile.cfs.pers.spi.PersistableSPI                        13
com.sap.tc.mobile.cfs.type.spi.GenericAccessCapableSPI               14
com.sap.tc.mobile.cfs.meta.mi25io.AbstractContentHandler             13
com.sap.tc.mobile.cfs.meta.mi25io.BaseContentHandler                 12
com.sap.tc.mobile.cfs.type.api.GenericAccessCapable                  18
com.sap.tc.mobile.cfs.pers.impl.spi.cache.AbstractPersistable        11
com.sap.tc.mobile.cfs.pers.impl.spi.cache.PersistableImpl            12
com.sap.tc.mobile.cfs.xml.api.MIContentHandler                       27
com.sap.tc.mobile.cfs.xml.api.AbstractMIContentHandler               25
com.sap.tc.mobile.exception.IBaseException                           17
Number of classes with NOC > 10: 16
Total number of classes: 390
The list of large classes (LOC > 500)

Classname                                                     WMC    LOC
com.sap.tc.mobile.cfs.console.MgmtConsole                       51    533
com.sap.tc.mobile.cfs.pers.spi.PersistenceManager               23    630
com.sap.tc.mobile.cfs.pers.query.QueryResultClassImpl           34    725
com.sap.tc.mobile.cfs.pers.query.QueryLocalCandidateImpl        55    562
com.sap.tc.mobile.cfs.pers.query.QueryImpl                      96    847
com.sap.tc.mobile.cfs.meta.mmw.MMWModelProviderImpl            113   1178
com.sap.tc.mobile.cfs.pers.cache.BLOBImpl                      114    843
com.sap.tc.mobile.cfs.pers.cache.DefaultPersistenceManager     352   2650
com.sap.tc.mobile.cfs.pers.cache.PersistentList                 70    654
com.sap.tc.mobile.cfs.meta.ClassDescriptorImpl                 216   1522
com.sap.tc.mobile.cfs.meta.ModelDescriptorImpl                  83    721
com.sap.tc.mobile.cfs.meta.TypeDescriptorImpl                  121    863
com.sap.tc.mobile.cfs.meta.AttributeDescriptorImpl              76    557
com.sap.tc.mobile.cfs.pers.spi.query.QueryParser               206   1330
com.sap.tc.mobile.cfs.meta.io.ModelContentHandler               31    516
com.sap.tc.mobile.cfs.xml.api.MIXMLParser                       62   1024
com.sap.tc.mobile.logging.LogController                         62    714
com.sap.tc.mobile.cfs.PersMessages                             134    806
com.sap.tc.mobile.logging.msg.LogMessages                       78    510
Total number of classes: 390
List of exact clones for Mobile Client 7.1

Clone Set No 1 (length 26, 2 instances, 52 LOC in all instances)
    From line 476 to 501, com\sap\tc\mobile\cfs\pers\cache\BLOBImpl.java
    From line 530 to 555, com\sap\tc\mobile\cfs\pers\cache\BLOBImpl.java
Clone Set No 2 (length 27, 2 instances, 54 LOC in all instances)
    From line 114 to 140, com\sap\tc\mobile\cfs\pers\spi\DBException.java
    From line 158 to 184, com\sap\tc\mobile\cfs\utils\ChainedException.java
Clone Set No 3 (length 30, 2 instances, 60 LOC in all instances)
    From line 345 to 374, com\sap\tc\mobile\cfs\utils\FastStringBuffer.java
    From line 391 to 420, com\sap\tc\mobile\cfs\utils\FastStringBuffer.java
Clone Set No 4 (length 70, 2 instances, 140 LOC in all instances)
    From line 278 to 347, com\sap\tc\mobile\cfs\utils\FastLongHash.java
    From line 288 to 357, com\sap\tc\mobile\cfs\utils\FastObjectHash.java
Clone Set No 5 (length 145, 2 instances, 290 LOC in all instances)
    From line 45 to 189, com\sap\tc\mobile\exception\BaseException.java
    From line 45 to 189, com\sap\tc\mobile\exception\BaseRuntimeException.java
Clone Set No 6 (length 156, 7 instances, 1092 LOC in all instances)
    From line 57 to 212, com\sap\tc\mobile\exception\standard\SAPIOException.java
    From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPIllegalAccessException.java
    From line 133 to 342, com\sap\tc\mobile\exception\standard\SAPIllegalArgumentException.java
    From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPIllegalStateException.java
    From line 80 to 235, com\sap\tc\mobile\exception\standard\SAPNullPointerException.java
    From line 103 to 312, com\sap\tc\mobile\exception\standard\SAPNumberFormatException.java
    From line 64 to 270, com\sap\tc\mobile\exception\standard\SAPUnsupportedOperationException.java
Clone Set No 7 (length 24, 2 instances, 48 LOC in all instances)
    From line 176 to 199, com\sap\tc\mobile\cfs\pers\query\QueryTypeCheck.java
    From line 221 to 244, com\sap\tc\mobile\cfs\pers\query\QueryTypeCheck.java
Clone Set No 8 (length 57, 2 instances, 114 LOC in all instances)
    From line 23 to 79, com\sap\tc\mobile\logging\spi\CategorySPI.java
    From line 23 to 79, com\sap\tc\mobile\logging\spi\LocationSPI.java

Total lines of code in clones: 1850