BUG TRACKING AND PREDICTION

International Journal of Innovative and Emerging Research in Engineering
Volume 2, Issue 3, 2015
Available online at www.ijiere.com
e-ISSN: 2394-3343
p-ISSN: 2394-5494

Komal Patel1, Prarthana Sawant1, Madhuri Tajane1 and Dr. Radha Shankarmani2
1 Information Technology Dept., Sardar Patel Institute of Technology, Andheri (W), Mumbai, India
2 HOD, Information Technology Dept., Sardar Patel Institute of Technology, Andheri (W), Mumbai, India
ABSTRACT:
A software bug is an error, fault, failure or flaw in a computer program or system that causes it to produce an incorrect result or to behave in unintended ways. Bug severity is a classification of a software bug that indicates the degree of negative impact on the quality of the software. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes can be postponed. In this paper, a software bug classification algorithm, CLUBAS (Classification of Software Bugs Using Bug Attribute Similarity), is used to categorize the bugs, and severity is assigned based on the phase in which each bug occurs and the cost it incurs. The assigned severity is then verified against an established prediction model, Information Retrieval Based Nearest Neighbour Classification. One special type of severe bug is the blocking bug, which prevents other bugs from being fixed. These bugs may increase maintenance costs, reduce overall quality and delay the release of software systems. Based on factors such as the bug description, priority, severity and the number of people involved, we build decision trees to predict whether a bug will be a blocking bug.
Keywords: Bug, Severity, CLUBAS, Information Retrieval Based Nearest Neighbour Classification, Decision Tree,
Blocking Bugs, Prediction.
I. INTRODUCTION
The abundance of defects in existing software systems is unsustainable. Addressing them is a dominant cost of software
maintenance, which in turn dominates the life cycle cost of a system [1]. Assigning severity labels enables developers to prioritize bugs based on their impact on the overall budget of the project. Although guidelines exist on how bug severity should be assigned, the process is inherently manual and highly dependent on the expertise of the bug reporters in assigning correct labels. A novice bug reporter might find it difficult to decide the right severity level [2].
As the number of bug reports is large, a number of past studies have proposed approaches to help users assign severity labels and the development team validate bug report severity [2]. All these approaches combine text processing with machine learning to assign severity labels from the textual description of the reports. Menzies and Marcus developed a machine learning approach to assign severity labels to bug reports at NASA. More recently, Lamkanfi et al. developed another machine learning approach to assign severity labels to bug reports in several Bugzilla [3] repositories of open source projects. These bug severity prediction tools are not perfect, though, and there is still room for improvement. Menzies and Marcus reported F-measures (i.e., the harmonic mean of precision and recall) of 14% to 86% for the different severity labels. Lamkanfi et al. reported F-measures of 65% to 75% on Bugzilla reports from different software systems. Thus there is a need to improve the accuracy of these prediction tools further.
In this work, we focus on assigning fine-grained bug severity labels. The objective of the proposed work is to create groups of similar software bugs and then classify these groups using discriminative terms identified from various software bug repositories [18], [19], [20]. This helps developers identify blocking bugs earlier in the development process. The application of the proposed work is to provide effective management of bug information and faster resolution of reported bugs.
The rest of the paper is divided into five major parts. The first gives an introduction to the bug tracking process, a detailed insight into the pre-processing techniques used in the proposed system, and a brief overview of blocking bug prediction. The second part is devoted to the working of the CLUBAS algorithm. The third part describes the method we use to assign the severity of bugs. The fourth part explains the method of verifying the assigned severity label. The fifth part describes our approach for predicting blocking bugs.
II. BACKGROUND
In this section, we describe the bug tracking process, then present standard approaches to pre-process textual documents,
which we have used in our project.
A. Bug Tracking
A bug tracking system or defect tracking system is a software application that keeps track of reported software bugs in
software development projects. To help improve the quality of software systems, software projects often allow users to
report bugs [4]. This is true for both open-source and closed-source software developments. Bug tracking systems such as
Bugzilla are often used. Users from various locations can log in to Bugzilla [3] and report new bugs. Users can report
symptoms of the bugs along with other related information to developers. These include textual descriptions of the bug
either in short or detailed form, product and component that are affected by the bug, and the estimated severity of the bug.
The format of bug reports varies from one project to another, but bug reports typically contain the fields described in Table 1. Developers would then verify these symptoms and fix the bugs. They could make adjustments to the severity of the reported bug. There are often many reports received, and thus developers need to prioritize which reports are more important than others; the severity field is useful for this purpose.
B. Text Pre-Processing
Tokenization: Tokenization is the act of breaking up a sequence of strings into pieces such as words, keywords, phrases,
symbols and other elements called tokens. In the process of tokenization, some characters like punctuation marks are
discarded. The tokens become the input for another process like parsing and text mining [5].
Stop-Word Removal: Stop words are non-descriptive words carrying little useful information for retrieval tasks. These
include linking verbs such as “is”, “am” and “are”, pronouns such as “I”, “he” and “it”, etc. [2].
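As an illustration, a minimal Python sketch of these two steps is given below; the regular expression and the short stop-word list are our own simplifications and not part of any particular bug tracking system.

import re

# A small illustrative stop-word list; real systems use much larger lists.
STOP_WORDS = {"is", "am", "are", "i", "he", "it", "the", "a", "an", "of", "to"}

def tokenize(text):
    """Lower-case the text and split it into word tokens, discarding punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    """Drop non-descriptive tokens that carry little retrieval value."""
    return [t for t in tokens if t not in STOP_WORDS]

description = "It crashes when the user clicks Save; the file is not written."
tokens = remove_stop_words(tokenize(description))
print(tokens)   # ['crashes', 'when', 'user', 'clicks', 'save', 'file', 'not', 'written']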
C. Blocking Bugs Prediction
In the normal flow of the bug process, someone discovers a bug and creates the respective bug report, then the bug is
assigned to a developer who is responsible for fixing it and finally, once it is resolved, another developer verifies the fix
and closes the bug report. Sometimes, however, the fixing process is stalled because of the presence of a blocking bug.
Blocking bugs are software defects that prevent other defects from being fixed. In this scenario, the developers cannot go
further fixing their bugs, not because they do not have the skills or resources (e.g., time) needed to do it, but because the
components they are fixing depend on other components that have unresolved bugs. These blocking bugs considerably
lengthen the overall fixing time of the software bugs and increase the maintenance cost. In fact, we found that blocking bugs take approximately two to three times longer to be fixed compared to non-blocking bugs. To reduce the impact of
blocking bugs, we build prediction models in order to flag the blocking bugs early on for developers [6].
Table 1. Fields present in a bug report

Field  | Description
Bug_id | Bug ID: distinct id assigned to recognize each bug.
Desc   | Description: detailed description of the bug. This includes information such as how to reproduce the bug, the error log output when the bug occurs, etc.
Prod   | Product: product that is affected by the bug.
Comp   | Component: component that is affected by the bug.
Sev    | Severity: estimated impact of the bug on the workings of the software. In Bugzilla, there are several severity levels: blocker, critical, major, normal, minor, and trivial. There is also another severity level, enhancement, which we ignore in this work as we are not interested in feature requests but only defects.
III. THE CLUBAS ALGORITHM
In this section, the working of the CLUBAS algorithm is presented [7]. CLUBAS is segmented into five major steps. CLUBAS takes two input parameters for performing bug classification: the textual similarity threshold value (T) and the number of frequent terms in the cluster label (N).
Step 1: The first step includes retrieving random software bugs from online software bug repositories, parsing them and saving them to the local database. Software bugs are available as XML (Extensible Markup Language) files. The XML files are then converted to Excel sheets for faster processing.
Step 2: The next step is the pre-processing step, where the software bug records available locally in XML format are parsed and the bug attributes and their corresponding values are stored in the local database. After this, stop-word elimination and stemming are performed on the textual bug descriptions, which are then used to create the bug clusters [21].
Step 3: Here we perform clustering, wherein the pre-processed software bug descriptions are selected for textual similarity measurement. The clusters are created as follows: first, one cluster is created with a random bug, and its similarity with the next bug is computed. If the similarity measure is above the threshold, both are mapped to the same cluster; otherwise a new cluster is created. This process is carried out until all bugs are assigned to a cluster. If the threshold value is high, then high similarity between the software bug attributes is required for clustering, and vice versa.
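A minimal sketch of this threshold-based clustering is given below; it assumes a simple Jaccard similarity over the pre-processed description tokens as a stand-in for the textual similarity measure actually used by CLUBAS, and it compares each bug against the first bug of every existing cluster.

def jaccard(a, b):
    """Textual similarity between two token sets (stand-in for CLUBAS's measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_bugs(bug_tokens, threshold):
    """Assign each bug to the first cluster whose seed bug is similar enough,
    otherwise open a new cluster (Step 3 of CLUBAS)."""
    clusters = []                       # each cluster is a list of bug indices
    for i, tokens in enumerate(bug_tokens):
        for cluster in clusters:
            seed = bug_tokens[cluster[0]]
            if jaccard(tokens, seed) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])        # no similar cluster found: start a new one
    return clusters

bugs = [["crash", "save", "file"], ["crash", "file", "write"], ["ui", "button", "color"]]
print(cluster_bugs(bugs, threshold=0.4))   # [[0, 1], [2]]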
Step 4: This step is cluster label generation, which generates the cluster labels using the frequent terms present in the bugs of a cluster. The descriptions of all the software bugs belonging to a particular cluster are aggregated, the frequent terms present in this aggregated text are counted, and the N top most frequent terms (where N is the user-supplied number of frequent terms in a label) are assigned to the cluster as its label.
Step 5: Mapping the cluster labels to the bug categories using taxonomic terms that are predefined for the various categories is carried out next (mapping clusters to classes). The taxonomic terms for each bug category are pre-identified, and the cluster label terms found in the previous step are matched against these terms. A match indicates that the cluster belongs to that category.
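The two steps can be sketched as follows; the taxonomy dictionary is an illustrative example and not the taxonomic term list used by CLUBAS.

from collections import Counter

def cluster_label(descriptions, n):
    """Step 4: take the N most frequent terms of the aggregated cluster text."""
    counts = Counter(token for d in descriptions for token in d)
    return [term for term, _ in counts.most_common(n)]

def map_to_category(label_terms, taxonomy):
    """Step 5: a cluster belongs to every category whose taxonomic terms it matches."""
    return [cat for cat, terms in taxonomy.items() if set(label_terms) & set(terms)]

TAXONOMY = {                       # illustrative taxonomic terms only
    "OS": ["unix", "windows", "redhat", "path"],
    "Logical": ["null", "exception", "pointer", "logic"],
}

cluster = [["null", "pointer", "crash"], ["null", "exception", "save"]]
label = cluster_label(cluster, n=3)
print(label, "->", map_to_category(label, TAXONOMY))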
IV. ASSIGNING SEVERITY
Once we have categorized the bugs, we need to assign the severity. We propose a new method to assign severity to the bugs based on the cost they may incur. It consists of three steps:
A. Determining the phase in which the bug occurs
B. Assigning cost based on the phase of software development
C. Determining the severity based on cost

A. Determining the phase in which the bug occurs
SDLC (Software Development Life Cycle) is a process followed for a software project within a software organization [8]. It consists of a detailed plan describing how to develop, maintain, replace and alter or enhance specific software [22]. The life cycle defines a methodology for improving the quality of software and the overall development process. It consists of five main phases: design and architecture, implementation, integration testing, customer beta test, and post product release. Bugs can occur in any of these phases, causing severe damage to the overall budget of the project [9]. Based on the category of a bug, we assign it to one of the five phases of development.
B. Assigning cost based on the phase of software development
The cost of a defect is determined by its impact on the overall system and by when it is detected: the earlier a defect is found, the lower the cost of fixing it. For example, if an error is found in the requirement specification phase, it is relatively cheap to fix; the requirement specification can be corrected and re-issued. In the same way, when a defect or error is found in the design, the design can be corrected and re-issued. But if the error is not caught in the specification and is not found until user acceptance, then the cost to fix it will be far higher [10]. The cost of a defect increases with the phase in which it is found [11], as shown in Figure 1.
Figure 1. Cost of defects by phase. X is a normalized unit of cost and can be expressed in terms of person-hours, dollars, etc. [11]
C. Determining the severity based on cost
Based on the severity, the developer prioritizes the process of debugging. The more severe a bug is, the more impact it has on the system and the higher the cost. The severity labels include blocker, critical, major, minor, and trivial.
• Blocker: a defect that results in the termination of the complete system or of one or more of its components and causes extensive corruption of data; the failed function is unusable and there is no acceptable alternative method to achieve the required results.
• Critical: a defect that results in the termination of the complete system or of one or more of its components and causes extensive corruption of data; the failed function is unusable, but an acceptable alternative method exists to achieve the required results.
• Major: a defect that does not result in termination, but causes the system to produce incorrect, incomplete or inconsistent results.
• Minor: a defect that does not result in termination and does not damage the usability of the system; the desired results can easily be obtained by working around the defect.
• Trivial: a defect related to an enhancement of the system, where the changes concern the look and feel of the application [12].
Table 2. Example of severity assignment

Categories of bugs | Keywords | Phase | Cost incurred | Severity
OS | concurrent, path, redhat, unix, windows, hardware architecture, interface bugs, interface modules, macros | Integration | 10X | 3
Logical | assertion, annotation, asynch, argument, application, attempting, break, broken, behavior, badly, call, code, clustering, component, core, default, error, exception, edit, found, fail, file, frame, handle, host, implement, incorrect, integrat, incomplete, java, library, logic, null, option, proper, pointer | Development | 5X | 4
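Putting steps A to C together, the assignment can be sketched as a pair of lookup tables. The cost multipliers for the Integration and Development phases follow Table 2; the remaining phase costs and the cost-to-severity thresholds below are illustrative assumptions, not values fixed by the proposed method.

# Illustrative mappings; the actual tables are project-specific.
CATEGORY_TO_PHASE = {"OS": "Integration", "Logical": "Development"}
PHASE_TO_COST = {                   # cost in normalized units X (cf. Figure 1)
    "Design": 1, "Development": 5, "Integration": 10,
    "Customer beta test": 15, "Post product release": 30,
}

def severity_from_cost(cost):
    """Map the incurred cost to a severity level, assuming 1=blocker ... 5=trivial."""
    if cost >= 30: return 1   # blocker
    if cost >= 15: return 2   # critical
    if cost >= 10: return 3   # major
    if cost >= 5:  return 4   # minor
    return 5                  # trivial

def assign_severity(category):
    phase = CATEGORY_TO_PHASE[category]
    cost = PHASE_TO_COST[phase]
    return phase, cost, severity_from_cost(cost)

print(assign_severity("OS"))       # ('Integration', 10, 3), matching the OS row of Table 2
print(assign_severity("Logical"))  # ('Development', 5, 4), matching the Logical row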
V. VERIFYING THE SEVERITY LABEL
In this work, we propose a new approach leveraging information retrieval [13], in particular a BM25-based document similarity function [14], to automatically predict the severity of bug reports. BM25F is a function to evaluate the similarity
between two structured documents [15]. A document is structured if it has a number of fields. A bug report is a structured
document as it has several textual fields, i.e., summary and description. Each of the fields in the structured document can
be assigned a different weight to denote its importance in measuring the similarity between two documents.
A. Overall Framework
The framework thus consists of two major components: similarity computation, which is an integral part of finding
nearest-neighbors, and label assignment [2]. Our framework assigns a severity label to a bug report BQ in question by
investigating prior bug reports with known severity labels in the pool of bug reports BPool. In the similarity computation
component, we measure the similarity between two bug reports. In the label assignment component, given a bug report
whose severity is to be predicted, we take the nearest k bug reports based on the similarity measure. These k bug reports
are then used to predict the label of the bug report.
Inputs:
  i.  BQ: the bug report in question
  ii. BPool: the historical bug report pool
Output: the predicted bug report severity label
Method:
  i.   Let NNSet = the top-k nearest neighbours of BQ in BPool
  ii.  Let PredictedLabel = the label predicted from NNSet
  iii. Output PredictedLabel
B. Similarity Computation
Given two bug reports d and q, the similarity function REP(d, q) is a linear combination of four features, with the following form, where w(i) is the weight of the i-th feature, feature(i):
REP(d, q) = Σ_i w(i) × feature(i)
Each weight determines the relative contribution and the degree of importance of its corresponding feature. Features that are more important for measuring the similarity between bug reports are given higher weights. There are two types of features used: textual and non-textual.
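A minimal sketch of this linear combination is given below. Only three placeholder features are shown rather than the four used by the actual REP function, and the weights are arbitrary; a full implementation would use BM25F scores over the summary and description as the textual features [14].

def rep_similarity(d, q, features, weights):
    """REP(d, q) = sum_i w(i) * feature_i(d, q): a weighted linear combination."""
    return sum(w * f(d, q) for f, w in zip(features, weights))

# Placeholder features: token overlap on the summary text (textual) and
# exact matches on the product/component fields (non-textual).
def summary_overlap(d, q):
    a, b = set(d["summary"].split()), set(q["summary"].split())
    return len(a & b) / max(len(a | b), 1)

def same_product(d, q):   return 1.0 if d["product"] == q["product"] else 0.0
def same_component(d, q): return 1.0 if d["component"] == q["component"] else 0.0

features = [summary_overlap, same_product, same_component]
weights = [2.0, 1.0, 0.5]            # illustrative weights; tuned in practice

d = {"summary": "crash on save", "product": "Editor", "component": "IO"}
q = {"summary": "editor crash when saving file", "product": "Editor", "component": "UI"}
print(rep_similarity(d, q, features, weights))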
C. Label Assignment
We aggregate the contribution of each neighbouring bug report to predict the label of the bug report in question. We compute the weighted mean of the labels of the neighbours as the predicted label, mapping the labels to integers ordered from the most severe to the least severe. Consider a set of nearest neighbours NNSet of a bug report BQ. Let NNSet[i] be the i-th nearest neighbour, NNSet[i].Label be the label of the i-th nearest neighbour (expressed as an integer), and NNSet[i].Sim be the similarity of BQ with NNSet[i]. The predicted label is computed by the following formula:
PredictedLabel = ( Σ_i NNSet[i].Sim × NNSet[i].Label / Σ_i NNSet[i].Sim ) + 0.5
The severity we get from this method is then compared with our previous method based on cost in order to verify the
earlier method [2].
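A sketch of this label assignment over the top-k neighbours follows; the particular mapping of severity names to integers, and taking the integer part of the weighted mean plus 0.5 (i.e., rounding), are our reading of how the formula is applied.

LABELS = {"blocker": 1, "critical": 2, "major": 3, "minor": 4, "trivial": 5}
NAMES = {v: k for k, v in LABELS.items()}

def predict_label(neighbours):
    """neighbours: list of (similarity, label_name) for the k nearest reports."""
    num = sum(sim * LABELS[label] for sim, label in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return NAMES[int(num / den + 0.5)]      # weighted mean, rounded to an integer label

# Example: three nearest neighbours of the report in question.
print(predict_label([(0.9, "critical"), (0.7, "major"), (0.2, "minor")]))   # 'major'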
VI. PREDICTION OF BLOCKING BUGS
In this section, we describe our approach for predicting blocking bugs. First, we list the factors extracted from the bug
reports. Second, we describe the prediction model used in our study.
A. Factors Used to Predict Blocking Bugs
Since our goal is to be able to predict blocking bugs, we extracted different factors from the bug reports so the blocking
bugs can be detected early on. In addition, we would like to determine which factors best identify these blocking bugs [6].
1. Product: The product where the bug was found (e.g., Firefox OS, Bugzilla, etc.). Some products are older or more complex than others and are therefore more likely to have blocking bugs. For example, Firefox OS and Bugzilla [3] are two Mozilla products with approximately the same number of bugs (≈ 880) [16]; however, there were more blocking bugs in Firefox OS (250 bugs) than in Bugzilla (30 bugs).
2. Component: The component in which the bug was found (e.g., Core, Editor, UI, etc.). Some components are more or less critical than others and, as a consequence, more or less likely to have blocking bugs. For example, it might be the case that bugs in critical components prevent bugs in other components from being fixed.
3. Platform: The operating system on which the bug was found (e.g., Windows, GNU/Linux, Android, etc.). Some platforms are more or less prone to bugs than others, and blocking or non-blocking bugs may be more or less likely on specific platforms.
4. Severity: The severity describes the impact of the bug. We anticipate that bugs with a high severity tend to block the development and debugging process, whereas bugs with a low severity are related to minor issues or enhancement requests.
5. Priority: Refers to the order in which a bug should be attended to with respect to other bugs. For example, bugs with low priority values should be attended to before bugs with high priority values. It might be the case that a high/low priority is indicative of a blocking/non-blocking bug.
6. Number in the CC list: The number of developers in the CC list of the bug. Bugs followed by a large number of developers might indicate bottlenecks in the maintenance process and are therefore more likely to be blocking bugs.
7. Description size: The number of words in the description. Long or short descriptions may help to discriminate between blocking and non-blocking bugs.
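The factors above can be assembled into one feature vector per bug report, as in the sketch below; the dictionary field names are assumptions about how a report is stored rather than a fixed Bugzilla schema.

def extract_factors(report):
    """Build the factor vector used to predict whether the bug is blocking."""
    return {
        "product":   report["product"],
        "component": report["component"],
        "platform":  report["platform"],
        "severity":  report["severity"],
        "priority":  report["priority"],
        "cc_count":  len(report["cc_list"]),               # number of developers in CC
        "desc_size": len(report["description"].split()),   # words in the description
    }

report = {
    "product": "Firefox OS", "component": "Core", "platform": "Android",
    "severity": "critical", "priority": "P1",
    "cc_list": ["dev1", "dev2", "dev3"],
    "description": "Build fails because the shared component throws a null pointer exception.",
}
print(extract_factors(report))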
B. Prediction Model
For each of our case study projects, we use our proposed factors to train a decision tree classifier to predict whether
a bug will be a blocking bug or not.
Decision Tree Classifier
We use a tree-based classifier to perform our predictions. One of the benefits of tree-based classifiers is that they
provide explainable models. Such models intuitively show to the users (i.e., developers or managers) the decisions taken
during the prediction process. The C4.5 algorithm [17] belongs to this type of data mining technique and, like other tree-based classifiers, follows a greedy divide-and-conquer strategy in the training stage.
Figure 2. Decision tree example [6]
As shown in Figure 2, the algorithm begins with an empty tree. At each level, the goal is to find the feature that maximizes the information gain. Consider, for example, a data set with p feature-columns X1, X2, ..., Xp (e.g., severity, platform, comment/description-size, etc.) and a class-column (e.g., Blocking/Non-Blocking). The C4.5 algorithm splits
the data into two subsets with rules of the form Xi < b if the feature is numeric or into multiple subsets if the feature is
nominal. The algorithm is applied recursively to the partitions until every leaf contains only records of the same class.
In Figure 2, we provide an example of a tree generated from the extracted factors in our data set. The sample tree
indicates that a bug report will be predicted as blocking bug if the Bayesian-score of its comment/description is > 0.74,
there are more than 6 developers in the CC list and the number of words in the comments is greater than 20. On the other
hand, if the Bayesian-score of its comment is ≤ 0.74 and the reporter’s experience is less than 5, then it will be predicted
as a non-blocking bug.
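A sketch of training such a classifier is given below. We use scikit-learn's DecisionTreeClassifier as a stand-in for C4.5; the training rows are made up, and the nominal factors are one-hot encoded because scikit-learn trees require numeric inputs.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Made-up training examples: factor dictionaries and blocking / non-blocking labels.
factor_rows = [
    {"product": "Firefox OS", "severity": "critical", "cc_count": 8, "desc_size": 120},
    {"product": "Bugzilla",   "severity": "minor",    "cc_count": 1, "desc_size": 15},
    {"product": "Firefox OS", "severity": "major",    "cc_count": 7, "desc_size": 80},
    {"product": "Bugzilla",   "severity": "trivial",  "cc_count": 0, "desc_size": 10},
]
labels = ["blocking", "non-blocking", "blocking", "non-blocking"]

vec = DictVectorizer(sparse=False)                    # one-hot encodes the nominal factors
X = vec.fit_transform(factor_rows)

tree = DecisionTreeClassifier(criterion="entropy")    # information-gain style splits
tree.fit(X, labels)

new_bug = {"product": "Firefox OS", "severity": "critical", "cc_count": 9, "desc_size": 200}
print(tree.predict(vec.transform([new_bug]))[0])      # expected: "blocking"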
VII. CONCLUSION
In this paper, a text clustering and classification algorithm is developed and a GUI-based tool for software bug classification, CLUBAS, is presented. The CLUBAS algorithm is designed using the technique of classification by clustering: initially, clustering is done using the textual similarity of bug descriptions, then labels are generated and assigned to each cluster, and lastly the cluster labels are mapped to bug classes using bug taxonomic terms. Based on these categories, we assign severity to the bugs. A BM25-based severity prediction model is used to verify the severity assigned by our system. We also build prediction models based on decision trees to predict whether a bug will be a blocking bug.
VIII. FUTURE SCOPE
The proposed work can be extended using advanced text pre-processing techniques to optimize the clustering and classification, and modern text clustering and classification algorithms can be implemented and compared with the proposed algorithm.
We also plan to extend the work by performing feature selection on our factors. Employing feature selection may improve the performance of our models, since it removes redundant factors. Our results show that blocking bugs take longer to be fixed than non-blocking bugs; however, it is unclear whether blocking bugs require more effort and resources than non-blocking bugs.
To tackle this question, we plan to link bug reports with information from the version control systems, leverage metrics at
commit level and perform a quantitative analysis that may help us to confirm or refute our intuition that blocking bugs
indeed require more effort.
REFERENCES
[1] C. Le Goues, S. Forrest, and W. Weimer, "Current challenges in automatic software repair," New York: Springer Science+Business Media, 2013.
[2] Y. Tian, D. Lo, and C. Sun, "Information Retrieval Based Nearest Neighbour Classification for Fine-Grained Bug Severity Prediction," in Working Conference on Reverse Engineering (WCRE), 2012, pp. 215-224, doi:10.1109/WCRE.2012.31.
[3] http://www.bugzilla.org/
[4] Z. A. Shaffiei, M. Mokhsin, and S. R. Hamidi, "Change and Bug Tracking System: Anjung Penchala Sdn. Bhd.," Change 10.3, 2010.
[5] http://www.techopedia.com/definition/13698/tokenization
[6] H. Valdivia Garcia and E. Shihab, "Characterizing and predicting blocking bugs in open source projects," in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), ACM, 2014.
[7] N. Kumar Nagwani and S. Verma, "CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities," Journal of Software Engineering and Applications, Vol. 5, No. 6, 2012, pp. 436-447, doi:10.4236/jsea.2012.56050.
[8] http://istqbexamcertification.com/what-are-the-software-development-life-cycle-sdlc-phases/
[9] http://users.ece.cmu.edu/~koopman/des_s99/sw_testing/
[10] http://istqbexamcertification.com/what-is-the-cost-of-defects-in-software-testing/
[11] H. Wang, "Software Defects Classification Prediction Based On Mining Software Repository," 2014.
[12] http://istqbexamcertification.com/what-is-the-difference-between-severity-and-priority/
[13] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008, pp. 232-233.
[14] S. Robertson, H. Zaragoza, and M. Taylor, "Simple BM25 Extension to Multiple Weighted Fields," in Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, 2004, pp. 42-49.
[15] C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, "Towards more accurate retrieval of duplicate bug reports," in ASE, 2011.
[16] J. Anvik, L. Hiew, and G. C. Murphy, "Coping with an open bug repository," in eclipse '05: Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, 2005, pp. 35-39.
[17] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.
[18] T. Menzies and A. Marcus, "Automated severity assessment of software defect reports," in ICSM, 2008.
[19] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals, "Predicting the severity of a reported bug," in MSR, 2010.
[20] A. Lamkanfi, S. Demeyer, Q. Soetens, and T. Verdonck, "Comparing mining algorithms for predicting the severity of a reported bug," in CSMR, 2011.
[21] N. K. Nagwani and S. Verma, "ML-CLUBAS: A Multi Label Bug Classification Algorithm," 2012.
[22] http://www.tutorialspoint.com/sdlc/sdlc_overview.htm