BUG TRACKING AND PREDICTION
International Journal of Innovative and Emerging Research in Engineering, Volume 2, Issue 3, 2015
Available online at www.ijiere.com
e-ISSN: 2394-3343, p-ISSN: 2394-5494

Komal Patel1, Prarthana Sawant1, Madhuri Tajane1 and Dr. Radha Shankarmani2
1 Information Technology Dept., Sardar Patel Institute of Technology, Andheri (W), Mumbai, India
2 HOD, Information Technology Dept., Sardar Patel Institute of Technology, Andheri (W), Mumbai, India

ABSTRACT: A software bug is an error, fault, failure or flaw in a computer program or system that causes it to produce an incorrect result or to behave in unintended ways. Bug severity is a classification of a software bug that indicates the degree of its negative impact on the quality of the software. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes can be postponed. In this paper, a software bug classification algorithm, CLUBAS (Classification of Software Bugs Using Bug Attribute Similarity), is used to categorize the bugs, and a severity is assigned to each bug based on the phase in which it occurs and the cost it incurs. We verify the assigned severity against an established prediction model named Information Retrieval Based Nearest Neighbour Classification. One special type of severe bug is the blocking bug, which prevents other bugs from being fixed. Such bugs may increase maintenance costs, reduce overall quality and delay the release of software systems. In this paper, based on factors such as the description of the bug, its priority and severity, and the number of people involved, we build decision trees to predict whether a bug will be a blocking bug or not.

Keywords: Bug, Severity, CLUBAS, Information Retrieval Based Nearest Neighbour Classification, Decision Tree, Blocking Bugs, Prediction.

I. INTRODUCTION
The abundance of defects in existing software systems is unsustainable. Addressing them is a dominant cost of software maintenance, which in turn dominates the life cycle cost of a system [1]. Assigning severity labels enables the developers to prioritize bugs based on their impact on the overall budget of the project. Although guidelines exist on how the severity of bugs should be assigned, the process is inherently manual and highly dependent on the expertise of the bug reporters in assigning correct labels. A novice bug reporter might find it difficult to decide the right severity level [2]. As the number of bug reports is large, a number of past studies have proposed approaches to help users in assigning severity labels, and development teams in validating bug report severity [2]. All these approaches combine text processing with machine learning to assign severity labels from the textual description of the reports. Menzies and Marcus developed a machine learning approach to assign the severity labels of bug reports at NASA. More recently, Lamkanfi et al. developed another machine learning approach to assign severity labels of bug reports in several Bugzilla [3] repositories of open source projects. The bug severity prediction tools are not perfect, though, and there is still room for improvement. Menzies and Marcus reported F measures (i.e., the harmonic mean of precision and recall) of 14% to 86% for the different severity labels. Lamkanfi et al. reported F measures of 65% to 75% on Bugzilla reports from different software systems. Thus there is a need to improve the accuracy of the prediction tools further.
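For reference, the F measure used in these evaluations is the harmonic mean of precision (P) and recall (R):

    F = (2 × P × R) / (P + R)

For example, a precision of 0.70 and a recall of 0.60 give F = 0.84 / 1.30 ≈ 0.65, i.e., 65%.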
In this work, we focus on assigning fine-grained bug severity labels. The objective of the proposed work is to create groups of similar software bugs and then classify these groups using discriminative terms identified from various software bug repositories [18], [19], [20]. This helps the developers to identify blocking bugs earlier in the process of development. The application of the proposed work is to provide effective management of the bug information and faster resolution of the reported bugs.
The remainder of this paper is divided into five major sections. The first part gives an extensive introduction to the bug tracking process, a detailed insight into the pre-processing techniques used in the proposed system, and a brief overview of blocking bug prediction. The second part of the paper is devoted to the working of the CLUBAS algorithm. The third part of the paper describes the method we use to assign the severity of bugs. The fourth part of the paper explains the method of verifying the assigned severity label. The fifth part describes our approach for predicting blocking bugs.

II. BACKGROUND
In this section, we describe the bug tracking process and then present the standard approaches to pre-processing textual documents that we have used in our project.

A. Bug Tracking
A bug tracking system or defect tracking system is a software application that keeps track of reported software bugs in software development projects. To help improve the quality of software systems, software projects often allow users to report bugs [4]. This is true for both open-source and closed-source software development. Bug tracking systems such as Bugzilla are often used. Users from various locations can log in to Bugzilla [3] and report new bugs. Users can report symptoms of the bugs along with other related information to developers. These include textual descriptions of the bug, either in short or detailed form, the product and component that are affected by the bug, and the estimated severity of the bug. The format of bug reports varies from one project to another, but bug reports typically contain the fields described in Table 1. Developers would then verify these symptoms and fix the bugs. They could make adjustments to the severity of the reported bug. There are often many reports received, and thus developers need to prioritize which reports are more important than others; the severity field is useful for this purpose.

B. Text Pre-Processing
Tokenization: Tokenization is the act of breaking up a sequence of strings into pieces such as words, keywords, phrases, symbols and other elements called tokens. In the process of tokenization, some characters like punctuation marks are discarded. The tokens then become the input for further processing such as parsing and text mining [5].
Stop-Word Removal: Stop words are non-descriptive words carrying little useful information for retrieval tasks. These include linking verbs such as "is", "am" and "are", pronouns such as "I", "he" and "it", etc. [2].
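As a brief illustration only (not part of the proposed tool), the pre-processing steps above can be sketched in Python; the stop-word list and the suffix-stripping stemmer below are simplified placeholders:

    import re

    # Small sample of stop words; a real system would use a fuller list.
    STOP_WORDS = {"is", "am", "are", "i", "he", "it", "the", "a", "an", "to", "of", "when"}

    def preprocess(description):
        # Tokenization: lower-case and split on non-alphanumeric characters,
        # which also discards punctuation marks.
        tokens = re.findall(r"[a-z0-9]+", description.lower())
        # Stop-word removal: drop non-descriptive words.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        # Crude stemming placeholder: strip a few common suffixes.
        return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

    print(preprocess("The application is crashing when it loads the file"))
    # -> ['application', 'crash', 'load', 'file']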
C. Blocking Bugs Prediction
In the normal flow of the bug process, someone discovers a bug and creates the respective bug report, then the bug is assigned to a developer who is responsible for fixing it, and finally, once it is resolved, another developer verifies the fix and closes the bug report. Sometimes, however, the fixing process is stalled because of the presence of a blocking bug.
Blocking bugs are software defects that prevent other defects from being fixed. In this scenario, the developers cannot go further in fixing their bugs, not because they do not have the skills or resources (e.g., time) needed to do so, but because the components they are fixing depend on other components that have unresolved bugs. These blocking bugs considerably lengthen the overall fixing time of the software bugs and increase the maintenance cost. In fact, we found that blocking bugs take approximately two to three times longer to be fixed compared to non-blocking bugs. To reduce the impact of blocking bugs, we build prediction models in order to flag the blocking bugs early on for developers [6].

Table 1. Fields present in a bug report
- Bug_id (Bug-ID): Distinct id assigned to recognize each bug.
- Desc (Description): Detailed description of the bug. This includes information such as how to reproduce the bug, the error log output when the bug occurs, etc.
- Prod (Product): Product that is affected by the bug.
- Comp (Component): Component that is affected by the bug.
- Sev (Severity): Estimated impact of the bug on the workings of the software. In Bugzilla, there are several severity levels: blocker, critical, major, normal, minor, and trivial. There is also another severity level, enhancement, which we ignore in this work, as we are not interested in feature requests but only defects.

III. THE CLUBAS ALGORITHM
In this section, the working of the CLUBAS algorithm is presented [7]. CLUBAS is segmented into five major steps. CLUBAS takes two input parameters for performing the bug classification, i.e., the textual similarity threshold value (T) and the number of frequent terms in a cluster label (N).
Step 1: The first step includes retrieving random software bugs from online software bug repositories, parsing the software bugs and saving them to the local database. Software bugs are available in the format of XML (Extensible Markup Language) files. The XML files are then converted to Excel sheets for fast processing.
Step 2: The next step is the pre-processing step, where the software bug records available locally in XML file format are parsed, and bug attributes and their corresponding values are stored in the local database. After this, stop-word elimination and stemming are performed over the textual bug descriptions, which are used for creating the bug clusters [21].
Step 3: Here we perform clustering, wherein the pre-processed software bug descriptions are selected for textual similarity measurement. The clusters are created as follows. First, one cluster is created with a random bug; then its similarity with the next bug is measured. If the similarity measure is above the threshold, both are mapped to the same cluster; otherwise a new cluster is created. This process is carried on till all the bugs are assigned to a particular cluster. If the threshold value is high, then high similarity between the software bug attributes is required for clustering, and vice versa.
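A minimal sketch of this threshold-based clustering step (the similarity function here is a simple Jaccard measure over pre-processed token sets, used only as a placeholder for the textual similarity measure used by CLUBAS):

    def jaccard(a, b):
        # Placeholder textual similarity: overlap of two token sets.
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def cluster_bugs(descriptions, threshold, similarity=jaccard):
        # descriptions: list of pre-processed bug descriptions (token lists)
        # threshold:    the textual similarity threshold value T
        clusters = []  # each cluster is a list of bug indices
        for i, desc in enumerate(descriptions):
            for cluster in clusters:
                # Compare against the first bug placed in the cluster.
                if similarity(descriptions[cluster[0]], desc) >= threshold:
                    cluster.append(i)
                    break
            else:
                clusters.append([i])  # no similar cluster found: start a new one
        return clusters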
Step 4: The next step is cluster label generation, in which cluster labels are generated using the frequent terms present in the bugs of a cluster. In this step, the descriptions of all the software bugs belonging to a particular cluster are aggregated, the frequent terms present in this aggregated text are identified, and the N top-most frequent terms (where N, the number of frequent terms in a label, is a user-supplied parameter) are assigned to the cluster as its label.
Step 5: Next, the cluster labels are mapped to the bug categories using taxonomic terms that are predefined for the various categories (mapping clusters to classes). The taxonomic terms for all the bug categories are pre-identified, and the cluster label terms found in the previous step are matched against these terms. A match indicates that the cluster belongs to that category.

IV. ASSIGNING SEVERITY
Once we have categorized the bugs, we need to assign the severity. We propose a new method to assign severity to the bugs based on the cost they may incur. It consists of three steps:
A. Determining the phase in which the bug occurs
B. Assigning cost based on the phase of software development
C. Determining the severity based on cost

A. Determining the phase in which the bug occurs
SDLC (Software Development Life Cycle) is a process followed for a software project within a software organization [8]. It consists of a detailed plan describing how to develop, maintain, replace and alter or enhance specific software [22]. The life cycle defines a methodology for improving the quality of software and the overall development process. It consists of five main phases: design and architecture, implementation, integration testing, customer beta test, and post product release. Bugs can occur in any of these phases, causing severe damage to the overall budget of the project [9]. Based on the category of a bug, we assign it to one of the five phases of development.

B. Assigning cost based on the phase of software development
The cost of defects can be measured by the impact of the defects on the overall system and by when we detect them. The earlier a defect is found, the lower the cost of fixing it. For example, if an error is found in the requirement specification phase, it is relatively cheap to fix: the requirement specification can be corrected and re-issued. In the same way, when a defect or error is found in the design, the design can be corrected and re-issued. But if the error is not caught in the specifications and is not found until user acceptance, then the cost to fix those errors or defects will be far higher [10]. The cost of a defect increases depending on the phase in which it is found [11], as shown in Figure 1.
Figure 1. Cost of a defect per phase, where X is a normalized unit of cost and can be expressed in terms of person-hours, dollars, etc. [11]
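As an illustration only, the phase-to-cost mapping could be encoded as a simple lookup; the multipliers below are hypothetical placeholders that merely follow the increasing trend of Figure 1 (Table 2 below uses 5X and 10X for two of the phases):

    # Hypothetical cost multipliers (in normalized units of X) for the five
    # SDLC phases; the values are placeholders, not figures from this paper.
    PHASE_COST = {
        "design and architecture": 1,
        "implementation": 5,
        "integration testing": 10,
        "customer beta test": 15,
        "post product release": 30,
    }

    def assign_cost(phase):
        # Return the normalized cost multiplier for a bug occurring in `phase`.
        return PHASE_COST[phase.lower()]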
C. Determining the severity based on cost
Based on the severity, the developer prioritizes the process of debugging. The more severe a bug is, the more impact it has on the system and the higher the cost. The severity labels include blocker, critical, major, minor, and trivial.
Blocker: A defect that results in the termination of the complete system or of one or more components of the system and causes extensive corruption of the data; the failed function is unusable and there is no acceptable alternative method to achieve the required results.
Critical: A defect that results in the termination of the complete system or of one or more components of the system and causes extensive corruption of the data; the failed function is unusable, but there exists an acceptable alternative method to achieve the required results.
Major: A defect that does not result in termination, but causes the system to produce incorrect, incomplete or inconsistent results.
Minor: A defect that does not result in termination, does not damage the usability of the system, and whose desired results can easily be obtained by working around the defect.
Trivial: A defect that is related to the enhancement of the system, where the changes are related to the look and feel of the application [12].

Table 2. Example of severity assignment
- Category: OS. Keywords: concurrent, path, redhat, unix, windows, hardware architecture, interface bugs, interface modules, macros. Phase: Integration. Cost incurred: 10X. Severity: 3.
- Category: Logical. Keywords: assertion, annotation, asynch, argument, application, attempting, break, broken, behavior, badly, call, code, clustering, component, core, default, error, exception, edit, found, fail, file, frame, handle, host, implement, incorrect, integrat, incomplete, java, library, logic, null, option, proper, pointer. Phase: Development. Cost incurred: 5X. Severity: 4.

V. VERIFYING THE SEVERITY LABEL
In this work, we propose a new approach leveraging information retrieval [13], in particular a BM25-based document similarity function [14], to automatically predict the severity of bug reports. BM25F is a function to evaluate the similarity between two structured documents [15]. A document is structured if it has a number of fields. A bug report is a structured document as it has several textual fields, i.e., summary and description. Each of the fields in the structured document can be assigned a different weight to denote its importance in measuring the similarity between two documents.

A. Overall Framework
The framework consists of two major components: similarity computation, which is an integral part of finding nearest neighbors, and label assignment [2]. Our framework assigns a severity label to a bug report BQ in question by investigating prior bug reports with known severity labels in the pool of bug reports BPool. In the similarity computation component, we measure the similarity between two bug reports. In the label assignment component, given a bug report whose severity is to be predicted, we take the nearest k bug reports based on the similarity measure. These k bug reports are then used to predict the label of the bug report.
Inputs:
i. BQ: the bug report in question
ii. BPool: the historical bug report pool
Output: the predicted bug report severity label
Method:
i. Let NNSet = the top-k nearest neighbors of BQ in BPool
ii. Let PredictedLabel = the label predicted from NNSet
iii. Output PredictedLabel

B. Similarity Computation
Given two bug reports d and q, the similarity function REP(d, q) is a linear combination of four features, where w(i) is the weight for the i-th feature feature(i):
REP(d, q) = Σ w(i) × feature(i)
Each weight determines the relative contribution and the degree of importance of its corresponding feature. Features that are important for measuring the similarity between bug reports have a higher weight. There are two types of features used: textual and non-textual.
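A minimal sketch of the REP combination, using two toy features and hypothetical weights (the actual four features and their tuned weights are those of [2] and are not reproduced here):

    # REP(d, q) as a weighted linear combination of per-feature similarities.
    def rep_similarity(d, q, features, weights):
        return sum(w * f(d, q) for w, f in zip(weights, features))

    # Toy textual feature: word overlap of the two summaries.
    def summary_overlap(d, q):
        a, b = set(d["summary"].lower().split()), set(q["summary"].lower().split())
        return len(a & b) / len(a | b) if a | b else 0.0

    # Toy non-textual feature: do the reports affect the same product?
    def same_product(d, q):
        return 1.0 if d["product"] == q["product"] else 0.0

    score = rep_similarity(
        {"summary": "crash on file open", "product": "Core"},
        {"summary": "application crash when opening a file", "product": "Core"},
        features=[summary_overlap, same_product],
        weights=[0.8, 0.2],  # hypothetical weights
    )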
C. Label Assignment
We aggregate the contribution of each neighboring bug report to predict the label of the bug report in question. We compute the weighted mean of the labels of the neighbors as the predicted label. We map the labels to integers and order them from the most severe to the least severe. Consider a set of nearest neighbors NNSet of a bug report BQ. Let NNSet[i] be the i-th nearest neighbour, NNSet[i].Label be the label of the i-th nearest neighbour (expressed as an integer), and NNSet[i].Sim be the similarity of BQ with NNSet[i]. The predicted label is computed by the following formula:
PredictedLabel = (Σ NNSet[i].Sim × NNSet[i].Label) / (Σ NNSet[i].Sim) + 0.5
The severity we obtain from this method is then compared with the severity from our cost-based method in order to verify the earlier method [2].

VI. PREDICTION OF BLOCKING BUGS
In this section, we describe our approach for predicting blocking bugs. First, we list the factors extracted from the bug reports. Second, we describe the prediction model used in our study.

A. Factors Used to Predict Blocking Bugs
Since our goal is to be able to predict blocking bugs, we extracted different factors from the bug reports so that blocking bugs can be detected early on. In addition, we would like to determine which factors best identify these blocking bugs [6].
1. Product: The product where the bug was found (e.g., Firefox OS, Bugzilla, etc.). Some products are older or more complex than others and are therefore more likely to have blocking bugs. For example, Firefox OS and Bugzilla [3] are two Mozilla products with approximately the same number of bugs (≈ 880) [16]; however, there were more blocking bugs in Firefox OS (250 bugs) than in Bugzilla (30 bugs).
2. Component: The component in which the bug was found (e.g., Core, Editor, UI, etc.). Some components are more or less critical than others and, as a consequence, more or less likely to have blocking bugs. For example, it might be the case that bugs in critical components prevent bugs in other components from being fixed.
3. Platform: The operating system on which the bug was found (e.g., Windows, GNU/Linux, Android, etc.). Some platforms are more or less prone to bugs than others, so blocking or non-blocking bugs may be more or less likely on specific platforms.
4. Severity: The severity describes the impact of the bug. We anticipate that bugs with a high severity tend to block the development and debugging process. On the other hand, bugs with a low severity are related to minor issues or enhancement requests.
5. Priority: Refers to the order in which a bug should be attended to with respect to other bugs. For example, bugs with low priority values should be attended to before bugs with high priority values. It might be the case that a high/low priority is indicative of a blocking/non-blocking bug.
6. Number in the CC list: The number of developers in the CC list of the bug. We think that bugs followed by a large number of developers might indicate bottlenecks in the maintenance process and are therefore more likely to be blocking bugs.
7. Description size: The number of words in the description. It might be the case that long/short descriptions can help to discriminate between blocking and non-blocking bugs.

B. Prediction Model
For each of our case study projects, we use our proposed factors to train a decision tree classifier to predict whether a bug will be a blocking bug or not.
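As a rough illustration of this training step, using scikit-learn's DecisionTreeClassifier as a stand-in for the C4.5 classifier described next (the two example records and their encoding are hypothetical):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training records built from the factors listed above; in
    # practice these would be extracted from the project's bug tracking system.
    reports = pd.DataFrame([
        {"product": "Firefox OS", "component": "Core", "platform": "Android",
         "severity": "critical", "priority": "P1", "cc_count": 12,
         "desc_size": 250, "blocking": 1},
        {"product": "Bugzilla", "component": "UI", "platform": "Windows",
         "severity": "minor", "priority": "P4", "cc_count": 1,
         "desc_size": 40, "blocking": 0},
    ])

    # One-hot encode the nominal factors; the numeric factors are kept as-is.
    X = pd.get_dummies(reports.drop(columns="blocking"))
    y = reports["blocking"]

    clf = DecisionTreeClassifier().fit(X, y)  # stand-in for C4.5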
Decision Tree Classifier
We use a tree-based classifier to perform our predictions. One of the benefits of tree-based classifiers is that they provide explainable models. Such models intuitively show to the users (i.e., developers or managers) the decisions taken during the prediction process. The C4.5 algorithm [17] belongs to this type of data mining technique and, like other tree-based classifiers, it follows a greedy divide-and-conquer strategy in the training stage.
Figure 2. Decision tree example [6]
As shown in Figure 2, the algorithm begins with an empty tree. At each level, the goal is to find the feature that maximizes the information gain. Consider for example a data set with p feature columns X1, X2, ..., Xp (e.g., severity, platform, comment/description size, etc.) and a class column (e.g., Blocking/Non-Blocking). The C4.5 algorithm splits the data into two subsets with rules of the form Xi < b if the feature is numeric, or into multiple subsets if the feature is nominal. The algorithm is applied recursively to the partitions until every leaf contains only records of the same class. In Figure 2, we provide an example of a tree generated from the extracted factors in our data set. The sample tree indicates that a bug report will be predicted as a blocking bug if the Bayesian score of its comment/description is > 0.74, there are more than 6 developers in the CC list, and the number of words in the comments is greater than 20. On the other hand, if the Bayesian score of its comment is ≤ 0.74 and the reporter's experience is less than 5, then it will be predicted as a non-blocking bug.

VII. CONCLUSION
In this paper, a text clustering and classification algorithm is developed and a GUI-based tool for software bug classification, CLUBAS, is presented. The CLUBAS algorithm is designed using the technique of classification by clustering, in which clustering is first done using the textual similarity of bug descriptions, and then labels are generated and assigned to each cluster. Lastly, the cluster labels are mapped to the bug classes using bug taxonomic terms. Based on these categories we assign severity to the bugs. A BM25-based severity prediction model is used to verify the severity assigned by our system. We also build prediction models based on decision trees to predict whether a bug will be a blocking bug or not.

VIII. FUTURE SCOPE
The proposed work can be extended using advanced text pre-processing techniques for optimizing the clustering and classification work, and modern text clustering and classification algorithms can be implemented and compared with the proposed algorithm. We also plan to extend it by performing feature selection on our factors. Employing feature selection may improve the performance of our models, since it removes redundant factors. Our results show that blocking bugs take longer to be fixed than non-blocking bugs; however, it is unclear whether blocking bugs require more effort and resources than non-blocking bugs. To tackle this question, we plan to link bug reports with information from the version control systems, leverage metrics at commit level, and perform a quantitative analysis that may help us to confirm or refute our intuition that blocking bugs indeed require more effort.

REFERENCES
[1] C. Le Goues, S. Forrest, and W. Weimer, "Current challenges in automatic software repair," New York: Springer Science+Business Media, 2013.
[2] Y. Tian, D. Lo, and C. Sun, "Information Retrieval Based Nearest Neighbour Classification for Fine-Grained Bug Severity Prediction," in Proceedings of the Working Conference on Reverse Engineering (WCRE), 2012, pp. 215-224, doi: 10.1109/WCRE.2012.31.
[3] http://www.bugzilla.org/
[4] Z. A. Shaffiei, M. Mokhsin, and S. R. Hamidi, "Change and Bug Tracking System: Anjung Penchala Sdn. Bhd.," Change, 10.3, 2010.
[5] http://www.techopedia.com/definition/13698/tokenization
[6] H. Valdivia Garcia and E. Shihab, "Characterizing and predicting blocking bugs in open source projects," in Proceedings of the 11th Working Conference on Mining Software Repositories, ACM, 2014.
[7] N. Kumar Nagwani and S. Verma, "CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities," Journal of Software Engineering and Applications, vol. 5, no. 6, 2012, pp. 436-447, doi: 10.4236/jsea.2012.56050.
[8] http://istqbexamcertification.com/what-are-the-software-development-life-cycle-sdlc-phases/
[9] http://users.ece.cmu.edu/~koopman/des_s99/sw_testing/
[10] http://istqbexamcertification.com/what-is-the-cost-of-defects-in-software-testing/
[11] H. Wang, "Software Defects Classification Prediction Based On Mining Software Repository," 2014.
[12] http://istqbexamcertification.com/what-is-the-difference-between-severity-and-priority/
[13] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008, pp. 232-233.
[14] S. Robertson, H. Zaragoza, and M. Taylor, "Simple BM25 Extension to Multiple Weighted Fields," in Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, 2004, pp. 42-49.
[15] C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, "Towards more accurate retrieval of duplicate bug reports," in ASE, 2011.
[16] J. Anvik, L. Hiew, and G. C. Murphy, "Coping with an open bug repository," in eclipse '05: Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, 2005, pp. 35-39.
[17] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.
[18] T. Menzies and A. Marcus, "Automated severity assessment of software defect reports," in ICSM, 2008.
[19] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals, "Predicting the severity of a reported bug," in MSR, 2010.
[20] A. Lamkanfi, S. Demeyer, Q. Soetens, and T. Verdonck, "Comparing mining algorithms for predicting the severity of a reported bug," in CSMR, 2011.
[21] N. K. Nagwani and S. Verma, "ML-CLUBAS: A Multi Label Bug Classification Algorithm," 2012.
[22] http://www.tutorialspoint.com/sdlc/sdlc_overview.htm