Development of a Software Defect Prediction Algorithm Based on Kohonen Maps and Hierarchical Clustering – Herald of Khmelnytskyi National University

DEVELOPMENT OF A SOFTWARE DEFECT PREDICTION ALGORITHM BASED ON KOHONEN MAPS AND HIERARCHICAL CLUSTERING

Pages: 78-82. Issue: No. 1, 2021 (293)
Authors:
V. Yakovyna, N. Shakhovska, Ya. Matviychuk, Ye. Zasoba
Lviv Polytechnic National University
DOI: https://www.doi.org/10.31891/2307-5732-2021-293-1-78-82
Peer review: 09.01.2021
Printed: 10.03.2021

Abstract (translated from the original language)

This paper develops an improved software defect prediction algorithm based on a combination of a Kohonen map and hierarchical clustering. Previously, various classification methods were used to build software defect prediction models, ranging from simple ones, such as logistic regression, to modern methods, such as multivariate adaptive regression splines. However, the available literature still does not allow an unambiguous conclusion about the choice of the best classifier, and attempts along different dimensions have been proposed to overcome potential biases. The aim of the article is to analyze program code metrics in order to identify dependencies between a software module's defect-proneness and its metrics. This study used JM1, a publicly available NASA dataset from the PROMISE Software Engineering Repository.
Keywords: software defect analysis, prediction, hierarchical clustering, Kohonen maps.

Extended abstract in English

In this work, an improved software defect prediction algorithm based on a combination of the Kohonen map and hierarchical clustering is developed. The algorithm uses a gradually decreasing learning rate, so that each new epoch makes finer adjustments. As a result, each center settles in a position that satisfactorily represents the examples for which its neuron is the winner. Topological ordering is achieved through the concept of a neighborhood, which here is obtained from agglomerative hierarchical clustering. The proposed algorithm shows higher accuracy than other classification algorithms. Pre-processing improves the quality of the analysis by dividing all data into two clusters. This study used the JM1 software defect prediction dataset from the public PROMISE software engineering repository; the dataset originates from the NASA Metrics Data Program.
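The idea of deriving groups of neurons from agglomerative hierarchical clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the average-linkage criterion, the function names `euclidean` and `agglomerative`, and the sample weight vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Cluster distance is average linkage (an assumption; the abstract
    does not state which linkage is used).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical neuron weight vectors (not taken from the paper).
neuron_weights = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]]
groups = agglomerative(neuron_weights, n_clusters=2)
print(groups)
```

Under these assumptions, neurons that end up in the same group could be treated as neighbors during map training; two groups are requested here only to mirror the two-cluster split mentioned in the abstract.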
Previously, various classification methods were used to build software defect prediction models, from simple ones, such as logistic regression, to modern methods, such as multivariate adaptive regression splines. However, the available literature still does not allow an unambiguous conclusion about the choice of the best classifier, and attempts along different dimensions have been proposed to overcome potential biases. The aim of the article is to analyze program code metrics in order to identify relationships between a software module's susceptibility to defects and its metrics.
The task of classification is to assign an object to one of the predefined classes based on its formalized characteristics. Each classified object is represented as an N-dimensional vector, each dimension of which corresponds to one feature of the object. The binary classification model should evaluate the impact of each parameter or group of parameters on the classification of defects. The analysis consists of two phases: in the first, all parameters are considered together; in the second, the influence of each parameter is studied.
We use an ensemble of two models: the Kohonen map and hierarchical clustering. The heat map shows the weights of the attributes and their grouping, as well as the clustered data. The Kohonen map uses unsupervised learning, and the training set consists only of the values of the input variables. The Kohonen map is trained by successive approximation. Starting from randomly selected initial positions of the centers, the algorithm gradually adjusts them to reflect the structure of the training data. Kohonen's basic iterative algorithm passes through a number of epochs, processing one training example in each epoch. The input signals are fed to the network sequentially, and no desired output signals are specified. After a sufficient number of input vectors have been processed, the synaptic weights of the network define the clusters. In addition, the weights are organized so that topologically close nodes are sensitive to similar input signals.
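The training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: neurons are placed on a one-dimensional grid with a Gaussian neighborhood (the paper instead derives the neighborhood from hierarchical clustering), the learning rate decays linearly, and the function names, parameters, and sample data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))


def train_som(data, n_neurons, epochs, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D Kohonen map; one training example is processed per epoch."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Randomly selected initial positions of the centers.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # gradually decreasing learning rate
        radius = max(radius0 * (1.0 - frac), 1e-3)  # shrinking neighborhood radius
        x = data[rng.randrange(len(data))]          # no desired output is used
        winner = best_matching_unit(weights, x)
        for k in range(n_neurons):
            # Gaussian neighborhood on the 1-D grid (an assumption of this sketch).
            h = math.exp(-((k - winner) ** 2) / (2.0 * radius ** 2))
            weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
    return weights


# Hypothetical normalized metric vectors (not JM1 data).
samples = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
           [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
som = train_som(samples, n_neurons=4, epochs=200)
print([best_matching_unit(som, s) for s in samples])
```

Because each update moves a weight vector part of the way toward the presented example, the centers stay within the range of the data and drift toward regions where examples concentrate.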
Keywords: software defects analysis, prediction, hierarchical clustering, Kohonen maps.
