Data-Driven Machine Learning-Based Prediction and Performance Analysis of Software Defects for Quality Assurance

Dinesh Rajendran, Aniruddha Arjun Singh Singh, Vaibhav Maniar, Vetrivelan Tamilmani, Rami Reddy Kothamaram, Venkata Deepak Namburi

Citation: Dinesh Rajendran, Aniruddha Arjun Singh Singh, Vaibhav Maniar, Vetrivelan Tamilmani, Rami Reddy Kothamaram, Venkata Deepak Namburi, "Data-Driven Machine Learning-Based Prediction and Performance Analysis of Software Defects for Quality Assurance", Universal Library of Engineering Technology, Special Issue.

Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Software defect prediction is now an indispensable part of modern quality assurance, making it possible to identify modules that are likely to fail and to improve system reliability. This study presents a machine learning (ML) framework that uses the NASA JM1 dataset (10,885 records, 22 features) to build an effective prediction model. Preprocessing handles missing values and outliers and normalizes the features to keep the data consistent. Adaptive Sequential K-Best (ASKB) is applied to select the most relevant features and improve model performance, and Minority Oversampling by Synthetic Data (MOSD) is employed to balance the classes by generating realistic synthetic defect-prone samples. A Random Forest (RF) classifier is used because of its ability to handle high-dimensional, complex data. The model is evaluated with Accuracy, Recall, Precision, and F1-score, achieving 98.1%, 98.7%, 97.8%, and 98.2%, respectively. These results indicate that the proposed framework produces reliable predictions. A comparative analysis shows that Random Forest achieves a better trade-off between accuracy and generalization than Naive Bayes, Neural Networks, and SVM. The study contributes to the advancement of defect prediction practice by providing a scalable and explainable solution that supports proactive quality management. Further research is underway to extend this framework to hybrid and deep learning (DL) models, broadening its applicability.
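The abstract names several data-preparation and evaluation stages without giving their details. The following is a minimal, dependency-free sketch of how such stages could look, under stated assumptions: median imputation and IQR (Tukey-fence) clipping are assumed choices for the missing-value and outlier steps, min-max scaling stands in for "normalization", and a SMOTE-style interpolation is swapped in for MOSD, whose exact procedure may differ. ASKB is the authors' own feature-selection method and is not reproduced. The Random Forest itself is omitted to keep the sketch stdlib-only; in practice one might fit scikit-learn's `RandomForestClassifier` on the resampled training split. All function names here are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def impute_median(rows: List[List[Optional[float]]]) -> List[Row]:
    """Fill missing values (None) with the median of the observed values in that column."""
    n_cols = len(rows[0])
    medians = [statistics.median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def clip_iqr(rows: List[Row], k: float = 1.5) -> List[Row]:
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences) instead of dropping outliers."""
    bounds = []
    for j in range(len(rows[0])):
        q1, _, q3 = statistics.quantiles([r[j] for r in rows], n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [[min(max(v, lo), hi) for v, (lo, hi) in zip(r, bounds)] for r in rows]


def min_max(rows: List[Row]) -> List[Row]:
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in rows
    ]


def smote_like(minority: List[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """SMOTE-style oversampling (assumed stand-in for MOSD): interpolate between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for idx, m in enumerate(minority) if idx != i]
        # Sort the remaining minority samples by squared Euclidean distance to base.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, f1), treating label 1 as defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Oversampling is normally applied to the training split only, so that synthetic samples do not leak into the test set and inflate the reported metrics. As a consistency check on the reported figures, precision 0.978 and recall 0.987 give F1 = 2PR/(P+R) ≈ 0.982, matching the reported F1-score of 98.2%.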

Keywords: Software Defect Prediction, Machine Learning, Random Forest, MOSD, Adaptive Sequential K-Best (ASKB), Software Quality Assurance, Predictive Analytics.

https://doi.org/10.70315/uloap.ulete.2022.008

Universal Library Open Access Publications LLC
