15th International Conference on Computer and Knowledge Engineering
Impact of Oversampling Methods on Imbalanced Dataset for Software Fault Prediction
Authors:
Alireza Abiri (1)
Alireza Tajary (2)
Mansoor Fateh (3)
1- Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
2- Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
3- Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
Keywords:
Software Fault Prediction, Imbalanced Data, Machine Learning, GAN, BugHunter Dataset
Abstract:
As software systems grow in scale, both in data volume and in number of users, software faults have become inevitable. Software fault prediction has therefore become important for identifying faulty modules early in the development process. A key challenge in this domain is class imbalance: the numbers of faulty and non-faulty instances in software datasets are highly unequal. Oversampling techniques are commonly used to balance such datasets. In this study, we compare the performance of three oversampling methods on the BugHunter software fault dataset. The results indicate that generating synthetic data with Generative Adversarial Networks (GANs) addresses class imbalance more effectively than the other methods evaluated.
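To make the idea of oversampling concrete, the following is a minimal sketch of plain random oversampling, which duplicates minority-class rows until the classes are the same size. It is an illustration only: the abstract does not state which non-GAN methods were compared, so this is not presented as one of the authors' methods. The function name `random_oversample` and the toy metric values are hypothetical and are not taken from the BugHunter dataset.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Append random duplicates until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (made up): 6 non-faulty (0) and 2 faulty (1) modules,
# each described by two illustrative metrics.
rows = [(10, 1), (12, 2), (8, 1), (15, 3), (9, 1), (11, 2), (40, 9), (55, 12)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Random duplication adds no new information about the faulty class. Generative approaches such as the GANs mentioned in the abstract instead synthesize new minority-class instances.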