Random Forest and Novel Under-Sampling Strategy for Data Imbalance in Software Defect Prediction

  • Authors

    • Utomo Pujianto
    Published: 2018-10-07
    DOI: https://doi.org/10.14419/ijet.v7i4.15.21368
  • Keywords: Data imbalance, Random forests, Software defect prediction, Under-sampling.
  • Abstract: Data imbalance is one of the characteristics of software quality data sets that can degrade the performance of software defect prediction models. This study proposes an alternative to the random under-sampling strategy: only the subset of non-defective instances with the largest distance to the centroid of the defective instances is retained. Combined with random forest classification, the proposed method outperformed both random under-sampling and no sampling in terms of accuracy, AUC, F-measure, and true positive rate. A minimal sketch of this selection step is given at the end of this page.

  • How to Cite

    Pujianto, U. (2018). Random Forest and Novel Under-Sampling Strategy for Data Imbalance in Software Defect Prediction. International Journal of Engineering & Technology, 7(4.15), 39-42. https://doi.org/10.14419/ijet.v7i4.15.21368
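
  • Illustrative sketch

    The abstract does not specify the distance metric, the number of non-defective instances retained, or the implementation used, so the sketch below is only a minimal Python illustration of the described idea. It assumes Euclidean distance to the centroid of the defective instances and a 1:1 class ratio after sampling; the names centroid_distance_undersample, X_train, and y_train are hypothetical, not from the paper.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def centroid_distance_undersample(X, y, defective_label=1):
        """Keep only the non-defective instances farthest from the centroid of
        the defective instances (a 1:1 ratio after sampling is an assumption)."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        def_idx = np.flatnonzero(y == defective_label)        # defective (minority) rows
        non_idx = np.flatnonzero(y != defective_label)        # non-defective (majority) rows
        centroid = X[def_idx].mean(axis=0)                    # centroid of defective data
        dist = np.linalg.norm(X[non_idx] - centroid, axis=1)  # Euclidean distance (assumption)
        keep = non_idx[np.argsort(dist)[::-1][:len(def_idx)]] # largest distances first
        sel = np.concatenate([def_idx, keep])
        return X[sel], y[sel]

    # Usage sketch: X_train and y_train stand for software-metric features and
    # defect labels (1 = defective); balance the training set, then fit the
    # random forest classifier that the study pairs with the sampling step.
    # X_bal, y_bal = centroid_distance_undersample(X_train, y_train)
    # model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
    ```

    Keeping the farthest non-defective instances, rather than a random subset, presumably retains majority examples that are well separated from the defective region; the performance gains reported in the abstract refer to this selection step combined with random forest classification.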