A New Diversity Technique for Imbalance Learning Ensembles


  • Hartono
  • Opim Salim Sitompul
  • Erna Budhiarti Nababan
  • Tulus
  • Dahlan Abdullah
  • Ansari Saleh Ahmar




Class Imbalance, Classifier Ensembles, Data Diversity, Hybrid Approach Redefinition


Data mining and machine learning techniques designed to solve classification problems generally assume a balanced class distribution. In practice, however, many datasets contain a class represented by a large number of instances alongside classes with far fewer instances. This is known as the class imbalance problem. Classifier ensembles are a method often used to overcome class imbalance, and data diversity is one of the cornerstones of ensembles: an ideal ensemble consists of accurate individual classifiers whose errors, when they occur, fall on different objects or instances. This research presents an overview and an experimental study of the Hybrid Approach Redefinition (HAR) method for handling class imbalance while also obtaining better data diversity. The experiments use 6 datasets with different imbalance ratios, and HAR is compared with SMOTEBoost, a re-weighting method often used to handle class imbalance. The study shows that data diversity is related to performance in imbalance learning ensembles and that the proposed method can obtain better data diversity.
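The abstract ties ensemble performance to diversity (classifiers that err on different instances) but does not state how diversity is measured. As an illustration only, the sketch below assumes Yule's Q-statistic, a standard pairwise diversity measure computed from each classifier's per-instance correct/incorrect (oracle) outputs and averaged over all classifier pairs; it is not presented as the measure used with HAR, and the function names are hypothetical. For classifiers i and k, let N11 be the number of instances both classify correctly, N00 the number both get wrong, and N10, N01 the numbers where only one is correct. Then Q = (N11·N00 - N01·N10) / (N11·N00 + N01·N10). Q is 0 for statistically independent classifiers, and lower values indicate more diversity.

```python
import math
from itertools import combinations
from typing import Sequence


def q_statistic(correct_i: Sequence[int], correct_k: Sequence[int]) -> float:
    """Yule's Q for one classifier pair (hypothetical helper).

    Each sequence holds oracle outputs on the same instances:
    1 (or True) if the classifier labelled the instance correctly, else 0.
    """
    if len(correct_i) != len(correct_k):
        raise ValueError("classifiers must be scored on the same instances")
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_k):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only classifier i correct
        else:
            n01 += 1  # only classifier k correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return math.nan  # numerator is also 0 here, so Q is undefined (0/0)
    return (n11 * n00 - n01 * n10) / denom


def average_q(oracle: Sequence[Sequence[int]]) -> float:
    """Mean pairwise Q over an ensemble; lower values mean more diversity.

    Pairs with an undefined Q are skipped (an assumption of this sketch).
    """
    values = [q_statistic(a, b) for a, b in combinations(oracle, 2)]
    defined = [q for q in values if not math.isnan(q)]
    return sum(defined) / len(defined) if defined else math.nan


# Hypothetical oracle outputs for three classifiers on eight instances.
ensemble = [
    [1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1],  # identical to the first classifier
]
print(average_q(ensemble))  # prints -0.3333333333333333
```

In this toy ensemble the first and third classifiers are right and wrong on exactly the same instances, so their pair scores Q = 1 (no diversity), while each of them makes its errors on instances the second classifier gets right, giving Q = -1 for those pairs. Q is undefined when its denominator is zero, for example when both classifiers are correct on every instance, which is why the sketch returns NaN in that case.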



