A New Diversity Technique for Imbalance Learning Ensembles
Keywords: Class Imbalance, Classifier Ensembles, Data Diversity, Hybrid Approach Redefinition
Data mining and machine learning techniques designed to solve classification problems generally assume a balanced class distribution. In practice, however, many datasets contain one class represented by a large number of instances while other classes have far fewer instances. This is known as the class imbalance problem. Classifier ensembles are often used to overcome class imbalance, and data diversity is one of the cornerstones of ensembles: an ideal ensemble consists of accurate individual classifiers whose errors, when they do occur, fall on different objects or instances. This research presents the results of an overview and an experimental study of the Hybrid Approach Redefinition (HAR) method for handling class imbalance, which is also expected to achieve better data diversity. The study uses 6 datasets with different imbalance ratios and compares HAR with SMOTEBoost, one of the re-weighting methods commonly used to handle class imbalance. The results show that data diversity is related to performance in imbalance learning ensembles and that the proposed method can obtain better data diversity.
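The abstract does not define the imbalance ratio or say how data diversity is measured. As a minimal sketch, assuming the imbalance ratio is the majority-class size divided by the minority-class size, and assuming two common pairwise diversity measures (Yule's Q statistic and the disagreement measure), the code below computes both quantities from each ensemble member's per-instance correctness. It is not the HAR method, and all function names are hypothetical. Lower average Q and higher average disagreement both indicate that members tend to err on different instances.

```python
import math
from collections import Counter
from itertools import combinations


def imbalance_ratio(labels):
    """Assumed IR definition: size of the largest class / size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def oracle_output(predictions, labels):
    """Per-instance correctness of one classifier (True = correctly classified)."""
    return [p == y for p, y in zip(predictions, labels)]


def _pair_counts(correct_i, correct_j):
    """Contingency counts N11, N00, N10, N01 for two oracle outputs."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only classifier i is correct
        else:
            n01 += 1  # only classifier j is correct
    return n11, n00, n10, n01


def q_statistic(correct_i, correct_j):
    """Yule's Q in [-1, 1]; lower values mean the pair errs on different instances."""
    n11, n00, n10, n01 = _pair_counts(correct_i, correct_j)
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return math.nan  # Q is undefined for this pair (e.g. neither classifier ever errs)
    return (n11 * n00 - n01 * n10) / denom


def disagreement(correct_i, correct_j):
    """Fraction of instances on which exactly one of the two classifiers is correct."""
    n11, n00, n10, n01 = _pair_counts(correct_i, correct_j)
    return (n10 + n01) / (n11 + n00 + n10 + n01)


def ensemble_diversity(oracle_outputs):
    """Average pairwise Q (undefined pairs skipped) and average pairwise disagreement."""
    pairs = list(combinations(oracle_outputs, 2))
    qs = [q for q in (q_statistic(a, b) for a, b in pairs) if not math.isnan(q)]
    avg_q = sum(qs) / len(qs) if qs else math.nan
    avg_d = sum(disagreement(a, b) for a, b in pairs) / len(pairs)
    return avg_q, avg_d


if __name__ == "__main__":
    y = [0, 0, 0, 0, 0, 0, 1, 1]  # toy labels: 6 majority, 2 minority instances
    members = [
        [0, 0, 0, 0, 0, 1, 1, 0],
        [0, 0, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 1, 1, 0],  # identical to the first member
    ]
    oracles = [oracle_output(p, y) for p in members]
    print(imbalance_ratio(y))            # 3.0
    print(ensemble_diversity(oracles))   # (-0.3333333333333333, 0.25)
```

In the toy run, the first and second members err on different instances (Q = -1), while the identical first and third members give Q = 1 and zero disagreement, which pulls the ensemble averages toward lower diversity.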
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).