Analyzing performance of classifiers for medical datasets

  • Abstract

    This paper analyses the performance of single classifiers and of ensemble combinations of classifiers on two medical data sets, Breast Cancer Wisconsin and Hepatitis, which serve as the training data. The classifiers are compared using 10-fold cross-validation in a data mining tool. Various classifiers are implemented, together with three popular ensemble methods for combining them: boosting, bagging and stacking. For the Breast Cancer Wisconsin data set, the single Naïve Bayes (NB) classifier and the bagging+NB combination achieved the same accuracy (97.51%), the highest among the classifiers and ensemble combinations compared. For the Hepatitis data set, the stacking+Multilayer Perceptron (MLP) combination achieved the highest accuracy, at 86.25%. These results suggest that ensemble classifiers may improve classification accuracy. In future work, a multi-classifier approach will be proposed that fuses these classifiers at the classification level to obtain higher accuracies.
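
    A minimal sketch of the evaluation protocol summarised above, written in standard-library Python: 10-fold cross-validation comparing a single Naive Bayes classifier with a bagging+NB ensemble. It rests on several assumptions. The paper does not name its data mining tool or its NB variant, so a Gaussian NB is used here. The data are hypothetical synthetic two-class points from `make_synthetic`, not the Breast Cancer Wisconsin or Hepatitis data, so the sketch does not reproduce the reported accuracies. All function names are illustrative, and boosting and stacking are not shown.

```python
import math
import random
from collections import Counter
from statistics import mean


def fit_gaussian_nb(X, y):
    """Fit Gaussian Naive Bayes: class prior plus per-feature mean and variance."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))  # one tuple per feature
        means = [mean(col) for col in cols]
        # Variance floor avoids division by zero on constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior + Gaussian log-likelihood."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) - sum(
            0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def fit_bagging_nb(X, y, n_estimators=10, seed=0):
    """Bagging: fit one NB model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        models.append(fit_gaussian_nb([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging_nb(models, x):
    """Combine the bagged models by majority vote."""
    votes = Counter(predict_gaussian_nb(m, x) for m in models)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, fit, predict, k=10, seed=0):
    """Plain k-fold CV: each sample is tested once by a model trained without it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for i in range(k):
        test = idx[i::k]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        model = fit([X[j] for j in train], [y[j] for j in train])
        correct += sum(predict(model, X[j]) == y[j] for j in test)
    return correct / len(y)  # accuracy pooled over all folds


def make_synthetic(n_per_class=100, seed=42):
    """Hypothetical two-class data: 2-D Gaussian clouds, NOT a medical data set."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 2.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    nb = k_fold_accuracy(X, y, fit_gaussian_nb, predict_gaussian_nb)
    bag = k_fold_accuracy(X, y, fit_bagging_nb, predict_bagging_nb)
    print(f"NB: {nb:.2%}   bagging+NB: {bag:.2%}")
```

    Accuracy here is pooled over the folds, so every sample is tested exactly once. When the folds are equal in size, this equals the mean per-fold accuracy. Bagging mainly reduces variance, so with a stable learner such as NB it often matches the single model rather than beating it. This may help explain why NB and bagging+NB reach the same accuracy in the results above.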

  • Keywords

    Classification model; Data mining; Medical dataset.

  • References

      [1] Abraham R, Simha JB & Iyengar SS (2007), Medical datamining with a new algorithm for Feature Selection and Naïve Bayesian classifier. Proceedings of the IEEE 10th International Conference on Information Technology, pp. 44–49.

      [2] Al-Aidaroos KM, Bakar AA & Othman Z (2012), Medical Data Classification with Naive Bayes Approach. Information Technology Journal 11, 1166–1174.

      [3] Areerachakul S & Sanguansintukul S (2010), Classification and regression trees and MLP neural network to classify water quality of canals in Bangkok, Thailand. International Journal of Intelligent Computing Research 1, 43–50.

      [4] Bache K & Lichman M (2013), UCI machine learning repository, University of California.

      [5] Bhuvaneswari E & Dhulipala VS (2013), The study and analysis of classification algorithm for animal kingdom dataset. Information Engineering 2, 412–414.

      [6] Blake CL & Merz CJ (1998), UCI repository of machine learning databases, University of California.

      [7] EL-Bohy AM, Hashad AI & Taha HS (2015), Performance evaluation of hepatitis diagnosis using single and multi-classifiers fusion. International Journal of Engineering Research and Technology 4, 293–298.

      [8] Hickey SJ (2013), Naive Bayes classification of public health data with greedy feature selection. Communications of the IIMA 13, 87–98.

      [9] Nandhini M & Scholar PD (2016), Boosting and meta-learning techniques for distributed data mining on electronic medical datasets. International Journal of Computer Technology and Applications 7, 403–410.

      [10] Rosly R, Makhtar M, Awang MK, Rahman MN & Deris MM (2016), Multi-classifier models to improve accuracy of water quality application. ARPN Journal of Engineering and Applied Sciences 11, 3208–3211.

      [11] Salama GI, Abdelhalim M & Zeid MA (2012), Breast cancer diagnosis on three different datasets using multi-classifiers. International Journal of Computer and Information Technology 1, 36–43.

Article ID: 11370
DOI: 10.14419/ijet.v7i2.15.11370

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.