Prediction of Breast Cancer Using Big Data Analytics

  • Authors

    • K. Shailaja
    • B. Seetharamulu
    • M. A. Jabbar
    2018-09-25
    https://doi.org/10.14419/ijet.v7i4.6.20480
  • Big data, Healthcare, Breast cancer, KNN, Wisconsin dataset.
  • Big data is a term for collections of data that are vast in size and still growing exponentially over time. It covers structured, unstructured, and semi-structured data. Nowadays, big data is widely used in healthcare for disease prediction. Breast cancer is one of the most common cancers in women. It is the second leading cause of death among women in the United States and in Asian countries. If the disease is identified at an early stage, there is a better chance of curing it. In this experiment, we used the k-nearest neighbor (KNN) algorithm to measure classification accuracy, implemented in R. We used the Wisconsin Breast Cancer (Original) dataset from the UCI Machine Learning Repository.
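
    The abstract describes classifying records with KNN and reporting classification accuracy. The sketch below illustrates that general technique only; it is not the authors' implementation, which was written in R and is not shown on this page. It is written in Python, the function names `knn_predict` and `accuracy` are hypothetical, Euclidean distance and k = 3 are assumptions, and the toy points are invented for illustration rather than taken from the Wisconsin dataset.

    ```python
    import math
    from collections import Counter


    def knn_predict(train_X, train_y, query, k=3):
        """Return the majority label among the k training points nearest to query."""
        # Euclidean distance from the query to every training point, nearest first
        distances = sorted(
            (math.dist(x, query), label) for x, label in zip(train_X, train_y)
        )
        # Majority vote over the labels of the k nearest neighbors
        votes = Counter(label for _, label in distances[:k])
        return votes.most_common(1)[0][0]


    def accuracy(train_X, train_y, test_X, test_y, k=3):
        """Fraction of test points whose predicted label matches the true label."""
        correct = sum(
            knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
        )
        return correct / len(test_y)


    # Toy data invented for illustration only (NOT records from the Wisconsin dataset)
    train_X = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
    train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
    test_X = [(2, 2), (8, 8)]
    test_y = ["benign", "malignant"]

    print(accuracy(train_X, train_y, test_X, test_y, k=3))
    ```

    On real data, features are usually scaled before computing distances, and k is typically chosen by validation rather than fixed in advance.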

     

     


  • How to Cite

    Shailaja, K., Seetharamulu, B., & Jabbar, M. A. (2018). Prediction of Breast Cancer Using Big Data Analytics. International Journal of Engineering & Technology, 7(4.6), 223-226. https://doi.org/10.14419/ijet.v7i4.6.20480