Study and analysis of feature selection problems and impact of bias in machine learning disease prediction models

  • Authors

    • Anil Kumar Prajapati Institute of Computer Science, Vikram University Ujjain MP, India https://orcid.org/0000-0001-9701-8423
    • Umesh Kumar Singh Institute of Computer Science, Vikram University Ujjain (MP)
    • Rekha Singh School of Computer Science & Information Technology, DAVV Indore (MP)
    • Arpita Shukla JNS Govt. PG College Shujalpur
    2024-05-27
    https://doi.org/10.14419/hn1n3h15
  • Classification; Health Care; Fuzzy Logic; Machine Learning.
  • Abstract

    Machine learning (ML), a branch of artificial intelligence, is now used in almost every field, and medicine is one of them. In medical science, ML techniques aim to improve patient care by collecting and analyzing patient data and by designing intelligent tools and devices for disease detection that draw on collective experience. ML detects patterns associated with specific diseases by analyzing large datasets drawn from patient records, including diabetes, blood pressure, and cholesterol measurements, imaging data such as X-rays, MRIs, and CT scans, and genomic information. ML algorithms compute over the primary symptoms of a disease, and the disease is identified on the basis of these computations, so a sufficient dataset and a sufficient set of features are required. What a model learns depends on the underlying features used to characterize the problem, and the fairness of an ML algorithm depends on which symptoms are selected to determine a disease. Feature selection is therefore an important task: too few features can make the model underfit, and too many can make it overfit. Incorrectly selected features can introduce bias into the model, which can greatly reduce its accuracy. If the bias of the model is not properly tuned, that is, if it is too high or too low, the predictions do not capture the underlying pattern. Diseases arise in different circumstances, and each disease has its own characteristics, so covering all the basic parameters of every disease is very difficult. If a basic attribute is missed, or an attribute unrelated to the disease is included, the result of the model may be affected. In this paper, the feature selection problem and the effect of bias are analyzed using the Support Vector Machine (SVM) and Logistic Regression (LR) algorithms.
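
    The effect of feature choice described in the abstract can be illustrated with a minimal sketch. It is not the authors' experiment: the data are synthetic (two informative features plus pure-noise features, all hypothetical), only logistic regression is shown (SVM is omitted for brevity), and the function names and hyperparameters are assumptions chosen for illustration. The sketch trains the same model on three feature subsets and reports training and test accuracy, so the gap between the two can be inspected for each subset.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_data(n_rows, n_noise, rng):
    """Synthetic data: features 0 and 1 decide the label; the rest are noise."""
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.gauss(0, 1) for _ in range(2 + n_noise)]
        rows.append(x)
        labels.append(1 if x[0] + x[1] > 0 else 0)
    return rows, labels

def select(rows, cols):
    # Keep only the chosen feature columns.
    return [[r[c] for c in cols] for r in rows]

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(model, X, y):
    # Fraction of rows whose thresholded prediction matches the label.
    w, b = model
    hits = sum(
        (sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) == bool(yi)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)

def compare_feature_sets(seed=0, n_train=60, n_test=300, n_noise=30):
    """Train on three feature subsets and report train/test accuracy."""
    rng = random.Random(seed)
    X_train, y_train = make_data(n_train, n_noise, rng)
    X_test, y_test = make_data(n_test, n_noise, rng)
    subsets = {
        "one informative": [0],                            # a needed feature is missing
        "both informative": [0, 1],                        # the relevant features only
        "informative + noise": list(range(2 + n_noise)),   # unrelated features added
    }
    results = {}
    for name, cols in subsets.items():
        model = fit_logistic(select(X_train, cols), y_train)
        results[name] = {
            "train": accuracy(model, select(X_train, cols), y_train),
            "test": accuracy(model, select(X_test, cols), y_test),
        }
    return results

if __name__ == "__main__":
    for name, acc in compare_feature_sets().items():
        print(f"{name:22s} train={acc['train']:.2f} test={acc['test']:.2f}")
```

    A subset that misses an informative feature limits both training and test accuracy, while a large gap between training and test accuracy on the noisy subset would indicate overfitting. The same comparison could be repeated with an SVM or on a real clinical dataset; the numbers from this synthetic setup should not be read as results of the paper.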

  • How to Cite

    Anil Kumar Prajapati, Umesh Kumar Singh, Rekha Singh, & Arpita Shukla. (2024). Study and analysis of feature selection problems and impact of bias in machine learning disease prediction models. International Journal of Engineering & Technology, 13(2), 182-188. https://doi.org/10.14419/hn1n3h15