Evaluation of Named Entity Recognition Algorithms Using Clinical Text Data

  • Authors

    • J. Manimaran
    • T. Velmurugan
    2018-09-22
    https://doi.org/10.14419/ijet.v7i4.5.20093
  • Keywords: Natural Language Processing, Text Mining, Information Extraction, Medical Text Data.
  • Named Entity Recognition (NER) is one of the most important research areas in the medical field. At present, most clinical NER research follows one of two approaches: Knowledge Engineering (KE) and Machine Learning (ML). KE uses a word lookup table, whereas ML relies on supervised learning. The aim of this work is to evaluate a recent algorithm from each approach on several clinical text datasets. The NOBLE Coder and Clinical Named Entity Recognition (CliNER) algorithms are selected for this purpose: NOBLE Coder is based on the KE approach and CliNER on the ML approach. The two algorithms are described, and their performance is compared on three openly available datasets obtained from Medical Information Mart for Intensive Care II (MIMIC II), Pittsburgh Medical Center, and the i2b2 2010 challenge. These datasets include annotated data, which is used to determine the highest sensitivity and specificity of each algorithm. Randomly distributed patient reports are given as input to the algorithms. Each algorithm extracts information and classifies it into predefined concept types, for example medical problems, treatments, and tests. The accuracy of both algorithms is calculated using standard measures, the two algorithms are analyzed on the basis of their results, and the better of the two is suggested for use on clinical data.
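To make the lookup-table idea and the evaluation step concrete, here is a minimal sketch in Python. It is not the NOBLE Coder or CliNER implementation. The vocabulary, the example sentence, and the gold annotations are invented for illustration, and the abstract does not name its "standard measures", so exact-match precision, recall (sensitivity), and F1 are assumed here.

```python
from typing import Dict, List, Set, Tuple

# An entity is (start_token, end_token_exclusive, concept_type).
Entity = Tuple[int, int, str]

# Hypothetical toy vocabulary (assumption): a real KE system would load a
# full terminology rather than three hand-written terms.
LOOKUP: Dict[Tuple[str, ...], str] = {
    ("chest", "pain"): "problem",
    ("aspirin",): "treatment",
    ("ecg",): "test",
}


def lookup_tag(tokens: List[str]) -> Set[Entity]:
    """Greedy longest-match lookup of vocabulary terms in a token list."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(term) for term in LOOKUP)
    found: Set[Entity] = set()
    i = 0
    while i < len(lowered):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            label = LOOKUP.get(tuple(lowered[i:i + n]))
            if label is not None:
                found.add((i, i + n, label))
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return found


def precision_recall_f1(gold: Set[Entity],
                        predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scores: span boundaries and concept type must both agree."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0  # recall is also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example sentence and gold annotations (assumption).
tokens = "Patient reports chest pain ; given aspirin and a repeat ECG".split()
gold = {(2, 4, "problem"), (6, 7, "treatment"), (9, 11, "test")}

predicted = lookup_tag(tokens)
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(sorted(predicted))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the lookup finds "ECG" but the gold span is "repeat ECG", so exact matching counts it as both a false positive and a false negative. A relaxed (overlap-based) match would score it differently, which is one reason the choice of measure matters when comparing a KE tool against an ML tool.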

     



  • How to Cite

    Manimaran, J., & Velmurugan, T. (2018). Evaluation of Named Entity Recognition Algorithms Using Clinical Text Data. International Journal of Engineering & Technology, 7(4.5), 295-302. https://doi.org/10.14419/ijet.v7i4.5.20093