Part of Speech Tagging for Arabic Long Sentence

 
 
 
  • Abstract


    Part-of-speech (POS) tagging of Arabic words is a difficult and non-trivial task. It has been studied in detail over the last twenty years, and its performance affects many applications and tasks in natural language processing (NLP). Arabic sentences are typically much longer than English sentences. This affects the tagging process for any approach that handles a complete sentence at once, such as a Hidden Markov Model (HMM) tagger. In this paper, a new approach is suggested for using HMM and n-gram taggers to tag Arabic words in long sentences. The suggested approach is simple and easy to implement. It was evaluated on a data set of 1000 documents containing 526,321 manually annotated tokens (including punctuation). The results show that the suggested approach achieves higher accuracy than the HMM and n-gram taggers alone. The F-measures were 0.888, 0.925 and 0.957 for the n-gram tagger, the HMM tagger and the suggested approach, respectively.
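
    To illustrate what it means for an HMM tagger to handle a complete sentence at once, the following is a minimal sketch of Viterbi decoding for a bigram HMM. It is not the paper's implementation or its suggested approach: the tag set, the transliterated toy tokens, every probability, and the `unk` penalty for unseen words are hypothetical values chosen only for illustration.

```python
import math


def viterbi(tokens, tags, log_start, log_trans, log_emit, unk=-20.0):
    """Return the most probable tag sequence for `tokens` under a bigram HMM.

    All model tables hold log-probabilities. `unk` is an assumed
    log-probability for words a tag was never seen emitting.
    """
    if not tokens:
        return []
    # best[i][t]: best log-probability of any tag path ending in tag t at position i
    best = [{t: log_start[t] + log_emit[t].get(tokens[0], unk) for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit[t].get(tokens[i], unk)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


lg = math.log

# Hypothetical toy model (not estimated from any corpus).
TAGS = ["NOUN", "VERB", "PUNC"]
LOG_START = {"NOUN": lg(0.4), "VERB": lg(0.5), "PUNC": lg(0.1)}
LOG_TRANS = {
    "NOUN": {"NOUN": lg(0.4), "VERB": lg(0.2), "PUNC": lg(0.4)},
    "VERB": {"NOUN": lg(0.8), "VERB": lg(0.1), "PUNC": lg(0.1)},
    "PUNC": {"NOUN": lg(0.4), "VERB": lg(0.5), "PUNC": lg(0.1)},
}
LOG_EMIT = {
    "NOUN": {"alwaladu": lg(0.5), "aldarsa": lg(0.4), "kataba": lg(0.1)},
    "VERB": {"kataba": lg(0.9), "alwaladu": lg(0.1)},
    "PUNC": {".": lg(1.0)},
}

# Transliterated toy sentence, roughly "the boy wrote the lesson."
tokens = ["kataba", "alwaladu", "aldarsa", "."]
tagged = viterbi(tokens, TAGS, LOG_START, LOG_TRANS, LOG_EMIT)
print(list(zip(tokens, tagged)))
```

    Each position's score is built from the best path through all earlier positions, so the decoder operates over the whole token sequence of a sentence, which is the property the abstract singles out as a concern for long Arabic sentences.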





 


Article ID: 17671
 
DOI: 10.14419/ijet.v7i3.27.17671




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.