Kannada morpheme segmentation using machine learning

  • Abstract

    This paper addresses morpheme segmentation of Kannada words using supervised classification. We use a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in morphological structure. It is an agglutinative language, so its words have a complex morphological form, each comprising a root and an optional set of suffixes. These suffixes carry meaning in addition to that of the root word in context. This paper discusses the extraction of the morphemes of a word using Support Vector Machines for classification. Additional features representing properties of Kannada words were extracted, and the individual letters were classified into labels that yield the morphological segmentation of the word. Various evaluation methods were considered, and an accuracy of 85.97% was achieved.
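
    The idea of classifying each letter into a label that yields a segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I label set, the context-window features, the function names, and the romanized example word are all assumptions made for illustration, and the SVM training step itself is omitted. The sketch only shows how a segmented word maps to per-letter labels, how per-letter labels map back to morphemes, and what a simple feature dictionary for one letter might look like.

```python
from typing import Dict, List


def encode_labels(morphemes: List[str]) -> List[str]:
    """Turn a segmented word into per-letter labels.

    Assumed label scheme (not taken from the paper):
    "B" marks the first letter of a morpheme, "I" any other letter.
    """
    labels: List[str] = []
    for morpheme in morphemes:
        if not morpheme:
            continue  # ignore empty segments
        labels.extend(["B"] + ["I"] * (len(morpheme) - 1))
    return labels


def decode_labels(word: str, labels: List[str]) -> List[str]:
    """Rebuild the morpheme segmentation from per-letter labels."""
    segments: List[str] = []
    for letter, label in zip(word, labels):
        if label == "B" or not segments:
            segments.append(letter)   # start a new morpheme
        else:
            segments[-1] += letter    # continue the current morpheme
    return segments


def letter_features(word: str, i: int, window: int = 2) -> Dict[str, object]:
    """Hypothetical features for the letter at position i.

    A classifier such as an SVM would be trained on vectors derived
    from dictionaries like this one; the feature set is an assumption.
    """
    feats: Dict[str, object] = {
        "letter": word[i],
        "pos_from_start": i,
        "pos_from_end": len(word) - 1 - i,
    }
    for off in range(1, window + 1):
        feats[f"prev{off}"] = word[i - off] if i - off >= 0 else "<s>"
        feats[f"next{off}"] = word[i + off] if i + off < len(word) else "</s>"
    return feats


# Illustrative romanized example: "maragalu" (trees) = mara + galu.
gold = ["mara", "galu"]
word = "".join(gold)
labels = encode_labels(gold)
segments = decode_labels(word, labels)
```

    In this framing, the segmentation task reduces to predicting one label per letter, and the predicted label sequence is decoded back into morphemes by `decode_labels`.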




Article ID: 13395
DOI: 10.14419/ijet.v7i2.31.13395

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.