Kannada morpheme segmentation using machine learning

  • Authors

    • Sachi Angle
    • B Ashwath Rao
    • S N. Muralikrishna
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13395
  • This paper addresses morpheme segmentation of Kannada words using supervised classification. We use a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in morphological structure. It is an agglutinative language, so its words have complex morphological forms, each comprising a root and an optional set of suffixes. These suffixes contribute meaning beyond that of the root word in context. We extract the morphemes of a word using Support Vector Machines for classification. Features representing properties of Kannada words are extracted, and the individual letters are classified into labels that together yield the morphological segmentation of the word. Several evaluation methods were considered, and an accuracy of 85.97% was achieved.
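
The letter-level formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-label scheme (`B` for a letter that begins a morpheme, `I` for a letter inside one), the character-window features, the function names, and the romanized toy word with its segmentation are all assumptions made for illustration, since the abstract does not specify them. The sketch shows how a segmented word maps to per-letter labels, what per-letter features might look like, and how predicted labels decode back into morphemes.

```python
from typing import Dict, List


def labels_from_segments(segments: List[str]) -> List[str]:
    """Label each letter 'B' if it starts a morpheme, otherwise 'I'.

    The B/I scheme is an assumption; the paper's label set may differ.
    """
    labels: List[str] = []
    for seg in segments:
        if not seg:  # skip empty segments defensively
            continue
        labels.extend(["B"] + ["I"] * (len(seg) - 1))
    return labels


def window_features(word: str, i: int, size: int = 2) -> Dict[str, str]:
    """Hypothetical features for letter i: neighbouring letters within a
    window, plus the distance from the end of the word (suffixes attach
    at the end in an agglutinative language)."""
    feats = {"pos_from_end": str(len(word) - 1 - i)}
    for off in range(-size, size + 1):
        j = i + off
        feats[f"ch[{off}]"] = word[j] if 0 <= j < len(word) else "<pad>"
    return feats


def segment(word: str, labels: List[str]) -> List[str]:
    """Decode per-letter labels back into a list of morphemes."""
    segments: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not segments:
            segments.append(ch)  # start a new morpheme
        else:
            segments[-1] += ch   # extend the current morpheme
    return segments


# Romanized toy example; the segmentation is illustrative only.
word = "manegalu"
gold = ["mane", "galu"]
labels = labels_from_segments(gold)
print(labels)
print(window_features(word, 4))
print(segment(word, labels))
```

In a full pipeline, the feature dictionaries would be turned into numeric vectors and passed to a Support Vector Machine classifier trained on the annotated corpus (for example, scikit-learn's `DictVectorizer` with `LinearSVC`, which is one possible choice rather than the setup reported in the paper); the predicted labels would then be decoded with `segment`.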

  • How to Cite

    Angle, S., Ashwath Rao, B., & N. Muralikrishna, S. (2018). Kannada morpheme segmentation using machine learning. International Journal of Engineering & Technology, 7(2.31), 45-49. https://doi.org/10.14419/ijet.v7i2.31.13395