Kannada morpheme segmentation using machine learning

Sachi Angle; B Ashwath Rao; S N. Muralikrishna

doi:10.14419/ijet.v7i2.31.13395


2018-05-29

https://doi.org/10.14419/ijet.v7i2.31.13395
This paper addresses morpheme segmentation of Kannada words using supervised classification. We use a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in morphological structure. It is an agglutinative language, so its words have complex morphological forms, with each word comprising a root and an optional set of suffixes. These suffixes carry additional meaning beyond that of the root word in a given context. This paper discusses the extraction of the morphemes of a word using Support Vector Machines for classification. Additional features representing the properties of Kannada words were extracted, and the individual letters were classified into labels that yield the morphological segmentation of the word. Various evaluation methods were considered, and an accuracy of 85.97% was achieved.
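The letter-level formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the label set or the features, so the two-label scheme (`B` for a letter that begins a morpheme, `I` for any other letter), the context-window features, the helper names, and the romanized example word with its split are all assumptions made for the example.

```python
from typing import Dict, List


def encode_labels(morphemes: List[str]) -> List[str]:
    """Label each letter 'B' if it begins a morpheme, otherwise 'I' (assumed scheme)."""
    labels: List[str] = []
    for morpheme in morphemes:
        if not morpheme:
            continue  # ignore empty segments
        labels.extend(["B"] + ["I"] * (len(morpheme) - 1))
    return labels


def decode_labels(word: str, labels: List[str]) -> List[str]:
    """Rebuild the morphemes of a word from its per-letter labels."""
    morphemes: List[str] = []
    for letter, label in zip(word, labels):
        if label == "B" or not morphemes:
            morphemes.append(letter)  # start a new morpheme
        else:
            morphemes[-1] += letter   # extend the current morpheme
    return morphemes


def letter_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical features for letter i: neighbouring letters plus coarse position."""
    padded = "#" * window + word + "#" * window  # '#' pads beyond the word edges
    centre = i + window
    feats = {f"c{off:+d}": padded[centre + off] for off in range(-window, window + 1)}
    feats["position"] = "start" if i == 0 else "end" if i == len(word) - 1 else "middle"
    return feats


# Illustrative romanized word and split; not taken from the paper's corpus.
morphemes = ["mane", "yalli"]
word = "".join(morphemes)
labels = encode_labels(morphemes)
features = [letter_features(word, i) for i in range(len(word))]

print(labels)
print(decode_labels(word, labels))
```

In a setup like this, each feature dictionary would be vectorized and paired with its label to train a classifier such as a Support Vector Machine; at prediction time, the predicted labels are decoded back into morphemes with `decode_labels`.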
How to Cite
Angle, S., Ashwath Rao, B., & Muralikrishna, S. N. (2018). Kannada morpheme segmentation using machine learning. International Journal of Engineering & Technology, 7(2.31), 45-49. https://doi.org/10.14419/ijet.v7i2.31.13395