VBS Stemmer: A vocabulary-based stemmer


Authors
- Hamed Zakeri Rad, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
- Sabrina Tiun, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
- Saidah Saad, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
How to Cite

Zakeri Rad, H., Tiun, S., & Saad, S. (2018). VBS Stemmer: A vocabulary-based stemmer. International Journal of Engineering and Technology, 7(2), 551-554. https://doi.org/10.14419/ijet.v7i2.9192

Received date: January 17, 2018

Accepted date: April 6, 2018

Published date: April 13, 2018
https://doi.org/10.14419/ijet.v7i2.9192
Keywords: English Suffix Removal, Information Retrieval, Stemming Algorithm, Suffix Removal, Vocabulary Based Stemmer
Abstract

Stemming is the process of reducing the morphological variants of a word to a common form. It is widely used in information retrieval and computational linguistics. In this paper, we introduce the Vocabulary Based Stemmer (VBS) as an alternative solution to the stemming problem for applications that rely on semantic relations between words, or that are dictionary based, and therefore need valid words as stems. The vocabulary part of the VBS Stemmer is generated from WordNet. To validate the VBS Stemmer, part of the "Cranfield 1400" test collection was used, and the results show significant improvements over previous stemmers.
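
The general idea of a vocabulary-validated stemmer can be illustrated with a minimal sketch. This is not the VBS algorithm itself, which the abstract does not specify; the toy vocabulary, the suffix rules, and the function name `vocabulary_stem` are all assumptions made for illustration. The sketch strips a suffix only when the result is a word in the vocabulary, so every stem it returns is either a valid word or the unchanged input.

```python
# Hypothetical toy vocabulary. The paper builds its vocabulary from WordNet;
# this small set only stands in for it.
VOCABULARY = {"connect", "relate", "study", "run", "happy"}

# Illustrative (suffix, replacement) rules, tried in order.
# These are assumptions, not the rule set used by the VBS Stemmer.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("ions", ""),
    ("ion", ""),
    ("ion", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("es", ""),
    ("s", ""),
]


def vocabulary_stem(word, vocabulary=VOCABULARY, rules=SUFFIX_RULES):
    """Return the first suffix-stripped form found in the vocabulary.

    If the word is already in the vocabulary, or no rule yields a
    vocabulary word, the lowercased word is returned unchanged.
    """
    word = word.lower()
    if word in vocabulary:
        return word
    for suffix, replacement in rules:
        # Require at least one character to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["connections", "Connected", "relating", "studies", "running"]:
        print(w, "->", vocabulary_stem(w))
```

Because the sketch has no rule for doubled consonants, a word such as "running" is left unchanged rather than reduced to an invalid stem like "runn", which is the behaviour a dictionary-based application would want from the vocabulary check.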