Term Weight Measures Influence in Information Retrieval

  • Authors

    • T Raghunadha Reddy
    • G Apparao Naidu
    • B Vishnu Vardhan
  • Keywords: Information Retrieval, Term Weight Measures, TF-IDF, Recall, Cosine Similarity Measure
  • Abstract

    Indexing is widely used in applications such as information retrieval (IR) and document categorization. In IR, search engines use an indexer to represent the content of a document with short, content-bearing terms so that retrieval performs well. Text indexing systems produce better results when suitable weights are assigned to the terms, and these results depend crucially on the choice of an effective term weighting measure. In this work, experiments are carried out with different term weight measures to assign weights to the terms in the query and document representations. The cosine similarity measure is used to compute the similarity between the query vector and the document vector. The experiments are performed on four standard datasets, with recall as the performance evaluation measure. The results obtained in this work are more promising than those of most approaches in the IR field.
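
    The pipeline summarized in the abstract (weight the terms, compare query and document vectors by cosine similarity, evaluate with recall) can be illustrated with a minimal sketch. It uses a smoothed TF-IDF weight as one possible term weight measure; the exact measures compared in the paper are not given on this page, so the IDF formula, the toy documents, the relevance judgments, and the helper names `tfidf_vectors`, `cosine`, and `recall_at_k` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs, query):
    """Build sparse TF-IDF vectors (dicts) for tokenized docs and a query."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; an assumed variant, not necessarily the paper's formula.
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}

    def weigh(tokens):
        tf = Counter(tokens)
        # Terms that never occur in the collection receive no weight.
        return {t: count * idf[t] for t, count in tf.items() if t in idf}

    return [weigh(d) for d in docs], weigh(query)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant_ids:
        return 0.0
    retrieved = set(ranked_ids[:k])
    return len(retrieved & set(relevant_ids)) / len(relevant_ids)


# Toy collection and relevance judgments (hypothetical data).
docs = [
    "term weighting for information retrieval".split(),
    "cooking recipes for pasta".split(),
    "cosine similarity in retrieval systems".split(),
]
query = "information retrieval weighting".split()
relevant = {0, 2}

doc_vecs, query_vec = tfidf_vectors(docs, query)
scores = [cosine(query_vec, dv) for dv in doc_vecs]
# Rank document ids by descending similarity to the query.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

print(ranking)                            # [0, 2, 1]
print(recall_at_k(ranking, relevant, 1))  # 0.5
print(recall_at_k(ranking, relevant, 2))  # 1.0
```

    Swapping the body of `weigh` for a different weighting scheme while keeping `cosine` and `recall_at_k` fixed is one way to compare term weight measures under the same similarity and evaluation setup.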



  • How to Cite

    Pradeep Reddy, K., Raghunadha Reddy, T., Apparao Naidu, G., & Vishnu Vardhan, B. (2018). Term Weight Measures Influence in Information Retrieval. International Journal of Engineering and Technology, 7(2), 832-836. https://doi.org/10.14419/ijet.v7i2.11664

    Received date: 2018-04-17

    Accepted date: 2018-05-14

    Published date: 2018-05-31