A new approach for finding semantic similar scientific articles

  • Authors

    • Masumeh Islami Nasab, MSc student
    • Reza Javidan, Assistant Professor, Computer Engineering and IT Department, Shiraz University of Technology
  • Keywords: Similarities, Semantic Similarities, Text Preprocessing, WordNet.
  • Calculating article similarity lets users find related articles and documents in a collection. Identifying similar documents is useful for text applications such as document-to-document similarity search, plagiarism checking, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarity of articles. WordNet is used to find semantic associations between words. The proposed technique first compares the corresponding parts of two articles pairwise. The final score is then calculated as a weighted mean of the per-part similarities. The results are compared with human-assigned scores using Pearson's correlation coefficient. The proposed system achieves a correlation coefficient above 87 percent, indicating that it identifies similarities accurately.
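
    The pipeline described in the abstract (per-part comparison, weighted mean, Pearson correlation against human scores) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the part names, the weights in `PART_WEIGHTS`, and the function names are hypothetical, and `part_similarity` uses plain token overlap (Jaccard) as a stand-in for the WordNet-based word similarity that the paper relies on.

```python
from math import sqrt

# Hypothetical part names and weights; the abstract does not give the actual values.
PART_WEIGHTS = {"title": 0.3, "abstract": 0.5, "keywords": 0.2}


def part_similarity(a: str, b: str) -> float:
    """Stand-in for a WordNet-based measure: Jaccard overlap of lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def article_similarity(x: dict, y: dict, weights: dict = PART_WEIGHTS) -> float:
    """Weighted mean of the similarities of corresponding article parts."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        w * part_similarity(x.get(part, ""), y.get(part, ""))
        for part, w in weights.items()
    )
    return weighted_sum / total_weight


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

    To evaluate such a system, one would compute `article_similarity` for each article pair and pass the resulting list, together with the human-assigned scores for the same pairs, to `pearson`. Swapping `part_similarity` for a WordNet-based measure would leave the rest of the sketch unchanged.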

  • How to Cite

    Islami Nasab, M., & Javidan, R. (2015). A new approach for finding semantic similar scientific articles. Journal of Advanced Computer Science & Technology, 4(1), 53-59. https://doi.org/10.14419/jacst.v4i1.4012