Finding author similarity by clustering probabilistic LSA factors in INDIAN english authors poetry

  • Authors

    • K Praveen kumar
    • Venkata Naresh Mandhala
    • Sudheshna Vempati
    • Dr Subba Rao Peram
    2018-03-18
    https://doi.org/10.14419/ijet.v7i2.7.12235
  • LSA, PLSA, Word Occurrence, High Dimensionality
  • High dimensionality and sparseness pose a major challenge for data scientists trying to discover similarity among documents. In unsupervised learning the data is unlabeled, and there is no clear distance measure for discovering clusters in it. In this paper we cluster poems by Indian English authors using Probabilistic Latent Semantic Analysis (PLSA) and use the resulting clusters to analyze similarity between authors. We compare the clustering results with those of Latent Semantic Analysis (LSA), a word occurrence method. In this case, the results show that the probabilistic method clusters better than the word occurrence method.
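
The approach summarized in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It fits PLSA by expectation-maximization on a small document-term count matrix, then compares documents by the cosine similarity of their topic mixtures P(z|d). The toy counts, the function names `plsa`, `log_likelihood`, and `cosine_similarity`, the choice of two topics, and the use of the dominant topic as a cluster label are all assumptions made for illustration.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a (docs x words) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape
    (topics, words). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random, row-normalized starting distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the counts under P(w|d) = sum_z P(z|d) P(w|z)."""
    p_w_d = p_z_d @ p_w_z
    return float((counts * np.log(p_w_d + 1e-12)).sum())


def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


# Hypothetical toy data: 4 poems (rows) over a 6-word vocabulary (columns).
counts = np.array(
    [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 1, 0, 0, 0],
        [0, 0, 0, 1, 3, 2],
        [0, 0, 0, 2, 1, 3],
    ],
    dtype=float,
)

p_z_d, p_w_z = plsa(counts, n_topics=2)
similarity = cosine_similarity(p_z_d)  # poem-to-poem similarity in topic space
clusters = p_z_d.argmax(axis=1)        # dominant topic as a simple cluster label

print(np.round(similarity, 2))
print(clusters)
```

Taking the dominant topic as the cluster label is the simplest possible choice; a clustering algorithm run on the rows of P(z|d) would be a natural alternative. For an LSA baseline, the same count matrix could instead be reduced with a truncated singular value decomposition before computing similarities.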

     

     

  • How to Cite

    Praveen kumar, K., Naresh Mandhala, V., Vempati, S., & Subba Rao Peram, D. (2018). Finding author similarity by clustering probabilistic LSA factors in INDIAN english authors poetry. International Journal of Engineering & Technology, 7(2.7), 1096-1099. https://doi.org/10.14419/ijet.v7i2.7.12235