Events Tagging in Twitter Using Twitter Latent Dirichlet Allocation

  • Authors

    • Ghaidaa A. Al-Sultany
    • Hiba J. Aleqabie
    2018-11-27
    https://doi.org/10.14419/ijet.v7i4.19.28064
  • Keywords: Twitter, TLDA, PMI, Perplexity
  • Abstract

    Twitter has become a major platform for publishing and spreading news, advertisements, events, topics, and even the everyday events of our lives. Tweets, however, are short and noisy, and the resulting sparsity makes individual posts poorly suited to topic modeling. In this paper, the Twitter Latent Dirichlet Allocation (TLDA) method for topic modeling is applied to overcome the sparsity problem in modeling tweets. Event tagging on Twitter is carried out in several steps. First, a dataset is constructed with the hashtag pooling technique, and preprocessing is performed to extract features. Second, a suitable number of topics is found with the perplexity criterion, and the topics are then labeled using the WordNet lexicon. Finally, events are tagged using the Pointwise Mutual Information (PMI) criterion. The dataset covers various topics, including the American elections, the Football World Cup 2018, a natural phenomenon, and many others, and contains 63,458 tweets. The study reports good results on the training tweet dataset.
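
    The abstract names hashtag pooling as the way the dataset was built. As a minimal sketch, assuming pooling simply means collecting all tweets that share a hashtag into one pseudo-document (the paper's exact normalization and its handling of untagged tweets are not stated here), the step could look like this. The names `pool_by_hashtag` and `pooled_documents` are hypothetical.

```python
import re
from collections import defaultdict

# Matches "#tag" and captures the tag text (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets: list[str]) -> dict[str, list[str]]:
    """Group tweets into pools keyed by lowercased hashtag.

    Assumptions (not taken from the paper): hashtags are case-folded,
    a tweet with several hashtags joins every matching pool once, and
    tweets without hashtags are dropped.
    """
    pools: dict[str, list[str]] = defaultdict(list)
    for tweet in tweets:
        # A set removes repeated tags so a tweet is added to a pool only once.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
        for tag in tags:
            pools[tag].append(tweet)
    return dict(pools)


def pooled_documents(tweets: list[str]) -> dict[str, str]:
    """Concatenate each pool into a single text document for topic modeling."""
    return {tag: " ".join(texts) for tag, texts in pool_by_hashtag(tweets).items()}


if __name__ == "__main__":
    sample = [
        "Great goal! #WorldCup2018",
        "Vote today #Election #USA",
        "France wins #worldcup2018",
        "no hashtag here",
    ]
    for tag, doc in pooled_documents(sample).items():
        print(tag, "->", doc)
```

    A tweet carrying several hashtags lands in every matching pool; whether the original study did this, or assigned each tweet to a single pool, is an assumption of the sketch.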
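
    The number of topics is chosen with the perplexity criterion. A common definition of corpus perplexity is exp(-Σ_d log p(w_d) / Σ_d N_d), where N_d is the number of tokens in document d, and lower values indicate a better fit. The sketch below assumes that definition and a hypothetical `score` callback that trains a TLDA model with K topics and returns its perplexity; the TLDA training itself is not shown.

```python
import math
from typing import Callable, Iterable


def perplexity(doc_log_likelihoods: list[float], doc_lengths: list[int]) -> float:
    """Corpus perplexity: exp(-(sum of log p(w_d)) / (total number of tokens)).

    doc_log_likelihoods[d] is the natural-log likelihood of document d under
    the model; doc_lengths[d] is its token count N_d.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood per document is required")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("the corpus must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def choose_num_topics(
    candidates: Iterable[int], score: Callable[[int], float]
) -> tuple[int, dict[int, float]]:
    """Evaluate each candidate K with `score` and return the lowest-perplexity K.

    `score` is a hypothetical callback that trains a topic model with K topics
    and returns its perplexity; it is not part of any library.
    """
    scores = {k: score(k) for k in candidates}
    best_k = min(scores, key=scores.__getitem__)
    return best_k, scores


if __name__ == "__main__":
    # A model that spreads probability uniformly over a 10-word vocabulary
    # has perplexity 10, which makes a convenient sanity check.
    lengths = [5, 5, 5]
    log_liks = [n * math.log(0.1) for n in lengths]
    print(perplexity(log_liks, lengths))
```

    Returning every score alongside the best K lets the perplexity curve be inspected, for example to prefer a point where improvements level off rather than the strict minimum.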
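
    Events are tagged with Pointwise Mutual Information, PMI(x, y) = log(p(x, y) / (p(x) p(y))). The abstract gives neither the probability estimates nor the exact tagging rule, so the sketch below assumes document-level co-occurrence counts and a hypothetical rule that assigns the candidate label with the highest mean PMI to the event's terms. The names `pmi` and `tag_event` are illustrative only.

```python
import math


def pmi(x: str, y: str, docs: list[set[str]]) -> float:
    """Pointwise mutual information log(p(x, y) / (p(x) * p(y))).

    Probabilities are estimated as the fraction of documents that contain
    the word(s); this document-level estimate is an assumption of the sketch.
    Returns -inf when x and y never co-occur.
    """
    n = len(docs)
    n_x = sum(1 for d in docs if x in d)
    n_y = sum(1 for d in docs if y in d)
    if n_x == 0 or n_y == 0:
        raise ValueError(f"{x!r} or {y!r} never occurs in the corpus")
    n_xy = sum(1 for d in docs if x in d and y in d)
    if n_xy == 0:
        return float("-inf")
    # (n_xy / n) / ((n_x / n) * (n_y / n)) simplifies to n_xy * n / (n_x * n_y).
    return math.log(n_xy * n / (n_x * n_y))


def tag_event(event_terms: list[str], labels: list[str], docs: list[set[str]]) -> str:
    """Hypothetical tagging rule: the label with the highest mean PMI to the event terms."""

    def mean_pmi(label: str) -> float:
        return sum(pmi(term, label, docs) for term in event_terms) / len(event_terms)

    return max(labels, key=mean_pmi)


if __name__ == "__main__":
    corpus = [
        {"goal", "football", "france"},
        {"goal", "football"},
        {"vote", "election"},
        {"vote", "election", "usa"},
    ]
    print(tag_event(["goal", "france"], ["football", "election"], corpus))  # prints football
```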

  • How to Cite

    Al-Sultany, G. A., & Aleqabie, H. J. (2018). Events Tagging in Twitter Using Twitter Latent Dirichlet Allocation. International Journal of Engineering & Technology, 7(4.19), 884-888. https://doi.org/10.14419/ijet.v7i4.19.28064

    Received date: 2019-03-01

    Accepted date: 2019-03-01

    Published date: 2018-11-27