A study on social big data analysis using text clustering

  • Authors

    • Jin Hee Ku
    • Yoon Su Jeong
    2018-04-03
    https://doi.org/10.14419/ijet.v7i2.12.11023
  • Text Clustering, Social Big Data, Text Mining, Association Word, Cluster Analysis.
  • Background/Objectives: As big data is adopted in more fields, social big data analysis of social media is growing rapidly. This study proposes a method that applies text clustering to analyze, by related topic, the texts extracted from social big data through text mining.

    Methods/Statistical analysis: R was used for data collection and analysis, and the social big data was collected from Twitter. Clustering models applicable to related-topic analysis of Twitter text were compared, one was selected, and text clustering was performed. The text clustering generates a corpus, removes sparse words from the term-document matrix, groups similar entities, and analyzes the result through a cluster dendrogram.
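
    The preprocessing steps named above (corpus, term-document matrix, sparse-term removal) can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study: the sample tweets, the function names, and the 0.5 sparsity threshold are all assumptions made for the example, and stop-word removal is omitted.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the study's actual Twitter data is not reproduced here.
tweets = [
    "big data analysis of social media",
    "social media text mining with big data",
    "text clustering groups similar tweets",
    "cluster analysis of social big data",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[count[term] for count in counts] for term in terms]
    return terms, matrix


def remove_sparse_terms(terms, matrix, max_sparsity):
    """Keep only terms whose share of zero-count documents is at most max_sparsity."""
    kept = [
        (term, row)
        for term, row in zip(terms, matrix)
        if row.count(0) / len(row) <= max_sparsity
    ]
    return [term for term, _ in kept], [row for _, row in kept]


terms, tdm = build_term_document_matrix(tweets)
# Assumed threshold: drop terms that are absent from more than half of the tweets.
dense_terms, dense_tdm = remove_sparse_terms(terms, tdm, max_sparsity=0.5)
print(dense_terms)
```

    Lowering `max_sparsity` keeps fewer, more frequent terms, which shortens the dendrogram at the cost of dropping rarer words.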

    Findings: Text clustering eases a limitation of text mining methods such as word clouds, which make analysis by word association and by subject difficult. In particular, for related-topic analysis of social big data, the hierarchical clustering model based on cosine similarity was more suitable than the non-hierarchical model for identifying which terms in the tweets are associated with each other. In addition, the cluster dendrogram was found to be effective for analyzing text context, because the visualization repeatedly merges groups of similar texts.
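
    To illustrate how hierarchical clustering on cosine similarity groups associated terms, the sketch below merges term vectors bottom-up and records each merge, which is the information a dendrogram draws. It is a plain-Python illustration, not the study's R code: the term counts are hypothetical, and average linkage is an assumption, since the abstract does not state which linkage was used.

```python
import math
from itertools import combinations

# Hypothetical term-by-tweet counts (one row per term, one column per tweet).
term_vectors = {
    "big": [1, 1, 0, 1],
    "data": [1, 1, 0, 1],
    "social": [1, 1, 0, 0],
    "text": [0, 0, 1, 1],
    "clustering": [0, 0, 1, 0],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity, clamped at 0 to absorb rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0
    return max(0.0, 1.0 - dot / norm)


def average_linkage(left, right, vectors):
    """Mean cosine distance over all pairs of terms drawn from two clusters."""
    dists = [cosine_distance(vectors[a], vectors[b]) for a in left for b in right]
    return sum(dists) / len(dists)


def agglomerate(vectors):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        dist, left, right = min(
            (
                (average_linkage(a, b, vectors), a, b)
                for a, b in combinations(clusters, 2)
            ),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((left, right, round(dist, 3)))
    return merges


merges = agglomerate(term_vectors)
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

    Terms that co-occur in the same tweets merge at small distances, while unrelated groups join only near the top of the tree. Cutting the merge history at a chosen distance yields the term groups.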

    Improvements/Applications: This study can be used to identify the ideas and opinions of diverse participants through social big data, and to analyze more precisely the complex relationship between social phenomena and the prediction of social problems.

     

  • How to Cite

    HeeKu, J., & Su Jeong, Y. (2018). A study on social big data analysis using text clustering. International Journal of Engineering & Technology, 7(2.12), 1-4. https://doi.org/10.14419/ijet.v7i2.12.11023