Distributed Community Detection based on Apache Spark using Multi Label Propagation for Digital Social Networks

  • Authors

    • Satya Keerthi Gorripati
    • Valli Kumari
    2018-09-22
    https://doi.org/10.14419/ijet.v7i4.5.20016
  • Keywords: Apache Spark, community detection, distributed, RDD, social graphs.
  • Organizations, governments and individuals (OGI) have popularized the use of Digital Social Networks (DSN), which reduce the processing time of social-aware tasks. To achieve community-based communication, each social-aware task should identify its community group. The identified group uses the task to make the benefits of the DSN available to its customers or citizens. As a result, community detection algorithms have played a significant role in the literature. However, existing algorithms face several challenges, notably performance and scalability. This paper therefore presents a distributed community detection algorithm built on Apache Spark's Resilient Distributed Dataset (RDD) framework and written in the Scala programming language. Apache Spark offers ease of coding, performance and an interactive mode, and avoids the disk input-output bottlenecks of Hadoop MapReduce. It also provides a platform for distributed community detection that reduces computational cost by applying transformations, aggregations and joins. The experimental results show that the proposed framework achieves high accuracy for both real-world and synthetic networks.

     

     



  • How to Cite

    Keerthi Gorripati, S., & Kumari, V. (2018). Distributed Community Detection based on Apache Spark using Multi Label Propagation for Digital Social Networks. International Journal of Engineering & Technology, 7(4.5), 79-86. https://doi.org/10.14419/ijet.v7i4.5.20016
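
The abstract describes community detection as a sequence of transformations, aggregations and joins. The following is a minimal sketch of that idea in plain Python, not the authors' Spark/Scala implementation. It assumes a simple synchronous, single-label variant of label propagation (the paper uses multi-label propagation), and the function name `propagate_labels`, the tie-breaking rule and the toy edge list are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def propagate_labels(edges, max_iters=20):
    """Synchronous label propagation written as map / join / aggregate steps.

    Illustrative only: plain Python lists and dicts stand in for the
    distributed collections a Spark job would use.
    """
    # Transformation: emit every undirected edge in both directions.
    directed = [(u, v) for a, b in edges for u, v in ((a, b), (b, a))]
    nodes = sorted({src for src, _ in directed})

    # Every node starts in its own community.
    labels = {n: n for n in nodes}

    for _ in range(max_iters):
        # Join: look up the source's current label for each (src, dst) pair,
        # producing one (dst, label) message per directed edge.
        messages = [(dst, labels[src]) for src, dst in directed]

        # Aggregation: count incoming labels per node (similar in spirit
        # to a reduce-by-key step).
        votes = defaultdict(Counter)
        for dst, label in messages:
            votes[dst][label] += 1

        # Each node adopts its most frequent neighbour label; ties go to
        # the smallest label (an assumed rule that keeps runs deterministic).
        new_labels = {
            n: min(votes[n].items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for n in nodes
        }

        if new_labels == labels:  # no label changed, so stop early
            break
        labels = new_labels

    return labels


# Toy graph: two triangles joined by the single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
labels = propagate_labels(edges)
```

On this toy graph the two triangles end up with different labels, one per community. Synchronous updates can oscillate on some graphs (bipartite ones, for example), which is why the sketch caps the number of iterations instead of looping until convergence.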