A Survey of Data Mining Techniques on Information Networks

  • Authors

    • Sadhana Kodali
    • Madhavi Dabbiru
    • B Thirumala Rao
    2018-03-11
    https://doi.org/10.14419/ijet.v7i2.6.11267
  • Information Networks, Data Mining Techniques, Homogeneous Information Networks, Heterogeneous Information Networks
  • Abstract

    An information network is a network formed by the interconnections among objects that arise from their interactions. Such networks are common in everyday life; examples include social media networks and the networks formed by interactions among web objects. This paper surveys Data Mining techniques applicable to information networks. Techniques for both homogeneous and heterogeneous information networks are discussed in detail, and a comparative study is presented for each problem category.
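    The abstract distinguishes homogeneous from heterogeneous information networks. The sketch below assumes the commonly used definition: a network is heterogeneous when it has more than one object type or more than one link (relation) type, and homogeneous otherwise. The `InformationNetwork` class, its methods, and the example data are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """Hypothetical minimal representation: typed objects plus typed links."""
    object_type: dict[str, str] = field(default_factory=dict)         # object id -> object type
    links: list[tuple[str, str, str]] = field(default_factory=list)   # (source, relation, target)

    def add_object(self, obj_id: str, obj_type: str) -> None:
        self.object_type[obj_id] = obj_type

    def add_link(self, src: str, relation: str, dst: str) -> None:
        # Both endpoints must already be registered as objects.
        if src not in self.object_type or dst not in self.object_type:
            raise KeyError("both endpoints must be added as objects first")
        self.links.append((src, relation, dst))

    def is_heterogeneous(self) -> bool:
        # Assumed definition: more than one object type OR more than one relation type.
        object_types = set(self.object_type.values())
        relation_types = {rel for _, rel, _ in self.links}
        return len(object_types) > 1 or len(relation_types) > 1


# Homogeneous example: users following users (one object type, one relation type).
social = InformationNetwork()
for user in ("alice", "bob", "carol"):
    social.add_object(user, "user")
social.add_link("alice", "follows", "bob")
social.add_link("bob", "follows", "carol")

# Heterogeneous example: a small bibliographic network of authors, papers, and venues.
biblio = InformationNetwork()
biblio.add_object("a1", "author")
biblio.add_object("p1", "paper")
biblio.add_object("v1", "venue")
biblio.add_link("a1", "writes", "p1")
biblio.add_link("p1", "published_in", "v1")

print(social.is_heterogeneous())  # False
print(biblio.is_heterogeneous())  # True
```

    Storing links as (source, relation, target) triples keeps the relation type explicit, which is exactly what separates a heterogeneous network from a homogeneous one under this definition.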


  • How to Cite

    Kodali, S., Dabbiru, M., & Rao, B. T. (2018). A Survey of Data Mining Techniques on Information Networks. International Journal of Engineering & Technology, 7(2.6), 293-300. https://doi.org/10.14419/ijet.v7i2.6.11267

    Received date: 2018-04-08

    Accepted date: 2018-04-08

    Published date: 2018-03-11