MCDAStream: a real-time data stream clustering based on micro-cluster density and attraction
2018-03-13 https://doi.org/10.14419/ijet.v7i2.9051
Keywords: Data Stream, Data Mining, Density-Based Clustering, Grid-Based Clustering, Micro-Clusters.
Abstract
Real-time data stream clustering is widely used in many fields to extract useful information from massive volumes of data. Most existing density-based algorithms cluster data streams using only the density within micro-clusters. These algorithms ignore the density of the data in the regions between micro-clusters, and they recluster the micro-clusters based on erroneous assumptions about how the data are distributed within and between micro-clusters, which leads to poor clustering results. This paper describes MCDAStream, a novel density-based clustering algorithm for evolving data streams that clusters the stream using both micro-cluster density and the attraction between micro-clusters. The attraction between micro-clusters captures positional information about the data points in each micro-cluster. By considering both micro-cluster density and attraction, we obtain better clustering results. The quality of the proposed algorithm is evaluated with several quality metrics on synthetic and real datasets with distinct characteristics.
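The abstract does not give the formulas behind micro-cluster density and attraction, so the following Python code is only a minimal sketch under stated assumptions, not the MCDAStream algorithm itself. Everything in it is hypothetical: leader-style micro-clusters with fixed centers and a shared `radius`; a density weight that fades by a factor of 2^-`lam` per arriving point; an attraction count for a pair of micro-clusters that grows whenever a point absorbed by one of them also falls within `radius` of the other (similar in spirit to the shared-density idea of DBSTREAM); and a reclustering step that links dense micro-clusters (weight at least `min_weight`) whose attraction, normalised by their mean weight, reaches `min_attraction`, then reports the connected components as clusters.

```python
import math
from collections import defaultdict


class MicroCluster:
    """Hypothetical leader-style micro-cluster: fixed center plus a faded weight."""

    def __init__(self, center):
        self.center = tuple(center)
        self.weight = 1.0  # density estimate, faded as the stream advances


class AttractionStreamClusterer:
    """Hypothetical sketch combining density and pairwise attraction.

    Not the published MCDAStream method; all names and rules are assumptions.
    """

    def __init__(self, radius=1.0, lam=0.01, min_weight=2.0, min_attraction=0.3):
        self.radius = radius                  # micro-cluster radius
        self.lam = lam                        # fading rate: factor 2^-lam per point
        self.min_weight = min_weight          # threshold for a "dense" micro-cluster
        self.min_attraction = min_attraction  # normalised attraction needed to link
        self.mcs = []
        self.attraction = defaultdict(float)  # (i, j) with i < j -> faded count

    def _decay(self):
        # Fade all weights and attraction counts by the same factor.
        factor = 2.0 ** (-self.lam)
        for mc in self.mcs:
            mc.weight *= factor
        for key in self.attraction:
            self.attraction[key] *= factor

    def insert(self, point):
        self._decay()
        near = [i for i, mc in enumerate(self.mcs)
                if math.dist(mc.center, point) <= self.radius]
        if not near:
            self.mcs.append(MicroCluster(point))  # no micro-cluster covers the point
            return
        # The closest covering micro-cluster absorbs the point and gains density.
        best = min(near, key=lambda i: math.dist(self.mcs[i].center, point))
        self.mcs[best].weight += 1.0
        # If the point also lies within radius of other centers, it adds attraction
        # between the absorbing micro-cluster and each of those neighbours.
        for j in near:
            if j != best:
                self.attraction[(min(best, j), max(best, j))] += 1.0

    def recluster(self):
        """Return {micro-cluster index: cluster label}; sparse ones are omitted as noise."""
        dense = [i for i, mc in enumerate(self.mcs) if mc.weight >= self.min_weight]
        parent = {i: i for i in dense}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for (i, j), a in self.attraction.items():
            if i in parent and j in parent:
                mean_weight = (self.mcs[i].weight + self.mcs[j].weight) / 2
                if a / mean_weight >= self.min_attraction:
                    parent[find(i)] = find(j)  # union the two components

        roots, labels = {}, {}
        for i in dense:
            labels[i] = roots.setdefault(find(i), len(roots))
        return labels


# Usage: two overlapping micro-clusters near the origin, one far away, one outlier.
clusterer = AttractionStreamClusterer(radius=1.0, lam=0.0, min_weight=2.0, min_attraction=0.3)
stream = [(0, 0), (1.5, 0)] + [(0.6, 0)] * 4 + [(1.5, 0)] * 3 + [(10, 0)] * 4 + [(20, 0)]
for p in stream:
    clusterer.insert(p)
print(clusterer.recluster())  # {0: 0, 1: 0, 2: 1}
```

Micro-clusters 0 and 1 are linked because many points fell in their overlap relative to their weights, while the single point at (20, 0) stays below `min_weight` and is treated as noise. Fading every weight on each arrival costs time linear in the number of micro-clusters and pairs; a practical version would more likely fade lazily and prune stale micro-clusters.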
How to Cite
K, S. S. R., & C, S. B. (2018). MCDAStream: a real-time data stream clustering based on micro-cluster density and attraction. International Journal of Engineering & Technology, 7(2), 270-275. https://doi.org/10.14419/ijet.v7i2.9051
Received date: 2018-01-05
Accepted date: 2018-03-07
Published date: 2018-03-13