A Survey on Clustering Density Based Data Stream algorithms

  • Authors

    • Mayas Aljibawi
    • Mohd Zakree Ahmed Nazri
    • Zalinda Othman
    2018-12-09
    https://doi.org/10.14419/ijet.v7i4.36.23735
  • Keywords: data mining, clustering, density-based clustering, grid-based clustering, micro-clustering, stream data clustering
  • Abstract

    With the rapid evolution of technology, data volumes have grown as well. This growth brings new challenges for pattern discovery, such as limited memory and processing time and the need to process the data in a single pass. Many clustering techniques have been developed to overcome these issues. Streaming data evolve over time, which makes it almost impossible to fix the number of clusters in advance. Density-based algorithms are a significant class of clustering methods for addressing this issue because they do not require advance knowledge of the number of clusters. This paper reviews some of the existing density-based clustering algorithms for data streams, together with the measures used to evaluate them.
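
    As a rough illustration of why density-based methods need no preset cluster count, the sketch below runs a minimal DBSCAN-style procedure on a small batch of points. It is a batch simplification, not one of the streaming algorithms surveyed in the paper; the function name `dbscan`, the parameters `eps` and `min_pts`, and the sample points are assumptions chosen for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Illustrative batch sketch only: it scans all points for every
    neighborhood query, which a real stream algorithm could not afford.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical sample: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len({label for label in labels if label != -1})
```

    The number of clusters is read off the labels after the run rather than supplied as input; only the density parameters `eps` and `min_pts` are chosen beforehand.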


  • How to Cite

    Aljibawi, M., Zakree Ahmed Nazri, M., & Othman, Z. (2018). A Survey on Clustering Density Based Data Stream algorithms. International Journal of Engineering & Technology, 7(4.36), 147-153. https://doi.org/10.14419/ijet.v7i4.36.23735

    Received date: 2018-12-12

    Accepted date: 2018-12-12

    Published date: 2018-12-09