Data Mining Models of High Dimensional Data Streams, and Contemporary Concept Drift Detection Methods: a Comprehensive Review

  • Abstract

    Concept drift is a change over time in the distribution of the data arriving across one or more data streams. It becomes visible only when the nature of the collected data changes after a stable period. The emergence of concept drift in data streams increases misclassification and degrades the performance of models learned from those streams. To obtain accurate results, such drifts must be identified. This paper reviews the issues involved in identifying changes in multivariate, high-dimensional data streams. Its main contribution is to probe the inherent difficulties that contemporary change-detection methods encounter as the dimensionality of the data scales.



  • Keywords

    CUSUM, streaming ensemble algorithm, concept drift detection, high-dimensional data streams, change-detection tests, Hotelling’s T-squared test, Bayesian online change point detection.

Article ID: 14959
DOI: 10.14419/ijet.v7i3.6.14959

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.