A contemplate report on clustering evaluation and nonlinear clustering in high-dimensional data

  • Abstract

    Every day people work with large volumes of data, and to make such data usable later it can be divided into categories called clusters. The main aim of clustering is to partition an unlabeled, finite dataset into a set of distinct groups. Cluster distributions can be classified as linearly separable or non-linearly separable. Non-linearly separable clustering means that at least one group has curved boundaries or an arbitrary shape. Many clustering algorithms do not correctly recover such non-linearly shaped clusters. Several indexes have been used and proposed for different scenarios, but there is no unified procedure for cluster assessment. We re-examine existing clustering quality measures in the difficult context of high-dimensional clustering. Dimensionality affects different clustering quality indexes in different ways, and some indexes are preferable to others for establishing clustering quality in many dimensions. In this paper we discuss clustering evaluation, internal criteria, cluster quality indices, a comparison of various clustering algorithms, problems in analyzing high-dimensional data, clustering techniques for high-dimensional data, and perspectives and future directions.
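    As an illustration of the internal criteria and of the claim that dimensionality affects quality indexes, the following pure-Python sketch computes the silhouette index of Rousseeuw (1987), a widely used internal criterion. It is our own example, not the paper's method. It assumes two Gaussian clusters that differ only in the first coordinate and are padded with extra noise dimensions; the helper names `two_blobs` and `silhouette` and all parameter values (50 points per cluster, separation 6.0) are hypothetical choices.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette(points, labels):
    """Mean silhouette width over all points.

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of any other cluster.
    s(i) = (b(i) - a(i)) / max(a(i), b(i)); points in singleton clusters count as 0.
    """
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    # Symmetric pairwise distance matrix, filled one triangle at a time.
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(points[i], points[j])
    total = 0.0
    for i in range(n):
        # Sum and count of distances from point i to each cluster (excluding i).
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += dist[i][j]
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


def two_blobs(n_per_cluster, dim, separation, rng):
    """Hypothetical test data: two unit-variance Gaussian clusters whose
    centres differ only in the first coordinate; other coordinates are noise."""
    points, labels = [], []
    for label, centre in ((0, 0.0), (1, separation)):
        for _ in range(n_per_cluster):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            x[0] += centre
            points.append(x)
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 20, 200):
        pts, lab = two_blobs(50, dim, separation=6.0, rng=rng)
        print(f"d={dim:4d}  mean silhouette={silhouette(pts, lab):.3f}")
```

    The true partition is scored at every dimension and only the noise coordinates change, so under these assumptions a falling mean silhouette reflects how the index responds to dimensionality rather than a worse clustering.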



  • Keywords

    Linear Clustering; Non-Linear Clustering; High-Dimensional Data; Hubness; Data Clustering; Cluster Indexes; Internal Indices; External Indices; Distance Concentration.
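    Two of these keywords, distance concentration and hubness, name well-known effects of high dimensionality. The sketch below is our own illustration under stated assumptions (i.i.d. uniform or Gaussian data, Euclidean distance), not the paper's experimental setup; the function names and parameters are hypothetical. Distance concentration is measured by the relative contrast (D_max - D_min) / D_min from a random query to a sample of points. Hubness is measured, as in the work of Radovanović et al., by the skewness of the k-occurrence counts N_k, where N_k(x) is how often x appears among the k nearest neighbours of the other points.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance (preserves nearest-neighbour order)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def relative_contrast(dim, n_points, rng):
    """(D_max - D_min) / D_min from a random query to n uniform points in [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    dists = [math.sqrt(sq_dist(query, [rng.random() for _ in range(dim)]))
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def gaussian_points(n, dim, rng):
    """Hypothetical test data: n i.i.d. standard normal points in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def k_occurrences(points, k):
    """N_k for every point: how many other points list it among their k nearest neighbours."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = sq_dist(points[i], points[j])
    counts = [0] * n
    for i in range(n):
        row = dist[i]
        # Rank all other points by distance to i and credit the k closest.
        nearest = sorted((j for j in range(n) if j != i), key=row.__getitem__)[:k]
        for j in nearest:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


if __name__ == "__main__":
    rng = random.Random(1)
    for dim in (3, 30, 100):
        rc = relative_contrast(dim, 1000, rng)
        sk = skewness(k_occurrences(gaussian_points(300, dim, rng), k=5))
        print(f"d={dim:4d}  relative contrast={rc:8.3f}  N_5 skewness={sk:5.2f}")
```

    Under these assumptions the relative contrast should shrink and the skewness of N_5 should grow as d increases; the exact values depend on the random seed and sample sizes.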





Article ID: 19306
DOI: 10.14419/ijet.v7i3.29.19306

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.