A brief survey of unsupervised agglomerative hierarchical clustering schemes
Published: 2019-01-27. DOI: https://doi.org/10.14419/ijet.v8i1.13971
Keywords: Agglomerative Hierarchical Clustering, Clustering Process, Distance Metric, Divisive Hierarchical Clustering, Similarity Measure, Linkage Method.
Abstract
Unsupervised hierarchical clustering is a mathematical model and exploratory tool that organizes a large volume of real-time observations into distinct groups arranged as a tree, based on a chosen similarity measure and without prior knowledge of labels. The dataset is a central aspect of the hierarchical clustering process: it may describe the behavior of living species, the properties of a natural phenomenon, the results of a scientific experiment, or the observations of a running machine, all without label identification. Hierarchical clustering schemes are either agglomerative (bottom-up) or divisive (top-down), and they are employed in research areas such as machine learning, pattern recognition, big data analysis, image pixel classification, information retrieval, and bioinformatics to identify distinct patterns. This paper presents a brief survey of agglomerative hierarchical clustering (AHC) schemes, covering their clustering procedures, linkage metrics, complexity analysis, key issues, and the development of the AHC scheme.
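As a concrete illustration of the process the abstract describes, the sketch below builds an AHC tree over a small synthetic two-dimensional dataset and compares several common linkage metrics. This is a minimal example using SciPy's standard hierarchical-clustering routines, not an implementation of any particular scheme surveyed in the paper; the toy dataset and parameter choices are assumptions for demonstration only.

```python
# Minimal AHC sketch: every point starts as its own cluster, and the two
# closest clusters under the chosen linkage metric are merged repeatedly
# until a single tree (dendrogram) remains.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy unlabeled dataset: two well-separated 2-D Gaussian blobs.
data = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                  rng.normal(5.0, 0.5, size=(20, 2))])

# Compare common linkage metrics discussed in the survey; Euclidean
# distance is the underlying dissimilarity measure in each case.
for method in ("single", "complete", "average", "ward"):
    Z = linkage(data, method=method)                 # (n-1) x 4 merge history
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 groups
    print(f"{method:>8}: cluster sizes = {np.bincount(labels)[1:]}")
```

On well-separated data all four linkage metrics recover the same two groups; their behavior diverges on elongated or noisy data, where single linkage tends to chain clusters together and complete or Ward linkage favors compact ones.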
References
[1] Antti Honkela, Jeremias Seppa & Esa Alhoniemi, "Agglomerative independent variable group analysis", Neurocomputing, vol. 71, 2008, pp. 1311-1320. https://doi.org/10.1016/j.neucom.2007.11.024.
[2] Athman Bouguettaya, Qi Yu, Xumin Liu, Xiangmin Zhou & Andy Song, "Efficient Agglomerative Hierarchical Clustering", Expert Systems with Applications, vol. 42, no. 5, 2015, pp. 2785-2797. https://doi.org/10.1016/j.eswa.2014.09.054.
[3] Caiming Zhong, Duoqian Miao & Pasi Franti, "Minimum spanning tree based split-and-merge: A hierarchical clustering method", Information Sciences, vol. 181, 2011, pp. 3397-3410.
[4] Chih-Tang Chang, Lai Jim, ZC & Jeng, MD, "Fast agglomerative clustering using information of k-nearest neighbors", Pattern Recognition, vol. 43, 2010, pp. 3958-3968. https://doi.org/10.1016/j.patcog.2010.06.021.
[5] Chung-Chian Hsu, Chin-Long Chen & Yu-Wei Su, "Hierarchical clustering of mixed data based on distance hierarchy", Information Sciences, vol. 177, no. 20, 2007, pp. 4474-4492. https://doi.org/10.1016/j.ins.2007.05.003.
[6] Davies, DL & Bouldin, DW, "A Cluster Separation Measure", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, 1979, pp. 95-105. https://doi.org/10.1109/TPAMI.1979.4766909.
[7] De Amorim, RC, "Feature Relevance in Ward's Hierarchical Clustering Using the Lp Norm", Journal of Classification, vol. 32, no. 1, 2015, pp. 46-62. https://doi.org/10.1007/s00357-015-9167-1.
[8] Defays, D, "An efficient algorithm for a complete link method", The Computer Journal (British Computer Society), vol. 20, no. 4, 1977, pp. 364-366. https://doi.org/10.1093/comjnl/20.4.364.
[9] Duda, RO, Hart, PE & Stork, DG, "Pattern Classification", 2nd edition, New York, NY: John Wiley & Sons, 2001.
[10] Dutta, M, Kakoti Mahanta, A & Arun Pujari, K, "QROCK: A quick version of the ROCK algorithm for clustering of categorical data", Pattern Recognition Letters, vol. 26, 2005, pp. 2364-2373. https://doi.org/10.1016/j.patrec.2005.04.008.
[11] Fionn Murtagh & Legendre, P, "Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion?", Journal of Classification, vol. 31, 2014, pp. 274-295. https://doi.org/10.1007/s00357-014-9161-z.
[12] Franti, P, Kaukoranta, T, Shen, DF & Chang, KS, "Fast and memory efficient implementation of the exact PNN", IEEE Transactions on Image Processing, vol. 9, 2000, pp. 773-777. https://doi.org/10.1109/83.841516.
[13] George Karypis, Eui-Hong Han & Vipin Kumar, "Chameleon: Hierarchical Clustering Using Dynamic Modeling", IEEE Computer, vol. 32, no. 8, 1999, pp. 68-75. https://doi.org/10.1109/2.781637.
[14] George Kollios, Dimitrios Gunopulos, Nick Koudas & Stefan Berchtold, "Efficient Biased Sampling for Approximate Clustering and Outlier Detection in Large Data Sets", IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, 2003, pp. 1170-1187. https://doi.org/10.1109/TKDE.2003.1232271.
[15] Gower, J, "A comparison of some methods of cluster analysis", Biometrics, vol. 23, no. 4, 1967, pp. 623-628. https://doi.org/10.2307/2528417.
[16] Gower, JC & Ross, GJS, "Minimum spanning trees and single linkage cluster analysis", Journal of the Royal Statistical Society, Series C, vol. 18, no. 1, 1969, pp. 54-64. https://doi.org/10.2307/2346439.
[17] Han, J & Kamber, M, "Data Mining: Concepts and Techniques", 2nd edition, Morgan Kaufmann, 2006.
[18] Hisashi Koga, Tetsuo Ishibashi & Toshinori Watanabe, "Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing", Knowledge and Information Systems, vol. 12, 2007, pp. 25-53. https://doi.org/10.1007/s10115-006-0027-5.
[19] Jain, A & Dubes, R, "Algorithms for Clustering Data", Englewood Cliffs, NJ: Prentice Hall, 1988.
[20] Jain, AK, Murty, MN & Flynn, PJ, "Data Clustering: A Review", ACM Computing Surveys, vol. 31, no. 3, 1999, pp. 264-323. https://doi.org/10.1145/331499.331504.
[21] Johnson, S, "Hierarchical clustering schemes", Psychometrika, vol. 32, no. 3, 1967, pp. 241-254. https://doi.org/10.1007/BF02289588.
[22] Lai Jim, ZC, Juan Eric, YT & Lai Franklin, JC, "Rough clustering using generalized fuzzy clustering algorithm", Pattern Recognition, vol. 46, 2013, pp. 2538-2547. https://doi.org/10.1016/j.patcog.2013.02.003.
[23] Lai Jim, ZC & Tsung-Jen Huang, "An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list", Information Sciences, vol. 181, 2011, pp. 1722-1734. https://doi.org/10.1016/j.ins.2011.01.011.
[24] Lance, G & Williams, W, "A general theory of classificatory sorting strategies: Hierarchical systems", The Computer Journal, vol. 9, 1967, pp. 373-380. https://doi.org/10.1093/comjnl/9.4.373.
[25] Lin, CR & Chen, MS, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging", IEEE Transactions on Knowledge and Data Engineering, vol. 17, 2005, pp. 145-159. https://doi.org/10.1109/TKDE.2005.21.
[26] Manoranjan Dash, Huan Liu, Peter Scheuermann & Kian Lee Tan, "Fast hierarchical clustering and its validation", Data and Knowledge Engineering, vol. 44, 2003, pp. 109-138. https://doi.org/10.1016/S0169-023X(02)00138-6.
[27] Mark Junjie Li, Michael K Ng, Yiu-ming Cheung & Joshua Zhexue Huang, "Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters", IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, 2008, pp. 1519-1534. https://doi.org/10.1109/TKDE.2008.88.
[28] Martin Ester, Hans-Peter Kriegel, Jorg Sander & Xiaowei Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise", Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 226-231.
[29] McQuitty, L, "Similarity analysis by reciprocal pairs for discrete and continuous data", Educational and Psychological Measurement, vol. 27, 1967, pp. 21-46. https://doi.org/10.1177/001316446702700103.
[30] Murtagh, F, "A survey of recent advances in hierarchical clustering algorithms", The Computer Journal, vol. 26, no. 4, 1983, pp. 354-359. https://doi.org/10.1093/comjnl/26.4.354.
[31] Murtagh, F, "Complexities of hierarchic clustering algorithms: the state of the art", Computational Statistics Quarterly, vol. 1, 1984, pp. 101-113.
[32] Ng, RT & Jiawei Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 5, 2002, pp. 1003-1016. https://doi.org/10.1109/TKDE.2002.1033770.
[33] Pasi Franti, Olli Virmajoki & Ville Hautamaki, "Fast Agglomerative Clustering Using a k-Nearest Neighbor Graph", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, 2006, pp. 1875-1881. https://doi.org/10.1109/TPAMI.2006.227.
[34] Rui Xu & Donald C Wunsch, "Clustering", IEEE Press, 2009.
[35] Sibson, R, "SLINK: an optimally efficient algorithm for the single-link cluster method", The Computer Journal (British Computer Society), vol. 16, no. 1, 1973, pp. 30-34. https://doi.org/10.1093/comjnl/16.1.30.
[36] Sneath, P, "The application of computers to taxonomy", Journal of General Microbiology, vol. 17, 1957, pp. 201-226.
[37] Sokal, R & Michener, C, "A statistical method for evaluating systematic relationships", University of Kansas Science Bulletin, vol. 38, 1958, pp. 1409-1438.
[38] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, "CURE: An Efficient Clustering Algorithm for Large Databases", Information Systems, vol. 26, 2001, pp. 35-58. https://doi.org/10.1016/S0306-4379(01)00008-4.
[39] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, "ROCK: A Robust Clustering Algorithm for Categorical Attributes", Information Systems, vol. 25, no. 5, 2000, pp. 512-521. https://doi.org/10.1016/S0306-4379(00)00022-3.
[40] Szekely Gabor, J & Rizzo Maria, L, "Hierarchical clustering via joint between-within distances: extending Ward's minimum variance method", Journal of Classification, vol. 22, 2005, pp. 151-183. https://doi.org/10.1007/s00357-005-0012-9.
[41] Vijaya, PA, Narasimha Murty, M & Subramanian, DK, "Leaders-Subleaders: An efficient hierarchical clustering algorithm for large data sets", Pattern Recognition Letters, vol. 25, 2004, pp. 505-513. https://doi.org/10.1016/j.patrec.2003.12.013.
[42] Ward, J, "Hierarchical grouping to optimize an objective function", Journal of the American Statistical Association, vol. 58, 1963, pp. 236-244. https://doi.org/10.1080/01621459.1963.10500845.
[43] Wei Zhang, Gongxuan Zhang, Yongli Wang, Zhaomeng Zhu & Tao Li, "NNB: An Efficient Nearest Neighbor Search Method for Hierarchical Clustering on Large Datasets", IEEE International Conference on Semantic Computing (ICSC), 2015, pp. 405-412. https://doi.org/10.1109/ICOSC.2015.7050840.
[44] Zhang, T, Ramakrishnan, R & Livny, M, "BIRCH: an efficient data clustering method for very large databases", in J. Widom (Ed.), Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996, pp. 103-114. https://doi.org/10.1145/233269.233324.
[45] Krishnamoorthy, K & Sreedhar Kumar, S, "An Improved Agglomerative Clustering Algorithm for Outlier Detection", Applied Mathematics and Information Sciences, vol. 10, no. 3, 2016, pp. 1125-1138. https://doi.org/10.18576/amis/100332.
[46] Madheswaran, M & Sreedhar Kumar, S, "An Improved Frequency Based Agglomerative Clustering Algorithm for Detecting Distinct Clusters on Two Dimensional Dataset", Applied Mathematics and Information Sciences, vol. 9, no. 4, 2017, pp. 30-41. https://doi.org/10.5897/JETR2017.0628.
[47] https://en.wikipedia.org/wiki/Hierarchical_clustering.
[48] https://en.wikipedia.org/wiki/Euclidean_distance.
[49] https://en.wikipedia.org/wiki/Single-linkage_clustering.
[50] https://en.wikipedia.org/wiki/Complete-linkage_clustering.
[51] https://en.wikipedia.org/wiki/UPGMA.
[52] Sreedhar Kumar, S & Madheswaran, M, "An Improved Partitioned Clustering Technique for Identifying Optimum Number of Dissimilar Groups in Multimedia Dataset", European Journal of Scientific Research, vol. 151, no. 1, 2018, pp. 5-21.
How to Cite
Kumar S, S., M, M., B A, V., Singh H, M., & K V, C. (2019). A brief survey of unsupervised agglomerative hierarchical clustering schemes. International Journal of Engineering & Technology, 8(1), 29-37. https://doi.org/10.14419/ijet.v8i1.13971