A brief survey of unsupervised agglomerative hierarchical  clustering schemes

Sreedhar Kumar S; Madheswaran M; Vinutha B A; Manjunatha Singh H; Charan K V

doi:10.14419/ijet.v8i1.13971

Article Summary Abstract References Full Article How to cite

Authors
- Sreedhar Kumar S Chandigarh University
- Madheswaran M Chandigarh University
- Vinutha B A
- Manjunatha Singh H
- Charan K V
2019-01-27

https://doi.org/10.14419/ijet.v8i1.13971
Agglomerative Hierarchical Clustering, Clustering Process, Distance Metric, Divisive Hierarchical Clustering, Similarity Measure, Linkage Method.
Abstract

Unsupervised hierarchical clustering process is a mathematical model or exploratory tool aims to provide the easiest way to categorize the distinct groups over the large volume of real time observations or dataset in tree form based on nature of similarity measures without prior knowledge. Dataset is an important aspect in the hierarchical clustering process that denotes the behavior of living species depicts the properties of a natural phenomenon and result of a scientific experiment and observation of a running machinery system without label identification. The hierarchical clustering scheme consists of Agglomerative and Divisive that is applicable to employ into various scientific research areas like machine learning, pattern recognition, big data analysis, image pixel classification, information retrieval, and bioinformatics for distinct patterns identification. This paper discovered a brief survey of agglomerative hierarchical clustering schemes with its clustering procedures, linkage metrics, complexity analysis, key issues and development of AHC scheme.
Â
References
1. [1] Antti Honkela, Jeremias Seppa & Esa Alhoniemi, â€œAgglomerative independent variable group analysisâ€, Neurocomputing, vol. 71, 2008, pp. 1311-1320. https://doi.org/10.1016/j.neucom.2007.11.024.
  
  [2] Athman Bauguettaya, Qi Yu, Xumin Liu, Xiangmin Zhou & Andy Song, â€œEfficient Agglomerative Hierarchical Clusteringâ€, Expert Systems with Applications, vol. 42, no. 5, 2015, pp. 2785-2797. https://doi.org/10.1016/j.eswa.2014.09.054.
  
  [3] Caiming Zhong, Duoqian Miao & Pasi Franti, â€œMinimum Spanning tree based split-and-merge: A hierarchical clustering methodâ€, Information Sciences, vol. 181, 2011, pp. 3397-3410.
  
  [4] Chih-Tang Chang, Lai Jim, ZC & Jeng, MD, â€œFast agglomerative clustering using information of k-nearest neighborsâ€, Pattern Recognition, vol. 43, 2010, pp. 3958-3968. https://doi.org/10.1016/j.patcog.2010.06.021.
  
  [5] Chung-Chain Hsu, Chin-Long Chen & Yu-Wei Su, â€œHierarchical clustering of mixed data based on distance hierarchyâ€, Information Sciences, vol. 177, no. 20, 2007, pp. 4474-4492. https://doi.org/10.1016/j.ins.2007.05.003.
  
  [6] Davies, DL & Bouldin, DW, â€œCluster Separation Measureâ€, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 1,
  no. 2, 1979, pp. 95-105. https://doi.org/10.1109/TPAMI.1979.4766909.
  
  [7] De Amorim, RC, â€œFeature Relevance in Wardâ€s Hierarchical Clustering Using the Lp Normâ€, Journal of Classification, vol. 32,
  no. 1, 2015, pp. 46â€“62. https://doi.org/10.1007/s00357-015-9167-1.
  
  [8] Defays, D, â€œAn efficient algorithm for a complete link methodâ€œ, Computer Journal British Computer Society, vol. 20, no. 4,
  1977, pp. 364â€“366. https://doi.org/10.1093/comjnl/20.4.364.
  
  [9] Duda, R, Hart, PE & Stork, DG, â€œPattern classificationâ€, 2^nd edition, New York, NY: John Wiley & Sons, 2001.
  
  [10] Dutta, M, Kakoti Mahanta, A & Arun Pujari, K, â€œQROCK: A quick version of the algorithm for clustering of categorical dataâ€, Pattern Recognition Letter, vol.26, 2005, pp. 2364-2373. https://doi.org/10.1016/j.patrec.2005.04.008.
  
  [11] Fionn Murtagh & Legendre P, â€œWards hierarchical agglomerative clustering method: which algorithms implement wards criterionâ€, Journal of Classification, vol. 31, 2014, pp. 274-295. https://doi.org/10.1007/s00357-014-9161-z.
  
  [12] Franti, P, Kaukoranta, T, Sen, DF & Chang, KS, â€œFast and memory efficient implementation of the exact PNNâ€, IEEE Transaction on Image Processing, vol. 9, 2000, pp. 773-777. https://doi.org/10.1109/83.841516.
  
  [13] George Karypis, Eui-Hong Han & Vipin Kumar, â€œChameleon: Hierarchical Clustering Using Dynamic Modeling, Computersâ€, vol. 32, no. 8, 1999, pp. 68-75. https://doi.org/10.1109/2.781637.
  
  [14] George Kollios, Dimitrios Gunopulos, Nick Koudas & Stefan Berchtold, â€œEfficient Biased Sampling for Approximate Clustering and Outlier Detection in Large Data Setsâ€, IEEE Transaction on Knowledge and Data Engineering, vol. 15, no. 5, 2003, pp. 1170-1187. https://doi.org/10.1109/TKDE.2003.1232271.
  
  [15] Gower J, â€œA comparison of some methods of cluster analysisâ€, Biometrics, vol. 23, no. 4, 1967, pp. 623-628. https://doi.org/10.2307/2528417.
  
  [16] Gower, JC & Ross, GJS, â€œMinimum spanning trees and single linkage cluster analysisâ€, Journal of the Royal Statistical Society, Series C, vol. 18, no. 1, 1969, pp. 54â€“64. https://doi.org/10.2307/2346439.
  
  [17] Ham, J & Kamber, M, â€œData Mining: Concepts and Techniquesâ€, 2nd Edition. Kaufmann, 2006.
  
  [18] Hisashi Koga, Tetsuo Ishibashi & Toshinori Watanabe, â€œFast agglomerative hierarchical cluatering algorithm using locality-sensitive hashingâ€, Knowledge and Information Systems , vol. 12, 2007, pp. 25-53. https://doi.org/10.1007/s10115-006-0027-5.
  
  [19] Jain, A & Dubes, R, â€œAlgorithm for clustering dataâ€, Englewood Cliffs, NJ: Prentice Hall, 1988.
  
  [20] Jain, AK, Murty, MN & Flynn, PJ, â€œData Clustering: A Reviewâ€, ACM Computer Surveys, vol. 31, no. 3, 1988, pp. 264-323. https://doi.org/10.1145/331499.331504.
  
  [21] Johnson, S, â€œHierarchical clustering schemesâ€, Psychometrika, vol. 32, no. 3, 1967, pp. 241-254. https://doi.org/10.1007/BF02289588.
  
  [22] Lai Jim, ZC, Juan Eric, YT & Lai Franklin, JC, â€œRough clustering using generalized fuzzy clustering algorithmâ€, Pattern Recognition, vol. 46, 2013, pp. 2538-2547. https://doi.org/10.1016/j.patcog.2013.02.003.
  
  [23] Lai Jim, ZC & Tsung-Jen Huang, â€œAn agglomerative clustering algorithm using a dynamic k-nearest-neighbor listâ€, Information Sciences, vol. 181, 2011, pp. 1722-1734. https://doi.org/10.1016/j.ins.2011.01.011.
  
  [24] Lance, G & Williams, W, â€œA general theory of classification sorting strategies: Hierarchical systemâ€, Computer Journal, vol. 9, 1967, pp. 373-380. https://doi.org/10.1093/comjnl/9.4.373.
  
  [25] Lin, CR & Chen, MS, â€œCombining partitional and hierarchical algorithms for robust and efficient data clustering with chesion self-mergingâ€, IEEE Transaction on Knowledge and Data Engineering, vol. 17, 2005, pp. 145-159. https://doi.org/10.1109/TKDE.2005.21.
  
  [26] Manoranjan Dash, Huan Liu, Peter Scheuermann & Kian Lee Tan, â€œFast Hierarchical Clustering and its validationâ€, Data and Knowledge Engineering, vol. 44, 2003, pp. 109-138. https://doi.org/10.1016/S0169-023X(02)00138-6.
  
  [27] Mark Junjie Li, Michael K Ng, Yiu-ming Cheung & Joshua Zhexue Huang, â€œAgglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clustersâ€, IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, 2008, pp. 1519-1534. https://doi.org/10.1109/TKDE.2008.88.
  
  [28] Martin Ester, Hans-peter Kriegel, Jorgs & Xiaowei Xu, â€œA denisty-based algorithm for discovering cluster in large spatial data base with noiseâ€, 2nd International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226- 231.
  
  [29] Mcquitty, L, â€œSimilarity analysis by reciprocal pairs for discrete and continuous dataâ€, Educational and Psychological Measurement, vol. 27, 1966, pp. 21-46. https://doi.org/10.1177/001316446702700103.
  
  [30] Murtagh, F, â€œSurvey of recent advances in hierarchical clustering algorithmsâ€, The Computer Journal, vol. 26, no. 4, 1983, pp. 354-359. https://doi.org/10.1093/comjnl/26.4.354.
  
  [31] Murtagh, F 1984, â€œComplexities of Hierarchic Clustering Algorithms: the state of the artâ€, Computational Statistics Quarterly, vol. 1, 1984, pp. 101â€“113.
  
  [32] Ng, RT & Jiawei Han, â€œCLARANS: A Method for clustering objects for spatial data miningâ€, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 1, 2002, pp. 1003-1016. https://doi.org/10.1109/TKDE.2002.1033770.
  
  [33] Pasi Franti, Olli Virmajoki & Ville Hautamaki, â€œFast Agglomerative Clustering using a k-Nearest Neighbor Graphâ€, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol.28, 2006, pp. 1875-1881. https://doi.org/10.1109/TPAMI.2006.227.
  
  [34] Rui, XU & Donald Wunsch, C, â€œClusteringâ€, IEEE Press, 2009.
  
  [35] Sibson, R, â€œSLINK: an optimally efficient algorithm for the single-link cluster method, The Computer Journal (British Computer Society), vol. 16, no. 1, 1973, pp. 30â€“34. https://doi.org/10.1093/comjnl/16.1.30.
  
  [36] Sneath, P, â€œThe application of computers to taxonomyâ€, Journal of general microbiology, vol. 17, 1957, pp. 201-226.
  
  [37] Sokal, R & Michener, C, â€œA statistical method for evaluating systematic relationshipsâ€, University of Kansas Science Bulletin, vol. 38, 1958, pp. 1409-1438.
  
  [38] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, â€œCURE: An Effective Clustering Algorithm for Large Databaseâ€, Information Systems, vol. 26, 2001, pp. 35-58. https://doi.org/10.1016/S0306-4379(01)00008-4.
  
  [39] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, â€œROCK: A Robust Clustering Algorithm for Categorical Attributesâ€, Information System, vol. 25, no. 5, 2000, pp. 512-521. https://doi.org/10.1016/S0306-4379(00)00022-3.
  
  [40] Szekely Gabor, J & Rizzo Maria, L, â€œHierarchical clustering via joint between-within distance: extending wards minimum variance methodâ€, Journal of Classification, vol. 22, 2005, pp. 151-183. https://doi.org/10.1007/s00357-005-0012-9.
  
  [41] Vijaya, PA, Narasimha Murty, M & Subramanian, DK, â€œLeader-Subleaders: An Efficient Hierarchical Clustering Algorithm for Large datasetsâ€, Pattern Recognition Letter, vol. 25, 2004, pp. 505-513. https://doi.org/10.1016/j.patrec.2003.12.013.
  
  [42] Ward, J, â€œHierarchical grouping to optimize an objective functionâ€, Journal of the American statistical association, vol. 58, 1963, pp. 236-244. https://doi.org/10.1080/01621459.1963.10500845.
  
  [43] Wei Zhang, Gongxuan Zhang, Yongli Wang, Zhaomeng Zhu & Tao Li, â€œNNB: An Efficient Nearest Neighbor Search Method for Hierarchical Clustering on Large Datasetsâ€, IEEE International Conference on Semantic Computing (ICSC), 2015, pp. 405-412. https://doi.org/10.1109/ICOSC.2015.7050840.
  
  [44] Zhang, T, Ramakrishnan, R & Livny, M, â€œBIRCH: an efficient data clustering method for very large databasesâ€, International Journal of Windom (Ed.) ACM SIGMOD International Conference on Management of data, 1996, pp. 103-114. https://doi.org/10.1145/233269.233324.
  
  [45] Krishnamoorthy, K & Sreedhar kumar, S, â€œAn Improved Agglomerative Clustering Algorithm for Outlier Detectionâ€, Applied Mathematics and Information Sciences, vol.10, no. 3, 2016, pp.1125-1138, doi.org/10.18576/amis/100332, https://doi.org/10.18576/amis/100332.
  
  [46] Madheswaran M & Sreedhar kumar, S, â€œAn Improved Frequency Based Agglomerative Clustering Algorithm for Detecting Distinct Clusters on Two Dimensional Dataset.â€, Applied Mathematics and Information Sciences, vol. 9, no. 4, 2017, pp. 30-41, 10.5897/JETR2017.0628, 2017.
  
  [47] https://en.wikipedia.org/wiki/Hierarchical_clustering.
  
  [48] https://en.wikipedia.org/wiki/Euclidean_distance.
  
  [49] https://en.wikipedia.org/wiki/Single-Linkage_clustering.
  
  [50] https://en.wikipedia.org/wiki/Complete-linkage_clustering.
  
  [51] https://en.wikipedia.org/wiki/UPGMA.
  
  [52] Sreedhar Kumar S, Madheswaran M, â€œAn improved Partitioned Clustering Technique for Identifying Optimum Number of Dissimilar Groups in Multimedia Dataset,â€ European Journal of Scientific Research, vol. 151, no. 1, pp. 5-21, 2018.
Downloads
How to Cite
Kumar S, S., M, M., B A, V., Singh H, M., & K V, C. (2019). A brief survey of unsupervised agglomerative hierarchical clustering schemes. International Journal of Engineering & Technology, 8(1), 29-37. https://doi.org/10.14419/ijet.v8i1.13971
ACM

ACS

APA

ABNT

Chicago

Harvard

IEEE

MLA

Turabian

Vancouver

Download Citation

Endnote/Zotero/Mendeley (RIS)

BibTeX

A brief survey of unsupervised agglomerative hierarchical clustering schemes

Authors

Abstract

References

Downloads

How to Cite

Published