Data analysis using representation theory and clustering algorithms

  • Authors

    • Suboh Alkhushayni, Minnesota State University, Mankato
    • Taeyoung Choi, Minnesota State University, Mankato
    • Du'a Alzaleq, Minnesota State University, Mankato

    Published: 2020-12-18
    DOI: https://doi.org/10.14419/ijet.v9i4.31234
  • Keywords

    Representation Theory, Data Analysis, Persistence Homology, Agglomerative Hierarchical Clustering, K-Means, Cosine Distance, Manhattan Distance, Minkowski Distance, Single Cluster, Complete Cluster, Average Cluster.
  • Abstract

    This work aims to expand knowledge in the area of data analysis through persistence homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine dataset, which is available in RStudio, was analyzed using several clustering algorithms: Hierarchical Clustering, K-Means Clustering, and PAM Clustering. The goal of the analysis was to determine which clustering method is most appropriate for a given numerical dataset. By testing the data, we also sought to identify the appropriate agglomerative hierarchical clustering method and the optimal clustering algorithm among three approaches: K-Means, PAM, and Random Forest.
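
    As a rough illustration of the workflow described in the abstract, the sketch below (not the authors' code) runs the three clustering algorithms on the Wine data in R. It assumes the data is loaded as a data frame wine whose first column, Type, holds the cultivar label and whose remaining 13 columns are numeric attributes (for example, via the rattle.data package).

      library(cluster)                            # provides pam()
      data(wine, package = "rattle.data")         # assumption: Wine data from this package
      x <- scale(wine[, -1])                      # standardize the 13 numeric variables

      hc  <- hclust(dist(x), method = "average")  # agglomerative hierarchical clustering
      hcl <- cutree(hc, k = 3)                    # cut the dendrogram into 3 groups
      km  <- kmeans(x, centers = 3, nstart = 25)  # K-Means with 3 centers
      pm  <- pam(x, k = 3)                        # PAM (k-medoids) with 3 clusters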

    By comparing each model's accuracy value against the cultivar coefficients, we concluded that the K-Means method is the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower distance with Random Forest are the most beneficial approaches when using categorical variables. Given a data set and proper analysis, these tests can also determine the optimal number of clustering groups. The results of this project can be applied to several industrial areas, such as the clinical and business fields. For example, in the clinical field, patients can be grouped according to a common disease, required therapy, or other characteristics. Similarly, in the business field, clustered groups can be formed based on marginal profit, marginal cost, or other economic indicators.
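
    The sketch below, under the same assumptions as the previous block, illustrates the comparison described above: each clustering is matched against the known cultivar labels with a confusion table, a simple best-match agreement rate stands in for the accuracy value, and the average silhouette width is used as one common way to pick the number of clusters. The Gower/PAM lines for categorical or mixed variables are shown only as a commented hint, since the mixed data frame there is hypothetical.

      library(cluster)                                 # silhouette(), pam(), daisy()

      # Agreement between the K-Means clusters and the cultivar labels
      tab <- table(Cluster = km$cluster, Cultivar = wine$Type)
      accuracy <- sum(apply(tab, 2, max)) / sum(tab)   # best-match agreement rate
      accuracy

      # Choosing the number of clusters via the average silhouette width
      sil <- sapply(2:6, function(k) {
        cl <- kmeans(x, centers = k, nstart = 25)$cluster
        mean(silhouette(cl, dist(x))[, 3])             # mean silhouette width for k clusters
      })
      (2:6)[which.max(sil)]                            # k with the largest average width

      # For categorical or mixed variables: Gower distance with PAM
      # g  <- daisy(mixed_data, metric = "gower")      # mixed_data is a hypothetical data frame
      # pg <- pam(g, k = 3)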

  • How to Cite

    Alkhushayni, S., Choi, T., & Alzaleq, D. (2020). Data analysis using representation theory and clustering algorithms. International Journal of Engineering & Technology, 9(4), 887-899. https://doi.org/10.14419/ijet.v9i4.31234

    Received date: 2020-10-25

    Accepted date: 2020-11-24

    Published date: 2020-12-18