Quasi-optimality under pseudo f statistic in clustering data
2018-05-16 https://doi.org/10.14419/ijet.v7i2.28.13205
Clustering, Difference, Pseudo F Statistic, Quasi-Optimum, Relative Difference.
Abstract
The pseudo F statistic is often used to decide the number of clusters: the set of clusters with the largest pseudo F value is selected as the optimum set. This paper proposes the quasi-optimum set of clusters, whose pseudo F value is larger than those of other sets whose numbers of clusters are close to its own. The before and behind (BB) difference of pseudo F values is proposed for finding the number of clusters in the quasi-optimum set. The relative BB difference, which is the ratio of the BB difference of pseudo F values to the pseudo F value itself, is also proposed for cases where the pseudo F value varies severely. Examples demonstrate that both the BB difference and the relative BB difference work well in finding quasi-optimum sets of clusters.
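The quantities named in the abstract can be sketched in a few lines of Python. The pseudo F statistic below follows the standard Calinski-Harabasz form, (B / (k - 1)) / (W / (n - k)), where B and W are the between-cluster and within-cluster sums of squares. The abstract does not give the formula for the BB difference, so the version here, the sum of the differences from the values just before and just behind, is an assumption made for illustration, as are all function names. Only the relative form, the BB difference divided by the pseudo F value itself, is taken directly from the abstract.

```python
from collections import defaultdict


def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo F: (B / (k - 1)) / (W / (n - k)).

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    Raises ZeroDivisionError if every point coincides with its cluster mean.
    """
    n = len(points)
    dim = len(points[0])
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("pseudo F needs 2 <= k < n")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # B: size-weighted squared distance of cluster means to the overall mean
    within = 0.0   # W: squared distance of points to their own cluster mean
    for rows in groups.values():
        center = mean(rows)
        between += len(rows) * sqdist(center, overall)
        within += sum(sqdist(r, center) for r in rows)
    return (between / (k - 1)) / (within / (n - k))


def bb_difference(f_by_k, k):
    """ASSUMED form: (F(k) - F(k-1)) + (F(k) - F(k+1)); not taken from the paper."""
    return (f_by_k[k] - f_by_k[k - 1]) + (f_by_k[k] - f_by_k[k + 1])


def relative_bb_difference(f_by_k, k):
    """BB difference divided by the pseudo F value itself (as stated in the abstract)."""
    return bb_difference(f_by_k, k) / f_by_k[k]


def rank_candidates(f_by_k, score=bb_difference):
    """Hypothetical helper: rank every k that has both neighbours, best score first."""
    ks = [k for k in f_by_k if k - 1 in f_by_k and k + 1 in f_by_k]
    return sorted(ks, key=lambda k: score(f_by_k, k), reverse=True)
```

Given a mapping from each number of clusters k to its pseudo F value, `rank_candidates(f_by_k)` orders the interior values of k by the assumed BB difference, and passing `score=relative_bb_difference` switches to the relative form for data whose pseudo F values differ greatly in scale.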
How to Cite
Hochin, T., Hayashi, Y., Nomiya, H., & U. Chowdhury, M. (2018). Quasi-optimality under pseudo f statistic in clustering data. International Journal of Engineering & Technology, 7(2.28), 320-324. https://doi.org/10.14419/ijet.v7i2.28.13205
Received date: 2018-05-23
Accepted date: 2018-05-23
Published date: 2018-05-16