Feature Selection using Genetic Algorithm for Clustering high Dimensional Data

Kahkashan Kouser; Amrita Priyam

doi:10.14419/ijet.v7i2.11.11001


2018-04-03

Keywords: feature selection, clustering, high dimensional data, genetic algorithm.
Abstract

Clustering high-dimensional data is one of the open problems of modern data mining. This paper proposes a new technique, GA-HDClustering, which works in two steps. First, a GA-based feature selection algorithm determines the optimal feature subset, that is, a subset consisting of the important features of the entire data set. Next, a K-means algorithm is applied to this optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied to the full-dimensional feature space. Finally, the result of GA-HDClustering is compared with that of the traditional clustering algorithm using several validity metrics: sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD), and the Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space formed by all dimensions of the data set. Experiments on a standard data set showed that GA-HDClustering is superior to the traditional clustering algorithm.
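The two-step idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the fitness function or the genetic operators, so this sketch assumes the Davies-Bouldin index (lower is better) as fitness, truncation selection, one-point crossover and bit-flip mutation. All function names, parameter values and the synthetic data generator are hypothetical, chosen only for illustration.

```python
import random
from math import dist


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on a list of tuples; returns (centroids, labels)."""
    def nearest(p):
        return min(range(k), key=lambda c: dist(p, centroids[c]))

    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p) for p in points]


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index: mean over clusters of the worst scatter/separation ratio."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        scatter.append(sum(dist(p, centroids[c]) for p in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / max(dist(centroids[i], centroids[j]), 1e-12)
                     for j in range(k) if j != i)
    return total / k


def project(data, mask):
    """Keep only the columns whose mask bit is True."""
    return [tuple(v for v, keep in zip(row, mask) if keep) for row in data]


def fitness(data, mask, k, seed):
    """Assumed fitness: DBI of K-means run on the selected features (lower is better)."""
    if not any(mask):
        return float("inf")  # an empty feature subset is never acceptable
    sub = project(data, mask)
    centroids, labels = kmeans(sub, k, random.Random(seed))
    return davies_bouldin(sub, centroids, labels)


def ga_select(data, k, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search feature masks with a simple GA; returns the best mask found."""
    rng = random.Random(seed)
    n = len(data[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [True] * n  # include the full feature set as a baseline candidate

    def score(mask):
        return fitness(data, mask, k, seed)

    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([not g if rng.random() < p_mut else g for g in child])
        pop = parents + children
    return min(pop, key=score)


def demo_data(seed=1, n_per_cluster=30, n_noise=4):
    """Synthetic data: two informative features plus uniform noise features."""
    rng = random.Random(seed)
    rows = []
    for center in (0.0, 5.0):
        for _ in range(n_per_cluster):
            informative = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
            noise = [rng.uniform(0, 10) for _ in range(n_noise)]
            rows.append(tuple(informative + noise))
    return rows


if __name__ == "__main__":
    data = demo_data()
    best = ga_select(data, k=2)
    print("selected mask:", best)
    print("DBI on subset:", fitness(data, best, 2, 0))
    print("DBI on all features:", fitness(data, [True] * len(data[0]), 2, 0))
```

Because the full feature set is placed in the initial population and the best half of each generation is carried over unchanged, the mask returned by this sketch can never score worse than the full-dimensional baseline under the assumed fitness. A real study would also report SSE, WGAD and BGD, as the abstract does, and would need to account for the fact that distance-based scores are not directly comparable across subspaces of different dimensionality.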
How to Cite
Kouser, K., & Priyam, A. (2018). Feature Selection using Genetic Algorithm for Clustering high Dimensional Data. International Journal of Engineering & Technology, 7(2.11), 27-30. https://doi.org/10.14419/ijet.v7i2.11.11001
Received date: 2018-04-03

Accepted date: 2018-04-03

Published date: 2018-04-03
