Enhanced Feature Selection Clustering Algorithm for Attribute Similarity in High Dimensional Data

Deena Babu Mandru; Y. K. Sundara Krishna

doi:10.14419/ijet.v7i4.29.21641


How to Cite

Babu Mandru, D., & K. Sundara Krishna, Y. (2018). Enhanced Feature Selection Clustering Algorithm for Attribute Similarity in High Dimensional Data. International Journal of Engineering and Technology, 7(3.29), 688-693. https://doi.org/10.14419/ijet.v7i4.29.21641

Received date: November 26, 2018

Accepted date: November 26, 2018

Published date: November 26, 2018
https://doi.org/10.14419/ijet.v7i4.29.21641
Keywords: Multi-attribute clustering, minimum spanning tree, feature selection, graph-theoretic clustering, feature subset.
Abstract

Data collection is a challenging problem in data mining because it draws on various attributes from dissimilar data sets. For some real-world data, partitioning a data set in real time when class-label instances behave abnormally is expensive and makes data presentation impractical. Data summarization driven by user preferences, based on clustering over different attributes, is nowadays another challenging problem. Traditionally, multi-attribute clustering frameworks were introduced to group multiple attributes and explore uncertain data in order to obtain reliable data sets. In a multi-attribute similarity measure for uncertain data, feature selection identifies the best-matched and most useful features, those that produce results compatible with the original set of features in the data. A feature selection algorithm must therefore be evaluated both for its efficiency in forming a feature subset and for the quality of that subset. In this paper, we propose and implement the Enhanced Feature Selection based Clustering (EFSC) algorithm to address these considerations. The method has two stages. In the first stage, features are divided into clusters using a graph-theoretic approach. In the second stage, the most representative attribute, the one most strongly related to the selected attribute, is chosen from each cluster to form the feature subset. We use a minimum spanning tree (MST) to form the clusters effectively. EFSC is compared with existing algorithms such as FCBF, ReliefF, CFS, Consist, and FOCUS-SF, using chosen classifiers before and after feature selection. Experiments on company statistical data and on text- and image-oriented data show that EFSC produces a small feature subset with high accuracy and low running time on real-time data sets.
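The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' EFSC implementation: the abstract does not specify the similarity measure or how the MST is partitioned, so the sketch assumes absolute Pearson correlation as the similarity, a fixed distance threshold (`cut`) for removing MST edges, and correlation with a target column as the measure of representativeness. The function names `abs_corr` and `select_features` are hypothetical.

```python
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov) / sqrt(vx * vy)


def select_features(columns, target, cut=0.5):
    """Return sorted indices of one representative feature per cluster.

    columns: list of feature columns (each a list of numbers).
    target:  the column that representatives should relate to (assumed).
    cut:     MST edges with distance above this value are removed (assumed).
    """
    m = len(columns)
    if m == 0:
        return []

    # Pairwise distance between features: 1 - |correlation|.
    dist = [[1.0 - abs_corr(columns[i], columns[j]) for j in range(m)]
            for i in range(m)]

    # Stage 1a: build a minimum spanning tree with Prim's algorithm.
    in_tree = [False] * m
    best = [float("inf")] * m
    parent = [-1] * m
    best[0] = 0.0
    edges = []
    for _ in range(m):
        u = min((i for i in range(m) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(m):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u

    # Stage 1b: keep only short edges; connected components are the clusters.
    root = list(range(m))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path compression
            i = root[i]
        return i

    for u, v, w in edges:
        if w <= cut:
            root[find(u)] = find(v)

    clusters = {}
    for i in range(m):
        clusters.setdefault(find(i), []).append(i)

    # Stage 2: from each cluster, keep the feature most related to the target.
    return sorted(
        max(members, key=lambda i: abs_corr(columns[i], target))
        for members in clusters.values()
    )
```

Given two nearly collinear columns and one unrelated column, the sketch groups the collinear pair into a single cluster and returns one feature from it plus the unrelated feature. Raising `cut` merges more features into each cluster and so shrinks the selected subset.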