Enhanced Feature Selection Clustering Algorithm for Attribute Similarity in High Dimensional Data

  • Authors

    • Deena Babu Mandru
    • Y. K. Sundara Krishna
    2018-11-26
    https://doi.org/10.14419/ijet.v7i4.29.21641
  • Keywords: multi-attribute clustering, minimum spanning tree, feature selection, graph-theoretic clustering, feature subset.
  • Abstract: Data collection is a demanding task in data mining because it draws on varied attributes from dissimilar data sets. For some real-world data, partitioning a real-time data set whose class labels include abnormal behavioral instances is expensive, and presenting the data may be impossible. Summarizing data according to user preferences by clustering on different attributes is a further challenge. Traditionally, a multi-attribute clustering framework was introduced to group multiple attributes and explore uncertain data in order to obtain reliable data sets. In a multi-attribute similarity measure for uncertain data, feature selection identifies the best-matching and most useful features, those that produce results compatible with the original feature set of the data. A feature selection algorithm is therefore needed that forms a feature subset efficiently while assuring the quality of that subset. In this paper, we propose and implement the Enhanced Feature Selection based Clustering (EFSC) algorithm to address these considerations. The method has two stages. In the first stage, features are grouped into clusters using a graph-theoretic approach. In the second stage, the most representative attribute, the one most strongly related to the selected attribute, is chosen from each cluster and added to the feature subset. We use a minimum spanning tree (MST) to form the clusters effectively with respect to the feature subset. EFSC is compared with existing algorithms such as FCBF, ReliefF, CFS, Consist, and FOCUS-SF, using the chosen classifiers before and after feature selection. Experiments on company statistical data with text and image-oriented data show that EFSC produces a small feature subset with high accuracy and low running time on real-time data sets.
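
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' EFSC implementation: the abstract does not state the similarity measure or the rule for splitting the tree, so the sketch assumes symmetric uncertainty (a measure used by filter methods such as FCBF) as the similarity, an edge weight of 1 - SU, and a fixed `cut` threshold. The names `select_features`, `symmetric_uncertainty`, and `cut` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, target, cut=0.5):
    """Assumed two-stage selection over discrete features.

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels, same length as each feature
    cut:      assumed threshold; MST edges heavier than this are removed
    """
    names = sorted(features)
    if not names:
        return []

    def weight(a, b):
        # Similar features get a small weight, so the MST links them first.
        return 1.0 - symmetric_uncertainty(features[a], features[b])

    # Stage 1a: minimum spanning tree of the complete feature graph (Prim).
    in_tree = {names[0]}
    mst_edges = []
    while len(in_tree) < len(names):
        w, a, b = min(
            (weight(a, b), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        mst_edges.append((w, a, b))
        in_tree.add(b)

    # Stage 1b: keep only light edges; connected components are the clusters.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for w, a, b in mst_edges:
        if w <= cut:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # Stage 2: one representative per cluster, the feature most related
    # to the class labels.
    relevance = {n: symmetric_uncertainty(features[n], target) for n in names}
    return sorted(max(members, key=relevance.get) for members in clusters.values())


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    data = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],  # matches the labels exactly
        "b": [0, 0, 0, 1, 1, 1, 1, 1],  # noisy copy of "a"
        "c": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to "a"
    }
    print(select_features(data, labels))
```

In the toy data, "a" and "b" are redundant, so they fall into one cluster and only the more label-relevant of the two is kept. The sketch does not filter out features that are irrelevant to the labels, and it assumes discrete (or already discretized) values.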

  • How to Cite

    Babu Mandru, D., & K. Sundara Krishna, Y. (2018). Enhanced Feature Selection Clustering Algorithm for Attribute Similarity in High Dimensional Data. International Journal of Engineering & Technology, 7(3.29), 688-693. https://doi.org/10.14419/ijet.v7i4.29.21641