An efficient and robust cluster based outlying points detection in multivariate data sets

  • Authors

    • S. Anitha
    • Dr. Mary Metilda
    https://doi.org/10.14419/ijet.v7i4.21505
  • Abstract

    An outlier is a data point that does not conform to the normal points in a data set. Recent research has focused on the number of clusters and on distance-based outlier detection strategies. In this paper, outliers are identified and eliminated in four phases. A feature selection technique using a genetic algorithm is applied to the pre-processed data to reduce a large data set to its significant attributes. After feature selection, the data set is partitioned into clusters. Multiple outliers are then identified by the Mahalanobis distance, computed from the median and the covariance matrix. Four real-life data sets are taken from the UCI Machine Learning Repository, and rigorous experiments are conducted with the proposed GBFS, CLOPD, and IMO methods for selecting relevant subsets, clustering, and outlier removal. These three methods are analysed on the data sets and the results are depicted. The approach is used to reduce time complexity and to improve clustering and classification accuracy.
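
    As an illustration of the distance step only, the following is a minimal sketch of flagging points by their squared Mahalanobis distance from the coordinate-wise median. It is not the paper's IMO procedure: the function name `mahalanobis_outliers`, the covariance estimate taken about the median, and the fixed `threshold` argument are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np


def mahalanobis_outliers(X, threshold):
    """Return squared Mahalanobis distances and a boolean outlier mask.

    Assumptions (not taken from the paper): the centre is the
    coordinate-wise median, the covariance is estimated about that
    centre, and a point is flagged when its squared distance exceeds
    a caller-supplied threshold.
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)           # robust location estimate
    diff = X - center                       # deviations from the median
    cov = diff.T @ diff / (X.shape[0] - 1)  # covariance about the median
    cov_inv = np.linalg.pinv(cov)           # pseudo-inverse tolerates a singular covariance
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2, d2 > threshold
```

    A caller would pass one cluster's points as `X` and choose `threshold` to suit the data, for example a chi-square quantile with as many degrees of freedom as there are attributes. Because the covariance here still includes the outlying points, strong outliers can inflate it and partly mask themselves, so a robust covariance estimate may be preferable in practice.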

  • How to Cite

    Anitha, S., & Metilda, D. M. (2018). An efficient and robust cluster based outlying points detection in multivariate data sets. International Journal of Engineering & Technology, 7(4), 2881-2885. https://doi.org/10.14419/ijet.v7i4.21505