Migrating From Data Mining to Big Data Mining

  • Authors

    • Gourav Bathla
    • Himanshu Aggarwal
    • Rinkle Rani
    2018-06-25
    https://doi.org/10.14419/ijet.v7i3.4.14667
  • Keywords: Clustering, Classification, Big Data, Hadoop, MapReduce
  • Data mining is one of the most researched fields in computer science. Many studies have been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications produce Big data: data at a scale that is beyond the capability of traditional data mining techniques. Traditional data mining algorithms are observed to be unable to store and process data at this scale, and where some algorithms can, their response time is very high. Big data contains hidden information which, if analysed intelligently, can be highly beneficial for business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within a single research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, along with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both a conceptual and an empirical perspective. Big data technologies such as Hadoop and MapReduce are used for implementing Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
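
As a rough illustration of how the mean and Euclidean distance calculations in K-means might map onto the MapReduce model, the sketch below runs one K-means iteration as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points in each group). It is a single-process approximation in plain Python, not the paper's implementation and not Hadoop code; the function names `map_assign`, `reduce_mean` and `kmeans_iteration` are hypothetical, as is the choice to keep an old centroid when no point is assigned to it.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: the new centroid is the coordinate-wise mean."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One K-means iteration written as map -> group by key -> reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)  # map
        groups[key].append(value)              # group ("shuffle") by key
    # Assumption: a centroid with no assigned points is left unchanged.
    return [reduce_mean(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_iteration(data, [(0, 1), (9, 9)]))
```

In an actual MapReduce job the map calls would run in parallel over partitions of the data and the framework would perform the grouping; the iteration would be repeated until the centroids stop changing.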

     

     

  • How to Cite

    Bathla, G., Aggarwal, H., & Rani, R. (2018). Migrating From Data Mining to Big Data Mining. International Journal of Engineering & Technology, 7(3.4), 13-16. https://doi.org/10.14419/ijet.v7i3.4.14667