Study With Comparing Big-Data Handling Techniques using Apache Hadoop Map Reduce VS Apache Spark
2019-02-15 https://doi.org/10.14419/ijet.v7i4.1.15997
Big Data, Hadoop, Map Reduce, Spark
Abstract
Today's digital world struggles with massive volumes of information, and this has created demand for new, advanced software frameworks that can efficiently process large present-day datasets. Because the amount of digital information is doubling rapidly, existing and traditional Big Data (BD) tools are becoming insufficient as the processing of enormous datasets shifts toward distributed, parallel, and batch approaches. Before evaluating tools and technologies, it is essential to understand what they are being evaluated for. As the number of options grows, choosing suitable Big Data tools becomes increasingly difficult. Each existing tool has its own merits, disadvantages, and limitations, and many of them overlap in their use cases. This survey focuses on BD, in particular on the area of analytics tools. Because nearly every computation in today's digital world is expected to run online as interactive processing, the survey also introduces Apache Spark, a free, open-source Apache tool intended to overcome the restrictions and issues of Hadoop.
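To make the Map Reduce model named in the title concrete, the following is a minimal single-process Python sketch of a word count split into the map, shuffle, and reduce phases. It does not use the Hadoop or Spark APIs. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration; on a real cluster, the framework would run each phase in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper (hypothetical name): emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group every emitted value by its key, which the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values that share one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases: map every line, group by word, then reduce each group.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Big data needs Spark", "Hadoop and Spark process big data"]
    print(word_count(docs))
    # prints {'and': 1, 'big': 2, 'data': 2, 'hadoop': 1, 'needs': 1, 'process': 1, 'spark': 2}
```

One commonly cited difference between the two frameworks concerns intermediate data. Hadoop MapReduce writes each job's output to HDFS before a following job can read it, whereas Spark can keep intermediate datasets in memory across stages. This tends to favor Spark for iterative and interactive workloads of the kind the abstract describes.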
How to Cite
Deshai, N., V.D.S.Sekhar, B., Venkataramana, S., V.S.S.S.Chakravarthy, V., & S.R.Chowdary, P. (2019). Study With Comparing Big-Data Handling Techniques using Apache Hadoop Map Reduce VS Apache Spark. International Journal of Engineering & Technology, 7(4), 4839-4843. https://doi.org/10.14419/ijet.v7i4.1.15997
Received date: 2018-07-22
Accepted date: 2018-09-06
Published date: 2019-02-15