Investigation on Processing of Real-Time Streaming Big Data

  • Abstract

    MapReduce, part of the Hadoop big data ecosystem, is the most widely used model for processing very large data sets, and its processing functions deliver high-quality, efficient results. Hadoop is well suited to batch jobs, but demand is growing for non-batch workloads such as interactive jobs and high-volume data streams. Because Hadoop is not well suited to these non-batch workloads, new solutions have been proposed to meet these challenges. In this paper, these solutions are divided into two categories: real-time processing and stream processing of big data. For each category, we discuss its paradigms, strengths, and differences from Hadoop, and we present practical systems and frameworks. Finally, some experiments are conducted to compare these new solutions with available Hadoop-based solutions.
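    As a rough illustration of the batch versus streaming distinction in the abstract, the following Python sketch contrasts a MapReduce-style word count, in which the whole input is mapped, shuffled, and reduced in one pass, with a record-at-a-time version that updates its state as each record arrives. It is a single-process toy that assumes in-memory lists of text lines; the function names (map_phase, shuffle, reduce_phase, batch_word_count, streaming_word_count) are hypothetical and are not part of the Hadoop, Storm, or Spark APIs.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group the intermediate values by key (the word)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key into a single count."""
    return key, sum(values)


def batch_word_count(lines: list[str]) -> dict[str, int]:
    """Batch style: the complete input exists before any output is produced."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


def streaming_word_count(lines: Iterable[str]) -> Iterator[dict[str, int]]:
    """Streaming style: update running state per record and emit a snapshot."""
    counts: Counter[str] = Counter()
    for line in lines:
        for word, one in map_phase(line):
            counts[word] += one
        yield dict(counts)  # copy, so later updates do not alter this snapshot


if __name__ == "__main__":
    records = ["Hadoop batch jobs", "stream jobs", "stream data"]
    print(batch_word_count(records))
    # {'hadoop': 1, 'batch': 1, 'jobs': 2, 'stream': 2, 'data': 1}
    for snapshot in streaming_word_count(records):
        print(snapshot)  # a partial result is available after every record
```

    The final streaming snapshot equals the batch result; the difference is that the streaming version exposes partial counts as records arrive instead of waiting for the whole input. The sketch deliberately ignores distribution, fault tolerance, and windowing, which real stream processing frameworks must handle.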



Article ID: 16329
DOI: 10.14419/ijet.v7i3.13.16329

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.