Applying compression algorithms on Hadoop cluster implementing through Apache Tez and Hadoop MapReduce

  • Abstract

    Big Data is currently one of the most prominent topics in cloud research; its main characteristics are volume, velocity and variety. These characteristics are difficult to manage with traditional software and methodologies. Data arriving from the various big data domains is handled through Hadoop, an open-source software framework developed to address these problems. Big data analytics is carried out with the Hadoop MapReduce framework, the key engine of a Hadoop cluster, which is widely used today and works as a batch processing system.

    Apache developed an engine named "Tez", which supports interactive queries and does not write temporary data to the Hadoop Distributed File System (HDFS). This paper compares the performance of MapReduce and Tez; the two engines are examined with compressed input files and compressed map output files. For the comparison we used the Bzip compression algorithm for the input files and Snappy for the map output files. The Word Count and TeraSort benchmarks are used in our experiments. For the Word Count benchmark, the results show that the Tez engine has a better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% compared with the Hadoop MapReduce engine. For the TeraSort benchmark, however, the Tez engine has a higher execution time than the Hadoop MapReduce engine.
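    The compression setup described above could be expressed roughly as follows. This is a minimal sketch, not the authors' configuration: the two property names and the codec class are standard Hadoop 2.x settings, while the helper functions and the timings are hypothetical and exist only for illustration. Input compressed with bzip2 (assumed here to be what "Bzip" refers to) typically needs no job setting, because Hadoop selects the input codec from the file extension.

```python
# Illustrative sketch only: the helper names below are hypothetical.
# The property names and codec class are standard Hadoop 2.x settings.
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


def map_output_compression_flags(enabled=True, codec=SNAPPY_CODEC):
    """Return -D options that switch map-output compression on or off."""
    props = {"mapreduce.map.output.compress": str(enabled).lower()}
    if enabled:
        props["mapreduce.map.output.compress.codec"] = codec
    return [f"-D{key}={value}" for key, value in props.items()]


def percent_reduction(baseline_seconds, new_seconds):
    """Relative drop in execution time, as a percentage of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds


if __name__ == "__main__":
    # Options that could be passed to a job submitted via `hadoop jar ...`.
    print(" ".join(map_output_compression_flags()))
    # Made-up timings, chosen only to show how a reduction is computed.
    print(round(percent_reduction(100.0, 61.0), 1))
```

    The second helper shows how a figure such as "nearly 39%" is computed from a baseline (MapReduce) time and a new (Tez) time; the abstract does not give the measured values.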



  • Keywords

    Data; MapReduce; Compression; Tez; Hadoop.


Article ID: 12539
DOI: 10.14419/ijet.v7i2.26.12539

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.