Improving Hadoop performance in heterogeneous big data environments by dynamic slot configurations in MapReduce Hadoop programming model

  • Authors

    • Abolgasem M. Ali Enfais UNIVERSITY
    • Adam Amril Jaharadak UNIVERSITY
    • Amad Abdelkarim El Marghani UNIVERSITY
    2019-07-22
    https://doi.org/10.14419/ijet.v7i4.26653
  • Hadoop, MapReduce, Scheduling Algorithms, Workload, Heterogeneous Data.
  • Hadoop has been developed as a platform for processing large-scale data in parallel for different applications in cloud computing. A Hadoop system can be characterized by three main factors: cluster, workload, and user. Each of these factors can be heterogeneous, and together they reflect the overall degree of heterogeneity of the Hadoop system. This paper investigates how heterogeneity in each factor affects Hadoop performance under different schedulers. Three schedulers that account for different levels of Hadoop heterogeneity are used for the analysis: FIFO, Fair sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). Performance issues of Hadoop schedulers are discussed, and a comparative performance analysis is carried out across different cases of job submission. The jobs are processed in heterogeneous data environments under either a fixed or a reconfigurable allocation of slots between map and reduce tasks in the Hadoop MapReduce Java programming model on a cluster. The results showed that making the split between map and reduce slots a tunable knob under a scheduler such as FIFO improved performance by about 81.42%, especially in heterogeneous environments, where the workload decreased significantly and the utilization of computational resources increased noticeably.
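
The difference between a fixed and a reconfigurable map/reduce slot split can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation and not Hadoop's scheduler code. It assumes that every task takes one time unit, that jobs are served in FIFO order, and that a job's reduce tasks become runnable only after all of its map tasks finish. `dynamic_policy` is a hypothetical rule that splits slots in proportion to current map and reduce demand. All names (`fixed_policy`, `dynamic_policy`, `simulate`) are illustrative.

```python
from typing import Callable, List, Tuple

# A policy maps (total_slots, pending_maps, ready_reduces) to (map_slots, reduce_slots).
Policy = Callable[[int, int, int], Tuple[int, int]]


def fixed_policy(map_slots: int, reduce_slots: int) -> Policy:
    """Static configuration: the map/reduce split is chosen once and never changes."""
    def policy(total: int, pending_maps: int, ready_reduces: int) -> Tuple[int, int]:
        return map_slots, reduce_slots
    return policy


def dynamic_policy(total: int, pending_maps: int, ready_reduces: int) -> Tuple[int, int]:
    """Hypothetical reconfigurable split: share slots in proportion to current
    map/reduce demand, then give slots one side cannot use to the other side."""
    demand = pending_maps + ready_reduces
    if demand == 0:
        return total, 0
    maps = min(pending_maps, round(total * pending_maps / demand))
    reduces = min(ready_reduces, total - maps)
    maps = min(pending_maps, total - reduces)
    return maps, reduces


def simulate(jobs: List[Tuple[int, int]], total_slots: int, policy: Policy) -> Tuple[int, float]:
    """Run unit-time tasks with FIFO job priority; return (makespan, slot utilization)."""
    maps_left = [m for m, _ in jobs]
    reduces_left = [r for _, r in jobs]
    steps = busy = 0
    while any(maps_left) or any(reduces_left):
        # A job's reduce tasks become runnable only once all of its map tasks are done.
        ready = [m == 0 for m in maps_left]
        ready_reduces = sum(r for r, ok in zip(reduces_left, ready) if ok)
        map_slots, reduce_slots = policy(total_slots, sum(maps_left), ready_reduces)
        before = busy
        # FIFO: the earliest-submitted job claims free slots first.
        for i in range(len(jobs)):
            take = min(map_slots, maps_left[i])
            maps_left[i] -= take
            map_slots -= take
            busy += take
            if ready[i]:
                take = min(reduce_slots, reduces_left[i])
                reduces_left[i] -= take
                reduce_slots -= take
                busy += take
        if busy == before:
            raise ValueError("policy left every runnable task waiting")
        steps += 1
    return steps, (busy / (steps * total_slots) if steps else 0.0)


if __name__ == "__main__":
    jobs = [(8, 2), (8, 2)]  # two hypothetical jobs: (map tasks, reduce tasks)
    print("fixed 2/2:", simulate(jobs, 4, fixed_policy(2, 2)))
    print("dynamic:  ", simulate(jobs, 4, dynamic_policy))
```

Consider two jobs, each with 8 map tasks and 2 reduce tasks, on 4 slots. The fixed 2/2 split finishes in 9 steps at about 56% slot utilization. The demand-proportional split finishes in 6 steps at about 83%. The fixed split is slower because slots reserved for reduce tasks sit idle while only map work is pending. These figures come from this toy model only and are not the paper's measurements.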

  • How to Cite

    M. Ali Enfais, A., Amril Jaharadak, A., & Abdelkarim El Marghani, A. (2019). Improving Hadoop performance in heterogeneous big data environments by dynamic slot configurations in MapReduce Hadoop programming model. International Journal of Engineering & Technology, 7(4), 6977-6980. https://doi.org/10.14419/ijet.v7i4.26653