Enhancing Hadoop performance in a homogeneous big data environment assuming dynamic slot configuration in the MapReduce pattern

  • Abstract

    Hadoop is a Java-based programming framework that supports the storage and processing of large data sets in a distributed computing environment, making it well suited to high volumes of data. It uses HDFS to store data and MapReduce to process it. MapReduce is a popular programming model for data-intensive applications on shared-nothing clusters; its main objective is to parallelize job execution across multiple nodes. Because the attention of researchers and companies has increasingly turned toward Hadoop, many scheduling algorithms have been proposed over the past decades. Three important scheduling issues arise in MapReduce: locality, synchronization, and fairness. The most common objective of a scheduling algorithm is to minimize the completion time of a parallel application while also addressing these issues. This paper examines performance issues in Hadoop schedulers and presents a comparative performance analysis of different job-submission cases. The jobs are processed in different homogeneous data environments, under either a fixed or a reconfigurable slot split between map and reduce tasks in a Hadoop MapReduce cluster. The results show that assigning a tunable knob between map and reduce tasks under a scheduler such as the FIFO algorithm improves performance by 16.66% for inverted index, 55.55% for word count, and 11.76% for classification.
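The map and reduce phases described in the abstract can be illustrated with a minimal word-count sketch in plain Java, run in a single JVM with no Hadoop cluster; the class and method names below are hypothetical and are not taken from the paper, which benchmarks the real Hadoop framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores data", "hadoop processes data");
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));   // map each input split independently
        }
        Map<String, Integer> counts = reduce(emitted);
        System.out.println(counts.get("hadoop")); // 2
        System.out.println(counts.get("data"));   // 2
    }
}
```

In real Hadoop the map calls run in parallel on map slots across the cluster and the framework shuffles the emitted pairs to reduce slots, which is where the map/reduce slot split studied in the paper matters.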
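The fixed slot configuration the abstract contrasts with a tunable one corresponds, in classic Hadoop 1.x (MRv1), to per-TaskTracker properties in `mapred-site.xml`. The property names below are the real MRv1 settings; the values 4 and 2 are only illustrative, as the paper's exact settings are not given here.

```xml
<!-- mapred-site.xml (Hadoop 1.x / MRv1): a fixed map/reduce slot split.
     Each TaskTracker node offers 4 map slots and 2 reduce slots; a
     "tunable knob" scheme reconfigures this split at run time. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```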



  • Keywords

    Hadoop; MapReduce; Parallel Processing; Scheduling Algorithms; Homogeneous Data.



Article ID: 26962
DOI: 10.14419/ijet.v7i4.26962

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.