An improved Hadoop load rebalancer
2018-08-06 https://doi.org/10.14419/ijet.v7i2.27.11775
Keywords: HDFS, Load Balancer, Rebalance, Scheduler, Spark, YARN
Abstract
Hadoop has gained an important place in the market as a result of the rapid growth of data. Load rebalancing in Hadoop is a major concern because of the unpredictable nature of tasks, the addition of new nodes to the cluster, and differences in node computing capacity. An efficient load rebalancer can improve performance and reduce computation time. The terms load rebalancer and scheduler are used interchangeably in many cases. This paper explores how load balancers and schedulers work in native Hadoop and includes insights from earlier works that identify and address problems with schedulers and rebalancers. The proposed Improved Hadoop Load Rebalancer adopts a strategy of moving a task to a node that holds a replica, is faster, and is topologically closer, which reduces network congestion and the execution time of Hadoop.
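The selection strategy summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the Node fields, the speed metric, the function names, and the tie-breaking order (topological distance first, then speed) are all assumptions made for illustration. The distance values are modeled on the usual rack-aware convention (0 for the same node, 2 for the same rack, 4 for a different rack).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    speed: float  # hypothetical relative compute capacity; higher is faster


def network_distance(a: Node, b: Node) -> int:
    """Rack-aware distance (assumed convention): 0 same node, 2 same rack, 4 otherwise."""
    if a.name == b.name:
        return 0
    return 2 if a.rack == b.rack else 4


def pick_target(source: Node, replica_holders: List[Node]) -> Optional[Node]:
    """Among nodes holding a replica, pick the closest one; break ties by higher speed.

    The ordering of the two criteria is an assumption, not taken from the paper.
    """
    candidates = [n for n in replica_holders if n.name != source.name]
    if not candidates:
        return None  # no other node holds a replica, so the task stays put
    return min(candidates, key=lambda n: (network_distance(source, n), -n.speed))
```

For example, given an overloaded source node on rack r1 and replicas on one remote-rack node and two same-rack nodes, the sketch selects the faster of the two same-rack nodes.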
How to Cite
J, G., Bhaskar N, U., & Reddy P, C. (2018). An improved Hadoop load rebalancer. International Journal of Engineering & Technology, 7(2.27), 109-112. https://doi.org/10.14419/ijet.v7i2.27.11775
Received date: 2018-04-20
Accepted date: 2018-05-28
Published date: 2018-08-06