Performance evaluation and resource optimization of cloud based parallel Hadoop clusters with an intelligent scheduler.


Authors
- Manishankar S. Metals Engineering Department, College of Materials Engineering, University of Babylon http://orcid.org/0000-0002-4404-9143
- S. Sathayanarayana Department of Economics / Faculty of Management and Economics, University of Norwich.
2018-11-29

https://doi.org/10.14419/ijet.v7i4.13372
Keywords: Big Data, Hadoop, Parallel Processing, Intelligent Scheduler, Ganglia Monitor, Super Node, Mediation Manager.
Abstract

Data generated by real-time information systems grow incrementally. Processing such large, incrementally growing data sets at scale requires a parallel processing system such as a Hadoop-based cluster. A major challenge in any cluster-based system is using the system's resources efficiently. This research proposes a model architecture for a Hadoop cluster with two additional integrated components: a super node, which manages the cluster's computations, and a mediation manager, which handles performance monitoring and evaluation. The super node is equipped with an intelligent (adaptive) scheduler that schedules each job with optimal resources. The scheduler is termed intelligent because it automatically decides which resource to use for which computation, by cross-mapping resources and jobs with a genetic algorithm that finds the best-matching resource. The mediation node deploys Ganglia, a standard monitoring tool for Hadoop clusters, to collect and record the cluster's performance parameters. Overall, the system schedules different jobs with optimal use of resources, thereby achieving better efficiency than Hadoop's native capacity scheduler. The system is deployed on top of an OpenNebula cloud environment for scalability.
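
The abstract does not give the details of the genetic algorithm, so the following is only a minimal sketch of how a genetic algorithm can map jobs to resources. It is not the authors' scheduler. The cost model (job cost divided by node speed), the fitness measure (makespan), and every name and parameter here (`makespan`, `genetic_schedule`, `pop_size`, `mutation_rate`, and so on) are assumptions made for illustration.

```python
import random


def makespan(assignment, job_costs, node_speeds):
    """Finish time of the busiest node for a given job-to-node assignment."""
    load = [0.0] * len(node_speeds)
    for job, node in enumerate(assignment):
        # Assumed cost model: run time = job cost / node speed.
        load[node] += job_costs[job] / node_speeds[node]
    return max(load)


def genetic_schedule(job_costs, node_speeds, pop_size=30, generations=100,
                     mutation_rate=0.1, seed=0):
    """Search for a low-makespan assignment; assignment[j] is the node for job j."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(job_costs), len(node_speeds)

    def fitness(candidate):
        return makespan(candidate, job_costs, node_speeds)

    # Start from random job-to-node mappings.
    population = [[rng.randrange(n_nodes) for _ in range(n_jobs)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: keep the better half of the population unchanged.
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]

        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # One-point crossover between two surviving mappings.
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally move a job to a random node.
            for j in range(n_jobs):
                if rng.random() < mutation_rate:
                    child[j] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


if __name__ == "__main__":
    jobs = [4.0, 2.0, 7.0, 1.0, 3.0, 5.0]   # hypothetical job costs
    speeds = [1.0, 2.0, 1.5]                # hypothetical relative node speeds
    plan = genetic_schedule(jobs, speeds)
    print("assignment:", plan)
    print("makespan:", makespan(plan, jobs, speeds))
```

Because the better half of each generation survives unchanged, the best mapping found never gets worse from one generation to the next. A production scheduler would replace the assumed cost model with measured resource metrics, for example those a monitoring tool such as Ganglia records.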
How to Cite
S., M., & Sathayanarayana, S. (2018). Performance evaluation and resource optimization of cloud based parallel Hadoop clusters with an intelligent scheduler. International Journal of Engineering and Technology, 7(4), 4220-4226. https://doi.org/10.14419/ijet.v7i4.13372
Received date: 2018-05-28

Accepted date: 2018-08-25

Published date: 2018-11-29
