A vibrant data placement approach for map reduce in diverse environments

  • Abstract

    MapReduce assumes that every node in a cluster has the same computing capacity. In a homogeneous environment each node is assigned the same load, so the cluster's resources are fully used. In practice, however, a cluster is likely to contain PCs or servers with various specifications, which causes the capabilities of the nodes to differ. If such a heterogeneous environment still uses the original Hadoop strategy, which distributes data blocks equally across the nodes so that the load is also spread evenly, the overall performance of Hadoop may be reduced. The major reason is that the differing computing capacities cause task execution times to differ: faster nodes finish processing their local data blocks sooner than slower nodes do. When a faster node then takes on blocks stored on a slower node, the required data must be transferred from that node over the network. Because waiting for the data transmission adds to the task execution time, the execution time of the entire job is prolonged.
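The effect described in the abstract can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not Hadoop's scheduler and not the placement algorithm of the paper: the function names (`proportional_placement`, `simulate_job`), the per-node processing rates, and the fixed per-block `transfer_time` are all hypothetical. Each node processes its local blocks first; an idle node then fetches a block from the node with the most unprocessed blocks and pays the transfer cost.

```python
import heapq


def proportional_placement(total_blocks, rates):
    """Split blocks across nodes in proportion to their (integer) processing rates."""
    total_rate = sum(rates)
    shares = [total_blocks * r // total_rate for r in rates]
    # Hand any blocks lost to rounding to the fastest nodes first.
    leftover = total_blocks - sum(shares)
    fastest_first = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def simulate_job(shares, rates, transfer_time):
    """Return (job finish time, number of remote block reads) for a toy cluster.

    shares[i]      blocks initially stored on node i
    rates[i]       blocks node i can process per time unit (assumed constant)
    transfer_time  extra time to fetch one block from another node (assumed constant)
    """
    remaining = list(shares)
    # Min-heap of (time at which the node becomes idle, node index).
    idle = [(0.0, i) for i in range(len(rates))]
    heapq.heapify(idle)
    finish, remote_reads = 0.0, 0
    while any(remaining):
        now, node = heapq.heappop(idle)
        cost = 1.0 / rates[node]
        if remaining[node] > 0:
            remaining[node] -= 1            # local block, no transfer
        else:
            donor = max(range(len(remaining)), key=lambda i: remaining[i])
            remaining[donor] -= 1           # remote block, pay the transfer
            cost += transfer_time
            remote_reads += 1
        done = now + cost
        finish = max(finish, done)
        heapq.heappush(idle, (done, node))
    return finish, remote_reads


if __name__ == "__main__":
    rates = [1, 1, 2]                       # hypothetical: one node is twice as fast
    equal = [4, 4, 4]                       # equal placement of 12 blocks
    weighted = proportional_placement(12, rates)
    print("equal placement:       ", simulate_job(equal, rates, transfer_time=0.5))
    print("proportional placement:", weighted, simulate_job(weighted, rates, transfer_time=0.5))
```

With these made-up numbers, equal placement finishes in 4.5 time units with two remote reads, while placing blocks in proportion to node speed finishes in 3.0 time units with none. The model ignores replication, network contention, and speculative execution, so it only shows the direction of the effect, not its size.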

  • Keywords

    Map Reduce; HDFS; Dynamic Data Placement (DDP); File Systems; Data Nodes.

Article ID: 10034
DOI: 10.14419/ijet.v7i2.4.10034

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.