Performance Evaluation of Hadoop in Cloud for Big Data

  • Authors

    • Mohammed Fakherldin
    • Ibrahim Aaker Targio Hashem
    • Abdullah Alzuabi
    • Faiz Alotaibi
    2018-10-07
    https://doi.org/10.14419/ijet.v7i4.15.21363
  • Keywords: Cloud computing, Hadoop, MapReduce.
  • Abstract

    Recent trends in big data show that the amount of data continues to grow at an exponential rate. Over the past few years, this trend has led many researchers to explore new research directions across multiple areas of big data. Hadoop is one of the most popular big data platforms: data is stored in the Hadoop Distributed File System (HDFS) and processed with Hadoop MapReduce. Cloud computing, in turn, is considered an excellent candidate for storing and processing big data. However, processing big data across multiple nodes is challenging, and the problem becomes even more complex when a large number of tasks run on virtualized clusters in a cloud computing environment. This paper reviews and analyzes the impact of using a physical cluster versus a cloud cluster for processing large amounts of data, comparing the two in terms of execution time and cost. The results indicate that running on cloud virtual machines made better use of the host computer's resources.
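    The division of labour described above, where HDFS holds the data and MapReduce processes it, can be illustrated with a minimal single-process sketch of the MapReduce programming model (word count). This is not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process. In a real Hadoop job, map tasks run in parallel on the nodes holding the HDFS blocks of the input, and the framework performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each input line stands in for one input split handled by a map task.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Reduce each key group; sorting mimics the sorted keys a reducer receives.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    data = ["big data on Hadoop", "Hadoop in the cloud", "big clusters"]
    print(word_count(data))
```

    Keeping map, shuffle, and reduce as separate functions mirrors the stages whose execution time differs between physical and virtualized clusters. In a distributed setting, the shuffle is typically where network and disk I/O dominate.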

  • How to Cite

    Fakherldin, M., Aaker Targio Hashem, I., Alzuabi, A., & Alotaibi, F. (2018). Performance Evaluation of Hadoop in Cloud for Big Data. International Journal of Engineering & Technology, 7(4.15), 16-18. https://doi.org/10.14419/ijet.v7i4.15.21363

    Received date: 2018-10-09

    Accepted date: 2018-10-09

    Published date: 2018-10-07