An Efficient Data Replication Scheme for Hadoop Distributed File System

  • Authors

    • T Lakshmi Siva Rama Krishna
    • J Priyanka
    • N Nikhil Teja
    • Sd Mahiya Sultana
    • B Jabber
    2018-05-31
    https://doi.org/10.14419/ijet.v7i2.32.15396
  • Hadoop Data locality, Data Replication, Data Placement
  • A distributed file system (DFS) is the storage component of a distributed system (DS). A DS consists of multiple autonomous nodes connected via a communication network to solve large problems and to achieve more computing power. One of the design requirements of any DS is to provide replicas. In this paper, we propose a new replication algorithm that is more reliable than the existing replication algorithm used in DFS. Our proposed replication algorithm by incrementing nodes sequentially (RAINS) has two advantages: it distributes the storage load equally among all the nodes by assigning them sequentially, and it guarantees that a replica remains available when two racks in a DS are down. This feature is not available in the existing DFS. We compared the existing replication algorithm used by the Hadoop distributed file system (HDFS) with our proposed RAINS algorithm. The experimental results indicate that RAINS performs better when a larger number of racks in the DS fail.

  • How to Cite

    Lakshmi Siva Rama Krishna, T., Priyanka, J., Nikhil Teja, N., Mahiya Sultana, S., & Jabber, B. (2018). An Efficient Data Replication Scheme for Hadoop Distributed File System. International Journal of Engineering & Technology, 7(2.32), 167-169. https://doi.org/10.14419/ijet.v7i2.32.15396
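
The abstract names two properties of RAINS (storage load spread equally over nodes that are assigned sequentially, and a surviving replica when two racks are down) but does not spell out the algorithm. The Python sketch below is therefore not the paper's method. It is one illustrative way sequential placement could satisfy both properties, under assumptions that are not in the source: racks of equal size, a replication factor of 3, and a node ring ordered so that neighbouring entries sit on different racks. The function names `interleave_nodes`, `place_blocks`, and `survives` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def interleave_nodes(racks):
    """Order nodes so that neighbours in the list sit on different racks.

    `racks` maps a rack name to its list of node names. Assumption of this
    sketch: every rack holds the same number of nodes.
    """
    sizes = {len(nodes) for nodes in racks.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized racks")
    slots = sizes.pop()
    rack_names = sorted(racks)
    return [(rack, racks[rack][slot])
            for slot in range(slots)
            for rack in rack_names]


def place_blocks(racks, num_blocks, replication=3):
    """Give each block `replication` consecutive nodes from the ring."""
    if len(racks) < replication:
        raise ValueError("need at least as many racks as replicas")
    ring = interleave_nodes(racks)
    placement = {}
    cursor = 0
    for block in range(num_blocks):
        # Consecutive ring entries are on distinct racks, so the replicas
        # of one block never share a rack.
        placement[block] = [ring[(cursor + k) % len(ring)]
                            for k in range(replication)]
        # Advance sequentially so the next block starts where this one ended.
        cursor = (cursor + replication) % len(ring)
    return placement


def survives(placement, failed_racks):
    """True if every block keeps a replica outside the failed racks."""
    return all(any(rack not in failed_racks for rack, _ in replicas)
               for replicas in placement.values())


# Hypothetical cluster: 4 racks with 3 nodes each, 8 blocks, 3 replicas each.
racks = {f"rack{r}": [f"r{r}n{n}" for n in range(3)] for r in range(4)}
placement = place_blocks(racks, num_blocks=8)

# Number of replicas stored on each node.
load = Counter(node for replicas in placement.values() for _, node in replicas)

# Check every possible pair of failed racks.
two_rack_failures_ok = all(survives(placement, set(pair))
                           for pair in combinations(racks, 2))
print(load)
print(two_rack_failures_ok)
```

Because each block's three replicas land on three different racks in this sketch, losing any two racks still leaves one copy. Losing three racks can remove every copy of some block, so the guarantee here does not extend beyond two rack failures.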