A four-phase data replication algorithm for data grid
2015-04-12 https://doi.org/10.14419/jacst.v4i1.4009
Keywords: Data Grid, Data Replication, Geographical Locality, Replica Placement, Temporal Locality
Abstract
Scientific applications now generate huge amounts of data, on the order of terabytes or petabytes. Data Grids have been proposed as a solution to large-scale data management problems, including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve job response time and data availability. Choosing a reasonable number of replicas and the right locations for them has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on temporal and geographical locality is proposed. It includes: 1) evaluating and identifying the popular data, and triggering a replication operation when the popularity of the data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among the sites; 4) removing the files with the least cost of average access time when there is insufficient space for replication. The algorithm was tested using the grid simulator OptorSim, developed by the European Data Grid project. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage, and percentage of storage filled.
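The second phase relates system availability to the number of replicas. The abstract does not give the paper's model, so the following is only a minimal sketch under a common textbook assumption: each replica is available independently with the same probability p, so a file with n replicas is available with probability 1 - (1 - p)^n. The function names and parameters are hypothetical and are not taken from the paper.

```python
def system_availability(p: float, n: int) -> float:
    """Probability that at least one of n replicas is reachable.

    Assumption (not from the paper): replicas fail independently and
    each one is available with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


def min_replicas(p: float, target: float) -> int:
    """Smallest replica count whose availability reaches the target."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    # Add replicas one at a time until the target availability is met.
    while system_availability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative values only: 60% per-replica availability, 99% target.
    print(min_replicas(0.6, 0.99))
```

Under this assumed model, each extra replica shrinks the unavailability by a factor of (1 - p), so returns diminish quickly; that is one reason a replication strategy would compute a suitable replica count rather than replicate everywhere.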
How to Cite
Saleh, A., Javidan, R., & FatehiKhajeh, M. T. (2015). A four-phase data replication algorithm for data grid. Journal of Advanced Computer Science & Technology (JACST), 4(1), 163-174. https://doi.org/10.14419/jacst.v4i1.4009
Received date: 2014-12-10
Accepted date: 2015-01-05
Published date: 2015-04-12