Comparative Study for Load Management of HBase and Cassandra Distributed Databases in Big Data
2018-12-09 https://doi.org/10.14419/ijet.v7i4.31.23715
Keywords: Big Data, BigTable, Cassandra, HBase, Load Management, YCSB.
Abstract
The advancement of cloud computing, the increasing size of databases, and the emergence of big data have made traditional data management systems an insufficient solution for storing and managing such large-scale data. Consequently, new data storage mechanisms that can handle large-scale data have emerged. NoSQL databases are used to store and manage large amounts of data. They are designed to be open source, distributed, and horizontally scalable in order to provide high performance. Scalability is one of the most important features of such systems: it means that increasing the number of nodes allows more requests to be served per unit of time. Distribution and scalability are always accompanied by load management, which balances work among multiple nodes. Load management efficiency varies from one system to another according to the load balancing technique used. In this study, the load management and scalability of HBase and Cassandra are evaluated, as they are the most popular NoSQL databases modeled on BigTable. In particular, this paper compares and analyzes load management in the distributed performance of HBase and Cassandra using a standard benchmark tool, the Yahoo! Cloud Serving Benchmark (YCSB). The experiments measure the performance of database operations with different numbers of connections, using different numbers of operations, database sizes, and processing nodes. The experimental results show that HBase can provide better performance as the number of connections increases in the presence of horizontal scalability.
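The abstract notes that load management efficiency depends on the load balancing technique each system uses. As one concrete illustration of such a technique, Cassandra places rows on nodes by hashing partition keys onto a token ring (consistent hashing), whereas HBase splits tables into regions (contiguous row-key ranges) served by RegionServers and lets the master's balancer move regions between servers. The sketch below is a minimal, simplified consistent-hashing ring with virtual nodes, assuming an MD5-derived 64-bit token. It is not Cassandra's partitioner code and not the paper's experimental setup; the names `token`, `TokenRing`, and `load_per_node` are hypothetical. It shows how keys spread across nodes and that adding a node moves only the keys the new node takes over.

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    # Hypothetical token function: first 8 bytes of an MD5 digest as an unsigned
    # 64-bit integer. (Cassandra's default partitioner uses Murmur3; MD5 is used
    # here only to keep the sketch dependency-free.)
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Illustrative consistent-hash ring with virtual nodes (not Cassandra's code)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []  # sorted token positions on the ring
        self._owners = {}  # token -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several ring positions (virtual nodes),
        # which evens out the share of the ring each node owns.
        for i in range(self.vnodes_per_node):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owners[t] = node

    def owner(self, key):
        # A node owns the range (previous token, its token]; keys past the last
        # token wrap around to the first one.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


def load_per_node(ring, keys):
    # Count how many keys each node is responsible for.
    return Counter(ring.owner(k) for k in keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(100_000)]
    ring = TokenRing(["node1", "node2", "node3"])
    before = {k: ring.owner(k) for k in keys}
    print("3 nodes:", sorted(load_per_node(ring, keys).items()))

    ring.add_node("node4")
    moved = sum(1 for k in keys if ring.owner(k) != before[k])
    print("4 nodes:", sorted(load_per_node(ring, keys).items()))
    print(f"share of keys moved after adding node4: {moved / len(keys):.1%}")
```

With virtual nodes each node's share stays close to 1/N, and adding a fourth node moves roughly a quarter of the keys, all of them to the new node. A naive `hash(key) % N` placement would instead remap most keys when N changes from 3 to 4. This is only one of the techniques the study contrasts; HBase's region splitting and balancing follow a different, range-based design.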
How to Cite
Y. Aldailamy, A., Muhammed, A., Ismail, W., & Radman, A. (2018). Comparative Study for Load Management of HBase and Cassandra Distributed Databases in Big Data. International Journal of Engineering & Technology, 7(4.31), 375-380. https://doi.org/10.14419/ijet.v7i4.31.23715
Received date: 2018-12-12
Accepted date: 2018-12-12
Published date: 2018-12-09