Data sources and ingestion big data layers: meta-modeling of key concepts and features
https://doi.org/10.14419/ijet.v7i4.21742
Abstract
A deluge of data is to be expected in the years to come. Huge masses of data are already produced every day. For example, social network users and the Internet of Things alone generate large volumes of varied data that have to be transmitted, recorded and processed at high speed. These data are an important source of information that can improve the performance of predictions. However, the data no longer come in neat, easy-to-consume structures; they are represented in different types of structures, namely structured, semi-structured and unstructured data. At the Big Data architecture level, these different data sources are located in the Data Sources layer, which is the starting point for any further processing of Big Data. This layer has a direct relationship with the Ingestion layer, which is in charge of validating, transforming, cleaning, reducing and integrating data so that they can be used later on by the Hadoop ecosystem. In this paper, we apply techniques from Model Driven Engineering (MDE) to propose a universal meta-model for both the Data Sources and Ingestion Big Data layers. These meta-models are platform independent in the sense of the Model Driven Architecture pattern: they describe the structures of Data Sources and Ingestion independently of any specific platform.
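As a rough illustration of the relationship between the two layers described in the abstract, the following Python sketch models data sources tagged by structure type and an ingestion layer that applies validation, transformation, cleaning and reduction before integrating the results. It is not the meta-model proposed in the paper: every class name, field name and rule here (DataSource, IngestionLayer, the "id" and "text" fields, the sample records) is a hypothetical choice made for this example only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List

Record = Dict[str, object]


class StructureKind(Enum):
    """The three structure types named in the abstract."""
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class DataSource:
    """Hypothetical element of the Data Sources layer."""
    name: str
    kind: StructureKind
    records: List[Record] = field(default_factory=list)


@dataclass
class IngestionStep:
    """A named operation of the Ingestion layer (illustrative only)."""
    name: str
    apply: Callable[[List[Record]], List[Record]]


@dataclass
class IngestionLayer:
    steps: List[IngestionStep]

    def ingest(self, sources: List[DataSource]) -> List[Record]:
        integrated: List[Record] = []
        for source in sources:
            records = source.records
            for step in self.steps:
                records = step.apply(records)
            # Integration: merge all sources, keeping track of the origin.
            integrated.extend(
                {**r, "source": source.name, "structure": source.kind.value}
                for r in records
            )
        return integrated


# Assumed rules for the example: records need an "id" and a non-empty "text".
def validate(records: List[Record]) -> List[Record]:
    return [r for r in records if r.get("id") is not None]


def transform(records: List[Record]) -> List[Record]:
    return [{**r, "text": str(r.get("text", "")).strip().lower()} for r in records]


def clean(records: List[Record]) -> List[Record]:
    return [r for r in records if r["text"]]


def reduce_duplicates(records: List[Record]) -> List[Record]:
    seen = set()
    unique = []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


sources = [
    DataSource("crm_table", StructureKind.STRUCTURED,
               [{"id": 1, "text": " Alice "}, {"id": None, "text": "orphan"}]),
    DataSource("sensor_feed", StructureKind.SEMI_STRUCTURED,
               [{"id": 7, "text": "TEMP=21"}, {"id": 7, "text": "TEMP=21"},
                {"id": 8, "text": "  "}]),
]

layer = IngestionLayer(steps=[
    IngestionStep("validate", validate),
    IngestionStep("transform", transform),
    IngestionStep("clean", clean),
    IngestionStep("reduce", reduce_duplicates),
])

result = layer.ingest(sources)
for record in result:
    print(record)
```

Nothing in the sketch refers to a particular distribution or tool, which loosely mirrors the platform-independent intent stated in the abstract; a platform-specific model would bind each step to a concrete ingestion technology.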
How to Cite
Erraissi, A., & Belangour, A. (2018). Data sources and ingestion big data layers: meta-modeling of key concepts and features. International Journal of Engineering & Technology, 7(4), 3607-3612. https://doi.org/10.14419/ijet.v7i4.21742
Received date: 2018-11-26
Accepted date: 2018-11-26