Data sources and ingestion big data layers: meta-modeling of key concepts and features

Allae Erraissi; Abdessamad Belangour

doi:10.14419/ijet.v7i4.21742


Abstract

A deluge of data is expected in the years to come. Huge volumes of data are already produced every day. Social network users and the Internet of Things alone generate large volumes of varied data that have to be transmitted, recorded and processed at high speed. These data are an important source of information that can improve the performance of predictions. However, the data no longer come in neat, easy-to-consume structures; they take different forms, namely structured, semi-structured and unstructured data. At the Big Data architecture level, these different data sources are located in the Data Sources layer, which is the starting point for any further processing of Big Data. This layer is directly related to the Ingestion layer, which is in charge of validating, transforming, cleaning, reducing and integrating data so that they can be used later on by the Hadoop ecosystem. In this paper, we apply Model Driven Engineering (MDE) techniques to propose a universal meta-model for each of the Data Sources and Ingestion Big Data layers. In keeping with the Model Driven Architecture approach, these meta-models are platform independent: they describe the structures of the Data Sources and Ingestion layers independently of any specific platform.
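To make the relationship between the two layers concrete, the following is a minimal Python sketch, not the meta-models proposed in the paper. It assumes a Data Sources layer whose elements carry one of the three structure kinds named above, and an Ingestion layer that applies the five operations listed in the abstract in that order. All class, function and field names (`DataSource`, `StructureKind`, `ingest`, the `id` key, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

Record = Dict[str, object]


class StructureKind(Enum):
    """The three kinds of data structure named in the abstract."""
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class DataSource:
    """Hypothetical Data Sources layer element: a named source of records."""
    name: str
    kind: StructureKind
    records: List[Record] = field(default_factory=list)


def validate(records: List[Record]) -> List[Record]:
    # Assumption: a record is valid if it is a dict with an "id" key.
    return [r for r in records if isinstance(r, dict) and "id" in r]


def transform(records: List[Record]) -> List[Record]:
    # Normalize string values by stripping surrounding whitespace.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in records]


def clean(records: List[Record]) -> List[Record]:
    # Drop fields that carry no value.
    return [{k: v for k, v in r.items() if v is not None} for r in records]


def reduce_duplicates(records: List[Record]) -> List[Record]:
    # Keep only the first record seen for each "id".
    seen = set()
    kept = []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            kept.append(r)
    return kept


def integrate(per_source: Dict[str, List[Record]]) -> List[Record]:
    # Merge all sources into one batch, tagging each record with its origin.
    return [dict(r, source=name) for name, recs in per_source.items() for r in recs]


def ingest(sources: List[DataSource]) -> List[Record]:
    """Hypothetical Ingestion layer: run each source through the steps, then merge."""
    per_source: Dict[str, List[Record]] = {}
    for s in sources:
        batch = s.records
        for step in (validate, transform, clean, reduce_duplicates):
            batch = step(batch)
        per_source[s.name] = batch
    return integrate(per_source)


sources = [
    DataSource("crm", StructureKind.STRUCTURED,
               [{"id": 1, "name": " Ada "}, {"id": 1, "name": " Ada "}, {"name": "no id"}]),
    DataSource("sensors", StructureKind.SEMI_STRUCTURED,
               [{"id": 2, "temp": None, "unit": "C"}]),
]
result = ingest(sources)
print(result)
```

Nothing in the sketch refers to Hadoop or any other platform: the sources and steps are described abstractly, which is the sense of "platform independent" used above. A concrete platform binding would be a separate concern.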
How to Cite
Erraissi, A., & Belangour, A. (2018). Data sources and ingestion big data layers: meta-modeling of key concepts and features. International Journal of Engineering and Technology, 7(4), 3607-3612. https://doi.org/10.14419/ijet.v7i4.21742
Received date: November 26, 2018

Accepted date: November 26, 2018
