Parallel processing on Big Data in the context of Machine Learning and Hadoop Ecosystem: A Survey
-
2018-03-18 https://doi.org/10.14419/ijet.v7i2.7.10885
Big Data, Hadoop, Machine Learning, Parallel Processing.
Abstract
Big Data applications have become increasingly important. Institutes, businesses and society at large, across diverse sectors, depend more and more on information extracted from enormous quantities of raw data, statistics and numbers. In the Big Data context, however, conventional information methods and policies are less capable: they are slow to respond and fall short on scalability, performance and accuracy. A great deal of effort has gone into addressing these complex Big Data constraints and difficulties, and as a result different categories of packages, distributions and technologies have been developed. This paper surveys recent technologies developed for Big Data. It aims to help readers choose and adopt the right combination of Big Data technologies according to their technological and scientific needs and the requirements of their particular applications. It provides not only a global view of the most important Big Data technologies but also shows how they relate across classification levels such as the Information Storage Level, Information Processing Level, Information Querying Level, Information Access Level and Management Level. It classifies and discusses the main tools along with their features, advantages, limitations and uses.
-
How to Cite
Vishwanath Brahmane, A., & Murugan, R. (2018). Parallel processing on Big Data in the context of Machine Learning and Hadoop Ecosystem: A Survey. International Journal of Engineering & Technology, 7(2.7), 577-588. https://doi.org/10.14419/ijet.v7i2.7.10885
Received date: 2018-04-01
Accepted date: 2018-04-01
Published date: 2018-03-18