Massive Volume of Unstructured Data and Storage Space Optimization: A Review
Keywords: Deduplication, storage optimization, data compression, unstructured data.
Nowadays the volume of digital data generated and used by enterprises is increasing at an enormous rate. A survey reports that more than 80% of the data generated in the last two years is unstructured. Storing this large volume of unstructured data therefore requires a great deal of storage space, which has drawn attention to large-scale storage systems. Deduplication, which keeps only one copy of redundant data, is a space-efficient method widely used to address the storage space optimization problem. This paper examines the effect of the massive volume of unstructured data, reviews various storage optimization techniques, and surveys various storage types. In addition, it discusses the specific challenges of storage optimization using deduplication and the technologies that handle huge amounts of unstructured data.
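To make the idea of deduplication concrete, the following is a minimal sketch of chunk-level deduplication, assuming fixed-size chunking and SHA-256 content fingerprints. The class and method names (`DedupStore`, `put`, `get`, `stored_bytes`) and the chunk size are hypothetical and chosen for illustration only; practical systems often use content-defined (variable-size) chunking instead, and must also persist metadata and handle concerns such as hash collisions, which this toy version ignores.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory single-instance store using fixed-size chunking."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # fingerprint -> unique chunk bytes
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints ("recipe")

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if this fingerprint has not been stored yet;
            # duplicates are recorded in the recipe but not stored again.
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        # Rebuild the file by concatenating its chunks in their original order.
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        # Physical space used by unique chunks only.
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a.txt", b"ABCDABCDABCD")
store.put("b.txt", b"ABCDWXYZ")
print(store.stored_bytes())  # 8 bytes stored for 20 logical bytes
```

In this usage example, the two files hold 20 logical bytes, but only the two distinct 4-byte chunks are kept, so 8 bytes are stored. Fixed-size chunking keeps the sketch short; its known weakness is that inserting a single byte shifts every later chunk boundary, which is why content-defined chunking is commonly preferred.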