Generating realistic Arabic handwriting dataset

  • Abstract

    In recent years, holistic approaches have shown satisfactory results for Arabic handwritten word recognition, avoiding the need to segment words into letters. In this paper, we present an efficient system for generating a realistic Arabic handwriting dataset from ASCII input text. We carefully selected a simple word list that covers most Arabic letters in both their normal and ligature connection cases. To improve the reproduction of new letters, we developed a normalization method that adapts its clustering action to the Arabic letter families we created. We enhanced the Gaussian Mixture Model (GMM) process for learning letter templates by using the Ramer-Douglas-Peucker algorithm to detect the number and position of the Gaussian components, which improves the letter shapes reproduced with Gaussian Mixture Regression (GMR). We learn the translation distance between word-parts to generate realistic handwritten word shapes. A recognizer combining LSTM and CTC layers is used to validate the approach; the experimental results show that the generated Arabic handwriting words are realistic and inherit the user's handwriting style.
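
    The abstract states that the Ramer-Douglas-Peucker (RDP) algorithm is used to detect the number and position of the Gaussian components. The paper's actual implementation is not shown here, so the following is only a minimal sketch of the general idea: RDP reduces a pen stroke to a few key points, and the count and coordinates of those points could then serve as the number and initial centres of the mixture components. The function names (`perpendicular_distance`, `rdp`, `component_seeds`), the tolerance `epsilon`, and the sample stroke are assumptions made for illustration.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate case: a and b coincide, fall back to point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline.

    Keeps the end points and, recursively, every point that lies
    farther than epsilon from the chord joining the current end points.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    split_index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            split_index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    left = rdp(points[: split_index + 1], epsilon)
    right = rdp(points[split_index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


def component_seeds(stroke, epsilon):
    """Hypothetical helper: number of components and their seed positions."""
    key_points = rdp(stroke, epsilon)
    return len(key_points), key_points


if __name__ == "__main__":
    # Illustrative L-shaped stroke with small jitter (not data from the paper).
    stroke = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 0.02),
              (4.0, 0.0), (4.0, 1.0), (4.03, 2.0), (4.0, 3.0)]
    n_components, seeds = component_seeds(stroke, epsilon=0.1)
    print(n_components, seeds)
```

    In this sketch a larger `epsilon` yields fewer key points and therefore fewer components, so the tolerance plays the role of a model-complexity knob. How the paper chooses this tolerance, and how the key points are passed to the GMM fit, is not stated in the abstract.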


  • Keywords

    Arabic handwriting; normalization; ligatures; template learning; Gaussian regression.





Article ID: 29786
DOI: 10.14419/ijet.v8i4.29786

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.