Generating realistic Arabic handwriting dataset

  • Authors

    • Mahmoud I. Abdalla, Computer Engineer, Suez Canal Authority
    • Mohsen A. Rashwan, Professor, Electronics and Communication Department, Zagazig University, Zagazig
    • Mohamed A. Elserafy, Professor, Electronics and Communication Department, Cairo University, Cairo
    2019-10-19
    https://doi.org/10.14419/ijet.v8i4.29786
  • Arabic handwriting, normalization, ligatures, template learning, Gaussian regression.
  • In recent years, the holistic approach, which recognizes whole words instead of segmenting them into letters, has shown satisfactory results for Arabic handwritten word recognition. In this paper, we present an efficient system for generating a realistic Arabic handwriting dataset from ASCII input text. We carefully selected a simple word list that covers most Arabic letters in both their normal and ligature connection cases. To improve the reproduction of new letters, we developed a normalization method that adapts its clustering action to the Arabic letter families we created. We enhanced the Gaussian Mixture Model process for learning letter templates by using the Ramer-Douglas-Peucker algorithm to detect the number and position of the Gaussian components, which improves the new letter shapes reproduced by Gaussian Mixture Regression. We learn the translation distance between word-parts to generate realistic handwritten word shapes. We use a combination of LSTM and CTC layers as a recognizer to validate the efficiency of our approach. The experimental results show that it generates new, realistic Arabic handwritten words that inherit the user's handwriting style.
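
As a rough illustration of the Ramer-Douglas-Peucker step mentioned in the abstract, the sketch below simplifies a pen stroke and uses the retained vertices as candidate positions for Gaussian components. The abstract does not say how the algorithm's output is mapped to components, so the one-component-per-vertex rule, the sample stroke, and the `epsilon` tolerance are all assumptions made for this example. Distances are measured to the chord segment rather than to the infinite line, which is a common variant.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only vertices that deviate from the
    chord between the endpoints by more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # Every interior point is close to the chord: drop them all.
        return [first, last]
    # Split at the farthest point and simplify both halves.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # the split point appears only once

# Hypothetical noisy L-shaped stroke (x, y pen samples).
stroke = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3.1, 1), (2.9, 2), (3, 3)]
key_points = rdp(stroke, epsilon=0.5)

# Assumption for this sketch: one Gaussian component per retained vertex,
# with the vertex used as the initial component mean.
n_components = len(key_points)
initial_means = key_points
```

For this stroke the simplification keeps the two endpoints and the corner, so three components would be seeded. A larger `epsilon` yields fewer components and a coarser template.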

     

  • How to Cite

    Abdalla, M. I., Rashwan, M. A., & Elserafy, M. A. (2019). Generating realistic Arabic handwriting dataset. International Journal of Engineering & Technology, 8(4), 460-466. https://doi.org/10.14419/ijet.v8i4.29786