Generating realistic Arabic handwriting dataset

Mahmoud I. Abdalla; Mohsen A. Rashwan; Mohamed A. Elserafy

doi:10.14419/ijet.v8i4.29786


Authors
- Mahmoud I. Abdalla, Computer Engineer, Suez Canal Authority
- Mohsen A. Rashwan, Professor, Electronics and Communication Department, Zagazig University, Zagazig
- Mohamed A. Elserafy, Professor, Electronics and Communication Department, Cairo University, Cairo
2019-10-19

https://doi.org/10.14419/ijet.v8i4.29786
Arabic handwriting, normalization, ligatures, template learning, Gaussian regression.
Abstract

In recent years, holistic approaches that recognize whole words, rather than segmenting words into letters, have shown satisfactory results for Arabic handwriting recognition. In this paper, we present an efficient system for generating a realistic Arabic handwriting dataset from ASCII input text. We carefully selected a simple word list that covers most Arabic letters in both their normal and ligature connection cases. To improve the reproduction of new letters, we developed a normalization method that adapts its clustering to the Arabic letter families we created. We enhanced the Gaussian Mixture Model process for learning letter templates by using the Ramer-Douglas-Peucker algorithm to detect the number and position of the Gaussian components, which improves the letter shapes reproduced using Gaussian Mixture Regression. We learn the translation distance between word parts so that generated words have a realistic handwritten shape. We use a recognizer that combines LSTM and CTC layers to validate the efficiency of our approach; the experimental results show that it generates new, realistic Arabic handwritten words that inherit the user's handwriting style.
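As an illustration of the Ramer-Douglas-Peucker step mentioned in the abstract, the following minimal Python sketch simplifies a pen stroke to its key vertices. The abstract does not say how those vertices are turned into Gaussian components, so treating their count and positions as the initial number and means of the components is only an assumption made here; the names `rdp`, `stroke`, and `epsilon`, and the sample coordinates, are hypothetical.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate chord: fall back to the point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the chord start-end.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is within tolerance: keep only the endpoints.
        return [start, end]
    # Otherwise split at the farthest point and simplify both halves.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical stroke: a noisy horizontal run followed by a noisy vertical run.
stroke = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1), (3.0, 0.0), (4.0, 0.0),
          (4.0, 1.0), (4.1, 2.0), (4.0, 3.0), (4.0, 4.0)]
keypoints = rdp(stroke, epsilon=0.5)

# Assumption: the retained vertices seed the mixture model.
n_components = len(keypoints)   # candidate number of Gaussian components
initial_means = keypoints       # candidate initial component positions
print(n_components, initial_means)
```

A larger `epsilon` keeps fewer vertices and would therefore suggest fewer components under this assumption; a suitable tolerance depends on how the strokes are normalized.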
How to Cite
Abdalla, M. I., Rashwan, M. A., & Elserafy, M. A. (2019). Generating realistic Arabic handwriting dataset. International Journal of Engineering & Technology, 8(4), 460-466. https://doi.org/10.14419/ijet.v8i4.29786
Received date: 2019-08-25

Accepted date: 2019-10-05

Published date: 2019-10-19
