An OCR System for Arabic Calligraphy Documents

  • Authors

    • Hassanin Al-Barhamtoshy
    • Kamal Jambi
    • Hany Ahmed
    • Shaimaa Mohamed
    • Mohsen Rashwan
    • Sherif Abdou
    2019-03-01
    https://doi.org/10.14419/ijet.v8i1.11.28083
  • This paper aims to achieve good accuracy in Arabic OCR for old documents and calligraphy documents. Our previously developed system gave accurate results for modern Arabic documents. When we applied it to old Arabic documents, however, performance degraded steeply: around 25% accuracy, compared with 85% for modern Arabic documents. The market for Arabic OCR of old documents is large, and it deserves even more attention than modern documents, which in many cases are already distributed in digitized format. Therefore, in this paper, we address the challenges of Arabic OCR for old documents. We made three main modifications to our system. First, we eliminated the word segmentation step and ran the OCR process on the complete line. This modification avoided a large number of word-level segmentation errors, but it required us to change our recognition approach to a dictionary-based one. Second, we changed the features used to histogram gravity-based features. This type of feature provided much better performance, especially in the challenging cases of old documents, such as low-quality printing effects, wavy baselines, and heavily noisy documents. Third, we used a hybrid model that integrates neural networks with an HMM to provide better discrimination between the shapes of Arabic ligatures. The paper starts with a brief review of the baseline system and the enhancements added to it, and then introduces the newly developed OCR work for old Arabic documents and calligraphy documents.
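The first modification replaces word segmentation with line-level recognition that relies on a dictionary. The abstract does not describe how the dictionary is applied, so the following is only an illustrative stand-in: assuming a line recognizer emits an unsegmented character string, a dynamic program can recover the cheapest split of that string into lexicon words, charging a penalty for any character that fits no word. The function `segment_line`, its `unknown_cost` parameter, and the transliterated toy lexicon are hypothetical.

```python
from __future__ import annotations


def segment_line(chars: str, lexicon: set[str], unknown_cost: float = 5.0) -> list[str]:
    """Split an unsegmented line transcription into lexicon words (hypothetical helper).

    Each lexicon word costs 1.0; a character that starts no usable lexicon word
    is emitted on its own at `unknown_cost`. Returns the cheapest split.
    """
    n = len(chars)
    max_len = max((len(w) for w in lexicon), default=1)
    # best[i] holds (cost, words) for the suffix chars[i:]; best[n] is the empty suffix.
    best: list[tuple[float, list[str]]] = [(0.0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Fallback: treat chars[i] as an out-of-vocabulary symbol.
        cost, words = best[i + 1]
        candidate = (cost + unknown_cost, [chars[i]] + words)
        # Try every lexicon word that starts at position i.
        for j in range(i + 1, min(n, i + max_len) + 1):
            piece = chars[i:j]
            if piece in lexicon:
                c, w = best[j]
                if c + 1.0 < candidate[0]:
                    candidate = (c + 1.0, [piece] + w)
        best[i] = candidate
    return best[0][1]
```

For example, `segment_line("bismallahalrahmanalrahim", {"bism", "allah", "al", "rahman", "rahim"})` returns `['bism', 'allah', 'al', 'rahman', 'al', 'rahim']`. In a full system the dictionary constraint would more likely be applied inside decoding rather than as a post-processing step.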

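The second modification switches to histogram gravity-based features. The abstract does not define these features, so the sketch below assumes one plausible reading: a window slides along a binarized line image, and each frame is described by a coarse vertical histogram of ink density plus the normalized vertical center of gravity of the ink, which roughly tracks where the text body sits in that frame. The function name and the `win`, `step`, and `zones` parameters are hypothetical; NumPy is assumed to be available.

```python
import numpy as np


def gravity_histogram_features(line: np.ndarray, win: int = 4, step: int = 2, zones: int = 8) -> np.ndarray:
    """Hypothetical frame features for a binarized text line (1 = ink, 0 = background).

    For each sliding window, returns `zones` normalized ink-density values
    (a vertical histogram of the window) followed by the vertical center of
    gravity of the ink, scaled to [0, 1].
    """
    h, w = line.shape
    edges = np.linspace(0, h, zones + 1).astype(int)  # row boundaries of the zones
    rows = np.arange(h)
    feats = []
    for x in range(0, max(w - win, 0) + 1, step):
        window = line[:, x:x + win].astype(float)
        row_ink = window.sum(axis=1)  # ink count per image row
        total = row_ink.sum()
        hist = np.array([row_ink[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
        if total > 0:
            hist = hist / total
            cog = (rows @ row_ink) / total / max(h - 1, 1)
        else:
            cog = 0.5  # empty frame: place the center of gravity mid-line
        feats.append(np.append(hist, cog))
    return np.array(feats)
```

For a line image of shape `(height, width)`, the result has one row per frame and `zones + 1` columns, the kind of frame sequence an HMM-based recognizer consumes.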
     

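The third modification combines neural networks with an HMM. The network architecture is not given here, so the sketch below shows only the standard hybrid decoding step, under the assumption that a network outputs per-frame posteriors over HMM states (for example, the states of ligature models). Dividing those posteriors by the state priors gives scaled likelihoods, which take the place of the HMM emission densities during Viterbi decoding. `hybrid_viterbi` and its argument names are hypothetical.

```python
import numpy as np


def hybrid_viterbi(posteriors, priors, log_trans, log_init):
    """Viterbi decoding with network posteriors converted to scaled likelihoods.

    posteriors: (T, S) per-frame state posteriors p(s | x_t) from a network.
    priors:     (S,) state priors p(s), e.g. state frequencies in training alignments.
    log_trans:  (S, S) log transition probabilities, log_trans[i, j] = log p(j | i).
    log_init:   (S,) log initial-state probabilities.
    Returns the most likely state sequence as a list of ints.
    """
    eps = 1e-12
    # Hybrid NN/HMM: p(x_t | s) is proportional to p(s | x_t) / p(s).
    log_emit = np.log(posteriors + eps) - np.log(priors + eps)
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace back from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in the log domain lets disallowed transitions be written as `-np.inf`. Dividing by the priors keeps states that were simply frequent in training from dominating the decoded path.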
     

  • How to Cite

    Al-Barhamtoshy, H., Jambi, K., Ahmed, H., Mohamed, S., Rashwan, M., & Abdou, S. (2019). An OCR System for Arabic Calligraphy Documents. International Journal of Engineering & Technology, 8(1.11), 9-15. https://doi.org/10.14419/ijet.v8i1.11.28083