A Comprehensive Framework for OCR Web Services System for Arabic Calligraphy Documents

  • Authors

    • Hassanin M. Al-Barhamtoshy
    • Abdullah S. Al-Ghamdi
    2019-03-01
    https://doi.org/10.14419/ijet.v8i1.11.28084
  • Arabic, Document analysis, connected component, sparse, segmentation, OCR web services.
  • Abstract

    This paper describes a web-services approach to document layout analysis for OCR systems, designed for integration with web-based applications through SOAP and REST interfaces. The proposed solution provides a uniform way to access different OCR systems: the web services are exposed through SOAP and REST interfaces over HTTP or HTTPS requests. Consequently, applications built by different developers can communicate with each other without time-consuming code customization and without being constrained by operating-system barriers or programming-language requirements.
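    The REST side of this architecture can be illustrated with a minimal, self-contained sketch that uses only the Python standard library. It assumes a hypothetical `POST /ocr` endpoint that receives a raw page image and returns JSON. The `fake_ocr` function, the endpoint path, and the response fields are placeholders, not the paper's actual interface, and the SOAP binding is not shown. The sketch shows why the approach is language-neutral: any client that can send an HTTP request can use the service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def fake_ocr(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for a real OCR engine: it only reports the payload size."""
    return {"text": "", "bytes_received": len(image_bytes), "script": "unknown"}


class OCRHandler(BaseHTTPRequestHandler):
    """Handles POST /ocr: the request body is the raw page image, the reply is JSON."""

    def do_POST(self):
        if self.path != "/ocr":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        result = fake_ocr(self.rfile.read(length))
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def call_ocr_service(image_bytes: bytes) -> dict:
    """Start the service on a free local port, POST one image to it, return the JSON reply."""
    server = HTTPServer(("127.0.0.1", 0), OCRHandler)  # port 0 lets the OS choose a port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = Request(
            f"http://127.0.0.1:{server.server_port}/ocr",
            data=image_bytes,
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
        opener = build_opener(ProxyHandler({}))  # bypass any configured proxy for localhost
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()
```

    For example, `call_ocr_service(b"fake-image")` returns a dictionary whose `bytes_received` field is 10. In a real deployment the server and client would run on different machines, and `fake_ocr` would be replaced by a call into an actual OCR engine.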

    The scientific scope of this paper focuses on three objectives:

    (1) the document categories included in the dataset;
    (2) the algorithms used at the document-analysis level; and
    (3) the segmentation algorithms used for Arabic document images.

    Within this scope, the connected-components method is used to remove page frames from old and calligraphy documents, and shadow noise in old and historical documents is removed using an adapted sparse algorithm.
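    The page-frame removal step can be sketched with 8-connected component labelling on a binarized page. The abstract does not state the exact frame criterion, so this sketch assumes a hypothetical rule: a component whose bounding box spans at least `span_ratio` (80% by default) of both the page height and the page width is treated as a frame and erased, while smaller components such as letters and dots are kept. The names `label_components`, `remove_page_frame`, and `span_ratio` are illustrative only.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # binarized page: 1 = ink pixel, 0 = background


def label_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Return the pixel lists of all 8-connected foreground components (BFS labelling)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    # Visit all 8 neighbours of the current pixel.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def remove_page_frame(img: Image, span_ratio: float = 0.8) -> Image:
    """Erase components whose bounding box spans most of the page in both directions."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for pixels in label_components(img):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        box_h = max(ys) - min(ys) + 1
        box_w = max(xs) - min(xs) + 1
        if box_h >= span_ratio * h and box_w >= span_ratio * w:
            for y, x in pixels:
                out[y][x] = 0  # assumed frame component: clear it
    return out
```

    On a 10x10 page containing a rectangular border and a three-pixel glyph, the border forms a single component spanning 8 of 10 rows and columns, so it is removed and only the glyph remains. Real calligraphy pages would first need binarization, and the span threshold would have to be tuned on the dataset.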

    This paper discusses several major areas in which OCR web services are applied, namely supporting document analysis and service-oriented architecture (SOA) computing for OCR. With the OCR web-services approach, we handle large-scale, heterogeneous document collections spanning a wide variety of structural categories. Documents may also be multipage and contain more than one language; accordingly, the language of each document is identified by the language script specification module.
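    The idea of tagging each page of a multilingual document with its language can be illustrated with a simple text-level heuristic: count characters by Unicode block and keep the dominant script. This is a swapped-in technique, not the paper's language script specification module, whose internals the abstract does not describe. It assumes text is already available for each page (for example from a first OCR pass), and the block table below is an approximation; `SCRIPT_RANGES`, `dominant_script`, and `tag_pages` are hypothetical names.

```python
from collections import Counter
from typing import List, Optional

# Approximate Unicode blocks per script (hypothetical table for illustration).
SCRIPT_RANGES = {
    "Arabic": [(0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF),
               (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)],
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
}


def char_script(ch: str) -> Optional[str]:
    """Return the script whose block contains the character, or None (digits, spaces, ...)."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def dominant_script(text: str) -> str:
    """Majority vote over the characters that belong to a known script."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else "Unknown"


def tag_pages(pages: List[str]) -> List[str]:
    """Tag every page of a multipage document with its dominant script."""
    return [dominant_script(page) for page in pages]
```

    For instance, a page of Arabic text would be tagged `Arabic`, a page of English text `Latin`, and a page with only digits `Unknown`; a service could then route each page to the matching recognizer.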

  • How to Cite

    Al-Barhamtoshy, H. M., & Al-Ghamdi, A. S. (2019). A Comprehensive Framework for OCR Web Services System for Arabic Calligraphy Documents. International Journal of Engineering & Technology, 8(1.11), 16-24. https://doi.org/10.14419/ijet.v8i1.11.28084

    Received date: 2019-03-01

    Accepted date: 2019-03-01

    Published date: 2019-03-01