Image Enhancement of Complex Document Images Using Histogram of Gradient Features

  • Authors

    • Sajan A. Jain
    • N. Shobha Rani
    • N. Chandan
    2018-12-09
    https://doi.org/10.14419/ijet.v7i4.36.24244
  • Document images, pre-processing, marginal noise removal, Hu moments, optical character recognition.
  • Enhancing document images is an open research problem in character recognition. A document with a uniform illumination gradient is important for achieving high recognition accuracy in a document processing system such as Optical Character Recognition (OCR). Complex document images are harder to process than other types of images, and the quality of the document determines the precision of a character recognition system. Transforming complex document images to a uniform illumination gradient is therefore desirable. In the proposed research, ancient document images from the UMIACS Tobacco 800 database are considered for marginal noise removal. The proposed technique interprets the document contents block by block to remove the marginal noise that is usually present at the image borders. Hu moment features are computed in every block to detect marginal noise. An empirical analysis is carried out to classify blocks as noisy or non-noisy, and the outcomes produced by the algorithm are satisfactory and feasible for subsequent analysis.
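
The block-wise feature step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `hu_first_two` and `block_features`, the 32-pixel block size, and the choice to compute only the first two of the seven Hu invariants are all assumptions made for brevity. The input is assumed to be a binarized page in which ink pixels are 1 and background pixels are 0.

```python
import numpy as np


def hu_first_two(block):
    """Return the first two Hu invariants (phi1, phi2) of a 2-D block.

    Hypothetical helper: only two of the seven invariants are computed here.
    """
    block = np.asarray(block, dtype=np.float64)
    m00 = block.sum()
    if m00 == 0:
        # Empty block (no ink): the invariants are undefined, so return zeros.
        return np.zeros(2)
    ys, xs = np.mgrid[: block.shape[0], : block.shape[1]]
    # Centroid of the ink mass inside the block.
    xc = (xs * block).sum() / m00
    yc = (ys * block).sum() / m00
    dx, dy = xs - xc, ys - yc
    # Second-order central moments, scale-normalized by m00 ** 2.
    eta20 = (dx**2 * block).sum() / m00**2
    eta02 = (dy**2 * block).sum() / m00**2
    eta11 = (dx * dy * block).sum() / m00**2
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4.0 * eta11**2
    return np.array([phi1, phi2])


def block_features(image, block_size=32):
    """Tile a binary image and map each tile's top-left corner to its features.

    The block size is an assumed value, not one taken from the paper.
    """
    image = np.asarray(image)
    height, width = image.shape
    features = {}
    for r in range(0, height, block_size):
        for c in range(0, width, block_size):
            tile = image[r : r + block_size, c : c + block_size]
            features[(r, c)] = hu_first_two(tile)
    return features


if __name__ == "__main__":
    # Synthetic page: a solid dark band on the left margin, thin strokes elsewhere.
    page = np.zeros((64, 64), dtype=np.uint8)
    page[:, :8] = 1          # marginal noise band
    page[40:42, 36:60] = 1   # a thin text-like stroke
    for corner, feats in block_features(page).items():
        print(corner, feats)
```

How the per-block features are turned into a noisy or non-noisy decision is left open here; the abstract states only that the classification was determined empirically, so any threshold or classifier would have to be chosen from labeled blocks.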

     

  • How to Cite

    A. Jain, S., Shobha Rani, N., & Chandan, N. (2018). Image Enhancement of Complex Document Images Using Histogram of Gradient Features. International Journal of Engineering & Technology, 7(4.36), 780-783. https://doi.org/10.14419/ijet.v7i4.36.24244