Image Enhancement of Complex Document Images Using Histogram of Gradient Features

  • Abstract

    Enhancing document images is an important research challenge in character recognition. A document with a uniform illumination gradient is needed to achieve high recognition accuracy in a document processing system such as Optical Character Recognition (OCR). Complex document images are harder to process than other categories of images, and the quality of the document determines the precision of a character recognition system. Transforming complex document images to a uniform illumination gradient is therefore desirable. In the proposed research, ancient document images from the UMIACS Tobacco 800 database are considered for removal of marginal noise. The proposed technique interprets the document contents block by block to remove the marginal noise that is usually present at the borders of images. Hu moment features are computed in every block to detect marginal noise. An empirical analysis is carried out to classify blocks as noisy or non-noisy, and the outcomes produced by the algorithm are satisfactory and feasible for subsequent analysis.
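
    The block-wise computation of Hu moment features described in the abstract can be illustrated with a minimal sketch. It assumes the image is a binary or grayscale grid stored as a list of rows, and it uses the standard definition of the seven Hu invariants. The names `split_blocks`, `hu_moments`, and `block_features` and the block size are hypothetical, chosen for illustration. The abstract does not state the rule that labels a block as noisy or non-noisy, so no classifier is shown.

```python
def split_blocks(image, size):
    """Yield (top, left, block) tiles of at most size x size pixels."""
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            yield top, left, [row[left:left + size] for row in image[top:top + size]]


def hu_moments(block):
    """Return the seven Hu invariants of a block given as rows of pixel values."""
    m00 = sum(sum(row) for row in block)
    if m00 == 0:
        return [0.0] * 7  # empty block: no foreground mass to describe
    pixels = [(x, y, v) for y, row in enumerate(block) for x, v in enumerate(row)]
    cx = sum(x * v for x, _, v in pixels) / m00  # centroid, x coordinate
    cy = sum(y * v for _, y, v in pixels) / m00  # centroid, y coordinate

    def eta(p, q):
        # Central moment mu_pq, normalised by m00^(1 + (p + q) / 2).
        mu = sum((x - cx) ** p * (y - cy) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def block_features(image, size):
    """Map each block's (top, left) corner to its Hu moment feature vector."""
    return {(top, left): hu_moments(block)
            for top, left, block in split_blocks(image, size)}
```

    Calling `block_features(image, 32)` would return one seven-element feature vector per 32 x 32 block, keyed by the block's top-left corner. Any threshold or classifier applied to these vectors would have to be fixed empirically, as the abstract indicates.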


  • Keywords

    Document images, pre-processing, marginal noise removal, Hu moments, optical character recognition.


Article ID: 24244
DOI: 10.14419/ijet.v7i4.36.24244
