A. BANDYOPADHYAY, A. GANGULY, U. PAL
CVPR Unit, Indian Statistical Institute, 203 B T Road, Kolkata, India.
Published: 15-12-2011
Volume: 1 Issue: 1 Pages: 5-10
J Comput Ling 1.1 (2011):5-10
Published layout segmentation algorithms often rely on predetermined parameters such as typical font sizes, distances between text lines, the presence of images, and the document scan resolution. Variations in these parameters across real document images greatly affect the performance of these algorithms. In this paper we present a simple and novel approach for segmenting document pages that are complex in nature (having more than one picture or header). The approach segments a scanned document into images, headers, columns, and finally paragraphs. We first separate the images and the text set in larger fonts, then separate the columns, and finally divide the columns into paragraphs.
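The abstract does not say how the column and paragraph steps are carried out. As a rough illustration only, the sketch below uses projection profiles, a common technique for this kind of splitting, which may or may not match the authors' method. The function names, the 0/1 page representation, and the `min_gap` threshold are all assumptions made for the example, and the earlier step of separating images and large-font headers is omitted.

```python
def split_on_gaps(profile, min_gap):
    """Return (start, end) ranges of inked runs in `profile`, splitting
    wherever at least `min_gap` consecutive entries are zero."""
    segments = []
    start = None  # index where the current inked run began
    gap = 0       # length of the current run of empty entries
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The run ended where the gap began.
                segments.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last run, trimming any short trailing gap.
        segments.append((start, len(profile) - gap))
    return segments


def find_columns(page, min_gap=2):
    """Split a binary page (list of rows of 0/1, 1 = ink) into columns
    using the vertical projection profile (ink count per pixel column)."""
    profile = [sum(col) for col in zip(*page)]
    return split_on_gaps(profile, min_gap)


def find_paragraphs(page, column, min_gap=2):
    """Split one column, given as (left, right), into paragraphs using
    the horizontal projection profile (ink count per pixel row)."""
    left, right = column
    profile = [sum(row[left:right]) for row in page]
    return split_on_gaps(profile, min_gap)


# Hypothetical toy page: two columns separated by a two-pixel gutter.
page = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
]
columns = find_columns(page)
paragraphs = [find_paragraphs(page, c) for c in columns]
```

Note that a fixed `min_gap` is exactly the kind of predetermined parameter the abstract criticizes, so a practical version would presumably have to estimate it from the page itself.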