AUTOMATIC NON-HORIZONTAL SCENE TEXT RECOGNITION FROM A DOCUMENT IMAGE

BASAVANNA M.1*, SHIVANAND GORNALE2, SHIVAKUMARA P.3, SRIVATSA S.K.4
1School of Computing Science, VELS University, Chennai-Tamil Nadu-India
2Department of Computer Science, Government College (Autonomous), Mandya-Karnataka-India
3School of Computing, National University of Singapore-Singapore
4Senior Professor, St. Joseph College of Engineering, Chennai-Tamil Nadu-India
* Corresponding Author : basavanna_m@yahoo.com

Received : 29-09-2011     Accepted : 03-11-2011     Published : 07-11-2011
Volume : 3     Issue : 3       Pages : 164 - 167
Int J Mach Intell 3.3 (2011):164-167
DOI : http://dx.doi.org/10.9735/0975-2927.3.3.164-167

Conflict of Interest : None declared
Acknowledgements/Funding : This research work is supported by University Grants Commission (UGC), New Delhi, India (F.No.UGC/MRP(s)/800/10-11/KAMY022).

Cite - MLA : BASAVANNA M., et al "AUTOMATIC NON-HORIZONTAL SCENE TEXT RECOGNITION FROM A DOCUMENT IMAGE." International Journal of Machine Intelligence 3.3 (2011):164-167. http://dx.doi.org/10.9735/0975-2927.3.3.164-167

Copyright : © 2011, BASAVANNA M., et al, Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

The detection and extraction of scene text from document images is a challenging research area. Many researchers have detected and extracted text from plain backgrounds, but multi-oriented scene text detection remains a complex problem because the text varies in orientation, font size, color, etc. In this work, we propose a new algorithm to detect and extract multi-oriented scene text. Experiments were carried out on heterogeneous datasets to assess the robustness of the proposed method. The proposed method achieves a high detection rate of 88.43% on multi-oriented scene text.

Keywords

Multi-oriented scene text, Sobel edge map, horizontal text image, scene text detection, scene text recognition.

Introduction

Automatically identifying and extracting text lines from images has become an imperative need, for example to help blind persons [1-4] . Efficiently identifying and retrieving relevant information from multi-oriented text in an image is a hard and challenging problem for researchers [5] . Therefore, over the past decades, several annotation-based and content-based approaches have been introduced to meet the real challenges of retrieval. But, to the best of our knowledge, none of these methods achieves good accuracy in filling the semantic gap between low-level and high-level features to understand the image [6-8] . This is because of unexpected and undesirable properties of images such as low resolution, complex background, and varying orientation, font and size of the text [9-11] . Hence, an alternate way to fill the semantic gap is multi-oriented text detection, extraction and recognition to understand the image content. Text detection and recognition is familiar work for the image analysis community, but due to the above properties of images, image analysis based methods may fail to give satisfactory results [12-14] .
Multi-oriented text detection and extraction in images is usually addressed by four main approaches: the distance between two white pixels in a horizontal text image, the distance between two white pixels in an oriented text image, connected components, and the elimination of false positives using geometrical properties of a text block such as height, width, aspect ratio and its PCA (Principal Component Analysis) angle. These methods solve the problem to some extent, but there is still room for improvement, especially for large images containing both graphics and scene text and for text rotated through large angles [5] . Many researchers have worked on oriented text detection in scene images, but none of the existing methods for identifying oriented text lines in a document image offers a perfect solution in terms of both simplicity and accuracy. These factors motivated us to propose a new method for multi-oriented text detection in images. In this paper, we propose a simple and effective method that uses the regular spacing between characters and words as the basis for text detection. The proposed method uses geometrical properties and the PCA angle of each block to eliminate false positives and retain true text blocks.
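As an illustration of the PCA angle of a block, the orientation of a binary candidate region can be estimated from the covariance of its foreground pixel coordinates. The sketch below is our own minimal reading of this step, not the authors' code; the function name and the angle-folding rule are assumptions.

```python
import numpy as np

def block_angle_pca(block):
    """Estimate the orientation (degrees, folded into (-90, 90]) of a
    binary text block via PCA of its foreground-pixel coordinates."""
    ys, xs = np.nonzero(block)                 # foreground pixel coordinates
    pts = np.column_stack((xs, ys)).astype(float)
    pts -= pts.mean(axis=0)                    # centre the point cloud
    cov = np.cov(pts, rowvar=False)            # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]     # principal axis of the cloud
    angle = np.degrees(np.arctan2(major[1], major[0]))
    if angle > 90:                             # fold away the sign ambiguity
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle

# A wide, flat strip of foreground pixels behaves like a horizontal text line.
strip = np.zeros((20, 100), dtype=np.uint8)
strip[8:12, 5:95] = 1
print(round(block_angle_pca(strip)))  # -> 0
```

A block whose PCA angle is far from the dominant text-line angle of the image can then be discarded as a false positive.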

Methodology

The objective of document image analysis is to recognize the text and graphics components in images of documents and to extract the desired information, as a human reader would. Two categories of document image analysis are defined, viz. textual processing and graphical processing. Textual processing deals with the text components of a document image: detecting skew (any tilt introduced during scanning); finding columns, paragraphs, text lines and words; and finally recognizing the text (and possibly its attributes such as orientation, size and font) through an Optical Character Recognition (OCR) system. Graphical processing deals with the non-textual line and symbol components that make up line diagrams, the straight lines delimiting text sections, company logos, etc. Pictures are the third major component of documents but, except for recognizing their location on a page, their further analysis is usually the task of other image processing and machine vision techniques. After these text and graphics analysis techniques are applied, the several megabytes of initial data are culled to yield a much more concise semantic description of the document. Our system's aim is to guide a blind person walking around a city or busy area without anyone's help. A camera fixed on the person's head captures images containing text information, and a system recognizes the text [15] . Once the text is recognized, the system translates it into speech to guide the person.

System Design

The system works as follows: a) acquire the image; b) detect oriented text in the acquired image; c) store the detected text; d) recognize the characters; e) convert the recognized text to speech. This research work carries out step b) of the proposed system, i.e. oriented text detection in the acquired image.
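The five steps a) to e) can be viewed as a pipeline of pluggable stages. The skeleton below is a hypothetical sketch of that architecture (the stage functions are stand-ins, since only step b) is developed in this paper):

```python
def run_pipeline(frame, detect, recognize, synthesize):
    """Steps a)-e): given an acquired image (a), detect oriented text
    blocks (b), store them (c), recognize each block's characters (d),
    and convert the result to speech (e)."""
    blocks = detect(frame)                  # b) oriented text detection
    stored = list(blocks)                   # c) store the detected text blocks
    words = [recognize(b) for b in stored]  # d) character recognition
    return synthesize(" ".join(words))      # e) text to speech

# Toy stand-ins: the detector returns two fake blocks, the recognizer
# labels them, and the synthesizer echoes what it would speak.
out = run_pipeline(frame="img",
                   detect=lambda f: ["block1", "block2"],
                   recognize=lambda b: b.upper(),
                   synthesize=lambda t: "speak: " + t)
print(out)  # -> speak: BLOCK1 BLOCK2
```

Keeping the stages independent lets the detection step be replaced or evaluated in isolation, as is done in the experiments below.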

Proposed Algorithm

In this work, we propose a new algorithm for oriented text detection in scene images, based on the observation that the spaces between characters and words in a text line of the Sobel edge map of a gray input image follow a regular pattern. This observation is a strong clue to the location of oriented text in a scene image. The reason for using the Sobel edge map of the input image is that the Sobel operator detects edges where there is high-contrast information in the image, and text usually has high contrast compared to its background. Since the aim of our work is only to locate oriented text, the few edges detected by the Sobel operator are enough to identify the location of text in the image. The text detection steps are illustrated in [Fig-1] .
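A minimal Sobel edge map can be computed as below. This is a sketch under our own assumptions (a NumPy grayscale array and a fixed magnitude threshold; the paper does not state its threshold):

```python
import numpy as np

def sobel_edge_map(gray, thresh=128):
    """Binary Sobel edge map: gradient magnitude thresholded so that
    high-contrast (typically text) pixels become white (1)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                   # vertical gradient
    h, w = gray.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    g = gray.astype(float)
    for dy in range(3):            # accumulate the 3x3 neighbourhood products
        for dx in range(3):
            sub = g[dy:h - 2 + dy, dx:w - 2 + dx]
            gx[1:-1, 1:-1] += kx[dy, dx] * sub
            gy[1:-1, 1:-1] += ky[dy, dx] * sub
    mag = np.hypot(gx, gy)         # gradient magnitude
    return (mag > thresh).astype(np.uint8)

# A vertical step edge (dark left half, bright right half) is detected
# along the two columns flanking the step.
img = np.zeros((10, 10))
img[:, 5:] = 255
edges = sobel_edge_map(img)
print(edges.sum())  # -> 16
```

In practice a library routine (e.g. an OpenCV Sobel call) would replace the explicit loops; the hand-rolled version is shown only to make the operator concrete.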
For the horizontal gray image shown in [Fig-1] (a), we obtain the Sobel edge map shown in [Fig-1] (b), where the edge information corresponds to the text in the image. The horizontal run lengths are obtained by counting the number of consecutive black pixels between white pixels in the horizontal direction of the Sobel edge map. The frequency of each run length is stored in an array called the horizontal array (HA). The same procedure is applied to the oriented scene text input image [Fig-1] (c) and its Sobel edge map [Fig-1] (d), and the resulting run-length frequencies are stored in an array called the oriented array (OA). We compare the run-length frequencies of HA and OA; where they are equal, we fill the corresponding pixels of the oriented image [Fig-1] (e). To identify the run-length frequencies that represent text pixels, we introduce a Max-Min clustering algorithm. It first selects the maximum and minimum values in HA and then compares each value in HA with Max and Min, producing two clusters. The cluster belonging to the Max value is considered the text cluster, as shown in [Fig-1] (e), where text lines are separated by spaces. The output of the Max-Min clustering algorithm is then used to fix bounding boxes for the text lines, as shown in [Fig-1] (f), with false positive elimination. We use geometrical properties such as height, width and aspect ratio, together with the PCA angles of the text blocks, to eliminate false positives. The final text detection result after eliminating false positives can be seen with correct bounding boxes in [Fig-1] (f). One can notice from [Fig-1] (g) that false positives are not eliminated completely, due to the difficulty of separating text from non-text.
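The run-length statistics and the Max-Min clustering described above can be sketched as follows. This is a simplified reading of the method; the function names and the tie-breaking rule (nearest seed wins) are our assumptions:

```python
import numpy as np

def black_run_histogram(edge_map):
    """Count, row by row, the lengths of runs of black (0) pixels lying
    between white (1) pixels, and return {run length: frequency}."""
    hist = {}
    for row in edge_map:
        run, seen_white = 0, False       # only count runs bounded by white pixels
        for p in row:
            if p == 1:
                if seen_white and run > 0:
                    hist[run] = hist.get(run, 0) + 1
                seen_white, run = True, 0
            elif seen_white:
                run += 1
    return hist

def max_min_clustering(freqs):
    """Split frequencies into two clusters seeded by the maximum and the
    minimum value; the cluster around the maximum is the text cluster."""
    mx, mn = max(freqs), min(freqs)
    text, non_text = [], []
    for f in freqs:
        (text if abs(f - mx) <= abs(f - mn) else non_text).append(f)
    return text, non_text

# Two identical rows, each containing a run of 2 and a run of 1 black pixels.
edge = np.array([[1, 0, 0, 1, 0, 1],
                 [1, 0, 0, 1, 0, 1]])
print(black_run_histogram(edge))           # -> {2: 2, 1: 2}
print(max_min_clustering([10, 9, 2, 1]))   # -> ([10, 9], [2, 1])
```

Frequencies falling in the Max cluster mark the run lengths whose pixels are retained as candidate text; the rest are treated as background.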

Experimental Results

In this experimental work, the proposed algorithm for oriented text detection was applied to 341 sample images. These images were used to test the proposed algorithm and to compare it with existing algorithms [16-17] . For all three kinds of datasets, we computed the number of Truly Detected Blocks (TDB), the number of Falsely Detected Blocks (FDB) and the number of blocks with Missing Data (MDB), relative to the number of Actual Text Blocks (ATB) in each image.

Performance Measures for Text Frames

The performance of the proposed method is evaluated using the following quality measures:
• Truly Detected Block (TDB): A detected block that contains a text string, partially or fully.
• Falsely Detected Block (FDB): A detected block that does not contain text.
• Text Block with Missing Data (MDB): A detected block that misses some characters of a text string (MDB is a subset of TDB).
For each image in the dataset, we manually count the number of Actual Text Blocks (ATB), i.e. the number of true text blocks in the frame.
• Recall (R) = TDB / ATB
• Precision (P) = TDB / (TDB + FDB)
• F-measure (F) = 2 × P × R / (P + R)
• Misdetection Rate (MDR) = MDB / TDB
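The four measures above can be computed directly from the block counts. A small sketch with hypothetical counts for a single image (the numbers are illustrative, not from our dataset):

```python
def text_detection_scores(tdb, fdb, mdb, atb):
    """Recall, Precision, F-measure and Misdetection Rate from block counts."""
    r = tdb / atb            # Recall = TDB / ATB
    p = tdb / (tdb + fdb)    # Precision = TDB / (TDB + FDB)
    f = 2 * p * r / (p + r)  # F-measure: harmonic mean of P and R
    mdr = mdb / tdb          # Misdetection Rate = MDB / TDB
    return r, p, f, mdr

# Hypothetical example: 8 of 10 actual blocks detected, 2 false blocks,
# and 1 detected block with missing characters.
r, p, f, mdr = text_detection_scores(tdb=8, fdb=2, mdb=1, atb=10)
print(r, p, round(f, 3), mdr)  # -> 0.8 0.8 0.8 0.125
```

Note that MDB counts a subset of TDB, so MDR penalizes partial detections without affecting Recall or Precision.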
Two other performance measures commonly used in the literature, Detection Rate and False Positive Rate, can be converted to Recall and Precision: Recall = Detection Rate and Precision = 1 – False Positive Rate [18] . Hence only the above four performance measures are used for evaluation. [Table-1] and [Fig-2] give the objective and subjective analysis of the tested sample images for the two existing methods and the proposed method on the non-horizontal text dataset. The proposed method has the highest recall, the second highest precision (almost the same as that of the Laplacian method) and the highest F-measure. This shows the advantage of the proposed method: it achieves good results while making fewer assumptions about the text, and its MDR is as good as that of the Skeleton-Based method. The proposed method achieves a relatively high detection rate (88.43%) on the non-horizontal text dataset, which is much more challenging than graphics text due to its arbitrary orientation and low contrast. However, the proposed method also has a high false positive rate.

Conclusion and Future work

Recognizing the text and graphics components of images is one of the challenging tasks in digital image processing and machine vision. Due to unexpected and undesirable properties of images such as low resolution, font, size, complex background and varying orientation, it is very difficult to locate the exact text blocks in an image. In view of this, we have proposed a new algorithm for scene text recognition in document images; the method works on the basis that the spacing between characters and words in an oriented text line follows a regular pattern. The experimental work was carried out on 341 sample images, and we achieved a detection rate of 88.43%. It is also observed that, to obtain a higher detection rate, more false positives must be eliminated.
This work may further be extended to strengthen the algorithm, achieving a higher detection rate and handling more steeply oriented text images with complex backgrounds.

Acknowledgement

This research work is supported by University Grants Commission (UGC), New Delhi, India (F.No.UGC/MRP(s)/800/10-11/KAMY022).

References

[1] Doermann D., Liang J. and Li H. (2003) In Proc. ICDAR, 606-616.
[2] Jung K. (2001) Pattern Recognition Letters, 1503-1515.
[3] Ye Q., Huang Q., Gao W. and Zhao D. (2005) Image and Vision Computing, 565-576.
[4] Chen D., Odobez J.M. and Bourlard H. (2004) Pattern Recognition, 595-608.
[5] Clark P. and Mirmehdi M. (2000) SPIE Conf. on Document Recognition and Retrieval VII, 267-277.
[6] Chucai Yi and YingLi Tian (2011) IEEE Transactions on Image Processing, 20(9), 2594-2605.
[7] Roy P.P., Pal U., Llados J. and Delalandre M. (2009) In Proc. ICDAR, 11-15.
[8] Jagath Samarabandu and Xiaoqing Liu (2005) IEEE International Conference, 701-706.
[9] Shivakumara P., Dutta A., Pal U. and Chew Lim Tan (2010) ICFHR, 387-392.
[10] Jung K., Kim K.I. and Jain A.K. (2004) Pattern Recognition, 977-997.
[11] Wu V., Manmatha R. and Riseman E.M. (1999) IEEE Transactions on PAMI, 1224-1229.
[12] Pan Y.F., Hou X. and Liu C.L. (2011) IEEE Transactions on Image Processing, 800-813.
[13] Chen X., Yang J., Zhang J. and Waibel A. (2004) IEEE Transactions on Image Processing, 87-99.
[14] Bhattacharya U., Parui S.K. and Mondal S. (2009) In Proc. ICDAR, 171-175.
[15] Ezaki N., Bulacu M. and Schomaker L. (2004) 17th International Conference on Pattern Recognition, 683-686.
[16] Trung Quy Phan, Palaiahnakote Shivakumara and Chew Lim Tan (2010) In Proc. DAS, 271-278.
[17] Shivakumara P., Trung Quy Phan and Chew Lim Tan (2011) IEEE Transactions on PAMI, 412-419.
[18] Wong E.K. and Chen M. (2003) Pattern Recognition, 1397-1406.

Images
Fig. 1- Steps for scene text detection (a) Horizontal gray image (b) Sobel edge (c) Oriented gray image (d) Sobel edge (e) Oriented run length (f) Bounding boxes (g) Text detection
Fig. 2- Sample results (a) Input images (b) Oriented Run length (c) Bounding boxes (d) Text detection
Table 1- Performance of the proposed method and existing methods