Tamkang University Chueh Sheng Memorial Library (TKU Library)
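System ID: U0002-0801201110391800
Title (Chinese): 基於多影格的精確新聞影片文字偵測與擷取
Title (English): Precise News Video Text Detection and Text Extraction Based on Multiple Frames Integration
Institution: Tamkang University
Department (Chinese): 資訊工程學系博士班
Department (English): Department of Computer Science and Information Engineering
Academic year: 99 (ROC calendar)
Semester: 1
Publication year: 100 (ROC calendar)
Author name (Chinese): 張曉維
Author name (English): Hsiao-Wei Chang
Student ID: 895410180
Degree: Doctoral
Language: English
Oral defense date: 2011-01-13
Pages: 45
Oral defense committee: Advisor - 顏淑惠
Member - 施國琛
Member - 徐道義
Member - 蔡憶佳
Member - 黃心嘉
Member - 林慧珍
Member - 顏淑惠
Keywords (Chinese): 文字偵測; 文字擷取; 二值化; 黑白轉變; 肯尼邊緣偵測器
Keywords (English): text detection; text extraction; binarization; black-and-white transition; Canny edge detector
Subject classification: Applied Science, Information Engineering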
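Abstract (Chinese, translated): Text appearing in news video is important for news video indexing and summarization. In this thesis, we propose a robust and effective text detection method, followed by precise text extraction (that is, binarization) of the detected text regions. The proposed text detection method first uses temporal information and the logical AND operation to remove most of the irrelevant background, and then applies a window-based method that counts black-and-white transitions on the edge map to obtain rough text blocks. A line deletion technique is applied twice to refine the text blocks. The proposed method is applicable to multiple languages, for example English, Japanese, and Chinese. It is robust to dark text on a bright background or bright text on a dark background, to different character sizes, and to horizontal or vertical text alignment. We measure the performance of the proposed text detection method with three evaluation metrics, and the experimental results on multiple languages all exceed 96%. After text detection, we propose a text extraction (binarization) method that first applies the Canny edge detector to the detected text box and then scans the text box with vertical lines from left to right twice. A vertical line scans from top to bottom through the pixels until it hits an edge pixel or reaches the bottom line; likewise, a vertical line scans from bottom to top until it hits an edge pixel or reaches the top line. All traversed pixels are classified as background pixels. We then find, from the histogram of the non-background pixels, the intensity p shared by the most pixels and compute the standard deviation σ. Finally, depending on whether the text is dark or bright, we set the threshold range T = [0, p+kσ] or T = [p-kσ, 255] and extract the text. The proposed text extraction method requires no parameters, places no restriction on text polarity, and can handle cases where the background and the text have the same intensity. The method can also be applied to different news videos, historical archive documents, and other kinds of documents, and it outperforms well-known methods such as Otsu, Niblack, and Sauvola in accuracy and quality.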
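The text detection steps summarized in the abstract (a logical AND over edge maps from several frames, followed by window-based counting of black-and-white transitions) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the edge maps are assumed to be precomputed binary arrays, the function names are hypothetical, and the window size, the transition threshold, and the choice to count transitions along rows only are placeholders. The line deletion refinement is omitted.

```python
import numpy as np

def and_edge_map(edge_maps):
    """Pixel-wise logical AND of binary edge maps taken from several frames.

    Edges that stay in place across frames (such as captions) survive;
    edges of a changing background mostly vanish.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in edge_maps])
    return np.logical_and.reduce(stacked, axis=0)

def count_transitions(window):
    """Count black-and-white transitions along each row of a binary window."""
    w = np.asarray(window, dtype=bool)
    return int(np.count_nonzero(w[:, 1:] != w[:, :-1]))

def rough_text_mask(edge_map, win=8, min_transitions=6):
    """Mark non-overlapping win x win windows with many transitions.

    win and min_transitions are illustrative values (assumptions),
    not values taken from the thesis.
    """
    edge_map = np.asarray(edge_map, dtype=bool)
    h, w = edge_map.shape
    mask = np.zeros((h, w), dtype=bool)
    for top in range(0, h - win + 1, win):
        for left in range(0, w - win + 1, win):
            block = edge_map[top:top + win, left:left + win]
            if count_transitions(block) >= min_transitions:
                mask[top:top + win, left:left + win] = True
    return mask
```

In this sketch, passing the four per-frame edge maps to `and_edge_map` and the result to `rough_text_mask` yields a coarse boolean mask of candidate text blocks; a real pipeline would still need the refinement steps the abstract mentions.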
Abstract (English): Text on news video is crucial for news video indexing and summarization. In this thesis, we present both a robust and efficient text detection algorithm and a subsequent precise text extraction (binarization) algorithm that binarizes the detected text regions in news videos. The proposed text detection algorithm first uses both the temporal information of the video and the logical AND operation to remove most of the irrelevant background. Then, a window-based method that counts black-and-white transitions is applied to the resulting edge map to obtain rough text blocks. A line deletion technique is used twice to refine the text blocks. The proposed algorithm is applicable to multiple languages (e.g., English, Japanese, and Chinese) and is robust to text polarity (positive or negative), character size, and text alignment (horizontal or vertical). Three metrics (recall, precision, and quality of bounding preciseness) are adopted to measure the efficacy of text detection algorithms. According to the experimental results on various multilingual video sequences, the proposed algorithm achieves above 96% in all three metrics. Following text detection, the proposed text extraction (binarization) algorithm first applies the Canny edge detector to the text box. Next, vertical line scanning from left to right across the text box is performed twice. The vertical line traverses downwards until it hits an edge pixel or reaches the bottom of the box. Similarly, the vertical line traverses upwards until it hits an edge pixel or reaches the top of the box. The traversed pixels are classified as background pixels. The algorithm then locates the peak intensity p and evaluates the standard deviation σ from the histogram of the non-background pixels. Finally, after the threshold is set to T = [0, p+kσ] or T = [p-kσ, 255] depending on text polarity, the algorithm obtains the text extraction (binarization) result.
Notably, the proposed method is parameter-free, has no limitation on text polarity, and can handle cases in which the background and the text of a news video have similar intensities. The method has been tested extensively on text boxes from various news videos, historical archive documents, and other kinds of documents. The proposed algorithm outperforms well-known methods such as Otsu, Niblack, and Sauvola in precision and quality.
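The extraction step described in the abstract (vertical scans that mark background pixels, then a threshold range built from the peak intensity p and standard deviation σ of the remaining pixels) could look roughly like the sketch below. It is a sketch under assumptions, not the thesis code: the Canny edge map is assumed to be supplied as a binary array, the function names are hypothetical, the default `k=1.0` is a placeholder, the polarity test (comparing p with the mean of the scanned background) is a guess because the abstract does not say how polarity is decided, and the threshold is applied to the whole box.

```python
import numpy as np

def background_mask(edges):
    """Mark pixels crossed by vertical scans before they hit an edge pixel."""
    edges = np.asarray(edges, dtype=bool)
    # Downward scan: crossed while no edge pixel lies at or above the pixel.
    from_top = np.cumsum(edges, axis=0) == 0
    # Upward scan: crossed while no edge pixel lies at or below the pixel.
    from_bottom = np.cumsum(edges[::-1], axis=0)[::-1] == 0
    return from_top | from_bottom

def binarize_text_box(gray, edges, k=1.0):
    """Return a boolean text mask for an 8-bit grayscale text box.

    k is an illustrative default (assumption), not a value from the thesis.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    bg = background_mask(edges)
    candidates = gray[~bg].astype(np.int64)
    if candidates.size == 0:
        return np.zeros(gray.shape, dtype=bool)
    hist = np.bincount(candidates, minlength=256)
    p = int(np.argmax(hist))          # peak intensity of non-background pixels
    sigma = float(candidates.std())   # spread of non-background pixels
    # Assumed polarity rule: text is dark if its peak lies below the
    # mean intensity of the scanned background.
    dark_text = (not bg.any()) or p < gray[bg].mean()
    if dark_text:
        low, high = 0.0, p + k * sigma
    else:
        low, high = p - k * sigma, 255.0
    return (gray >= low) & (gray <= high)
```

Because the scans stop at the first edge pixel in each column, the background is identified only above and below the text in that column; pixels in the gaps inside characters remain candidates and are separated from the text only by the intensity range.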
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Works
Chapter 3 Edge Detection
3.1 Sobel Edge Detection
3.2 Canny Edge Detection
Chapter 4 Proposed Method
4.1 Text Detection
4.2 Text Extraction
Chapter 5 Experimental Results
5.1 Text Detection Results
5.2 Text Extraction Results
Chapter 6 Conclusion
References
Appendix

List of Figures
Figure 1 (a) The horizontal mask; (b) the vertical mask
Figure 2 (a) and (d) are original images, (b) and (e) are Sobel edge maps, (c) and (f) are Canny edge maps
Figure 3 The flowchart of the proposed approach
Figure 4 Four color reference frames
Figure 5 Four grayscale reference images
Figure 6 Four Canny edge detection maps after removing lines that are too long
Figure 7 The AND-edge-map: the result of applying the AND operation to the four Canny edge maps of Fig. 6
Figure 8 (a) Edge of the string; (b) edge of the English letter “I”
Figure 9 The result of the masked region of Fig. 7
Figure 10 The result of text extraction
Figure 11 Effectiveness of text refinement: (a) before and (b) after
Figure 12 The flowchart of the proposed approach
Figure 13 (a) Original image; (b) extended image; (c) grayscale image; (d) Canny edge map; (e) binarization result
Figure 14 (a) Histogram of Fig. 13(c); (b) intensity information of Fig. 13(c)
Figure 15 Original image (the text and the background have similar intensities)
Figure 16 The binarization results of Fig. 15: (a) Otsu; (b) Lin; (c) Niblack (w = 30, k = 0.2); (d) Sauvola (w = 30, k = 0.2)
Figure 17 (a) Histogram of Fig. 15; (b) intensity information of Fig. 15
Figure 18 (a) Detected text region; (b) its reversed image; (c) Canny map of (b)
Figure 19 (a) Histogram of Fig. 18(b); (b) intensity information of Fig. 18(b)
Figure 20 (a) Fig. 15 with modified background intensities; (b) binarization result of the proposed algorithm
Figure 21 The binarization results of Fig. 20(a): (a) Otsu; (b) Lin; (c) Niblack (w = 30, k = 0.2); (d) Sauvola (w = 30, k = 0.2)
Figure 22 Three bounding boxes for the same text, from large to small
Figure 23 Detected boxes (in yellow) for a ground truth text (in blue): (a) and (b) fail to detect it; (c) to (e) detect it, but only (e) bounds the text accurately
Figure 24 The ambiguities in defining a ground truth for “TALK & VIRGINIA SHOWDOWN”
Figure 25 Some results of our method (size 400x300): (a) and (b) are CNN videos, (c) is an ESPN video, (d) is an NHK video from Japan, (e) ETTV and (f) TVBS are two different news videos from Taiwan
Figure 26 The detection result with the character size presumed to be 10x10
Figure 27 The detection result for a different image size (640x480)
Figure 28 Some results of other methods: (a) [6], (b) [8], (c) [13], (d) [15]
Figure 29 (a), (b), (c), (d) are from news videos; (e), (f) are historical archive document images
Figure 30 (a) Histogram and (b) intensity information of Fig. 29(d)

List of Tables
Table 1 Results on different video sequences
Table 2 Results on R, P, and Q


References [1] H. Chang, Automatic Web Image Annotation for Image Retrieval Systems, 12th WSEAS International Conference on Systems, pp. 670-674, 2008.

[2] K. Jung, K. I. Kim, A. K. Jain, Text information extraction in images and video: A survey, Pattern Recognition, Vol. 37, No. 5, pp. 977-997, 2004.

[3] Y. Liu, H. Lu, X. Xue, Y. Tan, Effective video text detection using line feature, 8th International Conference on Control, Automation, Robotics and Vision (ICARCV), Vol. 2, pp. 1528-1532, 2004.

[4] J. Wang, Y. Zhou, An unsupervised approach for video text localization, IEICE TRANS. INF. & SYST., Vol. E89-D, Issue: 4, pp. 1582-1585, 2006.

[5] T. Tsai, Y. Chen, C. Fang, A two-directional videotext extractor for rapid and elaborate design, Pattern Recognition, Vol. 42, Issue 7, pp. 1496-1510, 2009.

[6] R. Wang, W. Jin, L. Wu, A novel video caption detection approach using multi-frame integration, 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 1, pp. 449-452, 2004.

[7] C. Mi, Y. Xu, H. Lu, X. Xue, A novel video text extraction approach based on multiple frames, IEEE International Conference on Information, Communications and Signal Processing (ICICS 2005), Vol. 2005, No. 1689133, pp. 678-682, 2005.

[8] X. Huang, H. Ma, H. Yuan, A novel video text detection and localization approach, PCM 2008. LNCS 5353, pp. 525-534, 2008.

[9] S. H. Yen, C. W. Wang, J. P. Yeh, M. J. Lin, H. J. Lin, Text extraction in video images, The Second IEEE International Conference on Secure System Integration and Reliability Improvement (SSIRI2008), pp. 189-190, 2008.

[10] H. Sun, N. Zhao, X. Xu, Extraction of text under complex background using wavelet transform and support vector machine, IEEE International Conference on Mechatronics and Automation (ICMNA 2006), Vol. 2006, No. 4026310, pp. 1493-1497, 2006.

[11] A. K. Jain, B. Yu, Automatic text location in images and video frames, Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998.

[12] V. Y. Mariano, R. Kasturi, Locating uniform-colored text in video frames, Proceedings of the 15th International Conference on Pattern Recognition, Vol. 4, pp. 539-542, 2000.

[13] X. Hua, X. Chen, W. Liu, H. Zhang, Automatic location of text in video frames, Proceedings of ACM Multimedia Workshop (MIR2001), pp. 24-27, 2001.

[14] M. R. Lyu, J. Song, M. Cai, A comprehensive method for multilingual video text detection, localization, and extraction, IEEE Trans. on Circuits And Systems For Video Technology, Vol. 15, Issue: 2, pp. 243-255, 2005.

[15] M. Anthimopoulos, B. Gatos, I. Pratikakis, A hybrid system for text detection in video frames, Document Analysis Systems, DAS '08. The Eighth IAPR International Workshop, pp. 286-292, 2008.

[16] A. Safi, M. Azam, S. Kiani, N. Daudpota, Online Vehicles License Plate Detection and Recognition System using Image Processing Techniques, Proceedings of the 5th WSEAS International Conference on Applied Computer Science, pp. 793-800, 2006.

[17] S. Brook, Z. Aghbari, Holistic Approach for Classifying and Retrieving Personal Arabic Handwritten Documents, 7th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases (AIKED'08), pp. 565-570, 2008.

[18] S. Choi, J. Yun, K. Koo, J. Choi, S. Kim, Text Region Extraction Algorithm on Steel Making Process, 8th WSEAS International Conference on Robotics, Control and Manufacturing Technology (ROCOM '08), pp. 24-28, 2008.

[19] J. Zhang, D. Goldgof, R. Kasturi, A new edge-based text verification approach for video, 19th International Conference on Pattern Recognition(ICPR2008), pp. 1-4, 2008.

[20] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics, 9(1): pp. 62-66, 1979.

[21] H. Lin, F. Yang, An intuitive threshold selection based on mountain clustering, Proceedings of the conference(JCIS'2000), 2000.

[22] R. R. Yager, D. P. Filev, Generation of fuzzy rules by mountain clustering, Journal of Intelligent and Fuzzy Systems, Vol. 2, pp. 209-219, 1994.

[23] W. Niblack, An introduction to digital image processing, Prentice Hall, Englewood Cliffs, NJ, pp. 115-116, 1986.

[24] O. D. Trier, A. Jain, Goal-Directed Evaluation of Binarization Methods, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(12): pp. 1191-1201, 1995.

[25] C. Wolf, J. Jolion, F. Chassaing, Text localization, enhancement and binarization in multimedia documents, International conference on pattern recognition (ICPR 02), pp. 1037-1040, 2002.

[26] B. Gatos, I. Pratikakis, S. J. Perantonis, Adaptive degraded document image binarization, Pattern Recognition, pp. 317-327, 2006.

[27] J. Sauvola, T. Seppanen, S. Haapakoski, M. Pietikainen, Adaptive document binarization, International conference on document analysis and recognition, pp. 147-152, 1997.

[28] M. Sezgin, B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, Journal of Electronic Imaging 13(1), pp. 146-165, 2004.

[29] J. He, Q. D. M. Do, A. C. Downton, J. H. Kim, A Comparison of Binarization Methods for Historical Archive Documents, ICDAR'05, pp. 538-542, 2005.

[30] C. Ngo, C. Chan, Video text detection and segmentation for optical character recognition, Multimedia Systems 10: pp. 261-272, 2005.

[31] P. H. Lindsay, D. A. Norman, Introduction into Psychology-Human Information Reception and Processing (in German), Springer-Verlag, Berlin, Germany, 1991.

[32] S. Pfeiffer, R. Lienhart, S. Fischer, W. Effelsberg, Abstracting digital movies automatically, J. Vis. Comm. Image Represent., Vol. 7, No. 4, pp. 345-353, 1996.
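Thesis usage permissions
  • The author grants a royalty-free license for library patrons to reproduce the print copy for academic purposes; available from 2013-01-19.
  • The author authorizes browsing and printing of the electronic full text; available from 2013-01-19.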