System ID | U0002-2706200620351000 |
---|---|
DOI | 10.6846/TKU.2006.00863 |
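Thesis title (Chinese) | 動態影像之文字擷取 |
Thesis title (English) | Text Extraction for Dynamic Images |
Thesis title (third language) | |
Institution | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |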
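Academic year | 94 (ROC calendar; 2005-2006) |
Semester | 2 |
Year of publication | 95 (ROC calendar; 2006) |
Graduate student (Chinese) | 王駿瑋 |
Graduate student (English) | Chun-Wei Wang |
Student ID | 693192204 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2006-06-13 |
Number of pages | 55 |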
Oral defense committee |
Advisor - 林慧珍 (hjlin@cs.tku.edu.tw)
Member - 林慧珍 (hjlin@cs.tku.edu.tw); Member - 顏淑惠 (shyen@cs.tku.edu.tw); Member - 徐道義 (taoi@cc.shu.edu.tw) |
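Keywords (Chinese) |
文字資訊擷取 (text information extraction); 離散餘弦轉換 (discrete cosine transform); 型態影像學 (morphological image processing); 影像重建 (image reconstruction); 文字識別 (text recognition) |
Keywords (English) |
Text Information Extraction; DCT; Morphological Operators; Erosion; Dilation; Opening; Closing; Image Reconstruction; Text Identification |
Keywords (third language) | |
Subject classification | |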
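Abstract (Chinese) |
Text information extraction (TIE) from digital images has become an important application in recent years. A TIE system can automatically annotate image content and thereby provide a mechanism for indexing it. A complete TIE system comprises text detection, text localization, text tracking, image enhancement, and text recognition. However, image documents are diverse and complex: text may vary in font size, shape, and orientation, and variations in color make extraction harder still. This study develops a method for automatically extracting text from video. Using the discrete cosine transform (DCT) common in compression formats, text block extraction is performed directly in the frequency domain: the horizontal energy of each 8x8 DCT block is computed and combined with the temporal characteristics of text blocks to filter out most non-text blocks while retaining the vast majority of text blocks. Morphological operations, including erosion, dilation, opening, and closing, are then applied to find the correct text regions. Next, morphological image reconstruction is used to extract the text from the located text regions, and the method of Lin et al. is used to binarize the extracted text. The result is a more complete text image for subsequent processing and applications such as text recognition, saving to text files, annotating image content, and image search. |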
Abstract (English) |
In recent years, text information extraction (TIE) has become an important application. With the extracted text, images can be annotated automatically and an image indexing mechanism can be provided. A complete TIE system consists of detection, localization, tracking, enhancement, and recognition. However, because image styles are complex and varied, text may differ in font size, shape, and orientation, and variations in color make text extraction even more challenging. In our method, the DCT coefficients and the temporal information of a sequence of video frames are used to evaluate horizontal energy, with which most non-text blocks can be filtered out. Morphological operators such as erosion, dilation, opening, and closing are then applied to remove further non-text blocks while retaining all text blocks. The detected text blocks are enhanced so that characters can be extracted for recognition. The recognized characters are then saved as text files for later use, such as video indexing. |
Abstract (third language) | |
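Table of contents |
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Research Content
1.3 Thesis Organization
Chapter 2 Related Work and Theoretical Background
2.1 Related Work on Text Extraction
2.2 Theoretical Background
2.2.1 Horizontal and Vertical Energy of the DCT
2.2.2 Bi-Level Thresholding
2.2.3 Morphological Operations: Structuring Elements; Erosion; Dilation; Opening; Closing; Geodesic Erosion; Image Reconstruction
Chapter 3 Method
3.1 Locating Text Regions
3.2 Background Removal
3.3 Binarization
3.4 Post-processing
Chapter 4 Experimental Results and Discussion
4.1 Experimental Results
4.2 Discussion of Results
4.3 Comparison of Results
Chapter 5 Conclusion
Chapter 6 Future Work
References
English Paper
Figure 1.1 System flowchart
Figure 2.1 Line segment decomposition
Figure 2.2 DCT energy: (a) array M_{0,0}, (b) block B(i,j) is the IDCT of array M_{i,j}
Figure 2.3 Structuring elements of various shapes: (a) square, (b) cross, (c) horizontal line, (d) vertical line
Figure 2.4 Erosion: (a) structuring element, (b) image to be processed, (c) result of erosion
Figure 2.5 Dilation: (a) structuring element, (b) image to be processed, (c) result of dilation
Figure 2.6 Opening: (a) structuring element, (b) image to be processed, (c) result of opening
Figure 2.7 Closing: (a) structuring element, (b) image to be processed, (c) result of closing
Figure 3.1 DCT horizontal energy: (a) original image, (b) parts of the image with higher horizontal energy
Figure 3.2 Five frames sampled at a fixed interval S=5
Figure 3.3 Finding candidate text blocks using temporal information
Figure 3.4 Blocks formed after applying morphological operations
Figure 3.5 Connected components
Figure 3.6 Image reconstruction example: (a) original image (mask image), (b) image retaining the four borders (marker image), (c) image after the 39th reconstruction step, (d) image after the 79th reconstruction step, (e) image after the 119th reconstruction step, (f) background removal result obtained by subtracting (a) and (f)
Figure 3.7 Background removal example: (a) original image, (b) reconstructed background image, (c) image after background removal
Figure 3.8 Background removal example after block enlargement: (a) original image, (b) reconstructed background image, (c) image after background removal
Figure 3.9 Binarization example: (a) original image, (b) Otsu binarization result, (c) Lin et al. binarization result
Figure 3.10 Image search system interface
Figure 3.11 News clip example
Figure 4.1 Experimental results (news)
Figure 4.2 Experimental results (cartoon)
Figure 4.3 Comparison of results: (a)-(d) extraction results of the proposed method, (e)-(h) extraction results of the method of R. Wang et al. |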
References |
1. M. Y. H. Yassin and L. J. Karam, “Morphological text extraction from images”, IEEE Transactions on Image Processing, Vol. 9, No. 11, pp. 1978-1983, November 2000.
2. K. Sobottka, H. Bunke, and H. Kronenberg, “Identification of text on colored book and journal covers”, ICDAR 1999, pp. 57-62, 1999.
3. K. Jung, “Neural network-based text location in color images”, Pattern Recognition Letters, Vol. 22, pp. 1503-1515, 2001.
4. X. Tang, X. Gao, J. Liu, and H. Zhang, “A spatial-temporal approach for video caption detection and recognition”, IEEE Transactions on Neural Networks, Vol. 13, No. 4, pp. 961-971, July 2002.
5. Y. Zhong, H. Zhang, and A. K. Jain, “Automatic caption localization in compressed video”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 4, pp. 385-392, April 2000.
6. D. Zhang, B. L. Tseng, C.-Y. Lin, and S.-F. Chang, “Accurate Overlay Text Extraction for Digital Video Analysis”, Proc. of IEEE Intl. Conf. on Information Technology: Research and Education, Newark, pp. 233-237, August 2003.
7. A. K. Jain and Y. Zhong, “Page segmentation using texture analysis”, Pattern Recognition, Vol. 29, No. 5, pp. 743-770, 1996.
8. A. Antonacopoulos, “Page Segmentation Using the Description of the Background”, Computer Vision and Image Understanding, Vol. 70, No. 3, pp. 350-369, 1998.
9. L. O'Gorman, “The document spectrum for page layout analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 11, pp. 1162-1173, November 1993.
10. S. L. Chang, Y. P. Chen, T. Y. Tsai, and S. W. Chen, “Automatic License Plate Recognition System”, Proc. of 15th IPPR Conf. on CVGIP, Hsing-Chu, Taiwan, pp. 353-360, 2002.
11. Hsin-Der Lui, Yuan-Kai Wang, Kuo-Chin Fan, and Bor-Shenn Jeng, “Robust license plate recognition system using multi-experts approach”, 15th IPPR Conference on Computer Vision Graphics and Image Processing, pp. 426-436, August 2002.
12. N. Ezaki, Bui Truong Minh, K. Kiyota, M. Bulacu, and L. R. B. Schomaker, “Improved text-detection methods for a camera-based text reading system for blind persons”, Proc. of 8th Int. Conf. on Document Analysis and Recognition (ICDAR 2005), IEEE Computer Society, Vol. I, pp. 257-261, August 29-September 1, 2005.
13. Rainer Lienhart and Wolfgang Effelsberg, “Automatic Text Segmentation and Text Recognition for Video Indexing”, ACM/Springer Multimedia Systems, Vol. 8, pp. 69-81, January 2000.
14. Xian-Sheng Hua, Xiang-Rong Chen, Liu Wenyin, and Hong-Jiang Zhang, “Automatic Location of Text in Video Frames”, Proceeding of ACM Multimedia Workshops (MIR2001), pp. 24-27, Ottawa, Canada, October 2001.
15. S. M. Smith and J. M. Brady, “SUSAN - A New Approach to Low Level Image Processing”, Int. Jour. of Computer Vision, Vol. 23, No. 1, pp. 45-78, May 1997.
16. R. Wang, W. Jin, and L. Wu, “A novel video caption detection approach using multiframe integration”, Proc. of the 17th International Conference on Pattern Recognition (ICPR’04), Vol. 1, pp. 449-452, August 23-26, 2004.
17. D. Chen, H. Bourlard, and J-Ph. Thiran, “Text identification in complex background using SVM”, Proc. of the Int. Conf. on Computer Vision and Pattern Recognition, pp. 621-626, December 2001.
18. Hwei-Jen Lin and Fu-Wen Yang, “An Intuitive Threshold Selection Based on Mountain Clustering”, First International Workshop on Intelligent Multimedia Computing and Networking (IMMCN2000), Session: Algorithms in Multimedia Computing, 2000.
19. Pierre Soille, “Morphological image analysis: principles and applications”, pp. 163-164, Springer-Verlag, 1999.
20. N. Otsu, “A threshold selection method from gray-level histograms”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 9, No. 1, pp. 62-66, 1979. |
Full-text access rights |
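
The block-filtering step described in the abstract (horizontal energy of each 8x8 DCT block, combined with temporal persistence across frames) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not give the energy formula, so the sketch assumes horizontal energy is the sum of the absolute values of the first-row AC coefficients of each block, and the names `horizontal_energy`, `candidate_text_blocks`, `energy_thresh`, and `min_fraction` are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis D, so that D @ block @ D.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    d = np.cos((2 * x + 1) * k * np.pi / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] = np.sqrt(1.0 / n)
    return d

def horizontal_energy(frame, n=8):
    """Per-block horizontal energy of a grayscale frame.

    Assumes the frame height and width are multiples of n.
    Assumed definition (not taken from the thesis): sum of |C[0, v]|
    for v = 1..n-1, i.e. the coefficients that respond to intensity
    changes along the horizontal direction.
    """
    h, w = frame.shape
    d = dct_matrix(n)
    # Split the frame into an (h/n, w/n) grid of n x n blocks.
    blocks = frame.reshape(h // n, n, w // n, n).swapaxes(1, 2).astype(float)
    coeffs = d @ blocks @ d.T         # 2-D DCT of every block at once
    return np.abs(coeffs[..., 0, 1:]).sum(axis=-1)

def candidate_text_blocks(frames, energy_thresh, min_fraction=1.0):
    """Keep blocks whose energy exceeds the threshold in at least
    min_fraction of the sampled frames (hypothetical temporal rule)."""
    high = np.stack([horizontal_energy(f) > energy_thresh for f in frames])
    return high.mean(axis=0) >= min_fraction
```

Captions tend to stay in place over several frames, so requiring a block to stay above the threshold in all sampled frames discards many transient high-energy blocks. The threshold value and the sampling of frames are left to the caller here.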
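
The background-removal step listed in the table of contents (a marker image that keeps only the four borders of the region, reconstructed under the original image as mask, then subtracted from the original) can be illustrated with a small sketch. It rests on assumptions the record does not confirm: it uses grayscale reconstruction by dilation with a 3x3 square structuring element and assumes bright text on a darker background, whereas the thesis also lists geodesic erosion and may proceed differently. All function names are hypothetical.

```python
import numpy as np

def dilate3x3(img):
    """Grayscale dilation with a 3x3 square structuring element."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    shifts = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(shifts, axis=0)

def reconstruct_by_dilation(marker, mask, max_iter=10000):
    """Repeat geodesic dilation of the marker under the mask until stable."""
    cur = np.minimum(marker, mask)
    for _ in range(max_iter):
        nxt = np.minimum(dilate3x3(cur), mask)
        if np.array_equal(nxt, cur):
            break
        cur = nxt
    return cur

def remove_background(region):
    """Subtract the border-connected background from a text region.

    Assumes bright text on a darker background (an assumption of this
    sketch, not a statement about the thesis).
    """
    marker = np.zeros_like(region)
    marker[0, :], marker[-1, :] = region[0, :], region[-1, :]
    marker[:, 0], marker[:, -1] = region[:, 0], region[:, -1]
    background = reconstruct_by_dilation(marker, region)
    return region - background
```

Because the marker carries only border intensities, bright structures that do not touch the border are flattened to the surrounding level in the reconstructed background, so the subtraction leaves mostly those structures.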