§ Browse Thesis Bibliographic Record
  
System ID: U0002-2809202118505000
DOI: 10.6846/TKU.2021.00802
Title (Chinese): 國畫文字擷取與修補
Title (English): Text extraction and inpainting of Chinese paintings
Title (third language):
University: Tamkang University (淡江大學)
Department (Chinese): 資訊工程學系碩士班
Department (English): Department of Computer Science and Information Engineering
Foreign degree university:
Foreign degree college:
Foreign degree institute:
Academic year: 109
Semester: 2
Year of publication: 110
Student (Chinese): 彭耀慶
Student (English): Yao-Ching Peng
Student ID: 607410569
Degree: Master's
Language: Traditional Chinese
Second language:
Oral defense date: 2021-06-29
Pages: 35
Committee: Advisor - 顏淑惠
Member - 凃瀞珽
Member - 林莊傑
Keywords (Chinese): Scene text removal (場景文字去除)
Multitask (多任務)
Text region (文字範圍)
Chinese paintings (國畫)
Keywords (English): Scene text removal
Multitask
Region
Chinese paintings
Keywords (third language):
Subject classification:
Chinese abstract
Scene text removal comprises two tasks: semantic segmentation and image inpainting. In recent years, many studies have made great progress by combining the two into a single multitask network whose tasks learn from and influence each other. However, we identified three problems: first, noise misclassified as text; second, residual characters left by incomplete removal; and third, training data that differs too much from real-world conditions.
In this thesis, we propose a scene text removal network based on dual-branch text localization, which improves the image quality of text removal through plausible localization. We first introduce the concept of a text region: unlike the pixel-accurate text mask pursued in earlier work, it simplifies pixels into blocks. Sharing latent features, the mask provides precise information that effectively eliminates noise, while the region's wide-range perception ensures that text is removed completely; learning against each other, the two branches yield plausible text localization. Second, we choose Chinese paintings as training data: synthetic paintings are not only easy to obtain but also resemble real-world conditions, so the trained network is more representative.
Extensive experiments confirm that our network effectively removes text from Chinese paintings. Compared with state-of-the-art networks, although our image inpainting is slightly worse, plausible text localization makes the resulting text-removal image quality better, showing that our dual-branch text localization network can greatly improve the image quality of text removal.
English abstract
Scene text removal comprises two tasks: semantic segmentation and image inpainting. Recently, many text removal methods have achieved significant progress by integrating both into a multitask network whose tasks learn from each other. However, three problems remain: 1) noise misclassified as text; 2) incomplete removal of characters; 3) synthetic training images that are unrepresentative of real-world datasets.
In this paper, we propose a scene text removal network based on dual-branch text extraction. First, unlike the conventional approach of pixel-level text localization, we introduce the concept of region localization, which reduces noise sensitivity and improves text removal quality. In this way, the first two problems are greatly alleviated. Second, we apply our architecture to separating calligraphy from traditional Chinese paintings. Calligraphy and painting are not only artworks worthy of attention; their training dataset is also easy to produce and realistic, which solves the third problem.
To illustrate the efficacy of the proposed model, we test it on many traditional Chinese paintings, both synthetic and real. Our overall results outperform those of state-of-the-art networks. Although our model is less impressive in image inpainting alone, the final image quality benefits greatly from the dual-branch text extraction module.
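The thesis itself does not include code in this record; as a minimal illustrative sketch of the mask-versus-region distinction described above, a coarse region label can be derived from a pixel-accurate mask by block-wise max pooling, so the region always over-covers the mask (the block size and the helper name `mask_to_region` are assumptions for illustration, not from the thesis):

```python
import numpy as np

def mask_to_region(mask: np.ndarray, block: int = 16) -> np.ndarray:
    """Coarsen a binary text mask into a block-level text region.

    Any block containing at least one text pixel is marked entirely as
    text, so the region over-covers the mask -- the property the
    abstract relies on to ensure complete text removal.
    """
    h, w = mask.shape
    # Pad so that height and width are multiples of the block size.
    ph, pw = -h % block, -w % block
    padded = np.pad(mask, ((0, ph), (0, pw)))
    H, W = padded.shape
    # Group pixels into (block x block) tiles and take the max per tile.
    tiles = padded.reshape(H // block, block, W // block, block)
    coarse = tiles.max(axis=(1, 3))  # 1 iff the tile touches any text
    # Expand back to pixel resolution for supervision/visualization.
    return np.kron(coarse, np.ones((block, block), dtype=mask.dtype))[:h, :w]

# A single text pixel grows into a full 16x16 region block.
m = np.zeros((32, 32), dtype=np.uint8)
m[5, 5] = 1
r = mask_to_region(m)
```

The over-coverage is what makes the region branch robust to thin, fragmentary strokes that a pixel mask can miss.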
Third-language abstract:
Table of contents
Contents
Chapter 1	Introduction
1.1	Introduction to scene text removal
1.2	Data collection
1.3	Research objectives and direction
Chapter 2	Related work
2.1	Semantic segmentation
2.2	Image inpainting
2.3	Scene text removal
Chapter 3	Methodology
3.1	Problem definition
3.2	Network architecture
3.2.1	Feature encoder
3.2.2	Inpainting decoder
3.2.3	Localization decoder
3.2.4	Lateral connections
3.3	Network optimization
3.3.1	Inpainting loss
3.3.2	Localization loss
Chapter 4	Experiments
4.1	Datasets
4.2	Evaluation metrics
4.3	Training details
4.4	Experimental results
4.4.1	Quantitative and qualitative results
4.4.2	Ablation study
Chapter 5	Conclusion and future work
5.1	Conclusion
5.2	Future work
References

List of figures
Fig. 1. Plausibly masking text information; image from EraseNet [20]
Fig. 2. Paired images with and without text; top: synthetic data, bottom: real data. The synthetic data is too unrealistic and looks out of place; image from [21]
Fig. 3. Synthetic versus real Chinese paintings; image from [24]
Fig. 4. From left to right: input image, ground truth, generated image, predicted location. Noise in the red box hurts localization accuracy but does not degrade image quality; residual characters in the green box degrade image quality but have relatively little effect on accuracy; image from [20]
Fig. 5. Semantic segmentation example; from left to right: input image, predicted semantics, ground-truth semantics; image from [4]
Fig. 6. From left to right: original image, input image, predicted result. Image inpainting can remove objects, text, and people, but the region to inpaint (the white area) must be specified; image from [18]
Fig. 7. Network architecture of [21], which completes scene text removal in stages using semantic segmentation and image inpainting
Fig. 8. From left to right: I_in, M_out, I_out, I_comp
Fig. 9. Our network architecture; encoder features assist the Inpainting, Mask, and Region tasks in the decoder through their respective lateral connections
Fig. 10. Each RFA block contains four resblocks: the first uses stride 2 to halve the width and height and double the channels, followed by three resblocks whose output dimensions are unchanged; finally, the residual outputs of all four resblocks are aggregated into the final residual result; image from [23]
Fig. 11. Atrous (dilated) convolution enlarges the receptive field while attending to multiple scales of context, which greatly helps image inpainting; image from [9]
Fig. 12. Lateral connection, in which the CBAM can be swapped out as needed
Fig. 13. CBAM has two submodules, channel and spatial, which refine features in turn; image from [17]
Fig. 14. Text-free Chinese paintings I_gt; the top row [24] is mostly museum collections, the bottom row [11] mostly modern Chinese paintings
Fig. 15. Su Shi's 中山松醪賦; top: calligraphy text W, bottom: its text mask M_gt, used for testing
Fig. 16. Wang Xizhi's 蘭亭序; top: calligraphy text W, bottom: its text mask M_gt, used for training
Fig. 17. Coarse text region R_gt; the left two panels are M_gt, the right two the corresponding R_gt
Fig. 18. Image inpainting results: (a) I_in, (b) I_gt, (c)(d) EraseNet's I_out and I_comp, (e)(f) our I_out and I_comp
Fig. 19. Text localization results: (a) I_in, (b) M_gt, (c) EraseNet's M_out, (d) our M_out
Fig. 20. Real-world application results: (a) I_in, (b)(c) EraseNet's I_comp and M_out, (d)(e) our I_comp and M_out
Fig. 21. The characters in (b) are somewhat incomplete while those in (f) are fuller; our inpainting is better
Fig. 22. The inpainting in (d) shows an abrupt green tint; something clearly went wrong
Fig. 23. Weight ratio of the inpainting loss L_inpainting to the text localization loss L_extraction: (e) 1:1, (f) 3.44:18.16
Fig. 24. Inpainting I_comp and text localization M_out results: (a) I_in, (b) without region, (c) skip, (d) seSE, (e) fixed weights, (f) our network (using CBAM and adaptive weights)
Fig. 25. The left two panels are I_in, the right two I_out; a color shift appears between them

List of tables
Table 1. Experimental results; higher IoU, PSNR, and SSIM are better
Table 2. Ablation study results; higher IoU, PSNR, and SSIM are better
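The tables report standard metrics: IoU for the predicted text mask, PSNR and SSIM for the inpainted image. As a minimal sketch (not the thesis' evaluation code; function names and the 8-bit peak value are assumptions), IoU and PSNR can be computed as:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two binary masks (higher is better)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def psnr(img: np.ndarray, ref: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
# iou(a, b) -> 0.5; psnr of identical images is infinite
```

SSIM additionally compares local luminance, contrast, and structure; in practice it is usually taken from a library such as scikit-image rather than reimplemented.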
References
[1] D. Karatzas et al., “ICDAR 2013 robust reading competition,” in Proc. 12th Int. Conf. Document Anal. Recognit., Aug. 2013, pp. 1484–1493.
[2] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint, Dec. 2014.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015 International Conference on Learning Representations (ICLR), San Diego, CA, 2015, pp. 1–14.
[4] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 3431–3440.
[5] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. MICCAI, 2015, pp. 234–241.
[6] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2536–2544.
[7] T. Nakamura, A. Zhu, K. Yanai, and S. Uchida, “Scene text eraser,” in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), Nov. 2017, pp. 832–837.
[8] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Trans. Graph., vol. 36, no. 4, pp. 1–14, Jul. 2017.
[9] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint, Dec. 2017.
[10] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 7132–7141.
[11] G. Wang, Y. Chen, and Y. Chen, “Chinese painting generation using generative adversarial networks,” 2017.
[12] S. Zhang, Y. Liu, L. Jin, Y. Huang, and S. Lai, “Ensnet: Ensconce text in the wild,” in Proc. AAAI, vol. 33, 2019, pp. 801–808.
[13] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” in Proc. ECCV, 2018, pp. 85–100.
[14] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 5505–5514.
[15] A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” arXiv preprint, Apr. 2018.
[16] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1125–1134.
[17] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” arXiv preprint, Jul. 2018.
[18] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. Huang, “Free-form image inpainting with gated convolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 4471–4480.
[19] Y. Zeng, J. Fu, H. Chao, and B. Guo, “Learning pyramid-context encoder network for high-quality image inpainting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 1486–1494.
[20] C. Liu, Y. Liu, L. Jin, S. Zhang, C. Luo, and Y. Wang, “EraseNet: End-to-end text removal in the wild,” IEEE Trans. Image Process., vol. 29, pp. 8760–8775, 2020.
[21] J. Zdenek and H. Nakayama, “Erasing scene text with weak supervision,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), 2020.
[22] X. Bian, C. Wang, W. Quan, J. Ye, X. Zhang, and D.-M. Yan, “Scene text removal via cascaded text stroke detection and erasing,” arXiv preprint, Nov. 2020.
[23] J. Liu, W. Zhang, Y. Tang, J. Tang, and G. Wu, “Residual feature aggregation network for image super-resolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 2359–2368.
[24] A. Xue, “End-to-end Chinese landscape painting creation using generative adversarial networks,” arXiv preprint, Nov. 2020.
[25] Y. Wang, H. Xie, S. Fang, Y. Qu, and Y. Zhang, “PERT: A progressively region-based network for scene text removal,” arXiv preprint, Sep. 2021.
[26] https://www.easyatm.com.tw/wiki/%E5%9C%8B%E7%B2%B9, accessed on Jul. 2021. 
[27] https://read01.com/KB0DNB5.html#.YMbrHPkzYuV, accessed on Jul. 2021.
[28] https://www.easyatm.com.tw/wiki/%E5%9C%8B%E7%B2%B9, accessed on Nov. 2020.
[29] https://www.sohu.com/a/145523117_804291, accessed on Nov. 2020.
Full-text usage rights
On campus
Printed thesis available on campus immediately
Electronic full text authorized for on-campus release
On-campus electronic thesis available immediately
Off campus
Authorization granted to database vendors
Off-campus electronic thesis available immediately
