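Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

The electronic full text of this thesis has been publicly available off campus since 2021-07-21.
The print copy of this thesis has been publicly available since 2021-07-21.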

System ID	U0002-0807202111373200
DOI	10.6846/TKU.2021.00199
Title (Chinese)	結合注意力模塊的U-Net架構應用於圖像去背
Title (English)	Attention Module Combined with U-Net Architecture for Image Matting
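Institution	Tamkang University
Department (Chinese)	Master's Program in Networking and Multimedia, Department of Computer Science and Information Engineering
Department (English)	Master's Program in Networking and Multimedia, Department of Computer Science and Information Engine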
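Academic year	109 (ROC calendar, i.e., 2020-2021)
Semester	2
Publication year	110 (ROC calendar, i.e., 2021)
Student (Chinese)	張國棟
Student (English)	ZHANG, GUODONG
Student ID	608424023
Degree	Master's
Language	Traditional Chinese
Second language	English
Oral defense date	2021-06-29
Number of pages	39
Committee	Advisor: 顏淑惠 (105390@mail.tku.edu.tw); Member: 廖弘源 (liao@iis.sinica.edu.tw); Member: 顏淑惠 (105390@mail.tku.edu.tw); Member: 蔡憶佳 (iaiclab.tku@gmail.com)
Keywords (Chinese)	trimap; image matting; alpha matte; attention module; U-Net architecture
Keywords (English)	trimap; attention module; U-Net; matting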
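Abstract (Chinese)	Trimaps play a crucial role in image matting. However, trimaps are usually produced by manual annotation, which is tedious and costly. We propose a U-Net architecture with embedded attention modules that, without a trimap as input, learns the foreground, background, and semi-transparent regions of the input image in order to estimate the alpha matte. In estimating the alpha matte, we observe two phenomena: edges connecting the foreground and the background should be preserved, whereas texture details inside the foreground should be ignored. However, when the foreground contains cluttered texture, or when the foreground and background have similar textures, the alpha matte is prone to errors. To alleviate this problem, we design an additional classifier that learns the pure foreground of the input image, where "pure" means that these pixels are certainly foreground. We then add its output to the result of the original U-Net architecture, which resolves this problem. Extensive experiments show that our method is comparable to or better than state-of-the-art methods.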
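For reference, image matting is conventionally formulated with the compositing equation below (standard notation, not taken from the thesis): each observed pixel color I_i is a blend of a foreground color F_i and a background color B_i, weighted by the opacity α_i that makes up the alpha matte.

```latex
I_i = \alpha_i F_i + (1 - \alpha_i)\, B_i, \qquad \alpha_i \in [0, 1]
```

For an RGB image this gives 3 equations per pixel in 7 unknowns (3 for F_i, 3 for B_i, 1 for α_i). This is why matting methods traditionally rely on a trimap to fix α_i = 1 or α_i = 0 for most pixels, and why a trimap-free method has to infer those regions from the image itself.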
Abstract (English)	Trimaps play an essential role in image matting. However, generating trimaps is costly. Instead, we propose a U-Net architecture equipped with both spatial and channel attention modules to estimate the alpha matte without trimaps. Estimating the alpha matte, however, involves two opposing requirements: fine-structure edges connecting to the background should be preserved, whereas texture details inside the foreground should be ignored. Consequently, the estimated alpha matte is prone to errors when the foreground is cluttered with texture, or when the foreground and background have similar textures. To alleviate this problem, we design an additional classifier that learns the pure foreground of the input image, where "pure" means that those pixels are certainly foreground. We then add the classifier's output to the result of the original U-Net architecture, which resolves the problem. Extensive experiments show that our method is comparable to or better than state-of-the-art methods.
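The abstracts say the classifier's pure-foreground prediction is "added" to the U-Net output but do not give the exact fusion rule. The following is a minimal sketch, assuming the addition is an element-wise sum clipped to [0, 1]: pixels the classifier marks as pure foreground are forced to full opacity, while the U-Net's boundary detail passes through everywhere else. The function name `fuse_alpha` and the nested-list representation are hypothetical.

```python
def fuse_alpha(alpha_unet, pure_fg):
    """Combine a U-Net alpha matte with a binary pure-foreground mask.

    Assumption (not stated in the record): fusion is an element-wise sum
    clipped to [0, 1].
    alpha_unet: 2-D list of floats in [0, 1] (U-Net prediction)
    pure_fg:    2-D list of 0/1 ints (1 = classifier says "pure" foreground)
    """
    fused = []
    for row_a, row_m in zip(alpha_unet, pure_fg):
        # Adding 1 on pure-foreground pixels saturates them to alpha = 1;
        # where the mask is 0 the U-Net value is kept unchanged.
        fused.append([min(1.0, max(0.0, a + m)) for a, m in zip(row_a, row_m)])
    return fused


# Foreground interior with spurious texture (0.4, 0.6, 0.9, 0.5) next to a soft edge (0.3).
alpha = [[0.4, 0.6, 0.3],
         [0.9, 0.5, 0.0]]
mask = [[1, 1, 0],
        [1, 1, 0]]
print(fuse_alpha(alpha, mask))  # [[1.0, 1.0, 0.3], [1.0, 1.0, 0.0]]
```

A clipped sum is the simplest reading of "add"; a learned or weighted fusion would be an equally plausible alternative.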
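Table of contents	Contents
Chapter 1 Introduction
1.1 Introduction to Image Matting
1.2 Method Overview
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Review of Image Matting
2.1.1 Traditional Methods
2.1.2 Machine Learning Methods
2.2 Attention Modules
2.3 Other Methods
Chapter 3 Methodology
3.1 Problem Analysis
3.2 Architecture Design
3.2.1 A-U-Net
3.2.2 Classification Network
3.3 Loss Design
Chapter 4 Experiments
4.1 Implementation Details
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Study
4.4 Loss Weight Settings
4.5 Limitations
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Appendix: English Version of the Thesis
List of Figures
Fig. 1. Results produced by our method without a trimap. (a) Input image. (b) Result of the U-Net with attention blocks; it handles boundaries very well and learns fine structures and textures, but texture details inside the foreground also appear. (c) The pure-white part of the classification result, which removes the details inside the foreground. With the two architectures working together, we obtain a very good result, shown in (d). (e) Ground-truth alpha matte provided by the dataset.
Fig. 2. SE block [15], which strengthens important features and suppresses unimportant ones so that the extracted features are more discriminative.
Fig. 3. SOCA module [13], which replaces the global average pooling of the SE block with global covariance pooling to capture higher-order semantic information than the SE block.
Fig. 4. Convolutional Block Attention Module (CBAM) [11], an attention module that combines spatial and channel attention.
Fig. 5. ASPP [12] applies several atrous (dilated) convolution layers with different sampling rates in parallel to capture objects and image context at multiple scales, which greatly helps in extracting high-level semantic information.
Fig. 6. Our model. The architecture consists of two parts: A-U-Net and a classification network with a ResNet-34 [19] backbone. In A-U-Net we add residual blocks, ASPP, and modified, specially designed CBAM blocks (with both spatial and channel attention). In the classification network we add ASPP and SOCA, which adaptively recalibrate the feature channels. H/W: height and width of the features; C: number of feature channels.
Fig. 7. Spatial attention module. (a) The original spatial attention module in CBAM [11]; (b) our modified module. Instead of a single convolution with kernel size 7, we use three parallel convolutions with kernel sizes 11, 7, and 3.
Fig. 8. Generation of the target map for our classification network: the ground-truth alpha matte provided by the dataset is dilated and eroded to produce a mask with three classes, 0, 1, and 2.
Fig. 9. Qualitative comparison of alpha mattes on the Adobe Composition-1k test set [4]. Our experiments follow the procedure of [25], and the results are cropped to 800×800.
Fig. 10. An over-processed alpha matte. Our result is sharper than the ground truth.
Fig. 11. Incorrect classification produces a noisy alpha matte.
List of Tables
Table 1. Comparison of alpha mattes with other methods on the Composition-1k test set [4]. The marked methods do not require a trimap as an additional input.
Table 2. Ablation study on the Composition-1k test set [4]. Missing values indicate that the resulting alpha matte was of poor quality. The architectures are explained in the text.
Table 3. Effect of each loss function on training the U-Net.
Table 4. Experimental results under different combinations of loss weights.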
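Figures 2, 4, and 7 describe channel attention (SE/CBAM) and the thesis's modified spatial attention, in which CBAM's single 7×7 convolution is replaced by three parallel convolutions with kernel sizes 11, 7, and 3. The pure-Python sketch below shows the data flow on one feature map stored as nested lists `[C][H][W]`. Assumptions not given in the record: the three branch outputs are summed before the sigmoid, each branch uses zero "same" padding, and weights are passed in by the caller (in a real network they are learned). All function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on feat[C][H][W].

    w1: hidden x C weights, w2: C x hidden weights (a shared two-layer MLP).
    The average- and max-pooled channel descriptors go through the same MLP
    and are added before the sigmoid (an SE block uses the average only).
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # FC + ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]          # FC
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    scale = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[s * x for x in row] for row in ch] for s, ch in zip(scale, feat)]


def conv2d_same(maps, kernel):
    """Zero-padded 'same' cross-correlation of maps[K][H][W] with kernel[K][k][k] -> [H][W]."""
    k = len(kernel[0])
    p = k // 2
    h, w = len(maps[0]), len(maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for c in range(len(maps)):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - p, j + v - p
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[c][u][v] * maps[c][y][x]
            out[i][j] = acc
    return out


def multi_kernel_spatial_attention(feat, kernels):
    """Parallel k x k convolutions (e.g. k = 11, 7, 3) over the [channel-mean, channel-max]
    maps, summed (assumption), then a per-pixel sigmoid gate applied to every channel."""
    h, w = len(feat[0]), len(feat[0][0])
    avg_map = [[sum(ch[i][j] for ch in feat) / len(feat) for j in range(w)] for i in range(h)]
    max_map = [[max(ch[i][j] for ch in feat) for j in range(w)] for i in range(h)]
    branches = [conv2d_same([avg_map, max_map], kern) for kern in kernels]
    gate = [[sigmoid(sum(b[i][j] for b in branches)) for j in range(w)] for i in range(h)]
    return [[[gate[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in feat]


def zeros(k):
    return [[[0.0] * k for _ in range(k)] for _ in range(2)]  # 2 input maps: mean, max


feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [1.0, 1.0]]]  # C=2, H=W=2
out = multi_kernel_spatial_attention(feat, [zeros(11), zeros(7), zeros(3)])
print(out[0])  # [[0.5, 1.0], [1.5, 2.0]]  (all-zero weights give a uniform gate of 0.5)
```

CBAM applies the two modules sequentially, channel first and spatial second, so they compose as `multi_kernel_spatial_attention(channel_attention(feat, w1, w2), kernels)`. Using three kernel sizes lets the gate respond to both wide context (11×11) and fine edges (3×3), which fits the thesis's goal of preserving thin boundary structures.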
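Figure 5 summarizes ASPP: several atrous (dilated) convolutions with different rates run in parallel on the same input, and their outputs are stacked as channels. The 1-D pure-Python sketch below shows how the dilation rate spreads a 3-tap kernel over a wider window without adding weights. The rates (1, 2, 4), the shared kernel, and the function names are illustrative assumptions; in ASPP each branch is 2-D with its own learned weights, alongside a 1×1 branch and image-level pooling.

```python
def atrous_conv1d(x, kernel, rate):
    """'Same'-padded 1-D atrous convolution (cross-correlation): taps are `rate` samples apart."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(kernel):
            idx = i + (t - center) * rate
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out


def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """One atrous branch per rate, run in parallel; the list of outputs plays the role of channels."""
    return [atrous_conv1d(x, kernel, r) for r in rates]


impulse = [0.0] * 9
impulse[4] = 1.0
for r, branch in zip((1, 2, 4), aspp_1d(impulse, [1.0, 1.0, 1.0])):
    print(r, [int(v) for v in branch])
# 1 [0, 0, 0, 1, 1, 1, 0, 0, 0]
# 2 [0, 0, 1, 0, 1, 0, 1, 0, 0]
# 4 [1, 0, 0, 0, 1, 0, 0, 0, 1]
```

The impulse response makes the receptive field visible: the same three weights cover 3, 5, and 9 samples as the rate grows, which is how ASPP captures context at multiple scales cheaply.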
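Figure 8 describes deriving the classifier's three-class target from the ground-truth alpha matte by dilation and erosion. Below is a minimal pure-Python sketch under assumptions the record does not specify: label 2 = pure foreground (fully opaque pixels that survive erosion), label 0 = definite background (pixels outside the dilated nonzero region), label 1 = the transition band in between; a square (2r+1)×(2r+1) structuring element; erosion windows clipped at the image border. The radius `r` and all function names are hypothetical.

```python
def dilate(mask, r):
    """Binary dilation with a (2r+1)x(2r+1) square; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1))))
             for j in range(w)] for i in range(h)]


def erode(mask, r):
    """Binary erosion with a (2r+1)x(2r+1) square; the window is clipped at the border."""
    h, w = len(mask), len(mask[0])
    return [[int(all(mask[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1))))
             for j in range(w)] for i in range(h)]


def three_class_target(alpha, r=1):
    """0 = background, 1 = transition band, 2 = pure foreground (assumed labeling)."""
    fg = [[int(a >= 1.0) for a in row] for row in alpha]      # fully opaque pixels
    nonzero = [[int(a > 0.0) for a in row] for row in alpha]   # any foreground coverage
    sure_fg = erode(fg, r)          # pull the opaque region back from soft edges
    maybe_fg = dilate(nonzero, r)   # grow the covered region so the band is not too thin
    return [[2 if sure_fg[i][j] else (1 if maybe_fg[i][j] else 0)
             for j in range(len(alpha[0]))] for i in range(len(alpha))]


print(three_class_target([[0, 0, 0.5, 1, 1, 1, 1]], r=1))  # [[0, 1, 1, 1, 2, 2, 2]]
```

Larger values of `r` widen the transition band, so fewer pixels are labeled "pure" and the classifier's positives become more conservative.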
References
[1] Levin, Anat, Dani Lischinski, and Yair Weiss. "A closed-form solution to natural image matting." IEEE Transactions on Pattern Analysis and Machine Intelligence 30.2 (2007): 228-242.
[2] Chen, Qifeng, Dingzeyu Li, and Chi-Keung Tang. "KNN matting." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.9 (2013): 2175-2188.
[3] Cho, Donghyeon, Yu-Wing Tai, and Inso Kweon. "Natural image matting using deep convolutional neural networks." European Conference on Computer Vision. Springer, Cham, 2016.
[4] Xu, Ning, et al. "Deep image matting." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[5] Lutz, Sebastian, Konstantinos Amplianitis, and Aljosa Smolic. "AlphaGAN: Generative adversarial networks for natural image matting." arXiv preprint arXiv:1807.10088 (2018).
[6] Cai, Shaofan, et al. "Disentangled image matting." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[7] Lu, Hao, et al. "Indices matter: Learning to index for deep image matting." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[8] Chen, Quan, et al. "Semantic human matting." Proceedings of the 26th ACM International Conference on Multimedia. 2018.
[9] Qiao, Yu, et al. "Attention-guided hierarchical structure aggregation for image matting." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[10] Li, Jizhizi, et al. "End-to-end animal image matting." arXiv preprint arXiv:2010.16188 (2020).
[11] Woo, Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[12] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
[13] Dai, Tao, et al. "Second-order attention network for single image super-resolution." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[14] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
[15] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[16] Fu, Jianlong, Heliang Zheng, and Tao Mei. "Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[17] Mnih, Volodymyr, et al. "Recurrent models of visual attention." arXiv preprint arXiv:1406.6247 (2014).
[18] Zhang, Yunke, et al. "A late fusion CNN for digital matting." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[19] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[20] Li, Yaoyi, and Hongtao Lu. "Natural image matting via guided contextual attention." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 07. 2020.
[21] Forte, Marco, and François Pitié. "F, B, Alpha Matting." arXiv preprint arXiv:2003.07711 (2020).
[22] Gastal, Eduardo S. L., and Manuel M. Oliveira. "Shared sampling for real-time alpha matting." Computer Graphics Forum. Vol. 29. No. 2. Oxford, UK: Blackwell Publishing Ltd, 2010.
[23] Zheng, Yuanjie, and Chandra Kambhamettu. "Learning based digital matting." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.
[24] He, Kaiming, et al. "A global sampling method for alpha matting." CVPR 2011. IEEE, 2011.
[25] Aksoy, Yagiz, Tunc Ozan Aydin, and Marc Pollefeys. "Designing effective inter-pixel information flow for natural image matting." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[26] Tang, Jingwei, et al. "Learning-based sampling for natural image matting." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[27] Hou, Qiqi, and Feng Liu. "Context-aware image matting for simultaneous foreground and alpha estimation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[28] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
[29] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." International Conference on Machine Learning (ICML). 2010.
[30] Lin, Tsung-Yi, et al. "Microsoft COCO: Common objects in context." European Conference on Computer Vision. Springer, 2014: 740-755.
[31] Wang, L-T., et al. "SSIM: A software levelized compiled-code simulator." Proceedings of the 24th ACM/IEEE Design Automation Conference. 1987.
[32] Chan, Tony, et al. "Recent developments in total variation image restoration." Mathematical Models of Computer Vision 17.2 (2005): 17-31.
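Full-text access rights	On campus: print thesis publicly available immediately; electronic full text authorized for on-campus public access; electronic thesis publicly available on campus immediately. Off campus: authorized to database vendors; electronic thesis publicly available off campus immediately.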

For questions, please contact us:
Digital Information Section, Library, (02)2621-5656 ext. 2487, or by email.