System ID | U0002-2908202022554500 |
---|---|
DOI | 10.6846/TKU.2020.00876 |
Thesis Title (Chinese) | 風格創造器-應用多條件生成對抗模型於上衣時尚風格設計 |
Thesis Title (English) | Style Creator: Generating Multiple Fashion Styles on Clothes Using Multi-Domain Conditional Generative Adversarial Nets |
Third-Language Title | |
University | Tamkang University |
Department (Chinese) | 電機工程學系碩士班 |
Department (English) | Department of Electrical and Computer Engineering |
Foreign Degree: University | |
Foreign Degree: College | |
Foreign Degree: Graduate Institute | |
Academic Year | 108 |
Semester | 2 |
Publication Year | 109 (2020) |
Author (Chinese) | 張嘉鈞 |
Author (English) | Chia-Chun Chang |
Student ID | 607450029 |
Degree | Master's |
Language | Traditional Chinese |
Second Language | |
Oral Defense Date | 2020-07-07 |
Number of Pages | 79 |
Oral Defense Committee | Advisor: 周建興 (chchou@mail.tku.edu.tw); Member: 趙于翔 (yxzhao@nqu.edu.tw); Member: 謝易錚 (yzhsieh@mail.ntou.edu.tw) |
Keywords (Chinese) | 風格轉換、條件生成對抗網路、人體語意分析模型、StarGAN |
Keywords (English) | Style Transfer; Human Parsing; CGAN; StarGAN |
Third-Language Keywords | |
Subject Classification | |
Chinese Abstract |
設計師在設計衣服時往往需要靈感的激發,而顧客在選擇衣服樣式時同樣也需要多重參考,但現實是設計師收集素材、顧客搜尋衣服就花費相當多時間。近年來,GAN(生成對抗模型)在時尚風格設計上有顯著的效果,也有越來越多相關的模型出現,但大多都是選取目標服裝轉換到輸入圖片上,或是直接更換臉部讓目標服飾轉換到使用者身上;這些都是針對「服裝匹配」,鮮少直接針對「服裝風格」而設計,而針對服裝風格轉換的模型又僅能在兩種風格間轉換,使每一種風格都要花費時間重新訓練。為解決上述問題,本文設計了一個能有效生成多風格上衣的多條件生成對抗網路模型。 本文提出的方法可以讓使用者輸入一張圖片後,輸出多種基於原圖服飾進行風格轉換的結果,例如:素面、格子、條紋、圓點,也包含底色更換;除了單種類風格轉換,還可以選擇多種類混搭,例如:底色更換加圓點、格子加條紋、圓點加條紋。這讓設計師及顧客可以直接獲取多種類的風格圖片以及混搭風格,讓靈感激發變得更迅速方便。 |
English Abstract |
Designers often need inspiration when designing clothes, and customers likewise need multiple references when choosing clothes; in reality, however, designers and customers spend a great deal of time collecting materials and searching for the clothes they want. Recent studies have shown remarkable success in fashion style translation, and more related models have been presented, such as translating the clothing in an input image into a selected target garment, or swapping in the user's face so that a target outfit appears on the user. Most of these are designed for "clothing matching" rather than "clothing style", and existing models for clothing style translation can only translate between two styles, so every additional style requires time-consuming retraining. To solve these problems, this paper proposes a multi-condition generative adversarial network model that generates multi-style clothes effectively. The model allows users to input an image and obtain a variety of style-translation results based on the original garment, e.g., plain, plaid, stripes, dots, and base-color changes. In addition to single-style translation, users can choose multi-style mashups, e.g., a color change mixed with dots, plaid mixed with stripes, or dots mixed with stripes. This lets designers and customers directly obtain a variety of style images and mashup styles, making inspiration faster and more convenient. |
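The abstract describes conditioning a single generator on a target style label, following StarGAN [29], where the domain label is spatially replicated and concatenated with the input image's channels (the scheme the thesis illustrates in Table 3.8). Below is a minimal NumPy sketch of that input construction; the 4-style layout (plain / plaid / stripes / dots) and the function name `concat_style_label` are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def concat_style_label(image: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Broadcast a one-hot style label over the spatial grid and stack it
    onto the image channels, giving the generator a (C + n_styles, H, W) input."""
    c, h, w = image.shape
    # Each label entry becomes a constant H x W plane (all 0.0 or all 1.0).
    label_planes = np.broadcast_to(label[:, None, None], (label.size, h, w))
    return np.concatenate([image, label_planes], axis=0)

# A 3-channel 4x4 image and a one-hot label selecting style 2 of 4
# (hypothetical ordering: plain / plaid / stripes / dots).
img = np.random.rand(3, 4, 4).astype(np.float32)
lbl = np.array([0.0, 0.0, 1.0, 0.0], dtype=np.float32)
x = concat_style_label(img, lbl)
print(x.shape)  # (7, 4, 4)
```

Because the label enters as extra input channels rather than as a separate model, one network can serve all style domains, which is what removes the per-style retraining cost the abstract criticizes.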
Third-Language Abstract | |
Table of Contents |
Acknowledgments I
Chinese Abstract II
Table of Contents V
List of Figures IX
List of Tables XII
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Objectives 1
Chapter 2 Background and Related Work 2
2.1 Generative Adversarial Nets (GAN) 2
2.2 Image-to-Image Translation 5
2.3 Virtual Try-on 7
2.4 Fashion Cloth Generation 9
2.5 Summary 12
Chapter 3 Applying StarGAN to Multi-Style Clothing Generation 13
3.1 Dataset 14
3.1.1 Dataset Overview 14
3.1.2 Collection Criteria 15
3.1.3 Data Construction 16
3.1.4 Long-Side Scaling Algorithm 17
3.1.5 Data Structure and Data Augmentation 19
3.2 Human Parsing: Self-Correction for Human Parsing 20
3.2.1 Overview 20
3.2.2 Applying SCHP to Our Dataset 21
3.3 Multi-Domain Conditional GAN: StarGAN 24
3.3.1 Overview 25
3.3.2 Loss Functions 27
3.3.2.1 Adversarial Loss 27
3.3.2.2 Domain Classification Loss 28
3.3.2.3 Reconstruction Loss 29
3.3.2.4 Full Objective 29
3.3.3 Network Architecture 30
3.3.3.1 Basic Concepts 30
3.3.3.2 Generator 31
3.3.3.3 Discriminator 31
3.3.4 Applying StarGAN to Our Dataset 33
3.4 Style-Mixing System 34
3.4.1 User Interface 35
3.4.2 Style-Mixing Examples 38
3.4.2.1 Example 1: Lativ flannel plaid shirt product image 38
3.4.2.2 Example 2: Overstock Stanley Men's Button Front Flannel Shirt product image 39
3.4.2.3 Example 3: SUPERBALIST Polka Dot Shirt Black and White product image 40
3.4.3 Multi-Image Overlay Technique 41
Chapter 4 Experimental Results 43
4.1 Training Results 43
4.1.1 Black Plaid to Other Styles 43
4.1.2 Black Stripes to Other Styles 45
4.1.3 Black Dots to Other Styles 46
4.1.4 Plain to Other Styles 47
4.2 Evaluation and Model Comparison 48
4.2.1 Evaluation Metrics 48
4.2.1.1 Inception Score (IS) 48
4.2.1.2 Fréchet Inception Distance (FID) 49
4.2.2 Compared Models 51
4.2.3 Evaluation Results 52
4.2.3.1 Training-Time Comparison 52
4.2.3.2 FID Comparison 53
4.2.3.3 Visual Comparison 54
4.2.3.4 Questionnaire: Hands-On Feedback 58
4.2.3.5 Analysis of Comparison Results 60
Chapter 5 Conclusion and Future Work 61
References 62
Appendix 1: Generated Style Results 68
Appendix 2: Questionnaire Sample 77

List of Figures
Fig. 2.1 GAN architecture [29] 2
Fig. 2.2 cGAN architecture 3
Fig. 2.3 DCGAN illustration [14] 4
Fig. 2.4 Paired vs. unpaired datasets 6
Fig. 2.5 Simplified CycleGAN architecture [8] 6
Fig. 2.6 Virtual try-on illustration [44] 7
Fig. 2.7 VITON overview [20] 8
Fig. 2.8 VITON-GAN overview [25] 9
Fig. 2.9 Be Your Own Prada [26] 10
Fig. 2.10 DesIGN [27] 10
Fig. 2.11 Example of Fashion Style Generator [28] 11
Fig. 3.1 Overview of our method 13
Fig. 3.2 Dataset overview 14
Fig. 3.3 Categories and attributes provided by DeepFashion (partial) 16
Fig. 3.4 Fashion Product Images Dataset 16
Fig. 3.5 Dataset folder structure 19
Fig. 3.6 Self-Correction 20
Fig. 3.7 Applying SCHP to our dataset 21
Fig. 3.8 Cross-domain models vs. StarGAN 24
Fig. 3.9 StarGAN overview 26
Fig. 3.10 StarGAN generator G architecture 31
Fig. 3.11 StarGAN discriminator D architecture 32
Fig. 3.12 Applying StarGAN to our dataset 33
Fig. 3.13 Merging generated results with the background 33
Fig. 3.14 Style-mixing interface 34
Fig. 3.15 Style-mixing system: user flow 35
Fig. 3.16 Style-mixing system: selecting the image to translate 36
Fig. 3.17 Style-mixing system: clicking a button to translate styles 36
Fig. 3.18 Style-mixing system: selecting a style and adjusting its threshold 37
Fig. 3.19 Style-mixing system: saving the image once satisfied 37
Fig. 3.20 Mixing two styles: darkening a plaid jacket 38
Fig. 3.21 Mixing three styles: dots dominant 38
Fig. 3.22 Style-mixing system: selecting a single style 39
Fig. 3.23 Style-mixing system: turning a plaid shirt into an alternative plaid 39
Fig. 3.24 Style-mixing system: non-fixed generated styles make images more creative 40
Fig. 3.25 Style-mixing system: simple mixes yield distinctive new styles 40
Fig. 4.1 Overview of image translations 43
Fig. 4.2 Plaid to other styles 44
Fig. 4.3 Stripes to other styles 45
Fig. 4.4 Dots to other styles 46
Fig. 4.5 Plain to other styles 47
Fig. 4.6 Probability distributions of two models (left); averaged conditional distributions (right) 49
Fig. 4.7 Image distortion level vs. FID score 50
Fig. 4.8 CycleGAN architecture 51
Fig. 4.9 Participant test screen 58

List of Tables
Table 3.1 Number of training examples per style 15
Table 3.2 Total dataset size 15
Table 3.3 Long-side scaling vs. ordinary scaling 17
Table 3.4 Pseudocode of the long-side scaling method 18
Table 3.5 Split_SCHP pseudocode 22
Table 3.6 Applying SCHP to our dataset 23
Table 3.7 Poor prediction cases 23
Table 3.8 Concatenating the input image and style label 25
Table 3.9 Planned overlay layer ratios 41
Table 3.10 Multi-image overlay workflow and weight examples 42
Table 3.11 Parameters for the multi-image overlay examples 42
Table 4.1 Hardware specifications 52
Table 4.2 Software environment 52
Table 4.3 FID scores (lower is better) 53
Table 4.4 Plaid to plain 54
Table 4.5 Plaid to dots 54
Table 4.6 Plaid to stripes 55
Table 4.7 Plain to plaid 55
Table 4.8 Dots to plaid 55
Table 4.9 Dots to stripes 56
Table 4.10 Stripes to plaid 56
Table 4.11 Stripes to dots 56
Table 4.12 CycleGAN failure cases 57
Table 4.13 Questionnaire questions 59
Table 4.14 FID score differences vs. actual images (lower is better) 60 |
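The table of contents lists a multi-image overlay technique (Section 3.4.3, Tables 3.9 to 3.11) that mixes several generated style images with adjustable weights to produce a mashup. A minimal NumPy sketch of one plausible weighted blend, assuming the overlay is a simple convex combination of same-sized images; the `blend_styles` helper and its weight values are illustrative, not the thesis' exact procedure:

```python
import numpy as np

def blend_styles(images, weights):
    """Blend several style-translated images of identical shape into one
    mashup image using normalized weights (a convex combination)."""
    weights = np.asarray(weights, dtype=np.float64)
    if weights.min() < 0 or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so pixel values stay in range
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    # Contract the weight vector against the image axis: (N,) x (N,H,W,C) -> (H,W,C)
    return np.tensordot(weights, stack, axes=1)

plaid = np.full((2, 2, 3), 200.0)  # stand-ins for generated style images
dots = np.full((2, 2, 3), 100.0)
mashup = blend_styles([plaid, dots], [0.75, 0.25])
print(mashup[0, 0, 0])  # 175.0
```

Normalizing the weights keeps the blended result within the original pixel range regardless of how many styles the user stacks, which matches the interface's described "adjust a per-style threshold, then save" workflow.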
References |
[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
[2] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[3] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[4] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[5] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. In 5th International Conference on Learning Representations (ICLR), 2017.
[6] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[7] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[8] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), 2017.
[9] T. Kim, B. Kim, M. Cha, and J. Kim. Unsupervised visual attribute transfer with reconfigurable generative adversarial networks. arXiv preprint arXiv:1707.09798, 2017.
[10] M. Li, W. Zuo, and D. Zhang. Deep identity-aware transfer of facial attributes. arXiv preprint arXiv:1610.05586, 2016.
[11] W. Shen and R. Liu. Learning residual images for face attribute manipulation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[12] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel. Variational lossy autoencoder. arXiv preprint arXiv:1611.02731, 2016.
[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, Nov. 1998.
[14] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR), 2016.
[15] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[16] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv preprint arXiv:1703.00848, 2017.
[17] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR), 2014.
[18] M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In Advances in Neural Information Processing Systems (NIPS), 2016, pp. 469–477.
[19] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017, pp. 1857–1865.
[20] X. Han, Z. Wu, Z. Wu, R. Yu, and L. S. Davis. VITON: An image-based virtual try-on network. arXiv preprint arXiv:1711.08447 [cs.CV], June 2018.
[21] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE TPAMI, 2002.
[22] T. Issenhuth, J. Mary, and C. Calauzènes. End-to-end learning of geometric deformations of feature maps for virtual try-on. arXiv preprint arXiv:1906.01347 [cs.CV], June 2019.
[23] N. Pandey and A. Savakis. Poly-GAN: Multi-conditioned GAN for fashion synthesis. arXiv preprint arXiv:1909.02165 [cs.CV], Sep. 2019.
[24] X. Han, Z. Wu, W. Huang, M. R. Scott, and L. S. Davis. Compatible and diverse fashion image inpainting. arXiv preprint arXiv:1902.01096 [cs.CV], Apr. 2019.
[25] S. Honda. VITON-GAN: Virtual try-on image generator trained with adversarial loss. arXiv preprint arXiv:1911.07926 [cs.CV], Nov. 2019.
[26] S. Zhu, S. Fidler, R. Urtasun, D. Lin, and C. C. Loy. Be Your Own Prada: Fashion synthesis with structural coherence. arXiv preprint arXiv:1710.07346 [cs.CV], Oct. 2017.
[27] O. Sbai, M. Elhoseiny, A. Bordes, Y. LeCun, and C. Couprie. DeSIGN: Design inspiration from generative networks. arXiv preprint arXiv:1804.00921 [cs.LG], Sep. 2018.
[28] S. Jiang and Y. Fu. Fashion style generator. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 2017.
[29] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8789–8797.
[30] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[31] Fashion Product Images Dataset, Kaggle. [Online]. Available: https://www.kaggle.com/paramaggarwal/fashion-product-images-dataset
[32] PyTorch documentation: ImageFolder. [Online]. Available: https://pytorch.org/docs/stable/torchvision/datasets.html
[33] P. Li, Y. Xu, Y. Wei, and Y. Yang. Self-correction for human parsing. arXiv preprint arXiv:1910.09777 [cs.CV], Oct. 2019.
[34] X. Liang, K. Gong, X. Shen, and L. Lin. Look into person: Joint body parsing & pose estimation network and a new benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):871–885, 2018.
[35] X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun, and A. Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1971–1978.
[36] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875 [stat.ML], Dec. 2017.
[37] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint arXiv:1312.4400v3 [cs.NE], Mar. 2014.
[38] Transposed convolution (fractionally strided convolution, "deconvolution"). [Online]. Available: https://my.oschina.net/u/3702502/blog/1803358
[39] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 [cs.CV], Nov. 2017.
[40] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585 [stat.ML], Jul. 2017.
[41] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498 [cs.LG], June 2016.
[42] D. C. Dowson and B. V. Landau. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12(3):450–455, September 1982.
[43] A new benchmark for human parsing. [Online]. Available: http://www.sysu-hcp.net/a-new-benchmark-for-human-parsing/
[44] Vismile AI virtual fitting. [Online]. Available: https://www.vismile.com.tw/zh/AI-virtual-fitting.html |
Full-Text Use Authorization | |