| System ID | U0002-1401202515025100 |
|---|---|
| DOI | 10.6846/tku202500032 |
| Title (Chinese) | 以合併量化微調擴散模型之圖像生成應用 (Image Generation Applications by Merging Quantized Fine-Tuned Diffusion Models) |
| Title (English) | ZipQDoRA: A Lightweight Method for Any Subject in Any Style by Merging DoRAs in Diffusion Models |
| Title (Third Language) | |
| University | Tamkang University |
| Department (Chinese) | 資訊工程學系碩士班 (Master's Program, Department of Computer Science and Information Engineering) |
| Department (English) | Department of Computer Science and Information Engineering |
| Foreign Degree School | |
| Foreign Degree College | |
| Foreign Degree Institute | |
| Academic Year | 113 (ROC calendar) |
| Semester | 1 |
| Publication Year | 114 (ROC calendar) |
| Graduate Student (Chinese) | 梁廣廷 |
| Graduate Student (English) | Guang-Ting Liang |
| Student ID | 611410589 |
| Degree | Master's |
| Language | Traditional Chinese |
| Second Language | |
| Oral Defense Date | 2024-12-17 |
| Pages | 41 |
| Committee | Advisor: 陳惇凱 (dkchen@mail.tku.edu.tw); Committee members: 武士戎 (wushihjung@mail.tku.edu.tw), 林志豪 |
| Keywords (Chinese) | 擴散模型 (diffusion model); 低秩分解 (low-rank decomposition); 量化 (quantization) |
| Keywords (English) | diffusion model; parameter-efficient fine-tuning; quantization |
| Keywords (Third Language) | |
| Subject Classification | |
| Abstract (Chinese) |
This study investigates how to integrate Weight-Decomposed Low-Rank Adaptation (DoRA) to achieve efficient personalized generation with diffusion models. The research focuses on the fidelity loss that existing techniques suffer when fusing subject and style. To address this problem, we propose ZipQDoRA, a low-cost and effective method for merging style and subject DoRAs, enabling any user-provided subject to be generated in any user-provided style while preserving fidelity. For performance optimization, we further study a mixed-precision training strategy that quantizes the DoRA weights to lower precision (e.g., INT8) to save GPU memory while keeping the base model weights at full FP16 precision, significantly reducing GPU memory usage. The novelty of ZipQDoRA lies in effectively combining independently trained style and subject DoRAs while maintaining high fidelity, so the method preserves generation quality without sacrificing computational efficiency. As an advanced evolution of the LoRA architecture, it offers a more flexible adaptation mechanism that supports complex personalized generation tasks while balancing efficiency and quality. This work provides an innovative solution for personalizing diffusion models, strikes a sound balance between technical implementation and practical application, and opens a new research direction for the field. |
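To make the mixed-precision scheme described above concrete, the following is a minimal PyTorch sketch of per-tensor absmax INT8 quantization applied to a DoRA adapter while the frozen base weight stays in FP16. The shapes, rank, and helper names are illustrative assumptions rather than the thesis's actual implementation, which would more likely build on a library such as bitsandbytes [6].

```python
import torch

def quantize_int8_absmax(w: torch.Tensor):
    """Per-tensor absmax quantization onto the int8 range [-127, 127]."""
    scale = w.abs().max().float() / 127.0
    q = torch.clamp((w.float() / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation of the quantized tensor."""
    return (q.float() * scale).half()

# Illustrative shapes: one 4096x4096 base projection and a rank-16 adapter.
base = torch.randn(4096, 4096, dtype=torch.float16)   # frozen base weight, kept in FP16
dora_A = torch.randn(16, 4096, dtype=torch.float16)   # low-rank "down" factor
dora_B = torch.randn(4096, 16, dtype=torch.float16)   # low-rank "up" factor
magnitude = torch.ones(1, 4096, dtype=torch.float16)  # DoRA's learned per-column magnitude

# Only the adapter factors are stored in INT8; the base model stays in FP16,
# so the memory saved scales with the adapter size, not the base model size.
qA, sA = quantize_int8_absmax(dora_A)
qB, sB = quantize_int8_absmax(dora_B)

# At inference, dequantize the factors and apply DoRA's decomposed update:
#   W' = m * (W + B A) / ||W + B A||_col   (column-wise L2 norm)
direction = base + dequantize(qB, sB) @ dequantize(qA, sA)
direction = direction / direction.norm(dim=0, keepdim=True)
adapted = magnitude * direction
```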
| Abstract (English) |
This research explores how to integrate Weight-Decomposed Low-Rank Adaptation (DoRA) to achieve personalized, concept-driven generation. Prior studies have shown that fine-tuning generative models for concept-driven personalization achieves excellent results in both subject-driven and style-driven generation, and DoRA offers a parameter-efficient way to realize it. However, existing methods for combining independently trained style and subject DoRAs often compromise either subject or style fidelity. To address this issue, we propose ZipQDoRA, a low-cost and effective method for merging independently trained style and subject DoRAs, enabling the generation of any user-provided subject in any user-provided style while maintaining fidelity. In addition, we investigate applying quantization to the DoRA weights to further reduce model size: a mixed-precision strategy quantizes the DoRA weights to lower precision (e.g., INT8) to save memory while keeping the base model weights at FP16 (half) precision. The innovation of ZipQDoRA is that it effectively combines independently trained style and subject DoRAs while maintaining high fidelity; the method preserves generation quality while remaining computationally efficient. By quantizing the DoRA weights to INT8, we further reduce model size and memory usage, which is especially valuable for deploying models in resource-constrained environments. It is worth noting that DoRA is a variant of LoRA (Low-Rank Adaptation) that provides more flexible adaptation while maintaining parameter efficiency; ZipQDoRA extends this line of work so that it can better handle complex personalized generation tasks. In conclusion, this research provides new ideas and methods for developing personalized generation models and is expected to matter for future AI applications, especially in scenarios that demand high personalization and resource efficiency. |
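As a usage-level illustration of merging independently trained adapters, the sketch below loads two adapters onto an FP16 SDXL pipeline with the multi-adapter API of Hugging Face diffusers [9] and blends them with fixed weights. The adapter paths and prompt are placeholders; the fixed-weight blend shown is the naive baseline that merging methods such as ZipLoRA [29] and ZipQDoRA improve upon, not the ZipQDoRA merging rule itself, and it assumes a diffusers/peft version recent enough to accept DoRA-format adapters.

```python
import torch
from diffusers import DiffusionPipeline

# Base SDXL model with weights kept at FP16 precision, as in the thesis setup.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two independently trained adapters (placeholder paths).
pipe.load_lora_weights("path/to/subject_adapter", adapter_name="subject")
pipe.load_lora_weights("path/to/style_adapter", adapter_name="style")

# Naive baseline: activate both adapters with fixed blend weights. A merging
# method must instead choose the combination so that neither subject nor
# style fidelity is lost when both adapters act on the same layers.
pipe.set_adapters(["subject", "style"], adapter_weights=[1.0, 1.0])

image = pipe("a medieval knight, in the style of a 1920s tarot card").images[0]
image.save("knight_tarot.png")
```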
| Abstract (Third Language) | |
| Table of Contents |
Acknowledgments i
Table of Contents vi
List of Figures viii
List of Tables ix
Chapter 1: Introduction 1
Chapter 2: Related Work 4
  2-1 Prompt Engineering 4
  2-2 Full-Parameter Fine-Tuning 10
  2-3 DreamBooth 11
  2-4 Quantization 13
  2-5 Parameter-Efficient Fine-Tuning 17
  2-6 Knowledge Distillation 19
Chapter 3: System Methods 22
  3.1 Base Model 22
  3.2 LoRA 22
  3.3 DoRA 22
  3.4 QDoRA 23
  3.5 ZipQDoRA 25
  3.6 Diffusion Model Evaluation 27
Chapter 4: Experimental Results 29
  4.1 ZipQDoRA Generation Results 29
  4.2 Quantization Comparison 33
Chapter 5: Conclusions and Contributions 36
  5.1 The Novel ZipQDoRA Fine-Tuning and Merging Method 36
  5.2 Strong Performance Under Resource Constraints 36
  5.3 ZipQDoRA: Democratizing Personalized Image Generation 36
Chapter 6: Future Work 38
References 39

List of Figures
Figure 1: Research goals 2
Figure 2: SDXL-generated image of a German Shepherd running on a tropical beach 7
Figure 3: The Civitai website 8
Figure 4: The PromptHero website 9
Figure 5: The Lexica.art website 10
Figure 6: Few-shot personalized image generation 11
Figure 7: DreamBooth fine-tuning workflow 12
Figure 8: LoRA schematic 17
Figure 9: DoRA schematic 18
Figure 10: Custom subject training on SDXL with DreamBooth and QDoRA 24
Figure 11: Custom style training on SDXL with DreamBooth and QDoRA 25
Figure 12: Images generated by merging medieval-knight content with a 1920s tarot-card style 26
Figure 13: Overview of experimental results 29
Figure 14: Generation results (1) 30
Figure 15: Generation results (2) 31
Figure 16: Generation results (3) 32
Figure 17: Comparison of parameter-efficient fine-tuning methods 34
Figure 18: Comparison of merged parameter-efficient fine-tuning methods 35

List of Tables
Table 1: Required GPU VRAM 33
Table 2: Required GPU models 33 |
| References |
[1] "Midjourney," Midjourney. Accessed: Nov. 13, 2024. [Online]. Available: https://www.midjourney.com/website
[2] C. Saharia et al., "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding," May 23, 2022, arXiv: arXiv:2205.11487. doi: 10.48550/arXiv.2205.11487.
[3] Imagen-Team-Google et al., "Imagen 3," Aug. 13, 2024, arXiv: arXiv:2408.07009. doi: 10.48550/arXiv.2408.07009.
[4] A. Ramesh et al., "Zero-Shot Text-to-Image Generation," Feb. 26, 2021, arXiv: arXiv:2102.12092. doi: 10.48550/arXiv.2102.12092.
[5] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical Text-Conditional Image Generation with CLIP Latents," Apr. 13, 2022, arXiv: arXiv:2204.06125. doi: 10.48550/arXiv.2204.06125.
[6] bitsandbytes-foundation/bitsandbytes. (Nov. 11, 2024). Python. bitsandbytes foundation. Accessed: Nov. 12, 2024. [Online]. Available: https://github.com/bitsandbytes-foundation/bitsandbytes
[7] Q. Lhoest et al., "Datasets: A Community Library for Natural Language Processing," Sep. 07, 2021, arXiv: arXiv:2109.02846. doi: 10.48550/arXiv.2109.02846.
[8] "huggingface/accelerate: 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support." Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/accelerate
[9] "huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX." Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/diffusers
[10] "huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning." Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/peft
[11] T. Wolf et al., "Transformers: State-of-the-Art Natural Language Processing," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Q. Liu and D. Schlangen, Eds., Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. doi: 10.18653/v1/2020.emnlp-demos.6.
[12] D. Podell et al., "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis," Jul. 04, 2023, arXiv: arXiv:2307.01952. doi: 10.48550/arXiv.2307.01952.
[13] P. Esser et al., "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis," Mar. 05, 2024, arXiv: arXiv:2403.03206. doi: 10.48550/arXiv.2403.03206.
[14] "Black Forest Labs - Frontier AI Lab," Black Forest Labs. Accessed: Nov. 11, 2024. [Online]. Available: https://blackforestlabs.ai/
[15] "Civitai: The Home of Open-Source Generative AI." Accessed: Nov. 18, 2024. [Online]. Available: https://civitai.com/
[16] "Search prompts for Stable Diffusion, ChatGPT & Midjourney," PromptHero. Accessed: Dec. 01, 2024. [Online]. Available: https://prompthero.com/
[17] "Lexica," Lexica. Accessed: Dec. 01, 2024. [Online]. Available: https://lexica.art/
[18] E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," Oct. 16, 2021, arXiv: arXiv:2106.09685. doi: 10.48550/arXiv.2106.09685.
[19] R. Gal et al., "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion," Aug. 02, 2022, arXiv: arXiv:2208.01618. doi: 10.48550/arXiv.2208.01618.
[20] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation," Mar. 15, 2023, arXiv: arXiv:2208.12242. doi: 10.48550/arXiv.2208.12242.
[21] X. Li et al., "Q-Diffusion: Quantizing Diffusion Models," Jun. 08, 2023, arXiv: arXiv:2302.04304. doi: 10.48550/arXiv.2302.04304.
[22] Y. Shang, Z. Yuan, B. Xie, B. Wu, and Y. Yan, "Post-training Quantization on Diffusion Models," Mar. 16, 2023, arXiv: arXiv:2211.15736. doi: 10.48550/arXiv.2211.15736.
[23] T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale," Nov. 10, 2022, arXiv: arXiv:2208.07339. doi: 10.48550/arXiv.2208.07339.
[24] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient Finetuning of Quantized LLMs," May 23, 2023, arXiv: arXiv:2305.14314. doi: 10.48550/arXiv.2305.14314.
[25] Y. Frenkel, Y. Vinker, A. Shamir, and D. Cohen-Or, "Implicit Style-Content Separation using B-LoRA," Sep. 22, 2024, arXiv: arXiv:2403.14572. doi: 10.48550/arXiv.2403.14572.
[26] S.-Y. Liu et al., "DoRA: Weight-Decomposed Low-Rank Adaptation," Jul. 09, 2024, arXiv: arXiv:2402.09353. doi: 10.48550/arXiv.2402.09353.
[27] J. Kohler et al., "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation," May 08, 2024, arXiv: arXiv:2405.05224. doi: 10.48550/arXiv.2405.05224.
[28] A. Sauer, F. Boesel, T. Dockhorn, A. Blattmann, P. Esser, and R. Rombach, "Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation," Mar. 18, 2024, arXiv: arXiv:2403.12015. doi: 10.48550/arXiv.2403.12015.
[29] V. Shah et al., "ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs," Nov. 2023, arXiv: arXiv:2311.13600. doi: 10.48550/arXiv.2311.13600.
[30] "Text to Image Models and Providers Leaderboard | Artificial Analysis." Accessed: Nov. 14, 2024. [Online]. Available: https://artificialanalysis.ai/text-to-image
[31] Y. Kirstain, A. Polyak, U. Singer, S. Matiana, J. Penna, and O. Levy, "Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation," May 2023, arXiv: arXiv:2305.01569. doi: 10.48550/arXiv.2305.01569.
[32] J. Xu et al., "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation," Dec. 28, 2023, arXiv: arXiv:2304.05977. doi: 10.48550/arXiv.2304.05977. |
| Full-Text Access Rights | |