§ Thesis Bibliographic Record
  
System ID U0002-1401202515025100
DOI 10.6846/tku202500032
Title (Chinese) 以合併量化微調擴散模型之圖像生成應用
Title (English) ZipQDoRA: A Lightweight Method for Any Subject in Any Style for Merging DoRA in Diffusion Models
Title (third language)
University Tamkang University
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year 113
Semester 1
Publication year 114
Author (Chinese) 梁廣廷
Author (English) Guang-Ting Liang
Student ID 611410589
Degree Master's
Language Traditional Chinese
Second language
Defense date 2024-12-17
Pages 41
Defense committee
Committee member - 武士戎 (wushihjung@mail.tku.edu.tw)
Advisor - 陳惇凱 (dkchen@mail.tku.edu.tw)
Committee member - 林志豪
Keywords (Chinese) 擴散模型、低秩分解、量化
Keywords (English) diffusion model; Parameter-Efficient Fine-Tuning; Quantization
Keywords (third language)
Subject classification
Abstract (Chinese)
This study investigates how to integrate Weight-Decomposed Low-Rank Adaptation (DoRA) to achieve efficient personalized generation with diffusion models, focusing on the loss of fidelity that existing techniques suffer when fusing a subject with a style.
To address this problem, we propose ZipQDoRA, a low-cost and effective method for merging style and subject DoRAs that enables any user-provided subject to be generated in any user-provided style while preserving fidelity.
For further efficiency, we also study a mixed-precision training strategy: the DoRA weights are quantized to lower precision (e.g., INT8) while the base model weights remain in FP16, significantly reducing GPU memory usage.
The novelty of ZipQDoRA lies in effectively combining independently trained style and subject DoRAs at high fidelity. As an evolution of the LoRA architecture, it provides a more flexible adaptation mechanism that supports complex personalized generation tasks while balancing computational efficiency and generation quality.
This study offers an innovative solution for personalizing diffusion models, strikes a practical balance between technical implementation and real-world application, and suggests new research directions for the field.
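As a concrete illustration of the merging step, the following is a minimal sketch (not the thesis's released code) of combining two independently trained adapter deltas with learnable per-column merger coefficients, in the spirit of ZipLoRA [29], on which ZipQDoRA builds. All names (merge_adapter_deltas, dW_subject, dW_style) are hypothetical, and the real method also evaluates the merged adapter through the diffusion model itself, which is omitted here.

    import torch
    import torch.nn.functional as F

    def merge_adapter_deltas(dW_subject, dW_style, steps=200, lr=1e-2, lam=0.01):
        """Learn per-column merger coefficients for two dense adapter deltas
        (each of shape [out_features, in_features], e.g. B @ A)."""
        n_cols = dW_subject.shape[1]
        m1 = torch.ones(n_cols, requires_grad=True)   # subject coefficients
        m2 = torch.ones(n_cols, requires_grad=True)   # style coefficients
        opt = torch.optim.Adam([m1, m2], lr=lr)
        for _ in range(steps):
            w1 = dW_subject * m1                      # rescale subject columns
            w2 = dW_style * m2                        # rescale style columns
            # Stay close to each adapter's original behaviour ...
            preserve = (w1 - dW_subject).pow(2).mean() + (w2 - dW_style).pow(2).mean()
            # ... while discouraging interference between the two deltas
            # via a per-column cosine-similarity penalty.
            interfere = F.cosine_similarity(w1, w2, dim=0).abs().mean()
            loss = preserve + lam * interfere
            opt.zero_grad()
            loss.backward()
            opt.step()
        return dW_subject * m1.detach() + dW_style * m2.detach()

The learned coefficients let conflicting columns be down-weighted in one adapter rather than averaged away, which is how ZipLoRA-style merging can preserve both subject and style fidelity.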
Abstract (English)
This research explores how to integrate Weight-Decomposed Low-Rank Adaptation (DoRA) to achieve concept-driven personalized generation in diffusion models. Prior work has shown that fine-tuning generative models for concept-driven personalization achieves excellent results in both subject-driven and style-driven generation, and DoRA offers a parameter-efficient way to do so. However, existing methods for combining independent style and subject DoRAs often compromise either subject or style fidelity.
To address this issue, we propose ZipQDoRA, a low-cost and effective method for merging independently trained style and subject DoRAs, enabling the generation of any user-provided subject in any user-provided style while maintaining fidelity.
In addition, we investigate applying quantization to the DoRA weights to further reduce model size: a mixed-precision strategy quantizes the DoRA weights to lower precision (such as INT8) to save memory while keeping the base model weights in FP16 half precision.
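To make the mixed-precision idea concrete, here is a minimal sketch of per-tensor symmetric (absmax) INT8 quantization of the low-rank adapter factors while the frozen base weights stay in FP16. The shapes and names (quantize_int8, dequantize, W0, A, B) are illustrative assumptions; in practice one would typically rely on a library such as bitsandbytes [6].

    import torch

    def quantize_int8(w):
        """Per-tensor symmetric absmax quantization: w ~= scale * q, q in int8."""
        scale = w.abs().max().clamp(min=1e-8) / 127.0
        q = (w / scale).round().clamp(-127, 127).to(torch.int8)
        return q, scale

    def dequantize(q, scale):
        return q.float() * scale

    # The frozen base layer stays in FP16; only the small DoRA factors go to INT8.
    W0 = torch.randn(1024, 1024, dtype=torch.float16)  # frozen base weight
    A = torch.randn(8, 1024) * 0.01                    # low-rank factor A (rank 8)
    B = torch.randn(1024, 8) * 0.01                    # low-rank factor B
    qA, sA = quantize_int8(A)
    qB, sB = quantize_int8(B)
    # Effective weight at inference: W0 + B @ A, reconstructed from INT8.
    W_eff = W0 + (dequantize(qB, sB) @ dequantize(qA, sA)).to(W0.dtype)

Storing the adapter factors in INT8 halves their memory relative to FP16 (and quarters it relative to FP32), while the absmax scale keeps the reconstruction error small for typical weight distributions.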
The innovation of ZipQDoRA is that it effectively combines independently trained style and subject DoRAs while maintaining high fidelity, preserving generation quality without sacrificing computational efficiency. By quantizing the DoRA weights to INT8, we can further reduce model size and memory usage, which is especially valuable when deploying models in resource-constrained environments.
It is worth noting that DoRA is a variant of LoRA (Low-Rank Adaptation) that provides more flexible adaptation while retaining parameter efficiency; ZipQDoRA extends this technique to better handle complex personalized generation tasks.
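For readers unfamiliar with the LoRA/DoRA relationship, the sketch below shows the DoRA reparameterization from Liu et al. [26]: the pretrained weight is decomposed into a trainable per-column magnitude vector and a direction that is updated by an ordinary LoRA pair. The variable names are illustrative.

    import torch

    def dora_weight(W0, A, B, m):
        """DoRA [26]: W' = m * (W0 + B @ A) / ||W0 + B @ A||_c,
        where ||.||_c is the column-wise L2 norm."""
        V = W0 + B @ A                          # direction, updated via LoRA
        col_norm = V.norm(dim=0, keepdim=True)  # per-column L2 norms
        return m * (V / col_norm)               # rescale by learned magnitudes

    d_out, d_in, r = 512, 512, 8
    W0 = torch.randn(d_out, d_in)               # frozen pretrained weight
    A = torch.randn(r, d_in) * 0.01             # trainable low-rank factor
    B = torch.zeros(d_out, r)                   # B = 0, so W' == W0 at init
    m = W0.norm(dim=0, keepdim=True)            # magnitude, init to ||W0||_c
    W_adapted = dora_weight(W0, A, B, m)

Because magnitude and direction are trained separately, DoRA's updates behave more like full fine-tuning than plain LoRA does, a property ZipQDoRA relies on when merging adapters.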
In conclusion, this research provides new ideas and methods for developing personalized generation models and is expected to have a meaningful impact on future AI applications, especially in scenarios that demand both high personalization and resource efficiency.
Abstract (third language)
Table of Contents

Acknowledgements	i
Table of Contents	vi
List of Figures	viii
List of Tables	ix
Chapter 1. Introduction	1
Chapter 2. Related Work	4
2.1 Prompt Engineering	4
2.2 Full-Parameter Fine-Tuning	10
2.3 DreamBooth	11
2.4 Quantization	13
2.5 Parameter-Efficient Fine-Tuning	17
2.6 Knowledge Distillation	19
Chapter 3. Methodology	22
3.1 Foundation Model	22
3.2 LoRA	22
3.3 DoRA	22
3.4 QDoRA	23
3.5 ZipQDoRA	25
3.6 Diffusion Model Evaluation	27
Chapter 4. Experimental Results	29
4.1 ZipQDoRA Generation Results	29
4.2 Quantization Comparison	33
Chapter 5. Conclusions and Contributions	36
5.1 The Novel ZipQDoRA Fine-Tuning and Merging Method	36
5.2 Excellent Performance under Resource Constraints	36
5.3 Democratizing Personalized Image Generation with ZipQDoRA	36
Chapter 6. Future Work	38
References	39

List of Figures

Figure 1	Research objectives	2
Figure 2	A German Shepherd running on a tropical beach, generated by SDXL	7
Figure 3	The Civitai website	8
Figure 4	The PromptHero website	9
Figure 5	The Lexica.art website	10
Figure 6	Few-shot personalized image generation	11
Figure 7	The DreamBooth fine-tuning pipeline	12
Figure 8	LoRA schematic	17
Figure 9	DoRA schematic	18
Figure 10	Custom subject training on SDXL with DreamBooth and QDoRA	24
Figure 11	Custom style training on SDXL with DreamBooth and QDoRA	25
Figure 12	Generation merging medieval-knight content with a 1920s tarot-card style	26
Figure 13	Overview of experimental results	29
Figure 14	Generation results (1)	30
Figure 15	Generation results (2)	31
Figure 16	Generation results (3)	32
Figure 17	Comparison of parameter-efficient fine-tuning methods	34
Figure 18	Comparison of methods for merging parameter-efficient fine-tuned adapters	35

List of Tables

Table 1	Required GPU VRAM	33
Table 2	Required GPU models	33

References
[1]	“Midjourney,” Midjourney. Accessed: Nov. 13, 2024. [Online]. Available: https://www.midjourney.com/website
[2]	C. Saharia et al., “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding,” May 23, 2022, arXiv: arXiv:2205.11487. doi: 10.48550/arXiv.2205.11487.
[3]	Imagen-Team-Google et al., “Imagen 3,” Aug. 13, 2024, arXiv: arXiv:2408.07009. doi: 10.48550/arXiv.2408.07009.
[4]	A. Ramesh et al., “Zero-Shot Text-to-Image Generation,” Feb. 26, 2021, arXiv: arXiv:2102.12092. doi: 10.48550/arXiv.2102.12092.
[5]	A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical Text-Conditional Image Generation with CLIP Latents,” Apr. 13, 2022, arXiv: arXiv:2204.06125. doi: 10.48550/arXiv.2204.06125.
[6]	bitsandbytes-foundation/bitsandbytes. (Nov. 11, 2024). Python. bitsandbytes foundation. Accessed: Nov. 12, 2024. [Online]. Available: https://github.com/bitsandbytes-foundation/bitsandbytes
[7]	Q. Lhoest et al., “Datasets: A Community Library for Natural Language Processing,” Sep. 07, 2021, arXiv: arXiv:2109.02846. doi: 10.48550/arXiv.2109.02846.
[8]	“huggingface/accelerate: 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support.” Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/accelerate
[9]	“huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.” Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/diffusers
[10]	“huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.” Accessed: Sep. 03, 2024. [Online]. Available: https://github.com/huggingface/peft
[11]	T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Q. Liu and D. Schlangen, Eds., Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. doi: 10.18653/v1/2020.emnlp-demos.6.
[12]	D. Podell et al., “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis,” Jul. 04, 2023, arXiv: arXiv:2307.01952. doi: 10.48550/arXiv.2307.01952.
[13]	P. Esser et al., “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis,” Mar. 05, 2024, arXiv: arXiv:2403.03206. doi: 10.48550/arXiv.2403.03206.
[14]	“Black Forest Labs - Frontier AI Lab,” Black Forest Labs. Accessed: Nov. 11, 2024. [Online]. Available: https://blackforestlabs.ai/
[15]	“Civitai: The Home of Open-Source Generative AI.” Accessed: Nov. 18, 2024. [Online]. Available: https://civitai.com/
[16]	“Search prompts for Stable Diffusion, ChatGPT & Midjourney,” PromptHero. Accessed: Dec. 01, 2024. [Online]. Available: https://prompthero.com/
[17]	“Lexica,” Lexica. Accessed: Dec. 01, 2024. [Online]. Available: https://lexica.art/
[18]	E. J. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models,” Oct. 16, 2021, arXiv: arXiv:2106.09685. doi: 10.48550/arXiv.2106.09685.
[19]	R. Gal et al., “An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion,” Aug. 02, 2022, arXiv: arXiv:2208.01618. doi: 10.48550/arXiv.2208.01618.
[20]	N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,” Mar. 15, 2023, arXiv: arXiv:2208.12242. doi: 10.48550/arXiv.2208.12242.
[21]	X. Li et al., “Q-Diffusion: Quantizing Diffusion Models,” Jun. 08, 2023, arXiv: arXiv:2302.04304. doi: 10.48550/arXiv.2302.04304.
[22]	Y. Shang, Z. Yuan, B. Xie, B. Wu, and Y. Yan, “Post-training Quantization on Diffusion Models,” Mar. 16, 2023, arXiv: arXiv:2211.15736. doi: 10.48550/arXiv.2211.15736.
[23]	T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, “LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale,” Nov. 10, 2022, arXiv: arXiv:2208.07339. doi: 10.48550/arXiv.2208.07339.
[24]	T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient Finetuning of Quantized LLMs,” May 23, 2023, arXiv: arXiv:2305.14314. doi: 10.48550/arXiv.2305.14314.
[25]	Y. Frenkel, Y. Vinker, A. Shamir, and D. Cohen-Or, “Implicit Style-Content Separation using B-LoRA,” Sep. 22, 2024, arXiv: arXiv:2403.14572. doi: 10.48550/arXiv.2403.14572.
[26]	S.-Y. Liu et al., “DoRA: Weight-Decomposed Low-Rank Adaptation,” Jul. 09, 2024, arXiv: arXiv:2402.09353. doi: 10.48550/arXiv.2402.09353.
[27]	J. Kohler et al., “Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation,” May 08, 2024, arXiv: arXiv:2405.05224. doi: 10.48550/arXiv.2405.05224.
[28]	A. Sauer, F. Boesel, T. Dockhorn, A. Blattmann, P. Esser, and R. Rombach, “Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation,” Mar. 18, 2024, arXiv: arXiv:2403.12015. doi: 10.48550/arXiv.2403.12015.
[29]	V. Shah et al., “ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs,” Nov. 2023, arXiv: arXiv:2311.13600. doi: 10.48550/arXiv.2311.13600.
[30]	“Text to Image Models and Providers Leaderboard | Artificial Analysis.” Accessed: Nov. 14, 2024. [Online]. Available: https://artificialanalysis.ai/text-to-image
[31]	Y. Kirstain, A. Polyak, U. Singer, S. Matiana, J. Penna, and O. Levy, “Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation,” May 2023, arXiv: arXiv:2305.01569. doi: 10.48550/arXiv.2305.01569.
[32]	J. Xu et al., “ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation,” Dec. 28, 2023, arXiv: arXiv:2304.05977. doi: 10.48550/arXiv.2304.05977.

Full-Text Usage Authorization
National Central Library
The author agrees to grant the National Central Library a royalty-free license: the bibliographic record and the electronic full text are released on the Internet immediately after the authorization form is submitted.
On campus
The printed thesis is available on campus immediately.
The author agrees to release the electronic full text worldwide.
The electronic thesis is available on campus immediately.
Off campus
The author agrees to license the thesis to database vendors.
The electronic thesis is available off campus immediately.
