§ Browse Thesis Bibliographic Record
  
System ID U0002-2907202517175100
DOI 10.6846/tku202500630
Title (Chinese) 我獨自蒸餾:結合 RAG 增強與 LoRA 微調開發的語言模型自監督訓練系統
Title (English) Solo Distillation: Self-Supervised Language Model Training via RAG-Augmented Pseudo Labels and LoRA Fine-Tuning
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 機械與機電工程學系碩士班
Department (English) Department of Mechanical and Electro-Mechanical Engineering
Foreign Degree University
Foreign Degree College
Foreign Degree Graduate Institute
Academic Year 113
Semester 2
Publication Year 114
Graduate Student (Chinese) 吳岱霖
Graduate Student (English) TAI-LIN WU
Student ID 612370139
Degree Master's
Language Traditional Chinese
Second Language
Oral Defense Date 2025-07-04
Number of Pages 61
Committee Members Advisor - 王銀添 (090488@o365.tku.edu.tw)
Committee Member - 許閔傑
Committee Member - 吳志清
Keywords (Chinese) 語言模型
自蒸餾
參數高效微調
RAG
免標註訓練
Keywords (English) language model
self-distillation
parameter-efficient fine-tuning
retrieval-augmented generation
label-free training
Keywords (third language)
Subject Classification
Chinese Abstract
本論文提出「我獨自蒸餾 (Solo Distillation)」,一種結合檢索增強生成 (Retrieval-Augmented Generation, RAG)、自我蒸餾 (Self-Distillation) 與 LoRA (Low-Rank Adaptation) 微調技術的語言模型自監督訓練框架。此框架旨在解決大型語言模型面臨的知識過時、微調成本高昂以及缺乏有效無監督學習方式等挑戰。研究的核心方法是利用具備 RAG 能力的教師模型,從外部知識庫檢索資訊以生成高品質的偽標籤 (Pseudo-labels)。接著,學生模型在不依賴人工標註的前提下,透過監督式微調 (Supervised Fine-Tuning, SFT) 學習教師模型的輸出,並僅針對 LoRA 層進行參數更新,以實現參數高效的自我蒸餾過程。教師與學生模型共享相同的基礎模型架構,透過切換 LoRA 權重來區分其角色。本研究選用 QASC (Question Answering via Sentence Composition) 資料集進行實驗驗證。此研究證實了該自監督訓練框架的可行性與有效性。它展示了在無需人工標註的條件下,透過 RAG 增強的偽標籤與參數高效的 LoRA 微調,能夠成功地將知識從教師模型遷移至學生模型,為低資源條件下的語言模型優化提供了一種可擴展且高效的解決方案。
English Abstract
This thesis proposes “Solo Distillation,” a self-supervised training framework for language models that integrates Retrieval-Augmented Generation (RAG), self-distillation, and LoRA (Low-Rank Adaptation) fine-tuning. The framework addresses three key challenges faced by large language models (LLMs): knowledge obsolescence, high fine-tuning costs, and the lack of effective unsupervised learning methods.
The core idea is to employ a teacher model equipped with RAG capabilities to retrieve information from an external knowledge base and generate high-quality pseudo-labels. Without any human annotations, a student model then learns the teacher’s outputs through Supervised Fine-Tuning (SFT) while updating only the LoRA layers, enabling a parameter-efficient self-distillation process. The teacher and student share the same underlying model architecture and switch roles simply by loading different LoRA weights. Experiments are conducted on the QASC (Question Answering via Sentence Composition) dataset. Results confirm the feasibility and effectiveness of the proposed self-supervised framework: by combining RAG-enhanced pseudo-labels with parameter-efficient LoRA fine-tuning, knowledge can be transferred from the teacher to the student model without manual labeling. This provides a scalable and efficient solution for optimizing LLMs under low-resource conditions.
Third-Language Abstract
Table of Contents
Contents
Contents  III
List of Figures  V
List of Tables  VI
Chapter 1 Introduction  1
1.1 Research Motivation  1
1.2 Research Objectives  2
1.3 Research Scope  3
1.4 Contributions  4
1.5 Thesis Organization  5
Chapter 2 Literature Review  6
2.1 Self-Distillation  6
2.2 Parameter-Efficient Fine-Tuning with LoRA (Low-Rank Adaptation)  7
2.3 RAG (Retrieval-Augmented Generation)  9
2.4 Technique Integration: Combining Self-Distillation, Fine-Tuning, and RAG  10
2.4.1 Combining and Comparing LoRA with Distillation Methods  10
2.4.2 Current Applications Combining RAG and Knowledge Distillation  11
Chapter 3 Research Method  13
3.1 Research Architecture  14
3.2 Model Setup  15
3.3 Training Procedure  16
Chapter 4 Dataset Processing and Application  20
4.1 Data Source Overview  20
4.2 Raw Data Preprocessing  21
4.3 Vector Database Construction  22
4.4 Vector Database Retrieval Flow and Pseudo-Label Storage  24
4.5 Data Splitting and Version Control  25
4.6 Quality Checks and Filtering  26
Chapter 5 Experimental Results  27
5.1 Experimental Setup  27
5.1.1 Dataset Setup  27
5.1.2 Model Setup  27
5.1.3 Implementation Details  28
5.2 Evaluation Methods  30
5.2.1 Quantitative Analysis Methods  31
5.2.2 Qualitative Analysis Methods  33
5.3 Experimental Results and Quantitative Analysis  34
5.4 Qualitative Analysis and Error Discussion  41
5.5 Overall Discussion  50
Chapter 6 Conclusions and Discussion  53
6.1 Research Results  53
6.2 Future Research Directions  53
References  56

List of Figures
Figure 1.1 Illustration of capability improvement  3
Figure 1.2 Using different modules to distinguish the teacher and student roles  5
Figure 2.1 SDFT flow chart  7
Figure 2.2 Architectures of (a) full-parameter fine-tuning, (b) LoRA, and (c) DoRA in the attention layers  9
Figure 2.3 LLM-NEO approach  11
Figure 2.4 KARD flow chart  12
Figure 3.1 Overview of the data used by each model  13
Figure 3.2 Overall flow chart  14
Figure 3.3 Teacher model diagram  15
Figure 3.4 Student model diagram  15
Figure 3.5 Pseudo-label generation process  17
Figure 3.6 LoRA training procedure  18
Figure 4.1 Vector database construction flow chart  23
Figure 4.2 UMAP projection of the original and newly generated datasets  26
Figure 5.1 VRAM used during training by each method  30
Figure 5.2 Accuracy on the Train_data set  35
Figure 5.3 Accuracy on the Test_data set  37
Figure 5.4 Inference VRAM usage of each model on Train_data  38
Figure 5.5 VRAM used when running the full pipeline  39
Figure 5.6 Inference VRAM usage of each model on Test_data  40
Figure 5.7 Error analysis flow chart  46
Figure 5.8 Proportion of error types  50
Figure 6.1 Single-stage training architecture  54

List of Tables
Table 3.1 Pseudocode  19
Table 4.1 Dataset examples  21
Table 4.2 Dataset usage  22
Table 4.3 Excerpt from the long-passage dataset  23
Table 4.4 Datasets used in the experiments  25
Table 5.1 LoRA layer parameters  28
Table 5.2 Model overview  29
Table 5.3 Quantitative performance comparison on Train_data  34
Table 5.4 Quantitative performance comparison on Test_data  36
Table 5.5 Quantitative resource comparison on Train_data  38
Table 5.6 Quantitative resource comparison on Test_data  40
Table 5.7 Qualitative analysis content  42
Table 5.8 Qualitative analysis  44
Table 5.9 Answer comparison for ID 3DL65MZB8DEXDSG44TVUAV620P2CED  47
Table 5.10 Answer comparison for ID 3FK0YFF9PZFAEC8QQ0F90RIDKNWVV3  48
Table 5.11 Answer comparison for ID 3QECW5O0KH0E3QPMFEXHVB0TAG8T5C  49
Table 5.12 Proportion of error types  49
Table 5.13 Side-by-side comparison of LoRA and DoRA  51
Table 6.1 Advantages of Solo Distillation over previous approaches  53


References
[1]	OpenAI, "ChatGPT: Optimizing language models for dialogue," 2022.
[2]	H. Touvron et al., "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023. [Online]. Available: https://arxiv.org/abs/2302.13971
[3]	A. Vaswani et al., "Attention is all you need," in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008
[4]	L. Ouyang et al., "Training language models to follow instructions with human feedback," in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2022.
[5]	N. Chan et al., "A theoretical and empirical exploration of reinforcement learning from AI feedback," arXiv preprint arXiv:2309.00668, 2023. [Online]. Available: https://arxiv.org/abs/2309.00668
[6]	E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," in Proc. Int. Conf. on Learning Representations (ICLR), 2022. [Online]. Available: https://arxiv.org/abs/2106.09685
[7]	P. Lewis et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 9459–9474.
[8] L. Zhang et al., "Be your own teacher: Improve the performance of convolutional neural networks via self distillation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 3713–3722. [Online]. Available: https://arxiv.org/abs/1905.08094
[9] T. Khot, P. Clark, M. Guerquin, P. Jansen, and A. Sabharwal, "QASC: A dataset for question answering via sentence composition," in Proc. AAAI Conf. on Artificial Intelligence, 2020.
[10]	Hugging Face, "Hugging Face – The AI community building the future." [Online]. Available: https://huggingface.co
[11]	G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015. [Online]. Available: https://arxiv.org/abs/1503.02531
[12]	D.-H. Lee, "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks," in Proc. ICML Workshop on Challenges in Representation Learning, 2013.
[13]	J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," Proc. Natl. Acad. Sci. U.S.A., vol. 114, no. 13, pp. 3521–3526, 2017. [Online]. Available: https://arxiv.org/abs/1612.00796
[14]	Q. Wang, H. B. Sailor, T. Liu, and A. T. Aw, "Contextual paralinguistic data creation for multi-modal speech-LLM: Data condensation and spoken QA generation," arXiv preprint arXiv:2505.13338, May 2025. [Online]. Available: https://arxiv.org/abs/2505.13338
[15]	T. Furlanello, Z. C. Lipton, M. Tschannen, L. Itti, and A. Anandkumar, "Born-again neural networks," in Proc. Int. Conf. on Machine Learning (ICML), 2018. [Online]. Available: https://arxiv.org/abs/1805.04770
[16]	Y. Yang, L. Kong, S. Hou, Z. Zhou, L. Cheng, and J. Liu, "Self-distillation fine-tuning: Aligning pretrained language models without human labels," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. [Online]. Available: https://arxiv.org/abs/2402.03960
[17]	Y. Wang et al., "Self-instruct: Aligning language models with self-generated instructions," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. [Online]. Available: https://arxiv.org/abs/2212.10560
[18]	Y. Fu, H. Wang, Z. Zhang, L. Feng, and Y. Song, "A comprehensive survey on parameter-efficient fine-tuning of pre-trained language models," arXiv preprint arXiv:2303.15647, 2023. [Online]. Available: https://arxiv.org/abs/2303.15647
[19] S.-Y. Liu et al., "DoRA: Weight-decomposed low-rank adaptation," arXiv preprint arXiv:2402.09353, 2024. [Online]. Available: https://arxiv.org/abs/2402.09353
[20] Y. Li et al., "LoftQ: LoRA-fine-tuning-aware quantization for large language models," arXiv preprint arXiv:2310.08659, 2023. [Online]. Available: https://arxiv.org/abs/2310.08659
[21] Y. Gao et al., "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2023. [Online]. Available: https://arxiv.org/abs/2312.10997
[22] R. Azimi et al., "KD-LoRA: A hybrid approach to efficient fine-tuning using knowledge distillation and low-rank adaptation," arXiv preprint arXiv:2410.20777, 2024. [Online]. Available: https://arxiv.org/abs/2410.20777
[23]	Y. Yang, Y. Sun, R. Menon, A. Singh, and M. Kankanhalli, "LLM-NEO: Parameter efficient knowledge distillation for large language models," arXiv preprint arXiv:2403.06545, 2024. [Online]. Available: https://arxiv.org/abs/2403.06545
[24]	S. Yang, L. Kong, S. Hou, L. Cheng, Z. Zhou, and J. Liu, "NutePrune: Efficient progressive pruning with numerous teachers for large language models," arXiv preprint arXiv:2402.09773, 2024. [Online]. Available: https://arxiv.org/abs/2402.09773
[25] M. Kang et al., "Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks," in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2023. [Online]. Available: https://openreview.net/forum?id=8nWyS23A0Z
[26] J. Zhang et al., "ReAugKD: Retrieval-augmented knowledge distillation for pretrained language models," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2023, pp. 3209–3222. [Online]. Available: https://aclanthology.org/2023.acl-long.210
[27]	J. Wei et al., "Chain-of-thought prompting elicits reasoning in large language models," arXiv preprint arXiv:2201.11903, 2022. [Online]. Available: https://arxiv.org/abs/2201.11903
[28]	Gemma Team et al., "Gemma 3 technical report," arXiv preprint arXiv:2503.19786, Mar. 2025. [Online]. Available: https://arxiv.org/abs/2503.19786
[29]	W.-L. Chiang et al., "Chatbot arena: An open platform for evaluating LLMs by human preference," arXiv preprint arXiv:2403.04132, Mar. 2024. [Online]. Available: https://arxiv.org/abs/2403.04132
[30]	LMArena, "Chatbot Arena Leaderboard." [Online]. Available: https://lmarena.ai/ (Accessed: May 27, 2025).
[31]	Gemini Team et al., "Gemini: A family of highly capable multimodal models," arXiv preprint arXiv:2312.11805, 2023. [Online]. Available: https://arxiv.org/abs/2312.11805
[32]	N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using siamese BERT-networks," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2019, pp. 3982–3992.
[33]	J. Liu, "LlamaIndex," 2022. [Online]. Available: https://github.com/jerryjliu/llama_index
[34]	L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018. [Online]. Available: https://arxiv.org/abs/1802.03426
[35]	C.-Y. Lin, "ROUGE: A package for automatic evaluation of summaries," in Text Summarization Branches Out, Barcelona, Spain, 2004, pp. 74–81.
[36]	T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, "BERTScore: Evaluating text generation with BERT," in Proc. Int. Conf. on Learning Representations (ICLR), 2020.
[37]	J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019, pp. 4171–4186.
[38]	X. Li, F. Yin, Z. Sun, X. Li, A. Wang, X. Li, and A. Chen, "Entity-relation extraction as multi-turn question answering," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
[39]	A. Asai et al., "One question answering model for many languages with cross-lingual dense passage retrieval," in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2021.
[40]	S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, "Tree of thoughts: Deliberate problem solving with large language models," arXiv preprint arXiv:2305.10601, 2023. [Online]. Available: https://arxiv.org/abs/2305.10601
[41]	Y. Zuo, K. Zhang, L. Sheng, Y. Chen, Y. Liu, and J. Wang, "TTRL: Test-time reinforcement learning," arXiv preprint arXiv:2504.16084, 2025. [Online]. Available: https://arxiv.org/abs/2504.16084
[42]	S. O'Brien and M. Lewis, "Decoding is an art: A survey of decoding methods for large language models," arXiv preprint arXiv:2402.06925, 2024. [Online]. Available: https://arxiv.org/abs/2402.06925
[43]	D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for boltzmann machines," Cognitive Science, vol. 9, no. 1, pp. 147–169, 1985.
[44]	A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, "The curious case of neural text degeneration," in Proc. Int. Conf. on Learning Representations (ICLR), 2020.
[45]	A. Fan, M. Lewis, and Y. Dauphin, "Hierarchical neural story generation," in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[46]	J. Shlens, "Notes on Kullback-Leibler divergence and likelihood theory," arXiv preprint arXiv:1404.2000, 2014.
[47] 吳岱融, "Restaurant customer review analysis: Topic and sentiment analysis based on LDA and k-means," M.S. thesis, Dept. of Information Management, National Chung Cheng University, Chiayi, Taiwan, 2023. [Online]. Available: https://hdl.handle.net/11296/acqs3p
[48] 周晁揚, "Transmitarray unit cells and transmitarray design for LEO," M.S. thesis, Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, 2022. [Online]. Available: https://hdl.handle.net/11296/q4wj7t
[49] 蕭兆翔, "Pedestrian localization in large indoor spaces using AI vision models for surveillance systems," M.S. thesis, Dept. of Mechanical and Electro-Mechanical Engineering, Tamkang University, New Taipei City, Taiwan, 2022. [Online]. Available: https://hdl.handle.net/11296/j8f825
[50] 李厚誼, "Development of an FPGA-based license plate recognition AI model," M.S. thesis, Dept. of Mechanical and Electro-Mechanical Engineering, Tamkang University, New Taipei City, Taiwan, 2024. [Online]. Available: https://hdl.handle.net/11296/qb994k
Full-Text Usage Authorization
National Central Library
Agrees to grant the National Central Library a royalty-free license to make the bibliographic record and electronic full text publicly available on the Internet immediately upon submission of the authorization form.
On campus
Printed copy available on campus immediately
Agrees to authorize worldwide public access to the electronic full text
Electronic full text available on campus immediately
Off campus
Agrees to grant authorization to database vendors
Electronic full text available off campus immediately
