System ID | U0002-0308202521045200 |
---|---|
DOI | 10.6846/TKU_Electronic Theses & Dissertations Service202500369 |
Title (Chinese) | 多模態骨架引導擴散模型於書法字體生成與遷移之研究 |
Title (English) | Multi-Modal Skeleton-Guided Diffusion for Calligraphic Font Generation and Transfer |
Title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign-degree university | |
Foreign-degree college | |
Foreign-degree institute | |
Academic year | 113 |
Semester | 2 |
Publication year | 114 |
Author (Chinese) | 連鄭勛 |
Author (English) | Zheng-Xun Lian |
Student ID | 613410116 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Defense date | 2025-07-03 |
Pages | 41 |
Committee | Advisor: 吳孟倫 (mlwutp@gmail.com); Committee member: 林莊傑 (josephcclin@mail.ntou.edu.tw); Committee member: 陳啟楨 (cjchen@mail.tku.edu.tw) |
Keywords (Chinese) | 擴散模型、書法字體生成、條件式生成、骨架化、風格遷移 |
Keywords (English) | Diffusion Models; Calligraphy Generation; Conditional Generation; Skeletonization; Style Transfer |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
近年擴散模型(Diffusion Models)於高品質影像生成領域嶄露頭角,然中文書法字體因筆劃複雜與風格多樣,仍難以直接套用既有架構。本研究提出一套「多模態條件式擴散框架」,結合骨架影像、筆劃向量與風格參考,於 128×128 解析度生成結構保真且具書法韻味之字形。方法首先以形態學骨架化萃取筆劃主幹,再將每字映射為 32 維筆劃多標籤向量,並與字元標籤共同嵌入。接著透過 Skeleton-Style Adaptive Gate 動態平衡骨架與風格資訊,以 MSE 損失函數端到端訓練;推理階段輔以 Classifier-Free Guidance 強化可控性。實驗證實,本法相較現有 Diff-Font 等模型在 SSIM、LPIPS、PSNR 等指標均顯著提升,可有效降低客製書法字體成本,並為書法數位化與 AI 應用開拓新方向。後續工作將探討更高解析度、多語系字體與少量樣本微調之可行性。 |
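The morphological skeletonization step described in the abstract reduces each glyph to a single-pixel-wide stroke skeleton. As a minimal sketch (not the thesis implementation), the classic Zhang-Suen two-subiteration thinning algorithm cited in the reference list can be written in pure NumPy; the function name `zhang_suen_thin` and the toy bitmap below are illustrative assumptions:

```python
import numpy as np

def zhang_suen_thin(img: np.ndarray) -> np.ndarray:
    """Thin a binary glyph bitmap (1 = ink, 0 = background) to a
    one-pixel-wide skeleton via Zhang-Suen's two subiterations.
    Assumes at least a one-pixel zero border around the glyph."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] == 0:
                        continue
                    # Neighbours P2..P9, clockwise from the pixel above.
                    p = [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
                         img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]
                    b = sum(p)  # number of ink neighbours
                    # Number of 0->1 transitions around the ring.
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0 and p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                        to_delete.append((r, c))
                    elif step == 1 and p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                        to_delete.append((r, c))
            for r, c in to_delete:  # delete only after the full scan
                img[r, c] = 0
                changed = True
    return img

# Toy example: a 3-pixel-thick horizontal bar thins to a 1-pixel line.
glyph = np.zeros((7, 14), dtype=np.uint8)
glyph[2:5, 2:12] = 1
skeleton = zhang_suen_thin(glyph)
```

In the full pipeline, each such skeleton is paired with the 32-dimensional multi-label stroke vector and a style reference as diffusion conditions; the skeleton supplies only stroke topology, leaving thickness and brush texture to the style branch.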
Abstract (English) |
In recent years, diffusion models have emerged as a powerful paradigm for high-quality image synthesis. However, directly applying existing architectures to Chinese calligraphy generation remains challenging due to the complexity of strokes and the diversity of calligraphic styles. This thesis proposes a novel multi-modal conditional diffusion framework that integrates three distinct modalities (skeleton image, stroke vector, and style reference) at a fixed resolution of 128 × 128 to generate authentic and structurally consistent calligraphic characters. First, we employ a morphological skeletonization process to extract stroke topology as a single-pixel skeleton image. Each character is then represented by a 32-dimensional multi-label stroke vector, combined with its character-ID embedding. Next, we introduce a Skeleton-Style Adaptive Gate that dynamically balances structural and stylistic information during generation. The model is trained end-to-end with an ε-prediction mean-squared-error loss, and inference leverages classifier-free guidance to enhance conditional controllability. Extensive experiments show that our method significantly outperforms existing models such as Diff-Font on SSIM, LPIPS, and PSNR, lowers the cost of custom calligraphy font creation, and offers a new direction for digitizing traditional calligraphy with AI. Future work may explore higher-resolution synthesis, multilingual style transfer, and few-shot fine-tuning. |
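The ε-prediction objective and classifier-free guidance mentioned in the abstract can be summarized in two short formulas. The NumPy sketch below is a hedged illustration under standard DDPM conventions; the function names and toy values are assumptions, not the thesis code:

```python
import numpy as np

def ddpm_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: produce x_t from a clean glyph x0 and noise eps.
    Training minimizes the MSE between eps and the network's prediction
    given x_t and the condition vectors (epsilon-prediction loss)."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def cfg_epsilon(eps_cond, eps_uncond, w):
    """Classifier-free guidance at inference: push the noise estimate from
    the unconditional prediction toward the conditional one; w > 1
    strengthens adherence to the skeleton/stroke/style conditions."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

During training the conditions are randomly dropped (replaced by a null embedding), so the same network can provide both the conditional and unconditional noise predictions at sampling time.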
Abstract (third language) | |
Table of contents |
Acknowledgements
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation 2
  1.3 Research objectives 3
  1.4 Problem statement 4
  1.5 Thesis organization 4
Chapter 2 Related work 6
  2.1 GAN-based calligraphy generation 6
  2.2 Diffusion models 8
  2.3 Diffusion-based calligraphy generation 8
Chapter 3 Methodology 12
  3.1 System architecture 12
    3.1.1 Stroke vector encoding (multi-label binary vector) 17
  3.2 Font skeletonization 19
  3.3 Adaptive gate 20
  3.4 Conditional diffusion process 22
  3.5 Loss function design 26
Chapter 4 Experimental results 28
  4.1 Visualization of the generation process 29
  4.2 Experimental setup 30
  4.3 Model comparison experiments 31
  4.4 Result analysis 32
  4.5 Comparison of seen and unseen styles 33
  4.6 Comparison with and without skeletonized fonts 34
Chapter 5 Conclusion 37
References 39
List of figures
Figure 1-1 Early bitmap-font example: each character is a fixed-size pixel matrix; readable on low-resolution displays, but with pronounced jagged edges and little detail. 2
Figure 1-2 The calligraphic character 永: the Eight Principles of Yong are annotated, illustrating basic stroke structure and stylistic features. 3
Figure 2-1 Calligraphy samples generated by the MXFont model 7
Figure 2-2 Calligraphy samples generated by the DG-Font model 7
Figure 2-3 Calliffusion sample 「功蓋三分國」, conditioned on Yan Zhenqing's regular script. 9
Figure 2-4 Each row shows a different style produced by DiffCJK; the upper block contains common characters, the lower block is randomly sampled from a set of rare characters. In each block, gray glyphs are references and black glyphs are generated. 10
Figure 2-5 Comparison of Diff-Font output (upper block) against ground truth (lower block). 10
Figure 3-1 Framework overview: the skeleton and the noised image are concatenated along the channel dimension; four semantic vectors are embedded, added to the time embedding, and modulate features via FiLM in each residual block. 17
Figure 3-2 The thirty-two stroke components; each dimension of the multi-label vector corresponds to one component 18
Figure 3-3 Skeleton of the character 成 and its 32-dimensional multi-label vector 18
Figure 3-4 Skeletonization examples: in each row, left is the styled glyph, right is its single-pixel skeleton 20
Figure 3-5 MSE loss computation: the model denoises under the condition vectors, and the weights are updated via L_MSE against the ground-truth noise ε. 27
Figure 4-1 Generation progression of the character 姑 30
Figure 4-2 Generation progression of the character 沫 30
Figure 4-3 Generation progression of the character 供 30
Figure 4-4 Qualitative seen-style comparison (red boxes mark missing or displaced strokes in Diff-Font). 33
Figure 4-5 Style generation for unseen characters. The top row shows ground-truth glyphs in the target font; the bottom row shows our model's output given only a style reference, for characters never seen during training. The model successfully transfers stroke features of the target style (thickness, stroke onset/ending, turns) to new characters. 34
Figure 4-6 Visual comparison: target glyphs (top row), generation guided by skeletonized fonts (middle row), and generation without skeleton guidance (bottom row). The skeleton prior helps recover stroke topology and detail. 36
List of tables
Table 4-1 Model comparison: Diff-Font vs. our method 32
Table 4-2 Experiment 2: quantitative comparison of our method on seen vs. unseen styles 33
Table 4-3 Model comparison: with vs. without skeletonized fonts 35 |
References |
[1] C. Sun, Z. Guo, and H. Tian, "Chinese calligraphy synthesis with realistic brush strokes," IEEE Access, vol. 7, pp. 74881-74892, 2019.
[2] J. Lee and J. Yu, "Automatic generation of Chinese calligraphy using deep learning," ACM Transactions on Graphics, vol. 40, no. 4, pp. 73:1-73:15, 2021.
[3] M. Lin, "Digitization of Chinese calligraphy and its application," Journal of Digital Archives, vol. 15, no. 2, pp. 25-40, 2020.
[4] O. Graves, "Computer-aided design and cultural heritage: Digital preservation of handwriting," Computer Graphics Forum, vol. 37, no. 7, pp. 105-117, 2018.
[5] L. Zhang and M. Agrawala, "Adding conditional control to text-to-image diffusion models," arXiv preprint arXiv:2302.05543, 2023.
[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proceedings of the 27th Conference on Neural Information Processing Systems (NeurIPS 2014), Montreal, Canada, pp. 2672-2680, 2014.
[7] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, "Multiple heads are better than one: Few-shot font generation with multiple localized experts," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13900-13909, 2021.
[8] Y. Xie, X. Chen, L. Sun, and Y. Lu, "DG-Font: Deformable generative networks for unsupervised font generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5130-5140, 2021.
[9] J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in International Conference on Machine Learning (ICML), pp. 2256-2265, 2015.
[10] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems (NeurIPS), pp. 6840-6851, 2020.
[11] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in International Conference on Learning Representations (ICLR), 2021.
[12] A. Nichol and P. Dhariwal, "Improved denoising diffusion probabilistic models," in Proceedings of the 38th International Conference on Machine Learning (ICML), vol. 139 of Proceedings of Machine Learning Research, pp. 8162-8171, PMLR, 2021.
[13] J. Song, C. Meng, and S. Ermon, "Denoising diffusion implicit models," in International Conference on Learning Representations (ICLR), 2021.
[14] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," arXiv preprint arXiv:2112.10752, 2022.
[15] Q. Liao, G. Xia, and Z. Wang, "Calliffusion: Chinese calligraphy generation and style transfer with diffusion modeling," arXiv preprint arXiv:2305.19124, 2023.
[16] Y. Tian, "DiffCJK: Conditional diffusion model for high-quality and wide-coverage CJK character generation," arXiv preprint arXiv:2404.05212, 2024.
[17] H. He, X. Chen, C. Wang, and J. Liu, "Diff-Font: Diffusion model for robust one-shot font generation," International Journal of Computer Vision, vol. 132, pp. 5372-5386, 2024.
[18] J. Ho and T. Salimans, "Classifier-free diffusion guidance," in NeurIPS 2022 Workshop on Deep Generative Models and Downstream Applications, 2022.
[19] S. Liu, W. Zhang, X. Chen, and L. Wang, "基於筆劃部件的中文字符骨架分析方法" [Stroke-component-based skeleton analysis of Chinese characters], 電腦應用研究, vol. 35, no. 12, pp. 3676-3682, 2018. (The 32-dimensional stroke vector is implemented following the common-component decomposition of CNS 11643.)
[20] L. Lam, S.-W. Lee, and C. Y. Suen, "Thinning methodologies—a comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 9, pp. 869-885, 1992.
[21] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, no. 3, pp. 236-239, 1984.
[22] Z. Guo and R. W. Hall, "Parallel thinning with two-subiteration algorithms," Communications of the ACM, vol. 32, no. 3, pp. 359-373, 1989. |
Full-text access rights | |