System ID | U0002-0308202521045200 |
---|---|
DOI | 10.6846/TKU_Electronic Theses & Dissertations Service202500369 |
Title (Chinese) | 多模態骨架引導擴散模型於書法字體生成與遷移之研究 |
Title (English) | Multi-Modal Skeleton-Guided Diffusion for Calligraphic Font Generation and Transfer |
Title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign-degree university | |
Foreign-degree college | |
Foreign-degree institute | |
Academic year | 113 |
Semester | 2 |
Publication year | 114 |
Author (Chinese) | 連鄭勛 |
Author (English) | Zheng-Xun Lian |
Student ID | 613410116 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Defense date | 2025-07-03 |
Pages | 41 |
Committee | Advisor: 吳孟倫 (mlwutp@gmail.com); Committee member: 林莊傑 (josephcclin@mail.ntou.edu.tw); Committee member: 陳啟楨 (cjchen@mail.tku.edu.tw) |
Keywords (Chinese) | 擴散模型、書法字體生成、條件式生成、骨架化、風格遷移 |
Keywords (English) | Diffusion Models; Calligraphy Generation; Conditional Generation; Skeletonization; Style Transfer |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
近年擴散模型(Diffusion Models)於高品質影像生成領域嶄露頭角,然中文書法字體因筆劃複雜與風格多樣,仍難以直接套用既有架構。本研究提出一套「多模態條件式擴散框架」,結合骨架影像、筆劃向量與風格參考,於 128×128 解析度生成結構保真且具書法韻味之字形。方法首先以形態學骨架化萃取筆劃主幹,再將每字映射為 32 維筆劃多標籤向量,並與字元標籤共同嵌入。接著透過 Skeleton-Style Adaptive Gate 動態平衡骨架與風格資訊,以 MSE 損失函數端到端訓練;推理階段輔以 Classifier-Free Guidance 強化可控性。實驗證實,本法相較現有 Diff-Font 等模型在 SSIM、LPIPS、PSNR 等指標均顯著提升,可有效降低客製書法字體成本,並為書法數位化與 AI 應用開拓新方向。後續工作將探討更高解析度、多語系字體與少量樣本微調之可行性。 |
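The morphological skeletonization step described in the abstract reduces each glyph to a single-pixel-wide stroke skeleton. As a minimal sketch (not the thesis implementation), the classic Zhang-Suen two-subiteration thinning algorithm cited in the reference list can be written in pure NumPy; the function name `zhang_suen_thin` and the toy bitmap below are illustrative assumptions:

```python
import numpy as np

def zhang_suen_thin(img: np.ndarray) -> np.ndarray:
    """Thin a binary glyph bitmap (1 = ink, 0 = background) to a
    one-pixel-wide skeleton via Zhang-Suen's two subiterations.
    Assumes at least a one-pixel zero border around the glyph."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] == 0:
                        continue
                    # Neighbours P2..P9, clockwise from the pixel above.
                    p = [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
                         img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]
                    b = sum(p)  # number of ink neighbours
                    # Number of 0->1 transitions around the ring.
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0 and p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                        to_delete.append((r, c))
                    elif step == 1 and p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                        to_delete.append((r, c))
            for r, c in to_delete:  # delete only after the full scan
                img[r, c] = 0
                changed = True
    return img

# Toy example: a 3-pixel-thick horizontal bar thins to a 1-pixel line.
glyph = np.zeros((7, 14), dtype=np.uint8)
glyph[2:5, 2:12] = 1
skeleton = zhang_suen_thin(glyph)
```

In the full pipeline, each such skeleton is paired with the 32-dimensional multi-label stroke vector and a style reference as diffusion conditions; the skeleton supplies only stroke topology, leaving thickness and brush texture to the style branch.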
Abstract (English) |
In recent years, diffusion models have emerged as a powerful paradigm for high-quality image synthesis. However, directly applying existing architectures to Chinese calligraphy generation remains challenging due to the complexity of strokes and the diversity of calligraphic styles. This thesis proposes a novel multi-modal conditional diffusion framework that integrates three distinct modalities (skeleton image, stroke vector, and style reference) at a fixed resolution of 128 × 128 to generate authentic and structurally consistent calligraphic characters. First, we employ a morphological skeletonization process to extract stroke topology as a single-pixel skeleton image. Each character is then represented by a 32-dimensional multi-label stroke vector, combined with its character-ID embedding. Next, we introduce a Skeleton-Style Adaptive Gate that dynamically balances structural and stylistic information during generation. The model is trained end-to-end with an ε-prediction mean-squared-error loss, and inference leverages classifier-free guidance to enhance conditional controllability. Extensive experiments show that our method significantly outperforms existing models such as Diff-Font on SSIM, LPIPS, and PSNR, lowers the cost of custom calligraphy font creation, and offers a new direction for digitizing traditional calligraphy with AI. Future work may explore higher-resolution synthesis, multilingual style transfer, and few-shot fine-tuning. |
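The ε-prediction objective and classifier-free guidance mentioned in the abstract can be summarized in two short formulas. The NumPy sketch below is a hedged illustration under standard DDPM conventions; the function names and toy values are assumptions, not the thesis code:

```python
import numpy as np

def ddpm_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: produce x_t from a clean glyph x0 and noise eps.
    Training minimizes the MSE between eps and the network's prediction
    given x_t and the condition vectors (epsilon-prediction loss)."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def cfg_epsilon(eps_cond, eps_uncond, w):
    """Classifier-free guidance at inference: push the noise estimate from
    the unconditional prediction toward the conditional one; w > 1
    strengthens adherence to the skeleton/stroke/style conditions."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

During training the conditions are randomly dropped (replaced by a null embedding), so the same network can provide both the conditional and unconditional noise predictions at sampling time.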
Abstract (third language) | |
Table of contents |
Acknowledgements
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation 2
  1.3 Research objectives 3
  1.4 Problem statement 4
  1.5 Thesis organization 4
Chapter 2 Related work 6
  2.1 GAN-based calligraphy generation 6
  2.2 Diffusion models 8
  2.3 Diffusion-based calligraphy generation 8
Chapter 3 Methodology 12
  3.1 System architecture 12
    3.1.1 Stroke vector encoding (multi-label binary vector) 17
  3.2 Font skeletonization 19
  3.3 Adaptive gate 20
  3.4 Conditional diffusion process 22
  3.5 Loss function design 26
Chapter 4 Experimental results 28
  4.1 Visualization of the generation process 29
  4.2 Experimental setup 30
  4.3 Model comparison experiments 31
  4.4 Result analysis 32
  4.5 Comparison of seen and unseen styles 33
  4.6 Comparison with and without skeletonized fonts 34
Chapter 5 Conclusion 37
References 39
List of figures
Figure 1-1 Early bitmap-font example: each character is a fixed-size pixel matrix; readable on low-resolution displays, but with pronounced jagged edges and little detail. 2
Figure 1-2 The calligraphic character 永: the Eight Principles of Yong are annotated, illustrating basic stroke structure and stylistic features. 3
Figure 2-1 Calligraphy samples generated by the MXFont model 7
Figure 2-2 Calligraphy samples generated by the DG-Font model 7
Figure 2-3 Calliffusion sample 「功蓋三分國」, conditioned on Yan Zhenqing's regular script. 9
Figure 2-4 Each row shows a different style produced by DiffCJK; the upper block contains common characters, the lower block is randomly sampled from a set of rare characters. In each block, gray glyphs are references and black glyphs are generated. 10
Figure 2-5 Comparison of Diff-Font output (upper block) against ground truth (lower block). 10
Figure 3-1 Framework overview: the skeleton and the noised image are concatenated along the channel dimension; four semantic vectors are embedded, added to the time embedding, and modulate features via FiLM in each residual block. 17
Figure 3-2 The thirty-two stroke components; each dimension of the multi-label vector corresponds to one component 18
Figure 3-3 Skeleton of the character 成 and its 32-dimensional multi-label vector 18
Figure 3-4 Skeletonization examples: in each row, left is the styled glyph, right is its single-pixel skeleton 20
Figure 3-5 MSE loss computation: the model denoises under the condition vectors, and the weights are updated via L_MSE against the ground-truth noise ε. 27
Figure 4-1 Generation progression of the character 姑 30
Figure 4-2 Generation progression of the character 沫 30
Figure 4-3 Generation progression of the character 供 30
Figure 4-4 Qualitative seen-style comparison (red boxes mark missing or displaced strokes in Diff-Font). 33
Figure 4-5 Style generation for unseen characters. The top row shows ground-truth glyphs in the target font; the bottom row shows our model's output given only a style reference, for characters never seen during training. The model successfully transfers stroke features of the target style (thickness, stroke onset/ending, turns) to new characters. 34
Figure 4-6 Visual comparison: target glyphs (top row), generation guided by skeletonized fonts (middle row), and generation without skeleton guidance (bottom row). The skeleton prior helps recover stroke topology and detail. 36
List of tables
Table 4-1 Model comparison: Diff-Font vs. our method 32
Table 4-2 Experiment 2: quantitative comparison of our method on seen vs. unseen styles 33
Table 4-3 Model comparison: with vs. without skeletonized fonts 35 |
References |
[1] C. Sun, Z. Guo, and H. Tian, "Chinese calligraphy synthesis with realistic brush strokes," IEEE Access, vol. 7, pp. 74881-74892, 2019.
[2] J. Lee and J. Yu, "Automatic generation of Chinese calligraphy using deep learning," ACM Transactions on Graphics, vol. 40, no. 4, pp. 73:1-73:15, 2021.
[3] M. Lin, "Digitization of Chinese calligraphy and its application," Journal of Digital Archives, vol. 15, no. 2, pp. 25-40, 2020.
[4] O. Graves, "Computer-aided design and cultural heritage: Digital preservation of handwriting," Computer Graphics Forum, vol. 37, no. 7, pp. 105-117, 2018.
[5] L. Zhang and M. Agrawala, "Adding conditional control to text-to-image diffusion models," arXiv preprint arXiv:2302.05543, 2023.
[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proceedings of the 27th Conference on Neural Information Processing Systems (NeurIPS 2014), Montreal, Canada, pp. 2672-2680, 2014.
[7] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, "Multiple heads are better than one: Few-shot font generation with multiple localized experts," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13900-13909, 2021.
[8] Y. Xie, X. Chen, L. Sun, and Y. Lu, "DG-Font: Deformable generative networks for unsupervised font generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5130-5140, 2021.
[9] J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in International Conference on Machine Learning (ICML), pp. 2256-2265, 2015.
[10] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems (NeurIPS), pp. 6840-6851, 2020.
[11] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in International Conference on Learning Representations (ICLR), 2021.
[12] A. Nichol and P. Dhariwal, "Improved denoising diffusion probabilistic models," in Proceedings of the 38th International Conference on Machine Learning (ICML), vol. 139 of Proceedings of Machine Learning Research, pp. 8162-8171, PMLR, 2021.
[13] J. Song, C. Meng, and S. Ermon, "Denoising diffusion implicit models," in International Conference on Learning Representations (ICLR), 2021.
[14] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," arXiv preprint arXiv:2112.10752, 2022.
[15] Q. Liao, G. Xia, and Z. Wang, "Calliffusion: Chinese calligraphy generation and style transfer with diffusion modeling," arXiv preprint arXiv:2305.19124, 2023.
[16] Y. Tian, "DiffCJK: Conditional diffusion model for high-quality and wide-coverage CJK character generation," arXiv preprint arXiv:2404.05212, 2024.
[17] H. He, X. Chen, C. Wang, and J. Liu, "Diff-Font: Diffusion model for robust one-shot font generation," International Journal of Computer Vision, vol. 132, pp. 5372-5386, 2024.
[18] J. Ho and T. Salimans, "Classifier-free diffusion guidance," in NeurIPS 2022 Workshop on Deep Generative Models and Downstream Applications, 2022.
[19] S. Liu, W. Zhang, X. Chen, and L. Wang, "基於筆劃部件的中文字符骨架分析方法" [Stroke-component-based skeleton analysis of Chinese characters], 電腦應用研究, vol. 35, no. 12, pp. 3676-3682, 2018. (The 32-dimensional stroke vector is implemented following the common-component decomposition of CNS 11643.)
[20] L. Lam, S.-W. Lee, and C. Y. Suen, "Thinning methodologies—a comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 9, pp. 869-885, 1992.
[21] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, no. 3, pp. 236-239, 1984.
[22] Z. Guo and R. W. Hall, "Parallel thinning with two-subiteration algorithms," Communications of the ACM, vol. 32, no. 3, pp. 359-373, 1989. |
Full-text access rights | |