| System ID | U0002-0307202514275600 |
|---|---|
| Thesis Title (Chinese) | 基於深度學習的靜態漢字筆畫順序推理與動態重建 |
| Thesis Title (English) | Deep Learning Based Static Chinese Character Stroke Order Prediction and Dynamic Reconstruction |
| Thesis Title (Third Language) | |
| University | Tamkang University |
| Department (Chinese) | Master's Program, Department of Computer Science and Information Engineering |
| Department (English) | Department of Computer Science and Information Engineering |
| Foreign Degree School | |
| Foreign Degree College | |
| Foreign Degree Institute | |
| Academic Year | 113 |
| Semester | 2 |
| Publication Year | 114 |
| Student (Chinese) | 段凱杰 |
| Student (English) | KAI-JIE DUAN |
| Student ID | 612410281 |
| Degree | Master's |
| Language | Traditional Chinese |
| Second Language | |
| Defense Date | 2025-06-23 |
| Pages | 37 |
| Committee | Advisor: 陳建彰 (ccchen34@mail.tku.edu.tw); Committee Member: 林其誼 (chiyilin@mail.tku.edu.tw); Committee Member: 許哲銓 (tchsu@scu.edu.tw) |
| Keywords (Chinese) | 書法學習; 筆畫預測; 時間序列; 卷積神經網路; 長短期記憶網路; 自監督學習; DTW距離 |
| Keywords (English) | Calligraphy Learning; Stroke Prediction; Time Series; CNN; LSTM; Self-Supervised Learning; DTW |
| Keywords (Third Language) | |
| Subject Classification | |
| Abstract (Chinese) | In this study, we build a deep learning system that predicts stroke trajectories and writing order from calligraphy images, helping beginners master correct handwriting. Stroke data are collected with a handwriting tablet and converted into time series; a convolutional neural network (CNN) extracts image features, and a long short-term memory network (LSTM) predicts stroke coordinates and a three-class pen state. A heatmap mechanism with softargmax strengthens the model's ability to locate stroke start and end positions, and multiple loss functions (e.g., DTW loss, heatmap loss, temporal smoothness loss) are combined to optimize prediction. We also design an automatic inference system that randomly samples validation data for visual analysis. Experimental results show that the model performs well on stroke classification and on trajectory and order prediction, and a snap-to-stroke mechanism further improves accuracy. The system can serve as a calligraphy learning aid, providing precise writing paths and structural suggestions to improve learners' handwriting quality and stroke-order understanding. |
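The abstract's heatmap-with-softargmax mechanism, used to locate stroke start and end positions, can be illustrated with a minimal NumPy sketch. This is not the thesis code; the function name `softargmax_2d` and the sharpness value `beta` are illustrative assumptions.

```python
import numpy as np

def softargmax_2d(heatmap, beta=25.0):
    """Differentiable surrogate for argmax over a 2-D heatmap:
    a softmax (sharpened by beta) turns the map into a probability
    distribution, and the expected (x, y) coordinate is returned."""
    h, w = heatmap.shape
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(beta * (heatmap - heatmap.max()))
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]          # row index = y, column index = x
    return float((probs * xs).sum()), float((probs * ys).sum())

# A single peak at column 12, row 5 should be recovered.
hm = np.zeros((32, 32))
hm[5, 12] = 1.0
x, y = softargmax_2d(hm)
```

Because the expectation is a smooth function of the heatmap values, gradients can flow through the coordinate output during training, unlike a hard argmax.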
| Abstract (English) | This study proposes a deep learning system that predicts stroke trajectories and writing order from calligraphy images to assist beginners in learning proper handwriting. Stroke data from a handwriting tablet are converted into time-series coordinates. A CNN extracts visual features, and an LSTM predicts stroke positions and three-class pen states. A heatmap with softargmax enhances pen-down/pen-up detection. Multiple loss functions, including DTW and temporal smoothness losses, optimize training. An inference module samples validation data for visual analysis via images and videos. Results show strong performance in stroke classification, trajectory, and order prediction. The snap-to-stroke mechanism further improves accuracy and offers precise feedback for handwriting improvement. |
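Both abstracts rely on a DTW loss to compare predicted and ground-truth stroke trajectories. A minimal sketch of the classic dynamic-programming DTW distance between two 2-D point sequences follows; this is an illustration of the standard algorithm (cf. Sakoe and Chiba), not the thesis's differentiable loss implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between point sequences a (n x 2) and b (m x 2),
    with Euclidean point costs and O(n*m) dynamic programming."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ref = [(0.0, 0.0), (2.0, 0.0)]
d = dtw_distance(pred, ref)
```

In a training loss, the hard `min` is typically replaced by a soft minimum so the distance becomes differentiable with respect to the predicted coordinates.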
| 第三語言摘要 | |
| Table of Contents |
Acknowledgements ii
Table of Contents I
List of Figures III
List of Tables IV
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 2
1.3 Thesis Organization 3
Chapter 2 Literature Review 5
2.1 Development of Handwriting Motion Analysis Techniques 5
2.2 Applications of Deep Learning to Stroke Prediction 6
2.3 Stroke Reconstruction System Algorithms 8
2.4 Introduction to Graph Neural Networks 10
Chapter 3 Methodology 12
3.1 System Pipeline Architecture 13
3.2 Process Flow Diagram 15
3.3 Data Representation and Preprocessing 16
3.4 Heatmap Construction 17
3.5 CNN Feature Extraction 18
3.6 Softargmax Mechanism 19
3.7 Loss Function Design 20
Chapter 4 Experimental Results 23
4.1 Experimental Design and Data Configuration 23
4.2 Different Loss Designs 25
4.3 Result Analysis and Visualization 27
4.4 Step-by-Step Stroke Analysis 28
4.5 Detail Improvement at Stroke Intersections 29
4.6 Comparison with Existing Methods 30
Chapter 5 Conclusions and Future Work 32
References 33
List of Figures:
Fig. 1. Stroke order in Traditional Chinese 8
Fig. 2. Architecture of the proposed stroke prediction model 10
Fig. 3. Process flow diagram 15
Fig. 4. Results under different loss designs 25
Fig. 5. Comparison of stroke predictions by different models on the same sample 27
Fig. 6. Step-by-step stroke prediction analysis 28
Fig. 7. Detail improvement at stroke intersections 29
List of Tables:
Table 1. Comparative analysis of stroke reconstruction capability across methods 30 |
| References |
[1] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, “OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, Jan. 2021, doi: 10.1109/TPAMI.2019.2929257.
[2] D. C. Luvizon, H. Tabia, and D. Picard, “2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 5137–5146.
[3] D. C. Luvizon, D. Picard, and H. Tabia, “Human Pose Regression by Combining Indirect Part Detection and Contextual Information,” Comput. Vis. Image Underst., vol. 192, Art. no. 102897, Mar. 2020, doi: 10.1016/j.cviu.2019.102897.
[4] H. Sakoe and S. Chiba, “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” IEEE Trans. Acoust. Speech Signal Process., vol. 26, no. 1, pp. 43–49, Feb. 1978, doi: 10.1109/TASSP.1978.1163055.
[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[6] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[7] C. Lea, A. Reiter, R. Vidal, and G. D. Hager, “Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands, Oct. 2016, pp. 36–52.
[8] T. D. Nguyen and M. Kresovic, “A Survey of Top-Down Approaches for Human Pose Estimation,” arXiv preprint, arXiv:2202.02656, Feb. 2022.
[9] H. S. Fang, J. Li, H. Tang, C. Xu, H. Zhu, Y. Xiu, Y. L. Li, and C. Lu, “AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, pp. 1–17, Jan. 2023, doi: 10.1109/TPAMI.2022.3222784.
[10] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. Lai, N. Davis, and F. Nuflo, “Modeling Visual Attention via Selective Tuning,” Artif. Intell., vol. 78, nos. 1–2, pp. 507–545, Oct. 1995.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention Is All You Need,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, Long Beach, CA, USA, Dec. 2017, pp. 5998–6008.
[12] X. Wang, L. Bo, and F. Li, “Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, Oct. 2019, pp. 6971–6981.
[13] Z. Zhang and M. R. Sabuncu, “Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 31, Montréal, Canada, Dec. 2018, pp. 8792–8802.
[14] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep Spatial Autoencoders for Visuomotor Learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Stockholm, Sweden, May 2016, pp. 512–519.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[16] X. Li, J. Ylioinas, J. Verbeek, and J. Kannala, “Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization,” in Proc. Eur. Conf. Comput. Vis. Workshops (ECCVW), Munich, Germany, Sep. 2018.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 25, Lake Tahoe, NV, USA, Dec. 2012, pp. 1097–1105.
[18] P. C. Chen, “Traditional Chinese Handwriting Dataset,” GitHub, 2020. [Online]. Available: https://github.com/AI-FREE-Team/Traditional-Chinese-Handwriting-Dataset. Accessed: Jun. 19, 2025.
[19] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, “cuDNN: Efficient Primitives for Deep Learning,” arXiv preprint, arXiv:1410.0759, Oct. 2014.
[20] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Portland, OR, USA, Jun. 2013, pp. 2930–2937, doi: 10.1109/CVPR.2013.377. |
| Full-Text Use Permissions | |