System ID | U0002-1505202423413600 |
---|---|
DOI | 10.6846/tku202400143 |
Title (Chinese) | 基於深度影像的3D動作評量系統-以太極拳為例 |
Title (English) | 3D Motion Evaluation System Using Depth Image: A Case Study of Tai Chi |
Title (third language) | |
Institution | 淡江大學 (Tamkang University) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree school | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 112 |
Semester | 2 |
Publication year | 113 |
Author (Chinese) | 陳思玫 |
Author (English) | Ssu-Mei Chen |
Student ID | 611410134 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Defense date | 2024-07-10 |
Pages | 42 |
Committee | Advisor: 陳建彰 (ccchen34@mail.tku.edu.tw); Member: 林承賢 (cslin@mail.tku.edu.tw); Member: 許哲銓 (tchsu@scu.edu.tw) |
Keywords (Chinese) | 對比式學習; 時間循環一致性學習; 逐幀檢索; 動作評分系統 |
Keywords (English) | Contrastive Learning; Temporal Cycle-Consistency Learning; Frame-by-frame Retrieval; Motion Scoring System |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) | 太極拳作為中華文化中的傳統武術,在教學過程中面臨主觀評分不一致且耗時的問題。隨著學習人數的增加,傳統的人工評分方式已難以滿足需求。為了解決這一挑戰,本研究開發了一個基於深度學習的太極拳動作評分系統。本研究收集了標準太極拳影片,利用MediaPipe姿勢評估系統捕捉人體骨骼資訊,提取骨骼特徵,並將這些特徵與深度影片輸入時間循環一致性學習網路(TCC)進行訓練,該網路能夠捕捉動作在時間維度上的一致性與差異。我們進行了四種不同模型的實驗:前視角無深度、前視角有深度、雙視角無深度以及雙視角有深度模型。實驗結果顯示,雙視角有深度模型在動作識別和評分的準確性上達到最佳效果,精確度、召回率和F1值分別為98%、94%和96%。這些結果表明,該系統能夠精確識別學習者動作中的細微差異,提供詳細反饋,協助教學者更有效地指導學習者練習,顯著提高太極拳教學的效率和品質,推動現代科技與傳統文化的融合。 |
Abstract (English) | Tai Chi, a traditional martial art of Chinese culture, suffers from inconsistent and time-consuming subjective evaluation in the teaching process. With the growing number of learners, traditional manual scoring can no longer meet demand. To address this challenge, this study developed a deep learning-based Tai Chi movement evaluation system. We collected standard Tai Chi videos, used the MediaPipe pose-estimation system to capture human skeletal information, extracted skeletal features, and fed these features together with depth videos into a Temporal Cycle-Consistency (TCC) network for training. This network captures the consistency and differences of movements over time. We conducted experiments on four models: front view without depth, front view with depth, dual view without depth, and dual view with depth. The results showed that the dual-view-with-depth model achieved the best movement recognition and evaluation accuracy, with precision, recall, and F1 scores of 98%, 94%, and 96%, respectively. These results indicate that the system can accurately identify subtle differences in learners' movements, provide detailed feedback, help teachers guide learners more effectively, significantly improve the efficiency and quality of Tai Chi teaching, and promote the integration of modern technology with traditional culture. |
Abstract (third language) | |
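The abstract describes extracting skeletal features from MediaPipe pose landmarks before feeding them to the TCC network. A minimal NumPy sketch of what such a feature could look like follows; the thesis does not publish its exact feature set, so the single joint angle shown is an illustrative choice (the landmark indices follow MediaPipe Pose's 33-point layout):

```python
import numpy as np

# MediaPipe Pose landmark indices (33-point layout); extracting only
# the left-elbow angle is illustrative, not the thesis's exact feature set.
L_SHOULDER, L_ELBOW, L_WRIST = 11, 13, 15

def joint_angle(a, b, c):
    """Angle (radians) at joint b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def skeleton_features(landmarks):
    """landmarks: (33, 3) array of (x, y, z) coordinates for one frame."""
    return np.array([
        joint_angle(landmarks[L_SHOULDER], landmarks[L_ELBOW], landmarks[L_WRIST]),
        # ...further angles (knees, hips, spine) would be appended here
    ])
```

Angle features of this kind are invariant to camera position and body size, which is one reason they are often preferred over raw landmark coordinates for comparing a learner against a teacher.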
Table of Contents |
Table of Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 2
1.3 Thesis Organization 3
Chapter 2 Literature Review 4
2.1 Related Work on Human Action Recognition 4
2.2 Contrastive Learning 7
2.2.1 Contrastive Loss 8
2.2.2 Triplet Loss 9
2.2.3 InfoNCE Loss 9
2.3 Temporal Cycle-Consistency Learning 10
Chapter 3 Tai Chi Motion Scoring Algorithm Based on Depth Images and Temporal Cycle-Consistency 15
3.1 System Architecture 15
3.2 Dataset 17
3.3 Plotting Body Joint Coordinates 17
3.4 Motion Differences Between Teacher and Learner 18
3.4.1 Front-view Videos Without Depth 19
3.4.2 Front-view Videos With Depth 20
3.4.3 Dual-view (Front and Rear) Videos Without Depth 21
3.4.4 Dual-view (Front and Rear) Videos With Depth 22
Chapter 4 Experimental Results 24
4.1 Tai Chi Motion Alignment Results 24
4.2 Frame-by-frame Analysis Results 25
4.4 Analysis and Discussion 27
4.4.1 Data Analysis 27
4.4.2 Charts and Data 27
Chapter 5 Conclusions and Future Research Directions 35
5.1 Conclusions 35
5.2 Future Research Directions 36
References 37
List of Figures: Fig. 1 TCC computation flowchart 10; Fig. 2 TCC architecture 11; Fig. 3 Frame-by-frame video retrieval 13; Fig. 4 Video alignment 14; Fig. 5 Frame-by-frame analysis results 26; Fig. 6 Training loss curve 27; Fig. 7 Segment-wise action matching results 29; Fig. 10 Training and testing loss curves 31; Fig. 11 Training time and resource usage 33
List of Tables: Table 1 Comparison of OpenPose and MediaPipe (compiled by this study) 5; Table 2 Matching accuracy for a single input video under the four models 31 |
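Section 2.3 and Figures 3-4 concern temporal cycle-consistency learning and frame-by-frame video retrieval. The core check that TCC training enforces can be sketched in a few lines of NumPy; this is a simplified hard-nearest-neighbour version (the actual TCC loss uses soft, differentiable matching), so treat it as an illustration rather than the thesis's implementation:

```python
import numpy as np

def cycle_consistency_rate(u, v):
    """Fraction of frames in embedding sequence u (Tu, D) whose nearest
    neighbour in v (Tv, D) maps back to the same frame of u."""
    # pairwise squared distances between every frame of u and every frame of v
    d = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)  # shape (Tu, Tv)
    nn_v = d.argmin(axis=1)  # u[i] -> index of its nearest frame in v
    nn_u = d.argmin(axis=0)  # v[j] -> index of its nearest frame in u
    # a frame is cycle-consistent if u -> v -> u returns to where it started
    return float((nn_u[nn_v] == np.arange(len(u))).mean())
```

A learner video whose embeddings cycle back consistently to the teacher's is well aligned; frames that fail the cycle check flag the segments where the two performances diverge, which is what enables per-frame feedback.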
References |
[1] 3D Convolution. 3D Convolution - PaddleEdu documentation. (n.d.). https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/3D_Convolution.html
[2] Google. (n.d.). google/mediapipe: Cross-platform, customizable ML solutions for live and streaming media. GitHub. https://github.com/google/mediapipe
[3] Kamel, A., Sheng, B., Yang, P., Li, P., Shen, R., & Feng, D. D. (2019). "Deep Convolutional Neural Networks for Human Action Recognition Using Depth Maps and Postures," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(9), 1806-1819.
[4] Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman, A. (2019). "Temporal Cycle-Consistency Learning," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Jalal, A., Nadeem, A., & Bobasu, S. (2019). "Human Body Parts Estimation and Detection for Physical Sports Movements," 2019 2nd International Conference on Communication, Computing and Digital Systems (C-CODE).
[6] Ooke, N., Ikegami, Y., Yamamoto, K., & Nakamura, Y. (2022). "Transfer Learning of Deep Neural Network Human Pose Estimator by Domain-Specific Data for Video Motion Capturing," 2022 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO).
[7] Reinschmidt, C., Van Den Bogert, A. J., Nigg, B. M., Lundberg, A., & Murphy, N. (1997). "Effect of Skin Movement on the Analysis of Skeletal Knee Joint Motion During Running," Journal of Biomechanics, 30(7), 729-732.
[8] Fujimori, Y., Ohmura, Y., Harada, T., & Kuniyoshi, Y. (2009, May). "Wearable Motion Capture Suit with Full-Body Tactile Sensors," 2009 IEEE International Conference on Robotics and Automation (pp. 3186-3193). IEEE.
[9] Liu, L., Wu, X., Wu, L., & Guo, T. (2012, October). "Static Human Gesture Grading Based on Kinect," 2012 5th International Congress on Image and Signal Processing (pp. 1390-1393). IEEE.
[10] Dempsey, P. (2017). "The Teardown: Nintendo Switch Gaming System," Engineering & Technology, 12(4), 82-83.
[11] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7291-7299).
[12] Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). "MediaPipe: A Framework for Building Perception Pipelines," arXiv preprint arXiv:1906.08172.
[13] Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). "Dimensionality Reduction by Learning an Invariant Mapping," 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 1735-1742). IEEE.
[14] Ren, Z., Meng, J., Yuan, J., & Zhang, Z. (2011, November). "Robust Hand Gesture Recognition with Kinect Sensor," Proceedings of the 19th ACM International Conference on Multimedia (pp. 759-760).
[15] Wei, S. E., Tang, N. C., Lin, Y. Y., Weng, M. F., & Liao, H. Y. M. (2014, November). "Skeleton-Augmented Human Action Understanding by Learning with Progressively Refined Data," Proceedings of the 1st ACM International Workshop on Human Centered Event Understanding from Multimedia (pp. 7-10).
[16] Huang, J. D. (2011, October). "Kinerehab: A Kinect-Based System for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities," Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 319-320).
[17] De Smedt, Q., Wannous, H., Vandeborre, J. P., Guerry, J., Saux, B. L., & Filliat, D. (2017, April). "3D Hand Gesture Recognition Using a Depth and Skeletal Dataset: SHREC'17 Track," Proceedings of the Workshop on 3D Object Retrieval (pp. 33-38).
[18] Devineau, G., Xi, W., Moutarde, F., & Yang, J. (2018, June). "Convolutional Neural Networks for Multivariate Time Series Classification Using Both Inter- and Intra-Channel Parallel Convolutions," Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP'2018).
[19] Hou, J., Wang, G., Chen, X., Xue, J. H., Zhu, R., & Yang, H. (2018). "Spatial-Temporal Attention Res-TCN for Skeleton-Based Dynamic Hand Gesture Recognition," Proceedings of the European Conference on Computer Vision (ECCV) Workshops.
[20] Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., & Grundmann, M. (2020). "BlazePose: On-Device Real-Time Body Pose Tracking," arXiv preprint arXiv:2006.10204.
[21] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014). "Microsoft COCO: Common Objects in Context," Computer Vision - ECCV 2014 (Part V, pp. 740-755). Springer International Publishing.
[22] Križnar, V., Leskovšek, M., & Batagelj, B. (2021, September). "Use of Computer Vision Based Hand Tracking in Educational Environments," 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO) (pp. 804-809). IEEE.
[23] Newell, A., Yang, K., & Deng, J. (2016). "Stacked Hourglass Networks for Human Pose Estimation," Computer Vision - ECCV 2016 (Part VIII, pp. 483-499). Springer International Publishing.
[24] Bazarevsky, V., Kartynnik, Y., Vakunov, A., Raveendran, K., & Grundmann, M. (2019). "BlazeFace: Sub-Millisecond Neural Face Detection on Mobile GPUs," arXiv preprint arXiv:1907.05047.
[25] Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). "Dimensionality Reduction by Learning an Invariant Mapping," 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 1735-1742). IEEE.
[26] Schroff, F., Kalenichenko, D., & Philbin, J. (2015). "FaceNet: A Unified Embedding for Face Recognition and Clustering," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 815-823).
[27] Gutmann, M., & Hyvärinen, A. (2010, March). "Noise-Contrastive Estimation of Unnormalized Statistical Models with Applications to Natural Image Statistics," Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 297-304). JMLR Workshop and Conference Proceedings.
[28] Oord, A. V. D., Li, Y., & Vinyals, O. (2018). "Representation Learning with Contrastive Predictive Coding," arXiv preprint arXiv:1807.03748.
[29] Sermanet, P., Lynch, C., Chebotar, Y., Hsu, J., Jang, E., Schaal, S., ... & Brain, G. (2018, May). "Time-Contrastive Networks: Self-Supervised Learning from Video," 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 1134-1141). IEEE.
[30] Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman, A. (2019). "Temporal Cycle-Consistency Learning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1801-1810).
[31] OpenPose output format documentation: https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md |
Full-text access rights | |