§ Browse Thesis Bibliographic Record
  
System ID U0002-1505202423413600
DOI 10.6846/tku202400143
Thesis Title (Chinese) 基於深度影像的3D動作評量系統-以太極拳為例
Thesis Title (English) 3D Motion Evaluation System Using Depth Image: A Case Study of Tai Chi
Thesis Title (Third Language)
Institution Tamkang University
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign Degree School
Foreign Degree College
Foreign Degree Institute
Academic Year 112
Semester 2
Year of Publication 113
Author (Chinese) 陳思玫
Author (English) Ssu-Mei Chen
Student ID 611410134
Degree Master
Language Traditional Chinese
Second Language
Defense Date 2024-07-10
Number of Pages 42
Committee Advisor - 陳建彰(ccchen34@mail.tku.edu.tw)
Committee Member - 林承賢(cslin@mail.tku.edu.tw)
Committee Member - 許哲銓(tchsu@scu.edu.tw)
Keywords (Chinese) 對比式學習
時間循環一致性學習
逐幀檢索
動作評分系統
Keywords (English) Contrastive Learning
Temporal Cycle-Consistency Learning
Frame-by-Frame Retrieval
Motion Scoring System
Keywords (Third Language)
Subject Classification
Chinese Abstract
太極拳作為中華文化中的傳統武術,在教學過程中面臨主觀評分不一致且耗時的問題。隨著學習人數的增加,傳統的人工評分方式已難以滿足需求。為了解決這一挑戰,本研究開發了一個基於深度學習的太極拳動作評分系統。本研究收集了標準太極拳影片,利用MediaPipe姿勢評估系統捕捉人體骨骼資訊,提取骨骼特徵,並將這些特徵與深度影片輸入時間循環一致性學習網路(TCC)進行訓練,該網路能夠捕捉動作在時間維度上的一致性與差異。我們進行了四種不同模型的實驗:前視角無深度、前視角有深度、雙視角無深度以及雙視角有深度模型。實驗結果顯示,雙視角有深度模型在動作識別和評分的準確性上達到最佳效果,精確度、召回率和F1值分別為98%、94%和96%。這些結果表明,該系統能夠精確識別學習者動作中的細微差異,提供詳細反饋,協助教學者更有效地指導學習者練習,顯著提高太極拳教學的效率和品質,推動現代科技與傳統文化的融合。
English Abstract
Tai Chi, a traditional martial art of Chinese culture, faces the problem of subjective evaluations that are inconsistent and time-consuming in the teaching process. With the growing number of learners, traditional manual scoring can no longer meet demand. To address this challenge, this study developed a deep-learning-based Tai Chi movement evaluation system. We collected standard Tai Chi videos, used the MediaPipe pose estimation system to capture human skeletal information and extract skeletal features, and fed these features together with depth videos into a Temporal Cycle-Consistency (TCC) network for training; this network captures the consistency of, and differences between, movements over time. We conducted experiments with four model variants: front view without depth, front view with depth, dual view without depth, and dual view with depth. The experimental results show that the dual-view model with depth achieved the best movement recognition and evaluation accuracy, with precision, recall, and F1 scores of 98%, 94%, and 96%, respectively. These results indicate that the system can accurately identify subtle differences in learners' movements, provide detailed feedback, help teachers guide learners more effectively, significantly improve the efficiency and quality of Tai Chi teaching, and promote the integration of modern technology with traditional culture.
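The TCC training described in the abstract rewards frame embeddings in which a frame matched forward from the reference video to the learner video maps back to the same frame. As an illustration only, the following NumPy sketch checks the hard (nearest-neighbor) form of cycle-consistency between two sequences of per-frame embeddings; the function name and the synthetic embeddings are assumptions for demonstration, not the author's code, and the actual TCC objective of Dwibedi et al. uses a soft, differentiable variant of this check during training.

```python
import numpy as np

def cycle_consistent_matches(teacher_emb, learner_emb):
    """Hard cycle-consistency check between two embedded frame sequences.

    teacher_emb: (N, D) array of per-frame embeddings for the reference video.
    learner_emb: (M, D) array of per-frame embeddings for the learner video.
    Returns a length-N boolean array: teacher frame i is cycle-consistent if
    its nearest learner frame maps back to teacher frame i.
    """
    # Pairwise squared Euclidean distances between every frame pair, shape (N, M).
    d = ((teacher_emb[:, None, :] - learner_emb[None, :, :]) ** 2).sum(-1)
    fwd = d.argmin(axis=1)  # teacher frame i -> nearest learner frame
    bwd = d.argmin(axis=0)  # learner frame j -> nearest teacher frame
    return bwd[fwd] == np.arange(len(teacher_emb))

# Toy 1-D embeddings: two roughly aligned sequences with small noise.
rng = np.random.default_rng(0)
teacher = np.linspace(0, 1, 8)[:, None] + rng.normal(0, 0.01, (8, 1))
learner = np.linspace(0, 1, 8)[:, None] + rng.normal(0, 0.01, (8, 1))
ok = cycle_consistent_matches(teacher, learner)
print(ok.mean())  # fraction of cycle-consistent teacher frames
```

Frames that fail this check have no unambiguous counterpart in the other video, which is the intuition behind using cycle-consistency both to align two performances and to flag where a learner's movement diverges from the teacher's.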
Third Language Abstract
Table of Contents
Table of Contents
Table of Contents	v
List of Figures	vii
List of Tables	viii
Chapter 1	Introduction	1
1.1	Research Background and Motivation	1
1.2	Research Objectives	2
1.3	Thesis Organization	3
Chapter 2	Literature Review	4
2.1	Related Work on Human Action Recognition	4
2.2	Contrastive Learning	7
2.2.1	Contrastive Loss	8
2.2.2	Triplet Loss	9
2.2.3	InfoNCE Loss	9
2.3	Temporal Cycle-Consistency Learning	10
Chapter 3	A Tai Chi Motion Scoring Algorithm Based on Depth Images and Temporal Cycle-Consistency	15
3.1	System Architecture	15
3.2	Dataset	17
3.3	Plotting Body Joint Coordinates	17
3.4	Motion Differences Between Teacher and Learner	18
3.4.1	Front-View Videos Without Depth	19
3.4.2	Front-View Videos With Depth	20
3.4.3	Front-and-Rear-View Videos Without Depth	21
3.4.4	Front-and-Rear-View Videos With Depth	22
Chapter 4	Experimental Results	24
4.1	Tai Chi Motion Alignment Results	24
4.2	Frame-by-Frame Analysis Results	25
4.4	Results Analysis and Discussion	27
4.4.1	Data Analysis	27
4.4.2	Figures and Data	27
Chapter 5	Conclusions and Future Research Directions	35
5.1	Conclusions	35
5.2	Future Research Directions	36
References	37
List of Figures
Figure 1. TCC computation flowchart	10
Figure 2. TCC architecture	11
Figure 3. Frame-by-frame video retrieval	13
Figure 4. Video alignment	14
Figure 5. Illustration of frame-by-frame analysis results	26
Figure 6. Training loss curve	27
Figure 7. Segment-wise motion matching results	29
Figure 10. Loss curves during training and testing	31
Figure 11. Training time and resource usage	33

 
List of Tables
Table 1. Comparison of OpenPose and MediaPipe (compiled by this study)	5
Table 2. Matching accuracy for a single input video under the four models	31


References
[1]	3D Convolution. PaddleEdu documentation. (n.d.).
https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/3D_Convolution.html
[2]	Google. (n.d.). Google/mediapipe: Cross-platform, customizable ML solutions for live and streaming media. GitHub. 
https://github.com/google/mediapipe 
[3]	Kamel, A., Sheng, B., Yang, P., Li, P., Shen, R., & Feng, D. D. (2019). Deep Convolutional Neural Networks for human action recognition using depth maps and postures. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(9), 1806–1819. 
[4]	Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman, A. (2019). Temporal cycle-consistency learning. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
[5]	Jalal, A., Nadeem, A., & Bobasu, S. (2019). Human body parts estimation and detection for physical sports movements. 2019 2nd International Conference on Communication, Computing and Digital Systems (C-CODE). 
[6]	Ooke, N., Ikegami, Y., Yamamoto, K., & Nakamura, Y. (2022). Transfer learning of deep neural network human pose estimator by domain-specific data for video motion capturing. 2022 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO). 
[7]	Reinschmidt, C., Van Den Bogert, A. J., Nigg, B. M., Lundberg, A., & Murphy, N. (1997). “Effect of Skin Movement on the Analysis of Skeletal Knee Joint Motion During Running,” Journal of biomechanics, 30(7), 729-732.
[8]	Fujimori, Y., Ohmura, Y., Harada, T., & Kuniyoshi, Y. (2009, May). “Wearable Motion Capture Suit with Full-Body Tactile Sensors,” In 2009 IEEE International Conference on Robotics and Automation (pp. 3186-3193). IEEE.
[9]	Liu, L., Wu, X., Wu, L., & Guo, T. (2012, October). “Static Human Gesture Grading Based on Kinect,” In 2012 5th International Congress on Image and Signal Processing (pp. 1390-1393). IEEE. 
[10]	Dempsey, P. (2017). “The Teardown-Nintendo Switch Gaming System,” [Reviews Consumer Technology]. Engineering & Technology, 12(4), 82-83.
[11]	Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). “Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields,” In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7291-7299).
[12]	Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). “MediaPipe: A Framework for Building Perception Pipelines,” arXiv preprint arXiv:1906.08172.
[13]	Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). “Dimensionality Reduction by Learning an Invariant Mapping,” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 1735-1742). IEEE.
[14]	Ren, Z., Meng, J., Yuan, J., & Zhang, Z. (2011, November). “Robust Hand Gesture Recognition with Kinect Sensor,” In Proceedings of the 19th ACM international conference on Multimedia (pp. 759-760).
[15]	Wei, S. E., Tang, N. C., Lin, Y. Y., Weng, M. F., & Liao, H. Y. M. (2014, November). “Skeleton-Augmented Human Action Understanding by Learning with Progressively Refined Data,” In Proceedings of the 1st ACM International Workshop on Human Centered Event Understanding from Multimedia (pp. 7-10).
[16]	Huang, J. D. (2011, October). “Kinerehab: a Kinect-Based System for Physical Rehabilitation: a Pilot Study for Young Adults with Motor Disabilities,” In The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility (pp. 319-320).
[17]	De Smedt, Q., Wannous, H., Vandeborre, J. P., Guerry, J., Saux, B. L., & Filliat, D. (2017, April). “3D Hand Gesture Recognition Using a Depth and Skeletal Dataset: Shrec'17 track,” In Proceedings of the Workshop on 3D Object Retrieval (pp. 33-38).
[18]	Devineau, G., Xi, W., Moutarde, F., & Yang, J. (2018, June). “Convolutional Neural Networks for Multivariate Time Series Classification using both Inter-and Intra-Channel Parallel Convolutions,” In Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP'2018).
[19]	Hou, J., Wang, G., Chen, X., Xue, J. H., Zhu, R., & Yang, H. (2018). “Spatial-Temporal Attention Res-TCN for Skeleton-based Dynamic Hand Gesture Recognition,” In Proceedings of the European conference on computer vision (ECCV) workshops (pp. 0-0).
[20]	Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., & Grundmann, M. (2020). “Blazepose: On-device Real-Time Body Pose tracking,” arXiv preprint arXiv:2006.10204.
[21]	Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014). “Microsoft coco: Common Objects in Context,” In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 (pp. 740-755). Springer International Publishing.
[22]	Križnar, V., Leskovšek, M., & Batagelj, B. (2021, September). “Use of Computer Vision Based Hand Tracking in Educational Environments,” In 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO) (pp. 804-809). IEEE.
[23]	Newell, A., Yang, K., & Deng, J. (2016). “Stacked Hourglass Networks for Human Pose Estimation,” In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VIII 14 (pp. 483-499). Springer International Publishing.
[24]	Bazarevsky, V., Kartynnik, Y., Vakunov, A., Raveendran, K., & Grundmann, M. (2019). “Blazeface: Sub-millisecond Neural Face Detection on Mobile GPUs,” arXiv preprint arXiv:1907.05047.
[25]	Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). “Dimensionality Reduction by Learning an Invariant Mapping,” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 1735-1742). IEEE.
[26]	Schroff, F., Kalenichenko, D., & Philbin, J. (2015). “FaceNet: A Unified Embedding for Face Recognition and Clustering,” In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815-823).
[27]	Gutmann, M., & Hyvärinen, A. (2010, March). “Noise-Contrastive Estimation of Unnormalized Statistical Models with Applications to Natural Image Statistics,” In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 297-304). JMLR Workshop and Conference Proceedings.
[28]	Oord, A. V. D., Li, Y., & Vinyals, O. (2018). “Representation Learning with Contrastive Predictive Coding,” arXiv preprint arXiv:1807.03748.
[29]	Sermanet, P., Lynch, C., Chebotar, Y., Hsu, J., Jang, E., Schaal, S., ... & Brain, G. (2018, May). “Time-Contrastive Networks: Self-Supervised Learning from Video,” In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 1134-1141). IEEE.
[30]	Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman, A. (2019). “Temporal Cycle-Consistency Learning,” In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1801-1810).
[31]	https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md
Full-Text Usage Rights
National Central Library
Agrees to grant the National Central Library a royalty-free license; the bibliographic record and electronic full text will be made publicly available on the Internet on 2024-12-05 (delayed release of the electronic full text)
On campus
Print copy available on campus immediately
Agrees to worldwide public access to the electronic full text
Electronic full text available on campus immediately
Off campus
Agrees to license to database vendors
Electronic full text available off campus immediately
