System No. | U0002-2002202012145600 |
---|---|
DOI | 10.6846/TKU.2020.00553 |
Title (Chinese) | 基於深度強化學習之輪型足球機器人的守門員策略 |
Title (English) | Deep Reinforcement Learning Based Goalkeeper Strategy of Wheeled Soccer Robot |
Title (third language) | |
University | Tamkang University |
Department (Chinese) | 電機工程學系機器人工程碩士班 |
Department (English) | Master's Program in Robotics Engineering, Department of Electrical and Computer Engineering |
Foreign degree school | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 108 |
Semester | 1 |
Publication year | 109 (ROC calendar; 2020) |
Student (Chinese) | 黃俊睿 |
Student (English) | Chun-Jui Huang |
Student ID | 605470169 |
Degree | Master |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2020-01-19 |
Pages | 65 |
Committee | Advisor - 李世安; Members - 李世安, 劉智誠, 馮玄明 |
Keywords (Chinese) | 輪型足球機器人、守門員策略、深度強化學習、柔性行動者評論家、Gazebo模擬器 |
Keywords (English) | Wheeled Soccer Robot, Goalkeeper Strategy, Deep Reinforcement Learning, Soft Actor-Critic (SAC), Gazebo Simulator |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese, translated) |
With the development of machine learning, deep learning has been widely applied to image recognition in industrial manufacturing, and reinforcement learning has demonstrated better-than-human performance on various game platforms. This thesis implements a deep reinforcement learning based goalkeeper strategy for a wheeled soccer robot, so that it can defend against shots on goal by an opposing attacker, and proposes an effective training method that lets the goalkeeper learn the strategy more efficiently. The deep reinforcement learning algorithm adopted is Soft Actor-Critic: by designing its neural network architecture and using the Gazebo simulator as the environment platform, the robot goalkeeper can explore and learn in a simulated environment and effectively accomplish the assigned task. For training, this thesis designs a well-shaped reward function and applies progressive learning: by gradually increasing the difficulty of the training environment, training becomes more stable and the required training time is reduced. The proposed method is validated by testing the strategy region by region; it achieves a high defense rate in every region, demonstrating the robustness of the proposed design. |
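For context, the Soft Actor-Critic algorithm adopted here (Haarnoja et al., reference [25] below) maximizes an entropy-regularized return; the standard form from the SAC literature, not quoted from this thesis, is:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

where $\alpha$ is the temperature coefficient trading off reward against the entropy bonus $\mathcal{H}$ that encourages the goalkeeper policy to keep exploring.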
Abstract (English) |
With the development of machine learning, deep learning has been widely applied in industrial manufacturing, and reinforcement learning has achieved strong performance in many games. This thesis presents a deep reinforcement learning based goalkeeper strategy for a wheeled soccer robot, enabling the robot goalkeeper to defend against an attacker shooting the ball toward the goal, together with an effective training method that lets the goalkeeper learn the strategy efficiently. We adopt Soft Actor-Critic (SAC), a deep reinforcement learning algorithm. Using the Gazebo simulator with SAC, the robot goalkeeper explores and learns the strategy in a simulated environment, and the designed neural network architecture allows it to complete the specified tasks stably and effectively. For training, we design a useful reward function and apply progressive learning: by increasing the complexity of the environment step by step, training becomes more stable and the training time is reduced. The field is divided into several areas for testing; the robot goalkeeper achieves a high defense rate in each area, demonstrating the robustness of the proposed method. |
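The abstract describes two training ideas: a reward function (the thesis lists a sparse reward, Table 4.3, and a robot-ball distance reward, Figure 4.7) and a progressive schedule that raises environment difficulty step by step (Figures 4.8-4.9). A minimal Python sketch of both ideas follows; every name, constant, and stage label here is a hypothetical illustration, not taken from the thesis:

```python
import math

def goalkeeper_reward(defended: bool, conceded: bool,
                      keeper_xy, ball_xy, max_dist: float = 3.0) -> float:
    """Hypothetical reward: a sparse outcome term plus dense distance shaping."""
    # Sparse term: +1 for a successful defense, -1 for a conceded goal.
    sparse = 1.0 if defended else (-1.0 if conceded else 0.0)
    # Shaping term in [0, 1]: larger when the keeper is closer to the ball.
    dist = math.hypot(keeper_xy[0] - ball_xy[0], keeper_xy[1] - ball_xy[1])
    shaping = max(0.0, 1.0 - dist / max_dist)
    return sparse + 0.1 * shaping  # small weight keeps the sparse term dominant

# Hypothetical curriculum: static ball first, then slow shots, then full-speed
# attackers; advance only once the defense rate clears a threshold.
STAGES = ["static_ball", "slow_shots", "full_speed"]

def next_stage(stage: int, defend_rate: float, threshold: float = 0.8) -> int:
    """Return the index of the next training stage."""
    if defend_rate >= threshold and stage < len(STAGES) - 1:
        return stage + 1
    return stage
```

The point of the small shaping weight is that the dense term guides early exploration without outweighing the sparse defend/concede signal; the staged schedule mirrors the thesis's claim that slowly increasing difficulty stabilizes training and shortens it.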
Abstract (third language) | |
Table of contents |
Chinese Abstract I
English Abstract II
Table of Contents III
List of Figures V
List of Tables VII
List of Symbols VIII
Chinese-English Glossary XI
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Motivation 4
  1.3 Thesis Organization 5
Chapter 2 Literature Review 6
  2.1 Artificial Neural Networks 6
  2.2 Deep Reinforcement Learning 10
Chapter 3 Research Method 23
  3.1 Soft Actor-Critic 23
  3.2 Experimental Environment 30
  3.3 System Architecture 37
Chapter 4 Goalkeeper Strategy 40
  4.1 Neural Network Design 40
  4.2 Reward Function 46
  4.3 Training Method 51
Chapter 5 Experimental Results 54
  5.1 Test Environment 54
  5.2 Result Analysis 55
Chapter 6 Conclusions and Future Work 61
  6.1 Conclusions 61
  6.2 Future Work 61
References 63
List of Figures
Figure 1.1 Shakey robot 2
Figure 1.2 Kiva robot 3
Figure 1.3 Scout 3
Figure 2.1 Single-layer perceptron architecture 6
Figure 2.2 Multilayer perceptron architecture 7
Figure 2.3 Deep neural network architecture 8
Figure 2.4 Rectified linear unit (ReLU) function 9
Figure 2.5 Hyperbolic tangent function 10
Figure 2.6 Reinforcement learning framework 11
Figure 2.7 Policy gradient algorithm architecture 15
Figure 2.8 DQN architecture 19
Figure 2.9 Comparison of DQN with other algorithms 20
Figure 2.10 Actor-critic architecture 21
Figure 2.11 Taxonomy of deep reinforcement learning 22
Figure 3.1 Soft Actor-Critic architecture 25
Figure 3.2 Soft Actor-Critic algorithm flow 29
Figure 3.3 FIRA soccer field specifications 31
Figure 3.4 Robot model 31
Figure 3.5 FIRA soccer field model 32
Figure 3.6 Sixth-generation middle-size soccer robot 32
Figure 3.7 Omnidirectional wheel 33
Figure 3.8 Robot chassis 34
Figure 3.9 Omnidirectional image 35
Figure 3.10 Omnidirectional vision system 36
Figure 3.11 Localization system architecture 37
Figure 3.12 Overall system architecture 38
Figure 4.1 Relative positions of the robot and the ball 42
Figure 4.2 Robot outputs 44
Figure 4.3 Policy network architecture 45
Figure 4.4 State-value network architecture 46
Figure 4.5 Action-value network architecture 46
Figure 4.6 Robot positions 50
Figure 4.7 Robot-ball distance reward 51
Figure 4.8 Static-environment training stage 52
Figure 4.9 Dynamic-environment training stage 53
Figure 5.1 Shooting positions for validation 54
Figure 5.2 Shot points for validation 55
List of Tables
Table 3.1 Specifications of the sixth-generation middle-size soccer robot 33
Table 3.2 Computer specifications 39
Table 4.1 Policy network inputs 43
Table 4.2 Neural network parameters 45
Table 4.3 Sparse rewards 47
Table 5.1 Validation results 56
Table 5.2 Overall test results 60 |
References |
[1] RoboCup, URL: https://www.robocup.org/
[2] FIRA, URL: http://www.fira.net
[3] WRS, URL: https://worldrobotsummit.org/
[4] Shakey, URL: http://www.ai.sri.com/shakey/
[5] Kiva robot, URL: https://robohub.org/meet-the-drone-that-already-delivers-your-packages-kiva-robot-teardown/
[6] Scout, URL: https://blog.aboutamazon.com/transportation/meet-scout
[7] G. E. Hinton, S. Osindero and Y. W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[8] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Neural Information Processing Systems (NIPS), pp. 1106-1114, 2012.
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2018.
[10] C. J. C. H. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[11] V. Mnih, K. Kavukcuoglu, et al., "Human-Level Control through Deep Reinforcement Learning," Nature, vol. 518, pp. 529-533, 2015.
[12] D. Silver, A. Huang, et al., "Mastering the Game of Go with Deep Neural Networks and Tree Search," Nature, vol. 529, pp. 484-489, 2016.
[13] D. Silver, J. Schrittwieser, et al., "Mastering the Game of Go without Human Knowledge," Nature, vol. 550, pp. 354-359, 2017.
[14] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[15] A. Ben-Hur, D. Horn, H. T. Siegelmann and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
[16] J. Peters and S. Schaal, "Policy Gradient Methods for Robotics," Proceedings of IROS, pp. 2219-2225, 2006.
[17] M. van Otterlo and M. Wiering, "Reinforcement Learning and Markov Decision Processes," in Reinforcement Learning, pp. 3-42, 2012.
[18] M. Andrychowicz, F. Wolski, et al., "Hindsight Experience Replay," Advances in Neural Information Processing Systems, pp. 5048-5058, 2017.
[19] H. van Hasselt, A. Guez and D. Silver, "Deep Reinforcement Learning with Double Q-Learning," Thirtieth AAAI Conference on Artificial Intelligence, pp. 2094-2100, 2016.
[20] M. Hessel, J. Modayil, et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning," Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3215-3222, 2018.
[21] V. R. Konda and J. N. Tsitsiklis, "Actor-Critic Algorithms," Advances in Neural Information Processing Systems, pp. 1008-1014, 2000.
[22] V. Mnih, A. P. Badia, et al., "Asynchronous Methods for Deep Reinforcement Learning," International Conference on Machine Learning (ICML), pp. 1928-1937, 2016.
[23] T. P. Lillicrap, J. J. Hunt, et al., "Continuous Control with Deep Reinforcement Learning," International Conference on Learning Representations (ICLR), pp. 1-14, 2016.
[24] S. Fujimoto, H. van Hoof and D. Meger, "Addressing Function Approximation Error in Actor-Critic Methods," International Conference on Machine Learning (ICML), pp. 1587-1596, 2018.
[25] T. Haarnoja, A. Zhou, P. Abbeel and S. Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor," International Conference on Machine Learning (ICML), pp. 1856-1865, 2018.
[26] T. Haarnoja, A. Zhou, et al., "Soft Actor-Critic Algorithms and Applications," arXiv:1812.05905, pp. 1-17, 2018.
[27] T. Haarnoja, H. Tang, P. Abbeel and S. Levine, "Reinforcement Learning with Deep Energy-Based Policies," Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1352-1361, 2017.
[28] 阮明達, Obstacle Avoidance for a Mobile Robot in Dynamic Environments Based on Deep Reinforcement Learning, Master's thesis, Department of Electrical Engineering, Tamkang University, June 2019.
[29] 連振宇, Obstacle Avoidance for a Mobile Robot Based on Distance Measurement with Omnidirectional Images, Master's thesis, Department of Electrical Engineering, Tamkang University, June 2017.
[30] Y. Liu, X. Wu, J. J. Zhu and J. Lew, "Omni-Directional Mobile Robot Controller Design by Trajectory Linearization," Proceedings of the 2003 American Control Conference, vol. 4, pp. 3423-3428, 2003.
[31] O. Zhelo, J. Zhang, et al., "Curiosity-Driven Exploration for Mapless Navigation with Deep Reinforcement Learning," Machine Learning in the Planning and Control of Robot Motion (MLPC), pp. 1-5, 2018.
[32] M. Abreu, L. P. Reis and H. L. Cardoso, "Learning High-Level Robotic Soccer Strategies from Scratch through Reinforcement Learning," IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), pp. 1-7, 2019. |
Full-text availability | |