System ID | U0002-2708202015191400 |
---|---|
DOI | 10.6846/TKU.2020.00801 |
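Thesis title (Chinese) | 基於語義分割關鍵點檢測之物件重新定位規劃 |
Thesis title (English) | Object Reorientation Planning Based on Semantic Segmentation Keypoint Detection |
Thesis title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 電機工程學系機器人工程碩士班 |
Department (English) | Master's Program in Robotics Engineering, Department of Electrical and Computer Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year | 108 (ROC calendar) |
Semester | 2 |
Publication year | 109 (ROC calendar; 2020) |
Graduate student (Chinese) | 葉立宇 |
Graduate student (English) | Li-Yu Yeh |
Student ID | 606470093 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2020-07-17 |
Number of pages | 83 |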
Oral defense committee | Advisor - 翁慶昌 (wong@ee.tku.edu.tw); Advisor - 劉智誠 (chihchengliu20120419@gmail.com); Member - 龔宗鈞 (cckung@ttu.edu.tw); Member - 陳珍源 (jychen@mail.mcu.edu.tw) |
Keywords (Chinese) | 物件重新定位; 夾取與放置; 遮罩區域卷積神經網路; 語義分割; 三維關鍵點偵測 |
Keywords (English) | Object Reorientation; Pick-and-Place; Mask R-CNN; Semantic Segmentation; 3D Keypoint Detection |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
本論文提出一種基於語義分割關鍵點檢測之物件重新定位規劃方法,利用機械手臂將隨機擺放之物件重新放置到指定的位置和姿態。主要有兩個部分:(1) 三維關鍵點檢測系統,以及(2) 物件重新定位操作規劃系統。在三維關鍵點檢測系統上,本論文使用RGB-D攝影機讀取環境資訊,並產生該目標物件之三維關鍵點來表示該物件在環境中的位置及姿態資訊,此方法可以簡化三維模型之表示,並且在訓練階段僅需要加入這個類別物件的多種訓練資料,就可以用類別級別的方式來進行物件重新定位操作規劃。首先,使用遮罩區域卷積神經網路方法進行初步的物件偵測,並選用信心指數最高的物件影像作為語義分割系統的輸入,其目的是將圖片中的每個像素分類為物件的哪一個部件。此外,在使用卷積神經網路進行語義分割之後,本論文使用條件隨機場方法來進行多次疊代,以獲取一個更準確的物件辨識結果。當在影像處理過程中將目標物件分割成若干個部件後,可以獲得每個部件的中心位置,然後再依據物件的深度影像資訊可以獲得每個部件之中心點的法向量,並且可以藉由連接每個部件的中心點來得到物件的姿態。在物件重新定位操作規劃系統上,本論文首先將物件的姿態和每個部件的法向量轉換至機械手臂的工作座標系。然後根據物件的當前姿態和期望姿態,使用球面線性插值的方法來讓機械手臂在工作空間中進行一系列運動以重新定位物件。此外,本論文以物件表面上之影像特徵為根據,對物件之大地座標的z軸進行物件姿態的調整,使得所擺放之物件的姿態可以趨近於所期望的姿態。在實驗結果方面,本論文利用實驗室自製之機械手臂結合真空式吸盤夾具進行實機測試,驗證本論文所提出之系統確實得以完成所規劃之物件重新定位任務。 |
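The abstract describes segmenting the object into parts and using the center of each part, together with the depth image, as a 3D keypoint. A minimal sketch of that idea follows. It is an illustration only, not the thesis implementation: the function names `part_centroids` and `backproject`, the label-mask layout, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions introduced here.

```python
from collections import defaultdict


def part_centroids(label_mask, background=0):
    """Return {label: (u, v)} pixel centroids for each part label in a 2D mask.

    label_mask is a list of rows; each entry is an integer part label
    (hypothetical layout, with `background` meaning "no part").
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_u, sum_v, count]
    for v, row in enumerate(label_mask):
        for u, label in enumerate(row):
            if label != background:
                acc = sums[label]
                acc[0] += u
                acc[1] += v
                acc[2] += 1
    return {label: (su / n, sv / n) for label, (su, sv, n) in sums.items()}


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with a depth value to camera-frame XYZ.

    fx, fy, cx, cy are assumed camera intrinsics; depth is in the same unit
    as the returned coordinates.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
```

For example, a part whose pixel centroid is (920, 240) at depth 2.0, with assumed intrinsics fx = fy = 600 and principal point (320, 240), maps to the camera-frame point (2.0, 0.0, 2.0). Connecting such points across parts gives a direction vector that can stand in for the object's orientation.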
Abstract (English) |
This thesis proposes an object reorientation planning method based on semantic-segmentation keypoint detection, which enables a robot manipulator to move a randomly placed object to a specified position and pose. The method has two main parts: (1) a 3D keypoint detection system and (2) a manipulation planning system for object reorientation. In the 3D keypoint detection system, an RGB-D camera captures the environment, and 3D keypoints of the target object are generated to represent its position and pose. This representation is simpler than a full 3D model, and because training only requires adding varied training data for an object category, reorientation planning can be carried out at the category level. First, Mask R-CNN performs preliminary object detection, and the object image with the highest confidence score is selected as the input to the semantic segmentation system, which classifies each pixel in the image as belonging to one of the object's parts. After semantic segmentation with a convolutional neural network, a conditional random field is applied for multiple iterations to refine the segmentation result. Once the target object has been segmented into parts, the center position of each part is obtained. The normal vector at the center point of each part is then computed from the object's depth image, and the pose of the object is obtained by connecting the center points of the parts. In the manipulation planning system, the pose of the object and the normal vector of each part are first transformed into the working coordinate system of the robot manipulator. Then, given the current and desired poses of the object, spherical linear interpolation is used to generate a series of movements in the workspace that reorient the object. In addition, based on image features on the object's surface, the object's pose is adjusted about the z-axis of the world coordinate frame so that the placed object approaches the desired pose. In the experiments, a laboratory-built robot manipulator with a vacuum suction gripper was used to verify that the proposed system can complete the planned object reorientation task. |
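The abstract names spherical linear interpolation (slerp) as the way intermediate orientations between the current and desired pose are generated. The sketch below shows one common quaternion formulation of slerp. It is a generic illustration under stated assumptions, not the thesis code: the (w, x, y, z) quaternion convention, the function names `slerp` and `orientation_waypoints`, and the 0.9995 near-parallel threshold are choices made here.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1.

    Quaternions are (w, x, y, z) tuples (assumed convention); t is in [0, 1].
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one so we follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)
    if dot > 0.9995:
        # Nearly identical orientations: linear interpolation avoids dividing
        # by a very small sin(theta).
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def orientation_waypoints(q_current, q_desired, steps):
    """Return steps + 1 evenly spaced orientations from q_current to q_desired."""
    return [slerp(q_current, q_desired, i / steps) for i in range(steps + 1)]
```

Calling `orientation_waypoints(q_current, q_desired, 4)` yields five orientations, starting at the current pose and ending at the desired one, each of which could be sent to the manipulator as an intermediate target. Slerp keeps the angular velocity constant along the path, which is why it is usually preferred over interpolating Euler angles directly.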
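Abstract (third language) | |
Table of contents |
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
List of Symbols
Chinese-English Glossary
Chapter 1 Introduction
1.1 Motivation
1.2 Literature Review
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 System Architecture and Software/Hardware Equipment
2.1 System Architecture
2.2 System Architecture of the Robot Manipulator
2.2.1 Manipulator Hardware Specifications
2.2.2 Joint and Link Configuration
2.2.3 Suction Gripper Module
2.2.4 Deep Learning Computing Platform
2.3 Robot Operating System
Chapter 3 3D Keypoint Detection System
3.1 Overview of Object Recognition Methods
3.1.1 Evolution and Classification of Object Recognition
3.1.2 Convolutional Neural Networks
3.2 Automatic Data Augmentation System
3.2.1 Training Data
3.2.2 Data Generation System
3.3 Mask R-CNN
3.3.1 Residual Network
3.3.2 Feature Pyramid Network
3.3.3 Region Proposal Network
3.3.4 RoI Align
3.3.5 Head Architecture
3.4 Semantic Segmentation
3.4.1 Fully Convolutional Network
3.4.2 Upsampling
3.4.3 Atrous Convolution
3.5 Conditional Random Field
3.6 Object Keypoint Annotation
3.7 Keypoint Detection for Non-Shape Features
Chapter 4 Manipulation Planning for Object Reorientation
4.1 Object Pose Detection and Comparison
4.2 Manipulator Operation Planning
4.2.1 Representations of Rigid-Body Orientation in 3D Space
4.2.2 Camera Coordinate Transformation
4.2.3 Spatial Coordinate Frame Transformation
4.2.4 Linear Trajectory Planning in the Workspace
Chapter 5 Experimental Procedure and Results
5.1 Experimental Environment and Procedure
5.1.1 Experimental Environment Setup
5.1.2 Experimental Procedure
5.2 Image Recognition Results
5.2.1 Deep Neural Network Training Data
5.2.2 Comparison of Conditional Random Field Outputs
5.2.3 Part Recognition Output Results
5.2.4 Object Pose Recognition Results
5.3 Real-Robot Tests
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
List of Figures
Fig. 1.1 Service robots
Fig. 2.1 System flowchart of this thesis
Fig. 2.2 Architecture of the manipulator control system
Fig. 2.3 Range of motion of the 7-DOF manipulator
Fig. 2.4 Structure of the 7-DOF manipulator
Fig. 2.5 Link lengths of the 7-DOF manipulator
Fig. 2.6 Photograph of the suction gripper module
Fig. 2.7 Photograph of the suction gripper circuit
Fig. 2.8 ROS topic and server communication
Fig. 3.1 Four mainstream categories of deep learning methods for object recognition
Fig. 3.2 Architecture of a convolutional neural network
Fig. 3.3 Convolution operation [40]
Fig. 3.4 Feature map generation [40]
Fig. 3.5 Max pooling operation [40]
Fig. 3.6 Rectified linear unit [40]
Fig. 3.7 Zero-padding operation [40]
Fig. 3.8 Flattening the convolutional layers into the fully connected layer
Fig. 3.9 Annotation of Mask R-CNN training data
Fig. 3.10 Annotation of semantic segmentation training data
Fig. 3.11 Flowchart of the data generation system
Fig. 3.12 Overall architecture of Mask R-CNN
Fig. 3.13 Residual learning unit [43]
Fig. 3.14 Architectures of the residual network family [43]
Fig. 3.15 Differences between residual units [43]
Fig. 3.16 Feature pyramid network architecture [44]
Fig. 3.17 RPN workflow [45]
Fig. 3.18 Preview of RPN output [46]
Fig. 3.19 Preview of NMS output [46]
Fig. 3.20 Comparison of RoI Pooling and RoI Align [47]
Fig. 3.21 Final output of Mask R-CNN [46]
Fig. 3.22 Semantic segmentation based on a convolutional neural network [48]
Fig. 3.23 Output difference when converting fully connected layers to a fully convolutional network [38]
Fig. 3.24 Convolution and deconvolution operations
Fig. 3.25 Network architecture with skip connections [48]
Fig. 3.26 Comparison of outputs from different upsampling structures [48]
Fig. 3.27 Atrous convolution [50]
Fig. 3.28 Effect of atrous convolution [50]
Fig. 3.29 Conditional random field output over multiple iterations [50]
Fig. 3.30 Flowchart for computing the average normal vector of a plane from a point cloud
Fig. 3.31 FAST feature points [55]
Fig. 4.1 Object pose reorientation
Fig. 4.2 Flowchart of manipulator operation planning
Fig. 4.3 Axis-angle rotation
Fig. 4.4 Euler angles used in this thesis
Fig. 4.5 Camera mounting and its position relative to the manipulator
Fig. 4.6 Flowchart of spatial coordinate frame transformation
Fig. 4.7 Axis-angle representation conversion
Fig. 4.8 Linear interpolation between two vectors in a 2D plane
Fig. 5.1 Photograph of the experimental environment
Fig. 5.2 The three experimental objects
Fig. 5.3 Flowchart of the real-robot test
Fig. 5.4 Results of random object generation
Fig. 5.5 Comparison of outputs for different numbers of conditional random field iterations
Fig. 5.6 Output of the part recognition pipeline
Fig. 5.7 Output of the part recognition pipeline (random objects and poses)
Fig. 5.8 Keypoint detection results for the water bottle, upright and lying flat
Fig. 5.9 Computed planar unit vectors for the water bottle
Fig. 5.10 Storyboard of the real-robot test results
List of Tables
Table 2.1 Hardware specifications of the manipulator
Table 2.2 Specifications of each manipulator joint
Table 2.3 Specifications of the suction gripper module
Table 2.4 Hardware specifications of the personal computer
Table 2.5 Software used for deep neural network training
Table 3.1 Color labels corresponding to each part
Table 3.2 Parameters of the RPN loss function
Table 3.3 Region proposal filtering procedure
Table 5.1 Training data for the Mask R-CNN network
Table 5.2 Training data for the semantic segmentation network |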
References |
[1] “Development trends in Taiwan's AI robot technology applications and R&D networks.” URL: https://www.cier.edu.tw/site/cier/public/data/184-123-128-產業瞭望-魏聰哲.pdf.
[2] “Pepper emotional service robot.” URL: https://www.softbankrobotics.com/emea/en/pepper.
[3] “Preferred Networks room-tidying robot.” URL: https://robotstart.info/2018/10/15/pfn-tidyup.html.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 07–12, pp. 1–9, 2015.
[5] Y. Qian and P. C. Woodland, “Very deep convolutional neural networks for robust speech recognition,” IEEE Workshop on Spoken Language Technology, vol. 1, no. 16, pp. 481–488, 2017.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.
[7] R. Girshick, “Fast R-CNN,” IEEE International Conference on Computer Vision, pp. 1440–1448, 2015.
[8] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Neural Information Processing Systems, pp. 91–99, 2015.
[9] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
[11] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 187–213, 2017.
[12] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” European Conference on Computer Vision, pp. 21–37, 2016.
[14] P. Jiang, Y. Ishihara, N. Sugiyama, J. Oaki, S. Tokura, A. Sugahara, and A. Ogawa, “Depth image–based deep learning of grasp planning for textureless planar-faced objects in vision-guided robotic bin-picking,” Sensors (Switzerland), vol. 20, no. 3, 2020.
[15] C. M. Lin, C. Y. Tsai, Y. C. Lai, S. A. Li, and C. C. Wong, “Visual object recognition and pose estimation based on a deep semantic segmentation network,” IEEE Sensors Journal, vol. 18, no. 22, pp. 9370–9381, 2018.
[16] Y. Wu, Y. Fu, and S. Wang, “Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping,” Industrial Robot, vol. 47, no. 4, pp. 593–606, 2020.
[17] L. Manuelli, W. Gao, P. Florence, and R. Tedrake, “kPAM: KeyPoint affordances for category-level robotic manipulation,” arXiv preprint arXiv:1903.06684, 2019.
[18] X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei, “Integral human pose regression,” Computer Science, vol. 11210, pp. 536–553, 2018.
[19] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, “Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics,” Robotics: Science and Systems, pp. 58–72, 2017.
[20] D. Morrison, J. Leitner, and P. Corke, “Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach,” Robotics: Science and Systems, pp. 21–31, 2018.
[21] D. Guo, T. Kong, F. Sun, and H. Liu, “Object discovery and grasp detection with a shared convolutional neural network,” IEEE International Conference on Robotics and Automation, pp. 2038–2043, 2016.
[22] I. Lenz, H. Lee, and A. Saxena, “Deep learning for detecting robotic grasps,” The International Journal of Robotics Research, vol. 34, no. 5, pp. 705–724, 2015.
[23] J. Mahler, M. Matl, V. Satish, M. Danielczuk, B. DeRose, S. McKinley, and K. Goldberg, “Learning ambidextrous robot grasping policies,” Science Robotics, vol. 4, no. 26, 2019.
[24] T. Kulvicius, M. Biehl, M. J. Aein, M. Tamosiunaite, and F. Wörgötter, “Interaction learning for dynamic movement primitives used in cooperative robotic tasks,” Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1450–1459, 2013.
[25] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.
[26] H. B. Amor, O. Kroemer, U. Hillenbrand, G. Neumann, and J. Peters, “Generalization of human grasping for multi-fingered robot hands,” IEEE International Conference on Intelligent Robots and Systems, pp. 2043–2050, 2012.
[27] 黃宥竣, “Object grasping with a robot manipulator based on behavior cloning,” Master's thesis, Department of Electrical and Computer Engineering, Tamkang University (advisors: 翁慶昌, 蔡奇謚), 2018 (in Chinese).
[28] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[29] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” International Journal of Robotics Research, vol. 37, no. 4–5, pp. 421–436, 2018.
[30] K. Wada, K. Okada, and M. Inaba, “Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter,” International Conference on Robotics and Automation, pp. 9558–9564, 2019.
[31] W. Wan, H. Igawa, K. Harada, H. Onda, K. Nagata, and N. Yamanobe, “A regrasp planning component for object reorientation,” Autonomous Robots, vol. 43, no. 5, pp. 1101–1115, 2019.
[32] W. Wan, M. T. Mason, R. Fukui, and Y. Kuniyoshi, “Improving regrasp algorithms to analyze the utility of work surfaces in a workcell,” IEEE International Conference on Robotics and Automation, pp. 4326–4333, 2015.
[33] A. Nguyen, D. Kanoulas, D. G. Caldwell, and N. G. Tsagarakis, “Preparatory object reorientation for task-oriented grasping,” IEEE International Conference on Intelligent Robots and Systems, pp. 893–899, 2016.
[34] R. Newbury, K. He, A. Cosgun, and T. Drummond, “Learning to place objects onto flat surfaces in human-preferred orientations,” arXiv preprint arXiv:2004.00249, 2020.
[35] T. T. Do, A. Nguyen, and I. Reid, “AffordanceNet: An end-to-end deep learning approach for object affordance detection,” IEEE International Conference on Robotics and Automation, pp. 5882–5889, 2018.
[36] 賴宥澄, “Vision-based task-oriented grasping for tool manipulation by a dual-arm robot,” Ph.D. dissertation, Department of Electrical and Computer Engineering, Tamkang University (advisors: 翁慶昌, 蔡奇謚), 2020 (in Chinese).
[37] Z. Qin, K. Fang, Y. Zhu, F. Li, and S. Savarese, “KETO: Learning keypoint representations for tool manipulation,” arXiv preprint arXiv:1910.11977, 2019.
[38] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei, “Fully convolutional instance-aware semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 4438–4446, 2017.
[39] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-time instance segmentation,” IEEE International Conference on Computer Vision, pp. 9156–9165, 2019.
[40] “Deep Learning A-Z™: Hands-on artificial neural networks | Udemy.” URL: https://www.udemy.com/course/deeplearning/.
[41] H. Li, J. Wang, M. Tang, and X. Li, “Polarization-dependent effects of an Airy beam due to the spin-orbit coupling,” Journal of the Optical Society of America A: Optics and Image Science, and Vision, vol. 34, no. 7, pp. 1114–1118, 2017.
[42] Y. J. Huang, Y. C. Lai, R. J. Chen, C. Y. Tsai, and C. C. Wong, “A deep learning-based object detection algorithm applied in shelf-picking robot,” International Automatic Control Conference, pp. 1–13, 2017.
[43] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[44] Y. Liang, W. Changjian, L. Fangzhao, P. Yuxing, L. Qin, Y. Yuan, and H. Zhen, “TFPN: Twin feature pyramid networks for object detection,” International Conference on Tools with Artificial Intelligence, pp. 1702–1707, 2019.
[45] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[46] “Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.” URL: https://github.com/matterport/Mask_RCNN.
[47] “Image segmentation with Mask R-CNN - Jonathan Hui - Medium.” URL: https://medium.com/@jonathan_hui/image-segmentation-with-mask-r-cnn-ebe6d793272.
[48] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[49] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2014.
[50] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.
[51] R. Jonschkowski, C. Eppner, S. Höfer, R. Martín-Martín, and O. Brock, “Probabilistic multi-class segmentation for the Amazon picking challenge,” IEEE International Conference on Intelligent Robots and Systems, vol. 2016-Novem, no. ii, pp. 1–7, 2016.
[52] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” IEEE International Conference on Computer Vision, pp. 2564–2571, 2011.
[53] B. Pröll and H. Werthner, “Lecture notes in computer science: preface,” 2005.
[54] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[55] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” Lecture Notes in Computer Science, vol. 3951, pp. 430–443, 2006.
[56] 簡紹宇, “Self-collision avoidance and motion control of a dual-arm robot based on deep reinforcement learning,” Master's thesis, Department of Electrical and Computer Engineering, Tamkang University (advisors: 翁慶昌, 劉智誠), 2019 (in Chinese).
[57] 鄭期元, “Ground-plane motion trajectory estimation integrating GNSS and INS measurements,” Master's thesis, National Chiao Tung University (advisor: 胡竹生), 2013 (in Chinese). |
Full-text access rights |