§ Browse Thesis Bibliographic Record
System ID U0002-2908201903093400
DOI 10.6846/TKU.2019.00990
Thesis Title (Chinese) 基於深度姿態估測網路之無紋理模型拼裝規劃
Thesis Title (English) Textureless Model Assembling Planning Based on a Deep Pose Estimation Network
Thesis Title (Third Language)
Institution Tamkang University
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Foreign Degree School Name
Foreign Degree College Name
Foreign Degree Institute Name
Academic Year 107
Semester 2
Publication Year 108 (2019)
Graduate Student (Chinese) 高吾凱
Graduate Student (English) Wu-Kai Kao
Student ID 606460102
Degree Master's
Language Traditional Chinese
Second Language
Defense Date 2019-07-11
Number of Pages 49
Defense Committee Advisor - 翁慶昌 (wong@ee.tku.edu.tw)
Advisor - 蔡奇謚 (chiyi_tsai@mail.tku.edu.tw)
Committee Member - 許陳鑑 (jhsu@ntnu.edu.tw)
Committee Member - 劉智誠 (chihchengliu20120419@gmail.com)
Committee Member - 蔡奇謚 (chiyi_tsai@mail.tku.edu.tw)
Keywords (Chinese) 姿態估測
機器人
卷積神經網路
深度學習
機器視覺
系統整合
機器人作業系統
虛幻引擎
Keywords (English) Pose Estimation
Robot
CNN
Deep Learning
Robot Vision
System Integration
Robot Operating System (ROS)
Unreal Engine
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
This thesis proposes a deep pose estimation network for the textureless model assembly task. The goal is to assemble a set of six differently shaped, textureless aircraft segment models using a seven-axis robot manipulator developed under the Robot Operating System (ROS), so that randomly placed target objects are assembled in the correct relative configuration. The proposed deep pose estimation network first extracts image features from the input RGB image through a VGG network, and then performs object detection and pose estimation through multi-task convolutional layers. Because the target models are textureless objects, we found that the original VGG network could not achieve satisfactory detection performance. Therefore, to improve the efficiency of image feature extraction, this thesis modifies the VGG network to raise the detection rate for textureless objects. For network training, a supervised multi-task training method is applied to the whole network, in which different loss functions for different tasks are used to update the weights of the corresponding sub-networks, so that the deep convolutional neural network can predict the projection of the target's 3D bounding box onto the 2D image. With the output of this network model, the existing EPnP algorithm can be used to estimate the relative pose between the camera and the target object, allowing the manipulator to locate the 3D coordinates of the target object and accurately pick it up by suction to accomplish the model assembly task.
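The multi-task training described in the abstract (a shared feature extractor whose task heads are each updated with their own loss) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the layer sizes, the two heads, the loss functions, and the loss weights are placeholders and do not reproduce the thesis's actual network.

import torch
import torch.nn as nn

class PoseEstimationNet(nn.Module):
    # Shared backbone with one detection head and one corner-projection head.
    def __init__(self, num_corners=9):
        super().__init__()
        # Truncated VGG-style feature extractor (illustrative depth only).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Head 1: per-pixel object-detection belief map.
        self.detect_head = nn.Conv2d(128, 1, 1)
        # Head 2: belief maps for the projected 3D bounding-box corners
        # (8 corners plus the centroid = 9 channels).
        self.corner_head = nn.Conv2d(128, num_corners, 1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.detect_head(feat), self.corner_head(feat)

model = PoseEstimationNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
detect_loss_fn = nn.BCEWithLogitsLoss()  # loss for the detection task
corner_loss_fn = nn.MSELoss()            # loss for the corner-projection task

def train_step(image, detect_target, corner_target, w_detect=1.0, w_corner=1.0):
    # One supervised update: each task contributes its own loss term, and the
    # combined gradient updates both heads and the shared backbone.
    detect_pred, corner_pred = model(image)
    loss = (w_detect * detect_loss_fn(detect_pred, detect_target)
            + w_corner * corner_loss_fn(corner_pred, corner_target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy example: a 400x400 RGB image with targets at the 1/4-resolution feature map.
img = torch.randn(1, 3, 400, 400)
detect_t = torch.rand(1, 1, 100, 100)
corner_t = torch.rand(1, 9, 100, 100)
print(train_step(img, detect_t, corner_t))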
Abstract (English)
This thesis proposes a deep pose estimation network for the textureless model assembly task, which aims to assemble six textureless models of different shapes into a complete aircraft model. We use ROS as the development environment to integrate the proposed pose estimation network with the control system of a 7-DoF manipulator and perform the assembly task, in which the target objects are randomly placed in the workspace. The proposed pose estimation network first extracts feature maps from the input RGB image through a VGG network, and then performs object detection and pose estimation through multi-task convolutional layers. Since the target models are textureless objects, we found that the original VGG network cannot achieve the desired detection rate. Therefore, to improve the efficiency of image feature extraction, we modify the existing VGG network to raise the detection rate for textureless objects. For network training, a supervised multi-task training scheme is used, in which different loss functions for different tasks update the weights of the corresponding sub-networks, so that the deep convolutional neural network can predict the projection of the target's 3D bounding box onto the 2D image plane. With the output of the network model, the existing EPnP algorithm can be used to estimate the relative pose between the camera and the target object, so that the robot can locate the 3D coordinates of the target object and accurately pick it up to accomplish the model assembly task.
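The final stage described above, recovering the camera-object pose from the predicted 2D projections of the 3D bounding-box corners, can be sketched with OpenCV's EPnP solver (the solvePnP function with the SOLVEPNP_EPNP flag). The box dimensions, pixel coordinates, and camera intrinsics below are placeholder values for illustration, not numbers from the thesis.

import numpy as np
import cv2

# 3D bounding-box corners (plus centroid) of the target model in its own
# frame, in metres; here an assumed 10 x 6 x 4 cm box centred at the origin.
w, h, d = 0.10, 0.06, 0.04
object_points = np.array(
    [[sx * w / 2, sy * h / 2, sz * d / 2]
     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)] + [[0.0, 0.0, 0.0]],
    dtype=np.float64)

# 2D pixel coordinates of the same nine points as predicted by the network
# (placeholder values, listed in the same order as object_points).
image_points = np.array([
    [310, 215], [318, 290], [395, 210], [402, 286],
    [305, 230], [312, 305], [388, 226], [396, 301],
    [353, 258]], dtype=np.float64)

# Assumed pinhole intrinsics (fx, fy, cx, cy) and zero lens distortion.
camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros((5, 1))

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs,
                              flags=cv2.SOLVEPNP_EPNP)
if ok:
    R, _ = cv2.Rodrigues(rvec)      # object orientation in the camera frame
    print("R =\n", R)
    print("t =", tvec.ravel())      # object position (metres) in the camera frame

The resulting rotation and translation would still have to be transformed from the camera frame into the manipulator's base frame (the coordinate transformation of Section 3.4.2) before the suction tool can be positioned.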
Abstract (Third Language)
Table of Contents
Contents
Contents	I
List of Figures	IV
List of Tables	VI
Chapter 1	Introduction	1
1.1	Research Background	1
1.2	Research Objectives	2
1.3	Thesis Organization	3
Chapter 2	Experimental Platform and System	4
2.1	Preface	4
2.2	Introduction to the Robot Manipulator	4
2.2.1	Configuration and Structure of the Robot Manipulator	4
2.2.2	Modular Suction Device	7
2.3	Deep Learning Platforms	9
Chapter 3	Vision-Based Assembly System Architecture and Assembly Planning	11
3.1	Preface	11
3.2	Robot Operating System	12
3.3	Motion Sequence	13
3.4	Assembly Planning and Strategy	14
3.4.1	Object Selection	14
3.4.2	Coordinate Transformation	15
3.4.3	Path Planning	17
3.4.4	Fine-Tuning of the End-Effector Tool Pose	18
Chapter 4	Design and Implementation of the Deep Pose Estimation Network	19
4.1	Preface	19
4.2	Unreal Engine	20
4.2.1	NDDS	24
4.2.2	Fully Automatic Training Set Generation	25
4.2.3	Training Set	28
4.3	Network Architecture	29
4.4	EPnP Position and Pose Estimation	33
Chapter 5	Experimental Procedure and Results	36
5.1	Training Results	36
5.2.1	Experiment 1: Validating Training Set Size and the Feature Extraction Architecture	37
5.2.2	Experiment 2: Training with Multiple Objects	39
5.2	Assembly Planning Experiment	43
Chapter 6	Conclusions and Future Work	45
References	46

 
List of Figures
Figure 2.1	Structural diagram of the 7-DoF redundant robot manipulator	5
Figure 2.2	Structural dimensions of the 7-DoF redundant robot manipulator	5
Figure 2.3	Workspace of the 7-DoF redundant robot manipulator	7
Figure 2.4	Modular suction device	9
Figure 3.1	System block diagram	11
Figure 3.2	Illustration of the ROS distributed architecture	13
Figure 3.3	Flowchart of assembly planning and strategy	14
Figure 3.4	Illustration of object selection	15
Figure 3.5	Camera setup and manipulator configuration	15
Figure 3.6	Illustration of end-effector tool adjustment	18
Figure 4.1	System architecture of the deep pose estimation network	20
Figure 4.2	Unreal Engine user interface	21
Figure 4.3	Simulated models of the physical objects and the virtual scene	22
Figure 4.4	Placement of the physical object models	23
Figure 4.5	Blueprint visual script	23
Figure 4.6	Contents of the generated data	25
Figure 4.7	Screenshot of domain-randomized generation	26
Figure 4.8	Illustration of the circular-orbit script	27
Figure 4.9	Screenshot of random free-fall generation	27
Figure 4.10	DOPE network architecture	29
Figure 4.11	DOPE LITE network architecture	30
Figure 4.12	Illustration of control point prediction	32
Figure 4.13	Control point prediction network design	33
Figure 4.14	Illustration of the P3P problem	34
Figure 4.15	Illustration of representing an arbitrary 3D coordinate	34
Figure 4.16	Illustration of feature point filtering	35
Figure 5.1	Target objects and test environment	37
Figure 5.2	Dimensions of the nose model	38
Figure 5.3	Dimensions of the fuselage model	40
Figure 5.4	Dimensions of the fuselage model	40
Figure 5.5	Performance of the trained weights for each model	41
Figure 5.6	Demonstration of assembly planning	43
Figure 5.7	Monitoring view of the projected coordinates	44
 
List of Tables
Table 2.1	Motor hardware specifications of the dual-arm robot's manipulators	6
Table 2.2	Air compressor specifications	8
Table 2.3	Specifications of the training computing platform	10
Table 4.1	Training set contents	28
Table 4.2	Layers of the feature extraction architecture	31
Table 5.1	Experimental results of the training sets on DOPE and DOPE LITE	39
References
[1]	A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Neural Information Processing Systems, vol. 1, pp. 1097-1105, 2012.
[2]	Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[3]	C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[4]	Y. Qian and P. C. Woodland, “Deep convolutional neural networks for robust speech recognition,” IEEE Spoken Language Technology Workshop, pp. 481-488, 2016.
[5]	J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[6]	W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” European Conference on Computer Vision, pp. 21-37, 2016.
[7]	S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Conference and Workshop on Neural Information Processing Systems, pp. 91-99, 2015.
[8]	R. Girshick, “Fast R-CNN,” International Conference on Computer Vision, pp. 1440-1448, 2015.
[9]	J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.

[10]	V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 2481-2495, 2017.
[11]	O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, 2015.
[12]	K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
[13]	L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” European Conference on Computer Vision, pp. 801-818, 2018.
[14]	J. Mahler, F. Pokorny, B. Hou, M. Roderick, M. Laskey, M. Aubry, K. Kohlhoff, T. Kroeger, J. Kuffner, and K. Goldberg, “Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a multi-armed bandit model with correlated rewards,” IEEE International Conference on Robotics and Automation, pp. 1957-1964, 2016.
[15]	J. Mahler, M. Matl, V. Satish, M. Danielczuk, B. DeRose, S. McKinley, and K. Goldberg, “Learning ambidextrous robot grasping policies,” Science Robotics, vol. 4, no. 26, 2019.
[16]	J. Tremblay, T. To, B. Sundaralingam, Y. Xiang, D. Fox, and S. Birchfield, “Deep object pose estimation for semantic robotic grasping of household objects,” Conference on Robot Learning, pp. 1097-1105, 2018.
[17]	Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,” Robotics: Science and Systems, 2018.
[18]	W. Kehl, F. Manhardt, F. Tombari, S. Ilic, and N. Navab, “SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again,” IEEE International Conference on Computer Vision, pp. 1521-1529, 2017.
[19]	C. Choi, Y. Taguchi, O. Tuzel, M.-Y. Liu, and S. Ramalingam, “Voting-based pose estimation for robotic assembly using a 3D sensor,” IEEE International Conference on Robotics and Automation, pp. 1724-1731, 2012.
[20]	H. Tjaden, U. Schwanecke, and E. Schömer, “Real-time monocular segmentation and pose tracking of multiple objects,” European Conference on Computer Vision, pp. 423-438, 2016.
[21]	A. Krull, E. Brachmann, F. Michel, M. Y. Yang, S. Gumhold, and C. Rother, “Learning analysis-by-synthesis for 6D pose estimation in RGB-D images,” IEEE International Conference on Computer Vision, pp. 954-962, 2015.
[22]	林建銘, 基於深度學習之語意分割用於隨機物件夾取 (Deep learning-based semantic segmentation for random object picking), Master's thesis, Department of Electrical Engineering, Tamkang University (Advisor: 翁慶昌), 2016.
[23]	J. Tremblay, T. To, and S. Birchfield, “Falling things: A synthetic dataset for 3D object detection and pose estimation,” IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2119-2122, 2018.
[24]	V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, vol. 81, no. 2, pp. 155-166, 2009.
[25]	蕭聖儒, 基於ROS與SOPC之人形機器人的行走速度規劃 (Walking speed planning for a humanoid robot based on ROS and SOPC), Master's thesis, Department of Electrical Engineering, Tamkang University (Advisor: 翁慶昌), 2016.
[26]	P. Allgeuer, M. Schwarz, J. Pastrana, S. Schueller, M. Missura, and S. Behnke, “A ROS-based software framework for the NimbRo-OP humanoid open platform,” International Conference on Humanoid Robots, 2013.
[27]	“ROS on the HUBO Humanoid Robot.” URL: http://wiki.ros.org/Robots/HUBO
[28]	I. Ha, Y. Tamura, H. Asama, J. Han, and D. W. Hong, “Development of open humanoid platform DARwIn-OP,” SICE Annual Conference, 2011.
[29]	Y. Li, G. Wang, X. Ji, Y. Xiang, and D. Fox, “DeepIM: Deep iterative matching for 6D pose estimation,” European Conference on Computer Vision, pp. 695-711, 2018.
[30]	C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese, “DenseFusion: 6D object pose estimation by iterative dense fusion,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 3343-3352, 2019.
[31]	M. Rad and V. Lepetit, “BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth,” IEEE International Conference on Computer Vision, pp. 3848-3856, 2017.
[32]	S.-E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, “Convolutional pose machines,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724-4732, 2016.
Full-Text Access Rights
On campus
The printed thesis will be made publicly available 5 years after submission of the authorization form.
The author agrees to make the full-text electronic thesis publicly available on campus.
The on-campus electronic thesis will be made publicly available 5 years after submission of the authorization form.
Off campus
Authorization granted
The off-campus electronic thesis will be made publicly available 5 years after submission of the authorization form.
