§ Thesis Bibliographic Record
  
System ID U0002-0703201811263900
DOI 10.6846/TKU.2018.00222
Title (Chinese) 基於深度學習之語義分割用於隨機物件夾取
Title (English) Semantic Segmentation for Random Object Picking Based on Deep Learning
Title (Third Language)
University Tamkang University
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Foreign Degree University
Foreign Degree College
Foreign Degree Institute
Academic Year 106
Semester 1
Publication Year 107 (2018)
Author (Chinese) 林建銘
Author (English) Chien-Ming Lin
Student ID 604470129
Degree Master's
Language English
Second Language
Oral Defense Date 2018-01-15
Pages 53
Committee Advisor - 李世安 (lishyhan@ee.tku.edu.tw)
Advisor - 蔡奇謚 (chiyi_tsai@mail.tku.edu.tw)
Committee Member - 夏至賢 (chhsia625@gmail.com)
Committee Member - 翁慶昌 (wong@ee.tku.edu.tw)
Keywords (Chinese) 深度學習
卷積類神經網路
語義分割
點雲
姿態估測
Keywords (English) Deep Learning
Convolutional Neural Networks
Semantic Segmentation
Point Cloud
Pose Estimation
Keywords (Third Language)
Subject Classification
Chinese Abstract
This thesis makes three main contributions. (1) Training a deep neural network requires a large amount of training data, which is usually labeled slowly by hand at great cost in time and labor. This thesis therefore designs an automatic labeled-data generation method that uses the GrabCut and K-means algorithms to label data automatically. With the proposed method, only a small amount of raw data is needed to produce sufficient training data, and the deep network trained on it achieves good semantic segmentation results. (2) This thesis uses a deep neural network and point cloud images to perform random object picking in cluttered scenes. We first apply a convolutional neural network for semantic segmentation, use the segmentation result to extract the target object's point cloud from the scene, and finally compute the grasp pose using FPFH features and the RANSAC algorithm. (3) For the problem of kinematic singularities, this thesis uses an efficient method to detect and avoid singularities. Experimental results show that the proposed method can successfully recognize objects and estimate grasp poses, and a 7-axis robot manipulator is used to perform random object picking tasks.
English Abstract
In this thesis, we make three main contributions. (1) An automatic data generation scheme is proposed to produce training data for a deep neural network. Training such a network requires a large amount of labeled data, which is usually produced by hand at great cost in time and labor. In the proposed scheme, we use the GrabCut and K-means algorithms to label the images. With this scheme, we can efficiently produce 7,600 labeled data sets from only 200 original ones. (2) A deep-learning and point-cloud based workflow is proposed to solve the random object picking task in cluttered environments. We first build a deep Convolutional Neural Network (CNN) to compute the pixel-wise object probabilities in the RGB image. Next, we use these probabilities to extract the target object's point cloud from the scene, and apply Fast Point Feature Histogram (FPFH) features and the RANSAC algorithm to compute the 3D grasp pose of the object. (3) An efficient method is used to detect and avoid kinematic singularities. Finally, we successfully use a 7-degree-of-freedom manipulator to accomplish random object picking tasks in a cluttered environment.
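The labeling idea in contribution (1) can be made concrete with a short sketch. The Python fragment below uses OpenCV's grabCut and kmeans, which the abstract names; the bounding box, cluster count, file names, and the step of clustering foreground colors are illustrative assumptions rather than the thesis's exact procedure.

```python
# Hypothetical auto-labeling sketch: GrabCut extracts the object inside a
# bounding box, then K-means clusters the foreground colors into a label map.
import cv2
import numpy as np

def auto_label(image, rect, n_clusters=2):
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GrabCut's internal GMM state
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_RECT)
    # Keep definite and probable foreground pixels as the object mask.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                  1, 0).astype(np.uint8)
    # Cluster the foreground colors (an assumed refinement step).
    pixels = image[fg == 1].astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, _ = cv2.kmeans(pixels, n_clusters, None, criteria, 10,
                              cv2.KMEANS_RANDOM_CENTERS)
    return fg, labels

if __name__ == "__main__":
    img = cv2.imread("object.png")                    # hypothetical input file
    fg_mask, _ = auto_label(img, (50, 50, 200, 200))  # assumed bounding box
    cv2.imwrite("label.png", fg_mask * 255)
```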
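For contribution (2), the segmentation step computes pixel-wise object probabilities with a CNN. The sketch below uses a pretrained DeepLabV3 model from torchvision as a stand-in for the thesis's own DeepLab-based network [7]; the weights, class set, and input file name are assumptions made only to illustrate the inference step.

```python
# Hypothetical semantic-segmentation inference: a pretrained DeepLabV3 model
# produces per-pixel class probabilities, from which a label mask is taken.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("scene.png").convert("RGB")             # hypothetical input
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))["out"]  # 1 x C x H x W
probs = logits.softmax(dim=1)  # pixel-wise class probabilities
mask = probs.argmax(dim=1)     # per-pixel labels used to cut out the point cloud
```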
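The grasp-pose step then aligns the object model to the segmented scene cloud. A minimal sketch, assuming the Open3D library (version 0.12 or later) in place of the thesis's own implementation: voxel downsampling, FPFH features, and RANSAC feature matching yield a 4x4 transform from which a grasp pose can be derived. The voxel size, search radii, and file names are illustrative assumptions.

```python
# Hypothetical FPFH + RANSAC alignment sketch using Open3D.
import open3d as o3d

def preprocess(pcd, voxel=0.005):
    # Downsample, then estimate normals and FPFH descriptors on the result.
    down = pcd.voxel_down_sample(voxel)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
    return down, fpfh

model, model_fpfh = preprocess(o3d.io.read_point_cloud("model.pcd"))  # object model
scene, scene_fpfh = preprocess(o3d.io.read_point_cloud("scene.pcd"))  # segmented cloud

# RANSAC over FPFH correspondences estimates the model-to-scene transform.
result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
    model, scene, model_fpfh, scene_fpfh,
    mutual_filter=True, max_correspondence_distance=0.015,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
    ransac_n=3)
print(result.transformation)  # 4x4 homogeneous transform for the grasp pose
```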
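Contribution (3) concerns kinematic singularities. The record does not spell out the thesis's detection method, so the fragment below shows only a common textbook criterion: near a singularity the manipulator Jacobian loses rank, so its smallest singular value approaches zero. The 6x7 Jacobian and the threshold are placeholders.

```python
# Generic singularity check via the Jacobian's smallest singular value.
import numpy as np

def near_singularity(jacobian, threshold=1e-2):
    """Return True when the arm is close to a kinematic singularity."""
    sigma_min = np.linalg.svd(jacobian, compute_uv=False).min()
    return sigma_min < threshold

J = np.random.rand(6, 7)  # placeholder; a real Jacobian comes from the arm's kinematics
print(near_singularity(J))
```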
Third-Language Abstract
Table of Contents
Chinese Abstract I
English Abstract II
Table of Contents III
List of Figures V
List of Tables VI
1 Introduction 1
1.1 Background 1
1.2 Motivation 1
1.3 Organization 2
2 Related Works and System Structure 3
2.1 Related Works 3
2.2 Problem Statement 4
2.3 System Structure 5
3 Automatic Data Generation Scheme 7
3.1 Training Data 7
3.2 Semi-Automatic Data Generator 8
3.3 Fully-Automatic Data Generator 11
4 Object Detection and Classification 15
4.1 Convolutional Neural Networks (CNNs) 15
4.2 Semantic Segmentation 18
4.3 Conditional Random Fields 20
5 Grasping Pose Estimation 23
5.1 Downsampling 24
5.2 Feature Detection 25
5.3 Grasping Pose Estimation 28
6 Robot Arm Manipulation 30
6.1 Transformation 30
6.2 Singularity Detection and Avoidance 32
7 Experimental Results 35
7.1 Hardware 35
7.2 The Accuracy of Object Segment Method in Fully-automatic Data Generator 38
7.3 Result of Grasping Pose Estimation 41
7.4 Performance of Manipulator Control with Singularity Detection and Avoidance 45
8 Conclusion and Future Works 47
References 48
Appendix I 52

List of Figures
2.1 Flowchart of the proposed system 6
3.1 The training data in this work 8
3.2 Example of the GrabCut algorithm: (a) the input image; (b) the labels marking foreground and background; (c) the GrabCut result 9
3.3 Example of undirected graph 10
3.4 The procedure to generate bounding box automatically 11
3.5 The result of GrabCut algorithm with only bounding box 12
3.6 The procedure of fully automatic data generating scheme 12
3.7 The Procedure of Data Augmentation 14
4.1 The procedure of object detection 15
4.2 Convolutional neural networks 16
4.3 Example of convolutional layer 17
4.4 Result of semantic segmentation 18
4.5 Example of atrous convolution 20
4.6 Performance of atrous convolution 20
4.7 The result of semantic segmentation 21
4.8 The performance of CRFs 22
5.1 Pipeline of Object Estimation 24
5.2 The point cloud model before (left) and after (right) input to the voxel filter 25
5.3 The example of PFH 26
5.4 The example of FPFH 27
6.1 The procedure of Robot Arm Manipulation 30
6.2 Examples of the articulated arm at different singularities 32
7.1 Self-made 7-dof manipulator 36
7.2 2-dof pan-tilt platform 36
7.3 Intel RealSense SR300 37
7.4 Performance of CNNs in Semantic Segmentation 40
7.5 The input image and output result 40
7.6 The experimental environment and result 41
7.7 The comparison of trajectory with and without singularity avoidance 46
7.8 The comparison of errors between control with and without singularity avoidance 46

List of Tables
3.1 The color label of each corresponding object 7
7.1 The comparison of the methods 39
7.2 The estimation result of rotating the y-axis of the pan-tilt platform 42
7.3 The estimation result of rotating the z-axis of the pan-tilt platform 43
7.4 The mean error of pose estimation 44
7.5 The position of start point and goal point 44
7.6 The result without singularity avoidance 45
7.7 The result with singularity avoidance 45
References
[1] IEEE news: Aussies win the Amazon Robotics Challenge. URL: https://goo.gl/cgd4iH
[2] C. Choi, Y. Taguchi, O. Tuzel, M.Y. Liu, and S. Ramalingam, “Voting-based pose estimation for robotic assembly using a 3D sensor,” IEEE International Conference on Robotics and Automation (ICRA), pp. 1724-1731, 2012.
[3] Y.J. Huang, Y.C. Lai, R.J. Chen, C.Y. Tsai, and C.C. Wong, “A Deep Learning-Based Object Detection Algorithm Applied in Shelf-Picking Robot,” International Automatic Control Conference (CACS), 2017.
[4] C. Hernandez, M. Bharatheesha, W. Ko, H. Gaiser, J. Tan, K. van Deurzen, M. de Vries, B.V. Mil, J.V. Egmond, R. Burger, M. Morariu, J. Ju, X. Gerrmann, R. Ensing, J. van Frankenhuyzen, and M. Wisse, “Team Delft's Robot Winner of the Amazon Picking Challenge 2016,” arXiv, 2016.
[5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Advances in Neural Information Processing Systems, pp. 91-99, 2015.
[6] R. Jonschkowski, C. Eppner, S. Hofer, R.M. Martin, and O. Brock, “Probabilistic Multi-Class Segmentation for the Amazon Picking Challenge,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1-7, 2016.
[7] L.C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” arXiv preprint arXiv:1606.00915, 2017.
[8] M. Zhu, K.G. Derpanis, Y. Yang, S. Brahmbhatt, M. Zhang, C. Phillips, M. Lecce, and K. Daniilidis, “Single Image 3D Object Detection and Pose Estimation for Grasping,” IEEE International Conference on Robotics and Automation (ICRA), pp. 3936-3943, 2014.
[9] C.H. Wu, S.Y. Jiang, and K.T. Song, “CAD-Based Pose Estimation for Random Bin-Picking of Multiple Objects,” International Conference on Control, Automation and Systems (ICCAS), pp. 1645-1649, 2015.
[10] A. Zeng, K.T. Yu, S. Song, D. Suo, E. Walker Jr., A. Rodriguez, and J. Xiao, “Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge,” IEEE International Conference on Robotics and Automation (ICRA), pp. 1386-1393, 2017.
[11] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431-3440, 2015.
[12] J.M. Wong, V. Kee, T. Le, S. Wagner, and G.L. Mariottini, “SegICP: Integrated Deep Semantic Segmentation and Pose Estimation,” arXiv, 2017
[13] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 12, 2017
[14] 蔡述翔, “Design and Implementation of a 3D Object Recognition and Pose Estimation System Based on an RGB-D Camera,” Master's thesis, Department of Electrical Engineering, Tamkang University (Advisor: Dr. Chi-Yi Tsai), 2016.
[15] E. Matsumoto, M. Saito, A. Kume, and J. Tan, “End-to-End Learning of Object Grasp Poses in the Amazon Robotics Challenge,”
[16] A. Torralba, B. C. Russell, and J. Yuen, “LabelMe: online image annotation and applications,” Proceedings of the IEEE, vol. 98, no. 8, pp. 1467–1484, 2010. 
[17] X. Giro-i-Nieto, N. Camps, and F. Marques, “GAT: a graphical annotation tool for semantic regions,” Multimedia Tools and Applications, vol. 46, no. 2-3, pp. 155–174, 2010. 
[18] C. Saathoff, S. Schenk, and A. Scherp, “Kat: the k-space annotation tool,” in Proceedings of the International Conference on Semantic and Digital Media Technologies, Koblenz, Germany, December 2008
[19] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive Foreground Extraction Using Iterated Graph Cuts,” ACM Transactions on Graphics (TOG), pp. 309-314, 2004.
[20] M. Tang, L. Gorelick, O. Veksler, and Y. Boykov, “GrabCut in One Cut,” Proceedings of the International Conference on Computer Vision (ICCV), pp. 1769-1776, 2013.
[21] M. Tang, L. Gorelick, O. Veksler, and Y. Boykov, “Secrets of GrabCut and Kernel K-means,” Proceedings of the International Conference on Computer Vision (ICCV), pp. 1555-1563, 2015.
[22] Performing Convolution Operations. URL: https://goo.gl/6A3XJs
[23] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” arXiv preprint arXiv:1612.08242, 2016.
[24] R.B. Rusu, N. Blodow, Z.C. Marton, and M. Beetz, “Aligning point cloud views using persistent feature histograms,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3384-3391, 2008.
[25] R.B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (FPFH) for 3D registration,” IEEE International Conference on Robotics and Automation (ICRA), pp. 3212-3217, 2009.
[26] A.G. Buch, D. Kraft, and J.K. Kamarainen, “Pose Estimation using Local Structure-Specific Shape and Appearance Context,” IEEE International Conference on Robotics and Automation (ICRA), pp. 2080-2087, 2013.
[27] L.C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille, “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs,” International Conference on Learning Representations (ICLR), 2015.
[28] A. Zeng et al., “Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching,” arXiv, 2017.
[29] R.B. Rusu, Z.C. Marton, N. Blodow, and M. Beetz, “Persistent point feature histograms for 3D point clouds,” International Conference on Intelligent Autonomous Systems, pp. 119-128, 2008.
[30] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, “A Review on Deep Learning Techniques Applied to Semantic Segmentation,” arXiv, 2017.
[31] 賴宥澄, “System Development and Motion Control Design of a Seven-Degree-of-Freedom Redundant Robot Manipulator,” Master's thesis, Department of Electrical Engineering, Tamkang University (Advisor: Dr. Ching-Chang Wong), 2016.
Full-Text Access Rights
On campus
The print thesis will be made publicly available 5 years after submission of the authorization form.
The author agrees to make the full electronic text publicly available on campus.
The on-campus electronic thesis will be made publicly available 5 years after submission of the authorization form.
Off campus
Authorization granted.
The off-campus electronic thesis will be made publicly available 5 years after submission of the authorization form.
