§ Thesis Bibliographic Record
  
System ID U0002-1809201915514500
DOI 10.6846/TKU.2019.00561
Title (Chinese) 基於深度學習之深度估測與人車語意分割融合方法的實現及其在道路環境三維重建之應用
Title (English) Implementation of a Deep Learning Based Depth Estimation and Pedestrian-Vehicle Semantic Segmentation Fusion Method and Its Applications to 3D Reconstruction of Road Environment
Institution Tamkang University
Department (Chinese) 電機工程學系機器人工程碩士班
Department (English) Master's Program in Robotics Engineering, Department of Electrical and Computer Engineering
Academic year 107 (2018-2019)
Semester 2
Year of publication 108 (2019)
Author (Chinese) 魏華廷
Author (English) Hua-Ting Wei
Student ID 604470343
Degree Master
Language Traditional Chinese
Date of oral defense 2019-07-23
Number of pages 42
Committee Advisor - 蔡奇謚
Member - 周永山
Member - 王偉彥
Keywords (Chinese) 自動駕駛 (autonomous driving)
深度學習 (deep learning)
深度估測 (depth estimation)
語意分割 (semantic segmentation)
三維重建 (3D reconstruction)
Keywords (English) Autonomous driving
Deep learning
Depth estimation
Semantic segmentation
3D reconstruction
Abstract (Chinese)
In the development of self-driving vehicle systems, perception of the vehicle's surrounding environment is one of the key core technologies: the system must measure the spatial positions of obstacles and also determine their types, so that the vehicle truly understands its surroundings and can perform safe obstacle avoidance. To achieve such environment perception, most current self-driving systems integrate LiDAR and radar data to obtain the spatial positions of surrounding objects, but this approach is expensive and cannot identify object categories such as pedestrians or vehicles. This thesis therefore proposes a deep learning based visual depth estimation and semantic segmentation system that performs depth estimation and pedestrian/vehicle segmentation on images from a monocular camera, and fuses the two results to obtain the 3D positions of nearby pedestrians and vehicles, so that the self-driving vehicle can adopt a correct avoidance strategy within a safe range. The proposed method not only assists a self-driving system in estimating the depth of, and recognizing, objects on the road, but also reduces the cost of the sensing system. Because Taiwanese road data were needed as training material for the depth estimation network, a ZED stereo camera was used to record roads in Taiwan and build the training and test datasets. The semantic segmentation network was trained on the Cityscapes dataset using only the pedestrian and car labels. After training and testing the deep network models on these data, the proposed system successfully recognizes pedestrians and vehicles on the road and reconstructs their 3D positions relative to the camera.
Abstract (English)
In the field of self-driving vehicle system development, sensing the vehicle's surrounding environment is one of the key core technologies. It must not only measure the spatial position of each obstacle but also recognize its type; only then can the self-driving vehicle understand its surroundings and execute safe obstacle avoidance behavior. To achieve road environment perception, most modern self-driving systems fuse LiDAR and radar information to obtain the spatial positions of surrounding objects. However, this approach is not only expensive but also cannot recognize object categories such as pedestrians or vehicles. This thesis proposes a deep learning based depth estimation and semantic segmentation system that uses a single camera image to perform depth estimation and pedestrian-vehicle semantic segmentation, and fuses the two results to reconstruct 3D information about the pedestrians and vehicles around the self-driving vehicle. The proposed method not only helps the self-driving system estimate the distance and type of objects on the road, but also reduces the cost of the sensing system. In addition, because a Taiwanese on-road scene dataset was needed to train the depth estimation network, we used a ZED stereo camera to collect training and testing datasets on roads in Taiwan. The semantic segmentation network was trained on the Cityscapes dataset, using only the pedestrian and car labels. After training and testing the deep neural network models on our own dataset, the proposed system successfully identifies pedestrians and vehicles on the road and reconstructs the 3D information of these objects relative to the camera.
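The fusion step both abstracts describe (detailed in Sections 4.1-4.3 of the thesis) masks the estimated depth map with the pedestrian/vehicle segmentation result and back-projects the remaining pixels into camera-frame 3D coordinates through the pinhole model, X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. The sketch below illustrates that step only; it is not the thesis's implementation. The intrinsics FX, FY, CX, CY and the label ids PERSON and CAR are placeholder values, and the depth map is assumed to already be metric (in the thesis, depth follows from the estimated disparity d as Z = f * B / d, where f and B are the focal length and baseline of the ZED rig).

import numpy as np

# Hypothetical intrinsics; real values come from the ZED calibration
# (cf. Table 4.1), not from this sketch.
FX, FY = 700.0, 700.0   # focal lengths in pixels (assumed)
CX, CY = 640.0, 360.0   # principal point (assumed 1280x720 frames)

# Illustrative class ids; the thesis keeps only the Cityscapes person
# and car labels, but these numeric ids are placeholders.
PERSON, CAR = 1, 2

def fuse_to_point_cloud(depth, seg):
    """Back-project pedestrian/car pixels into 3D camera coordinates.

    depth: (H, W) array of metric depth Z per pixel.
    seg:   (H, W) integer array of per-pixel class labels.
    Returns an (N, 4) array of [X, Y, Z, label] rows, one per kept pixel.
    """
    # Keep only pixels the segmentation network labels person or car.
    vs, us = np.nonzero((seg == PERSON) | (seg == CAR))
    z = depth[vs, us]
    keep = z > 0                       # discard pixels with no valid depth
    us, vs, z = us[keep], vs[keep], z[keep]
    # Pinhole back-projection into the camera frame.
    x = (us - CX) * z / FX
    y = (vs - CY) * z / FY
    return np.column_stack([x, y, z, seg[vs, us]])

The resulting [X, Y, Z, label] rows correspond to the labeled point cloud, relative to the camera, that Figure 4.5 visualizes.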
Table of Contents
Chinese Abstract
English Abstract
Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Research background
  1.2  Research motivation and objectives
  1.3  Thesis organization
Chapter 2  Related Work
  2.1  Deep learning based depth estimation from stereo images
  2.2  Deep learning based semantic segmentation
  2.3  3D reconstruction
  2.4  Literature summary
Chapter 3  Disparity Estimation and Semantic Segmentation
  3.1  Disparity estimation via image reconstruction
  3.2  Disparity estimation network
  3.3  Training losses
  3.4  Open problems in semantic segmentation
  3.5  PSPNet
Chapter 4  Fusion of Disparity and Semantic Segmentation
  4.1  Pedestrian and vehicle segmentation
  4.2  Depth estimation
  4.3  3D reconstruction
Chapter 5  Experimental Results and Analysis
  5.1  Hardware and software
  5.2  Training data
  5.3  Test results
  5.4  Average processing speed
Chapter 6  Conclusions and Future Work
References

List of Figures
Figure 1.1  Thesis organization
Figure 2.1  Depth estimation from multiple images
Figure 3.1  Loss module
Figure 3.2  Loss architecture
Figure 3.3  monodepth fully convolutional network architecture
Figure 3.4  Pyramid pooling network
Figure 4.1  Pedestrian and vehicle semantic segmentation
Figure 4.2  Camera intrinsic parameters
Figure 4.3  Depth estimation of segmented pedestrians and vehicles
Figure 4.4  Relationship between world coordinates and corresponding image points
Figure 4.5  3D point cloud
Figure 5.1  ZED stereo camera
Figure 5.2  Depth estimation without semantic segmentation
Figure 5.3  Depth estimation with semantic segmentation
Figure 5.4  Roads on the Tamkang University campus


List of Tables
Table 4.1  ZED stereo camera resolution and focal length
Table 5.1  ZED stereo camera technical specifications
Table 5.2  Average error over the test range
Table 5.3  Average processing speed
References
[1] R. Memisevic and C. Conrad, "Stereopsis via deep learning," in NIPS Workshop on Deep Learning, vol. 1, 2011.
[2] F. H. Sinz, J. Q. Candela, G. H. Bakır, C. E. Rasmussen, and M. O. Franz, "Learning depth from stereo," in Pattern Recognition, pp. 245-252, Springer, 2004.
[3] R. Szeliski, "Structure from motion," in Computer Vision, Texts in Computer Science, pp. 303-334, Springer London, 2011.
[4] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah, "Shape-from-shading: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 8, pp. 690-706, 1999.
[5] S. Suwajanakorn and C. Hernandez, "Depth from focus with your mobile phone," in CVPR, 2015.
[6] D. Xu, E. Ricci, W. Ouyang, X. Wang, and N. Sebe, "Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation," in CVPR, pp. 161-169, 2017.
[7] J. Xie, R. Girshick, and A. Farhadi, "Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks," in ECCV, 2016.
[8] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in CVPR, pp. 6602-6611, 2017.
[9] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, pp. 1106-1114, 2012.
[10] Y. Furukawa and C. Hernández, "Multi-view stereo: a tutorial," Foundations and Trends in Computer Graphics and Vision, 2015.
[11] R. Ranftl, V. Vineet, Q. Chen, and V. Koltun, "Dense monocular depth estimation in complex dynamic scenes," in CVPR, 2016.
[12] A. Abrams, C. Hawley, and R. Pless, "Heliometric stereo: shape from sun position," in ECCV, 2012.
[13] L. Ladicky, C. Häne, and M. Pollefeys, "Learning the matching function," arXiv preprint arXiv:1502.00652, 2015.
[14] W. Luo, A. Schwing, and R. Urtasun, "Efficient deep learning for stereo matching," in CVPR, 2016.
[15] D. Eigen and R. Fergus, "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture," in ICCV, 2015.
[16] R. Garg, V. Kumar BG, and I. Reid, "Unsupervised CNN for single view depth estimation: geometry to the rescue," in ECCV, 2016.
[17] J. Flynn, I. Neulander, J. Philbin, and N. Snavely, "DeepStereo: learning to predict new views from the world's imagery," in CVPR, 2016.
[18] J. Xie, R. Girshick, and A. Farhadi, "Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks," in ECCV, 2016.
[19] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," in NIPS, 2015.
[20] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR, 2015.
[21] R. Kopper, F. Bacim, and D. A. Bowman, "Rapid and accurate 3D selection by progressive refinement," 2011.
[22] PASCAL VOC dataset: http://host.robots.ox.ac.uk/pascal/VOC/
[23] COCO dataset: http://cocodataset.org/#home
[24] Cityscapes dataset: https://www.cityscapes-dataset.com/
[25] P. Heise, S. Klose, B. Jensen, and A. Knoll, "PM-Huber: PatchMatch with Huber regularization for stereo matching," in ICCV, 2013.
[26] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," arXiv preprint arXiv:1612.01105, 2016.
Full-Text Access Permissions
On campus
Print thesis available on campus immediately
Electronic full text authorized for on-campus release
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately
