§ Browse Thesis Bibliographic Record
  
System ID U0002-1706202115205100
DOI 10.6846/TKU.2021.00372
Title (Chinese) 卷積神經網絡的資料集設計對工地人員姿勢識別之影響
Title (English) The influence of data set design of convolutional neural network on posture recognition of construction site personnel
Title (third language)
University Tamkang University
Department (Chinese) 土木工程學系碩士班
Department (English) Department of Civil Engineering
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year 109
Semester 2
Publication year 110 (2021)
Graduate student (Chinese) 商慕釩
Graduate student (English) Mu-Fan Shang
Student ID 609380042
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2021-06-03
Number of pages 94
Thesis committee Advisor - 葉怡成 (140910@mail.tku.edu.tw)
Committee member - 蔡明修 (mht@mail.tku.edu.tw)
Committee member - 連立川 (lclien@cycu.edu.tw)
Committee member - 葉怡成 (140910@mail.tku.edu.tw)
Keywords (Chinese) 深度學習
YOLO
工地人員姿態識別
資料集設計
Keywords (English) deep learning
YOLO
posture recognition of construction site personnel
dataset design
Keywords (third language)
Discipline classification
Abstract (Chinese)
With the continuous development of infrastructure construction, the concept of safe production has gradually spread through the construction industry. In recent years, deep learning has driven breakthroughs in image recognition, so it has become feasible to automatically recognize workers' behavior from surveillance video and thereby protect their safety. However, few previous studies have examined recognition of pedestrian posture types. This study therefore used a YOLO V4 deep learning model to recognize three postures of personnel in construction site environments: standing, bending over, and squatting. To improve accuracy, diversify the data, and examine the recognition ability of models built from datasets with different characteristics, the study collected an existing construction site image dataset and built two additional image datasets: (1) a designed image dataset, in which different people were photographed in different postures against different backgrounds and at different shooting distances and angles; and (2) a nature image dataset, in which groups working naturally were photographed in campus settings similar to construction sites, including surveying practice and a materials laboratory. The three image datasets contain 890 images in total; manual annotation yielded 2,144 standing, 489 bending over, and 697 squatting pedestrian samples. In addition, samples from the designed and nature datasets were blended into a mixed dataset with an equal number of samples and a double mixed dataset with twice as many samples. YOLO V4 recognition models were built from these five image datasets. To avoid overfitting, each dataset was split into 80% for training and 20% for validation, and the best model for each (based on the validation set) was selected; the construction site image dataset was then used as the test set to evaluate which dataset trains the model that best recognizes pedestrian postures on construction sites. The mAP (mean Average Precision) analysis ranked the five image datasets as follows: double mixed (70.03%) > nature (65.77%) > construction site (63.34%) > mixed (60.50%) > designed (29.32%). The results show that: (1) the double mixed dataset, which has the most samples, performed best, indicating that sample size is critical; (2) since the mAP ordering is nature > construction site > designed, the nature dataset performed best, and the campus dataset was slightly better than the construction site dataset, possibly because some pedestrian samples in the construction site dataset are too small, too low in resolution, or too heavily occluded to learn from, whereas the campus dataset, which is similar to but not identical to the construction site environment, was captured at higher resolution and is easier to learn from; (3) the nature and construction site datasets far outperformed the designed dataset; closer analysis shows that the designed dataset performed extremely well on the training and validation sets and therefore suffered severe overfitting, probably because its images usually contain only one or two large, unoccluded people and are easy to learn, while performance collapses on construction site samples that differ greatly, with low resolution and heavy occlusion; and (4) the mAP of the mixed dataset lies between those of the nature and designed datasets, showing that the two datasets have no complementary synergy. In summary, the nature dataset is the most suitable training material.
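As a rough illustration of the 80/20 training/validation split described in the abstract, the Python sketch below shuffles a folder of labeled images and writes Darknet-style train.txt and valid.txt list files. This is a minimal sketch under assumed conventions; the folder name nature_dataset/images and the helper split_dataset are hypothetical and are not taken from the thesis.

```python
# Minimal sketch (not the author's code) of an 80/20 train/validation split for
# YOLO-format data, where each image has a matching .txt annotation file.
import random
from pathlib import Path

def split_dataset(image_dir: str, train_ratio: float = 0.8, seed: int = 42):
    """Shuffle images and write Darknet-style train.txt / valid.txt lists."""
    images = sorted(Path(image_dir).glob("*.jpg"))  # hypothetical folder layout
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)
    train, valid = images[:cut], images[cut:]
    Path("train.txt").write_text("\n".join(str(p) for p in train))
    Path("valid.txt").write_text("\n".join(str(p) for p in valid))
    return len(train), len(valid)

if __name__ == "__main__":
    n_train, n_valid = split_dataset("nature_dataset/images")
    print(f"training images: {n_train}, validation images: {n_valid}")
```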
Abstract (English)
With the continuous development of infrastructure construction, the concept of safe production in the construction industry has been gradually promoted. In recent years, deep learning has led to breakthroughs in image recognition, so it has become feasible to automatically identify workers' behavior from surveillance video and thereby ensure their safety. However, few previous studies have explored recognition of pedestrian posture types. This study therefore used a YOLO V4 deep learning model to identify three types of personnel posture in construction site environments: standing, bending over, and squatting. To improve accuracy, diversify the data, and investigate the recognition capability of models built from datasets with different characteristics, this study collected an existing construction site image dataset and constructed two additional image datasets: (1) a designed image dataset, for which different people served as models and were photographed in different postures against different backgrounds and at different shooting distances and angles; and (2) a nature image dataset, consisting of images of different groups working naturally in campus settings similar to construction sites, including surveying practice and a materials laboratory. A total of 890 images from these three datasets were manually annotated, yielding 2,144 standing, 489 bending over, and 697 squatting pedestrian samples. In addition, samples from the designed and nature image datasets were blended into a mixed dataset with an equal number of samples and a double mixed dataset with twice as many samples. YOLO V4 recognition models were constructed from these five image datasets. To avoid overfitting, each dataset was divided into 80% for training and 20% for validation, and the best model for each (based on the validation set) was selected. A construction site image dataset was then used as the test set to evaluate which dataset builds the model that best recognizes pedestrian postures on construction sites. Finally, the mAP (mean Average Precision) analysis ranked the five image datasets as follows: double mixed (70.03%) > nature (65.77%) > construction site (63.34%) > mixed (60.50%) > designed (29.32%). The results show that: (1) The double mixed dataset, with the largest number of samples, performed best, which shows that sample size is critical. (2) Since the mAP ordering is nature > construction site > designed, the nature dataset performed best, and the campus dataset was slightly better than the construction site dataset. This may be because some pedestrian samples in the construction site dataset are too small, too poorly resolved, or too heavily occluded, and are therefore unfavorable for learning, whereas the campus dataset, which is similar to but not identical to the construction site environment, was captured at a higher resolution and is better for learning. (3) The nature and construction site datasets performed much better than the designed dataset. A closer analysis reveals that the designed dataset performs extremely well on the training and validation sets but poorly on the test set, i.e., it suffers severe overfitting. The likely reason is that its images usually contain only one or two people, each large and unoccluded, so they are easy to learn; performance then drops sharply on construction site samples that differ greatly, with low resolution and heavy occlusion. (4) The mAP of the mixed dataset is intermediate between those of the nature and designed datasets, showing that the two datasets have no complementary synergy. In summary, the nature dataset is the most suitable training material in this study.
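The ranking above is based on mAP. The sketch below is a minimal assumed example, not the evaluation code used in the thesis: it shows how per-class average precision can be computed from precision-recall points (all-point interpolation of the PR curve) and averaged over the three posture classes to give mAP. The PR values are hypothetical.

```python
# Minimal, assumed sketch of per-class AP and mAP (not the thesis evaluation code).
from typing import List

def average_precision(recalls: List[float], precisions: List[float]) -> float:
    """Area under the PR curve using all-point interpolation."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# Hypothetical PR points for the three posture classes.
classes = {
    "standing":     ([0.2, 0.5, 0.8], [0.95, 0.90, 0.70]),
    "bending over": ([0.3, 0.6, 0.7], [0.90, 0.80, 0.60]),
    "squatting":    ([0.2, 0.4, 0.6], [0.85, 0.75, 0.65]),
}
aps = {c: average_precision(r, p) for c, (r, p) in classes.items()}
print("per-class AP:", aps)
print("mAP:", sum(aps.values()) / len(aps))
```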
Abstract (third language)
Table of Contents
List of Figures IV
List of Tables VII
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Objectives 2
1.3 Research Methods 2
1.4 Research Content 2
Chapter 2 Literature Review 4
2.1 Object Detection 4
2.1.1 Traditional Detection Methods 4
2.1.2 Deep Learning Detection Methods 6
2.2 Deep Learning 6
(1) Basic architecture of artificial neural networks 7
(2) Basic architecture of deep learning: convolutional neural networks 8
(3) Learning algorithm: the loss function 10
(4) Evolution of convolutional neural network algorithms: R-CNN and YOLO 11
2.3 The YOLO (You Only Look Once) Detection Algorithm 12
2.3.1 YOLO V1 12
2.3.2 YOLO V2 and YOLO 9000 17
2.3.3 YOLO V3 18
2.3.4 YOLO V4 18
2.4 Pedestrian Detection on Construction Sites 19
2.5 Pedestrian Posture Detection 22
2.5.1 Traditional Pedestrian Posture Recognition Methods 22
2.5.2 Deep Learning Pedestrian Posture Recognition Methods 22
2.6 Summary 23
Chapter 3 Research Methods 25
3.1 Introduction 25
3.2 Dataset Preparation 26
(1) Nature dataset 26
(2) Designed dataset 26
(3) Construction site dataset 27
3.3 Data Labeling 28
3.4 Training with the Datasets 30
(1) Environment setup 30
(2) Building the YOLO V4 model 31
(3) Dataset configuration and training 40
3.5 Testing and Evaluation 41
Chapter 4 Research Results 43
4.1 Datasets 43
(1) Nature dataset 43
(2) Designed dataset 45
(3) Construction site dataset 47
(4) Mixed dataset 48
4.2 Dataset Processing 49
4.3 Training the Network 51
4.4 Model Validation: Training and Validation Sets 54
4.5 Model Testing: Quantitative Tests on Construction Site Images 59
4.6 Model Testing: Qualitative Tests on Construction Site Images 61
4.7 Analysis of Misclassified Images 68
4.8 Comparison of Results 71
4.9 Cross-Validation 73
4.10 Discussion 87
Chapter 5 Conclusions and Recommendations 89
5.1 Conclusions 89
5.2 Recommendations 91
References 92

List of Figures
Figure 2-1 Histogram of oriented gradients workflow 4
Figure 2-2 Schematic of the histogram of oriented gradients 5
Figure 2-3 Schematic of the support vector machine hyperplane space 5
Figure 2-4 Image recognition 6
Figure 2-5 Fully connected feedforward network 7
Figure 2-6 Operation of a single neuron 8
Figure 2-7 CNN concept diagram 9
Figure 2-8 Convolution kernel operation 9
Figure 2-9 Max pooling operation 10
Figure 2-10 YOLO V1 network architecture 13
Figure 2-11 (a) Image grid division (b) IoU calculation 13
Figure 2-12 Bounding box outputs 14
Figure 2-13 NMS processing 15
Figure 2-14 Detection of various postures 23
Figure 3-1 Schematic of the research method and process 25
Figure 3-2 Pedestrians in a natural state 26
Figure 3-3 Designed pedestrian postures 27
Figure 3-4 Photos of construction site personnel from the internet 28
Figure 3-5 labelImg interface 28
Figure 3-6 Class names and YOLO-format files 29
Figure 3-7 Colab interface 31
Figure 3-8 YOLO code repository 31
Figure 3-9 YOLO V4 network structure (partial) 33
Figure 3-10 [net] layer parameters 35
Figure 3-11 Effect of zero padding 36
Figure 3-12 [convolutional] layer parameters 36
Figure 3-13 Mish activation function 37
Figure 3-14 Numbering of the layers field in the route layer 37
Figure 3-15 [route] layer parameters 38
Figure 3-16 Numbering of the from field in the shortcut layer 38
Figure 3-17 [shortcut] layer parameters 38
Figure 3-18 Effect of upsampling 39
Figure 3-19 [upsample] layer parameters 39
Figure 3-20 Relationship between the mask and anchors parameters 39
Figure 3-21 Evolution of IoU 40
Figure 3-22 [yolo] layer parameters 40
Figure 3-23 Roles of training, validation, and testing 41
Figure 3-24 PR curve 42
Figure 3-25 Detection rate and false alarm rate 42
Figure 4-1 Indoor Data2 44
Figure 4-2 Indoor Data3 45
Figure 4-3 Data of the designed dataset 46
Figure 4-4 Construction site dataset 48
Figure 4-5 Data path file 49
Figure 4-6 Folder paths used when training the designed dataset 50
Figure 4-7 Modified [yolo] layer parameters 51
Figure 4-8 All weight files saved for the nature dataset 51
Figure 4-9 Loss function and mAP curves 53
Figure 4-10 Comparison of test results across datasets 55
Figure 4-11 Image detection results for each dataset 57
Figure 4-12 Image detection results for each dataset 59
Figure 4-13 Comparison of results of each dataset tested on construction site data 60
Figure 4-14 Image detection results for each dataset 63
Figure 4-15 Image detection results for each dataset 64
Figure 4-16 Image detection results for each dataset 66
Figure 4-17 Image detection results for each dataset 67
Figure 4-18 Image detection results of the double mixed dataset 71
Figure 4-19 Schematic of the research process for the construction site dataset 74
Figure 4-20 Loss function and mAP curves of the construction site dataset 75
Figure 4-21 Comparison of test results across datasets 76
Figure 4-22 Comparison of test results across datasets 78
Figure 4-23 Comparison of overall evaluation results across datasets 80
Figure 4-24 Correct detection results on the construction site dataset 81
Figure 4-25 Failed detection results on the construction site dataset 83
Figure 4-26 Image detection on designed data 83
Figure 4-27 Image detection on nature data 84
Figure 4-28 Image detection results on the construction site dataset 85
Figure 4-29 Image detection results on the construction site dataset 86
Figure 4-30 Image detection results on the construction site dataset 86

List of Tables
Table 2-1 Related literature on pedestrian detection at construction sites 21
Table 3-1 Generation of the designed dataset 27
Table 4-1 Dataset statistics 43
Table 4-2 Nature dataset statistics 43
Table 4-3 Designed dataset statistics 45
Table 4-4 Construction site dataset statistics 47
Table 4-5 Number of labels per posture in the mixed dataset 49
Table 4-6 Final loss and mAP values 54
Table 4-7 Internal evaluation results for each dataset 56
Table 4-8 Construction site evaluation results for each dataset 61
Table 4-9 Final loss and mAP values 75
Table 4-10 Internal evaluation results for each dataset 77
Table 4-11 Internal evaluation results for each dataset 79
References
[1]Hu, J., Gao, X., Wu, H., & Gao, S. (2019). Detection of Workers Without the Helments in Videos Based on YOLO V3. 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). doi:10.1109/cisp-bmei48845.2019.8966045
[2]Nath, N. D., & Behzadan, A. H. (2020). Deep Convolutional Networks for Construction Object Detection Under Different Visual Conditions. Frontiers in Built Environment, 6. doi:10.3389/fbuil.2020.00097
[3]Wu, F., Jin, G., Gao, M., HE, Z., & Yang, Y. (2019). Helmet Detection Based On Improved YOLO V3 Deep Model. 2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC). doi:10.1109/icnsc.2019.8743246
[4]Zhang, X., Zhang, L., & Li, D. (2019). Transmission Line Abnormal Target Detection Based on Machine Learning YOLO V3. 2019 International Conference on Advanced Mechatronic Systems (ICAMechS). doi:10.1109/icamechs.2019.8861617
[5]Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
[6]Girshick, R. (2015). “Fast R-CNN,” in IEEE International Conference on Computer Vision (Santiago), 1440–1448. doi: 10.1109/ICCV.2015.169
[7]Ren, S., He, K., Girshick, R., and Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149. doi:10.1109/TPAMI.2016.2577031
[8]He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). “Mask R-CNN,” in IEEE International Conference on Computer Vision (ICCV) (Venice), 2961–2969. doi: 10.1109/ICCV.2017.322
[9]Girshick, R. (2015). Fast R-CNN. Computer Science.
[10]Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).
[11]Redmon, J., and Farhadi, A. (2018). YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767.
[12]Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.
[13]Leng, J., & Liu, Y. (2018). An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Computing and Applications.
[14]Hu, J., Gao, X., Wu, H., & Gao, S. (2019). Detection of Workers Without the Helments in Videos Based on YOLO V3. 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). doi:10.1109/cisp-bmei48845.2019.8966045.
[15]Li, S., Zhao, X., & Zhou, G. (2019). Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Computer-Aided Civil and Infrastructure Engineering. doi:10.1111/mice.12433.
[16]Islam, M. M. M., & Kim, J.-M. (2019). Vision-Based Autonomous Crack Detection of Concrete Structures Using a Fully Convolutional Encoder–Decoder Network. Sensors, 19(19), 4251. doi:10.3390/s19194251.
[17]Nath, N. D., Behzadan, A. H., & Paal, S. G. (2020). Deep learning for site safety: Real-time detection of personal protective equipment. Automation in Construction, 112, 103085.
[18]Goodfellow, I., Bengio, Y., Courville, A., & Bengio, Y. (2016). Deep learning (Vol. 1, No. 2). Cambridge: MIT press.
[19]Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05) (Vol. 1, pp. 886-893). IEEE.
[20]Noble, W. S. (2006). What is a support vector machine?. Nature biotechnology, 24(12), 1565-1567.
[21]Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[22]Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
[23]Park, M.-W., Palinginis, E., & Brilakis, I. (2012). Detection of Construction Workers in Video Frames for Automatic Initialization of Vision Trackers. Construction Research Congress 2012. doi:10.1061/9780784412329.095.
[24]Memarzadeh, M., Golparvar-Fard, M., & Niebles, J. C. (2013). Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors. Automation in Construction,32,24–37. doi:10.1016/j.autcon.2012.12.002.
[25]Nimmo, J., & Green, R. (2017). Pedestrian avoidance in construction sites. 2017 International Conference on Image and Vision Computing New Zealand (IVCNZ). doi:10.1109/ivcnz.2017.8402499.
[26]Neuhausen, M., Teizer, J., & König, M. (2018). Construction Worker Detection and Tracking in Bird's-Eye View Camera Images. In ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction (Vol. 35, pp. 1-8). IAARC Publications.
[27]FU, J. T., CHEN, Y. L., & CHEN, S. W. (2018). Design and Implementation of Vision Based Safety Detection Algorithm for Personnel in Construction Site. DEStech Transactions on Engineering and Technology Research, (ecar).
[28]Tang, C., Yang, F., & Yu, X. (2020). Detection of Pedestrians and Helmets in Large Construction Site. American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS), 71(1), 220-228.
[29]Son, H., Choi, H., Seong, H., & Kim, C. (2019). Detection of construction workers under varying poses and changing background in image sequences via very deep residual networks. Automation in Construction, 99, 27-38.
[30]Li, Q., Ding, X., Wang, X., Chen, L., Son, J., & Song, J. Y. (2021). Detection and Identification of Moving Objects at Busy Traffic Road based on YOLO v4. The Journal of the Institute of Internet, Broadcasting and Communication, 21(1), 141-148.
[31]Li, J., & Wu, Z. (2021, April). The Application of Yolov4 And A New Pedestrian Clustering Algorithm to Implement Social Distance Monitoring During The COVID-19 Pandemic. In Journal of Physics: Conference Series (Vol. 1865, No. 4, p. 042019). IOP Publishing.
Full-Text Access Rights
On campus
Printed thesis released on campus immediately
Electronic full text authorized for on-campus release
Electronic thesis released on campus immediately
Off campus
Authorization granted
Electronic thesis released off campus immediately
