§ Browse Thesis Bibliographic Record
  
System ID	U0002-1806202412493700
DOI	10.6846/tku202400247
Title (Chinese)	使用CLAHE預處理和改良式YOLOv5的水下物件偵測
Title (English)	Underwater Object Detection Using CLAHE Preprocessing and Improved YOLOv5
Title (Third Language)
University	Tamkang University
Department (Chinese)	資訊工程學系碩士班
Department (English)	Department of Computer Science and Information Engineering
Foreign Degree University
Foreign Degree College
Foreign Degree Graduate Institute
Academic Year	112
Semester	2
Publication Year	113 (ROC calendar; 2024)
Student (Chinese)	陳威昊
Student (English)	Wei-Hao Chen
Student ID	611410183
Degree	Master's
Language	Traditional Chinese
Second Language
Oral Defense Date	2024-06-03
Pages	38
Committee Members	張傳育 (chuanyu@yuntech.edu.tw)
	林其誼 (chiyilin@mail.tku.edu.tw)
Advisor	許輝煌 (hsu@gms.tku.edu.tw)
Keywords (Chinese)	機器學習 (Machine Learning)
	水下物件偵測 (Underwater Object Detection)
	YOLO
	CLAHE演算法 (CLAHE Algorithm)
	CA注意力機制 (CA Attention Mechanism)
	自適應空間特徵 (Adaptive Spatial Features)
Keywords (English)	Machine Learning
	Underwater Object Detection
	YOLO
	CLAHE Algorithm
	CA Attention Mechanism
	Adaptive Spatial Features
Keywords (Third Language)
Subject Classification
Abstract (Chinese)
Deep learning occupies a central place in machine learning; it refers to neural-network architectures built by loosely modeling the neurons of the human brain. With the rapid growth of data and computing resources, deep learning has advanced at a remarkable pace, and among its applications, object detection is one of the most important. Object detection means identifying and localizing target objects in an image or video: a good detection model not only determines which objects appear in an image but also marks their positions precisely. Object detection matters because it applies effectively to major use cases such as video surveillance, image retrieval, autonomous driving, and robot manipulation. Early object detection relied on traditional image processing and machine learning techniques such as the Hough Transform and Support Vector Machines.
Although these traditional methods can detect targets successfully, their accuracy and speed degrade markedly in complex scenes and on large-scale data. With the development of deep learning, O'Shea, K. et al. presented Convolutional Neural Networks (CNNs). CNNs demonstrated powerful feature-extraction capability in image processing, and object detection began to progress rapidly: many researchers incorporated CNNs into detection models, and their excellent performance gave rise to a series of deep-learning-based detection algorithms such as R-CNN, Fast R-CNN, and YOLO (You Only Look Once).
These deep-learning-based detectors automatically learn complex feature representations from images and achieve higher detection accuracy and speed than traditional methods. Among them, the YOLO series offers real-time detection and is widely used across many fields.
Underwater object detection has become a major challenge in recent years. Underwater images often suffer from blurring, color distortion, and low contrast, and underwater objects are usually small, which leads to poor detection results. This thesis designs a network model based on CLAHE preprocessing and an improved YOLOv5s. First, the preprocessing stage applies the CLAHE algorithm to enhance image contrast, alleviating the low-contrast and color-distortion problems. Next, a CA (Coordinate Attention) module is added to the YOLOv5s backbone to strengthen the learning of object features. Finally, an Adaptive Spatial Feature Fusion (ASFF) module is added to the prediction layer to improve the model's perception of objects of interest at different scales, which helps raise overall detection performance. For evaluation, this thesis uses Precision, Recall, mAP@0.5, and mAP@0.5:0.95 as metrics, compares against other underwater detection models, and runs ablation experiments to assess each module individually. The best results are 85.1%, 87.6%, 90.1%, and 66.9%, respectively, demonstrating that the model is effective compared with other underwater detectors. The ablation results show that the added modules address the blurring, color-distortion, and low-contrast problems encountered in underwater object detection and improve its accuracy.
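As an illustration of the CLAHE preprocessing step described above, the following is a minimal, self-contained sketch of CLAHE's core mechanism, clipped per-tile histogram equalization, and not the thesis's implementation. Full CLAHE (e.g. OpenCV's `cv2.createCLAHE`) additionally interpolates bilinearly between neighboring tile mappings to avoid block artifacts and redistributes the clipped excess more carefully; both refinements are omitted here. All names and sample data are invented for illustration.

```python
# Simplified sketch of CLAHE: split the image into tiles, clip each tile's
# histogram at a limit (redistributing the excess), then equalize the tile
# with the clipped histogram. Bilinear blending between tiles is omitted.

def clahe_tile_mapping(tile, clip_limit=40, n_bins=256):
    """Return the equalization lookup table for one tile of 8-bit pixels."""
    hist = [0] * n_bins
    for v in tile:
        hist[v] += 1
    # Clip the histogram and spread the excess uniformly over all bins
    # (remainder dropped here; real CLAHE redistributes it too).
    excess = sum(max(0, h - clip_limit) for h in hist)
    hist = [min(h, clip_limit) + excess // n_bins for h in hist]
    # Cumulative distribution -> intensity mapping (standard equalization).
    total = sum(hist)
    lut, acc = [], 0
    for h in hist:
        acc += h
        lut.append(round(acc * (n_bins - 1) / total))
    return lut

def clahe_like(image, tile_h, tile_w, clip_limit=40):
    """Apply clipped per-tile equalization to a 2-D list of 8-bit pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for ty in range(0, rows, tile_h):
        for tx in range(0, cols, tile_w):
            ys = range(ty, min(ty + tile_h, rows))
            xs = range(tx, min(tx + tile_w, cols))
            flat = [image[y][x] for y in ys for x in xs]
            lut = clahe_tile_mapping(flat, clip_limit)
            for y in ys:
                for x in xs:
                    out[y][x] = lut[image[y][x]]
    return out

# A low-contrast 4x4 patch: values clustered in [100, 103].
img = [[100, 101, 100, 102],
       [101, 100, 103, 100],
       [100, 102, 100, 101],
       [103, 100, 101, 100]]
enhanced = clahe_like(img, 4, 4, clip_limit=2)
print(min(v for r in enhanced for v in r),
      max(v for r in enhanced for v in r))  # → 64 255
```

The clip limit bounds how strongly any single intensity can dominate a tile's mapping, which is what lets CLAHE boost local contrast without the noise over-amplification of plain adaptive histogram equalization.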
Abstract (English)
Deep learning plays a very important role in the field of machine learning; it refers to neural network architectures constructed by simulating the neurons of the human brain. In an era of rapidly increasing data and computing resources, deep learning has made rapid progress, and among its applications, object detection is one of the most important. Object detection means identifying and localizing target objects in an image or video. Excellent object detection models not only determine which objects are in an image but also accurately identify their positions. Object detection is important because it can be applied effectively in many areas, such as video surveillance, image retrieval, autonomous driving, and robot operation. Early object detection relied on traditional image processing and machine learning techniques, such as the Hough Transform and Support Vector Machines.
Although these traditional object detection methods can successfully detect targets, their detection ability and speed decrease significantly when faced with complex scenes and large-scale data. With the development of deep learning, O'Shea, K. et al. presented Convolutional Neural Networks (CNNs). CNNs demonstrated powerful feature-extraction capabilities in image processing, and object detection has made significant progress since then. Many researchers began to incorporate CNNs into object detection models, and their excellent performance led to a series of deep-learning-based object detection algorithms, such as R-CNN, Fast R-CNN, and YOLO (You Only Look Once).

These deep-learning-based object detection algorithms automatically learn complex feature representations from images and demonstrate higher detection accuracy and speed than traditional methods. Among them, the YOLO series exhibits real-time detection characteristics and is widely used in many fields.
Underwater object detection has become a major challenge in recent years. Underwater images often suffer from blurring, color distortion, and low contrast, and underwater objects are usually small, which can lead to poor detection. This thesis designs a network model based on CLAHE preprocessing and an improved YOLOv5s. First, the preprocessing stage uses the CLAHE algorithm to enhance image contrast, alleviating the low-contrast and color-distortion problems. Next, a CA attention module is added to the YOLOv5s backbone to enhance the learning of object features. Finally, an adaptive spatial feature fusion (ASFF) module is added to the prediction layer to enhance the model's perception of objects of interest at different scales, which helps improve overall detection performance. For evaluation, this thesis adopts Precision, Recall, mAP@0.5, and mAP@0.5:0.95 as metrics, compares against other underwater detection models, and conducts ablation experiments to evaluate each module individually. The best results are 85.1%, 87.6%, 90.1%, and 66.9%, respectively; compared with other underwater detection models, the proposed model proves effective. The ablation results show that the added modules handle the image blurring, color distortion, and low contrast encountered in underwater object detection and improve detection accuracy.
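The Precision and Recall figures reported above can be illustrated with a minimal sketch: a prediction counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5, and precision and recall follow from the TP/FP/FN counts. mAP additionally integrates the precision-recall curve over confidence thresholds (and mAP@0.5:0.95 averages over IoU thresholds), which is omitted here. The boxes and function names below are invented for illustration, not taken from the thesis.

```python
# Precision/recall at IoU 0.5 via greedy one-to-one matching of predicted
# boxes (assumed sorted by descending confidence) to ground-truth boxes.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, iou_thr=0.5):
    """Match each prediction to the best unmatched ground truth above iou_thr."""
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(preds) - tp   # unmatched predictions
    fn = len(gts) - tp     # undetected ground truths
    return tp / (tp + fp), tp / (tp + fn)

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]  # one good match, one false alarm
p, r = precision_recall(preds, gts)
print(p, r)  # → 0.5 0.5
```

Here the first prediction overlaps its ground truth with IoU ≈ 0.68 (a true positive), the second matches nothing (a false positive), and one ground truth goes undetected (a false negative), giving precision and recall of 0.5 each.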
Abstract (Third Language)
Table of Contents
Chapter 1	Introduction	1
1.1	Research Background and Motivation	1
1.2	Research Objectives	2
Chapter 2	Literature Review	3
2.1	Related Work on Underwater Object Detection	3
2.2	Related Work on the YOLO Series	5
Chapter 3	Underwater Object Detection Model Based on Improved YOLOv5s	10
3.1	Architecture of the Improved YOLOv5s	10
3.2	CLAHE Algorithm	14
3.3	CA Attention Module	16
3.4	Adaptive Spatial Feature Fusion (ASFF)	19
Chapter 4	Experimental Results and Analysis	22
4.1	Dataset	22
4.2	Evaluation Metrics	23
4.3	Experimental Setup	23
4.4	Experimental Results and Analysis	25
Chapter 5	Ablation Study	28
5.1	Introduction to the Ablation Study	28
5.2	Ablation Results and Analysis	29
Chapter 6	Conclusion and Future Work	33
6.1	Conclusion	33
6.2	Future Work	34
References	35
		
List of Figures
Figure 1: Blurring and haze in underwater optical images	4
Figure 2: Simplified YOLOv4 architecture	7
Figure 3: SPP module architecture	7
Figure 4: YOLOv5 architecture	10
Figure 5: Differences between BottleNeck architectures	11
Figure 6: C3_1 module architecture	12
Figure 7: C3_2 module architecture	12
Figure 8: SPPF module architecture	13
Figure 9: Architecture of the improved YOLOv5s	14
Figure 10: CA attention module architecture	17
Figure 11: Adaptive spatial feature fusion mechanism	20
Figure 12: Class distribution of the TrashCan dataset [26]	22
Figure 13: Image enhancement of original YOLOv5s input images	24
Figure 14: Image enhancement ordering for YOLOv5s: (a) improved YOLOv5s, parameter set 1; (b) improved YOLOv5s, parameter set 2	25
Figure 15: Detection comparison 1: (a) YOLOv4, (b) YOLOTrashCan, (c) proposed model	26
Figure 16: Detection comparison 2: (a) ground truth, (b) YOLOv5s, (c) proposed model	27
Figure 17: Detection comparison 3: (a) ground truth, (b) YOLOv5s, (c) proposed model	27
Figure 18: Detection comparison of YOLOv5s and YOLOv5s-CLAHE: (a) YOLOv5s, (b) YOLOv5s-CLAHE	29
Figure 19: YOLOv5s-ASFF detection results	30
Figure 20: YOLOv5s-CA detection results	31
Figure 21: Improved YOLOv5s detection results	32
			
List of Tables
Table 01	Comparison of CLAHE image-enhancement results	25
Table 02	Mean average precision comparison of different models	26
Table 03	Evaluation-metric comparison of ablation results	29
References
[1] Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527
[2] Redmon, J., Divvala, S.K., Girshick, R.B., & Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788.
[3] Jocher, G. (2020). YOLOv5 by Ultralytics (Version 7.0) [Computer software]. https://doi.org/10.5281/zenodo.3908559 https://github.com/ultralytics/YOLOv5
[4] Pizer, S. M., Johnston, R. E., Ericksen, J. P., Yankaskas, B. C., & Muller, K. E. (1990). Contrast-limited adaptive histogram equalization: Speed and effectiveness. Proceedings of the First Conference on Visualization in Biomedical Computing, Atlanta, GA, USA, 337-345. https://doi.org/10.1109/VBC.1990.109340
[5] Hou, Q., Zhou, D., & Feng, J. (2021). Coordinate Attention for Efficient Mobile Network Design. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13708-13717.
[6] Liu, S., Huang, D., & Wang, Y. (2019). Learning Spatial Fusion for Single-Shot Object Detection. ArXiv, abs/1911.09516.
[7] Chen, Y., Yuan, X., Wu, R., Wang, J., Hou, Q., & Cheng, M. (2023). YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection. ArXiv, abs/2308.05480.
[8] Guo, C., Fan, B., Zhang, Q., Xiang, S., & Pan, C. (2019). AugFPN: Improving Multi-Scale Feature Learning for Object Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12592-12601.
[9] Mobley, C.D. (1994) Light and Water: Radiative Transfer in Natural Waters. Academic, San Diego.
[10] Ma, Jinxiang & Fan, Xin & Ni, Jianjun & Zhu, Xifang & Xiong, Chao. (2017). Multi-scale retinex with color restoration image enhancement based on Gaussian filtering and guided filtering. International Journal of Modern Physics B. 31. 1744077. 10.1142/S0217979217440775.
[11] Ren, S., He, K., Girshick, R.B., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149.
[12] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., & Berg, A.C. (2015). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision.
[12] Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2013). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 580-587.
[13] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
[14] Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517-6525.
[15] Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ArXiv, abs/1502.03167.
[16] Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. ArXiv, abs/1804.02767.
[17] Bochkovskiy, A., Wang, C., & Liao, H.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. ArXiv, abs/2004.10934.
[18] Wang, C., Liao, H.M., Yeh, I., Wu, Y., Chen, P., & Hsieh, J. (2019). CSPNet: A New Backbone that can Enhance Learning Capability of CNN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1571-1580.
[19] He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1904-1916.
[20] Wang, C., Bochkovskiy, A., & Liao, H.M. (2022). YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7464-7475.
[21] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.
[22] Tishby, N., & Zaslavsky, N. (2015). Deep learning and the information bottleneck principle. 2015 IEEE Information Theory Workshop (ITW), 1-5.
[23] Zhu, Y. L., & Huang, C. (2012). An Adaptive Histogram Equalization Algorithm on the Image Gray Level Mapping. Physics Procedia, 25, 601–608. https://doi.org/https://doi.org/10.1016/j.phpro.2012.03.132.
[24] Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2017). Squeeze-and-Excitation Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7132-7141.
[25] Woo, S., Park, J., Lee, J., & Kweon, I. (2018). CBAM: Convolutional Block Attention Module. ArXiv, abs/1807.06521.
[26] Hong, J., Fulton, M., & Sattar, J. (2020). TrashCan: A Semantically-Segmented Dataset towards Visual Detection of Marine Debris. ArXiv, abs/2007.08097.
[27] Ali, M., & Khan, S. (2022). Underwater Object Detection Enhancement via Channel Stabilization. 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, Australia. https://doi.org/10.1109/DICTA56598.2022.10034594
[28] Zhou, W., Zheng, J. F., Yin, G., Pang, R. Y., & Jun, Y. (2023). YOLOTrashCan: A Deep Learning Marine Debris Detection Network. IEEE Transactions on Instrumentation and Measurement, 72, 1–12. https://doi.org/10.1109/TIM.2022.3225044
[29] Meyes, R., Lu, M., Puiseau, C.W., & Meisen, T. (2019). Ablation Studies in Artificial Neural Networks. ArXiv, abs/1901.08644.
[30] Reis, D., Kupec, J., Hong, J., & Daoudi, A. (2023). Real-Time Flying Object Detection with YOLOv8. ArXiv, abs/2305.09972.
Full-Text Usage Authorization
National Central Library
Grants the National Central Library a royalty-free license to release the bibliographic record and electronic full text publicly on the Internet immediately after the authorization form is submitted.
On campus
The printed thesis is released on campus immediately.
The electronic full text is authorized for worldwide public release.
The on-campus electronic thesis is released immediately.
Off campus
Authorization is granted to database vendors.
The off-campus electronic thesis is released immediately.

For questions, please contact the Library Digital Information Section at (02) 2621-5656 ext. 2487 or by email.