
Author: Ching-Hua Liu (劉京樺)
Thesis Title: 基於YOLOv4架構之水下物體偵測 (Underwater Object Detection Based on an Enhanced YOLOv4 Architecture)
Advisor: Chang-Hong Lin (林昌鴻)
Committee Members: Yuan-Hsiang Lin (林淵翔), Chin-Hsien Wu (吳晉賢), Ching-Shun Lin (林敬舜)
Degree: Master
Department: Department of Electronic and Computer Engineering
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 74
Chinese Keywords: 物件偵測, 水下物件, 圖像復原, 深度學習, 影像辨識, 卷積神經網路
English Keywords: Object detection, Underwater object, Image restoration, Deep learning, Image recognition, Convolutional Neural Network

Object detection and image restoration are challenging tasks in deep learning and computer vision. Object detection is now widely applied in autonomous driving, medical care, surveillance, and other fields. In recent years, breakthroughs in hardware have enabled major advances in the performance of deep-learning-based object detection; however, the performance and power-consumption bottlenecks of edge devices remain a problem that must be solved. Demand for object detection in special environments is also growing, and images captured there are easily degraded by blur or noise, which affects subsequent results, so the problem of image improvement must be taken seriously as well.
Among previously proposed methods, some pursue high accuracy at the cost of long inference times, while others give priority to speed. We therefore need to design an efficient network architecture that maintains inference speed at high accuracy. This thesis proposes an efficient underwater object detection network architecture with attention mechanisms. Our handling of underwater objects differs from that of other methods: since degraded images harm subsequent detection results, a deblurring network is used as preprocessing to restore and improve the image quality of the underwater dataset. In addition, during feature extraction in the detection network, channel and spatial feature information are strengthened separately, and these adaptive attention features are integrated into multi-scale feature fusion. Finally, the cross-stage partial network approach is incorporated to improve the learning ability of the convolutional neural network (CNN) while reducing the model size. Experimental results show that the proposed architecture leads previously proposed methods in accuracy while striking a good balance in model size, and the comparisons also demonstrate the improvement brought by the deblurring preprocessing network.


Object detection and image restoration are challenging tasks in deep learning and computer vision. Object detection is widely used in autonomous driving, medical, surveillance, and other applications, and demand is increasing in special environments where images are susceptible to blur or noise that can affect subsequent results. In recent years, significant breakthroughs have been made in object detection performance. However, the performance and power consumption bottlenecks of edge devices require us to solve this problem, and the problem of image improvement must also be addressed.
Among the methods proposed in the past, some focus on high accuracy and require long inference times, while others prioritize speed. We must therefore design an efficient network architecture that maintains inference speed together with high accuracy. This thesis proposes an underwater object detection network architecture with an attention mechanism. Our processing differs from other methods in that a deblurring network is used as preprocessing to restore and improve the image quality of the underwater dataset. In the feature extraction of the detection network, channel and spatial feature information are enhanced separately, and these adaptive attention features are integrated into multi-scale feature fusion. Finally, the cross-stage partial method is incorporated to improve the learning ability of the convolutional neural network while reducing the model size. According to the experimental results, our proposed model architecture achieves leading accuracy and a good balance in model size compared to previously proposed methods, and the comparisons also show the improvement brought by the deblurring preprocessing.
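The channel and spatial attention described above can be illustrated with a minimal NumPy sketch. This is a generic CBAM-style formulation [62], not the thesis's exact module; the weights here are random stand-ins for learned parameters, and the spatial branch sums the pooled maps in place of a learned 7x7 convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(x, reduction=4):
    """Squeeze a (C, H, W) feature map into per-channel weights.

    Average- and max-pooled descriptors pass through a shared two-layer
    bottleneck MLP, are summed, and squashed with a sigmoid.
    """
    c = x.shape[0]
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # stand-in weights
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    avg = x.mean(axis=(1, 2))                 # (C,) average descriptor
    mx = x.max(axis=(1, 2))                   # (C,) max descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)  # bottleneck with ReLU
    weights = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid
    return x * weights[:, None, None]         # rescale each channel

def spatial_attention(x):
    """Produce an (H, W) mask from channel-wise average and max maps."""
    avg = x.mean(axis=0, keepdims=True)       # (1, H, W)
    mx = x.max(axis=0, keepdims=True)         # (1, H, W)
    mask = 1.0 / (1.0 + np.exp(-(avg + mx)))  # sigmoid mask
    return x * mask                           # rescale each location

feat = rng.standard_normal((16, 8, 8))        # toy (C, H, W) feature map
out = spatial_attention(channel_attention(feat))
print(out.shape)  # (16, 8, 8)
```

Applying the channel branch first and the spatial branch second, as above, mirrors the sequential ordering used in CBAM-style modules; both branches only rescale the input, so the feature map shape is preserved.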

LIST OF CONTENTS
Abstract (Chinese)
Abstract
Acknowledgements
List of Contents
List of Figures
List of Tables
List of Common Abbreviations and Parameters
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2 Related Works
  2.1 Multi-stage object detection
  2.2 One-stage object detection
  2.3 Underwater object detection
Chapter 3 Proposed Methods
  3.1 Data preprocessing
    3.1.1 Image deblurring
    3.1.2 Resizing
    3.1.3 Random HSV augmentation
    3.1.4 Mosaic [25]
    3.1.5 Label smoothing [59]
  3.2 Network Architecture
    3.2.1 Backbone
    3.2.2 Neck
    3.2.3 Detection block
  3.3 Loss function
    3.3.1 Ground truth
    3.3.2 Localization loss
    3.3.3 Confidence loss
    3.3.4 Classification loss
    3.3.5 Total loss
Chapter 4 Experimental Results
  4.1 Experimental Environment
  4.2 Dataset
  4.3 Training Details
  4.4 Evaluation metrics
  4.5 Evaluation and Results
Chapter 5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
References
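Among the preprocessing steps in the outline, label smoothing [59] has a particularly compact form: the one-hot target is blended with the uniform distribution so the network is never pushed toward fully saturated confidences. A minimal sketch (the smoothing factor `eps=0.1` is an illustrative choice, not necessarily the thesis's setting):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Blend a one-hot target with the uniform distribution over k classes."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

y = np.array([0.0, 0.0, 1.0, 0.0])   # hard one-hot target, 4 classes
print(smooth_labels(y))              # → [0.025 0.025 0.925 0.025]
```

The smoothed vector still sums to 1, so it remains a valid target distribution for the cross-entropy classification loss.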

[1] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, "Residual Attention Network for Image Classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3156-3164.
[2] S. N. Gowda and C. Yuan, "ColorNet: Investigating the Importance of Color Spaces for Image Classification," in Proceedings of the Asian Conference on Computer Vision (ACCV), 2018: Springer, pp. 581-596.
[3] Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, "Self-Training With Noisy Student Improves ImageNet Classification," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10687-10698.
[4] Y. Li, C.-Y. Wu, H. Fan, K. Mangalam, B. Xiong, J. Malik, and C. Feichtenhofer, "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4804-4814.
[5] Y. Liu, Y. Wang, S. Wang, T. Liang, Q. Zhao, Z. Tang, and H. Ling, "CBNet: A Novel Composite Backbone Network Architecture for Object Detection," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 07, pp. 11653-11660.
[6] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and Efficient Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10781-10790.
[7] B. Zoph, E. D. Cubuk, G. Ghiasi, T.-Y. Lin, J. Shlens, and Q. V. Le, "Learning Data Augmentation Strategies for Object Detection," in Proceedings of the European Conference on Computer Vision (ECCV), 2020: Springer, pp. 566-583.
[8] J. Fu, J. Liu, J. Jiang, Y. Li, Y. Bao, and H. Lu, "Scene Segmentation With Dual Relation-Aware Attention Network," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 6, pp. 2547-2560, 2020.
[9] H. Zhang, H. Zhang, C. Wang, and J. Xie, "Co-Occurrent Features in Semantic Segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 548-557.
[10] Z. Zhu, M. Xu, S. Bai, T. Huang, and X. Bai, "Asymmetric Non-Local Neural Networks for Semantic Segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 593-602.
[11] S. Hoermann, M. Bach, and K. Dietmayer, "Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling," in Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018: IEEE, pp. 2056-2063.
[12] M. Orsic, I. Kreso, P. Bevandic, and S. Segvic, "In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12607-12616.
[13] M. Yu, G. Li, D. Jiang, G. Jiang, B. Tao, and D. Chen, "Hand Medical Monitoring System based on Machine Learning and Optimal EMG Feature Set," Personal and Ubiquitous Computing, pp. 1-17, 2019.
[14] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep Learning and Its Applications to Machine Health Monitoring," Mechanical Systems and Signal Processing (MSSP), vol. 115, pp. 213-237, 2019.
[15] M. S. Hossain, G. Muhammad, and N. Guizani, "Explainable AI and Mass Surveillance System-Based Healthcare Framework to Combat COVID-19 Like Pandemics," IEEE Network, vol. 34, no. 4, pp. 126-132, 2020.
[16] P. Sikora, L. Malina, M. Kiac, Z. Martinasek, K. Riha, J. Prinosil, L. Jirik, and G. Srivastava, "Artificial Intelligence-Based Surveillance System for Railway Crossing Traffic," IEEE Sensors Journal, vol. 21, no. 14, pp. 15515-15526, 2020.
[17] P. Wang, W. Hao, Z. Sun, S. Wang, E. Tan, L. Li, and Y. Jin, "Regional Detection of Traffic Congestion Using in a Large-Scale Surveillance System via Deep Residual TrafficNet," IEEE Access, vol. 6, pp. 68910-68919, 2018.
[18] Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving Into High Quality Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6154-6162.
[19] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448.
[20] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580-587.
[21] X. Lu, B. Li, Y. Yue, Q. Li, and J. Yan, "Grid R-CNN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7363-7372.
[22] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), vol. 28, 2015.
[23] P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, and C. Wang, "Sparse R-CNN: End-to-End Object Detection With Learnable Proposals," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14454-14463.
[24] Z. Yang, S. Liu, H. Hu, L. Wang, and S. Lin, "RepPoints: Point Set Representation for Object Detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9657-9666.
[25] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020.
[26] K. Kim and H. S. Lee, "Probabilistic Anchor Assignment with IoU Prediction for Object Detection," in Proceedings of the European Conference on Computer Vision (ECCV), 2020: Springer, pp. 355-371.
[27] T. Kong, F. Sun, H. Liu, Y. Jiang, L. Li, and J. Shi, "FoveaBox: Beyound Anchor-Based Object Detection," IEEE Transactions on Image Processing, vol. 29, pp. 7389-7398, 2020.
[28] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, "Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection," Advances in Neural Information Processing Systems (NIPS), vol. 33, pp. 21002-21012, 2020.
[29] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980-2988.
[30] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," in Proceedings of the European Conference on Computer Vision (ECCV), 2016: Springer, pp. 21-37.
[31] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
[32] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7263-7271.
[33] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
[34] N. Wang, Y. Gao, H. Chen, P. Wang, Z. Tian, C. Shen, and Y. Zhang, "NAS-FCOS: Fast Neural Architecture Search for Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11943-11951.
[35] S. Zhang, C. Chi, Y. Yao, Z. Lei, and S. Z. Li, "Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9759-9768.
[36] C. Zhu, Y. He, and M. Savvides, "Feature Selective Anchor-Free Module for Single-Shot Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 840-849.
[37] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, "DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8183-8192.
[38] L. Chen, X. Lu, J. Zhang, X. Chu, and C. Chen, "HINet: Half Instance Normalization Network for Image Restoration," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 182-192.
[39] A. Mehri, P. B. Ardakani, and A. D. Sappa, "MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 2704-2713.
[40] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080-2095, 2007.
[41] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local Sparse Models for Image Restoration," in Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV), 2009: IEEE, pp. 2272-2279.
[42] S. Gu, L. Zhang, W. Zuo, and X. Feng, "Weighted Nuclear Norm Minimization with Application to Image Denoising," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2862-2869.
[43] C. Liu, H. Li, S. Wang, M. Zhu, D. Wang, X. Fan, and Z. Wang, "A Dataset and Benchmark of Underwater Object Detection for Robot Picking," in Proceedings of the IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2021, pp. 1-6.
[44] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A New Backbone That Can Enhance Learning Capability of CNN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 390-391.
[45] A. Neubeck and L. Van Gool, "Efficient Non-Maximum Suppression," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR), 2006, vol. 3, pp. 850-855.
[46] W. Chen and B. Fan, "Underwater Object Detection With Mixed Attention Mechanism And Multi-Enhancement Strategy," in Proceedings of the IEEE Conference on Chinese Automation Congress (CAC), 2020, pp. 2821-2826.
[47] F. Han, J. Yao, H. Zhu, and C. Wang, "Underwater Image Processing and Object Detection Based on Deep CNN Method," Journal of Sensors, vol. 2020, 2020.
[48] C.-H. Yeh, C.-H. Lin, L.-W. Kang, C.-H. Huang, M.-H. Lin, C.-Y. Chang, and C.-C. Wang, "Lightweight Deep Neural Network for Joint Learning of Underwater Object Detection and Color Conversion," IEEE Transactions on Neural Networks and Learning Systems, pp. 1-15, 2021.
[49] W.-H. Lin, J.-X. Zhong, S. Liu, T. Li, and G. Li, "ROIMIX: Proposal-Fusion Among Multiple Images for Underwater Object Detection," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: IEEE, pp. 2588-2592.
[50] X. Lv, A. Wang, Q. Liu, J. Sun, and S. Zhang, "Proposal-Refined Weakly Supervised Object Detection in Underwater Images," in Proceedings of the International Conference on Image and Graphics, 2019: Springer, pp. 418-428.
[51] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6924-6932.
[52] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, "Multi-Stage Progressive Image Restoration," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14821-14831.
[53] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015: Springer, pp. 234-241.
[54] B. Xu, N. Wang, T. Chen, and M. Li, "Empirical Evaluation of Rectified Activations in Convolutional Network," arXiv preprint arXiv:1505.00853, 2015.
[55] Q. Huynh-Thu and M. Ghanbari, "Scope of validity of PSNR in image/video quality assessment," Electronics Letters, vol. 44, no. 13, pp. 800-801, 2008.
[56] D. M. Allen, "Mean Square Error of Prediction as a Criterion for Selecting Variables," Technometrics, vol. 13, no. 3, pp. 469-475, 1971.
[57] A. R. Smith, "Color Gamut Transform Pairs," ACM Siggraph Computer Graphics, vol. 12, no. 3, pp. 12-19, 1978.
[58] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, "CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6023-6032.
[59] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818-2826.
[60] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, "ImageNet: A Large-Scale Hierarchical Image Database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248-255.
[61] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Proceedings of the European Conference on Computer Vision (ECCV), 2014: Springer, pp. 740-755.
[62] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional Block Attention Module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
[63] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[64] D. Misra, "Mish: A Self Regularized Non-Monotonic Neural Activation Function," arXiv preprint arXiv:1908.08681, 2019.
[65] A. F. Agarap, "Deep Learning using Rectified Linear Units (ReLU)," arXiv preprint arXiv:1803.08375, 2018.
[66] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[67] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path Aggregation Network for Instance Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759-8768.
[68] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 07, pp. 12993-13000.
[69] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems (NIPS), vol. 32, 2019.
[70] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," arXiv preprint arXiv:1706.02677, 2017.
[71] S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2009.
[72] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.
[73] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
