
Student: Chia-Hsien Lin (林家顯)
Title: Object Detection for Low-illumination Scenes Based on Enhanced YOLOv7 Architecture (基於YOLOv7架構之低光源場景物體偵測)
Advisor: Chang-Hong Lin (林昌鴻)
Committee: Jenq-Shiou Leu (呂政修), Chin-Hsien Wu (吳晋賢), Shanq-Jang Ruan (阮聖彰), Chang-Hong Lin (林昌鴻)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Publication Year: 2023
Graduation Academic Year: 112
Language: English
Pages: 77
Keywords: Object detection, Low-illumination, Image restoration, Convolutional Neural Network, Transformer, Image recognition, Deep learning, Image enhancement
    In the fields of deep learning and computer vision, object detection and image restoration are challenging tasks. Object detection has been widely applied in autonomous driving, healthcare, surveillance, and other domains. In recent years, with breakthroughs in hardware technology, deep-learning-based object detection has made great progress. However, in some extreme scenes, misjudgments or missed detections can still occur and need to be resolved. The demand for object detection in special environments keeps growing, and images in such environments are easily degraded by insufficient lighting, blur, or noise, which makes image improvement essential. Most previously proposed methods focus on normal scenes, and a few target special scenes, but few methods perform well in both special and normal scenes at the same time. We therefore need a multi-scene network architecture that can be tested effectively not only in low-illumination scenes but also in normal scenes. Based on these needs, this thesis proposes an efficient low-illumination object detection network architecture with attention mechanisms and a Transformer.
    Our handling of objects in low-illumination scenes differs from other methods. Poor image quality affects subsequent object detection results, so we use image enhancement and restoration networks as preprocessing to restore and improve the quality of the images in low-illumination datasets. In the detection network, we separately strengthen channel and spatial feature information, incorporate multi-head attention that enriches contextual information, and integrate these adaptive attention features into multi-scale feature fusion. Finally, we adopt a partial cross-stage network method to improve the learning ability of the convolutional neural network (CNN) while reducing the model size. According to the experimental results, the proposed model architecture surpasses previous methods in accuracy while achieving a good balance in model size. The comparison results also show that the preprocessing network brings a solid improvement.


    In the domains of deep learning and computer vision, challenges persist in object detection and image restoration. Object detection, crucial for applications like autonomous driving and surveillance, has advanced due to breakthroughs in hardware technology. However, hurdles remain in extreme scenarios, leading to potential misjudgments or undetected objects. The demand for object detection in specialized environments—marked by challenges like insufficient lighting, blurriness, or noise—is rising, making image enhancement crucial. While previous methods often prioritized normal scenes, few addressed specific scenarios, and rarely did methods demonstrate effective results in both special and normal scenes. Thus, a multi-scene network architecture is essential for effective testing in both low-illumination environments and normal scenes. This thesis proposes an efficient object detection network for low-illumination conditions, incorporating attention mechanisms and a Transformer.
    In addressing objects in low-illumination scenarios, our approach diverges from others. Poor image quality degrades subsequent object detection results, so we use image enhancement and restoration networks as preprocessing steps to improve image quality in low-illumination datasets. In the detection network, we strengthen channel and spatial feature information, integrate adaptive attention features with multi-head attention to enrich contextual information, and merge these features in multi-scale fusion. Finally, we adopt a partial cross-stage network to enhance CNN learning while reducing model size. Experimental results show that our proposed model outperforms previous methods in accuracy while achieving a balanced model size. Comparison results also highlight the significant improvements brought by the preprocessing network.
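    To make the channel-and-spatial attention idea above concrete, the following is a minimal PyTorch sketch of an attention block in the spirit of the Global Attention Mechanism [33] referenced in Section 3.2.2 of the table of contents. The module name, reduction ratio, and kernel size are illustrative assumptions, not the exact configuration used in the thesis.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of a GAM-style channel + spatial attention block (hypothetical settings)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        # Channel attention: a small MLP applied along the channel dimension.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial attention: two 7x7 convolutions that squeeze and then restore channels.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight channels with MLP-derived attention, computed per spatial position.
        channel_attn = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(channel_attn)
        # Re-weight spatial positions with convolution-derived attention.
        x = x * torch.sigmoid(self.spatial(x))
        return x

if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)   # dummy backbone feature map
    out = ChannelSpatialAttention(64)(feat)
    print(out.shape)                    # torch.Size([1, 64, 80, 80])

    In a YOLOv7-style detector, blocks like this would typically sit between backbone stages or ahead of the neck's multi-scale fusion, which is where the abstract says the adaptive attention features are integrated.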

    Abstract (Chinese) I
    ABSTRACT II
    Acknowledgements III
    LIST OF CONTENTS IV
    LIST OF FIGURES VII
    LIST OF TABLES IX
    CHAPTER 1 INTRODUCTION 1
        1.1 Motivation 1
        1.2 Contributions 3
        1.3 Thesis Organization 4
    CHAPTER 2 RELATED WORKS 5
        2.1 Two-stage Object Detection 5
        2.2 One-stage Object Detection 6
        2.3 Low-illumination Scene Object Detection 7
    CHAPTER 3 PROPOSED METHODS 10
        3.1 Data Preprocessing 12
            3.1.1 Image Enhancement 12
            3.1.2 Image Restoration 16
        3.2 Network Architecture 19
            3.2.1 Backbone 19
            3.2.2 Global Attention Mechanism 24
            3.2.3 Contextual Transformer Network 27
            3.2.4 Head 31
            3.2.5 Detection Block 34
        3.3 Loss Function 40
            3.3.1 Ground Truth 41
            3.3.2 Localization Loss 42
            3.3.3 Confidence Loss 44
            3.3.4 Classification Loss 45
            3.3.5 Total Loss 45
    CHAPTER 4 EXPERIMENTAL RESULTS 46
        4.1 Experimental Environment 46
        4.2 Dataset 47
            4.2.1 NOD Dataset 47
            4.2.2 ExDark Dataset 49
        4.3 Training Details 52
        4.4 Evaluation Metrics 53
        4.5 Results 55
            4.5.1 Results of the Proposed Method on the NOD Dataset [24] 55
            4.5.2 Results of the Proposed Method on the ExDark Dataset [25] 61
            4.5.3 Ablation Study 66
            4.5.4 COCO Dataset 68
    CHAPTER 5 CONCLUSIONS AND FUTURE WORKS 70
        5.1 Conclusions 70
        5.2 Future Works 71
    REFERENCES 73

    [1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580-587.
    [2] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448.
    [3] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017, vol. 39, no. 6, pp. 1137-1149.
    [4] Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving into High Quality Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6154-6162.
    [5] X. Lu, B. Li, Y. Yue, Q. Li, and J. Yan, "Grid R-CNN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7363-7372.
    [6] P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, and C. Wang, "Sparse R-CNN: End-to-End Object Detection with Learnable Proposals," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14449-14458.
    [7] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
    [8] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," in Proceedings of the European Conference on Computer Vision (ECCV), 2016, Springer, pp. 21-37.
    [9] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2999-3007.
    [10] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," in Proceedings of the European Conference on Computer Vision (ECCV), 2020, Springer, pp. 213-229.
    [11] C.-Y. Wang, A. Bochkovskiy and H.-Y. M. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 7464-7475.
    [12] G. Jocher, "ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation [Computer software]," 2022, https://doi.org/10.5281/zenodo.3908559.
    [13] G. Jocher, A. Chaurasia, and J. Qiu, "YOLO by Ultralytics (Version 8.0.0) [Computer software]," 2023, https://github.com/ultralytics/ultralytics
    [14] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei, and X. Wei, "YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications," arXiv preprint, 2022, arXiv:2209.02976.
    [15] C. Li, L. Li, Y. Geng, H. Jiang, M. Cheng, B. Zhang, Z. Ke, X. Xu, and X. Chu, "YOLOv6 v3.0: A Full-Scale Reloading," arXiv preprint, 2023, arXiv:2301.05586.
    [16] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "You Only Learn One Representation: Unified Network for Multiple Tasks," arXiv preprint, 2021, arXiv:2105.04206.
    [17] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO Series in 2021," arXiv preprint, 2021, arXiv:2107.08430.
    [18] Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, "EnlightenGAN: Deep Light Enhancement Without Paired Supervision," in the IEEE Transactions on Image Processing, 2021, vol. 30, pp. 2340-2349.
    [19] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng and J. Jia, "Underexposed Photo Enhancement Using Deep Illumination Estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6842-6850.
    [20] F. Lv, F. Lu, J. Wu and C. S. Lim, "MBLLEN: Low-Light Image/Video Enhancement Using CNNs," in Proceedings of the British Machine Vision Conference (BMVC), 2018, pp. 700-709.
    [21] C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, "Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1777-1786.
    [22] L. Chen, X. Lu, J. Zhang, X. Chu and C. Chen, "HINet: Half Instance Normalization Network for Image Restoration," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 182-192.
    [23] L. Chen, X. Chu, X. Zhang, and J. Sun, "Simple Baselines for Image Restoration," in Proceedings of the European Conference on Computer Vision (ECCV), 2022, pp. 17-33.
    [24] I. Morawski, Y.-A. Chen, Y.-S. Lin, and W. H. Hsu, "NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset," arXiv preprint, 2021, arXiv: 2110.10364.
    [25] Y. P. Loh and C. S. Chan, "Getting to Know Low-light Images with The Exclusively Dark Dataset," Computer Vision and Image Understanding, 2019, vol. 178, pp. 30-42.
    [26] J. Hu and Z. Cui, "YOLO-Owl: An Occlusion Aware Detector for Low Illuminance Environment," in Proceedings of the International Conference on Neural Networks, Information and Communication Engineering (NNICE), 2023, pp. 167-170.
    [27] S. Liu, Y. Wang, Q. Yu, H. Liu and Z. Peng, "CEAM-YOLOv7: Improved YOLOv7 Based on Channel Expansion and Attention Mechanism for Driver Distraction Behavior Detection," in the IEEE Access, 2022, vol. 10, pp. 129116-129124.
    [28] Z. Jiang, Y. Xiao, S. Zhang, L. Zhu, Y. He, and F. Zhai, "Low-Illumination Object Detection Method Based on Dark-YOLO," in the Journal of Computer-Aided Design & Computer Graphics, 2023, vol. 35, no. 3, pp. 441-451.
    [29] S. Nah, T. H. Kim and K. M. Lee, "Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 257-265.
    [30] A. Abdelhamed, S. Lin and M. S. Brown, "A High-Quality Denoising Dataset for Smartphone Cameras," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1692-1700.
    [31] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and F.-F. Li, "ImageNet: A Large-Scale Hierarchical Image Database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248-255.
    [32] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 740-755.
    [33] Y. Liu, Z. Shao, and N. Hoffmann, "Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions", arXiv preprint, 2021, arXiv: 2112.05561.
    [34] Y. Li, T. Yao, Y. Pan and T. Mei, "Contextual Transformer Networks for Visual Recognition," in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2023, vol. 45, no. 2, pp. 1489-1500.
    [35] S. Elfwing, E. Uchibe, and K. Doya, "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning," in Neural Networks, 2018, vol. 107, pp. 3-11.
    [36] A. F. Agarap, "Deep Learning Using Rectified Linear Units (ReLU)," arXiv preprint, 2018, arXiv: 1803.08375.
    [37] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional Block Attention Module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
    [38] J. Park, S. Woo, J.-Y. Lee, and I. S. Kweon, "BAM: Bottleneck Attention Module," arXiv preprint, 2018, arXiv: 1807.06514.
    [39] J. Chen, X. Wang, Z. Guo, X. Zhang, and J. Sun, "Dynamic Region-Aware Convolution," arXiv preprint, 2020, arXiv: 2003.12243.
    [40] S. Liu, L. Qi, H. Qin, J. Shi and J. Jia, "Path Aggregation Network for Instance Segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759-8768.
    [41] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv preprint, 2020, arXiv: 2004.10934.
    [42] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-Style ConvNets Great Again," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13733-13742.
    [43] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 07, pp. 12993-13000.
    [44] Y.-F. Zhang, W. Ren, Z. Zhang, Z. Jia, L. Wang, and T. Tan, "Focal and Efficient IOU Loss for Accurate Bounding Box Regression," in Neurocomputing, 2022, vol. 506, pp. 146-157.
    [45] J. He, S. Erfani, X. Ma, J. Bailey, Y. Chi, and X.-S. Hua, "Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression," arXiv preprint, 2021, arXiv: 2110.13675.
    [46] Z. Gevorgyan, "SIoU Loss: More Powerful Learning for Bounding Box Regression," arXiv preprint, 2022, arXiv:2205.12740.
    [47] Z. Tong, Y. Chen, Z. Xu, and R. Yu, "Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism," arXiv preprint, 2023, arXiv: 2301.10051.
    [48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, "PyTorch: An Imperative Style, High-Performance Deep Learning Library," in the Advances in Neural Information Processing Systems (NIPS), 2019, vol. 32.
    [49] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," arXiv preprint, 2017, arXiv: 1706.02677.
    [50] S. J. Pan and Q. Yang, "A Survey on Transfer Learning," in the IEEE Transactions on Knowledge and Data Engineering, 2009, vol. 22, no. 10, pp. 1345-1359.
    [51] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint, 2014, arXiv: 1412.6980.
    [52] K. Kim and H. S. Lee, "Probabilistic Anchor Assignment with IoU Prediction for Object Detection," in Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 355-371.
    [53] N. Wang, Y. Gao, H. Chen, P. Wang, Z. Tian, C. Shen, and Y. Zhang, "NAS-FCOS: Fast Neural Architecture Search for Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11943-11951.
    [54] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable Transformers for End-to-End Object Detection," arXiv preprint, 2020, arXiv: 2010.04159.
    [55] Z. Zong, G. Song, and Y. Liu, "DETRs with Collaborative Hybrid Assignments Training," arXiv preprint, 2022, arXiv: 2211.12860.
    [56] S. Xu, X. Wang, W. Lv, Q. Chang, C. Cui, K. Deng, G. Wang, Q. Dang, S. Wei, Y. Du, and B. Lai, "PP-YOLOE: An Evolved Version of YOLO," arXiv preprint, 2022, arXiv: 2203.16250.

    Full-text release date: 2026/01/30 (on-campus network)
    Full-text release date: 2026/01/30 (off-campus network)
    Full-text release date: 2026/01/30 (National Central Library: Taiwan Theses and Dissertations System)