
Graduate Student: 章聰誠 (Cong-Cheng Zhang)
Thesis Title: Improving YOLOv7 in the Traffic Sign Detection Task: A Case Study of the Taiwan Traffic Signs Dataset
Advisor: 沈上翔 (Shan-Hsiang Shen)
Committee Members: 沈上翔 (Shan-Hsiang Shen), 花凱龍 (Kai-Lung Hua), 陳永耀 (Yung-Yao Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2023
Graduation Academic Year: 112
Language: English
Pages: 34
Chinese Keywords: traffic sign detection, attention mechanism, space-to-depth convolution
Foreign Keywords: YOLO, Traffic Sign Detection, Attention Module, Space To Depth Convolution
  • The purpose of this study is to improve the accuracy of YOLOv7 on traffic sign detection, using the NTUST Taiwan traffic sign dataset that I collected as the experimental subject. The dataset contains 29 classes of traffic signs, covering urban, mountain-road, and highway scenes. The intended application of a traffic sign recognition system is self-driving vehicles, which demand fast response and the ability to recognize signs at long range; as a one-stage object detector, YOLOv7 already offers high detection speed. The difficulty of long-range recognition, however, is that signs must be located and classified from very little information. Because YOLOv7 is CNN-based, its repeated stacking of standard convolution and max-pooling layers gradually destroys the features of small objects. I therefore use the SPD convolution module to preserve small-object features as much as possible, apply an attention mechanism so the model focuses more on the objects, and add a detection head for small objects, while injecting strong noise into the detection head during training to make the model robust. With these modifications, YOLOv7's mAP50 improves from 59.5% to 84.7%.
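The abstract describes injecting noise into the detection head during training to improve robustness. A minimal NumPy sketch of that idea, assuming zero-mean Gaussian noise on the head's feature maps (the noise level `sigma` and the function name here are illustrative choices, not the thesis's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_head_input(features, sigma=0.1, training=True):
    """Perturb detection-head features with zero-mean Gaussian noise.
    Applied only during training; inference sees the clean features."""
    if not training:
        return features
    noise = rng.normal(loc=0.0, scale=sigma, size=features.shape)
    return features + noise.astype(features.dtype)

feat = np.ones((8, 4, 4), dtype=np.float32)         # toy (C, H, W) feature map
train_feat = noisy_head_input(feat)                 # perturbed copy for training
eval_feat = noisy_head_input(feat, training=False)  # unchanged at inference
```

Training on perturbed features forces the head to tolerate small input variations, which is the regularization effect the abstract attributes to this step.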


    The purpose of this study is to improve the accuracy of YOLOv7 on traffic sign detection, using the NTUST Taiwan traffic sign dataset I collected as the experimental subject. The dataset contains 29 classes of traffic signs, covering urban roads, mountain roads, and highways. The intended application of a traffic sign detection system is self-driving cars, which require both fast response and the ability to recognize signs at long distances. As a one-stage object detector, YOLOv7 already offers excellent performance and detection speed; the remaining challenge in long-distance detection is locating and identifying signs from very little visual information. Since YOLOv7 is CNN-based, its repeated stacks of standard convolution and max-pooling layers gradually discard the features of small objects. To address this issue, I introduce the SPD convolution module to retain small-object features as much as possible. Additionally, I incorporate an attention mechanism to make the model focus more on the objects. Finally, I add a detection head specifically designed for small objects, and Gaussian noise is injected into the detection head during training to enhance the robustness of the model. Through these modifications, YOLOv7's mAP50 was successfully increased from 59.5% to 84.7%.
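The space-to-depth (SPD) step behind the SPD convolution module replaces strided convolution and pooling by rearranging each 2×2 spatial tile into the channel dimension, so downsampling discards no pixels. A minimal NumPy sketch of the rearrangement (an illustration only, not the thesis's exact module; in SPD-Conv a non-strided convolution would normally follow this step):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange each (block x block) spatial tile of a (C, H, W) feature
    map into the channel dimension, giving (C*block^2, H/block, W/block).
    Unlike strided convolution or max pooling, no pixel is discarded."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, block_h, block_w, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
y = space_to_depth(x)
print(y.shape)  # (4, 2, 2): spatial size halved, all 16 values kept
print(y[0])     # [[ 0.  2.] [ 8. 10.]] -- the top-left pixel of each tile
```

Because every input value survives into some output channel, fine detail from small, distant signs reaches the deeper layers instead of being averaged or pooled away.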

    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Work
    2.1 Object Detection
    2.1.1 Traffic Sign Detection
    2.1.2 Small Object Detection
    2.2 Attention Mechanism
    2.3 Feature Map Perturbation
    3 Methodology
    3.1 Preliminary
    3.2 Overview of Improvement Methods
    3.3 SimAM Attention Module
    3.4 Space To Depth Convolution Module
    3.5 Enhanced Tiny Object Detection Layer
    3.5.1 Normal Tiny Object Detection Layer
    3.5.2 Enhanced Tiny Object Detection Layer
    4 Experiment
    4.1 Datasets
    4.2 Implementation Details
    4.3 Evaluation Metrics
    4.4 Main Experiment
    4.5 Ablation Studies
    4.6 Detailed Ablation Studies
    4.6.1 SimAM Attention Module
    4.6.2 Space To Depth Convolution Module
    4.6.3 Enhanced Tiny Object Detection Layer
    5 Conclusions
    References


    Full text available from 2026/01/19 (campus network)
    Full text available from 2026/01/19 (off-campus network)
    Full text available from 2026/01/19 (National Central Library: Taiwan NDLTD system)