
Author: Zheng-Liang Lai (賴政良)
Title: LASAD-YOLO: Enhanced Dense Object Detection with Localization and Adaptive Spatial Attention Distillation
Chinese title: 基於定位與自適應空間注意力蒸餾之物件偵測技術 (Object Detection Technique Based on Localization and Adaptive Spatial Attention Distillation)
Advisor: Jing-Ming Guo (郭景明)
Committee: Jing-Ming Guo (郭景明), Yi-Ping Hung (洪一平), Wen-Huang Cheng (鄭文皇), Li-Wei Kang (康立威), Chih-Hsien Hsia (夏至賢)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111 (2022-2023)
Language: Chinese
Pages: 89
Keywords: dense object detection, real-time object detection, knowledge distillation
Usage: 204 views, 0 downloads
Abstract (translated from Chinese):
    This study proposes a distillation method based on localization and an adaptive spatial attention mechanism for object detection. Through joint training of a teacher and a student model, it effectively improves the prediction accuracy of the student network without adding any computational cost at inference time, raising the effective utilization of the model's parameters without changing the model size.
    Knowledge distillation is a common model compression technique, and typical approaches fall into feature-based and logit-based distillation. Earlier feature-based methods laid the groundwork by having the student directly imitate the teacher's feature maps; recent feature distillation work uses mask generation to give the student stronger feature representations and higher accuracy, but it mixes classification knowledge with bounding-box regression knowledge. Logit-based methods can extract classification and bounding-box regression information separately, yet they lack the information carried by the teacher's feature maps that feature distillation provides. This thesis proposes an integrated knowledge distillation method for object detection: a logit distillation scheme guided by the teacher's feature attention, which lets feature distillation and logit distillation compensate for each other's weaknesses, and uses localization distillation and class distillation to decouple the knowledge in the feature maps and transfer it separately, further improving model performance.
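    To make the decoupled, attention-guided transfer concrete, below is a minimal PyTorch sketch of what such losses could look like. It assumes a GFL-style detection head in which each bounding-box side is predicted as a distribution over discrete bins (as in the localization-distillation literature this work builds on); the tensor shapes, temperatures, and helper names (spatial_attention, class_kd_loss, localization_kd_loss) are illustrative assumptions, not the thesis's actual implementation.

        # NOTE: illustrative sketch only -- shapes, temperatures, and helper names
        # are assumptions, not the thesis's exact formulation.
        import torch
        import torch.nn.functional as F

        def spatial_attention(feat, temp=0.5):
            """Teacher spatial attention: channel-averaged absolute activations of a
            (B, C, H, W) feature map, softmax-normalized over the H*W locations."""
            b, _, h, w = feat.shape
            attn = feat.abs().mean(dim=1).view(b, -1)            # (B, H*W)
            return F.softmax(attn / temp, dim=1).view(b, h, w)   # sums to 1 per image

        def class_kd_loss(s_logits, t_logits, attn, tau=2.0):
            """Class distillation: temperature-softened KL between per-location class
            predictions, weighted by the teacher's attention. Logits: (B, K, H, W)."""
            p_t = F.softmax(t_logits / tau, dim=1)
            log_p_s = F.log_softmax(s_logits / tau, dim=1)
            kl = F.kl_div(log_p_s, p_t, reduction='none').sum(dim=1)   # (B, H, W)
            return (kl * attn).sum() * tau ** 2 / s_logits.size(0)

        def localization_kd_loss(s_box, t_box, attn, tau=2.0, bins=16):
            """Localization distillation: each of the 4 box sides is a distribution
            over `bins` discrete offsets (GFL-style), also distilled with KL.
            Box predictions: (B, 4 * bins, H, W)."""
            b, _, h, w = s_box.shape
            p_t = F.softmax(t_box.view(b, 4, bins, h, w) / tau, dim=2)
            log_p_s = F.log_softmax(s_box.view(b, 4, bins, h, w) / tau, dim=2)
            kl = F.kl_div(log_p_s, p_t, reduction='none').sum(dim=2).mean(dim=1)
            return (kl * attn).sum() * tau ** 2 / b

        # Toy usage on a single FPN level (random tensors stand in for real outputs):
        b, c, h, w, num_cls, bins = 2, 256, 20, 20, 80, 16
        attn = spatial_attention(torch.randn(b, c, h, w))        # from teacher features
        cls_loss = class_kd_loss(torch.randn(b, num_cls, h, w),
                                 torch.randn(b, num_cls, h, w), attn)
        loc_loss = localization_kd_loss(torch.randn(b, 4 * bins, h, w),
                                        torch.randn(b, 4 * bins, h, w), attn)

    In this reading, the teacher's spatial attention weights both losses, so locations the teacher deems salient contribute more to the transfer, while classification and localization knowledge stay separated rather than being mixed in one feature-imitation term.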
    Experiments are conducted on the public MS-COCO 2017 dataset and compared with previous methods. The results show that the proposed method generalizes well and is robust, yielding consistent performance gains across model sizes: applied to the baseline models it improves accuracy by up to 0.9%, and the best model reaches 53.1% average precision (AP) on this dataset.
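    The AP figures cited above follow the standard MS-COCO evaluation protocol. As a reference point, here is a minimal sketch of computing them with the official pycocotools API; the two file paths are placeholders for the validation annotations and the model's detections in COCO result format.

        # Placeholder paths: COCO val2017 ground truth and detections in COCO
        # result format (a JSON list of {image_id, category_id, bbox, score}).
        from pycocotools.coco import COCO
        from pycocotools.cocoeval import COCOeval

        coco_gt = COCO('annotations/instances_val2017.json')
        coco_dt = coco_gt.loadRes('detections_val2017.json')

        evaluator = COCOeval(coco_gt, coco_dt, iouType='bbox')
        evaluator.evaluate()
        evaluator.accumulate()
        evaluator.summarize()                    # prints AP@[.50:.95], AP50, AP75, ...
        print('AP = %.3f' % evaluator.stats[0])  # headline AP, e.g. 0.531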


Abstract (English):
    In this study, an object detection technique based on a localization and adaptive spatial attention distillation method is proposed. Through the training of teacher and student models, it effectively improves the prediction accuracy of the student network without increasing computational demands during inference. This paper presents an integrated knowledge distillation method for object detection systems. It introduces a logit distillation method guided by the teacher's feature attention, allowing feature distillation and logit distillation to complement each other's strengths and weaknesses. By decoupling the knowledge transferred from the feature maps into localization distillation and class distillation, model performance is further improved. Experiments are conducted on the publicly available MS-COCO 2017 dataset and compared with previous research methods, demonstrating the proposed method's good generalization and robustness; consistent performance improvements are achieved across different model sizes. The results show that the proposed method yields an improvement of up to 0.9% when applied to the base models, and the best model reaches 53.1% average precision (AP) on this dataset.

Table of contents:
    Abstract (Chinese) 0
    Abstract (English) 1
    Acknowledgments 2
    Table of Contents 3
    List of Figures 5
    List of Tables 8
    Chapter 1 Introduction 9
        1.1 Background 9
        1.2 Motivation and Objectives 11
        1.3 Thesis Organization 13
    Chapter 2 Literature Review 14
        2.1 Deep Learning and Machine Learning 14
            2.1.1 Artificial Neural Networks (ANN) 16
            2.1.2 Convolutional Neural Networks (CNN) 21
        2.2 Object Detection Literature 27
            2.2.1 One-Stage Object Detection 28
        2.3 Knowledge Distillation Literature 36
            2.3.1 Logit Distillation 36
            2.3.2 Feature Distillation 38
            2.3.3 Object Detection with Knowledge Distillation 41
    Chapter 3 Methodology 43
        3.1 Overall Architecture 44
        3.2 Network Architecture 44
        3.3 Knowledge Distillation Regions 47
            3.3.1 Main Distillation Region 48
            3.3.2 Valuable Distillation Region 49
            3.3.3 Attention Distillation Region 51
        3.4 Loss Functions 52
            3.4.1 Main Distillation Loss 53
            3.4.2 Valuable Distillation Loss 53
            3.4.3 Attention Distillation Loss 53
    Chapter 4 Experimental Results 55
        4.1 Public Datasets 55
            4.1.1 MS-COCO 55
            4.1.2 COCO minitrain 57
        4.2 Experimental Environment 58
        4.3 Results and Analysis 58
            4.3.1 Evaluation Metrics 59
            4.3.2 Training Parameter Settings 61
            4.3.3 Ablation Studies 62
            4.3.4 Comparison of Results 64
            4.3.5 Visualization Results 66
    Chapter 5 Conclusion and Future Work 79
    References 80


Full-text release date: 2025/08/21 (campus network); 2025/08/21 (off-campus network); 2025/08/21 (National Central Library: Taiwan NDLTD system)