
Author: 楊智勝 (JR-SHENG YANG)
Title: A Light-Weight CNN for Object Detection with Sparse Model and Knowledge Distillation (整合式知識蒸餾與稀疏性模型之輕量化卷積網路物件偵測技術)
Advisors: Kai-Lung Hua (花凱龍), Jing-Ming Guo (郭景明)
Oral defense committee: Li-Wei Kang (康立威), Jian-Jiun Ding (丁建均), Wen-Chung Kao (高文忠)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Pages: 110
Keywords: Light-weight Models, Real-time Object Detection, Knowledge Distillation

Chinese Abstract (translated):

This thesis proposes a real-time object detection system that optimizes a deep network by integrating a light-weight sparse model with knowledge distillation. It is an object detection technique built on an improved convolutional neural network architecture, offers strong processing efficiency, and is well suited to deployment on mobile devices.
The proposed integration of a light-weight model with network acceleration strategies focuses on optimizing inference speed while extracting object information. Although a light-weight network with relatively few parameters serves as the backbone, feature fusion through a bidirectional feature pyramid network still allows important features to be obtained quickly even when scenes or objects are complex. By combining training strategies such as data augmentation, mixed-precision training, and sparse network generation, the generalization ability of the light-weight network is effectively improved, yielding higher accuracy at inference.
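
As a rough illustration of the bidirectional feature fusion mentioned above, the following is a minimal sketch assuming PyTorch; the channel width, the number of pyramid levels, and the use of depthwise-separable convolutions are illustrative assumptions, not the exact design in this thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDirectionalFusion(nn.Module):
    """Top-down then bottom-up fusion over a feature pyramid."""

    def __init__(self, channels=96, num_levels=3):
        super().__init__()
        # Depthwise-separable convs keep the fusion stage lightweight.
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(2 * num_levels - 2)
        )

    def forward(self, feats):
        # feats: list ordered high-resolution -> low-resolution, equal widths.
        idx = 0
        # Top-down pass: propagate coarse semantics into finer levels.
        td = [feats[-1]]
        for f in reversed(feats[:-1]):
            up = F.interpolate(td[-1], size=f.shape[-2:], mode="nearest")
            td.append(self.convs[idx](f + up))
            idx += 1
        td = td[::-1]  # restore high-res -> low-res order
        # Bottom-up pass: propagate fine localization detail back down.
        out = [td[0]]
        for f in td[1:]:
            down = F.max_pool2d(out[-1], kernel_size=2)
            down = F.interpolate(down, size=f.shape[-2:], mode="nearest")
            out.append(self.convs[idx](f + down))
            idx += 1
        return out
```
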
In addition, knowledge distillation is employed to address the insufficient object detection accuracy of light-weight networks: training the student model together with a teacher model effectively improves the student network's inference accuracy without adding any computational cost at inference time. Because prior work has paid relatively little attention to the computational cost of detection algorithms, the benefits of a light-weight network allow even platforms with limited computing power to perform real-time object detection inference. This helps control hardware costs in practical deployments and brings the convenience of deep learning to more everyday scenarios.
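
The teacher-student training mentioned above is commonly realized with a soft-target distillation loss. The following is a minimal sketch of the classical formulation, assuming PyTorch; the temperature `T` and weight `alpha` are illustrative hyper-parameters, and an object detector would typically apply such a loss per location or per detection head rather than to a single classification output.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Soft targets: the teacher's temperature-scaled distribution guides
    # the student; scaling by T^2 keeps gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual supervised cross-entropy on ground truth.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

Because the extra teacher forward pass is needed only during training, the student's inference cost is unchanged, consistent with the claim above.
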
For evaluation, the public MS-COCO 2017 dataset is used for testing and for comparison with prior methods. The test data contains challenging real-world conditions such as blurred object boundaries, small objects, and overlapping objects. Experimental results show that, compared with previously proposed techniques, the proposed algorithm achieves good accuracy together with real-time performance several times beyond the standard threshold. It therefore has considerable potential for real-world applications, maintaining solid accuracy while using fewer computational resources.


Abstract:

In this study, a refined and light-weight model is proposed using network sparsity and knowledge distillation for real-time object detection. The implemented model is characterized by its strong detection performance and good compatibility with standalone machines.
Several designs are integrated into the proposed framework to achieve a light-weight model, faster inference, and high detection performance. A sparse, light-weight structure serves as the network's backbone, and feature fusion is performed by a modified feature pyramid network, so crucial features in complex objects or scenes can still be extracted effectively and efficiently. In addition, the training strategies of data augmentation, mixed precision training, and network sparsity are adopted to enhance the generalization of the light-weight model and subsequently boost performance.
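
As a rough sketch of two of these training strategies, the example below combines a mixed-precision training step with magnitude-based pruning, assuming PyTorch on a CUDA device; `model`, `criterion`, and the pruning `amount` are placeholders, and the exact sparsification procedure used in the thesis may differ.

```python
import torch
from torch.cuda.amp import GradScaler, autocast
import torch.nn.utils.prune as prune

scaler = GradScaler()

def train_step(model, optimizer, images, targets, criterion):
    optimizer.zero_grad()
    with autocast():                  # run the forward pass in float16 where safe
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

def sparsify(model, amount=0.3):
    # Zero out the smallest-magnitude weights in every convolution layer.
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
```

Pruned weights are zeroed through a mask, so the resulting sparsity reduces the effective parameter count; realizing actual speed-ups additionally depends on sparse-aware inference kernels.
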
Moreover, knowledge distillation is applied to tackle the significant performance drop that occurs when models become lighter: the light-weight model is further improved via student-teacher learning without the burden of extra execution time. As a result, from a deployment perspective, the proposed light-weight model with high detection performance is competitive in lowering the hardware barrier to using deep learning models.
In this study, the public MS-COCO 2017 dataset was used for performance assessment and comparison. Experimental results show that the implemented light-weight model not only maintains high detection performance but also improves inference efficiency to achieve real-time detection, making deployment more practical.

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1 Research Background
    1.1.1 Background
    1.1.2 Current State of Research
      1.1.2.1 Two-Stage Object Detection Algorithms
      1.1.2.2 One-Stage Object Detection Algorithms
      1.1.2.3 Transformer-Based Object Detection Algorithms
  1.2 Motivation and Objectives
  1.3 Thesis Organization
Chapter 2  Literature Review
  2.1 Artificial Neural Networks
    2.1.1 Forward Propagation
    2.1.2 Backward Propagation
    2.1.3 Convolutional Neural Networks
    2.1.4 Development of CNN Architectures
  2.2 Related Work on Object Detection
    2.2.1 One-Stage Object Detection
    2.2.2 Two-Stage Object Detection
    2.3.1 Anchor-Based and Anchor-Free Methods
Chapter 3  Methodology
  3.1 Light-Weight Network Architecture
  3.2 Knowledge Distillation
  3.3 Training Techniques and Sparse Model Generation
Chapter 4  Experimental Results
  4.1 Hardware and Software Environment
  4.2 Implementation Details
    4.2.1 Training
    4.2.2 Inference
  4.3 Results and Analysis
    4.3.1 Evaluation Metrics
    4.3.2 Self-Comparison
    4.3.3 Results and Analysis on MS-COCO
    4.3.4 Comparison with Other Methods
Chapter 5  Conclusions and Future Work
References


Full-text release date: 2024/09/15 (campus network)
Full-text release date: 2026/09/15 (off-campus network)
Full-text release date: 2026/09/15 (National Central Library: Taiwan NDLTD system)