
Author: Wei-Lun Huang (黃偉倫)
Thesis title: Binarized Object Detection Model Based on YOLOv3 Architecture and Its Hardware Accelerator (基於YOLOv3架構的二值化物件偵測模型及其硬體加速器)
Advisor: Chang-Hong Lin (林昌鴻)
Committee: Jenq-Shiou Leu (呂政修), Chung-An Shen (沈中安), Ching-Shun Lin (林敬舜)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication year: 2021
Graduation academic year: 109
Language: English
Pages: 102
Keywords: Binarized neural network, hardware accelerator, object detection, deep learning, convolutional neural network
Hits: 233; Downloads: 0
Abstract:

    Object detection is widely used in applications such as surveillance and self-driving vehicles. In recent years, deep-learning-based object detection methods have demonstrated excellent performance; however, they often require substantial computational resources and memory bandwidth, so they can usually be deployed only on computers with high-performance graphics processing units. This characteristic limits the range of applications of deep-learning-based object detection. To mitigate the problem, researchers have proposed compact model designs, pruning techniques, and model quantization. Within model quantization, W1A1 quantization, which binarizes most of the weights and feature maps in a model, can greatly reduce both model size and computation. In this thesis, we therefore combine the architecture of YOLOv3 [1] with state-of-the-art W1A1 quantization techniques to propose a new W1A1 binarized object detection model. We train and test the proposed model on the datasets most widely used for object detection: the PASCAL VOC dataset [2] and the COCO dataset [3]. According to the experimental results, the proposed model achieves a better trade-off between model size and accuracy than previous works. Moreover, we further reduce and quantize the proposed model to make it more hardware-friendly, and we propose a hardware accelerator for the binarized blocks in the model. Compared to a conventional central processing unit, the proposed hardware accelerator shows better performance.
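The computational saving behind W1A1 quantization comes from replacing multiply-accumulate with bitwise operations: for vectors over {-1, +1}, a dot product can be computed with XNOR (a match test) plus popcount (a match total). A minimal sketch of this identity, as an illustration only and not the thesis's implementation:

```python
def binarize(xs):
    # Map real values to {-1, +1} with the sign function (0 maps to +1).
    return [1 if x >= 0 else -1 for x in xs]

def xnor_popcount_dot(a_bits, w_bits):
    # For vectors over {-1, +1}, each product is +1 on a match and -1 on
    # a mismatch, so the dot product equals 2 * (#matches) - length.
    # In hardware the match test is an XNOR and the total is a popcount.
    matches = sum(1 for a, w in zip(a_bits, w_bits) if a == w)
    return 2 * matches - len(a_bits)

# Sanity check against the ordinary dot product.
a = binarize([0.3, -1.2, 0.7, -0.1])
w = binarize([-0.5, -0.9, 0.2, 0.4])
ordinary = sum(x * y for x, y in zip(a, w))
assert xnor_popcount_dot(a, w) == ordinary
```

Because the operands are single bits, 64 such "multiplications" fit in one machine word, which is what allows a binarized accelerator to far outpace a general-purpose CPU on these layers.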

Table of Contents

    Abstract (Chinese)
    Abstract
    Acknowledgements
    List of Contents
    List of Figures
    List of Tables
    CHAPTER 1 INTRODUCTION
    1.1 Motivation
    1.2 Contributions
    1.3 Thesis Organization
    CHAPTER 2 RELATED WORKS
    2.1 Deep learning based object detection
    2.2 W1A1 quantization
    2.3 Binarized neural network hardware accelerator
    CHAPTER 3 PROPOSED METHODS
    3.1 Data augmentation and preprocessing
    3.1.1 Random horizontal flip
    3.1.2 Random shift
    3.1.3 Random crop
    3.1.4 Resizing
    3.1.5 Mixup [42]
    3.1.6 Normalization
    3.1.7 Label smoothing [43]
    3.2 ReAct YOLO
    3.2.1 Backbone
    3.2.2 FPN [34]
    3.2.3 Detection block
    3.3 Loss functions
    3.3.1 Ground truth
    3.3.2 loss_locate
    3.3.3 loss_Conf
    3.3.4 loss_class
    3.4 Training tricks
    3.4.1 Learning rate adjustment
    3.4.2 Design for backpropagation of the sign function
    3.5 Model reduction and quantization
    3.5.1 Reduction
    3.5.2 Quantization
    3.6 Hardware accelerator
    3.6.1 Bin_Conv 3x3 block
    3.6.2 Bin_Conv 1x1 block
    3.6.3 BN_Act block
    3.6.4 AXI4-Stream and AXI4-Lite interface
    3.6.5 Controller
    CHAPTER 4 EXPERIMENTAL RESULTS
    4.1 Experimental environment
    4.1.1 Environment for the proposed model
    4.1.2 Environment for the proposed hardware accelerator
    4.2 Dataset
    4.2.1 PASCAL VOC dataset [2]
    4.2.2 COCO dataset [3]
    4.3 Evaluation metrics
    4.3.1 VOC 2007 [2] metric
    4.3.2 COCO [3] metrics
    4.4 Model performance
    4.4.1 Training settings
    4.4.2 VOC 2007 [2]
    4.4.3 COCO [3]
    4.5 Hardware accelerator performance
    CHAPTER 5 CONCLUSIONS AND FUTURE WORKS
    5.1 Conclusions
    5.2 Future Works
    REFERENCES
    APPENDIX A
    A.1 The results on the VOC 2007 testing set [2]
    A.2 The results on the COCO 2017 testing set [3]
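Section 3.4.2 of the outline concerns the backpropagation of the sign function, whose true derivative is zero almost everywhere, so the gradient cannot flow through it directly. A common workaround in the cited binarized-network literature (e.g. [16], [18]) is a straight-through estimator that passes the gradient through a clipped window. The sketch below is a generic illustration of that idea, not the specific design proposed in the thesis:

```python
def sign_forward(x):
    # Forward pass: hard binarization to {-1, +1}.
    return 1.0 if x >= 0 else -1.0

def sign_backward_ste(x, grad_out, clip=1.0):
    # Straight-through estimator: treat sign() as the identity clipped
    # to [-clip, clip], so the incoming gradient passes through only
    # where the input lies inside that window and is zeroed elsewhere.
    return grad_out if abs(x) <= clip else 0.0
```

The clipping window prevents weights that have drifted far from the threshold from receiving gradient, which stabilizes training of binarized layers.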

    [1] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
    [2] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
    [3] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision, 2014: Springer, pp. 740-755.
    [4] S. Jha, C. Seo, E. Yang, and G. P. Joshi, "Real Time Object Detection and Tracking System for Video Surveillance System," Multimedia Tools and Applications, vol. 80, no. 3, pp. 3981-3996, 2021.
    [5] T. Santad, P. Silapasupphakornwong, W. Choensawat, and K. Sookhanaphibarn, "Application of YOLO Deep Learning Model for Real Time Abandoned Baggage Detection," in 2018 IEEE 7th Global Conference on Consumer Electronics, 2018: IEEE, pp. 157-158.
    [6] C. Asha and A. Narasimhadhan, "Vehicle Counting for Traffic Management System using YOLO and Correlation Filter," in 2018 IEEE International Conference on Electronics, Computing and Communication Technologies, 2018: IEEE, pp. 1-6.
    [7] J. Xu, Y. Nie, P. Wang, and A. M. López, "Training A Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving," in 2019 International Conference on Robotics and Automation, 2019: IEEE, pp. 2379-2384.
    [8] S. P. Rajendran, L. Shine, R. Pradeep, and S. Vijayaraghavan, "Fast and Accurate Traffic Sign Recognition for Self Driving Cars using RetinaNet Based Detector," in 2019 International Conference on Communication and Electronics Systems, 2019: IEEE, pp. 784-790.
    [9] C. Hao, A. Sarwari, Z. Jin, H. Abu-Haimed, D. Sew, Y. Li, X. Liu, B. Wu, D. Fu, and J. Gu, "A Hybrid GPU+FPGA System Design for Autonomous Driving Cars," in 2019 IEEE International Workshop on Signal Processing Systems, 2019: IEEE, pp. 121-126.
    [10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted Residuals and Linear Bottlenecks," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520.
    [11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size," arXiv preprint arXiv:1602.07360, 2016.
    [12] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, 2017.
    [13] S. Xu, A. Huang, L. Chen, and B. Zhang, "Convolutional Neural Network Pruning: A Survey," in 2020 39th Chinese Control Conference, 2020, pp. 7458-7463.
    [14] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704-2713.
    [15] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," in European Conference on Computer Vision, 2016: Springer, pp. 525-542.
    [16] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," arXiv preprint arXiv:1606.06160, 2016.
    [17] Z. Liu, Z. Shen, M. Savvides, and K.-T. Cheng, "ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions," in European Conference on Computer Vision, 2020: Springer, pp. 143-159.
    [18] Z. Liu, B. Wu, W. Luo, X. Yang, W. Liu, and K.-T. Cheng, "Bi-Real Net: Enhancing the Performance of 1-bit CNNs with Improved Representational Capability and Advanced Training Algorithm," in European Conference on Computer Vision, 2018, pp. 722-737.
    [19] J. Bethge, C. Bartz, H. Yang, Y. Chen, and C. Meinel, "MeliusNet: An Improved Network Architecture for Binary Neural Networks," in IEEE Winter Conference on Applications of Computer Vision, 2021, pp. 1439-1448.
    [20] A. J. Redfern, L. Zhu, and M. K. Newquist, "BCNN: A Binary CNN with All Matrix Ops Quantized to 1 Bit Precision," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4604-4612.
    [21] P. Guo, H. Ma, R. Chen, P. Li, S. Xie, and D. Wang, "FBNA: A Fully Binarized Neural Network Accelerator," in 2018 28th International Conference on Field Programmable Logic and Applications, 2018: IEEE, pp. 51-513.
    [22] M. Ghasemzadeh, M. Samragh, and F. Koushanfar, "ReBNet: Residual Binarized Neural Network," in 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines, 2018: IEEE, pp. 57-64.
    [23] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 65-74.
    [24] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, "Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 15-24.
    [25] Z. Wang, Z. Wu, J. Lu, and J. Zhou, "BiDet: An Efficient Binarized Object Detector," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 2049-2058.
    [26] S. Xu, J. Zhao, J. Lu, B. Zhang, S. Han, and D. Doermann, "Layer-Wise Searching for 1-Bit Detectors," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 5682-5691.
    [27] J.-I. Guo, C.-C. Tsai, J.-L. Zeng, S.-W. Peng, and E.-C. Chang, "Hybrid Fixed-Point/Binary Deep Neural Network Design Methodology for Low-Power Object Detection," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 10, no. 3, pp. 388-400, 2020.
    [28] H. Nakahara, H. Yonekawa, T. Fujii, and S. Sato, "A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for An FPGA," in 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018, pp. 31-40.
    [29] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
    [30] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99.
    [31] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
    [32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot Multibox Detector," in European Conference on Computer Vision, 2016: Springer, pp. 21-37.
    [33] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: Deconvolutional Single Shot Detector," arXiv preprint arXiv:1701.06659, 2017.
    [34] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
    [35] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7263-7271.
    [36] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," in IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
    [37] A. Krizhevsky and G. Hinton, "Learning Multiple Layers of Features from Tiny Images," Technical Report, Department of Computer Science, University of Toronto, 2009.
    [38] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, 2015: PMLR, pp. 448-456.
    [39] L. Deng, "The MNIST Database of Handwritten Digit Images for Machine Learning Research," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141-142, 2012.
    [40] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading Digits in Natural Images with Unsupervised Feature Learning," Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
    [41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 248-255.
    [42] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "Mixup: Beyond Empirical Risk Minimization," International Conference on Learning Representations, 2018.
    [43] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
    [44] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," in International Conference on Machine Learning Workshop on Deep Learning for Audio, Speech and Language Processing, 2013, vol. 30, no. 1: Citeseer, p. 3.
    [45] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 658-666.
    [46] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
    [47] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," arXiv preprint arXiv:1706.02677, 2017.
    [48] Xilinx. Vivado Design Suite AXI Reference Guide (UG1037) [Online] Available: https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug1037-vivado-axi-reference-guide.pdf
    [49] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, 2016.
    [50] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems, vol. 32, pp. 8026-8037, 2019.
    [51] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, and N. J. Smith, "Array Programming with NumPy," Nature, vol. 585, no. 7825, pp. 357-362, 2020.
    [52] Digilent. ZedBoard Zynq-7000 Development Board Reference Manual [Online] Available: https://reference.digilentinc.com/programmable-logic/zedboard/reference-manual
    [53] COCO Detection Challenge (Bounding Box) [Online] Available: https://competitions.codalab.org/competitions/20794
    [54] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in International Conference on Learning Representations, 2015.
    [55] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations, 2015.

    Full text release date: 2024/08/24 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan thesis system)