
Author: Wei-Lun Huang (黃偉倫)
Thesis title: Binarized Object Detection Model Based on YOLOv3 Architecture and Its Hardware Accelerator (基於YOLOv3架構的二值化物件偵測模型及其硬體加速器)
Advisor: Chang-Hong Lin (林昌鴻)
Committee: Jenq-Shiou Leu (呂政修), Chung-An Shen (沈中安), Ching-Shun Lin (林敬舜)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication year: 2021
Graduation academic year: 109
Language: English
Pages: 102
Keywords: Binarized neural network, hardware accelerator, object detection, deep learning, convolutional neural network
Hits: 233; Downloads: 0
Abstract:

    Object detection is widely used in applications such as surveillance and self-driving vehicles. In recent years, deep-learning-based object detection methods have demonstrated excellent performance; however, they often require substantial computational resources and memory bandwidth, so they can usually be deployed only on computers with high-performance graphics processing units. This characteristic limits the range of applications of deep-learning-based object detection. To mitigate the problem, researchers have proposed compact model designs, pruning techniques, and model quantization. Within model quantization, W1A1 quantization, which binarizes most of the weights and feature maps in a model, can greatly reduce both model size and computation. In this thesis, we therefore combine the architecture of YOLOv3 [1] with state-of-the-art W1A1 quantization techniques to propose a new W1A1 binarized object detection model. We train and test the proposed model on the datasets most widely used for object detection: the PASCAL VOC dataset [2] and the COCO dataset [3]. According to the experimental results, the proposed model achieves a better trade-off between model size and accuracy than previous works. Moreover, we further reduce and quantize the proposed model to make it more hardware-friendly, and we propose a hardware accelerator for the binarized blocks in the model. Compared to a conventional central processing unit, the proposed hardware accelerator shows better performance.
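The computational saving behind W1A1 quantization comes from replacing multiply-accumulate with bitwise operations: for vectors over {-1, +1}, a dot product can be computed with XNOR (a match test) plus popcount (a match total). A minimal sketch of this identity, as an illustration only and not the thesis's implementation:

```python
def binarize(xs):
    # Map real values to {-1, +1} with the sign function (0 maps to +1).
    return [1 if x >= 0 else -1 for x in xs]

def xnor_popcount_dot(a_bits, w_bits):
    # For vectors over {-1, +1}, each product is +1 on a match and -1 on
    # a mismatch, so the dot product equals 2 * (#matches) - length.
    # In hardware the match test is an XNOR and the total is a popcount.
    matches = sum(1 for a, w in zip(a_bits, w_bits) if a == w)
    return 2 * matches - len(a_bits)

# Sanity check against the ordinary dot product.
a = binarize([0.3, -1.2, 0.7, -0.1])
w = binarize([-0.5, -0.9, 0.2, 0.4])
ordinary = sum(x * y for x, y in zip(a, w))
assert xnor_popcount_dot(a, w) == ordinary
```

Because the operands are single bits, 64 such "multiplications" fit in one machine word, which is what allows a binarized accelerator to far outpace a general-purpose CPU on these layers.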

Table of Contents

    Abstract (Chinese)
    Abstract
    Acknowledgements
    List of Contents
    List of Figures
    List of Tables
    CHAPTER 1 INTRODUCTION
    1.1 Motivation
    1.2 Contributions
    1.3 Thesis Organization
    CHAPTER 2 RELATED WORKS
    2.1 Deep learning based object detection
    2.2 W1A1 quantization
    2.3 Binarized neural network hardware accelerator
    CHAPTER 3 PROPOSED METHODS
    3.1 Data augmentation and preprocessing
    3.1.1 Random horizontal flip
    3.1.2 Random shift
    3.1.3 Random crop
    3.1.4 Resizing
    3.1.5 Mixup [42]
    3.1.6 Normalization
    3.1.7 Label smoothing [43]
    3.2 ReAct YOLO
    3.2.1 Backbone
    3.2.2 FPN [34]
    3.2.3 Detection block
    3.3 Loss functions
    3.3.1 Ground truth
    3.3.2 loss_locate
    3.3.3 loss_Conf
    3.3.4 loss_class
    3.4 Training tricks
    3.4.1 Learning rate adjustment
    3.4.2 Design for backpropagation of the sign function
    3.5 Model reduction and quantization
    3.5.1 Reduction
    3.5.2 Quantization
    3.6 Hardware accelerator
    3.6.1 Bin_Conv 3x3 block
    3.6.2 Bin_Conv 1x1 block
    3.6.3 BN_Act block
    3.6.4 AXI4-Stream and AXI4-Lite interface
    3.6.5 Controller
    CHAPTER 4 EXPERIMENTAL RESULTS
    4.1 Experimental environment
    4.1.1 Environment for the proposed model
    4.1.2 Environment for the proposed hardware accelerator
    4.2 Dataset
    4.2.1 PASCAL VOC dataset [2]
    4.2.2 COCO dataset [3]
    4.3 Evaluation metrics
    4.3.1 VOC 2007 [2] metric
    4.3.2 COCO [3] metrics
    4.4 Model performance
    4.4.1 Training settings
    4.4.2 VOC 2007 [2]
    4.4.3 COCO [3]
    4.5 Hardware accelerator performance
    CHAPTER 5 CONCLUSIONS AND FUTURE WORKS
    5.1 Conclusions
    5.2 Future Works
    REFERENCES
    APPENDIX A
    A.1 The results on the VOC 2007 testing set [2]
    A.2 The results on the COCO 2017 testing set [3]
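Section 3.4.2 of the outline concerns the backpropagation of the sign function, whose true derivative is zero almost everywhere, so the gradient cannot flow through it directly. A common workaround in the cited binarized-network literature (e.g. [16], [18]) is a straight-through estimator that passes the gradient through a clipped window. The sketch below is a generic illustration of that idea, not the specific design proposed in the thesis:

```python
def sign_forward(x):
    # Forward pass: hard binarization to {-1, +1}.
    return 1.0 if x >= 0 else -1.0

def sign_backward_ste(x, grad_out, clip=1.0):
    # Straight-through estimator: treat sign() as the identity clipped
    # to [-clip, clip], so the incoming gradient passes through only
    # where the input lies inside that window and is zeroed elsewhere.
    return grad_out if abs(x) <= clip else 0.0
```

The clipping window prevents weights that have drifted far from the threshold from receiving gradient, which stabilizes training of binarized layers.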

    [1] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
    [2] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
    [3] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision, 2014: Springer, pp. 740-755.
    [4] S. Jha, C. Seo, E. Yang, and G. P. Joshi, "Real Time Object Detection and Tracking System for Video Surveillance System," Multimedia Tools and Applications, vol. 80, no. 3, pp. 3981-3996, 2021.
    [5] T. Santad, P. Silapasupphakornwong, W. Choensawat, and K. Sookhanaphibarn, "Application of YOLO Deep Learning Model for Real Time Abandoned Baggage Detection," in 2018 IEEE 7th Global Conference on Consumer Electronics, 2018: IEEE, pp. 157-158.
    [6] C. Asha and A. Narasimhadhan, "Vehicle Counting for Traffic Management System using YOLO and Correlation Filter," in 2018 IEEE International Conference on Electronics, Computing and Communication Technologies, 2018: IEEE, pp. 1-6.
    [7] J. Xu, Y. Nie, P. Wang, and A. M. López, "Training A Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving," in 2019 International Conference on Robotics and Automation, 2019: IEEE, pp. 2379-2384.
    [8] S. P. Rajendran, L. Shine, R. Pradeep, and S. Vijayaraghavan, "Fast and Accurate Traffic Sign Recognition for Self Driving Cars using RetinaNet Based Detector," in 2019 International Conference on Communication and Electronics Systems, 2019: IEEE, pp. 784-790.
    [9] C. Hao, A. Sarwari, Z. Jin, H. Abu-Haimed, D. Sew, Y. Li, X. Liu, B. Wu, D. Fu, and J. Gu, "A Hybrid GPU+FPGA System Design for Autonomous Driving Cars," in 2019 IEEE International Workshop on Signal Processing Systems, 2019: IEEE, pp. 121-126.
    [10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted Residuals and Linear Bottlenecks," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520.
    [11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size," arXiv preprint arXiv:1602.07360, 2016.
    [12] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, 2017.
    [13] S. Xu, A. Huang, L. Chen, and B. Zhang, "Convolutional Neural Network Pruning: A Survey," in 2020 39th Chinese Control Conference, 2020, pp. 7458-7463.
    [14] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704-2713.
    [15] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," in European Conference on Computer Vision, 2016: Springer, pp. 525-542.
    [16] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," arXiv preprint arXiv:1606.06160, 2016.
    [17] Z. Liu, Z. Shen, M. Savvides, and K.-T. Cheng, "ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions," in European Conference on Computer Vision, 2020: Springer, pp. 143-159.
    [18] Z. Liu, B. Wu, W. Luo, X. Yang, W. Liu, and K.-T. Cheng, "Bi-Real Net: Enhancing the Performance of 1-bit CNNs with Improved Representational Capability and Advanced Training Algorithm," in European Conference on Computer Vision, 2018, pp. 722-737.
    [19] J. Bethge, C. Bartz, H. Yang, Y. Chen, and C. Meinel, "MeliusNet: An Improved Network Architecture for Binary Neural Networks," in IEEE Winter Conference on Applications of Computer Vision, 2021, pp. 1439-1448.
    [20] A. J. Redfern, L. Zhu, and M. K. Newquist, "BCNN: A Binary CNN with All Matrix Ops Quantized to 1 Bit Precision," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4604-4612.
    [21] P. Guo, H. Ma, R. Chen, P. Li, S. Xie, and D. Wang, "FBNA: A Fully Binarized Neural Network Accelerator," in 2018 28th International Conference on Field Programmable Logic and Applications, 2018: IEEE, pp. 51-513.
    [22] M. Ghasemzadeh, M. Samragh, and F. Koushanfar, "ReBNet: Residual Binarized Neural Network," in 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines, 2018: IEEE, pp. 57-64.
    [23] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 65-74.
    [24] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, "Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 15-24.
    [25] Z. Wang, Z. Wu, J. Lu, and J. Zhou, "BiDet: An Efficient Binarized Object Detector," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 2049-2058.
    [26] S. Xu, J. Zhao, J. Lu, B. Zhang, S. Han, and D. Doermann, "Layer-Wise Searching for 1-Bit Detectors," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 5682-5691.
    [27] J.-I. Guo, C.-C. Tsai, J.-L. Zeng, S.-W. Peng, and E.-C. Chang, "Hybrid Fixed-Point/Binary Deep Neural Network Design Methodology for Low-Power Object Detection," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 10, no. 3, pp. 388-400, 2020.
    [28] H. Nakahara, H. Yonekawa, T. Fujii, and S. Sato, "A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for An FPGA," in 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018, pp. 31-40.
    [29] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
    [30] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99.
    [31] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
    [32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot Multibox Detector," in European Conference on Computer Vision, 2016: Springer, pp. 21-37.
    [33] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: Deconvolutional Single Shot Detector," arXiv preprint arXiv:1701.06659, 2017.
    [34] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
    [35] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7263-7271.
    [36] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," in IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
    [37] A. Krizhevsky and G. Hinton, "Learning Multiple Layers of Features from Tiny Images," Technical Report, Department of Computer Science, University of Toronto, 2009.
    [38] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, 2015: PMLR, pp. 448-456.
    [39] L. Deng, "The MNIST Database of Handwritten Digit Images for Machine Learning Research," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141-142, 2012.
    [40] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading Digits in Natural Images with Unsupervised Feature Learning," Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
    [41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 248-255.
    [42] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "Mixup: Beyond Empirical Risk Minimization," International Conference on Learning Representations, 2018.
    [43] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
    [44] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," in International Conference on Machine Learning Workshop on Deep Learning for Audio, Speech and Language Processing, 2013, vol. 30, no. 1: Citeseer, p. 3.
    [45] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 658-666.
    [46] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
    [47] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," arXiv preprint arXiv:1706.02677, 2017.
    [48] Xilinx. Vivado Design Suite AXI Reference Guide (UG1037) [Online] Available: https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug1037-vivado-axi-reference-guide.pdf
    [49] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, 2016.
    [50] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems, vol. 32, pp. 8026-8037, 2019.
    [51] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, and N. J. Smith, "Array Programming with NumPy," Nature, vol. 585, no. 7825, pp. 357-362, 2020.
    [52] Digilent. ZedBoard Zynq-7000 Development Board Reference Manual [Online] Available: https://reference.digilentinc.com/programmable-logic/zedboard/reference-manual
    [53] COCO Detection Challenge (Bounding Box) [Online] Available: https://competitions.codalab.org/competitions/20794
    [54] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in International Conference on Learning Representations, 2015.
    [55] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations, 2015.

    Full text release date: 2024/08/24 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan thesis system)