| Student | 張丁友 Ting-Yu Chang |
|---|---|
| Thesis Title | 基於結構重參數化之改良式輕量化模型架構 (Improved Lightweight Architecture with Structural Re-parameterization) |
| Advisor | 郭景明 Jing-Ming Guo |
| Committee | 范志鵬 Chih-Peng Fan, 楊士萱 Shih-Hsuan Yang, 黃敬群 Ching-Chun Huang, 王乃堅 Nai-Jian Wang |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2023 |
| Academic Year of Graduation | 111 |
| Language | Chinese |
| Number of Pages | 78 |
| Keywords (Chinese) | 輕量化模型, 模型壓縮, 模型加速, 結構重參數化 |
| Keywords (English) | lightweight model, model compression, model acceleration, re-parameterization |
In previous lightweight methods, whether in the design of lightweight model architectures or in model compression and acceleration techniques, most approaches treated reducing the number of parameters as the primary objective. However, reducing parameters does not always yield faster inference; the two are only sometimes complementary, and cutting parameters typically degrades accuracy as well. Striking a balance between efficiency and accuracy has therefore always been a central issue for lightweight methods. This thesis takes a different approach from previous model compression and acceleration methods: instead of continually shrinking the parameter count to speed up inference, it optimizes the inference-stage computation and the overall complexity of the architecture. The model thus preserves the performance of the original architecture and incurs no additional inference-time cost, improving inference speed without sacrificing accuracy.
In this thesis, structural re-parameterization is combined with the lightweight model MobileNetV2, and a new equivalent re-parameterization transformation for the Bottleneck structure is proposed. The transformation converts the multi-branch Inverted Residual blocks that MobileNetV2 uses during training into an equivalent single-path inference-stage structure containing only convolutions and activation functions. All inference-stage convolutions use a kernel size of 3, which offers the best computational density. Finally, experimental results show that structural re-parameterization improves classification accuracy by approximately 1% while noticeably increasing inference speed.
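The equivalent transformation described above rests on the linearity of convolution: parallel 3x3, 1x1, and identity branches can be collapsed into a single 3x3 convolution by padding the 1x1 kernel to 3x3 and expressing the identity mapping as a 3x3 kernel. The following is a minimal NumPy sketch of that fusion (an illustrative re-implementation, not the thesis code; it omits bias terms and BatchNorm folding):

```python
import numpy as np

def conv2d(x, k):
    """'Same' convolution, stride 1: x is (C_in, H, W), k is (C_out, C_in, kh, kw), kh = kw odd."""
    c_out, c_in, kh, kw = k.shape
    pad = kh // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for co in range(c_out):
        for i in range(h):
            for j in range(w):
                out[co, i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k[co])
    return out

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
k3 = rng.standard_normal((C, C, 3, 3))   # 3x3 branch
k1 = rng.standard_normal((C, C, 1, 1))   # 1x1 branch

# Training-time multi-branch output: 3x3 conv + 1x1 conv + identity shortcut.
y_train = conv2d(x, k3) + conv2d(x, k1) + x

# Re-parameterize: place the 1x1 weight at the centre of a 3x3 kernel,
# write the identity branch as a 3x3 kernel, then sum all three kernels.
k1_as_3 = np.zeros_like(k3)
k1_as_3[:, :, 1, 1] = k1[:, :, 0, 0]
id_as_3 = np.zeros_like(k3)
for c in range(C):
    id_as_3[c, c, 1, 1] = 1.0
k_fused = k3 + k1_as_3 + id_as_3

# Inference-time single-path output: one 3x3 conv with the fused kernel.
y_infer = conv2d(x, k_fused)
print(np.allclose(y_train, y_infer))  # True
```

In practice each training-time branch also carries a BatchNorm layer; its scale and shift are first folded into that branch's kernel and bias before the kernels are summed, so the fused single-path model remains numerically equivalent.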