| Field | Value |
|---|---|
| Author | 江郡邦 (Jun-Bang Jiang) |
| Title | 運用多樣教師結構進行知識蒸餾於模型壓縮之研究 (Study of Knowledge Distillation Based Model Compression by Various Structures with Multi Teachers) |
| Advisor | 呂政修 (Jenq-Shiou Leu) |
| Committee | 鄭瑞光 (Ray-Guang Cheng), 陳郁堂 (Yie-Tarng Chen), 方文賢 (Wen-Hsien Fang) |
| Degree | Master (碩士) |
| Department | 電資學院 - 電子工程系 (Department of Electronic and Computer Engineering) |
| Year of publication | 2020 |
| Academic year | 108 |
| Language | Chinese |
| Pages | 48 |
| Keywords (Chinese) | Artificial Intelligence, Deep Learning, Model Compression, Neural Network, Convolutional Neural Network, Knowledge Distillation, Transfer Learning |
| Keywords (English) | Artificial Intelligence, Deep Learning, Model Compression, Transfer Learning, Knowledge Distillation, Convolutional Neural Network |
| Views / Downloads | 228 / 2 |
As application demands in the market keep growing and changing, deep learning network architectures have gradually become larger and more complex. While accuracy has improved, the number of model parameters has grown exponentially, bringing further problems such as poor model efficiency. Applications in every domain must therefore trade accuracy off against usability and find a balance point that meets their requirements. Given how large neural networks have become, research on model compression is essential: the ultimate goal of this field is to minimize model size while keeping accuracy at the same level, or even improving it.

Knowledge distillation is a branch of knowledge transfer, and the concept is well suited to model compression for neural networks. Compared with methods such as parameter pruning, quantization, and matrix decomposition, knowledge distillation can effectively transfer a model's knowledge onto a relatively small architecture and thereby improve model performance. In this thesis, we take knowledge distillation as the core technique and propose an improved architecture that raises the efficiency of knowledge transfer, reduces the parameter count and inference time, and maintains or even improves accuracy, giving the student model better practical applicability. Beyond directly observing accuracy, the experimental results are analyzed comprehensively from multiple aspects and compared with MobileNet, verifying that a student model trained under this improved knowledge distillation architecture is better suited for deployment.
Artificial Intelligence (AI) has become part of everyday life in recent years. Depending on the application context, ever more complex AI models are needed to satisfy the requirements. However, increased model complexity causes many problems, such as high hardware-resource demands and long inference times. Model compression is therefore an important technology for solving these problems and making AI more efficient and more easily applied in different applications.

In this thesis, we propose an improved knowledge distillation architecture to increase the performance of knowledge transfer. The proposed architecture consists of three parts: multi-teacher, dynamic temperature, and multi-experienced-teacher, and it was implemented on two different convolutional neural networks (CNNs), LeNet and ResNet. To evaluate the architecture, we observe not only the accuracy but also make a comprehensive comparison with MobileNet. The results show that our approach achieves better performance in this comprehensive comparison, with a clear advantage in inference time, which makes the resulting model more suitable for real application contexts.
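To make the distillation idea concrete, the following is a minimal NumPy sketch of a Hinton-style knowledge distillation loss extended to several teachers by averaging their temperature-softened outputs. It is an illustration only, not the thesis's actual implementation: the function names, the simple averaging of teachers, and the fixed temperature `T` and weight `alpha` are all assumptions (the thesis's dynamic-temperature and multi-experienced-teacher components are not modeled here).

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          T=4.0, alpha=0.7):
    """KD loss with soft targets averaged over several teachers.

    alpha weights the distillation term against ordinary cross-entropy on
    the hard labels; T is the distillation temperature. The T**2 factor
    rescales the soft-target gradients, as in Hinton et al. [13].
    """
    # Average the teachers' softened predictions into one soft target.
    soft_target = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    soft_student = softmax(student_logits, T)
    # KL divergence from the student's soft output to the soft target.
    kd = np.sum(soft_target * (np.log(soft_target + 1e-12)
                               - np.log(soft_student + 1e-12)), axis=-1)
    # Standard cross-entropy on the ground-truth labels (T = 1).
    probs = softmax(student_logits)
    hard = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kd + (1 - alpha) * hard))
```

With `alpha = 1` and a single teacher identical to the student, the KL term vanishes and the loss is zero; in general the student is pulled toward both the averaged teacher distribution and the true labels.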