
Graduate student: Jun-Bang Jiang (江郡邦)
Thesis title: Study of Knowledge Distillation Based Model Compression by Various Structures with Multi Teachers (運用多樣教師結構進行知識蒸餾於模型壓縮之研究)
Advisor: Jenq-Shiou Leu (呂政修)
Committee members: Ray-Guang Cheng (鄭瑞光), Yie-Tarng Chen (陳郁堂), Wen-Hsien Fang (方文賢)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (2019-2020)
Language: Chinese
Number of pages: 48
Keywords (Chinese): Artificial Intelligence, Deep Learning, Model Compression, Neural Network, Convolutional Neural Network, Knowledge Distillation, Transfer Learning
Keywords (English): Artificial Intelligence, Deep Learning, Model Compression, Transfer Learning, Knowledge Distillation, Convolutional Neural Network
    As application demands in the market keep growing and changing, deep learning network architectures have become increasingly complex and large. Gains in accuracy have come with exponential growth in model parameters, which in turn brings further problems such as poor model efficiency. Applications in every domain must therefore trade accuracy off against ease of deployment and find a balance that fits their requirements. Given how large modern neural networks have become, research on model compression is essential: the goal of the field is to shrink a model as far as possible while keeping its accuracy at the same level, or even improving it.
    Knowledge distillation is a branch of knowledge transfer, and the concept lends itself naturally to model compression for neural networks. Compared with methods such as parameter pruning, quantization, and low-rank matrix decomposition, knowledge distillation can effectively transfer a model's knowledge into a relatively small architecture and thereby improve that model's efficiency. In this thesis, we take knowledge distillation as the core technique and propose an improved architecture that raises the efficiency of knowledge transfer, reduces the number of parameters and the inference time, and maintains or even improves accuracy, so that the student model is better suited to deployment. Beyond looking directly at accuracy, we analyze the experimental results from several perspectives and compare them with MobileNet, verifying that a student model trained under this improved knowledge distillation architecture has better practical applicability.
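    To make the mechanism concrete, the following is a minimal sketch of the standard soft-target distillation loss of Hinton et al. [13], on which the proposed architecture builds; the PyTorch-style code, the temperature T, and the weight alpha are illustrative assumptions rather than the thesis's exact implementation.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hinton-style distillation loss: soft-target KL term plus hard-label CE.

    T (temperature) and alpha (soft/hard weighting) are hypothetical defaults.
    """
    # Teacher and student class distributions softened by the temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The KL term is scaled by T^2 so its gradients stay comparable in
    # magnitude to the cross-entropy term as T changes.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```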


    Artificial Intelligence (AI) has become part of everyday life in recent decades. As application contexts become more demanding, more complex AI models are needed to satisfy their requirements. However, increased model complexity brings problems of its own, such as high hardware resource requirements and long inference times. Model compression is therefore an important technology for addressing these problems and making AI more efficient and easier to apply in different applications.
    In this thesis, we propose an improved knowledge distillation architecture that increases the effectiveness of knowledge transfer. The proposed architecture consists of three parts, Multi-teacher, Dynamic Temperature, and Multi-experienced-teacher, and it is implemented on two different Convolutional Neural Networks (CNNs), LeNet and ResNet. To evaluate the architecture, we look not only at accuracy but also at a comprehensive comparison with MobileNet. The results show that the proposed approach achieves better overall performance in this comparison; in particular, its clear advantage in inference time makes the resulting model more practical for real application contexts.
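    The abstract names the three components of the proposed architecture but does not spell out their formulas, so the sketch below is only one plausible reading: multiple teachers are combined by averaging their temperature-softened outputs, and the distillation temperature is annealed over training. The function names, the linear schedule, and all default values are assumptions for illustration, not the method defined in the thesis.

```python
import torch
import torch.nn.functional as F

def multi_teacher_soft_targets(teacher_logits_list, T):
    # One possible "Multi-teacher" rule: average the temperature-softened
    # class distributions of all teachers (other weightings are possible).
    probs = [F.softmax(logits / T, dim=1) for logits in teacher_logits_list]
    return torch.stack(probs, dim=0).mean(dim=0)

def dynamic_temperature(epoch, num_epochs, T_start=8.0, T_end=1.0):
    # One possible "Dynamic Temperature" rule: linearly anneal T from
    # T_start to T_end over training; the schedule is an assumption.
    frac = epoch / max(num_epochs - 1, 1)
    return T_start + (T_end - T_start) * frac

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          epoch, num_epochs, alpha=0.7):
    # Distillation loss against the averaged teacher distribution at the
    # current temperature, mixed with the hard-label cross-entropy.
    T = dynamic_temperature(epoch, num_epochs)
    targets = multi_teacher_soft_targets(teacher_logits_list, T)
    log_student = F.log_softmax(student_logits / T, dim=1)
    soft_loss = F.kl_div(log_student, targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```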

Table of contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Background and Motivation
  1.2  Research Objectives
  1.3  Chapter Overview
Chapter 2  Model Compression Related Technologies
  2.1  Model Compression Techniques
    2.1.1  Model Pruning
    2.1.2  Model Quantization
    2.1.3  Low-Rank Decomposition
    2.1.4  Knowledge Distillation
  2.2  Analysis of Related Techniques
Chapter 3  Design of the Model Compression Architecture
  3.1  Knowledge Distillation
    3.1.1  Soft target
    3.1.2  Loss function
    3.1.3  Knowledge Distillation Architecture and Procedure
  3.2  Convolution Neural Network
    3.2.1  LeNet-5
    3.2.2  ResNet
  3.3  Proposed Method
    3.3.1  Multi-teacher
    3.3.2  Dynamic Temperature
    3.3.3  Multi-experienced-teacher
    3.3.4  Experimental Architecture
Chapter 4  Experiments and Evaluation Results
  4.1  Experimental Setup
    4.1.1  Hardware
    4.1.2  System Environment
    4.1.3  Dataset: CIFAR-10
    4.1.4  Related Network (MobileNet_v1 + Knowledge Distillation)
    4.1.5  Model Density Indicator (DI)
  4.2  Experimental Procedure
  4.3  Experimental Results
    4.3.1  Comparison of the Student Models' DI Values
    4.3.2  Comprehensive Comparison of Model Performance (LeNet & MobileNet)
    4.3.3  Comprehensive Comparison of Model Performance (ResNet & MobileNet)
    4.3.4  Accuracy of LeNet under Different Knowledge Distillation Architectures
    4.3.5  Model Size of LeNet under Different Knowledge Distillation Architectures
    4.3.6  Accuracy of ResNet under Different Knowledge Distillation Architectures
    4.3.7  Model Size of ResNet under Different Knowledge Distillation Architectures
    4.3.8  Comparison of the Proposed Method and Traditional Knowledge Distillation
  4.4  Discussion of Experimental Results
Chapter 5  Conclusion
Appendix
References

    [1] "Gartner Says Global Artificial Intelligence Business Value to Reach $1.2 Trillion in 2018," Available in 2018: https://www.gartner.com/en/newsroom/press-releases/2018-04-25-gartner-says-global-artificial-intelligence-business-value-to-reach-1-point-2-trillion-in-2018#:~:text=Global%20business%20value%20derived%20from,reach%20%243.9%20trillion%20in%202022.
    [2] Wikipedia, " AlphaGo versus Lee Sedol," Available in 2020: https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
    [3] S. Ravi, "Custom On-Device ML Models with Learn2Compress," 2018.
    [4] Y. L. Cun, J. S. Denker, and S. A. Solla, "Optimal brain damage," 1990.
    [5] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, "A Survey of Model Compression and Acceleration for Deep Neural Networks," arXiv e-prints, p. arXiv:1710.09282Accessed on: October 01, 2017
    [6] T. Choudhary, V. Mishra, A. Goswami, and J. Sarangapani, "A comprehensive survey on model compression and acceleration," Artificial Intelligence Review, pp. 1-43, 2020.
    [7] S. Han et al., "DSD: Dense-Sparse-Dense Training for Deep Neural Networks," arXiv:1607.04381, 2016.
    [8] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," arXiv:1510.00149, 2015.
    [9] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations," arXiv:1609.07061, 2016.
    [10] V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky, "Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition," arXiv:1412.6553, 2014.
    [11] C. Tai, T. Xiao, Y. Zhang, X. Wang, and W. E, "Convolutional neural networks with low-rank regularization," arXiv:1511.06067, 2015.
    [12] J. Ba and R. Caruana, "Do deep nets really need to be deep?," in Advances in Neural Information Processing Systems (NIPS), 2014.
    [13] G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv:1503.02531, 2015.
    [14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385, 2015.
    [16] A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," 2009.
    [17] A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv:1704.04861, 2017.
