| Field | Value |
|---|---|
| Graduate Student | 洪宇薇 Yu-Wei Hong |
| Thesis Title | Analysis of Model Compression Using Knowledge Distillation (以知識蒸餾實現模型壓縮之分析) |
| Advisor | 呂政修 Jenq-Shiou Leu |
| Oral Defense Committee | 周承復 Cheng-Fu Chou, 林敬舜 Ching-Shun Lin, 陳郁堂 Yie-Tarng Chen, 方文賢 Wen-Hsien Fang |
| Degree | Master |
| Department | Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2019 |
| Academic Year | 107 |
| Language | English |
| Pages | 41 |
| Keywords (Chinese) | 深度學習、模型壓縮、知識蒸餾、分析、視覺化、熱度圖 |
| Keywords (English) | Deep Learning, Model Compression, Knowledge Distillation, analysis, visualization, heatmap |
With the development of deep learning, network architectures have grown ever larger and more complex to meet diverse needs and application scenarios. These complex architectures typically require users to spend considerable computing resources and memory, and they often fall short of real-time performance. Model compression, which addresses these drawbacks, is therefore a field worth studying. Users must know how to choose, according to their own requirements, a suitable compression technique and compression result that strikes a balance between those requirements and the performance sacrificed.
In this thesis, given a model to compress, we propose two kinds of model compression: cutting the network width-wise and layer-wise. Afterwards, Knowledge Distillation is deployed to compensate for the lost capacity and improve the classifiers' accuracy. Finally, we demonstrate how to analyze the compressed models from a variety of perspectives, and offer several suggestions on the trade-off between performance (inference time and accuracy) and compression rate. In the experimental results, the compression rate of width-wise compression on MobileNet_v1 is at least 42.27%, whereas that of layer-wise compression is at least 32.42%. Moreover, the accuracy improvement gained by adding Knowledge Distillation to the procedure is especially notable for layer-wise compression (more than 4.71%).
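The abstract does not spell out the distillation objective itself. As a minimal sketch of the standard temperature-scaled knowledge-distillation loss (the formulation popularized by Hinton et al.), the function names below are illustrative, and the temperature `T` and weight `alpha` are arbitrary choices rather than values taken from the thesis:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields a softer distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.7):
    """Weighted sum of the soft-target cross-entropy (against the teacher's
    softened outputs) and the hard-target cross-entropy (against the label)."""
    soft_teacher = softmax(teacher_logits, T)
    soft_student = softmax(student_logits, T)
    soft_loss = -np.sum(soft_teacher * np.log(soft_student + 1e-12))
    hard_loss = -np.log(softmax(student_logits)[true_label] + 1e-12)
    # The T**2 factor is the conventional rescaling that keeps the soft-target
    # term's gradient magnitude comparable across temperatures.
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss
```

The compressed (student) model is then trained to minimize this combined loss, so it learns both from the ground-truth labels and from the relative class similarities encoded in the teacher's softened output distribution.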