
Graduate Student: SETYA WIDYAWAN PRAKOSA
Thesis Title: Improving the Accuracy of Pruned Network Using Knowledge Distillation
Advisor: Jenq-Shiou Leu (呂政修)
Committee Members: Hsing-Lung Chen, Yie-Tarng Chen, Wen-Shien Fang, Jenq-Shiou Leu, Ray-Guang Cheng
Degree: Master
Department: Department of Electronic and Computer Engineering
Publication Year: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 42
Keywords: Convolutional Neural Networks (CNN), compression technique, pruning filters, Knowledge Distillation (KD), accuracy, inference time

The introduction of Convolutional Neural Networks (CNN) to the image processing field has attracted researchers to explore their applications. Several network designs have been proposed to reach state-of-the-art capability. However, current neural network designs still suffer from an issue related to model size. Thus, several techniques have been introduced to reduce or compress the model size.
A compression technique may reduce the accuracy of the compressed model relative to the original one and may also affect the overall performance of the new model, so a new scheme is needed to enhance the accuracy of the compressed network. In this study, we show that Knowledge Distillation (KD) can be integrated with one of the pruning methodologies, namely pruning filters, as the compression technique to enhance the accuracy of the pruned model.
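For context, knowledge distillation trains the smaller (pruned) student network to match the softened output distribution of the original teacher network in addition to the ground-truth labels. The sketch below shows a typical Hinton-style distillation loss in PyTorch; the temperature T and weighting alpha are illustrative placeholders, not the hyper-parameters used in this thesis.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """KD loss: soft-target KL term plus hard-label cross-entropy.

    T and alpha are illustrative values, not the thesis' actual settings.
    """
    # Soften both output distributions with the temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened distributions, scaled by T^2 as in Hinton et al.
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```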
From all experimental results, we conclude that incorporating KD into the training of a MobileNets model can enhance the accuracy of the pruned network without meaningfully elongating the inference time: the measured inference time of the model trained with KD is only 0.1 s longer than that of the model trained without KD. Furthermore, when the model size is reduced by 26.08%, the accuracy is 63.65% without KD and rises to 65.37% when KD is incorporated.
Pruning filters also reduces the model size itself: the original MobileNets model is 14.4 MB, and a 26.08% reduction decreases it to 11.3 MB. Compressing the model additionally saves 0.1 s of inference time.
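The pruning-filters step referred to above removes whole convolution filters rather than individual weights, which is why the saved model shrinks and inference becomes faster. Below is a minimal sketch of the L1-norm ranking criterion from "Pruning Filters for Efficient ConvNets" (Li et al.), assuming PyTorch; the 0.26 ratio is only an illustrative stand-in for the 26.08% reduction reported above, and the helper name is hypothetical.

```python
import torch
import torch.nn as nn

def smallest_filter_indices(conv: nn.Conv2d, prune_ratio: float):
    """Rank the filters of a Conv2d layer by their L1 norm and return the
    indices of the smallest ones, i.e. the filters to prune."""
    # weight shape: (out_channels, in_channels, kH, kW); L1 norm per output filter
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_prune = int(prune_ratio * conv.out_channels)
    return torch.argsort(l1)[:n_prune]  # indices of filters to remove

# Example: mark roughly 26% of the lowest-L1 filters of one layer for removal.
layer = nn.Conv2d(32, 64, kernel_size=3)
to_remove = smallest_filter_indices(layer, 0.26)
```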



ABSTRACT
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF EQUATIONS
CHAPTER 1 INTRODUCTION
  1.1 Research Background
  1.2 Objective
  1.3 Research Scope and Constraint
  1.4 Outline and Report
CHAPTER 2 LITERATURE REVIEW
  2.1 Existing Methodology on Model Compression
    2.1.1 Knowledge Distillation
    2.1.2 Pruning Filters
  2.2 Neural Network Architecture
    2.2.1 LeNet-5
    2.2.2 AlexNet
    2.2.3 MobileNets
    2.2.4 AlexNet
CHAPTER 3 METHODOLOGY
  3.1 Preliminary Study
  3.2 Proposed Scheme
    3.2.1 Architecture
    3.2.2 How can we prune?
  3.3 Environment settings
CHAPTER 4 RESULTS AND DISCUSSION
  4.1 Preliminary Study
  4.2 Proposed Scheme
    4.2.1 Accuracy of proposed scheme
    4.2.2 Inference time and model size
    4.2.3 Retraining to recover dropped accuracy
CHAPTER 5 CONCLUSION AND FUTURE WORKS
  5.1 Conclusion
  5.2 Future Works
REFERENCES

