
Graduate Student: Brilian Tafjira Nugraha
Thesis Title: Analysis of Layer Efficiency and Layer Reduction on Pre-trained CNN Models
Advisor: Shun-Feng Su (蘇順豐)
Oral Defense Committee: Lin Xianyi (林顯易), Sheng-Luen Chung (鍾聖倫), Wang Wenjun (王文俊)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 48
Keywords: CNN, deep learning, layer reduction, layer efficiency, neural network
Access Count: 225 views, 0 downloads

Deep learning models still encounter issues such as overfitting and excessive size due to the use of a large number of layers. This large size greatly constrains the performance and portability of deep learning models in resource-limited environments such as embedded and IoT devices. In this study, we analyze the activation outputs, gradient outputs, and weights of each layer in pre-trained VGG-16 and custom AlexNet models in order to measure the efficiency of their layers. Layer efficiencies estimated from these measurements are compared with manual layer reduction to determine the most relevant measurement, and multiple layer reduction is then used for further validation. With the resulting approach, the time required for one-layer reduction and re-training on the two models can be reduced by factors of up to 9 and 5, respectively, without a significant loss of accuracy.
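The record does not reproduce the thesis's procedure in detail, but the general idea of scoring the layers of a pre-trained network from their activations and weights can be sketched as below. This is a minimal illustration, assuming a TensorFlow/Keras VGG-16 with ImageNet weights and a random stand-in batch; the mean-absolute-value scores and the choice to probe only convolutional layers are assumptions made for illustration, not the thesis's exact efficiency measure, and gradient statistics could be collected analogously with tf.GradientTape.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Load a pre-trained VGG-16 with ImageNet weights (one of the two models studied).
model = VGG16(weights="imagenet", include_top=True)

# Collect the convolutional layers whose "efficiency" we want to score.
conv_layers = [layer for layer in model.layers
               if isinstance(layer, tf.keras.layers.Conv2D)]

# Build a probe model that returns every convolutional layer's activation.
probe = tf.keras.Model(inputs=model.input,
                       outputs=[layer.output for layer in conv_layers])

# A small random batch stands in for a sample of the real dataset (assumption).
x = np.random.uniform(0, 255, size=(8, 224, 224, 3)).astype("float32")
activations = probe(preprocess_input(x), training=False)

# Score each layer by the mean magnitude of its activations and kernel weights.
# (Illustrative metric only; the thesis's measurements may be defined differently.)
for layer, act in zip(conv_layers, activations):
    kernel = layer.get_weights()[0]
    act_score = float(tf.reduce_mean(tf.abs(act)))
    weight_score = float(np.mean(np.abs(kernel)))
    print(f"{layer.name:20s} activation={act_score:.4f} weight={weight_score:.4f}")
```

In practice one would feed samples from the target dataset rather than random noise and compare the resulting per-layer scores against the accuracy obtained after manually removing individual layers and re-training, which is the validation strategy the abstract describes.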


TABLE OF CONTENTS
ABSTRACT
List of Figures
List of Tables
I. INTRODUCTION
  1.1. Background and Problem
  1.2. Motivations
  1.3. Thesis Contribution
  1.4. Thesis Organization
II. RELATED WORK
  2.1. Transfer Learning
  2.2. Structure Learning
  2.3. Weight
  2.4. Activation Function
  2.5. Saliency Maps and Gradient Class Activation Map (Grad-CAM)
  2.6. Taylor Expansion (TE)
III. APPROACHES
  3.1. Parameter Collections
  3.2. Efficiency Measurements
  3.3. Manual Layer Reduction
  3.4. Measurement Comparisons
  3.5. Multiple Layer Reductions
IV. EXPERIMENTS AND RESULTS
  4.1. Dataset and Package
  4.2. Transfer Learning and Fine-Tuning
  4.3. The Architecture of Models
  4.4. Experimental Processes
  4.5. Definition of Layer Efficiency
  4.6. Analysis of Layer’s Efficiency
  4.7. Manual Layer Reduction Performances
  4.8. Measurement Correlation Results
  4.9. Multiple Layer Reductions
  4.10. Discussion on the Weight Measurement
  4.11. Layer Reduction vs Neuron Pruning
  4.12. Limitation
V. CONCLUSIONS AND FUTURE WORK
  5.1 Conclusions
  5.2 Future Work
VI. REFERENCES

