Graduate Student: Brilian Tafjira Nugraha
Thesis Title: Analysis of Layer Efficiency and Layer Reduction on Pre-trained CNN Models
Advisor: 蘇順豐 (Shun-Feng Su)
Committee Members: 林顯易 (Lin Xianyi), 鍾聖倫 (Sheng-Luen Chung), 王文俊 (Wang Wenjun)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Publication Year: 2018
Graduation Academic Year: 106
Language: English
Pages: 48
Keywords: CNN, deep learning, layer reduction, layer efficiency, neural network
Views: 225 | Downloads: 0
Deep learning still encounters several issues, such as overfitting and oversized architectures, caused by the use of a large number of layers. The resulting size severely constrains the performance and portability of deep learning models in resource-limited environments such as embedded and IoT devices. In this study, we report an analysis of the activations, gradients, and weights in each layer of pre-trained VGG-16 and custom AlexNet models in order to measure the efficiency of their layers. The layer efficiencies estimated from these measurements are compared against manual layer reduction to validate the most relevant measure. A multiple-layer-reduction method is then used for validation. With this approach, the time for the one-layer-reduction and re-training processes on the two models can be reduced by up to 9-fold and 5-fold, respectively, without significant loss of accuracy.
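The ranking-and-reduction idea in the abstract can be sketched as follows. This is a minimal NumPy toy, not the thesis implementation: it stands in for a pre-trained network with a stack of equally sized dense layers, uses mean absolute activation as the per-layer efficiency proxy (one of the statistics the study compares; gradient- and weight-based measures would be computed analogously), and removes the lowest-scoring layer. All names (`forward`, `efficiency`, `reduced`) and the toy dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained layer stack (the thesis works with
# VGG-16 / custom AlexNet; the ranking idea is the same). All hidden
# sizes are equal so a layer can be dropped without reshaping others.
dims = [8, 8, 8, 8]
weights = [rng.normal(size=(dims[i], dims[i + 1]))
           for i in range(len(dims) - 1)]

def forward(x, weights):
    """ReLU forward pass that records every layer's activation."""
    acts = []
    for w in weights:
        x = np.maximum(0.0, x @ w)
        acts.append(x)
    return acts

# Efficiency proxy: mean absolute activation of each layer over a
# batch of probe inputs (an assumed, simplified measure).
x = rng.normal(size=(32, dims[0]))
acts = forward(x, weights)
efficiency = [float(np.mean(np.abs(a))) for a in acts]

# Remove the layer with the lowest score; in the study the reduced
# model would then be re-trained briefly to recover accuracy.
least = int(np.argmin(efficiency))
reduced = [w for i, w in enumerate(weights) if i != least]

assert len(reduced) == len(weights) - 1
```

In this sketch, removing a layer is only shape-safe because all hidden dimensions match; on a real CNN the reduction must splice compatible layers (e.g. same-shaped conv blocks) before re-training.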