
Graduate Student: Brilian Tafjira Nugraha
Thesis Title: Analysis of Layer Efficiency and Layer Reduction on Pre-trained CNN Models
Advisor: Shun-Feng Su (蘇順豐)
Oral Defense Committee: Lin Xianyi (林顯易), Sheng-Luen Chung (鍾聖倫), Wang Wenjun (王文俊)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 48
Keywords: CNN, deep learning, layer reduction, layer efficiency, neural network
Access Count: 225 views, 0 downloads

Deep learning models still encounter issues such as overfitting and excessive size due to the use of a large number of layers. This large size greatly constrains the performance and portability of deep learning models in resource-limited environments such as embedded and IoT devices. In this study, we analyze the activation outputs, gradient outputs, and weights of each layer in pre-trained VGG-16 and custom AlexNet models in order to measure the efficiency of their layers. Layer efficiencies estimated from these measurements are compared with manual layer reduction to determine the most relevant measurement, and multiple layer reduction is then used for further validation. With the resulting approach, the time required for one-layer reduction and re-training on the two models can be reduced by factors of up to 9 and 5, respectively, without a significant loss of accuracy.
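The record does not reproduce the thesis's procedure in detail, but the general idea of scoring the layers of a pre-trained network from their activations and weights can be sketched as below. This is a minimal illustration, assuming a TensorFlow/Keras VGG-16 with ImageNet weights and a random stand-in batch; the mean-absolute-value scores and the choice to probe only convolutional layers are assumptions made for illustration, not the thesis's exact efficiency measure, and gradient statistics could be collected analogously with tf.GradientTape.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Load a pre-trained VGG-16 with ImageNet weights (one of the two models studied).
model = VGG16(weights="imagenet", include_top=True)

# Collect the convolutional layers whose "efficiency" we want to score.
conv_layers = [layer for layer in model.layers
               if isinstance(layer, tf.keras.layers.Conv2D)]

# Build a probe model that returns every convolutional layer's activation.
probe = tf.keras.Model(inputs=model.input,
                       outputs=[layer.output for layer in conv_layers])

# A small random batch stands in for a sample of the real dataset (assumption).
x = np.random.uniform(0, 255, size=(8, 224, 224, 3)).astype("float32")
activations = probe(preprocess_input(x), training=False)

# Score each layer by the mean magnitude of its activations and kernel weights.
# (Illustrative metric only; the thesis's measurements may be defined differently.)
for layer, act in zip(conv_layers, activations):
    kernel = layer.get_weights()[0]
    act_score = float(tf.reduce_mean(tf.abs(act)))
    weight_score = float(np.mean(np.abs(kernel)))
    print(f"{layer.name:20s} activation={act_score:.4f} weight={weight_score:.4f}")
```

In practice one would feed samples from the target dataset rather than random noise and compare the resulting per-layer scores against the accuracy obtained after manually removing individual layers and re-training, which is the validation strategy the abstract describes.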


TABLE OF CONTENTS
ABSTRACT
List of Figures
List of Tables
I. INTRODUCTION
  1.1. Background and Problem
  1.2. Motivations
  1.3. Thesis Contribution
  1.4. Thesis Organization
II. RELATED WORK
  2.1. Transfer Learning
  2.2. Structure Learning
  2.3. Weight
  2.4. Activation Function
  2.5. Saliency Maps and Gradient Class Activation Map (Grad-CAM)
  2.6. Taylor Expansion (TE)
III. APPROACHES
  3.1. Parameter Collections
  3.2. Efficiency Measurements
  3.3. Manual Layer Reduction
  3.4. Measurement Comparisons
  3.5. Multiple Layer Reductions
IV. EXPERIMENTS AND RESULTS
  4.1. Dataset and Package
  4.2. Transfer Learning and Fine-Tuning
  4.3. The Architecture of Models
  4.4. Experimental Processes
  4.5. Definition of Layer Efficiency
  4.6. Analysis of Layer’s Efficiency
  4.7. Manual Layer Reduction Performances
  4.8. Measurement Correlation Results
  4.9. Multiple Layer Reductions
  4.10. Discussion on the Weight Measurement
  4.11. Layer Reduction vs Neuron Pruning
  4.12. Limitation
V. CONCLUSIONS AND FUTURE WORK
  5.1 Conclusions
  5.2 Future Work
VI. REFERENCES

