
Author: 劉博獻 (Po-Hsien Liu)
Title: Novel Convolutional Neural Networks for Deep Learning and Its Applications to General Image Classification
Advisor: 蘇順豐 (Shun-Feng Su)
Committee members: 黃有評 (Yo-Ping Huang), 游文雄 (Wen-shyong Yu), 王乃堅 (Nai-jian Wang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2015
Graduation academic year: 103
Language: English
Pages: 62
Keywords: convolution neural network, deep learning, computer vision, image classification
    Deep learning has achieved outstanding results in many fields. The most common architecture in deep learning is the convolutional neural network (CNN), which has been widely applied in computer vision and audio recognition and has, in recent years, substantially surpassed handcrafted features in all related domains. However, compared with other algorithms, these techniques consume an extremely large amount of resources (e.g., computation time and memory) in training and testing, and therefore could not be widely applied in the early days. Thanks to GPU technology and the rise of cloud computing, such networks can now be trained within a predictable amount of time. The main goal of this thesis is therefore to accelerate training convergence under the constraint of a single GPU, without degrading recognition accuracy, so as to shorten the overall development and testing cycle and improve usability. This thesis focuses on general image classification, i.e., distinguishing among various everyday photographs. First, to investigate how the architecture of a convolutional neural network affects overall accuracy, we conduct experiments on the CIFAR dataset with different architectures, including different depths, numbers of filters, and filter kernel sizes, and conclude with recommendations on how to tune these parameters effectively. Second, we propose a novel architecture, a multi-level supervised learning system extended from embedding-based learning. Experiments on ImageNet [23] show that the proposed multi-level supervised learning greatly reduces the overall training convergence time (from 70 epochs to 20 epochs, a reduction of about 12 days), while achieving a top-1 accuracy of 66.7% and a top-5 accuracy of 86.33% on ImageNet; without any additional data augmentation, this is very close to the best top-5 result of 89.3% in [3].


    Deep learning has recently exhibited impressive performance in many applications. The convolutional neural network (CNN) is the most commonly used architecture for deep learning; it has been widely applied in computer vision and audio recognition, and in recent years has outperformed related handcrafted features. Compared with other artificial intelligence algorithms and handcrafted features, however, these techniques require far more time for training and testing, and so were not widely used in the early days. With the rise of graphics processing units (GPUs) and cloud computing, it has recently become feasible to implement them. The main idea of this thesis is to reduce training time on a single GPU without sacrificing accuracy, so that the convolutional neural network becomes applicable to real-world problems. The problem considered in this study is general image classification: classifying a general image into known categories. The first part of the study examines the impact of different structural factors in a convolutional neural network, namely network depth, number of filters, and filter size, using the CIFAR dataset. Based on these experiments, suggestions for choosing these factors are given. The second part proposes a novel multi-level supervised learning system, which extends embedding-based learning ideas from the literature. In experiments on ImageNet, the proposed multi-level supervised learning significantly reduces the training time from 70 epochs to 20 epochs, while reaching a top-1 accuracy of 66.7% and a top-5 accuracy of 86.33%, close to the best top-5 accuracy of 89.3% reported in [3], without any data augmentation on the ImageNet dataset.
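    The structural factors studied in the first part (depth, number of filters, filter size) all interact through the feature-map arithmetic of a CNN. As a minimal pure-Python sketch (not the thesis's actual code; the input size, filter values, and function names here are hypothetical illustrations), the following applies the three layer types the thesis varies, a valid convolution, a ReLU activation, and 2x2 max pooling, and shows how the filter size shrinks the feature map:

    ```python
    # Minimal CNN building blocks: valid 2-D convolution, ReLU, 2x2 max pooling.

    def conv2d_valid(image, kernel):
        """Valid (no padding) 2-D convolution; each dimension shrinks by kernel size - 1."""
        ih, iw = len(image), len(image[0])
        kh, kw = len(kernel), len(kernel[0])
        out = []
        for i in range(ih - kh + 1):
            row = []
            for j in range(iw - kw + 1):
                row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                               for di in range(kh) for dj in range(kw)))
            out.append(row)
        return out

    def relu(fmap):
        """Element-wise rectified linear unit: max(0, x)."""
        return [[max(0.0, v) for v in row] for row in fmap]

    def max_pool2(fmap):
        """Non-overlapping 2x2 max pooling; halves each spatial dimension."""
        return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
                 for j in range(0, len(fmap[0]) - 1, 2)]
                for i in range(0, len(fmap) - 1, 2)]

    if __name__ == "__main__":
        image = [[float((i * 6 + j) % 5 - 2) for j in range(6)] for i in range(6)]
        edge = [[1.0, 0.0, -1.0]] * 3              # a hypothetical 3x3 edge filter
        fmap = max_pool2(relu(conv2d_valid(image, edge)))
        # 6x6 input, 3x3 valid conv -> 4x4 feature map, then 2x2 pool -> 2x2
        print(len(fmap), len(fmap[0]))
    ```

    With stride-1 valid convolutions, each layer with filter size k removes k-1 pixels per spatial dimension, so deeper stages and larger filters trade spatial resolution for a larger receptive field; this is the kind of trade-off the thesis's structural experiments quantify.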

    Table of Contents
    中文摘要 I
    Abstract II
    Figure List V
    Table List VI
    Chapter 1 Introduction 1
      1.1 Motivation 5
      1.2 Research Environment 5
      1.3 Organization 5
    Chapter 2 Related Works 6
      2.1 Convolution Neural Network Architecture 6
      2.2 Convolutional Layer 7
      2.3 Non-linear Activation Function 8
      2.4 Pooling 10
      2.5 Learning Methodology 11
      2.6 Visualization Approaches 12
        2.6.1 Layer Activations 12
        2.6.2 Convolutional Filter Weight Visualization 14
        2.6.3 Retrieving Images with the Strongest Activation for a Feature Map 15
      2.7 The Notation 16
      2.8 A Baseline Model 18
      2.9 Datasets 20
    Chapter 3 Study on the Structure of Convolution Neural Network 22
      3.1 Phenomenon with Different Network Widths 23
      3.2 Phenomenon with Different Convolutional Filter Sizes 25
      3.3 Phenomenon with Different Depths in a Stage 27
      3.4 Phenomenon in Deeper Convolution with Different Network Widths 29
      3.5 Phenomenon in Deeper Convolution with Different Filter Sizes 30
    Chapter 4 Multi-level Supervised Learning 33
      4.1 The Architecture Detail 34
      4.2 Multi-level Module 35
      4.3 Embedding Scale Layer 36
      4.4 Multi-level Embedding Layer 37
      4.5 Experiments 38
      4.6 A Real-World Application 45
    Chapter 5 Conclusions and Future Works 48
      5.1 Conclusions 48
      5.2 Future Works 49

    [1] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," arXiv:1502.01852, 2015.
    [2] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," arXiv:1502.03167, 2015.
    [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper with Convolutions," arXiv:1409.4842v1, 2014.
    [4] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556, 2014.
    [5] D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column Deep Neural Networks for Image Classification," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642-3649, 2012.
    [6] M. Lin, Q. Chen, and S. Yan, "Network In Network," International Conference on Learning Representations (ICLR), 2014.
    [7] B. Graham, "Spatially-Sparse Convolutional Neural Networks," arXiv:1409.6070v1, 2014.
    [8] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," arXiv:1406.4729, 2014.
    [9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.
    [10] X. Zhang, J. Zou, X. Ming, K. He, and J. Sun, "Efficient and Accurate Approximations of Nonlinear Convolutional Networks," arXiv:1411.4229, 2014.
    [11] J. Weston and F. Ratle, "Deep Learning via Semi-Supervised Embedding," International Conference on Machine Learning (ICML), pp. 1168-1175, 2008.
    [12] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," arXiv:1411.4038, 2015.
    [13] L. J. Ba and R. Caruana, "Do Deep Nets Really Need to Be Deep?," arXiv:1312.6184, 2013.
    [14] M. D. Collins and P. Kohli, "Memory Bounded Deep Convolutional Networks," arXiv:1412.1442, 2014.
    [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Neural Information Processing Systems (NIPS), pp. 1106-1114, 2012.
    [16] A. Gangopadhyay, S. M. Tripathi, I. Jindal, and S. Raman, "SA-CNN: Dynamic Scene Classification using Convolutional Neural Networks," arXiv:1502.05243v1, 2015.
    [17] K. He and J. Sun, "Convolutional Neural Networks at Constrained Time Cost," arXiv:1412.1710v1, 2014.
    [18] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout Networks," International Conference on Machine Learning (ICML), pp. 1319-1327, 2013.
    [19] X. Zhang, J. Zou, X. Ming, K. He, and J. Sun, "Efficient and Accurate Approximations of Nonlinear Convolutional Networks," arXiv:1411.4229v1, 2014.
    [20] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," arXiv:1311.2901v3, 2013.
    [21] X. Glorot and Y. Bengio, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 249-256, 2010.
    [22] S. Arora, A. Bhaskara, R. Ge, and T. Ma, "Provable Bounds for Learning Some Deep Representations," International Conference on Machine Learning (ICML), 2014.
    [23] O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," arXiv:1409.0575, 2014.
    [24] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Object Detectors Emerge in Deep Scene CNNs," International Conference on Learning Representations (ICLR), 2015.
    [25] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," arXiv:1408.5093, 2014.
    [26] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," arXiv:1311.2524v5, 2014.
    [27] K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, "Deep Learning of Binary Hash Codes for Fast Image Retrieval," IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015.
    [28] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and Tell: A Neural Image Caption Generator," arXiv:1411.4555v1, 2014.
    [29] A. G. Howard, "Some Improvements on Deep Convolutional Neural Network Based Image Classification," arXiv:1312.5402, 2013.
    [30] M. D. Zeiler and R. Fergus, "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks," arXiv:1301.3557, 2013.
    [31] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," International Conference on Machine Learning (ICML), 2013.
    [32] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," International Conference on Machine Learning (ICML), 2010.
    [33] A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," Technical Report, University of Toronto, 2009.
    [34] B. Karlik and A. V. Olgac, "Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks," International Journal of Artificial Intelligence and Expert Systems (IJAE), vol. 1, no. 4, pp. 75-122, 2010.
    [35] Y. LeCun and Y. Bengio, "Convolutional Networks for Images, Speech, and Time-Series," The Handbook of Brain Theory and Neural Networks, 1995.
    [36] G. E. Hinton, S. Osindero, and Y. W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
    [37] R. Keisler, "Convolutional Neural Networks and Galaxies," http://stanford.edu/~rkeisler/gz/keisler_convnet.pdf, 2015.

    Full-text availability: not authorized for public release (off-campus network)
    Full-text availability: not authorized for public release (National Central Library: Taiwan thesis and dissertation system)