
Author: 李組賢 (Tsu-Hsien Lee)
Thesis Title: 結合深度卷積神經網路分類在泛用的細微紋理上之研究 (A Study of General Fine-Grained Classification with Deep Convolutional Neural Networks)
Advisor: 吳怡樂 (Yi-Leh Wu)
Committee: 陳建中 (Jiann-Jone Chen), 唐政元, 閻立剛
Degree: 碩士 (Master)
Department: 電資學院 - 資訊工程系 (Department of Computer Science and Information Engineering)
Thesis Publication Year: 2015
Graduation Academic Year: 103
Language: 中文 (Chinese)
Pages: 37
Keywords (in Chinese): Caffe, 預訓練 (pre-training), 深度卷積神經網路 (deep convolutional neural network), ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC)
Keywords (in other languages): Caffe, Convolution Neural Network, Pre-train, ILSVRC 2012
Reference times: Clicks: 350, Downloads: 1
  • (Abstract, translated from Chinese) Deep learning usually requires a very large amount of time to fully train a model on a dataset. Compared with the Tesla K40, currently the best GPU available to non-classified institutions for deep learning, our hardware is roughly two to three times slower. Training speed depends on GPU performance: a higher-end GPU, or several GPUs running in parallel, is always faster, but we consider this an endless comparison. In this thesis we instead focus on the deep learning model, the training details, and the use of pre-training to make learning faster. We believe that good initial weights are the most important key before training a deep model. Our method also emphasizes lower time consumption with competitive results. In the pre-training stage we fully trained a modified version of the model that won the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC) classification competition [1]. Combined with our different training details, this model trains faster than the original (only 60 cycles in 13 days, instead of the 90 cycles and 19.5 days suggested in [1]). Our experiments show that fine-tuning the pre-trained model achieves very competitive results on other fine-grained datasets.


    Deep learning usually takes a long time to train. Without high-end GPUs, training may take 2x to 3x longer. We can always use a faster GPU, or more GPUs with parallel processing, to speed up training, but we consider that an endless comparison as long as better and faster GPUs keep appearing. In this thesis, we focus on the deep learning model, the training details, and pre-training to speed up the training process. We believe that good initialization weights are the most important key for deep learning. Our proposed method focuses on a less time-consuming process (both training time and fine-tuning time) with competitive classification results. In the pre-training stage, we train a modified version of the ImageNet 2012 classification champion model [1] on the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC 2012) dataset with Caffe [2]. The proposed pre-trained model (idea from [2]) with our training details needs only 60 cycles (13 days) to converge instead of the 90 cycles (19.5 days) previously reported. Our comprehensive experiments show competitive classification results after fine-tuning the proposed pre-trained model on other difficult fine-grained classification datasets.
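    The fine-tuning stage described above relies on the standard behavior of copying pre-trained weights into every layer of the new network whose name and shape match the pre-trained model, while renamed layers (typically the final classifier, resized to the target dataset's class count) start from fresh random values and are relearned. A minimal sketch of that initialization rule, with hypothetical layer names and shapes (not the thesis's actual code):

    ```python
    import numpy as np

    def init_for_finetune(pretrained, new_shapes, rng):
        """Copy pre-trained weights for matching layers; randomly init the rest."""
        weights = {}
        for name, shape in new_shapes.items():
            if name in pretrained and pretrained[name].shape == shape:
                # Name and shape match: reuse the pre-trained weights as-is.
                weights[name] = pretrained[name]
            else:
                # Renamed or resized layer (e.g. a new 102-way classifier for
                # Oxford Flower102): start from small random values.
                weights[name] = rng.normal(0.0, 0.01, size=shape)
        return weights

    rng = np.random.default_rng(0)
    # Hypothetical pre-trained model: a conv layer and a 1000-way classifier.
    pretrained = {"conv1": np.ones((96, 3, 11, 11)), "fc8": np.ones((1000, 4096))}
    # New definition keeps conv1 but replaces the classifier with a 102-way one.
    new_shapes = {"conv1": (96, 3, 11, 11), "fc8_flowers": (102, 4096)}
    w = init_for_finetune(pretrained, new_shapes, rng)
    ```

    Only the layers kept under their original names carry the pre-trained initialization forward; renaming the last layer is what forces it to be relearned for the new label set.
    
    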

    論文摘要 (Abstract in Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
    Chapter 2. Deep Learning Model and Pre-training Works
        2.1 Pre-training dataset: ImageNet ILSVRC 2012 dataset
        2.2 Original model
        2.3 Faster modified model
    Chapter 3. Pre-train and Fine-tune
    Chapter 4. Experiments
        4.1 Pre-training on PRE-TRAIN model
        4.2 Oxford Flower102 dataset
            4.2.1 Pure-train on Oxford Flower102
            4.2.2 Pre-train on Oxford Flower102
            4.2.3 Pure vs. pre-training on Oxford Flower102
            4.2.4 Comparison with the state-of-the-art results
        4.3 Caltech-UCSD Birds-200-2011
            4.3.1 Pure-train on CUB200-Bird
            4.3.2 Pre-train on CUB200-Bird
            4.3.3 Pure vs. pre-training on CUB200-Bird
            4.3.4 Comparison with the state-of-the-art results on CUB200-Bird
        4.4 MIT 67 scene
    Chapter 5. Conclusions and Future Work
    References

    [1] Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems (NIPS), 2012.
    [2] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding”, ACM International Conference on Multimedia (MM), 2014.
    [3] Oxford102 Flower dataset, http://www.robots.ox.ac.uk/~vgg/data/flowers/102/, retrieved May 15, 2015.
    [4] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie, “The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001”, 2011.
    [5] Ariadna Quattoni, Antonio Torralba. Recognizing Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
    [6] Basura Fernando, Eugene. Fromont, and Tinne Tuytelaars. Mining midlevel features for image classification. International Journal of Computer Vision (IJCV), 2014
    [7] Maria-Elena Nilsback, Andrew Zisserman, “Automated flower classification over a large number of classes”, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2008.
    [8] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson, “CNN features off-the-shelf: An astounding baseline for recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) DeepVision workshop, 2014.
    [9] Maria-Elena Nilsback, Andrew Zisserman, “An automatic visual flora - segmentation and classification of flower images”, 2009
    [10] Anelia Angelova, Shenghuo Zhu, “Efficient object detection and segmentation for fine-grained recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
    [11] Yuning Chai, Victor Lempitsky, and Andrew Zisserman, “Bicos: A bi-level co-segmentation method for image classification”, International Conference on Computer Vision (ICCV), 2011.
    [12] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie, “The Caltech-UCSD Birds-200-2011 Dataset”, 2011.
    [13] Jian Dong, Wei Xia, Qiang Chen, Jiashi Feng, Zhongyang Huang, Shuicheng Yan, “Hierarchical matching with side information for image classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
    [14] Fahad Shahbaz Khan, Joost van de Weijer, Andrew D. Bagdanov, Maria Vanrell, “Portmanteau vocabularies for multi-cue image representation”, Neural Information Processing Systems (NIPS), 2011.
    [15] Liefeng Bo, Xiaofeng Ren, Dieter Fox, “Kernel descriptors for visual recognition”, Neural Information Processing Systems (NIPS), 2010.
    [16] Yuning Chai, Esa Rahtu, Victor Lempitsky, Luc Van Gool, Andrew Zisserman, “Tricos: A tri-level class-discriminative cosegmentation method for image classification”, European Conference on Computer Vision (ECCV), 2012.
    [17] Ning Zhang, Ryan Farrell, Trevor Darrell, “Pose pooling kernels for sub-category recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
    [18] Jia Deng, Jonathan Krause, Li Fei-Fei, “Fine-grained crowdsourcing for fine-grained recognition”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013
    [19] Yangqing Jia, Oriol Vinyals, Trevor Darrell, “Pooling-Invariant Image Feature Learning”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jan. 2013
    [20] Shulin Yang , Liefeng Bo, Jue Wang, Linda Shapiro, “Unsupervised template learning for fine-grained object recognition”. Neural Information Processing Systems (NIPS), 2012
    [21] Jiongxin Liu and Peter N. Belhumeur, “Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency” , International Conference on Computer Vision (ICCV) ,2013
    [22] Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell, “Deformable part descriptors for fine-grained recognition and attribute prediction”, International Conference on Computer Vision (ICCV), 2013.
    [23] Steve Branson, Grant Van Horn, Catherine Wah, Pietro Perona, Serge Belongie, “The ignorant led by the blind: A hybrid human machine vision system for fine-grained categorization”, International Journal of Computer Vision (IJCV) , 2014.
    [24] Thomas Berg, Peter N. Belhumeur, “Poof: Part-based one-vs-one features for fine-grained categorization, face verification, and attribute estimation”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
    [25] Christoph Göring, Erik Rodner, Alexander Freytag, and Joachim Denzler, “Nonparametric Part Transfer for Fine-grained Recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
    [26] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge”, 2014.
    [27] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, “Going Deeper with Convolutions”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
    [28] Vinod Nair and Geoffrey Hinton, “Rectified linear units improve restricted Boltzmann machines”, International Conference on Machine Learning (ICML) , 2010.
    [29] Y-Lan Boureau, Francis Bach, Yann LeCun and Jean Ponce, “Learning Mid-Level Features For Recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
    [30] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors”, arXiv:1207.0580, 2012.
    [31] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition”, Proceedings of the IEEE, November 1998.
    [32] Yoshua Bengio, “Learning deep architectures for AI”, Foundations and Trends in Machine Learning 1(2), pages 1-127, 2009.
    [33] P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol, “Extracting and Composing Robust Features with Denoising Autoencoders”, Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML '08), pages 1096-1103, ACM, 2008.
    [34] GPU development in recent years, http://bkultrasound.com/blog/the-next-generation-of-ultrasound-technology, retrieved May 15, 2015.

    Full text public date: 2020/07/20 (Intranet public)
    Full text public date: This full text is not authorized to be published. (Internet public)
    Full text public date: This full text is not authorized to be published. (National library)