Author: 邱弘承
Hung-Chen Chiu
Thesis Title: 結合深度卷積神經網路在影像中顯著物體數量分類之研究
A Study of Salient Object Subitizing with Deep Convolutional Neural Networks
Advisor: 吳怡樂
Yi-Leh Wu
Committee: 陳建中
Jiann-Jone Chen
Cheng-Yuan Tang
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2016
Graduation Academic Year: 104
Language: English
Pages: 24
Keywords (in Chinese): pre-training, deep convolutional neural networks, large-scale image datasets, salient object subitizing, scene detection
Keywords (in other languages): Pre-train, ImageNet, ILSVRC 2012, Subitizing
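  • In the past, we used traditional image features such as HOG or SIFT for learning and detection in computer vision. With the rapid development of hardware, however, convolutional neural networks, which require training a large number of parameters, can now be made much deeper. In this thesis, we study the use of pre-trained deep convolutional neural networks to detect the number of salient objects in an image; [3] notes that this kind of recognition helps improve the success rate of object detection. We focus on the performance of convolutional neural networks in detecting the number of objects. Natural images, however, contain complex backgrounds and varied objects, so the ImageNet model used in [3], which was trained on images of single objects against plain backgrounds, is not comprehensive enough. We therefore propose a deep network model pre-trained on a hybrid dataset that combines scene-centric and single-object-centric images, and we examine how models trained on the two kinds of datasets (scene images and single-object images) perform at salient object subitizing. Through feature extraction in the deep network and fine-tuning of the model, we also show that a network pre-trained on the hybrid dataset outperforms an ImageNet-pre-trained network at salient object subitizing.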

    People can quickly tell the number of salient objects (1, 2, 3, or 4+) in an image or scene. This phenomenon is known as subitizing [8]. In the object detection task, however, precise object proposals must be generated first. To study this problem, [3] demonstrated the benefit of their proposed subitizing technique for salient object detection and object proposal in computer vision applications. That technique was based on a pre-trained Convolutional Neural Network (CNN) model, which must be trained on large-scale annotated data, so the quality and quantity of the input data are key to pre-training a CNN model. In [3], Zhang et al. fine-tuned AlexNet [1] and the 16-layer VGG network [5], both pre-trained on the ImageNet ILSVRC12 dataset. Although ILSVRC12 is a large dataset with 1000 categories and 1.2 million images, we present a new hybrid-data CNN model trained on both ImageNet and the MIT Places dataset [2]. The hybrid data contain more than 1300 categories and 3 million images. We transfer the model's weights and fine-tune them on the salient object subitizing (SOS) dataset [3], and we use the ten-crop technique to expand the SOS training data from 5520 to 27600 images. Our experimental results indicate that the proposed method produces better subitizing classification results than [3].
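The ten-crop technique named in the abstract is commonly understood as taking the four corner crops and the center crop of an image, plus a horizontally mirrored copy of each. The following is a minimal sketch of that common definition, not the thesis's own code. The 256x256 image size and 227x227 crop size are assumptions borrowed from typical AlexNet setups (this record does not state the sizes used), and `Crop` and `ten_crop_boxes` are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class Crop(NamedTuple):
    """A crop window in pixel coordinates (right and bottom are exclusive)."""
    left: int
    top: int
    right: int
    bottom: int
    flipped: bool  # True if the crop is additionally mirrored horizontally


def ten_crop_boxes(width: int, height: int, crop_w: int, crop_h: int) -> List[Crop]:
    """Return the ten crop windows: 4 corners + center, each with and without a flip."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop size must not exceed image size")

    max_x, max_y = width - crop_w, height - crop_h
    # Top-left corners of the five base crops.
    origins = [
        (0, 0),                    # top-left
        (max_x, 0),                # top-right
        (0, max_y),                # bottom-left
        (max_x, max_y),            # bottom-right
        (max_x // 2, max_y // 2),  # center
    ]
    boxes = [Crop(x, y, x + crop_w, y + crop_h, False) for x, y in origins]
    # The same five windows again, marked as horizontally mirrored.
    boxes += [b._replace(flipped=True) for b in boxes]
    return boxes


if __name__ == "__main__":
    # Assumed sizes: 256x256 input, 227x227 crops.
    for box in ten_crop_boxes(256, 256, 227, 227):
        print(box)
```

Returning crop windows rather than pixel data keeps the sketch free of any image library; each window can then be applied with whatever image-loading tool is in use.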

    Abstract (in Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
    Chapter 2. Convolution Neural Network (CNN) and Dataset
      2.1 CNN Models
      2.2 Large-scale dataset: ILSVRC and MIT Places dataset
      2.3 Hybrid CNN Model
    Chapter 3. Transfer Learning and Fine-tune
    Chapter 4. Experiments
      4.1 Dataset pre-processing and environments
      4.2 Train the SOS dataset from scratch
      4.3 Fine-tune MIT Places CNN models on SOS dataset
        4.3.1 AlexNet models
        4.3.2 VGG-16 layer models
        4.3.3 GoogLeNet models
      4.4 Fine-tune hybrid CNN models on SOS dataset
        4.4.1 Compare with state-of-the-art result on SOS dataset
    Chapter 5. Conclusion and Future Work
    Reference

    [1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105), 2012.
    [2] Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. Learning deep features for scene recognition using places database. In Advances in neural information processing systems (pp. 487-495), 2014.
    [3] Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., & Mech, R. Salient object subitizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4045-4054), 2015.
    [4] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9), 2015.
    [5] Simonyan, K., & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
    [6] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia (pp. 675-678). ACM, 2014.
    [7] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958, 2014.
    [8] Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L., & Schmidhuber, J. Flexible, high performance convolutional neural networks for image classification. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p. 1237), 2011.
    [9] Lin, M., Chen, Q., & Yan, S. Network in network. arXiv preprint arXiv:1312.4400, 2013.
    [10] Kataoka, H., Iwata, K., & Satoh, Y. Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection. arXiv preprint arXiv:1509.07627, 2015.
    [11] Zhou, B., Khosla, A., Lapedriza, A., Torralba, A., & Oliva, A. Places: An Image Database for Deep Scene Understanding. arXiv, 2016.
    [12] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. Object detectors emerge in deep scene cnns. arXiv preprint arXiv:1412.6856, 2014.
    [13] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. Learning Deep Features for Discriminative Localization. arXiv preprint arXiv:1512.04150, 2015.
    [14] Gidaris, S., & Komodakis, N. Object Detection via a Multi-Region and Semantic Segmentation-Aware CNN Model. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1134-1142), 2015.
    [15] Jia, Y. Caffe: An open source convolutional architecture for fast feature embedding, 2013.
    [16] Oquab, M., Bottou, L., Laptev, I., & Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1717-1724), 2014.
    [17] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
    [18] Jia, Y., Vinyals, O., & Darrell, T. Pooling-invariant image feature learning. arXiv preprint arXiv:1302.5056, 2013.
    [19] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252, 2015.
    [20] Zeiler, M. D., & Fergus, R. Visualizing and understanding convolutional networks. In Computer Vision–ECCV (pp. 818-833). Springer International Publishing, 2014.
    [21] GPU development in recent years, 2016
    [22] Boureau, Y. L., Bach, F., LeCun, Y., & Ponce, J. Learning mid-level features for recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 2559-2566). IEEE, 2010.
    [23] Nair, V., & Hinton, G. E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814), 2010.
    [24] Arora, S., Bhaskara, A., Ge, R., & Ma, T. Provable bounds for learning some deep representations. arXiv preprint arXiv:1310.6343, 2013.
    [25] Ren, S., He, K., Girshick, R., Zhang, X., & Sun, J. Object detection networks on convolutional feature maps. arXiv preprint arXiv:1504.06066, 2015.

    Full text public date (Intranet): 2021/07/19
    Full text public date (Internet): this full text is not authorized to be published.
    Full text public date (National library): this full text is not authorized to be published.