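Graduate student: | 邱弘承 Hung-Chen Chiu |
---|---|
Thesis title: | 結合深度卷積神經網路在影像中顯著物體數量分類之研究 A Study of Salient Object Subitizing with Deep Convolutional Neural Networks |
Advisor: | 吳怡樂 Yi-Leh Wu |
Oral defense committee: | 陳建中 Jiann-Jone Chen, 唐政元 Cheng-Yuan Tang, 閻立剛 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2016 |
Academic year of graduation: | 104 |
Language: | English |
Number of pages: | 24 |
Keywords (Chinese, translated): | pre-training, deep convolutional neural networks, large-scale image datasets, salient object subitizing, scene detection |
Keywords (English): | Pre-train, ImageNet, ILSVRC 2012, Subitizing |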
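In the past, we used traditional image features such as HOG or SIFT for learning and detection in computer vision. With the rapid development of hardware, however, convolutional neural networks, which require training a large number of parameters, can now be made much deeper. In this thesis, we investigate the use of pre-trained deep convolutional neural networks for detecting the number of salient objects in an image; [3] notes that this kind of recognition helps improve the success rate of object detection. We focus on the performance of convolutional neural networks in detecting the number of objects. Natural images, however, contain complex backgrounds and diverse objects, and the ImageNet model used in [3], which is trained on single objects against plain backgrounds, is not comprehensive enough. We therefore propose a deep learning network model pre-trained on a hybrid dataset of scene-centric and single-object-centric images, and we examine how models trained on the two kinds of data (scene images and single-object images) perform on salient object subitizing. Through feature extraction from the deep network and fine-tuning of the model, we also show that a network pre-trained on the hybrid dataset has an advantage over an ImageNet-pre-trained network for salient object subitizing.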
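One way to picture the hybrid pre-training data is as a single classification label space that covers both object-centric and scene-centric categories. The following is a minimal sketch of that idea only; the function name `merge_label_spaces` and the category names are hypothetical, and the abstract does not say how the thesis actually combined the two datasets.

```python
def merge_label_spaces(object_classes, scene_classes):
    """Assign one contiguous class index to every object and scene category.

    Keys are (source, name) pairs so that an object class and a scene class
    sharing a name would still receive distinct indices. This merging scheme
    is an illustrative assumption, not the thesis's documented procedure.
    """
    merged = {}
    for name in object_classes:
        merged[("object", name)] = len(merged)
    for name in scene_classes:
        merged[("scene", name)] = len(merged)
    return merged


# Hypothetical category names, for illustration only.
labels = merge_label_spaces(["teapot", "tabby cat"], ["kitchen", "beach"])
print(labels[("object", "teapot")])  # 0
print(labels[("scene", "beach")])    # 3
```

With such a mapping, a single softmax layer over all merged classes could be pre-trained on the union of the two image collections before being replaced by a subitizing classifier during fine-tuning.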
People can quickly tell the number of salient objects (1, 2, 3, or 4+) in an image or scene. This phenomenon is known as subitizing [8]. In object detection, however, precise object proposals must be generated first. To study this problem, [3] demonstrated the benefit of their proposed subitizing technique for salient object detection and object proposal generation in computer vision applications. That technique relies on a pre-trained convolutional neural network (CNN) model, which must be trained on large-scale annotated data, so the quality and quantity of the input data are key to pre-training the CNN model. In [3], Zhang et al. fine-tuned the AlexNet [1] and 16-layer VGG [5] networks pre-trained on the ImageNet ILSVRC12 dataset. Although ILSVRC12 is a large dataset with 1000 categories and 1.2 million images, we present a new hybrid-data CNN model trained on both ImageNet and the MIT Places dataset [2]. The hybrid data contains more than 1300 categories and 3 million images. We transfer its weights and fine-tune on the salient object subitizing (SOS) dataset [3], using the ten-crop technique to expand the SOS training data from 5520 to 27600 images. Our experimental results indicate that the proposed method produces better subitizing classification results than [3].
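The ten-crop technique mentioned above is commonly understood as taking the four corner crops and the center crop of an image, plus a horizontal mirror of each. The sketch below illustrates that common definition on a plain list-of-rows image; the function name `ten_crop` and the toy image are assumptions made for illustration. Note that this definition yields ten views per image, whereas the counts in the abstract (5520 to 27600) correspond to a factor of five, so the thesis's exact cropping procedure may differ.

```python
def ten_crop(image, crop_h, crop_w):
    """Return 10 views: four corners and the center, plus a mirror of each.

    `image` is a list of rows (each row a list of pixels). This plain-list
    representation is an illustrative assumption, not the thesis's format.
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size must not exceed the image size")

    # (top, left) offsets of the four corner crops and the center crop.
    offsets = [
        (0, 0),
        (0, width - crop_w),
        (height - crop_h, 0),
        (height - crop_h, width - crop_w),
        ((height - crop_h) // 2, (width - crop_w) // 2),
    ]
    crops = [
        [row[left:left + crop_w] for row in image[top:top + crop_h]]
        for top, left in offsets
    ]
    # Mirror each crop left-to-right to obtain the remaining five views.
    mirrored = [[row[::-1] for row in crop] for crop in crops]
    return crops + mirrored


# Toy 4x4 "image" whose pixel values encode their (row, column) position.
toy = [[(r, c) for c in range(4)] for r in range(4)]
views = ten_crop(toy, 2, 2)
print(len(views))  # 10
print(views[4])    # center crop: [[(1, 1), (1, 2)], [(2, 1), (2, 2)]]
```

Every view inherits the label of its source image, which is how cropping enlarges a small training set such as SOS without new annotation.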
[1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105), 2012.
[2] Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. Learning deep features for scene recognition using places database. In Advances in neural information processing systems (pp. 487-495), 2014.
[3] Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., & Mech, R. Salient object subitizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4045-4054), 2015.
[4] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9), 2015.
[5] Simonyan, K., & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[6] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia (pp. 675-678). ACM, 2014.
[7] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958, 2014.
[8] Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L., & Schmidhuber, J. Flexible, high performance convolutional neural networks for image classification. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p. 1237), 2011.
[9] Lin, M., Chen, Q., & Yan, S. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[10] Kataoka, H., Iwata, K., & Satoh, Y. Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection. arXiv preprint arXiv:1509.07627, 2015.
[11] Zhou, B., Khosla, A., Lapedriza, A., Torralba, A., & Oliva, A. Places: An image database for deep scene understanding. arXiv, 2016.
[12] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. Object detectors emerge in deep scene cnns. arXiv preprint arXiv:1412.6856, 2014.
[13] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. Learning deep features for discriminative localization. arXiv preprint arXiv:1512.04150, 2015.
[14] Gidaris, S., & Komodakis, N. Object Detection via a Multi-Region and Semantic Segmentation-Aware CNN Model. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1134-1142), 2015.
[15] Jia, Y. Caffe: An open source convolutional architecture for fast feature embedding. http://caffe.berkeleyvision.org/ , 2013.
[16] Oquab, M., Bottou, L., Laptev, I., & Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1717-1724), 2014.
[17] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[18] Jia, Y., Vinyals, O., & Darrell, T. Pooling-invariant image feature learning. arXiv preprint arXiv:1302.5056, 2013.
[19] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252, 2015.
[20] Zeiler, M. D., & Fergus, R. Visualizing and understanding convolutional networks. In Computer vision–ECCV (pp. 818-833). Springer International Publishing, 2014.
[21] GPU development in recent years. http://bkultrasound.com/blog/the-next-generation-of-ultrasound-technology , 2016.
[22] Boureau, Y. L., Bach, F., LeCun, Y., & Ponce, J. Learning mid-level features for recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 2559-2566). IEEE, 2010.
[23] Nair, V., & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814), 2010.
[24] Arora, S., Bhaskara, A., Ge, R., & Ma, T. Provable bounds for learning some deep representations. arXiv preprint arXiv:1310.6343, 2013.
[25] Ren, S., He, K., Girshick, R., Zhang, X., & Sun, J. Object detection networks on convolutional feature maps. arXiv preprint arXiv:1504.06066, 2015.