
Graduate Student: 李廷修 (Ting-Hsiu Lee)
Thesis Title: A Study of Objectionable Images Classification with Deep Convolutional Neural Networks (結合深度卷積神經網路分類在不良圖片上之研究)
Advisor: 吳怡樂 (Yi-Leh Wu)
Oral Defense Committee: 陳建中, 唐政元, 閻立剛
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2016
Graduation Academic Year: 104
Language: English
Pages: 78
Chinese Keywords: objectionable images, deep convolutional neural networks, deep learning
Foreign Keywords: Objectionable Images
Access Counts: Views: 239; Downloads: 0
Abstract (translated from the Chinese): In the past, objectionable images were usually identified using skin features, or keyword descriptions attached to the images, combined with multiple filters. Deep learning usually requires a great deal of training time, but with advances in hardware and the continual proposal of new algorithms, the training-time problem has gradually eased. Deep convolutional neural networks perform very well on image classification. In this thesis, we use deep convolutional neural networks to solve the problem of classifying objectionable images; to the best of our knowledge, this is the first work to use deep convolutional neural networks to learn and classify objectionable images. We compare the performance of different deep convolutional neural network models: using 80,000 images in our GPU environment, about 2.5 hours of training yields roughly 97% classification accuracy, which is a very competitive result.


Abstract: In the past, objectionable images were usually recognized by extracting skin features or by matching keywords in accompanying text, combined with multiple filters. Deep learning usually takes a long time to train, but with advances in hardware and newly proposed algorithms, the training-time problem has gradually been alleviated. Deep convolutional neural networks perform very well on image classification. In this paper, we use deep convolutional neural networks to solve the problem of classifying objectionable images. To the best of our knowledge, this study is the first to use deep convolutional neural networks to learn and classify objectionable images. We compare the performance of different deep convolutional neural network models. Using 80,000 training images in our GPU environment, we achieve 97% classification accuracy with a training cost of only 2.5 hours, which is a very competitive result.
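The models compared in the thesis were trained in Caffe, whose networks stack the same basic operations: convolution, a ReLU nonlinearity, and max pooling. As a purely illustrative sketch (the toy image, kernel, and helper names below are our own, not the thesis's code), the following pure-Python snippet shows those three operations responding to a vertical edge:

```python
# Illustrative sketch of the three core CNN operations -- convolution,
# ReLU, and max pooling -- on a toy single-channel "image".
# This is NOT the thesis's Caffe setup; it only shows the mechanics.

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            out[i][j] = s
    return out

def relu(fmap):
    """Element-wise rectified linear unit: max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# A 6x6 toy image with a vertical edge down the middle.
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]

# A hand-crafted kernel that responds strongly to vertical edges;
# in a real CNN such kernels are learned from data.
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

fmap = max_pool(relu(conv2d(image, edge_kernel)))
for row in fmap:
    print(row)  # each row prints [3.0, 3.0]: the edge is detected
```

A deep network like AlexNet repeats this conv-ReLU-pool pattern several times with learned kernels, then classifies the resulting feature maps with fully connected layers.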

Contents
  論文摘要 (Chinese Abstract)
  Abstract
  Contents
  List of Figures
  List of Tables
  Chapter 1. Introduction
  Chapter 2. Deep Learning Model
  Chapter 3. Caffe and Models
    3.1 Alex Net Model
    3.2 Reference Net Model
  Chapter 4. Experiment
    4.1 Simple Imagenet ILSVRC 2012 training dataset and People-related dataset
    4.2 Use Simple Imagenet ILSVRC 2012 training dataset on Alex Net Model and Reference Net Model
      4.2.1 Alex Net Model training and testing on Simple Imagenet ILSVRC 2012 training dataset
      4.2.2 Reference Net Model training and testing on Simple Imagenet ILSVRC 2012 training dataset
    4.3 Use People-related dataset on Alex Net Model and Reference Net Model
      4.3.1 Alex Net Model training and testing on People-related dataset
      4.3.2 Reference Net Model training and testing on People-related dataset
  Chapter 5. Conclusions and Future work
  Reference
  Appendix A
  Appendix B
  Appendix C
  Appendix D


Full-text release date: 2021/07/12 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)