| Field | Value |
|---|---|
| Graduate Student | 李東哲 (Dong-Che Lee) |
| Thesis Title | 殘差神經網路之多標籤多分類物件偵測 (Multi-Label Classification Object Detection Based on Residual Neural Network) |
| Advisors | 王乃堅 (Nai-Jian Wang), 施慶隆 (Ching-Long Shih) |
| Oral Defense Committee | 李文猶 (Wen-Yo Lee), 吳修銘 (Hsiu-Ming Wu) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science – Department of Electrical Engineering |
| Publication Year | 2023 |
| Graduation Academic Year | 111 (ROC calendar) |
| Language | Chinese |
| Pages | 64 |
| Keywords | Image Processing, Deep Learning, Convolutional Neural Network, Residual Block, Multi-label Classification, Object Detection |
The purpose of this thesis is to perform multi-object tracking using a multi-label, multi-class classification framework. To this end, a Logitech C615 RGB camera (without depth sensing) is used to capture the experimental dataset, which is preprocessed into a training set and converted to tensor form as the input of the neural network. A residual neural network serves as the backbone for learning the feature maps of the images in the training dataset. Eight types of products that may appear in stores or warehouses are selected as target objects for identification, and pictures of each object from different angles and with different appearances are collected to build a small dataset for classification training, validation, and testing. The network model extracts the appearance features of objects as the basis for object classification; at the output stage of the model, an adaptive average pooling layer is connected to a fully connected layer. The successfully trained model then serves as the target model for object tracking. Finally, the multi-label, multi-class method achieves object detection for multi-object classification with an accuracy of 99.64%.
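The output stage described above (residual backbone → adaptive average pooling → fully connected layer, with independent per-label scores for eight product classes) can be sketched as follows. This is a minimal illustration in PyTorch, not the thesis's actual network: the channel widths, block count, and stem are placeholder choices, and only the structural idea (identity shortcuts, `AdaptiveAvgPool2d` feeding a linear head, sigmoid for multi-label outputs) follows the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # shortcut: add input back in

class MultiLabelResNet(nn.Module):
    """Toy residual backbone -> adaptive average pooling -> fully connected head."""
    def __init__(self, num_classes=8):  # eight product classes, per the thesis
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3),  # RGB input
            nn.BatchNorm2d(32), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.pool = nn.AdaptiveAvgPool2d(1)   # output stage: adaptive average pooling
        self.fc = nn.Linear(32, num_classes)  # fully connected layer after pooling

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)  # raw logits, one per label

model = MultiLabelResNet(num_classes=8)
logits = model(torch.randn(2, 3, 224, 224))  # batch of 2 RGB frames
probs = torch.sigmoid(logits)                # independent per-label probabilities
```

Because the classification is multi-label, each of the eight outputs is passed through a sigmoid (rather than a joint softmax), so several labels can be active at once; such a head is typically trained with a binary cross-entropy loss per label.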