
Graduate Student: Dong-Che Lee (李東哲)
Thesis Title: Multi-Label Classification Object Detection Based on Residual Neural Network (殘差神經網路之多標籤多分類物件偵測)
Advisors: Nai-Jian Wang (王乃堅), Ching-Long Shih (施慶隆)
Committee Members: Wen-Yo Lee (李文猶), Hsiu-Ming Wu (吳修銘)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: Chinese
Pages: 64
Keywords: Image Processing, Deep Learning, Convolutional Neural Network, Residual Block, Multi-label Classification, Object Detection
Abstract:

    The purpose of this thesis is to perform multi-object tracking using a multi-label, multi-class classification framework. To this end, a Logitech C615 RGB camera (without depth sensing) is used to capture the experimental dataset, which is preprocessed into a training set and converted to tensor form as input to the neural network. A residual neural network serves as the backbone for learning the feature maps of the images in the training dataset. Eight types of products that may appear in stores or warehouses are selected as target objects for identification, and pictures of each object from different angles and in different appearances are collected to build a small dataset for classification training, validation, and testing. The network model extracts the appearance features of objects as the basis for object classification; at the output end of the model, an adaptive average pooling layer is connected to a fully connected layer. The successfully trained model is then used as the target for object tracking. Finally, the multi-label, multi-class method achieves object detection for multi-object classification with an accuracy of 99.64%.
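The head described above — a residual backbone whose adaptive-average-pooled features feed a fully connected layer with one output per class — differs from single-label classification mainly at the decision stage: each of the eight product classes gets an independent sigmoid (and a binary cross-entropy term) rather than a shared softmax. A minimal sketch of that multi-label decision rule and loss, in plain Python (function names are illustrative, not taken from the thesis):

```python
import math

def sigmoid(z):
    # Map a raw logit to a per-class probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_predict(logits, threshold=0.5):
    # Independent yes/no decision per class: an image may be
    # assigned zero, one, or several labels at once.
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

def bce_loss(logits, targets):
    # Binary cross-entropy averaged over classes: each class is
    # treated as its own binary classification task.
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)

# Example with 8 product classes; positive logits mean "present".
logits = [2.0, -1.5, 0.3, -3.0, 1.2, -0.7, -2.2, 0.9]
print(multilabel_predict(logits))  # → [1, 0, 1, 0, 1, 0, 0, 1]
```

In a framework implementation the same effect is typically obtained by pairing per-class logits with a sigmoid-based binary cross-entropy loss instead of softmax cross-entropy.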

    Table of Contents:
    Abstract (Chinese); Abstract (English); Contents; List of Figures; List of Tables
    Chapter 1 Introduction — 1.1 Motivation and Objectives; 1.2 Literature Review; 1.3 Thesis Outline
    Chapter 2 System Architecture and Task Flow — 2.1 System Architecture; 2.2 Hardware; 2.3 Software Development Environment; 2.4 Task Description; 2.5 Object Definitions
    Chapter 3 Data Collection and Preprocessing — 3.1 Collecting Dataset Images; 3.2 Data Augmentation and Dataset Preprocessing; 3.3 YOLO Object Localization; 3.4 Training and Validation Set Labels
    Chapter 4 Multi-Label Multi-Class Neural Network — 4.1 Artificial Neural Networks (4.1.1 Vanishing Gradient Problem); 4.2 Residual Neural Networks (4.2.1 Architecture; 4.2.2 Residual Blocks; 4.2.3 Convolutional Layers; 4.2.4 Activation Functions; 4.2.5 Pooling Layers; 4.2.6 Fully Connected Layers; 4.2.7 Loss Functions; 4.2.8 Optimizers); 4.3 The Multi-Label Multi-Class Problem; 4.4 Network Training Model and Parameters
    Chapter 5 Experimental Results and Discussion — 5.1 Training Process Analysis (5.1.1 Residual Network Training Process; 5.1.2 Residual Network Parameter Count); 5.2 Test Data Evaluation (5.2.1 Residual Network Test Evaluation); 5.3 Real-World Tests and Probability Prediction Results
    Chapter 6 Conclusions and Suggestions — 6.1 Conclusions; 6.2 Suggestions
    References

