
Student: Chien-Cheng Chyou (邱建誠)
Thesis Title: Bounding Box Data Structure Improvement of Object Detection Network (基於物件偵測網路之邊界框資料結構改良)
Advisor: Nai-Jian Wang (王乃堅)
Committee Members: Nai-Jian Wang (王乃堅), Shyue-Kung Lu (呂學坤), Jing-Ming Guo (郭景明), Shun-Ping Chung (鍾順平), Shun-Feng Su (蘇順豐)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2018
Graduation Academic Year: 106
Language: Chinese
Pages: 37
Chinese Keywords: 物件偵測 (object detection), 類神經網路 (artificial neural network), 深度學習 (deep learning)
Foreign Keywords: Object Detection, Artificial Neural Network, Deep Learning
    Object detection aims to identify object categories of interest and mark their positions. Common applications include machine vision, factory automation, and electric vehicles. In traditional object detection networks, the center of an object's bounding box is constrained to a fixed range, so during training only one grid cell can correctly predict a given object. This constraint restricts the possible training methods and reduces the chance of training a better object detection network. This thesis proposes a new bounding box data structure that completely removes the object-center constraint, allowing multiple grid cells to predict the same object. Based on the new data structure, this thesis further proposes a new training method that reduces the training difficulty for objects whose centers lie near grid boundaries, freeing the network model's features to serve other hard-to-train objects; it can therefore detect some objects that the old training method misses. Moreover, the new bounding box data structure not only causes no side effects under the old training method, but also adapts better to the new one. Under the old training method, the old bounding box data structure achieves an intersection over union (IoU) of 88.8%, while the new data structure achieves 89.4%; under the new training method, the old data structure achieves 87.9% and the new one 89.6%.
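The one-grid constraint described above can be illustrated with the grid-cell encoding used by YOLO-style detectors [4]: the object's center selects exactly one grid cell, and the network learns only an offset inside that cell, so no other cell can be responsible for the object. The sketch below is illustrative only; the function name and the 13×13 grid size are assumptions, not details taken from the thesis.

```python
def encode_center_yolo(cx, cy, grid_w, grid_h):
    """Traditional encoding: the normalized center (cx, cy) picks exactly
    one grid cell, and the regression target is the offset inside that
    cell, so only that single cell can correctly predict the object."""
    col = int(cx * grid_w)   # responsible grid column
    row = int(cy * grid_h)   # responsible grid row
    tx = cx * grid_w - col   # offset in [0, 1) within the cell
    ty = cy * grid_h - row
    return row, col, tx, ty

# An object centered at (0.51, 0.49) on a 13x13 grid falls to cell (6, 6):
row, col, tx, ty = encode_center_yolo(0.51, 0.49, 13, 13)
```

Note that a center just across a cell boundary would be assigned to a different cell, which is exactly the near-boundary training difficulty the thesis targets.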


    Object detection aims to find the objects that people are interested in, determining both their positions and categories. Applications of object detection include machine vision, factory automation, and electric cars. The bounding box data structure in a traditional object detection network limits an object's center to a certain range; therefore, only one grid cell can predict an object correctly. This restricts the training method and may miss good ways to train a better object detection network. In this thesis, a new bounding box data structure is proposed that frees the bounding box from the object-center limit. With this data structure, many grid cells can predict one object correctly. Based on the new data structure, a new training method is also proposed in this thesis. This training method helps to reduce the difficulty of training objects whose centers are near grid boundaries, so the features of the network model can serve other hard-to-train objects. The proposed training method can therefore detect some objects that cannot be detected by the old training method. What's more, the new bounding box data structure not only causes no side effects in the old training method, but also adapts better to the new training method. In the old training method, the old data structure achieves an intersection over union (IoU) of 88.8% on the test data, and the new data structure achieves 89.4%. In the new training method, the old data structure achieves an IoU of 87.9%, and the new data structure achieves 89.6%.
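The comparisons above are reported in terms of intersection over union (IoU), the ratio of the overlap area to the combined area of a predicted and a ground-truth box. A minimal self-contained implementation is sketched below; the corner-coordinate box format `(x_min, y_min, x_max, y_max)` is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x2 strip: IoU = 2 / 6
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # → 0.333...
```

An IoU of 1.0 means a perfect match, so the reported gain from 87.9% to 89.6% corresponds to tighter box localization under the new training method.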

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1  Research Background and Motivation
      1.2  Literature Review
        1.2.1  Feature Extraction with Deep Learning
        1.2.2  Region Proposal with Deep Learning
        1.2.3  Single-Stage Object Detection
        1.2.4  Architectures for Improving Accuracy
        1.2.5  Focal Loss
      1.3  Research Objectives
      1.4  Thesis Organization
    Chapter 2  Architecture and Training of the Object Detection Network
      2.1  Neural Network Fundamentals
        2.1.1  Neurons
        2.1.2  Fully Connected and Convolutional Layers
        2.1.3  Deep Learning
        2.1.4  Gradient Descent
        2.1.5  Overfitting
      2.2  System Architecture
        2.2.1  System Environment
        2.2.2  Overall Network Model Architecture
        2.2.3  Decoding the Network Output
      2.3  Network Model Training
        2.3.1  Assignment of Object Position
        2.3.2  Anchor Box Assignment and Training
        2.3.3  Loss Function
        2.3.4  Training Data Processing
        2.3.5  Selection of Training Parameters
    Chapter 3  Architecture Improvement
      3.1  Discussion of the Network Architecture
      3.2  Network Model Training
        3.2.1  Assignment of Object Position
        3.2.2  Assignment of Object Confidence
        3.2.3  Bounding Box Training
        3.2.4  Loss Function
    Chapter 4  Experimental Results and Analysis
      4.1  Experimental Results
        4.1.1  Comparison of Different Training Methods
        4.1.2  IoU Comparison of Different Bounding Box Data Structures
      4.2  Experimental Analysis
        4.2.1  Comparison of Different Training Methods
        4.2.2  IoU Comparison of Different Bounding Box Data Structures
    Chapter 5  Conclusion and Future Work
      5.1  Conclusion
      5.2  Future Research Directions
        5.2.1  Other Possible Training Methods
        5.2.2  Other Possible Network Architectures
    Appendix  Training Results on VOC Data
    References

    [1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
    [2] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
    [3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
    [4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
    [5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European Conference on Computer Vision, Springer, 2016, pp. 21–37.
    [6] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in CVPR, vol. 1, 2017, p. 4.
    [7] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” arXiv preprint arXiv:1708.02002, 2017.
    [8] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
    [9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
    [10] A. Krogh and J. A. Hertz, “A simple weight decay can improve generalization,” in Advances in Neural Information Processing Systems, 1992, pp. 950–957.
    [11] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., “TensorFlow: A system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.
    [12] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
    [13] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
