
Graduate Student: Chia-Kai Tien
Title: Cross-layer Object Detection on YOLOv4 Based on the Neighboring Region of the Anchor Location
Advisor: Bor-Shen Lin
Committee Members: Chuan-Kai Yang, Nai-Wei Lo
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: Chinese
Number of Pages: 59
Keywords: Object Detection, YOLOv4, Feature Pyramid, Object Detection of Anchor Neighboring Region, Cross-layer Multi-anchor Object Detection
Usage: 270 views, 17 downloads

In recent years, convolutional neural networks have been successfully applied to object detection. To represent object features at every position and of various appearances in an image, most convolutional models, such as YOLOv3 or YOLOv4, cover each feature scale with a dense grid, where each grid cell corresponds to several anchor boxes that serve as positioning references for objects of different sizes and aspect ratios. Once every training object has been assigned to its corresponding grid cell and anchor according to its position and size, the classifiers and location regressors can be trained. Under this architecture, the position at which an object is predicted is determined by the grid cell that contains the object's center in each feature layer, while the layer and anchor that predict the object are designated in advance according to the object's size, typically by clustering the objects into size groups. However, object categories differ widely: the shape, orientation, or pose of objects can vary greatly and is not necessarily symmetric, so the central grid cell is not always the best position for detecting an object. Moreover, using a clustering algorithm to force each object to be detected by one specific anchor in one layer simplifies the training procedure but may be suboptimal.

This thesis proposes two improvements within the YOLOv4 framework: using the grid cells neighboring the object center to assist detection, and allowing an object to be assigned to multiple anchors across layers, which then learn and detect it jointly. Experiments on the VOC 2007 test set show that both methods significantly improve detection performance. With the assistance of the neighboring grid cells within a region of 20% of the object's width and height around its center, together with collaborative detection by cross-layer anchors, the mean average precision improves from 84.04% for the original YOLOv4 to 87.86%.
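To make the assignment scheme concrete, below is a minimal Python sketch of the two ideas. It is written under stated assumptions rather than taken from the thesis: the function name assign_targets, the parameters neighbor_ratio and shape_iou_thresh, the width/height-IoU rule used to pick cross-layer anchors, and the exact geometry of the 20% neighborhood are all illustrative stand-ins for whatever the thesis actually implements.

```python
import math

def assign_targets(box, grid_sizes, anchors,
                   neighbor_ratio=0.2, shape_iou_thresh=0.5):
    """Return (level, anchor, gy, gx) training targets for one labeled box.

    box        : (cx, cy, w, h), normalized to [0, 1]
    grid_sizes : grid resolution of each pyramid level, e.g. [52, 26, 13]
    anchors    : per-level list of (w, h) anchor shapes, same normalization
    """
    cx, cy, w, h = box
    targets = []
    for level, gs in enumerate(grid_sizes):
        for a_idx, (aw, ah) in enumerate(anchors[level]):
            # Cross-layer multi-anchor assignment: keep every anchor on every
            # level whose shape matches the box well enough; the baseline
            # instead keeps only the single best-matching anchor overall.
            inter = min(w, aw) * min(h, ah)
            shape_iou = inter / (w * h + aw * ah - inter)
            if shape_iou < shape_iou_thresh:
                continue
            # Neighboring-region assignment: every cell overlapping the region
            # within +/- neighbor_ratio * (w, h) of the box center becomes a
            # positive location; the baseline uses only the one cell
            # containing (cx, cy).
            x_lo = max(math.floor((cx - neighbor_ratio * w) * gs), 0)
            x_hi = min(math.floor((cx + neighbor_ratio * w) * gs), gs - 1)
            y_lo = max(math.floor((cy - neighbor_ratio * h) * gs), 0)
            y_hi = min(math.floor((cy + neighbor_ratio * h) * gs), gs - 1)
            for gy in range(y_lo, y_hi + 1):
                for gx in range(x_lo, x_hi + 1):
                    targets.append((level, a_idx, gy, gx))
    return targets

# Example: a medium-sized box matches anchors on two levels, so it is learned
# by many (level, anchor, cell) combinations instead of exactly one.
anchors = [[(0.06, 0.08), (0.16, 0.20)], [(0.20, 0.24), (0.45, 0.50)]]
targets = assign_targets((0.50, 0.50, 0.22, 0.26), [52, 26], anchors)
print(len(targets), targets[:3])
```

In the baseline configuration the double loop would collapse to the single cell containing the center and the single best-matching anchor; widening both, as described in the abstract, is what lets several predictors learn each object jointly.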

Table of Contents
Chapter 1: Introduction
  1.1 Research Background and Motivation
  1.2 Main Research Contributions
  1.3 Thesis Organization
Chapter 2: Literature Review
  2.1 Definition of Object Detection
  2.2 Two-stage Anchor-based Object Detection Models
  2.3 One-stage Anchor-based Object Detection Models
  2.4 Anchor-free Object Detection Models
Chapter 3: A Study of the Output Architecture of Object Detection Models
  3.1 Introduction
  3.2 The Network Output Architecture of YOLOv4
  3.3 Baseline Experiments
Chapter 4: Research Methods and Experimental Results
  4.1 Object Detection Based on the Anchor Neighboring Region
  4.2 Cross-layer Object Detection Method
Chapter 5: Conclusion
References

