
Student: Li-Kuan Huang (黃立寬)
Title: 利用特徵分享加速自駕車場景解析 (The study of a feature-sharing approach to parse road scenes for self-driving cars)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee members: Hsing-Lung Chen (陳省隆), Wen-Hsien Fang (方文賢), Ming-Bo Lin (林銘波), Jenq-Shiou Leu (呂政修)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication year: 2018
Graduation academic year: 107 (ROC calendar)
Language: English
Pages: 39
Keywords (Chinese): 物體檢測, 深度學習
Keywords (English): object detection, deep learning
Abstract:

In this thesis, we investigate a feature-sharing approach that accelerates object detection in videos for self-driving cars while preserving detection accuracy to a reasonable degree. Unlike general datasets for object detection in still images, self-driving-car datasets contain a high proportion of small and medium-sized objects under severe occlusion and ego motion, which makes object detection in videos challenging. We use a feature-sharing method, the Deep Feature Flow framework, to speed up object detection on non-key frames. To further improve detection precision for small and medium-sized objects, we adopt several techniques in our framework: RoIAlign replaces RoIPooling when extracting features for regions of interest, and Sequence Non-Maximum Suppression (Seq-NMS) incorporates temporal and contextual information into object detection. To avoid linking unrelated objects, we investigate the similarity metric used in non-maximum suppression. Finally, bounding box voting fine-tunes the final object localization. To assess the performance of the proposed approach, we use three datasets: KITTI, Taiwan highway, and Taiwan street.
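The abstract describes two mechanisms that lend themselves to a short illustration. First, a minimal sketch of feature sharing in the style of Deep Feature Flow [8]: the expensive backbone runs only on key frames, and a cheap flow network warps the key-frame features to the frames in between. This is not the thesis code; `feature_net`, `flow_net`, `det_head`, and the fixed key-frame stride are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

KEY_FRAME_INTERVAL = 10  # assumption: fixed key-frame stride

def warp_features(key_feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Bilinearly warp key-frame features (N, C, H, W) to the current frame
    using an optical-flow field (N, 2, H, W) given at feature resolution."""
    _, _, h, w = key_feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=flow.device),
        torch.arange(w, dtype=torch.float32, device=flow.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1)                 # (H, W, 2), x first
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)  # shift by the flow
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(key_feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def detect_video(frames, feature_net, flow_net, det_head):
    """Run the expensive feature_net only on key frames; propagate its
    features to the other frames with the much cheaper flow_net."""
    results, key_frame, key_feat = [], None, None
    for i, frame in enumerate(frames):
        if i % KEY_FRAME_INTERVAL == 0:   # key frame: full backbone pass
            key_frame, key_feat = frame, feature_net(frame)
            feat = key_feat
        else:                             # non-key frame: flow + warp only
            # Flow that maps current-frame positions back into the key frame.
            flow = flow_net(key_frame, frame)
            feat = warp_features(key_feat, flow)
        results.append(det_head(feat))
    return results
```

The published Deep Feature Flow method [8] additionally predicts a scale map that modulates the warped features; the sketch omits it for brevity. Second, the cross-frame linking step of Seq-NMS [6] can be sketched in the same spirit, with plain IoU standing in for the similarity metric the thesis investigates; the helper names are ours.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def link_frame_pair(boxes_t, boxes_t1, min_sim=0.5):
    """Propose links between detections in consecutive frames. Pairs whose
    similarity falls below min_sim are never linked, which is the knob that
    keeps unrelated objects from being chained together."""
    links = []
    for i, a in enumerate(boxes_t):
        sims = [(iou(a, b), j) for j, b in enumerate(boxes_t1)]
        best_sim, best_j = max(sims, default=(0.0, -1))
        if best_sim >= min_sim:
            links.append((i, best_j))
    return links
```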

Table of Contents:
Chinese Abstract
Abstract
Acknowledgment
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
  2.1 Object Detection in Images
  2.2 Object Detection in Videos
3 Object Detection with Feature Flow for Video
  3.1 Deep Feature Flow
    3.1.1 Flow Network
  3.2 Feature Extraction
  3.3 Region Proposal Network
  3.4 RoIAlign
  3.5 Faster R-CNN with RoIAlign
  3.6 R-FCN
  3.7 Sequence Non-Maximum Suppression
  3.8 Similarity Metric of Non-Maximum Suppression with Bootstrap
  3.9 Bounding Box Voting
4 Experimental Results
  4.1 Datasets
  4.2 Evaluation Protocol and Experimental Setup
  4.3 Calculation Time for Each Module
  4.4 Replace RoIPooling with RoIAlign
  4.5 R-FCN and Faster R-CNN with Deep Feature Flow
  4.6 Assessment of Sequence Length in Seq-NMS
  4.7 Failure Cases
  4.8 Successful Cases
  4.9 Discussion
5 Conclusion
References
Appendix: Reasons for Affecting Seq-NMS Speed

References:
[1] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, June 2012.
[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223, June 2016.
[3] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving video database with scalable annotation tooling,” arXiv preprint arXiv:1805.04687, 2018.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[5] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[6] W. Han, P. Khorrami, T. L. Paine, P. Ramachandran, M. Babaeizadeh, H. Shi, J. Li, S. Yan, and T. S. Huang, “Seq-NMS for video object detection,” arXiv preprint arXiv:1602.08465, 2016.
[7] K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang, “Object detection in videos with tubelet proposal networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735, July 2017.
[8] X. Zhu, Y. Xiong, J. Dai, L. Yuan, and Y. Wei, “Deep feature flow for video recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2349–2358, July 2017.
[9] X. Zhu, Y. Wang, J. Dai, L. Yuan, and Y. Wei, “Flow-guided feature aggregation for video object detection,” in Proceedings of IEEE International Conference on Computer Vision, pp. 408–417, Oct 2017.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, June 2014.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in European Conference on Computer Vision, pp. 346–361, Springer, 2014.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot MultiBox detector,” in European Conference on Computer Vision, pp. 21–37, Springer, 2016.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, June 2016.
[14] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, July 2017.
[15] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of IEEE International Conference on Computer Vision, pp. 2980–2988, Oct 2017.
[16] R. Girshick, “Fast R-CNN,” in Proceedings of IEEE International Conference on Computer Vision, pp. 1440–1448, December 2015.
[17] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems, pp. 379–387, 2016.
[18] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, July 2017.
[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of IEEE International Conference on Computer Vision, pp. 2961–2969, Oct 2017.
[20] L. Galteri, L. Seidenari, M. Bertini, and A. Del Bimbo, “Spatio-temporal closed-loop object detection,” IEEE Transactions on Image Processing, vol. 26, pp. 1253–1263, March 2017.
[21] K. Kang, W. Ouyang, H. Li, and X. Wang, “Object detection from video tubelets with convolutional neural networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825, June 2016.
[22] Y. Xu, X. Zhou, P. Liu, and H. Xu, “Rapid pedestrian detection based on deep omega-shape features with partial occlusion handling,” Neural Processing Letters, pp. 1–15, 2018.
[23] S. Gidaris and N. Komodakis, “Object detection via a multi-region and semantic segmentation-aware CNN model,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1134–1142, 2015.
[24] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox, “FlowNet: Learning optical flow with convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766, 2015.
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[26] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, pp. 740–755, Springer, 2014.
[27] D. Hoiem, “Object detection analysis code (v2),” 2014.
