
Student: Li-Kuan Huang (黃立寬)
Title: 利用特徵分享加速自駕車場景解析 (The study of a feature-sharing approach to parse road scenes for self-driving cars)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee members: Hsing-Lung Chen (陳省隆), Wen-Hsien Fang (方文賢), Ming-Bo Lin (林銘波), Jenq-Shiou Leu (呂政修)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication year: 2018
Graduation academic year: 107 (ROC calendar)
Language: English
Pages: 39
Keywords (Chinese): 物體檢測, 深度學習
Keywords (English): object detection, deep learning
Abstract:

In this thesis, we investigate a feature-sharing approach that accelerates object detection in videos for self-driving cars while preserving detection accuracy to a reasonable degree. Unlike general datasets for object detection in still images, self-driving-car datasets contain a high proportion of small and medium-sized objects under severe occlusion and ego motion, which makes object detection in videos challenging. We use a feature-sharing method, the Deep Feature Flow framework, to speed up object detection on non-key frames. To further improve detection precision for small and medium-sized objects, we adopt several techniques in our framework: RoIAlign replaces RoIPooling when extracting features for regions of interest, and Sequence Non-Maximum Suppression (Seq-NMS) incorporates temporal and contextual information into object detection. To avoid linking unrelated objects, we investigate the similarity metric used in non-maximum suppression. Finally, bounding box voting fine-tunes the final object localization. To assess the performance of the proposed approach, we use three datasets: KITTI, Taiwan highway, and Taiwan street.
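The abstract describes two mechanisms that lend themselves to a short illustration. First, a minimal sketch of feature sharing in the style of Deep Feature Flow [8]: the expensive backbone runs only on key frames, and a cheap flow network warps the key-frame features to the frames in between. This is not the thesis code; `feature_net`, `flow_net`, `det_head`, and the fixed key-frame stride are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

KEY_FRAME_INTERVAL = 10  # assumption: fixed key-frame stride

def warp_features(key_feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Bilinearly warp key-frame features (N, C, H, W) to the current frame
    using an optical-flow field (N, 2, H, W) given at feature resolution."""
    _, _, h, w = key_feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=flow.device),
        torch.arange(w, dtype=torch.float32, device=flow.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1)                 # (H, W, 2), x first
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)  # shift by the flow
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(key_feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def detect_video(frames, feature_net, flow_net, det_head):
    """Run the expensive feature_net only on key frames; propagate its
    features to the other frames with the much cheaper flow_net."""
    results, key_frame, key_feat = [], None, None
    for i, frame in enumerate(frames):
        if i % KEY_FRAME_INTERVAL == 0:   # key frame: full backbone pass
            key_frame, key_feat = frame, feature_net(frame)
            feat = key_feat
        else:                             # non-key frame: flow + warp only
            # Flow that maps current-frame positions back into the key frame.
            flow = flow_net(key_frame, frame)
            feat = warp_features(key_feat, flow)
        results.append(det_head(feat))
    return results
```

The published Deep Feature Flow method [8] additionally predicts a scale map that modulates the warped features; the sketch omits it for brevity. Second, the cross-frame linking step of Seq-NMS [6] can be sketched in the same spirit, with plain IoU standing in for the similarity metric the thesis investigates; the helper names are ours.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def link_frame_pair(boxes_t, boxes_t1, min_sim=0.5):
    """Propose links between detections in consecutive frames. Pairs whose
    similarity falls below min_sim are never linked, which is the knob that
    keeps unrelated objects from being chained together."""
    links = []
    for i, a in enumerate(boxes_t):
        sims = [(iou(a, b), j) for j, b in enumerate(boxes_t1)]
        best_sim, best_j = max(sims, default=(0.0, -1))
        if best_sim >= min_sim:
            links.append((i, best_j))
    return links
```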

Table of Contents:
Chinese Abstract
Abstract
Acknowledgment
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
  2.1 Object Detection in Images
  2.2 Object Detection in Videos
3 Object Detection with Feature Flow for Video
  3.1 Deep Feature Flow
    3.1.1 Flow Network
  3.2 Feature Extraction
  3.3 Region Proposal Network
  3.4 RoIAlign
  3.5 Faster R-CNN with RoIAlign
  3.6 R-FCN
  3.7 Sequence Non-Maximum Suppression
  3.8 Similarity Metric of Non-Maximum Suppression with Bootstrap
  3.9 Bounding Box Voting
4 Experimental Results
  4.1 Datasets
  4.2 Evaluation Protocol and Experimental Setup
  4.3 Calculation Time for Each Module
  4.4 Replace RoIPooling with RoIAlign
  4.5 R-FCN and Faster R-CNN with Deep Feature Flow
  4.6 Assessment of Sequence Length in Seq-NMS
  4.7 Failure Cases
  4.8 Successful Cases
  4.9 Discussion
5 Conclusion
References
Appendix: Reasons for Affecting Seq-NMS Speed

References:
[1] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, June 2012.
[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223, June 2016.
[3] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving video database with scalable annotation tooling,” arXiv preprint arXiv:1805.04687, 2018.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[5] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[6] W. Han, P. Khorrami, T. L. Paine, P. Ramachandran, M. Babaeizadeh, H. Shi, J. Li, S. Yan, and T. S. Huang, “Seq-NMS for video object detection,” arXiv preprint arXiv:1602.08465, 2016.
[7] K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang, “Object detection in videos with tubelet proposal networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735, July 2017.
[8] X. Zhu, Y. Xiong, J. Dai, L. Yuan, and Y. Wei, “Deep feature flow for video recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2349–2358, July 2017.
[9] X. Zhu, Y. Wang, J. Dai, L. Yuan, and Y. Wei, “Flow-guided feature aggregation for video object detection,” in Proceedings of IEEE International Conference on Computer Vision, pp. 408–417, Oct 2017.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, June 2014.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in European Conference on Computer Vision, pp. 346–361, Springer, 2014.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot MultiBox detector,” in European Conference on Computer Vision, pp. 21–37, Springer, 2016.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, June 2016.
[14] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, July 2017.
[15] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of IEEE International Conference on Computer Vision, pp. 2980–2988, Oct 2017.
[16] R. Girshick, “Fast R-CNN,” in Proceedings of IEEE International Conference on Computer Vision, pp. 1440–1448, December 2015.
[17] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems, pp. 379–387, 2016.
[18] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, July 2017.
[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of IEEE International Conference on Computer Vision, pp. 2961–2969, Oct 2017.
[20] L. Galteri, L. Seidenari, M. Bertini, and A. Del Bimbo, “Spatio-temporal closed-loop object detection,” IEEE Transactions on Image Processing, vol. 26, pp. 1253–1263, March 2017.
[21] K. Kang, W. Ouyang, H. Li, and X. Wang, “Object detection from video tubelets with convolutional neural networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825, June 2016.
[22] Y. Xu, X. Zhou, P. Liu, and H. Xu, “Rapid pedestrian detection based on deep omega-shape features with partial occlusion handling,” Neural Processing Letters, pp. 1–15, 2018.
[23] S. Gidaris and N. Komodakis, “Object detection via a multi-region and semantic segmentation-aware CNN model,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1134–1142, 2015.
[24] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox, “FlowNet: Learning optical flow with convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766, 2015.
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[26] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, pp. 740–755, Springer, 2014.
[27] D. Hoiem, “Object detection analysis code (v2),” 2014.
