
Graduate Student: Shih-Ting Dai (戴世庭)
Thesis Title: Scene understanding for self-driving cars using deep learning (利用深度學習進行自動駕駛車場景解析)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Chen-Mie Wu (吳乾彌), Ming-Bo Lin (林銘波), Hsing-Lung Chen (陳省隆), Wen-Hsien Fang (方文賢), Jenq-Shiou Leu (呂政修)
Degree: Master
Department: Department of Electronic and Computer Engineering (電資學院 電子工程系)
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 44
Keywords: object detection, class imbalance, deep learning
Abstract (translated from Chinese):
As autonomous-driving technology has attracted more and more attention in recent years, a growing number of groups have begun to work on it. Among the datasets and tasks related to autonomous driving, detecting objects on the road is a key problem, and how to train an efficient and accurate object detector is a focus of attention. In this thesis, after analyzing the objects in driving-scene datasets, we observe the following problems: 1) small and medium-size object samples make up a large portion of the data, and most objects are occluded by other objects; 2) the number of samples per class is highly imbalanced. To address these problems, we incorporate several methods: RoIAlign improves the precision of RoI feature extraction, which helps considerably with small-object detection, and the focal loss mitigates class imbalance by down-weighting well-classified samples. With these methods, we successfully improve the performance of the Faster R-CNN object detector on the KITTI dataset. In addition, for object detection in videos, we adopt the T-CNN framework, which injects temporal and contextual information into the detection process to boost performance. We also analyze the limitations of T-CNN on this kind of driving-scene dataset.


Abstract (English):
As autonomous vehicles have received increasing attention in recent years, more and more researchers have investigated this active topic. Specifically, object detection in road scenes is one of the major tasks for autonomous vehicles and has shown impressive progress. In this thesis, after analyzing self-driving car datasets, we make the following observations: 1) small and medium-size object samples dominate the datasets, and a major portion of objects are occluded; 2) class imbalance, with several minority classes severely under-represented, degrades the performance of object detection. To address these problems, we use RoIAlign to improve the accuracy of small-object detection, and we use the focal loss to address the class-imbalance problem. With these methods, we successfully improve the performance of Faster R-CNN object detection on the KITTI dataset. For object detection in videos, we incorporate the temporal and contextual information of videos with the T-CNN framework. We also analyze the limitations of T-CNN on driving-scene datasets.
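The two modifications named in the abstract can be made concrete. RoIAlign (He et al. [19]) removes the coordinate rounding that RoI pooling applies when mapping a proposal onto the feature map; instead, each output bin is sampled at real-valued positions via bilinear interpolation, which preserves the few pixels of signal a small object has. Below is a simplified NumPy sketch, assuming a single-channel feature map, an in-bounds RoI, and one sampling point per bin (Mask R-CNN averages four per bin); it is an illustration, not the thesis's implementation:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feature map `feat` (H, W) at a real-valued point (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def roi_align(feat, roi, out_size=7):
    """RoIAlign over one RoI (y1, x1, y2, x2) in feature-map coordinates.
    Unlike RoI pooling, bin borders are NOT rounded to integers; each bin
    is sampled at a real-valued point by bilinear interpolation."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        for j in range(out_size):
            cy = y1 + (i + 0.5) * bin_h   # bin centre, kept as a float
            cx = x1 + (j + 0.5) * bin_w
            out[i, j] = bilinear_sample(feat, cy, cx)
    return out
```

The quantization this avoids is most harmful for small objects: on a stride-16 feature map, a car 20 pixels wide spans barely one feature cell, so rounding the RoI borders can shift the pooled region by a large fraction of the object.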
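The focal loss (Lin et al. [16]) reshapes cross-entropy as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t): the (1 - p_t)^gamma factor shrinks the loss of well-classified samples (p_t near 1), so abundant easy backgrounds and majority classes no longer drown out rare, hard ones. A minimal PyTorch sketch for the binary case, using the paper's defaults alpha = 0.25 and gamma = 2; the tensor names are illustrative and not taken from the thesis:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss. `targets` is a float tensor of 0./1. labels
    with the same shape as `logits`."""
    # Plain cross-entropy equals -log(p_t)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma down-weights well-classified samples
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Hypothetical usage: dense detection scores with sparse positives
logits = torch.randn(8)
targets = (torch.rand(8) > 0.9).float()
loss = focal_loss(logits, targets)
```

With gamma = 2, a sample classified at p_t = 0.9 contributes 100x less loss than under plain cross-entropy, while a hard sample at p_t = 0.1 is almost unaffected; this is what lets training focus on the minority classes.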

Table of Contents:
Chinese Abstract
Abstract
Acknowledgment
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
  2.1 Object Detection
  2.2 Object Detection in Videos
  2.3 Class Imbalance
  2.4 Driving Scene Dataset
3 Object Detection in Road-Environment Image
  3.1 Faster R-CNN
    3.1.1 Region Proposal Network
    3.1.2 RoI Pooling and Bounding-Box Prediction
  3.2 Modifications to Faster R-CNN
    3.2.1 RoIAlign
    3.2.2 Focal Loss
  3.3 Implementation Details
  3.4 Dataset
  3.5 Experimental Results
    3.5.1 Aspect Ratios of Anchors
    3.5.2 RoIAlign
    3.5.3 Focal Loss
    3.5.4 Performance Analysis
  3.6 Visual Results
  3.7 Conclusion
4 Object Detection in Road-Environment Video
  4.1 T-CNN
    4.1.1 Still-Image Object Detection
    4.1.2 Multi-Context Suppression
    4.1.3 Motion-Guided Propagation
    4.1.4 High-Confidence Tracking
    4.1.5 Spatial Max-Pooling
    4.1.6 Temporal Re-scoring
  4.2 Experiments
    4.2.1 Dataset
    4.2.2 Implementation Details
  4.3 Experimental Results
    4.3.1 MCS and MGP
    4.3.2 Tubelet Re-scoring
  4.4 Conclusion
5 Conclusion
References

[1] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, June 2012.
[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223, June 2016.
[3] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving video database with scalable annotation tooling,” arXiv preprint arXiv:1805.04687, 2018.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[5] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[6] W. Han, P. Khorrami, T. L. Paine, P. Ramachandran, M. Babaeizadeh, H. Shi, J. Li, S. Yan, and T. S. Huang, “Seq-NMS for video object detection,” arXiv preprint arXiv:1602.08465, 2016.
[7] K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, et al., “T-CNN: Tubelets with convolutional neural networks for object detection from videos,” arXiv preprint arXiv:1604.02532, 2016.
[8] K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang, “Object detection in videos with tubelet proposal networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735, July 2017.
[9] X. Zhu, Y. Xiong, J. Dai, L. Yuan, and Y. Wei, “Deep feature flow for video recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2349–2358, July 2017.
[10] X. Zhu, Y. Wang, J. Dai, L. Yuan, and Y. Wei, “Flow-guided feature aggregation for video object detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 408–417, Oct 2017.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, June 2014.
[12] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, December 2015.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot MultiBox detector,” in European Conference on Computer Vision, pp. 21–37, Springer, 2016.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, June 2016.
[15] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525, July 2017.
[16] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, Oct 2017.
[17] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems, pp. 379–387, 2016.
[18] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, July 2017.
[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, Oct 2017.
[20] L. Galteri, L. Seidenari, M. Bertini, and A. Del Bimbo, “Spatio-temporal closed-loop object detection,” IEEE Transactions on Image Processing, vol. 26, pp. 1253–1263, March 2017.
[21] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2241–2248, June 2010.
[22] A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 761–769, June 2016.
[23] R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He, “Detectron.” https://github.com/facebookresearch/detectron, 2018.
[24] A. Osep, W. Mehner, M. Mathias, and B. Leibe, “Combined image- and world-space tracking in traffic scenes,” in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1988–1995, May 2017.
[25] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
[26] K.-H. Kim, S. Hong, B. Roh, Y. Cheon, and M. Park, “PVANET: Deep but lightweight neural networks for real-time object detection,” arXiv preprint arXiv:1608.08021, 2016.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, pp. 740–755, Springer, 2014.
[28] Y. Zhu, Z. Lan, S. Newsam, and A. G. Hauptmann, “Hidden two-stream convolutional networks for action recognition,” arXiv preprint arXiv:1704.00389, 2017.
[29] G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Scandinavian Conference on Image Analysis, pp. 363–370, 2003.
[30] L. Wang, W. Ouyang, X. Wang, and H. Lu, “Visual tracking with fully convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 3119–3127, Dec 2015.
