
Author: 吳岳桐
Yueh-Tung Wu
Thesis title: 基於深度學習的物體檢測和物體追蹤之結合用於自動駕駛汽車
Joint Object Detection and Object Tracking for Self-Driving Cars using Deep Learning
Advisor: 陳郁堂
Yie-Tarng Chen
Oral examination committee: 方文賢
Wen-Hsien Fang
陳省隆
Hsing-Lung Chen
呂政修
Jenq-Shiou Leu
吳乾彌
Chen-Mie Wu
林銘波
Ming-Bo Lin
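Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 47
Keywords (Chinese): 即時多目標追蹤, 基於CNN之追蹤器, 基於CNN之物件偵測, 餘弦相似度, 匈牙利演算法
Keywords (English): Online multi-object tracking, CNN-based tracker, CNN-based detector, cosine similarity, Hungarian algorithm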
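  • This thesis proposes a simple yet effective real-time multi-object tracking method based on convolutional neural networks (CNNs). The proposed method integrates four main components: the CNN-based YOLOv3 object detector, the CNN-based Re3 tracker, a cosine similarity network, and the Hungarian algorithm. With the YOLOv3 detector, the method automatically and effectively generates new targets to track. In the Re3 tracker, CNN features capture the appearance of the tracked target and an LSTM captures its motion; the tracker's prediction is then fused with adjacent object detections. Next, the cosine similarity network computes the similarity between multiple lost targets and multiple detections, and the Hungarian algorithm is used to recover the lost targets. Finally, a neighboring-target suppression mechanism avoids redundant tracking of the same object. The proposed method is a general-purpose scheme, so even without fine-tuning this simple architecture achieves good performance on the MOT Challenge and KITTI benchmarks.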


    This thesis presents a simple but effective approach to real-time multiple object tracking using convolutional neural networks (CNNs). The proposed approach combines four components: the YOLOv3 object detector (CNN-based), the Re3 object tracker (CNN-based), a cosine similarity network, and the Hungarian algorithm. The YOLOv3 detector automatically detects new targets and initializes tracking for them. In the Re3 tracker, the CNN embedding captures the appearance of each tracked-state target, while LSTMs keep track of its motion; the tracker output is then fused with adjacent detections. Afterward, the Hungarian algorithm is used to recover lost-state targets, with the cosine similarity network providing the pairwise similarity between lost-state targets and object detections. Finally, a neighboring-target suppression mechanism avoids redundant tracking of the same object. The proposed method is generic: even without fine-tuning, this simple scheme achieves good performance on the MOT Challenge and KITTI benchmarks.
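
The recovery step described in the abstract can be illustrated with a minimal sketch. It assumes each lost target and each detection is already represented by a feature vector. Plain cosine similarity stands in for the learned cosine similarity network, and a compact textbook implementation of the Hungarian algorithm solves the assignment. The function names, the `min_similarity` threshold, and the example vectors are hypothetical and do not come from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list `match` where match[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row matched to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def reassociate(lost_feats, det_feats, min_similarity=0.5):
    """Match lost targets to detections; returns sorted (lost_idx, det_idx) pairs.

    min_similarity is an assumed gate: weaker matches are discarded.
    """
    if not lost_feats or not det_feats:
        return []
    sim = [[cosine_similarity(l, d) for d in det_feats] for l in lost_feats]
    cost = [[1.0 - s for s in row] for row in sim]  # high similarity = low cost
    transposed = len(cost) > len(cost[0])           # solver needs rows <= cols
    if transposed:
        cost = [list(col) for col in zip(*cost)]
    match = hungarian(cost)
    pairs = [(c, r) if transposed else (r, c) for r, c in enumerate(match)]
    return sorted((l, d) for l, d in pairs if sim[l][d] >= min_similarity)


lost = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
print(reassociate(lost, detections))
```

In this toy example the first lost target is paired with the second detection and the second lost target with the first detection, while the third detection stays unmatched and would be a candidate for a new track.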

    Abstract (Chinese)
    Abstract
    Acknowledgment
    Table of contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Proposed Method
        3.1 Overall Methodology
        3.2 Object detector
        3.3 Object tracker
        3.4 Object association
        3.5 Object detector and object tracker fusion
        3.6 System Flow
            3.6.1 Life of an object
            3.6.2 System architecture
    4 Experimental Result and Analysis
        4.1 Datasets
        4.2 Evaluation on Testing Set
            4.2.1 MOT Challenge
            4.2.2 KITTI Benchmark
        4.3 Failure Case
    5 Conclusion
    References

