Author: Hu-Ke Li (李胡柯)
Thesis Title: A Real-time Multiple People Tracking System in Complex Environment (複雜環境下的即時多人跟蹤系統)
Advisor: Shi-Jinn Horng (洪西進)
Committee: Cheng-Chi Lee (李正吉), Chang-Biau Yang (楊昌彪), Chu-Sing Yang (楊竹星), Wei-Hung Lin (林韋宏)
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Thesis Publication Year: 2021
Graduation Academic Year: 109 (ROC calendar, i.e., 2020-2021)
Language: Chinese
Pages: 47
Keywords: Multiple Object Tracking, Person Re-identification, Deep Learning, Yolov5, Deep Sort, Aligned ReID, Kalman filtering
    Multiple Object Tracking (MOT) is a major research area in computer vision, and its
    applications keep broadening as demand grows. The model proposed in this thesis is an
    improved version of the conventional Deep Sort and consists of two parts: object
    detection and target tracking. Yolov5(PA), an improved version of Yolov5, serves as the
    front-end object detector. It was trained specifically on the "pedestrian" category of
    the CrowdHuman dataset, which greatly improves detection accuracy in complex
    environments. Built on the Deep Sort tracking framework, the model raises Re-ID
    accuracy by using the Mahalanobis distance, the Hungarian algorithm, Aligned ReID, and
    related techniques, and it predicts trajectories with a Kalman filter. Videos from the
    MOT20 dataset serve as the main test scenarios. The model achieves good MOTA and MOTP
    while running fast enough to operate in real time.
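
    The abstract names the Kalman filter and the Mahalanobis distance as the motion side of
    the Deep Sort tracker. The following is a minimal sketch of how those two pieces fit
    together, not the thesis's implementation. It assumes the Deep Sort state layout (box
    center x, y, aspect ratio a, height h, and their velocities), a constant-velocity model
    with one frame per step, and hypothetical fixed noise matrices Q and R (Deep Sort itself
    scales the noise with box height). A detection stays a candidate for a track only if its
    squared Mahalanobis distance to the predicted box is at most 9.4877, the 95% quantile of
    the chi-square distribution with 4 degrees of freedom.

```python
import numpy as np

# Hypothetical sketch of the Deep Sort motion model (not the thesis's code).
# State: [x, y, a, h, vx, vy, va, vh]; measurement: [x, y, a, h].
NDIM = 4
DT = 1.0  # assumption: one time step per video frame

# Constant-velocity transition F: position += velocity * DT.
F = np.eye(2 * NDIM)
F[:NDIM, NDIM:] = DT * np.eye(NDIM)
# Measurement matrix H: observe only (x, y, a, h).
H = np.eye(NDIM, 2 * NDIM)

# Assumed fixed process/measurement noise; Deep Sort scales these by box height.
Q = 1e-2 * np.eye(2 * NDIM)
R = 1e-1 * np.eye(NDIM)

# 95% quantile of the chi-square distribution with 4 degrees of freedom.
GATING_THRESHOLD = 9.4877


def predict(mean, cov):
    """Kalman prediction step: propagate one track's state one frame ahead."""
    return F @ mean, F @ cov @ F.T + Q


def squared_mahalanobis(mean, cov, detections):
    """Squared Mahalanobis distance from a predicted track to each detection row."""
    z_pred = H @ mean            # predicted measurement (x, y, a, h)
    S = H @ cov @ H.T + R        # innovation covariance
    diff = detections - z_pred   # shape (n_detections, 4)
    # Solve S x = diff^T rather than forming S^-1 explicitly.
    x = np.linalg.solve(S, diff.T)
    return np.sum(diff.T * x, axis=0)


def gate(mean, cov, detections):
    """True for detections close enough to be associated with this track."""
    return squared_mahalanobis(mean, cov, detections) <= GATING_THRESHOLD
```

    For example, a track centered at (100, 50) and moving one pixel per frame to the right
    is predicted at (101, 50); a detection at that position passes the gate, while one at
    (160, 90) is rejected.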

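    After gating, Deep Sort treats track-to-detection association as a minimum-cost
    assignment problem, which the abstract says is solved with the Hungarian algorithm. The
    sketch below is a plain-Python Hungarian algorithm (the row/column-potential formulation)
    plus a hypothetical `match` helper. The cost values, the `GATED` constant for pairs
    rejected by the gate, and the `max_cost` cut-off are illustrative assumptions, and rows
    (tracks) must not outnumber columns (detections). The original Deep Sort [3] also runs
    this matching as a cascade that favors recently seen tracks, which is omitted here.

```python
import math

GATED = 1e5  # hypothetical large cost marking pairs rejected by the gate


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    O(n^2 m) Hungarian algorithm with row/column potentials.
    Returns assignment[i] = column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "transpose the matrix if there are more rows than columns"
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 means none)
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            # Grow the alternating tree by the cheapest reduced-cost edge.
            used[j0] = True
            i0 = p[j0]
            delta, j1 = math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update potentials so the chosen edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip matched/unmatched edges along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match(cost, max_cost):
    """Assign tracks (rows) to detections (columns); drop pairs costing more than max_cost."""
    assignment = hungarian(cost)
    return [(t, d) for t, d in enumerate(assignment) if cost[t][d] <= max_cost]
```

    Dropping pairs above `max_cost` after solving leaves gated tracks unmatched instead of
    forcing them onto a distant detection. In practice, SciPy's
    `scipy.optimize.linear_sum_assignment` solves the same assignment problem and also
    accepts matrices with more rows than columns.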
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Related Work
    Chapter 2 Model Overview
      2.1 Object Detection
        2.1.1 Backbone
        2.1.2 Neck
        2.1.3 Head
      2.2 Target Tracking
        2.2.1 Mahalanobis Distance
        2.2.2 Hungarian Algorithm
        2.2.3 Kalman Filter
      2.3 Evaluation Metrics
        2.3.1 Recall and Precision
        2.3.2 MOTA and MOTP
    Chapter 3 Improvements
      3.1 Anchor Box Sizes
      3.2 PANet
      3.3 Aligned ReID
      3.4 Triplet Loss
      3.5 System Workflow
    Chapter 4 Experimental Results
      4.1 Datasets
        4.1.1 CrowdHuman Dataset
        4.1.2 Market-1501 Dataset
      4.2 Experimental Results
      4.3 Ablation Studies
        4.3.1 Choosing Between Full Box and Vision Box
        4.3.2 Choice of Feature Extraction Network
        4.3.3 Whether to Add NMS
    Chapter 5 Conclusion
    References
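
    MOTA and MOTP, the measures reported in the abstract and covered in Section 2.3.2, are
    the CLEAR MOT metrics of Bernardin and Stiefelhagen [11]. MOTA = 1 - Σ(FN + FP + IDSW) / Σ GT
    charges misses, false positives, and identity switches against the total number of
    ground-truth objects, while MOTP averages localization quality over matched pairs. The
    sketch below assumes per-frame matching has already been done, uses a hypothetical
    `FrameStats` container, and follows the overlap-based MOTP used by MOTChallenge (mean
    IoU of matched pairs, where higher is better).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameStats:
    """Per-frame counts after matching predictions to ground truth (hypothetical container)."""
    num_gt: int              # ground-truth objects in the frame
    false_negatives: int     # ground-truth objects with no matched prediction
    false_positives: int     # predictions matched to no ground-truth object
    id_switches: int         # matched objects whose track ID changed
    match_ious: List[float]  # IoU of every matched (prediction, ground truth) pair


def mota(frames: List[FrameStats]) -> float:
    """MOTA = 1 - sum(FN + FP + IDSW) / sum(GT), accumulated over all frames."""
    errors = sum(f.false_negatives + f.false_positives + f.id_switches for f in frames)
    total_gt = sum(f.num_gt for f in frames)
    return 1.0 - errors / total_gt


def motp(frames: List[FrameStats]) -> float:
    """Overlap-based MOTP: mean IoU over all matched pairs in all frames."""
    total_overlap = sum(sum(f.match_ious) for f in frames)
    total_matches = sum(len(f.match_ious) for f in frames)
    return total_overlap / total_matches
```

    For instance, two frames with four ground-truth objects each and three errors in total
    give MOTA = 1 - 3/8 = 0.625. Because errors are summed before dividing, MOTA can become
    negative when a tracker produces more errors than there are ground-truth objects.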

    [1] MOT challenge: https://motchallenge.net/
    [2] Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos and Ben Upcroft, “Simple
    online and realtime tracking,” IEEE International Conference on Image Processing
    (ICIP), 2016.
    [3] Nicolai Wojke, Alex Bewley, and Dietrich Paulus, “Simple online and realtime
    tracking with a deep association metric,” IEEE International Conference on Image
    Processing (ICIP), 2017.
    [4] Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick, “Mask R-CNN,”
    IEEE International Conference on Computer Vision, 2017, pp. 2961-2969.
    [5] Long Chen, Haizhou Ai, Zijie Zhuang and Chong Shang, “Real-time multiple people
    tracking with deeply learned candidate selection and person re-identification,” IEEE
    International Conference on Multimedia and Expo (ICME), 2018.
    [6] Zhongdao Wang, Liang Zheng, Yixuan Liu, Yali Li, and Shengjin Wang, “Towards
    real-time multi-object tracking,” arXiv preprint arXiv:1909.12605, 2019.
    [7] Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi, “You only look
    once: Unified, real-time object detection,” IEEE Conference on Computer Vision
    and Pattern Recognition, 2016, pp. 779-788.
    [8] Ultralytics: https://ultralytics.com/
    [9] Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, “Deep residual learning
    for image recognition,” IEEE Conference on Computer Vision and Pattern
    Recognition, 2016, pp. 770-778.
    [10] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan and
    Serge Belongie, “Feature pyramid networks for object detection,” IEEE Conference
    on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
    [11] Keni Bernardin and Rainer Stiefelhagen, “Evaluating multiple object tracking
    performance: The CLEAR MOT metrics,” EURASIP Journal on Image and Video
    Processing, 2008, pp. 1-10.
    [12] Ergys Ristani et al., “Performance measures and a data set for multi-target, multi-camera tracking,” European Conference on Computer Vision, Springer, Cham, 2016.
    [13] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi and Jiaya Jia, “Path aggregation network
    for instance segmentation,” IEEE Conference on Computer Vision and Pattern
    Recognition, 2018, pp. 8759-8768.
    [14] Xuan Zhang, Hao Luo, Xing Fan, Weilai Xiang, Yixiao Sun, Qiqi Xiao, Wei Jiang,
    Chi Zhang, and Jian Sun, “AlignedReID: Surpassing human-level performance in person
    re-identification,” arXiv preprint arXiv:1711.08184, 2017.
    [15] Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, and Jian
    Sun, “CrowdHuman: A benchmark for detecting human in a crowd,” arXiv preprint
    arXiv:1805.00123, 2018.
    [16] Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang and Qi Tian,
    “Scalable person re-identification: A benchmark,” IEEE International Conference on
    Computer Vision, 2015, pp. 1116-1124.
    [17] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva
    Ramanan, Piotr Dollár, and C. Lawrence Zitnick, “Microsoft COCO: Common objects
    in context,” European Conference on Computer Vision, Springer, Cham, 2014.
    [18] Yolov5: https://github.com/ultralytics/yolov5
    [19] Zhuo Deng and Longin Jan Latecki, “Amodal detection of 3D objects: Inferring 3D
    bounding boxes from 2D ones in RGB-depth images,” IEEE Conference on Computer
    Vision and Pattern Recognition, 2017, pp. 5762-5770.
    [20] Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier
    Alameda-Pineda, “TransCenter: Transformers with dense queries for multiple-object
    tracking,” arXiv preprint arXiv:2103.15145, 2021.
    [21] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao, “YOLOv4:
    Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934,
    2020.
    [22] Alexander Neubeck and Luc Van Gool, “Efficient non-maximum suppression,”
    18th International Conference on Pattern Recognition (ICPR'06), Vol. 3, IEEE,
    2006.
    [23] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN:
    Towards real-time object detection with region proposal networks,” IEEE
    Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, Issue 6, June
    2017, pp. 1137-1149.
