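A Real-time Multiple People Tracking System in Complex Environment | National Taiwan University of Science and Technology Thesis and Dissertation System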


Graduate student:	Hu-Ke Li (李胡柯)
Thesis title:	A Real-time Multiple People Tracking System in Complex Environment (複雜環境下的即時多人跟蹤系統)
Advisor:	Shi-Jinn Horng (洪西進)
Oral defense committee:	Cheng-Chi Lee (李正吉), Chang-Biau Yang (楊昌彪), Chu-Sing Yang (楊竹星), Wei-Hung Lin (林韋宏)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2021
Academic year of graduation:	109 (Republic of China calendar, i.e., academic year 2020-2021)
Language:	Chinese
Number of pages:	47
Keywords:	Multiple Object Tracking, Person Re-identification, Deep Learning, Yolov5, Deep Sort, Aligned ReID, Kalman filtering
Views: 287; Downloads: 2

Multiple Object Tracking (MOT) is a major field of computer vision, and its range of applications keeps widening as demand grows. The model proposed in this thesis is an improved version of the original Deep Sort and consists of two parts: object detection and target tracking. Yolov5(PA), an improved version of Yolov5, serves as the front-end object detector. It was trained specifically on the "pedestrian" category of the CrowdHuman dataset, which greatly improved its detection accuracy in complex environments. Building on the Deep Sort tracking framework, the model raises its Re-ID accuracy by combining the Mahalanobis distance, the Hungarian algorithm, Aligned ReID, and related techniques, and it predicts each trajectory with a Kalman filter. Videos from the MOT20 dataset serve as the main test scenarios; the model achieves good MOTA and MOTP while still running in real time.
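The abstract says trajectories are predicted with a Kalman filter. The sketch below is a simplified, hypothetical illustration, not the thesis's code: it assumes a constant-velocity model run independently on each box coordinate, in the style of Deep Sort's (x, y, a, h) state, with fixed noise values (`q`, `r`, and the initial covariance `P`) chosen only for demonstration. Deep Sort itself scales its noise with box height, which this sketch omits. The class name `AxisKF` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one box coordinate.

    The state is (p, v): position and velocity of a single coordinate such as
    the box centre x. All noise values are hypothetical defaults.
    """
    p: float
    v: float = 0.0
    P: tuple = ((10.0, 0.0), (0.0, 100.0))  # initial covariance: velocity is very uncertain
    q: float = 1.0  # process-noise variance added to both states each step
    r: float = 1.0  # measurement-noise variance

    def predict(self, dt: float = 1.0) -> float:
        """Propagate with F = [[1, dt], [0, 1]] and return the predicted position."""
        self.p += dt * self.v
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with Q = q * I (a simplification)
        self.P = ((a + dt * (b + c) + dt * dt * d + self.q, b + dt * d),
                  (c + dt * d, d + self.q))
        return self.p

    def update(self, z: float) -> float:
        """Correct the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance S = H P H^T + R
        k0, k1 = a / s, c / s   # Kalman gain K = P H^T / S
        y = z - self.p          # innovation (measurement residual)
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P
        self.P = ((a - k0 * a, b - k0 * b), (c - k1 * a, d - k1 * b))
        return self.p


if __name__ == "__main__":
    # One filter per coordinate of a hypothetical (x, y, a, h) box.
    box = (320.0, 240.0, 0.5, 120.0)
    filters = [AxisKF(p=c) for c in box]
    for step in range(1, 6):
        measured = (320.0 + 3 * step, 240.0 + step, 0.5, 120.0)  # box drifting right and down
        for kf, z in zip(filters, measured):
            kf.predict()
            kf.update(z)
    print([round(kf.predict(), 1) for kf in filters])
```

Because Deep Sort's motion and measurement matrices act on each coordinate and its velocity separately, running four small filters is structurally equivalent to one 8-dimensional filter with block-diagonal noise, which keeps the sketch dependency-free.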

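The abstract also credits the Mahalanobis distance and the Hungarian algorithm for associating detections with tracks. The following pure-Python sketch shows how the two fit together, under stated assumptions: each track carries a diagonal innovation covariance (so the squared Mahalanobis distance reduces to a sum of per-coordinate terms), the motion distance alone is used as the assignment cost (Deep Sort additionally uses appearance features), and pairs beyond the 95% chi-square quantile for 4 degrees of freedom (9.4877, the gate Deep Sort uses) are excluded. The names `associate`, `mahalanobis_sq`, and `GATED_COST` are hypothetical; `hungarian` is the standard potentials-based formulation.

```python
CHI2_95_4DOF = 9.4877  # 95% chi-square quantile for 4 degrees of freedom
GATED_COST = 1e5       # large finite cost marking pairs outside the gate


def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m; returns (row, col) pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j]: row assigned to column j, 0 = free
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the start
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]


def mahalanobis_sq(mean, var, z):
    """Squared Mahalanobis distance, assuming a diagonal covariance `var`."""
    return sum((zi - mi) ** 2 / vi for mi, vi, zi in zip(mean, var, z))


def associate(tracks, detections):
    """Match tracks [(mean, var), ...] to detections [(x, y, a, h), ...].

    Returns sorted (track_index, detection_index) pairs that pass the gate.
    """
    if not tracks or not detections:
        return []
    full = [[mahalanobis_sq(mean, var, z) for z in detections] for mean, var in tracks]
    cost = [[c if c <= CHI2_95_4DOF else GATED_COST for c in row] for row in full]
    if len(tracks) <= len(detections):
        pairs = hungarian(cost)
    else:  # the solver needs rows <= columns, so solve the transpose
        pairs = [(t, d) for d, t in hungarian([list(col) for col in zip(*cost)])]
    return sorted((t, d) for t, d in pairs if full[t][d] <= CHI2_95_4DOF)
```

Replacing gated entries with a large finite cost, rather than infinity, keeps the potential updates numerically well defined; the final filter then drops any forced pairing that fell outside the gate. In practice, `scipy.optimize.linear_sum_assignment` provides an equivalent solver.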
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Related Work
Chapter 2 Model Overview
2.1 Object Detection
2.1.1 Backbone
2.1.2 Neck
2.1.3 Head
2.2 Object Tracking
2.2.1 Mahalanobis Distance
2.2.2 Hungarian Algorithm
2.2.3 Kalman Filter
2.3 Evaluation Metrics
2.3.1 Recall and Precision
2.3.2 MOTA and MOTP
Chapter 3 Proposed Improvements
3.1 Anchor Box Sizes
3.2 PANet
3.3 Aligned ReID
3.4 Triplet Loss
3.5 System Workflow
Chapter 4 Experimental Results
4.1 Datasets
4.1.1 CrowdHuman Dataset
4.1.2 Market-1501 Dataset
4.2 Experimental Results
4.3 Ablation Studies
4.3.1 Choosing Between Full Box and Vision Box
4.3.2 Choice of Feature Extraction Network
4.3.3 Whether to Apply NMS
Chapter 5 Conclusion
References
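Sections 2.3.1 and 2.3.2 cover Recall, Precision, MOTA, and MOTP. A minimal sketch of how these aggregate over a sequence, following the CLEAR MOT definition MOTA = 1 - (FN + FP + IDSW) / GT and taking MOTP as the mean IoU of matched pairs (the MOTChallenge convention). It assumes the per-frame matching has already been done; the `FrameStats` container and `summarize` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame matching result (hypothetical container)."""
    gt: int         # ground-truth objects in the frame
    tp: int         # hypotheses matched to ground truth
    fp: int         # unmatched hypotheses (false positives)
    fn: int         # unmatched ground-truth objects (misses)
    idsw: int       # identity switches
    iou_sum: float  # summed IoU over the matched pairs


def summarize(frames):
    """Aggregate recall, precision, MOTA and MOTP over a sequence of frames."""
    gt = sum(f.gt for f in frames)
    tp = sum(f.tp for f in frames)
    fp = sum(f.fp for f in frames)
    fn = sum(f.fn for f in frames)
    idsw = sum(f.idsw for f in frames)
    iou_sum = sum(f.iou_sum for f in frames)
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "MOTA": 1 - (fn + fp + idsw) / gt,  # can be negative when errors exceed GT count
        "MOTP": iou_sum / tp,               # mean overlap of matched pairs
    }


if __name__ == "__main__":
    frames = [FrameStats(gt=4, tp=3, fp=1, fn=1, idsw=0, iou_sum=2.4),
              FrameStats(gt=4, tp=4, fp=0, fn=0, idsw=1, iou_sum=3.2)]
    print(summarize(frames))
```

Note that errors are summed over the whole sequence before dividing, so MOTA is not an average of per-frame scores.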

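Sections 3.3 and 3.4 list Aligned ReID and the triplet loss. The sketch below illustrates the AlignedReID-style local distance under simplifying assumptions: each person image is already described by a list of per-stripe feature vectors (top to bottom), the distance between two stripes is normalized as (e^d - 1) / (e^d + 1), which equals tanh(d / 2), and the local distance is the cost of the shortest path through the stripe distance matrix moving only right or down. The toy feature vectors, the function names, and the margin of 0.3 are hypothetical, not values from the thesis.

```python
import math


def stripe_distance(u, v):
    """Normalized stripe distance: (e^d - 1) / (e^d + 1), computed as tanh(d / 2)."""
    return math.tanh(math.dist(u, v) / 2)


def local_distance(f, g):
    """Shortest-path alignment cost between two images' stripe features.

    f and g are lists of feature vectors, one per horizontal stripe. The path
    runs from (0, 0) to the bottom-right cell, moving only right or down.
    """
    h, w = len(f), len(g)
    s = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d = stripe_distance(f[i], g[j])
            if i == 0 and j == 0:
                s[i][j] = d
            elif i == 0:
                s[i][j] = s[i][j - 1] + d
            elif j == 0:
                s[i][j] = s[i - 1][j] + d
            else:
                s[i][j] = min(s[i - 1][j], s[i][j - 1]) + d
    return s[-1][-1]


def triplet_loss(d_ap, d_an, margin=0.3):
    """Hinge triplet loss: d(anchor, negative) should exceed d(anchor, positive) by `margin`."""
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    anchor = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    positive = [[0.1, 0.0], [1.0, 0.1], [2.0, 0.0]]    # same person, slight noise
    negative = [[3.0, 2.0], [4.0, 2.0], [5.0, 2.0]]    # different person
    d_ap = local_distance(anchor, positive)
    d_an = local_distance(anchor, negative)
    print(d_ap, d_an, triplet_loss(d_ap, d_an))
```

The tanh-style normalization bounds every stripe distance below 1, so a single badly matched stripe (for example, one hidden by occlusion) cannot dominate the path cost.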

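Section 4.3.3 examines whether to apply NMS (non-maximum suppression) to the detector output. For reference, a minimal greedy NMS sketch: boxes are assumed to be (x1, y1, x2, y2) corners, and the 0.5 IoU threshold is a hypothetical default rather than the thesis's setting. Libraries such as `torchvision.ops.nms` implement the same greedy procedure on tensors.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep boxes in descending score order, dropping any box that
    overlaps an already-kept box by more than `iou_thresh`. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))
```

In crowded scenes such as MOT20, a low threshold can suppress genuinely distinct but heavily overlapping pedestrians, which is the trade-off such an ablation weighs.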
