
Graduate Student: Shao-Tsz Liao (廖卲慈)
Thesis Title: An Anchor-Free Joint Model for Multiple Object Tracking in UAV Videos (一種應用於無人機影像之無錨框多物件追蹤聯合訓練模型)
Advisors: Shanq-Jang Ruan (阮聖彰), Chang Hong Lin (林昌鴻)
Oral Defense Committee: Shanq-Jang Ruan (阮聖彰), Chang Hong Lin (林昌鴻), 陳維美, 呂政修
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 112
Language: English
Pages: 72
Keywords: Multiple Object Tracking (MOT), Autonomous Transportation, Anchor-Free Joint Model


Abstract: Multiple object tracking (MOT) is a fundamental task that plays a crucial role in intelligent transportation systems. It refers to the process of simultaneously detecting and tracking multiple objects, and it offers numerous important benefits, such as ensuring safety, enabling efficient traffic management, and supporting autonomous driving. The traditional two-stage strategy solves MOT as two consecutive sub-tasks; however, combining detection and tracking enables the model to exploit contextual information and improves consistency. This thesis presents a novel MOT approach that integrates object detection and tracking into a joint model while eliminating predefined anchor boxes to reduce computation and parameter count. HRNet is adopted as the backbone and integrated with polarized self-attention to strengthen feature extraction. This design addresses the challenge of tracking objects in bidirectional motion, which frequently occurs from the perspective of the vehicle. Experiments conducted on the VisDrone and UAVDT datasets demonstrate that the proposed method is well suited to UAV videos and surpasses modern state-of-the-art trackers.
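To make the joint, anchor-free design described above concrete, the sketch below shows the kind of output branches such a tracker attaches to a shared backbone feature map: a center-point heatmap, box-size and center-offset regressors for detection, and a normalized identity-embedding head for association. This is a minimal PyTorch illustration in the spirit of center-based joint trackers such as FairMOT; the class and head names, channel sizes, and stride are illustrative assumptions, not the thesis's exact implementation (which builds on HRNet with polarized self-attention).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_head(in_ch: int, out_ch: int, mid_ch: int = 256) -> nn.Sequential:
    """A small conv head shared by all output branches (illustrative sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )

class AnchorFreeJointHeads(nn.Module):
    """Anchor-free detection + re-ID heads over one backbone feature map.

    Instead of predefined anchor boxes, each object is represented by its
    center point on a downsampled heatmap, plus per-pixel size/offset
    regression and an identity embedding used later for data association.
    """

    def __init__(self, in_ch: int = 64, num_classes: int = 1, emb_dim: int = 128):
        super().__init__()
        self.heatmap = make_head(in_ch, num_classes)  # object-center scores
        self.size = make_head(in_ch, 2)               # box width / height
        self.offset = make_head(in_ch, 2)             # sub-pixel center offset
        self.embedding = make_head(in_ch, emb_dim)    # identity features

    def forward(self, feat: torch.Tensor) -> dict:
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),          # in [0, 1]
            "size": self.size(feat),
            "offset": self.offset(feat),
            "embedding": F.normalize(self.embedding(feat), dim=1),  # unit norm
        }

# Example: a 512x512 frame at stride 4 yields a 128x128 feature map.
heads = AnchorFreeJointHeads(in_ch=64)
out = heads(torch.randn(1, 64, 128, 128))
print({k: tuple(v.shape) for k, v in out.items()})
```

At inference time, local maxima on the heatmap become detections, and the embedding vectors sampled at those peaks can be matched to existing tracks, for example by cosine distance combined with a motion gate, to carry identities across frames.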

Table of Contents:
RECOMMENDATION FORM
COMMITTEE FORM
摘要 (CHINESE ABSTRACT)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Motivation of This Thesis
1.2 Purpose of This Thesis
1.3 Contribution of This Thesis
1.4 Organization of This Thesis
CHAPTER 2 BACKGROUND
2.1 Multiple Object Tracking
2.2 Data Association
2.3 Convolutional Neural Networks
2.4 Attention Mechanism
CHAPTER 3 RELATED WORKS
3.1 Tracking-by-Detection
3.2 Joint Detection and Tracking
3.3 Impact of Anchors
3.4 MOT for UAV Videos
CHAPTER 4 PROPOSED ARCHITECTURE
4.1 Backbone Network
4.2 Polarized Self-attention
4.3 Output Branches
4.4 Data Association
CHAPTER 5 EXPERIMENTS
5.1 Dataset
5.2 Experiment Details
5.3 Dataset
5.4 Performance Comparison
5.5 Improvement Analysis
5.6 Qualitative Results
CHAPTER 6 CONCLUSION
REFERENCES


Full Text Release Date: 2025/12/05 (campus network)
Full Text Release Date: 2025/12/05 (off-campus network)
Full Text Release Date: 2025/12/05 (National Central Library: Taiwan NDLTD system)