
Graduate Student: Shan-Qian Zheng (鄭善謙)
Thesis Title: A Transformer-Based Approach to Tracking and Counting Vessels in a High-Angle Fixed-Shooting Video (一個基於Transformer之高角度固定拍攝影片的船隻追蹤與計數方法)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: Yuan-Shin Hwang (黃元欣), Shaou-Gang Miaou (繆紹綱), Wei-Min Jeng (鄭為民)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Pages: 56
Keywords: Deep learning, vessel detection, multiple object tracking, Transformer, vessel counting
    With increasing exchange among the countries of the world, the demand for transportation between them has risen accordingly. Taiwan is surrounded by the sea, so its external transportation links rely mainly on vessels and aircraft, and the shipping and aviation industries have therefore long been key development areas for Taiwan. Because vessels serve many purposes and their administration is relatively decentralized, how to manage water-borne activities effectively has long been a concern of practitioners in the related industries. At present, for reasons of navigational safety, the International Maritime Organization's Safety of Life at Sea (SOLAS) regulations explicitly require vessels with a gross tonnage of more than 300 to install an Automatic Identification System (AIS); a vessel can broadcast its own information via AIS and likewise obtain the information of other vessels through it. For small and medium-sized vessels not required to carry AIS, however, management is comparatively difficult and demands considerable manpower. In recent years, with the rapid development of deep learning, computer vision techniques have been widely applied to many tasks. For the management of small and medium-sized vessels, we can analyze shore-side surveillance footage with image processing to help manage water traffic conditions in an intelligent way, thereby easing the burden on waterway duty personnel.
    In view of the above, this thesis proposes a method for tracking and counting vessels from shore-side surveillance video. First, we detect the current frame with a Transformer to obtain the detection boxes and tracking boxes of the vessels in the frame. We then perform three stages of matching on the model's predictions: the first stage matches by the vessels' appearance information; the second stage retrieves, from the filtered-out low-score detection boxes, vessels that may have been suppressed by the external environment, and matches them; and the third stage matches by the motion information provided by a Kalman filter. In our experiments, we compare the performance of three other state-of-the-art multiple object tracking models, namely FairMOT, ByteTrack, and TransTrack. The results show that our model performs best, reaching a precision of 92.4%, a recall of 68.1%, a multiple object tracking accuracy (MOTA) of 0.625, and an IDF1 of 75.1%, while successfully recovering 72.5% of the vessel trajectories in the dataset.
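
    For readers unfamiliar with the two headline tracking metrics, the short Python sketch below computes MOTA and IDF1 from error counts using their standard definitions. The counts in the usage example are illustrative values only, chosen so the outputs reproduce the reported 0.625 and 75.1%; they are not the actual counts from the thesis dataset.

        def mota(fn, fp, idsw, num_gt):
            # Multiple object tracking accuracy: 1 - (FN + FP + IDSW) / GT,
            # where GT is the total number of ground-truth boxes over all frames.
            return 1.0 - (fn + fp + idsw) / num_gt

        def idf1(idtp, idfp, idfn):
            # IDF1: harmonic mean of identity precision and identity recall,
            # from identity-level true positives, false positives, and false negatives.
            return 2.0 * idtp / (2.0 * idtp + idfp + idfn)

        # Illustrative counts only, chosen to match the scores reported above:
        print(mota(fn=300, fp=50, idsw=25, num_gt=1000))  # 0.625
        print(idf1(idtp=600, idfp=150, idfn=247))         # ~0.751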


    With the increase in exchange among countries worldwide, the demand for transportation between them has risen accordingly. Taiwan is surrounded by the sea, and its primary means of transportation to other countries is by vessel or airplane, so the shipping and aviation industries have long been crucial development areas for Taiwan. Because vessels serve many purposes and their management is comparatively decentralized, the effective management of water traffic has always been a concern of personnel in the related industries.
    Currently, the International Maritime Organization (IMO) explicitly mandates the installation of an Automatic Identification System (AIS) on vessels with a gross tonnage of more than 300, out of consideration for the safety of life at sea. Vessels can rely on AIS to send their own information and to obtain the information of other vessels. However, small vessels not mandatorily equipped with AIS are relatively difficult to manage and require more human resources for monitoring and control. In recent years, with the rapid development of deep learning, computer vision techniques have been widely used in many tasks. For the management of small vessels, we can analyze shore-side surveillance video with image processing to help manage water traffic conditions in an intelligent way, thereby reducing the burden on the waterside duty personnel.
    In this thesis, we propose an approach to tracking and counting vessels in shore-side surveillance video. First, we detect objects in the current frame with a Transformer to obtain the detection boxes and tracking boxes of the vessels in the frame. We then perform three stages of matching on the model's predictions. The first stage matches tracks to detections by the vessels' appearance information. The second stage retrieves, from the filtered-out low-score detection boxes, vessels that may have been affected by the external environment, and matches them. The third stage matches by the motion information provided by the Kalman filter. In the experiments, we compare the performance of our model with that of three other state-of-the-art multiple object tracking models, namely FairMOT, ByteTrack, and TransTrack. According to the experimental results, our model performs the best: the precision, recall, multiple object tracking accuracy (MOTA), and IDF1 reach 92.4%, 68.1%, 0.625, and 75.1%, respectively. In addition, our model successfully retrieves 72.5% of the vessel trajectories in the dataset.
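
    To make the matching cascade concrete, the following is a minimal Python sketch of a three-stage association of this kind. It is an illustration under stated assumptions rather than the thesis implementation: it uses greedy matching in place of an optimal assignment, the thresholds HIGH_SCORE, APP_THRESH, and IOU_THRESH are hypothetical, and each track is assumed to carry an appearance embedding "emb" and a Kalman-predicted box "kf_pred", while each detection carries "box", "score", and "emb".

        import numpy as np

        HIGH_SCORE = 0.6   # assumed threshold splitting high- and low-score detections
        APP_THRESH = 0.5   # assumed appearance-similarity threshold
        IOU_THRESH = 0.3   # assumed IoU threshold for the motion stage

        def cosine(a, b):
            # Cosine similarity between two appearance embeddings.
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

        def iou(b1, b2):
            # Intersection over union of two (x1, y1, x2, y2) boxes.
            xa, ya = max(b1[0], b2[0]), max(b1[1], b2[1])
            xb, yb = min(b1[2], b2[2]), min(b1[3], b2[3])
            inter = max(0.0, xb - xa) * max(0.0, yb - ya)
            union = ((b1[2] - b1[0]) * (b1[3] - b1[1])
                     + (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
            return inter / union if union > 0 else 0.0

        def greedy_match(tracks, dets, score_fn, thresh):
            # Greedily pair tracks with detections whose score exceeds thresh;
            # both lists are mutated in place so later stages see only leftovers.
            matches = []
            for trk in list(tracks):
                if not dets:
                    break
                best = max(dets, key=lambda d: score_fn(trk, d))
                if score_fn(trk, best) > thresh:
                    matches.append((trk, best))
                    tracks.remove(trk)
                    dets.remove(best)
            return matches

        def associate(tracks, detections):
            # Stage 1: appearance matching against high-score detections.
            # Stage 2: recover filtered low-score detections by appearance.
            # Stage 3: motion fallback, comparing each remaining track's
            #          Kalman-predicted box with the leftover detection boxes.
            high = [d for d in detections if d["score"] >= HIGH_SCORE]
            low = [d for d in detections if d["score"] < HIGH_SCORE]
            unmatched = list(tracks)
            by_app = lambda t, d: cosine(t["emb"], d["emb"])
            by_iou = lambda t, d: iou(t["kf_pred"], d["box"])
            matches = greedy_match(unmatched, high, by_app, APP_THRESH)
            matches += greedy_match(unmatched, low, by_app, APP_THRESH)
            matches += greedy_match(unmatched, high + low, by_iou, IOU_THRESH)
            return matches, unmatched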

    Contents
        中文摘要 (Chinese Abstract)
        Abstract
        中文致謝 (Chinese Acknowledgments)
        Contents
        List of Figures
        List of Tables
        Chapter 1 Introduction
            1.1 Overview
            1.2 Motivation
            1.3 System Descriptions
            1.4 Thesis Organization
        Chapter 2 Related Work
            2.1 Multiple Object Tracking
                2.1.1 Tracking by detection
                2.1.2 Joint detection and tracking
            2.2 Visual Transformer
        Chapter 3 Vessel Tracking Method
            3.1 Vessel Detection
            3.2 Object Propagation
            3.3 Data Association Enhancement
            3.4 Loss Function
        Chapter 4 Experimental Results and Discussion
            4.1 Experimental Setup
            4.2 Vessel Dataset and MOT Evaluation Metrics
            4.3 Results of Vessel Tracking
            4.4 Results of Vessel Counting
        Chapter 5 Conclusions and Future Work
            5.1 Contributions and Conclusions
            5.2 Future Work
        References


    Full Text Release Date: 2028/01/09 (campus network)
    Full Text Release Date: 2033/01/09 (off-campus network)
    Full Text Release Date: 2033/01/09 (National Central Library: Taiwan Dissertations and Theses System)