Graduate student: | 曾筠筑 YUN-JHU ZENG |
---|---|
Thesis title: | 利用熱紅外影像改善基於孿生網路的嵌入式目標追蹤系統 Improving Embedded Target Tracking Systems Based On Siamese Networks With Infrared Images |
Advisor: | 洪西進 Shi-Jinn Horng |
Oral defense committee: | 李正吉 Cheng-Chi Lee, 楊昌彪 Chang-Biau Yang, 楊竹星 Chu-Sing Yang, 林韋宏 Wei-Hong Lin, 洪西進 Shi-Jinn Horng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Number of pages: | 50 |
Keywords: | Target tracking, Thermal infrared image, Siamese tracking network, Multi-Modal Machine Learning, 3-axis servo motor, Embedded System |
Humans have had an instinct for tracking targets since ancient times. Whether in hunting prey or in modern racing, gaming, and warfare, tracking a target accurately is the key to winning, and a machine that assists with tracking lets us solve a wider range of problems. Target tracking is becoming an increasingly mature field. It is widely used in military applications such as fighter-aircraft radar and missiles, and in civilian areas including home security, sports broadcasting, and smart homes.
This thesis aims to build a practical three-axis target tracking system. Inspired by the way snakes hunt in nature, we add thermal infrared images as an additional modality to improve the tracking ability of an existing Siamese network. Drawing on Multi-Modal Machine Learning (MMML), we modify the Siamese tracking network into a multi-modal fusion model so that it better integrates the features of ordinary images and thermal infrared images and thereby perceives the natural world more fully. At the same time, the Siamese models for ordinary images and for thermal infrared images are trained separately, with a pseudo-Siamese network as the main architecture that processes each modality on its own; this addresses the problem of insufficient data.
In several tests, multi-modal fusion improved the stability and accuracy of the model. The overall model is built on the Siamese network architecture and is practical to a degree. Pairing it with an embedded development board and a three-axis servo motor makes the system more flexible in application.
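The two-branch idea in the abstract can be illustrated with a toy sketch. It is not the thesis's model: the single hand-written kernels stand in for learned backbones, the function names (`correlate_valid`, `branch_response`, `fuse`) are hypothetical, and the weighted sum of normalised response maps is only one assumed way of fusing the modalities. Within each modality the template and search images share one kernel (the Siamese part), while the ordinary-image and thermal branches use different kernels (the pseudo-Siamese part).

```python
import numpy as np

def correlate_valid(image, kernel):
    """Slide `kernel` over `image` without padding; return the dot product at each offset."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def branch_response(template, search, kernel):
    """One modality: embed template and search with the SAME kernel, then cross-correlate."""
    z = np.maximum(correlate_valid(template, kernel), 0.0)  # template features (ReLU)
    x = np.maximum(correlate_valid(search, kernel), 0.0)    # search features (ReLU)
    return correlate_valid(x, z)                            # response map

def fuse(resp_rgb, resp_tir, tir_weight=0.5):
    """Assumed late fusion: weighted sum of peak-normalised response maps."""
    def unit(r):
        peak = r.max()
        return r / peak if peak > 0 else r
    return (1.0 - tir_weight) * unit(resp_rgb) + tir_weight * unit(resp_tir)

# Different kernels per modality: the two branches do not share weights.
rgb_kernel = np.ones((3, 3)) / 9.0
tir_kernel = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0]) / 16.0

# Synthetic, pixel-aligned frames (illustrative data only).
search_rgb = np.zeros((20, 20))
search_rgb[12:16, 10:14] = 1.0   # target
search_rgb[2:6, 3:7] = 1.0       # look-alike distractor, visible only in the ordinary image
search_tir = np.zeros((20, 20))
search_tir[12:16, 10:14] = 1.0   # only the warm target shows up in thermal infrared

# Templates are 6x6 crops around the target in each modality.
template_rgb = search_rgb[11:17, 9:15].copy()
template_tir = search_tir[11:17, 9:15].copy()

resp_rgb = branch_response(template_rgb, search_rgb, rgb_kernel)
resp_tir = branch_response(template_tir, search_tir, tir_kernel)
fused = fuse(resp_rgb, resp_tir)

peak = tuple(int(v) for v in np.unravel_index(np.argmax(fused), fused.shape))
print(peak)
```

In this synthetic case the ordinary-image branch alone scores the distractor as highly as the target, while the fused map peaks at offset (11, 9), the position the template was cropped from. A real system would replace the fixed kernels with trained networks and would need the two cameras to be registered to each other.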