
Author: Zi-Hao Shen (沈子皓)
Thesis Title: Enhancing Radar Object Detection through Swin Transformer-based Neural Networks and CNN Integration (通過基於 Swin Transformer 的神經網路和 CNN 集成增強雷達物件偵測)
Advisor: Shan-Hsiang Shen (沈上翔)
Committee Members: Kai-Lung Hua (花凱龍), Shan-Hsiang Shen (沈上翔), Yung-Yao Chen (陳永耀)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: English
Number of Pages: 40
Keywords (Chinese): 雷達物體檢測、深度學習、卷積、注意力機制、基於 Swin Transformer 的模型、雷達後處理
Keywords (English): Radar Object Detection, Deep Learning, Convolution, Attention Mechanism, Swin Transformer-Based Model, Radar Post-Processing

  • Contents
    Recommendation Letter  i
    Approval Letter  ii
    Abstract in Chinese  iii
    Abstract in English  iv
    Acknowledgements  v
    Contents  vi
    List of Figures  viii
    List of Tables  xi
    1 Introduction  1
    2 Related Work  3
      2.1 3-D CNN-Based Radar Object Detection  3
      2.2 Transformer-Based Approaches  4
    3 Methodology  6
      3.1 Model Architecture  6
      3.2 M-Net Plus: Enhanced Radar Signal Processing in CRUW Dataset  7
      3.3 Learnable Triplet Attention: Enhancing Feature Extraction Capabilities  11
      3.4 L-NMS Plus  14
    4 Experiment  17
      4.1 Dataset  17
      4.2 Implementation Details  21
      4.3 Ablation Studies  23
        4.3.1 M-Net Plus Module  24
        4.3.2 Learnable Triplet Attention Module  26
        4.3.3 M-Net Plus Module + Learnable Triplet Attention Module  28
        4.3.4 L-NMS Plus Module  30
        4.3.5 M-Net Plus Module + Learnable Triplet Attention Module + L-NMS Plus Module  35
    5 Conclusion  37
    References  39


    Full-Text Release Date: 2029/01/30 (campus network)
    Full-Text Release Date: 2034/01/30 (off-campus network)
    Full-Text Release Date: 2034/01/30 (National Central Library: Taiwan NDLTD system)