
Author: 林君豪 (Jyun-Hao Lin)
Title: 利用跨模態自注意力機制增強的行人重新識別方式 (Exploiting Cross-Modality Self-Attention for Enhanced Person Re-Identification Across Modalities)
Advisors: 花凱龍 (Kai-Lung Hua), 沈上翔 (Shan-Hsiang Shen)
Committee Members: 陳永耀 (Yung-Yao Chen), 楊朝龍 (Chao-Lung Yang), 陳宜惠 (Yi-Hui Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2023
Graduation Academic Year: 111
Language: English
Pages: 47
Keywords: Person Re-Identification, Self-Attention, Cross-Modality

    In cross-modality person re-identification tasks, aligning the discrepancy between visible light and thermal imaging features presents a significant challenge. Misalignment and mismatching issues drastically impede the performance of re-identification systems, often making it difficult to accurately and reliably identify individuals across different modalities. To address this problem, we propose an innovative Cross-Modality Self-Attention (CMSA) module, which effectively bridges the gap between visible and thermal modalities by adaptively capturing and aligning essential information from both domains. Our CMSA module, built upon a deep neural network architecture, comprises a modality-specific feature extraction component and a cross-modality alignment component. The feature extraction component robustly distills features from both visible light and thermal images, while the cross-modality alignment component employs the self-attention mechanism to adaptively align and merge the extracted features across modalities, thereby resolving the issues of misalignment and mismatching. By enhancing the fusion of salient features across modalities, our approach significantly boosts the overall performance of person re-identification tasks. To validate the effectiveness of our proposed CMSA module, we conduct exhaustive experiments on several benchmark datasets, including RegDB and SYSU-MM01. The experimental results demonstrate that our module consistently outperforms state-of-the-art approaches in terms of accuracy and generalizability.
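
    Since only the abstract is available, the following is a minimal PyTorch sketch of how a cross-modality alignment component of the kind described above could be structured: each modality's features attend to the other's via multi-head attention before the two streams are fused. All names and design details here (CrossModalitySelfAttention, embed_dim, num_heads, the fusion layer) are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn

class CrossModalitySelfAttention(nn.Module):
    """Hypothetical cross-modality attention block (not the thesis code).

    Queries come from one modality and keys/values from the other, so each
    modality adaptively attends to the salient regions of its counterpart
    before the two streams are fused.
    """

    def __init__(self, embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn_v2t = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.attn_t2v = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(embed_dim)
        self.norm_t = nn.LayerNorm(embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, feat_v: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        # feat_v, feat_t: (batch, seq_len, embed_dim) token sequences from
        # modality-specific backbones (e.g. flattened CNN feature maps).
        v_aligned, _ = self.attn_v2t(feat_v, feat_t, feat_t)  # visible attends to thermal
        t_aligned, _ = self.attn_t2v(feat_t, feat_v, feat_v)  # thermal attends to visible
        v_out = self.norm_v(feat_v + v_aligned)               # residual + layer norm
        t_out = self.norm_t(feat_t + t_aligned)
        return self.fuse(torch.cat([v_out, t_out], dim=-1))   # fused representation

# Usage: fuse 7x7 feature maps (flattened to 49 tokens) from two backbones.
cmsa = CrossModalitySelfAttention(embed_dim=256, num_heads=4)
visible = torch.randn(8, 49, 256)
thermal = torch.randn(8, 49, 256)
fused = cmsa(visible, thermal)  # shape: (8, 49, 256)

    Routing queries from one modality against keys and values from the other is one plausible way to realize the adaptive alignment the abstract describes; the actual CMSA design in the thesis may differ.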

    Table of Contents:
    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
        2.1 RGB-based Person ReID
        2.2 Visible-infrared Person Re-ID
        2.3 Attention Mechanisms
    3 Methodology
        3.1 Overview
        3.2 Cross-Modality Self-Attention
        3.3 Implementation of MAM in Cross-Modality Attention
        3.4 Local Matching Paradigm
    4 Experiment
        4.1 Datasets and Experimental Setting
        4.2 Results
        4.3 Ablation study
        4.4 Discussions
    5 Conclusion
    References
    Letter of Authority


    Full text available from 2028/08/14 (campus network only).
    Full text not authorized for public release (off-campus network).
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system).