Graduate Student: 簡士超 Shih-Chao Chien
Thesis Title: 開放空間之跨鏡追蹤技術 (Deep Learning based Open-World Person Re-Identification)
Advisor: 鍾聖倫 Sheng-Luen Chung
Committee Members: 鍾聖倫 Sheng-Luen Chung, 方文賢 Wen-Hsien Fang, 蘇順豐 Shun-Feng Su, 郭重顯 Chung-Hsien Kuo, 徐繼聖 Gee-Sern Hsu
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year: 107
Language: English
Pages: 45
Keywords (Chinese): open-world person re-identification, part-level features, deep supervision, attention mechanism
Keywords (English): open-world person re-ID, part-level features, deep supervised learning network, attention model
Person Re-Identification (Person Re-ID) identifies a query person image from a gallery of candidate images. The technique can be applied in area surveillance systems to automatically identify a specific person or to trace back that person's movement trajectory. In contrast to the closed-world Re-ID assumption that the query person must appear in the gallery, this thesis addresses the more challenging open-world setting: the query person may be absent from the gallery, so feature representations more discriminative than those of the closed-world setting are required. This thesis proposes a deeply supervised part-feature extraction network. A part-aligned pedestrian image is fed into a backbone network to extract a base pedestrian feature map, and a self-attention module then strengthens the focus on salient body parts. A part-level feature extraction subnetwork divides the attended feature map into six equal horizontal parts and independently extracts fine-grained features for each part, while a deep-supervision subnetwork extracts multi-stage features from the different-scale outputs of the backbone's stages; combining the two kinds of features yields more accurate verification. Experiments follow the open-world dataset split on Market1501, DukeMTMC-ReID, and the self-collected EE3F dataset. For set verification (SV), at a False Target Rate (FTR) of 0.1%, the True Target Rates (TTR) are 40.96%, 38.54%, and 60.9%, respectively; for individual verification (IV) at the same FTR, the TTRs are 79.22%, 74.68%, and 89.25%, respectively, all competitive verification results.
Person Re-Identification (Person Re-ID) is the task of identifying the pedestrian in a query image from a gallery of candidate images, and it is critical to surveillance systems for identifying and tracking pedestrians. This study focuses on open-world Person Re-ID where, as opposed to closed-world Person Re-ID, the query may not be in the gallery, requiring more discriminative features to distinguish people. Accordingly, this thesis proposes a Deeply Supervised Part Feature (DSPF) deep learning network. Images pre-processed by part alignment first pass through a deep learning backbone network for feature extraction, followed by a self-attention module that yields a highlighted global feature map. On the one hand, the global feature map is equally divided into six horizontal parts, from which a part-level feature extraction subnetwork yields a set of six finer features. On the other hand, a deeply supervised subnetwork derives another set of features from the feature maps of different layers in the backbone network. The whole DSPF is then trained with a cost function defined over these two sets of derived features. Two of the most comprehensive Person Re-ID datasets, Market1501 and DukeMTMC-ReID, are used, in addition to the EE3F dataset containing images from the entrance and corridor cameras on the third floor of the EE building at NTUST. Following the training/testing protocol used in the literature, for set verification (SV) at 0.1% False Target Rate (FTR), the True Target Rates (TTR) are 40.96%, 38.54%, and 60.9%, respectively. For individual verification (IV) at 0.1% FTR, the TTRs are 79.22%, 74.68%, and 89.25%, respectively. Both demonstrate the competitive capability of the proposed DSPF for open-world Person Re-ID.
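Two mechanics from the abstract can be sketched in code: dividing the attended feature map into six horizontal stripes to obtain part-level features, and reporting the True Target Rate at a fixed False Target Rate. The NumPy sketch below is a minimal illustration under assumed shapes and an assumed average-pooling choice; the function names and dimensions are hypothetical, not the thesis's actual implementation.

```python
import numpy as np


def part_features(feature_map, num_parts=6):
    """Split a (C, H, W) feature map into `num_parts` horizontal stripes
    along the height axis and average-pool each stripe into a C-dim
    part feature (pooling choice is an assumption for illustration)."""
    stripes = np.array_split(feature_map, num_parts, axis=1)
    return [s.mean(axis=(1, 2)) for s in stripes]


def ttr_at_ftr(genuine_scores, impostor_scores, ftr=0.001):
    """True Target Rate at a given False Target Rate: pick the similarity
    threshold that accepts a fraction `ftr` of impostor pairs, then return
    the fraction of genuine pairs scoring at or above that threshold."""
    thr = np.quantile(np.asarray(impostor_scores), 1.0 - ftr)
    return float(np.mean(np.asarray(genuine_scores) >= thr))


# Illustrative usage with an assumed ResNet-style backbone output shape:
fm = np.random.rand(2048, 24, 8)
parts = part_features(fm)          # six 2048-dim part vectors
```

Each of the six vectors would then feed its own classifier branch during training; at 0.1% FTR (`ftr=0.001`), `ttr_at_ftr` corresponds to the TTR figures reported in the abstract.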