
Graduate Student: Jui-Shan Chan
Thesis Title: Deep Learning of Improved Part-Aligned Features for Person Re-Identification
Advisor: Sheng-Luen Chung
Committee Members: Sheng-Luen Chung, Shun-Feng Su, Gee-Sern (Jison) Hsu, Fay (Yu-Fei) Huang, Chin-Sheng Chen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2018
Graduation Academic Year: 106 (2017-2018)
Language: Chinese
Pages: 79
Keywords (Chinese): cross-camera tracking (Re-ID), person re-identification, part-aligned feature learning, deep learning
Keywords (English): Part Alignment

Person re-identification (Re-ID) is the task of recognizing and retrieving, across camera views, pedestrians who have previously appeared in the scenes covered by a surveillance system. Because surveillance cameras capture images of diverse scenes, viewing angles, and person appearances, Re-ID can turn existing footage into large-scale surveillance and person-search systems that identify people over wide areas, and it applies broadly to public and private premises such as schools, companies, and shopping malls, giving it considerable practical value. However, Re-ID must cope with huge numbers of identities (IDs), enormous search galleries, people with similar appearance and clothing, low-resolution person images, and partial occlusion of the body, making it one of the most challenging problems in computer vision. Extracting global features, as is done for general object recognition or face recognition, is not sufficient for accurate identification; finer and more discriminative feature information is needed to tell one person from another. In recent years, many studies have shown that splitting the human body into multiple parts and extracting the corresponding regional features with deep learning networks effectively improves Re-ID performance and accuracy. To further improve part-based feature methods, this thesis proposes an improved part-aligned feature network, a deep learning framework that better represents a person's complete information, with three highlights. First, a 2D human joint detector (OpenPose) replaces the usual person detector to better detect and crop person images and to normalize them to a similar composition. Second, the body is split along both horizontal and vertical directions; this finer part segmentation yields more detailed body features, and a stronger learning network, combining Inception's multi-scale convolutions, ResNet's shortcut connections, and SENet's squeeze-and-excitation, retains the most important feature information. Finally, an aggregated distance is used to compute person similarity, yielding more accurate results. The proposed architecture has been trained and tested on the two largest Re-ID datasets and compared against recent solutions: on Market1501 (DukeMTMC-reID), it achieves an mAP of 85.96% (84.70%) and a CMC rank-1 of 94.30% (89.84%), results that are competitive with those reported in the literature.


Person Re-IDentification (Re-ID) is the task of recognizing and retrieving a person who has been seen before by different cameras in the scenes covered by a surveillance system, commonly deployed at public and private premises. Re-ID is one of the most difficult computer vision problems owing to the enormous number of identities involved in a large-scale image pool, combined with highly similar appearances, low-resolution images, possibly occluded scenes, etc. Global features geared toward general object recognition and face recognition are far from adequate for re-identifying the same person across cameras; more discriminative features are needed. In particular, part-based methods that learn local fine-grained features of different human body parts from detected persons have proven effective for person Re-ID. To further improve the part-aligned spatial feature approach, this thesis proposes a deep learning framework that better characterizes a person's complete information, with the following three highlights. First, better person detection and cropping: a common person detector is replaced by a 2D skeleton joint localizer (OpenPose) to facilitate the subsequent part alignment. Second, finer part segmentation: both horizontally and vertically divided strips are taken from the cropped person silhouette to cover the detected person's features as well as those of possible accessories. Third, a stronger learning network, exploiting the particular advantages of Inception in combining features of different scales, ResNet in circumventing vanishing gradients in deep networks, and SENet in retaining the most critical features. Our proposed solution has been trained and tested on the two most comprehensive Re-ID datasets and compared to reported state-of-the-art solutions: on Market1501 (DukeMTMC-reID), it achieves competitive results with an mAP of 85.96% (84.70%) and a CMC rank-1 of 94.30% (89.84%).
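To make the second highlight concrete, the following is a minimal PyTorch sketch, not the thesis's code, of pooling a backbone feature map into horizontal and vertical strips; the strip counts n_horizontal and n_vertical and the ResNet-50-like feature shape are assumptions made for illustration only.

import torch
import torch.nn as nn

class StripPooling(nn.Module):
    # Pool a (N, C, H, W) feature map into horizontal bands plus
    # vertical columns, one C-dim descriptor per strip.
    def __init__(self, n_horizontal: int = 6, n_vertical: int = 2):
        super().__init__()
        # Each horizontal strip spans the full width; each vertical
        # strip spans the full height of the map.
        self.h_pool = nn.AdaptiveAvgPool2d((n_horizontal, 1))
        self.v_pool = nn.AdaptiveAvgPool2d((1, n_vertical))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = feat.shape
        h_parts = self.h_pool(feat).view(n, c, -1)  # (N, C, n_horizontal)
        v_parts = self.v_pool(feat).view(n, c, -1)  # (N, C, n_vertical)
        return torch.cat([h_parts, v_parts], dim=2)  # (N, C, n_h + n_v)

feat = torch.randn(4, 2048, 24, 8)  # assumed conv5 output of a cropped person
parts = StripPooling()(feat)        # -> torch.Size([4, 2048, 8])

Adaptive average pooling keeps every strip the full width (or height) of the map, so each descriptor summarizes one body band, which is what allows parts of different people to be compared in alignment.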
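For the third highlight, this is a hedged sketch assuming the three ingredients are combined in a single residual unit: Inception-style multi-scale branches, a ResNet identity shortcut, and a SENet squeeze-and-excitation gate. The block name, channel layout, and branch widths are illustrative, not the thesis's actual architecture.

import torch
import torch.nn as nn

class InceptionResSEBlock(nn.Module):
    # One residual unit mixing multi-scale branches with an SE gate.
    def __init__(self, channels: int = 256, reduction: int = 16):
        super().__init__()
        b = channels // 4
        # Inception idea: parallel branches with different receptive fields.
        self.branch1 = nn.Conv2d(channels, b, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, b, kernel_size=1),
            nn.Conv2d(b, b, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, b, kernel_size=1),
            nn.Conv2d(b, b, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(channels, b, kernel_size=1))
        self.merge = nn.Conv2d(4 * b, channels, kernel_size=1)
        # SENet idea: squeeze (global pool) then excite (channel gate).
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.branch1(x), self.branch3(x),
                         self.branch5(x), self.branch_pool(x)], dim=1)
        out = self.merge(out)
        out = out * self.se(out)   # reweight: keep the most informative channels
        return self.relu(out + x)  # ResNet idea: identity shortcut

x = torch.randn(2, 256, 24, 8)
y = InceptionResSEBlock(256)(x)    # output has the same shape as the input

The shortcut keeps gradients flowing through deep stacks of such units, while the SE gate learns which channels to amplify, matching the abstract's claim of "retaining the most critical features."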
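The abstract scores person similarity with an aggregated distance. A minimal sketch follows, assuming the aggregation is a simple sum of per-part Euclidean distances between L2-normalized part descriptors; the exact aggregation used in the thesis is not specified in the abstract, so treat the function below as illustrative.

import torch
import torch.nn.functional as F

def aggregated_distance(q_parts: torch.Tensor, g_parts: torch.Tensor) -> torch.Tensor:
    # q_parts: (P, C) part descriptors of one query image.
    # g_parts: (G, P, C) part descriptors of G gallery images.
    q = F.normalize(q_parts, dim=-1)                   # L2-normalize each part
    g = F.normalize(g_parts, dim=-1)
    per_part = torch.norm(g - q.unsqueeze(0), dim=-1)  # (G, P) part distances
    return per_part.sum(dim=1)                         # aggregate over parts

scores = aggregated_distance(torch.randn(8, 256), torch.randn(100, 8, 256))
ranking = scores.argsort()  # smaller aggregated distance = better match

Summing over parts lets every body band contribute to the final score, so a mismatch in any single region (e.g. a backpack strip) penalizes the candidate even when global appearance is similar.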

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Description and Applications of Re-ID
  1.2 Challenges Facing Re-ID
    1.2.1 Spanning Multiple Cameras
    1.2.2 Similar Pedestrian Appearance
    1.2.3 Poor Pedestrian Image Resolution
    1.2.4 Huge Search Gallery
    1.2.5 Disjoint Person IDs between Training and Test Sets
  1.3 Current State-of-the-Art Re-ID Techniques
  1.4 Contributions and System Architecture of This Thesis
  1.5 Thesis Organization
Chapter 2: Cross-Camera Tracking and Re-ID Approaches in the Literature
  2.1 Re-ID Training/Testing Protocols and Summary of Solutions
  2.2 Literature Review
    2.2.1 Appearance Attributes
    2.2.2 Data Augmentation
    2.2.3 Semantic Segmentation
  2.3 Part-Based Learning and Merging Methods
    2.3.1 The PCB Baseline Solution
    2.3.2 Other Part-Based Learning Methods
  2.4 Feature Extraction and Similarity Measurement
Chapter 3: The Proposed Solution - Improved Part-Aligned Features (IPAF)
  3.1 Body Part Image Acquisition
  3.2 Body Feature Extraction
    3.2.1 Backbone Network
    3.2.2 Part Feature Extraction Network
  3.3 Similarity Matching
  3.4 Improved Deep Learning Network Architecture
Chapter 4: Experimental Setup and Results
  4.1 Re-ID Datasets and Evaluation Metrics
  4.2 Implementation Platform, Framework, and Switching between Training and Testing Networks
  4.3 Differences between IPAF and the Baseline, and Performance Improvements
  4.4 Performance Analysis of Training/Testing Time and Re-ranking Computation
  4.5 Cross-Dataset Testing
Chapter 5: Discussion and Future Work
  5.1 Comparison with Other Studies and Conclusions
  5.2 Future Research Directions
References

