
Graduate Student: 周青翰
Cing-Han Chou
Thesis Title: 行人重識別之人體部位金字塔多尺度池化特徵融合監督網路
Part-based Pyramid Pooling Feature Fusion in Multi-Scale Supervised Network for Person Re-Identification
Advisor: 蘇順豐
Shun-Feng Su
Committee Members: 姚立德
Leehter Yao
鍾聖倫
Sheng-Luen Chung
陳美勇
Mei-Yung Chen
呂藝光
Yih-Guang Leu
蘇順豐
Shun-Feng Su
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: English
Number of Pages: 64
Chinese Keywords: 行人重辨識, 特徵學習, 深度學習, 雙分支分類任務, 金字塔池化
Foreign Keywords: Person Re-Identification, Feature Learning, Deep Learning, Bi-branch Classification Task, Pyramid Pooling

Person re-identification uses computer vision techniques to determine whether a specific pedestrian appears in an image or video sequence, and it is widely regarded as a sub-problem of image retrieval. To obtain multi-scale and discriminative person features, this thesis proposes the Part-based Pyramid Pooling Feature Fusion in Multi-Scale Supervised Network (PFMSNet). A pyramid pooling module extracts multi-scale features from pedestrian body parts, and these multi-scale features are concatenated so that the network has a larger receptive field for classifying body parts. However, the upsampling performed before concatenation introduces noise, so an SE (Squeeze-and-Excitation) block structure is introduced: channel attention weights the feature maps to filter out noise and redundant features while retaining the important information. Finally, an independent classification task on the multi-scale features is added, turning the network into a bi-branch classification model that further supervises the multi-scale features and provides additional semantic information. The network is trained and tested on the Market1501 and DukeMTMC-reID datasets, reaching Rank-1 of 94.7% and 87.7% and mAP of 85.1% and 75.6%, respectively.
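The record does not include source code, so the following is only a minimal PyTorch-style sketch of the pyramid pooling step described above: a part feature map is pooled at several grid sizes, each pooled map is projected with a 1x1 convolution, upsampled back to the input resolution, and concatenated with the original map. The class name, pooling scales, and channel counts here are illustrative assumptions, not the settings actually used in PFMSNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """Pool a part feature map at several grid sizes, project each pooled
    map with a 1x1 convolution, upsample back, and concatenate."""

    def __init__(self, in_channels, out_channels, scales=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),  # pool to an s x s grid
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for s in scales
        )

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        # Upsample every pooled scale back to the input resolution; the
        # concatenated result mixes coarse (global) and fine (local) context,
        # enlarging the effective receptive field of the fused feature.
        pyramid = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return torch.cat(pyramid, dim=1)


# Quick shape check with hypothetical sizes: two 2048-channel part maps of 4 x 8.
ppm = PyramidPooling(2048, 256).eval()  # eval mode so BatchNorm uses running stats
with torch.no_grad():
    fused = ppm(torch.randn(2, 2048, 4, 8))  # -> (2, 2048 + 4 * 256, 4, 8)
```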


Person re-identification is a computer vision task that determines whether a particular pedestrian appears in images or video sequences. It is widely regarded as a sub-problem of image retrieval: given an image of a pedestrian from one camera, images of the same pedestrian captured by other cameras are retrieved. To obtain multi-scale and discriminative pedestrian features, this study proposes the Part-based Pyramid Pooling Feature Fusion in Multi-Scale Supervised Network (PFMSNet). A pyramid pooling module extracts multi-scale features from each pedestrian part, and these features are concatenated so that the network has a larger receptive field for classifying body parts. However, the upsampling performed before concatenation introduces noise. Therefore, an SE (Squeeze-and-Excitation) block is employed to weight the feature maps with channel attention, filtering out noisy and redundant features while retaining the important information. Finally, an independent classification task on the multi-scale features is added, turning the network into a bi-branch classification model that further supervises the multi-scale features and adds more semantic information. The proposed model is trained and tested on the Market1501 and DukeMTMC-reID datasets, where Rank-1 reaches 94.7% and 87.7% and mAP reaches 85.1% and 75.6%, respectively.
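Likewise, here is a minimal sketch of the channel-attention filtering and bi-branch supervision described in the abstract, again assuming a PyTorch implementation. The SE block follows the standard squeeze-and-excitation design; the reduction ratio, feature dimensions, head layout, and identity count are illustrative assumptions rather than the thesis's actual configuration.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: global average pooling
    followed by a small bottleneck MLP producing per-channel weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[0], x.shape[1]
        # Squeeze: per-channel global average. Excitation: weights in (0, 1)
        # that can suppress noisy or redundant channels introduced by
        # upsampling while keeping the informative ones.
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * w


class BiBranchHead(nn.Module):
    """Two identity classifiers: one on the original part feature and one on
    the SE-weighted multi-scale (fused) feature, so each feature stream is
    supervised by its own classification loss."""

    def __init__(self, part_dim, fused_dim, num_identities):
        super().__init__()
        self.part_classifier = nn.Linear(part_dim, num_identities)
        self.fused_classifier = nn.Linear(fused_dim, num_identities)

    def forward(self, part_feat, fused_feat):
        return self.part_classifier(part_feat), self.fused_classifier(fused_feat)


# Example usage with hypothetical sizes: weight the fused map, pool each
# stream to a vector, and classify identities with both branches.
fused_map = torch.randn(2, 3072, 4, 8)
part_vec = torch.randn(2, 2048)
fused_vec = SEBlock(3072)(fused_map).mean(dim=(2, 3))
head = BiBranchHead(2048, 3072, num_identities=751)
logits_part, logits_fused = head(part_vec, fused_vec)
# Both logit tensors have shape (2, 751); each branch gets its own loss.
```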

Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation 2
  1.3 Baseline Model 3
  1.4 Contributions and Model Structure 4
  1.5 Thesis Organization 7
Chapter 2 Related Work 8
  2.1 Appearance Attribute 8
  2.2 Data Augmentation 9
  2.3 Semantic Segmentation 12
  2.4 Metric Learning 13
  2.5 Semantic Part-Based 15
  2.6 Abstract Part-Based 18
Chapter 3 Methodology 21
  3.1 Pyramid Pooling Module 23
  3.2 SE Block 27
  3.3 Bi-branch Classification Task 28
  3.4 Classification Sub-Network 31
Chapter 4 Experiment 33
  4.1 Dataset 33
    4.1.1 Market-1501 33
    4.1.2 DukeMTMC-reID 34
  4.2 Assessment Index and Protocol 36
    4.2.1 Cumulative Match Characteristic Curve (CMC) and Rank-k 36
    4.2.2 Mean Average Precision (mAP) 37
    4.2.3 Test Protocol 38
  4.3 Training and Testing Process 39
    4.3.1 Training Phase 39
    4.3.2 Testing Phase 40
  4.4 Implementation Detail 41
    4.4.1 Data Augmentation 41
    4.4.2 Training Details and Hyperparameter Settings 42
    4.4.3 Hardware and Software Environment 42
  4.5 Parameter Analysis 43
    4.5.1 Number of Pyramid Scales 43
    4.5.2 Number of Feature Vectors for Testing 44
  4.6 Comparison with the State of the Art 44
  4.7 Cross-Domain Experiment 47
Chapter 5 Conclusions and Future Work 49
  5.1 Conclusions 49
  5.2 Future Work 50
References 51

Full-Text Release Date: 2024/08/15 (campus network)
Full-Text Release Date: 2024/08/15 (off-campus network)
Full-Text Release Date: 2024/08/15 (National Central Library: Taiwan Dissertations and Theses System)