Graduate Student: 林士程 (Shih-Cheng Lin)
Thesis Title: 不同深度卷積神經網路應用於行人再識別之研究 (A Study of Different Deep Convolutional Neural Networks on Person Re-Identification)
Advisor: 吳怡樂 (Yi-Leh Wu)
Committee Members: 唐政元 (Zheng-Yuan Tang), 陳建中 (Jian-Zhong Chen), 閻立剛 (Li-Gang Yan)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2021
Academic Year of Graduation: 110
Language: English
Number of Pages: 40
Keywords (Chinese): 深度學習, 卷積神經網路, 行人再識別
Keywords (English): Deep Learning, Convolutional Neural Network, Person Re-Identification
In recent years, person re-identification has regained attention in the computer vision community, driven by applications such as smart homes and technology-assisted law enforcement. The goal of person re-identification is to retrieve the same target pedestrian across different cameras, thereby achieving cross-camera tracking. The task is difficult because pedestrian images vary with camera resolution, pedestrian pose, shooting angle, and other factors, making accurate tracking a major challenge. With the rapid development of deep learning, however, many effective methods have been proposed. This thesis modifies an existing person re-identification architecture by replacing its feature extractor with three mainstream convolutional neural networks, and conducts a series of experiments on three real-world datasets: Market-1501, DukeMTMC-reID, and CUHK03. The experiments show that a good feature extractor in the architecture improves overall performance: compared with the original architecture, DenseNet achieves better results in both accuracy and training time.
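The cross-camera retrieval task described above is typically scored by ranking gallery images against each query by feature distance. The following is a minimal sketch of that evaluation idea, not the thesis's actual code: feature vectors here are hand-made toy values standing in for whatever a CNN backbone (e.g. DenseNet) would extract, and the rank-1 protocol shown (excluding same-identity, same-camera gallery entries) is the commonly used convention for datasets such as Market-1501.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest cross-camera gallery image
    has the same identity.

    queries, gallery: lists of (person_id, camera_id, feature) tuples.
    """
    hits = 0
    for qid, qcam, qfeat in queries:
        # Common re-ID protocol: ignore gallery images of the same
        # identity captured by the same camera as the query.
        candidates = [g for g in gallery
                      if not (g[0] == qid and g[1] == qcam)]
        best = min(candidates, key=lambda g: cosine_distance(qfeat, g[2]))
        if best[0] == qid:
            hits += 1
    return hits / len(queries)

# Toy example: two identities, each seen by two cameras, with
# hypothetical 3-D "features" in place of real CNN embeddings.
gallery = [
    (1, 2, [1.0, 0.1, 0.0]),   # person 1, camera 2
    (2, 2, [0.0, 1.0, 0.2]),   # person 2, camera 2
]
queries = [
    (1, 1, [0.9, 0.2, 0.1]),   # person 1, camera 1
    (2, 1, [0.1, 0.8, 0.3]),   # person 2, camera 1
]
print(rank1_accuracy(queries, gallery))  # -> 1.0
```

In practice the features would come from the swapped-in backbone, and full evaluations also report mean average precision (mAP) over the whole ranked list rather than only the top match.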