Graduate Student: Albert Christianto
Thesis Title: Pedestrian Detection Using Depth Estimation Maps and Semantic Segmentation (應用深度估計與語義分割進行行人偵測)
Advisors: Wen-Hsien Fang (方文賢), Yie-Tarng Chen (陳郁堂)
Committee Members: Gee-Sern Hsu (徐繼聖), Kuen-Tsair Lay (賴坤財), Chien-Ching Chiu (丘建青)
Degree: Doctor
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107
Language: English
Pages: 58
Keywords: depth estimation maps, fusion network, multi-scale, pedestrian detection, semantic segmentation maps
This thesis presents a pedestrian detection framework using the combination
of depth estimation maps and semantic segmentation maps. It consists of two main
components: a depth segmentation Region Proposal Network (ds-RPN)
and a depth segmentation Region-based Convolutional Neural Network (ds-RCNN).
We employ a Depth Input Network (DIN) as the input network for the depth maps to refine
inaccurate depth estimation maps. Thereafter, a segmentation infusion network is
invoked to infuse semantic features into the shared feature maps. Afterward, a fusion
strategy is employed to effectively combine the shared feature maps, the semantic
feature maps, and the depth maps. Finally, the combined feature maps are passed
on to the ds-RPN and the ds-RCNN to perform pedestrian detection. Experimental
results show that the proposed method achieves competitive results in terms of Miss
Rate (MR) on the widely used Caltech dataset.
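The pipeline above combines three tensors of features before detection. As a minimal sketch of one plausible fusion strategy (channel-wise concatenation; the abstract does not specify the exact operation, so the function name, shapes, and the concatenation choice here are illustrative assumptions, not the thesis's actual implementation):

```python
import numpy as np

def fuse_feature_maps(shared, semantic, depth):
    """Stack the shared backbone features, semantic segmentation
    features, and refined depth map along the channel axis.
    Each input is a (C, H, W) array with matching spatial size."""
    assert shared.shape[1:] == semantic.shape[1:] == depth.shape[1:]
    return np.concatenate([shared, semantic, depth], axis=0)

# Toy stand-ins for the three feature sources (sizes are hypothetical)
shared = np.random.rand(512, 38, 50)    # shared feature maps from the backbone
semantic = np.random.rand(2, 38, 50)    # pedestrian/background semantic scores
depth = np.random.rand(1, 38, 50)       # refined depth estimation map

fused = fuse_feature_maps(shared, semantic, depth)
print(fused.shape)  # (515, 38, 50) -- fed to the ds-RPN and ds-RCNN heads
```

In a real detector the concatenated map would pass through further convolutions inside the ds-RPN and ds-RCNN; concatenation is only one of several common fusion choices (element-wise sum and learned gating are others).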