Graduate student: | 陳彥翔 Yan-Xiang Chen |
---|---|
Thesis title: | 融合物件偵測與前景分割之行人偵測 Fusion of Object Detection and Foreground Segmentation for Pedestrian Detection |
Advisor: | 徐繼聖 Gee-Sern Hsu |
Oral defense committee: | 鄭文皇 Wen-Huang Cheng, 周碩彥 Shuo-Yan Chou, 鍾聖倫 Sheng-Luen Chung |
Degree: | 碩士 Master |
Department: | 工程學院 - 機械工程系 College of Engineering - Department of Mechanical Engineering |
Publication year: | 2018 |
Academic year of graduation: | 106 |
Language: | Chinese |
Number of pages: | 64 |
Chinese keywords: | 物件偵測, 前景分割, 行人偵測 |
English keywords: | Object Detection, Foreground Segmentation, Pedestrian Detection |
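With the rapid development of deep learning, object detectors have become increasingly powerful, and pedestrian detection, as one category of object detection, is no exception. Many detectors that build on general object detectors and are improved specifically for pedestrian detection have been proposed. Although the accuracy of these pedestrian detectors has improved substantially, they cannot run in real time. In this thesis we examine the strengths and weaknesses of image segmentation networks and pedestrian detection networks, and propose a pedestrian detection network architecture that fuses image segmentation with object detection. The proposed network uses a single deep architecture to perform multi-task prediction, carrying out image segmentation and pedestrian detection at the same time, and uses the segmentation result to suppress false positives from the pedestrian detector. On the Caltech dataset the method falls somewhat short of other deep learning networks, reaching a miss rate of only 15.46% (the current best is SDS-RCNN at 7.36%), but it achieves a detection speed of 15 FPS (SDS-RCNN runs at 5 FPS).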
We propose an integrated network that combines the Fully Convolutional Network (FCN) and the Single Shot MultiBox Detector (SSD) for fast pedestrian detection. The FCN is well suited to image segmentation and the SSD to fast object detection, but the SSD produces false positives in many cases. The foreground segments from the FCN are used to suppress these false positives. Many other methods that combine detection and segmentation networks outperform the proposed network, with a miss rate that is 1% to 8% lower on the Caltech database. However, the proposed network runs at 15 FPS, whereas the others reach only 5 FPS.
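The idea of using foreground segments to suppress detector false positives can be illustrated with a minimal sketch. The abstract does not state the actual fusion rule, so everything here is an assumption made for illustration: the binary foreground mask, the coverage-ratio criterion, the `min_ratio` threshold of 0.3, and the function names `foreground_ratio` and `suppress_false_positives` are all hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Box layout is an assumption: (x1, y1, x2, y2) in pixels, x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def foreground_ratio(mask: List[List[int]], box: Box) -> float:
    """Fraction of pixels inside `box` that the mask labels as foreground (1)."""
    x1, y1, x2, y2 = box
    area = max(0, x2 - x1) * max(0, y2 - y1)
    if area == 0:
        return 0.0
    hits = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return hits / area


def suppress_false_positives(
    boxes: List[Box], mask: List[List[int]], min_ratio: float = 0.3
) -> List[Box]:
    """Keep only boxes whose foreground coverage reaches `min_ratio`.

    Hypothetical rule: a detection with little segmented foreground
    inside it is treated as a likely false positive and dropped.
    """
    return [b for b in boxes if foreground_ratio(mask, b) >= min_ratio]


# Toy 6x6 mask with one foreground blob in columns 1-2, rows 1-4.
mask = [[1 if 1 <= x <= 2 and 1 <= y <= 4 else 0 for x in range(6)]
        for y in range(6)]

# The first box covers the blob; the second lies entirely on background.
detections = [(1, 1, 3, 5), (3, 0, 6, 3)]
kept = suppress_false_positives(detections, mask)
print(kept)
```

In this toy case the box on the blob is kept and the background box is discarded. A real system would take the mask from the segmentation branch and the boxes from the detection branch of the same network, and the threshold would need tuning on validation data.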