
Author: 陳彥翔 (Yan-Xiang Chen)
Thesis Title: Fusion of Object Detection and Foreground Segmentation for Pedestrian Detection (融合物件偵測與前景分割之行人偵測)
Advisor: 徐繼聖 (Gee-Sern Hsu)
Committee Members: 鄭文皇 (Wen-Huang Cheng), 周碩彥 (Shuo-Yan Chou), 鍾聖倫 (Sheng-Luen Chung)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: Chinese
Pages: 64
Keywords (Chinese): 物件偵測, 前景分割, 行人偵測
Keywords (English): Object Detection, Foreground Segmentation, Pedestrian Detection

With the rapid development of deep learning, object detectors have become increasingly powerful, and pedestrian detection, as one class of object detection, is no exception. Many detectors that build on general object detectors and are tailored to pedestrian detection have been proposed. Although these pedestrian detectors have achieved substantially better accuracy, their speed still falls short of real time. In this thesis we examine the strengths and weaknesses of image segmentation networks and pedestrian detection networks, and propose a pedestrian detection architecture that fuses image segmentation with object detection. The proposed network uses a single deep architecture to perform multi-task prediction, carrying out image segmentation and pedestrian detection simultaneously, and uses the segmentation result to suppress the false positives produced by the pedestrian detector. On the Caltech dataset this method is slightly behind other deep learning networks, reaching a miss rate of 15.46% (the current best, SDS-RCNN, achieves 7.36%), but it runs at 15 FPS, compared with 5 FPS for SDS-RCNN.
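As a rough illustration of the multi-task idea in the abstract (one shared deep network producing both detection outputs and a segmentation map, with a cost-sensitive weight on the segmentation loss, cf. Section 3.3.3 in the table of contents), the following PyTorch sketch is offered. It is not the thesis implementation: the toy backbone, the head sizes, and the seg_weight value are assumptions for illustration only.

import torch.nn as nn
import torch.nn.functional as F

class MultiTaskPedestrianNet(nn.Module):
    """Toy multi-task net: shared backbone, SSD-style detection head, FCN-style segmentation head."""
    def __init__(self, num_anchors=4, num_classes=2, seg_weight=0.5):
        super().__init__()
        # Shared convolutional backbone (stand-in for the VGG-based SSD trunk).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Detection head: per-location class scores and box offsets.
        self.cls_head = nn.Conv2d(64, num_anchors * num_classes, 3, padding=1)
        self.box_head = nn.Conv2d(64, num_anchors * 4, 3, padding=1)
        # Segmentation head: per-pixel foreground/background logits.
        self.seg_head = nn.Conv2d(64, 2, 1)
        self.seg_weight = seg_weight  # cost-sensitive weight on the segmentation loss

    def forward(self, x):
        feat = self.backbone(x)
        cls_logits = self.cls_head(feat)   # consumed by the detection loss / box decoder
        box_regress = self.box_head(feat)
        # Upsample segmentation logits back to the input resolution (FCN-style).
        seg_logits = F.interpolate(self.seg_head(feat), size=x.shape[-2:],
                                   mode="bilinear", align_corners=False)
        return cls_logits, box_regress, seg_logits

    def combined_loss(self, seg_logits, seg_target, det_loss):
        # Total objective: detection loss (computed elsewhere via SSD-style matching)
        # plus the pixel-wise segmentation loss scaled by the cost-sensitive weight.
        seg_loss = F.cross_entropy(seg_logits, seg_target)
        return det_loss + self.seg_weight * seg_loss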


We propose an integrated network that combines the Fully Convolutional Network (FCN) and the Single Shot MultiBox Detector (SSD) for fast pedestrian detection. The FCN is well suited to image segmentation, and the SSD is well suited to fast object detection; however, the SSD suffers from false positives in many cases. The foreground segments from the FCN are exploited to suppress these false positives. Compared with other methods that combine detection and segmentation networks, many of them outperform the proposed network by 1%-8% in miss rate on the Caltech dataset. However, the proposed network reaches 15 FPS, while those methods run at only about 5 FPS (e.g., SDS-RCNN).
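As a concrete illustration of the suppression step described in the abstract, here is a minimal NumPy sketch of how a binary foreground mask could be used to discard detector false positives: a detection whose box contains too little foreground is rejected. The function name, the 0.3 threshold, and the toy data are illustrative assumptions, not values from the thesis.

import numpy as np

def suppress_false_positives(boxes, scores, fg_mask, min_fg_ratio=0.3):
    """Keep only detections whose box area is sufficiently covered by foreground.

    boxes:   (N, 4) array of [x1, y1, x2, y2] in pixel coordinates.
    scores:  (N,) detection confidences.
    fg_mask: (H, W) binary array from the segmentation branch (1 = foreground).
    """
    kept_boxes, kept_scores = [], []
    for (x1, y1, x2, y2), score in zip(boxes.astype(int), scores):
        region = fg_mask[y1:y2, x1:x2]
        if region.size == 0:
            continue
        fg_ratio = region.mean()  # fraction of foreground pixels inside the box
        if fg_ratio >= min_fg_ratio:
            kept_boxes.append([x1, y1, x2, y2])
            kept_scores.append(score)
    return np.array(kept_boxes), np.array(kept_scores)

# Toy usage: the box over the foreground blob is kept, the background-only box is dropped.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:80, 40:60] = 1
boxes = np.array([[40, 20, 60, 80], [0, 0, 20, 20]], dtype=float)
scores = np.array([0.9, 0.8])
print(suppress_false_positives(boxes, scores, mask))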

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Background and Motivation
  1.2 Overview of the Method
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2: Literature Review
  2.1 General Object Detection
    2.1.1 Faster R-CNN (2015)
    2.1.2 You Only Look Once (YOLO, 2016)
  2.2 Pedestrian Detection
    2.2.1 MS-CNN (2016)
    2.2.2 SA-FastRCNN (2017)
    2.2.3 RPN+BF (2016)
  2.3 Segmentation-Aided Pedestrian Detection
    2.3.1 Fused-DNN (2017)
    2.3.2 SDS-RCNN (2017)
Chapter 3: Proposed Method
  3.1 SSD Pedestrian Detector
    3.1.1 Network Architecture
    3.1.2 Training Stage
    3.1.3 Testing Stage
    3.1.4 Detector Improvements for Pedestrian Detection
  3.2 Fully Convolutional Nets for Semantic Segmentation
    3.2.1 Convolutionalization
    3.2.2 Upsampling
    3.2.3 Skip Architecture
  3.3 Simultaneous Object Detection and Image Segmentation
    3.3.1 SSD Architecture with an Added Segmentation Network
    3.3.2 Weak Image Segmentation
    3.3.3 Cost-Sensitive Weight
    3.3.4 Fusing Object Detection and Foreground Segmentation
    3.3.5 Method Summary
Chapter 4: Experimental Setup and Analysis
  4.1 Pedestrian Benchmark Datasets
    4.1.1 Caltech
    4.1.2 TUD-Brussels and TUD-MotionPairs (TUD)
    4.1.3 ETH
    4.1.4 KITTI
    4.1.5 Cityscapes
  4.2 Experimental Design
  4.3 Results and Analysis
    4.3.1 Comparison of SSD Pre-trained Models
    4.3.2 Analysis of Default Box Settings
    4.3.3 Performance of the SSD Fused with Image Segmentation
    4.3.4 Results on the Smart Campus Project
Chapter 5: Conclusion and Future Work
Chapter 6: References
Chapter 7: Appendix
  7.1 Convolutional Neural Networks
    7.1.1 Feedforward
    7.1.2 Backpropagation
    7.1.3 Convolution
    7.1.4 Max Pooling
    7.1.5 Inverted Dropout

[1] Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014.
[2] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
[3] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[4] Girshick, Ross, et al. "Region-based convolutional networks for accurate object detection and segmentation." IEEE transactions on pattern analysis and machine intelligence 38.1 (2016): 142-158.
[5] Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
[6] Everingham, Mark, et al. "The PASCAL visual object classes challenge 2007 (VOC2007) results." (2007).
[7] Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[8] Redmon, Joseph, et al. "YOLO9000: Better, Faster, Stronger." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[9] Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
[10] Erhan, Dumitru, et al. "Scalable object detection using deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[11] Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[12] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR, 2015.
[13] Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep Face Recognition." BMVC. Vol. 1. No. 3. 2015.
[14] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
[15] Jonathan Long, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR, 2015.
[16] L. Zhang, L. Lin, X. Liang, and K. He. "Is Faster R-CNN Doing Well for Pedestrian Detection?" ECCV, 2016.
[17] Li, Jianan, et al. "Scale-aware fast R-CNN for pedestrian detection." IEEE Transactions on Multimedia (2017).
[18] Cai, Zhaowei, et al. "A unified multi-scale deep convolutional neural network for fast object detection." European Conference on Computer Vision. Springer International Publishing, 2016.
[19] Ouyang, Wanli, et al. "Jointly learning deep features, deformable parts, occlusion and classification for pedestrian detection." IEEE transactions on pattern analysis and machine intelligence (2017).
[20] Du, Xianzhi, et al. "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection." Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on. IEEE, 2017.
[21] Brazil, Garrick, Xi Yin, and Xiaoming Liu. "Illuminating Pedestrians via Simultaneous Detection & Segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[22] J. Mao, T. Xiao, Y. Jiang and Z. Cao, "What Can Help Pedestrian Detection?," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 6034-6043.
[23] S. Zhang, R. Benenson, M. Omran, J. Hosang, and B. Schiele. "How far are we from solving pedestrian detection?" In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1259–1267, 2016.
[24] P. Dollár, C. Wojek, B. Schiele, and P. Perona. "Pedestrian detection: A benchmark." In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 304–311. IEEE, 2009.
[25] A. Geiger, P. Lenz, and R. Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[26] S. Zhang, R. Benenson and B. Schiele, "CityPersons: A Diverse Dataset for Pedestrian Detection," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 4457-4465.
[27] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. "The cityscapes dataset for semantic urban scene understanding." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
[28] C. Wojek, S. Walk, and B. Schiele. "Multi-cue onboard pedestrian detection. " In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2009.
[29] A. Ess, B. Leibe and L. Van Gool, "Depth and Appearance for Mobile Scene Analysis," 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, 2007, pp. 1-8.
[30] S. Fidler, R. Mottaghi, A. Yuille, and R. Urtasun. "Bottom-up segmentation for top-down detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3294–3301, 2013.
[31] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. "Simultaneous detection and segmentation." In European Conference on Computer Vision, pages 297–312. Springer, 2014.
[32] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, 2012, pp. 3354-3361.
[33] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. "Microsoft COCO: Common objects in context." In European Conference on Computer Vision, pages 740–755. Springer, 2014.
[34] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. "Pyramid scene parsing network." In CVPR, 2017.
[35] Zhao, Hengshuang, et al. "ICNet for Real-Time Semantic Segmentation on High-Resolution Images." arXiv preprint arXiv:1704.08545 (2017).
[36] Paszke, Adam, et al. "Enet: A deep neural network architecture for real-time semantic segmentation." arXiv preprint arXiv:1606.02147 (2016).
[37] V. Badrinarayanan, A. Kendall, and R. Cipolla. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation. " arXiv:1511.00561, 2015.
[38] Appel R., Fuchs T., Dollár P., Perona P. "Quickly boosting decision trees: pruning underachieving features early." In JMLR Workshop and Conference Proceedings, JMLR, 2013, vol. 28, 594–602.
[39] Piotr Dollár, Ron Appel, Serge Belongie, and Pietro Perona. "Fast feature pyramids for object detection." TPAMI, 36(8):1532–1545, 2014.
[40] Viola, P., Jones, M., Snow, D.: "Detecting pedestrians using patterns of motion and appearance." In: CVPR. (2003)
