
Graduate Student: 黃莉雯 (Li-Wen Huang)
Thesis Title: MiniNet:基於高效卷積神經網絡之即時人臉檢測系統
(MiniNet: An Efficient Convolutional Neural Network for Real-Time Face Detection System)
Advisor: 楊英魁 (Ying-Kuei Yang)
Committee Members: 陳俊良 (Jiann-Liang Chen), 張博綸 (Po-Lun Chang), 李建南 (Chien-Nan Lee)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2019
Graduation Academic Year: 107 (2018-2019)
Language: Chinese
Pages: 95
Chinese Keywords: 卷積神經網絡、輕量模型、人臉檢測、目標檢測、圖像識別
English Keywords: convolutional neural network, lightweight model, face detection, object detection, image recognition

This study proposes a lightweight model, MiniNet, and applies it to a face detection system. Thanks to a multi-stage training strategy and the efficiency of MiniNet, this novel architecture achieves real-time detection on resource-constrained devices while preserving accuracy.
MiniNet is composed of Mini Lower and Mini Higher modules, which extract low-level and high-level features respectively. Mini Lower extracts low-level features using group convolution and channel concatenation, while Mini Higher extracts high-level features using depthwise separable convolution. The efficient convolutions realized by the Mini modules sharply reduce the number of parameters and the amount of computation, and the nonlinearity gained by introducing more layers along the spatial dimension improves the modules' fitting ability. In addition, combining finer-grained features with multi-scale prediction improves small-object detection.
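To make the efficiency claim concrete, here is the standard back-of-the-envelope cost comparison for depthwise separable convolution (the well-known analysis from the MobileNets line of work, not figures taken from this thesis), written in LaTeX. For a D_K x D_K kernel, M input channels, N output channels, and a D_F x D_F output map:

    % Multiply-accumulate cost over a D_F x D_F output map
    \text{standard convolution: } D_K^2 \cdot M \cdot N \cdot D_F^2
    \text{depthwise separable: } D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2
    % Ratio of separable cost to standard cost
    \frac{D_K^2 M D_F^2 + M N D_F^2}{D_K^2 M N D_F^2} = \frac{1}{N} + \frac{1}{D_K^2}

With the usual D_K = 3 and N in the tens or hundreds, the separable form costs roughly one eighth to one ninth of a standard convolution; group convolution likewise divides a standard convolution's cost by the number of groups.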
A series of ablation experiments verifies the effectiveness of the Mini module design, and controlled comparison experiments confirm that MiniNet outperforms prior models. MiniNet runs very fast: its detection time on a central processing unit (CPU) is only 78 ms, twice as fast as the previous model, and its mAP on the WIDER FACE dataset is 7.26% higher, with a lower false-positive rate. These advantages allow MiniNet to generalize to real-world applications while striking a good balance between speed and accuracy.


A lightweight model, MiniNet, is proposed and applied to a face detection system. With a multi-stage training strategy and the efficiency of MiniNet, this novel architecture achieves real-time detection even on devices with limited resources while maintaining accuracy.
MiniNet consists of two kinds of Mini modules, Mini Lower and Mini Higher, designed to extract low-level and high-level features respectively. Mini Lower modules use group convolution and channel concatenation to extract low-level features, while Mini Higher modules use depthwise separable convolution to extract high-level features. The number of parameters and the computational cost are substantially reduced by the efficient convolutions the Mini modules employ, and the extra nonlinearity introduced along the spatial dimension further improves their fitting capacity. Notably, detection of small objects is greatly improved by the finer-grained features and multi-scale predictions in the proposed MiniNet.
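As a rough illustration of the two module ideas, the following is a minimal PyTorch sketch; the block structure, channel widths, activation, and layer ordering are illustrative assumptions, not the exact MiniNet design:

    # Minimal sketch of the two Mini module ideas (hypothetical structure;
    # channel counts, activation, and ordering are illustrative assumptions).
    import torch
    import torch.nn as nn

    class MiniLowerBlock(nn.Module):
        """Low-level features: group convolution + channel concatenation."""
        def __init__(self, channels, groups=4):
            super().__init__()
            # Group convolution: each filter sees only channels/groups inputs,
            # cutting parameters and computation by a factor of `groups`.
            self.gconv = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
            self.bn = nn.BatchNorm2d(2 * channels)
            self.act = nn.LeakyReLU(0.1)

        def forward(self, x):
            # Concatenating the input with the group-convolved features
            # doubles the channel count without extra convolution cost.
            out = torch.cat([x, self.gconv(x)], dim=1)
            return self.act(self.bn(out))

    class MiniHigherBlock(nn.Module):
        """High-level features: depthwise separable convolution."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            # Depthwise 3x3: one spatial filter per channel (groups=in_ch).
            self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
            # Pointwise 1x1: mixes channels; a second activation adds the
            # extra nonlinearity mentioned above.
            self.pw = nn.Conv2d(in_ch, out_ch, 1)
            self.bn1 = nn.BatchNorm2d(in_ch)
            self.bn2 = nn.BatchNorm2d(out_ch)
            self.act = nn.LeakyReLU(0.1)

        def forward(self, x):
            x = self.act(self.bn1(self.dw(x)))
            return self.act(self.bn2(self.pw(x)))

    x = torch.randn(1, 32, 104, 104)  # one 32-channel 104x104 feature map
    y = MiniHigherBlock(64, 128)(MiniLowerBlock(32)(x))
    print(y.shape)  # torch.Size([1, 128, 104, 104])

LeakyReLU and batch normalization are used here because the thesis builds on the YOLO family (YOLOv3-tiny is the baseline in Chapter 3), where that pairing is conventional.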
The effectiveness of the Mini module design has been verified through a series of ablation experiments, and controlled comparison experiments demonstrate that MiniNet outperforms prior models. MiniNet is extremely fast: its detection time on a CPU is only 78 ms, twice as fast as the previous model, and its mAP on the WIDER FACE dataset is 7.26% higher, with fewer false positives as well. These strengths allow MiniNet to generalize to real-world applications with an excellent trade-off between speed and accuracy.

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives and Problems
  1.3 Thesis Organization
Chapter 2 Literature Review
  2.1 Neural Networks
    2.1.1 Perceptron
    2.1.2 Backpropagation
    2.1.3 Activation Functions
  2.2 Convolution Modules
    2.2.1 Convolutional Neural Networks
    2.2.2 Serial Modules
    2.2.3 Parallel Modules
    2.2.4 Residual Modules
  2.3 Lightweight Convolutions
    2.3.1 Group Convolution
    2.3.2 Depthwise Convolution
  2.4 Object Detection
    2.4.1 Two-Stage Detectors
    2.4.2 One-Stage Detectors
Chapter 3 The MiniNet Model
  3.1 Detection System
    3.1.1 System Flow
    3.1.2 Input Preprocessing
    3.1.3 Model Architecture
  3.2 Operating Modes
    3.2.1 Training Phase
    3.2.2 Testing Phase
  3.3 Mini Module Design
    3.3.1 Mini Lower
    3.3.2 Mini Higher
  3.4 Detection Models
    3.4.1 YOLOv3-tiny
    3.4.2 MiniNet
Chapter 4 Experimental Results and Analysis
  4.1 Development Environment
  4.2 Dataset and Preprocessing
    4.2.1 WIDER FACE
    4.2.2 Dataset Preprocessing
      4.2.2.1 Filtering and Selection
      4.2.2.2 Statistics and Clustering
  4.3 Training Methods
    4.3.1 Hyperparameters and Optimizers
    4.3.2 Data Augmentation
  4.4 Experimental Results and Comparative Analysis
    4.4.1 Space and Time Complexity
    4.4.2 Analysis of Experimental Results
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References

