Graduate Student: 黃莉雯 (Li-Wen Huang)
Thesis Title: MiniNet: An Efficient Convolutional Neural Network for Real-Time Face Detection System (MiniNet:基於高效卷積神經網絡之即時人臉檢測系統)
Advisor: 楊英魁 (Ying-Kuei Yang)
Committee Members: 陳俊良 (Jiann-Liang Chen), 張博綸 (Po-Lun Chang), 李建南 (Chien-Nan Lee)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year: 107 (ROC calendar)
Language: Chinese
Number of Pages: 95
Keywords: convolution neural network, lightweight model, face detection, object detection, image recognition
A lightweight model, MiniNet, is proposed and applied to a face detection system. With a multi-stage training strategy and the efficiency of MiniNet, the novel architecture achieves real-time detection even on resource-constrained devices while maintaining accuracy.

MiniNet consists of two kinds of Mini modules, Mini Lower and Mini Higher, designed to extract low-level and high-level features respectively. Mini Lower modules use group convolution and channel concatenation to extract low-level features, while Mini Higher modules use depthwise separable convolution to extract high-level features. The efficient convolutions employed by the Mini modules substantially reduce the parameter count and computational cost, and the additional nonlinearity introduced along the spatial dimension improves the modules' fitting capacity. Moreover, combining finer-grained features with multi-scale prediction greatly improves the detection of small objects.

A series of ablation experiments verifies the effectiveness of the Mini module design, and comparative experiments show that MiniNet outperforms prior models. MiniNet is extremely fast: its detection time on a CPU is only 78 ms, twice as fast as prior models, while its mAP on the WIDER FACE dataset is 7.26% higher and its false-positive rate is lower. These advantages allow MiniNet to generalize to real-world applications with an excellent trade-off between speed and accuracy.
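The efficiency claims above can be sanity-checked with back-of-envelope arithmetic. The sketch below compares parameter counts for the three convolution variants the abstract mentions, and converts the reported 78 ms CPU latency into frames per second. The layer shapes (3x3 kernel, 64 to 128 channels, 4 groups) are illustrative assumptions, not values taken from the thesis.

```python
# Parameter counts for the convolution variants named in the abstract.
# Layer shapes are illustrative assumptions, not from the thesis.

def standard_conv_params(k, c_in, c_out):
    # A standard convolution connects every input channel to every filter.
    return k * k * c_in * c_out

def group_conv_params(k, c_in, c_out, groups):
    # Group convolution splits channels into independent groups,
    # dividing the parameter count by the number of groups.
    return k * k * (c_in // groups) * (c_out // groups) * groups

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise separable convolution = a per-channel k x k depthwise
    # filter followed by a 1x1 pointwise convolution across channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)         # 73728
grp = group_conv_params(3, 64, 128, groups=4)  # 18432 (4x fewer)
dws = depthwise_separable_params(3, 64, 128)   # 8768 (~8.4x fewer)
print(std, grp, dws)

# Latency figure reported in the abstract: 78 ms per image on a CPU.
fps = 1000 / 78
print(f"{fps:.1f} FPS")  # ~12.8 frames per second
```

The roughly 8x parameter reduction of the depthwise separable layer in this toy configuration illustrates why such factorized convolutions are the standard building block of lightweight models of this kind.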