
Graduate student: 王聖鈞
Sheng-Chun Wang
Thesis title: 以Haar特徵分類及CNN迴歸分析之深度學習用於微笑表情偵測研究
A Study on Deep Learning of Smile Detection using Haar-like Features and CNN-Based Regression Analysis
Advisors: 胡國瑞
Kuo-Jui Hu
孫沛立
Pei-Li Sun
Oral defense committee: 胡國瑞
Kuo-Jui Hu
孫沛立
Pei-Li Sun
徐明景
Ming-Ching Shyu
Degree: Master
Department: College of Applied Sciences - Graduate Institute of Color and Illumination Technology
Year of publication: 2023
Academic year of graduation: 111
Language: Chinese
Number of pages: 91
Keywords (Chinese): Haar特徵, AdaBoost學習演算, CNN卷積神經網路, TensorFlow Keras 演算, 深度學習, 微笑偵測
Keywords (English): Haar-like features, AdaBoost learning algorithm, convolutional neural network, TensorFlow Keras, deep learning, smile detection
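      Smiling is one of the important features of human psychological expression, and smile detection begins with face recognition. This study uses both a machine-learning model trained on Haar-like feature classification and a deep-learning convolutional neural network (CNN) architecture to detect and recognize faces and smiles. It examines how parameter changes and condition settings in each model affect smile-detection accuracy, in order to identify the conditions under which a model achieves the best real-time facial smile detection. As more resources are invested in artificial intelligence (AI) and the technology evolves, detection speed and accuracy are expected to improve further.
      The research is divided into two parts. The first part uses the AdaBoost iterative learning algorithm, which adds a new weak classifier in each round of sample training and then combines the trained weak classifiers into one strong classifier. This strong classifier uses programmed Haar-like features as the key visual features for face detection and discards most of the unimportant background regions of the image to reduce image-processing time. The image is then converted to the HSV color space so that skin color can serve as an auxiliary cue, and different detection parameters, image-height filtering ratios, and smile attribute classes are set to improve face and smile detection. The resulting precision reaches 82.3%. In the second part, after face detection, a deep-learning CNN is combined with polynomial regression analysis in the TensorFlow Keras API to fit the smile curvature and estimate the degree of smiling. Smile images are labeled, preprocessed, and used for training, and the trained model is then applied to facial smile detection. As the number of training images increases, accuracy reaches 97.7%. The results show that the newly revised CNN combined with the TensorFlow Keras deep-learning system performs better for facial smile recognition than the AdaBoost machine-learning framework, with accuracy more than 15 percentage points higher. The research can be applied to medical footprint tracking, sentiment analysis, national defense and military affairs, interpersonal interaction in service industries, and smart manufacturing, and has considerable practical value for the future.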
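      The boosting step described above (one new weak classifier per round, combined into a weighted strong classifier) can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes one-dimensional inputs and threshold "stumps" as weak classifiers, whereas the thesis uses Haar-like features. The names `train_stump`, `adaboost`, and `predict` are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; labels in `ys` must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak classifier
        ensemble.append((alpha, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest.
        preds = [pol if x >= thr else -pol for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
```

      On this toy data a single stump misclassifies three of the ten samples, while the three-round ensemble separates all of them, which is the effect the combination of weak classifiers is meant to achieve.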


      Smiling is one of the important expressions of human mental state, and smile detection starts with face recognition. In this study, a Haar-like feature classification model from machine learning and a convolutional neural network (CNN) framework from deep learning were used to detect and recognize faces and smiles. The effects of different parameters and conditions on smile-detection accuracy were investigated to determine the settings that give the best real-time facial smile detection.
      The research is divided into two parts. The first part uses the AdaBoost iterative learning algorithm, which adds a new weak classifier in every round of sample training and then combines the trained weak classifiers into a strong classifier. The strong classifier uses modified Haar-like features as the main visual features for face detection. Unimportant background blocks of the image are discarded, which reduces image-processing time. The image is converted to the HSV color space so that skin color can assist detection. The effects of different detection parameters, conditions, image-height filtering ratios, and smile grades on the accuracy of face and smile detection were studied. In this first part, the accuracy of smile detection reached 82.3%. In the second part, a deep-learning CNN combined with the TensorFlow Keras API was used to estimate the degree of a facial smile through polynomial regression on smile curvature. Facial images were labeled and used for training before the model was applied to smile detection. As the number of training images increased, the overall accuracy of smile recognition rose to 97.7%. The results show that the CNN-based model is about 15 percentage points more accurate in smile recognition than the Haar-based model. This research can be applied in areas such as medical footprint tracking, sentiment analysis, national defense and military affairs, human interaction in service industries, and smart manufacturing, and is expected to have practical value in the future.
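      To illustrate what polynomial regression on smile curvature can mean in practice, the sketch below fits a quadratic y = a*x^2 + b*x + c to a handful of mouth-contour points by least squares and uses 2a (the second derivative, which equals the curvature at the parabola's vertex) as a smile score. This is an assumption-laden example, not the thesis method: the abstract does not specify the polynomial degree, the landmark source, or the scoring rule. The function names and sample points are hypothetical, and the y axis is assumed to point up (image row indices would need to be flipped first).

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums that appear in the 3x3 normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def smile_score(points):
    """Positive when the mouth corners curve upward (y axis pointing up)."""
    a, _, _ = fit_quadratic(points)
    return 2.0 * a


# Hypothetical mouth-contour points: corners higher than the center.
smiling = [(-2, 1.0), (-1, -0.5), (0, -1.0), (1, -0.5), (2, 1.0)]
# The same contour mirrored vertically: corners lower than the center.
frowning = [(x, -y) for x, y in smiling]
```

      With these points, `smile_score(smiling)` is positive and `smile_score(frowning)` is negative. A regression network could be trained to predict such a score from face crops, but that pairing is an assumption here rather than something the abstract states.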

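    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Structure
    Chapter 2 Literature Review
        2.1 Facial Image Feature Detection
            2.1.1 Haar Feature Cascade Classifier
            2.1.2 Accelerated Evaluation of Haar Features
            2.1.3 Histogram of Oriented Gradients
            2.1.4 Local Binary Pattern (LBP) Features
        2.2 Classifier Design (Classification)
            2.2.1 AdaBoost Learning Algorithm
        2.3 Deep Learning Methods for Face and Smile Detection
            2.3.1 Detection Architecture of Deep Learning Neural Networks
            2.3.2 Neurons
            2.3.3 Multilayer Neural Network Structure in Deep Learning
            2.3.4 Role of Dropout in Neural Networks
        2.4 Detection Principles of CNNs Combined with the Keras Framework
            2.4.1 Computational Principles of Convolutional Detection
            2.4.2 Principles of the CNN Pooling Layer
            2.4.3 Principles of the CNN Fully Connected Layer
            2.4.4 Keras Architecture for CNNs
    Chapter 3 Research Methods
        3.1 Part 1: Methods for Face and Smile Detection with Haar-like Features
            3.1.1 Characterization of Facial Smile Types
            3.1.2 Face and Smile Detection Steps
            3.1.3 Key Points for Detecting Face and Smile ROIs
            3.1.4 Color Conversion and Skin Color Detection
        3.2 Part 2: Research Process for Facial Smile Detection with CNNs
            3.2.1 Building the Facial Smile Dataset
            3.2.2 Loading and Processing the Facial Smile Dataset
            3.2.3 Building the CNN Detection Network
            3.2.4 CNN Facial Smile Detection Method
    Chapter 4 Experimental Results and Discussion
        4.1 Part 1: Experimental Results for Face and Smile Detection with Haar-like Features
            4.1.1 Effect of Image minSize on Face Detection Accuracy
            4.1.2 Effect of Image Resize Interpolation Method on Face Detection Accuracy
            4.1.3 Effect of Skin Color Parameters on Face Detection Accuracy
            4.1.4 Effect of scaleFactor and minNeighbors Combinations on Face Detection Accuracy
            4.1.5 Effect of Face Image Height Filtering Ratio on Smile Detection Accuracy
            4.1.6 Effect of Fine-Grained Smile Classification on Facial Smile Detection Accuracy
            4.1.7 Evaluation of Dynamic Facial Smile Detection
        4.2 Part 2: Experimental Results for Facial Smile Detection with CNNs
            4.2.1 Relationship Between CNN Smile Matching Value and Facial Smile Detection Accuracy
            4.2.2 Comparison of Training Image Counts for Haar-like and CNN Models
            4.2.3 Evaluation of CNNs on Dynamic Smile Tests
    Chapter 5 Conclusions and Recommendations
    References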

