
Graduate Student: Li-Wei Hsiao (蕭力瑋)
Thesis Title: Deception Features Determination: A Method for Deception Detection Based on Semi-Supervised Learning (臉部特徵於謊言辨識探究:一種基於半監督學習的面部線索欺騙檢測方法)
Advisor: Jing-Ming Guo (郭景明)
Committee Members: Chih-Lyang Hwang (黃志良), Kuo-Liang Chung (鍾國亮), Tien-Ying Kuo (郭天穎), Jen-Hui Chuang (莊仁輝)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2021
Graduation Academic Year: 109 (2020-2021)
Language: Chinese
Number of Pages: 99
Keywords: Deception Detection, Facial Micro-Expression, Feature Visualization, Semi-Supervised Learning
Hits: 217 / Downloads: 0
    For decades, many researchers have devoted themselves to determining whether people are lying. The signs of deception are not singular: different non-verbal cues observed from the face and body posture must be considered together. In recent years, deception detection has received growing attention in the scientific community, and related applications can be used to judge the authenticity and credibility of what people say. Effective methods are therefore needed to find deceptive cues in video.
    This thesis focuses on facial features for lie detection and discusses the feasibility of performing deception detection through image analysis. Feature visualization is used to observe whether the deep learning model attends to the micro-expressions associated with deceptive features. In addition, the psychological literature on facial deception cues is reviewed and cross-referenced to demonstrate the validity and feasibility of the deceptive cues found by deep learning.
    The Real-Life and Bag-of-Lies datasets were used for validation and performance comparison. To ensure that videos of the same subject never appear in both the training and test sets, each dataset was split into training and validation sets with non-overlapping subject identities. This prevents the deep learning model from learning subject-specific features, so that its judgment of whether a subject is lying is not influenced by the subject's identity. An algorithm is also proposed to verify whether the deception features are reliable: a face recognition method tests whether the features extracted by the model can distinguish each identity in the dataset. High identification accuracy means the features are dominated by identity rather than micro-expression cues; in that case, no matter how well the model scores on the lie detection dataset, it cannot actually perform deception detection or observe micro-expressions.
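The identity check described above can be sketched as a simple nearest-centroid probe over the extracted features. The function name `identity_probe_accuracy` and the toy clustered features are illustrative assumptions, not the thesis's actual face recognition pipeline:

```python
import numpy as np

def identity_probe_accuracy(features, subject_ids):
    """Nearest-centroid identity probe over extracted features.

    features: (N, D) feature vectors produced by the deception model.
    subject_ids: (N,) subject label for each sample.
    Returns the fraction of samples whose nearest subject centroid matches
    their own identity. A high accuracy suggests the features mainly encode
    *who* the subject is rather than micro-expression cues."""
    features = np.asarray(features, dtype=float)
    ids = np.asarray(subject_ids)
    names = sorted(set(ids.tolist()))
    centroids = np.stack([features[ids == s].mean(axis=0) for s in names])  # (S, D)
    dists = ((features[:, None, :] - centroids[None]) ** 2).sum(axis=-1)    # (N, S)
    pred = np.array(names)[dists.argmin(axis=1)]
    return float((pred == ids).mean())

# Toy features that cluster tightly per subject: the probe scores perfectly,
# which would flag the features as identity-driven rather than cue-driven.
rng = np.random.default_rng(1)
ids = np.repeat(["s1", "s2", "s3"], 10)
offset = {"s1": 0.0, "s2": 5.0, "s3": 10.0}
feats = np.stack([offset[s] + rng.normal(0, 0.1, 4) for s in ids])
acc = identity_probe_accuracy(feats, ids)
assert acc == 1.0
```

In practice the probe would be run on held-out samples; including each sample in its own centroid, as this sketch does, only makes the identity-leakage test stricter.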
    Finally, this thesis proposes a new concept. Not every frame of a deceptive video carries representative deceptive features, yet previous studies labeled every frame of a deceptive video as a positive (deceptive) sample for training, so the models were affected by noisy labels. The experiments therefore adopt semi-supervised learning so that training is not constrained by this forced labeling. The results show that the deception detection model validated by the proposed analysis yields more effective and stable deceptive cues.


    For decades, researchers have investigated how to determine whether a subject is lying. The cues of deception are not trivial: many non-verbal signals, observed from facial expressions and body gestures, must be taken into consideration. In recent years, deception detection has drawn much attention, and its applications have been widely adopted to judge the authenticity and credibility of people's words. Thus, it is highly desirable to study which deceptive cues in videos can be used for effective deception detection.
    This thesis focuses on facial features for deception detection and discusses the feasibility of detecting deception from video clips. Feature visualization is used to verify whether the deep learning models truly focus on micro-expressions when detecting deception. In addition, facial deceptive features reported in the psychological literature are investigated for cross-reference.
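Feature visualization of this kind is often implemented as a class activation map (CAM): the spatial feature maps of the last convolutional layer are weighted by the classifier weights of the target class and summed into a heatmap. A minimal NumPy sketch with toy shapes and random values (not the thesis's actual network) looks like:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """feature_maps: (C, H, W) activations from the last conv layer.
    class_weights: (C,) classifier weights for the target class.
    Returns an (H, W) heatmap normalized to [0, 1]."""
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positively contributing regions
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize for display as a heatmap
    return cam

rng = np.random.default_rng(0)
fmap = rng.random((8, 7, 7))   # toy activations: 8 channels, 7x7 spatial grid
w = rng.random(8)              # toy class weights
heatmap = class_activation_map(fmap, w)
assert heatmap.shape == (7, 7)
assert heatmap.min() >= 0.0 and heatmap.max() <= 1.0
```

Upsampled to the input resolution and overlaid on the face image, such a heatmap shows whether the model attends to micro-expression regions or elsewhere.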
    The Real-Life and Bag-of-Lies datasets were used in the experiments for performance evaluation. All videos were partitioned into training and validation sets by subject, so that no subject appears in both sets. This prevents the model from learning facial features that merely identify subjects, and the experimental results confirmed the validity of this setting. Since deception detection should, in principle, be independent of subject identity, the learned features were also fed to a subject-identification task to check whether deception detection was being driven by identity cues, which it should not be.
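The subject-disjoint partitioning can be sketched as follows; the helper name `split_by_subject` and the toy (subject, video) pairs are illustrative assumptions, not part of the thesis:

```python
import random

def split_by_subject(videos, val_ratio=0.3, seed=0):
    """Partition videos so that no subject appears in both splits.

    videos: list of (subject_id, video_id) pairs.
    The split is performed over unique subjects rather than over videos,
    which prevents the model from exploiting subject identity."""
    subjects = sorted({s for s, _ in videos})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_val = max(1, int(len(subjects) * val_ratio))
    val_subjects = set(subjects[:n_val])
    train = [v for v in videos if v[0] not in val_subjects]
    val = [v for v in videos if v[0] in val_subjects]
    return train, val

videos = [("s1", "v1"), ("s1", "v2"), ("s2", "v3"), ("s3", "v4"), ("s3", "v5")]
train, val = split_by_subject(videos)
# No subject ends up on both sides of the split:
assert {s for s, _ in train}.isdisjoint({s for s, _ in val})
```

Library routines such as scikit-learn's `GroupShuffleSplit` implement the same idea with the subject ID as the group key.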
    Finally, a novel concept is proposed in this study. Not every frame in a deceptive video is representative of deceptive cues; however, previous studies labeled all frames of deceptive videos as positive samples, resulting in noisy training data. Therefore, semi-supervised learning is applied to tackle the hard-labeling problem that causes noisy samples. Experimental results show that the validity and robustness of the models obtained by the proposed method can be confirmed by the aforementioned analysis.
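One common semi-supervised way to relax such hard frame-level labels is confidence-based pseudo-labeling: only frames whose predicted probability is sufficiently confident contribute hard labels to the supervised loss, while ambiguous frames are left unlabeled. A hedged sketch of the idea, not necessarily the exact scheme used in the thesis:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """probs: (N,) predicted probability that each frame is deceptive.
    Returns (labels, mask): hard labels for every frame, plus a boolean
    mask selecting only the confident frames (prob >= threshold or
    prob <= 1 - threshold) for the supervised loss term."""
    probs = np.asarray(probs, dtype=float)
    confident = (probs >= threshold) | (probs <= 1.0 - threshold)
    labels = (probs >= 0.5).astype(int)
    return labels, confident

probs = [0.97, 0.55, 0.04, 0.85]
labels, mask = pseudo_label(probs)
# Only the clearly deceptive (0.97) and clearly truthful (0.04) frames are kept;
# the ambiguous frames (0.55, 0.85) are treated as unlabeled data.
assert labels.tolist() == [1, 1, 0, 1]
assert mask.tolist() == [True, False, True, False]
```

Methods such as MixMatch and DivideMix build on the same principle, combining confident pseudo-labels with consistency regularization on the remaining unlabeled frames.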

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Introduction to Neural Networks
        2.1.1 Artificial Neural Networks
        2.1.2 Convolutional Neural Networks
        2.1.3 Training Methods for CNNs
        2.1.4 Development of CNNs
        2.1.5 Visualization of CNNs
      2.2 Facial Micro-Expression Features
        2.2.1 Related Work on Emotion Recognition
        2.2.2 Facial Action Coding System (FACS)
        2.2.3 Neural-Network-Based Emotion Recognition
      2.3 Datasets
        2.3.1 Real-Life Dataset
        2.3.2 Bag-of-Lies Dataset
      2.4 Prior Work on Video-Based Lie Detection
        2.4.1 Lie Detection with Machine Learning
        2.4.2 Lie Detection with Deep Learning
        2.4.3 Feasibility Analysis of Image-Based Lie Detection
    Chapter 3 Feature Analysis Methods
      3.1 Dataset Comparison and Analysis
      3.2 Deception Feature Extraction and Analysis
        3.2.1 Analysis of Dataset Partitioning
        3.2.2 Analysis of Pre-Processing
        3.2.3 Face Identification Method
        3.2.4 Heatmap Extraction Method
      3.3 Model Training for Lie Detection with Semi-Supervised Learning
    Chapter 4 Experimental Results and Discussion
      4.1 Network Architecture and Training Parameters
      4.2 Dataset Analysis
      4.3 Method Analysis
        4.3.1 Heatmap Discussion
        4.3.2 Training Data Processing
        4.3.3 Correlation between Face Recognition and Deception Prediction
      4.4 Comparison and Discussion with Other Works
    Chapter 5 Conclusion and Future Work
    References


    Full text available from 2024/09/27 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)