
Graduate Student: 鄧羽辰 (Yu-Chen Teng)
Thesis Title: 應用具有序列迴歸卷積網路為基準的動態人臉表情及無線語音命令辨識的人機協同之研究 (Human-Robot Collaboration Using Sequential-Recurrent-Convolution-Network-Based Dynamic Face Emotion and Wireless Speech Command Recognitions)
Advisor: 黃志良 (Chih-Lyang Hwang)
Committee Members: 黃志良 (Chih-Lyang Hwang), 郭景明 (Jing-Ming Guo), 施慶隆 (Ching-Long Shih), 蔡奇謚 (Chi-Yi Tsai)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 56
Keywords: CNN, LSTM, human and face detection, dynamic face emotion, wireless speech command recognition, omnidirectional service robot, visual searching and tracking
Access Count: 348 views, 4 downloads
  • The sequential recurrent convolution network (SRCN) model proposed in this thesis consists of two parts: a convolutional neural network (CNN) and a long short-term memory (LSTM) model. The CNN extracts expression-image features for dynamic face emotion, or frequency feature vectors from the mel spectrograms of wireless speech commands. The feature vectors obtained by passing the input expression images or spectrograms through the convolutional layers are then fed, in order, into a sequence of LSTM models, which completes dynamic facial expression recognition and wireless speech command classification. In short, one SRCN-DFER model for dynamic face emotion recognition and another SRCN-WSCR model for wireless speech command recognition are proposed to handle human-robot collaboration (HRC) tasks. The proposed methods not only solve the recognition of dynamic face emotion and speech commands effectively, but also prevent overfitting while achieving excellent recognition rates. Finally, the HRC experimental tasks include human and face detection, trajectory tracking control, dynamic face emotion and speech command recognition, and music playback, and the experiments verify the effectiveness, feasibility, and robustness of the proposed method.
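
    As a rough illustration of the speech front end described above, the following Python sketch (not taken from the thesis) converts a recorded command into a sequence of log-mel-spectrogram frames that a CNN could consume. The librosa library and every parameter value (sampling rate, FFT size, number of mel bands, frame length) are illustrative assumptions rather than the thesis configuration.

    import librosa
    import numpy as np

    def speech_to_mel_frames(wav_path, sr=16000, n_mels=40, frame_len=32):
        """Turn one recorded speech command into a sequence of log-mel frames."""
        y, sr = librosa.load(wav_path, sr=sr)                  # mono waveform
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                             hop_length=256, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel, ref=np.max)         # (n_mels, T)
        # Cut the spectrogram along time into fixed-length chunks; each chunk is
        # one "image" for the shared CNN, and the chunk order is the LSTM sequence.
        n_chunks = log_mel.shape[1] // frame_len
        return np.stack([log_mel[:, i * frame_len:(i + 1) * frame_len]
                         for i in range(n_chunks)])            # (n_chunks, n_mels, frame_len)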


    The proposed sequential recurrent convolution network (SRCN) includes two parts: a convolutional neural network (CNN) and a sequence of long short-term memory (LSTM) models. The CNN produces the feature vector corresponding to each input, a face sub-image for dynamic face emotion or a spectrogram for a wireless speech command. A sequence of LSTM models with shared weights then processes the sequence of feature vectors provided by the pre-trained CNN. Simply put, one SRCN for dynamic face emotion recognition (SRCN-DFER) and another SRCN for wireless speech command recognition (SRCN-WSCR) are developed to deal with the human-robot collaboration (HRC) task. The proposed approaches not only effectively tackle the recognition of dynamic face emotion and speech commands but also prevent the overfitting problem while attaining excellent recognition rates. Finally, the HRC task, including human and face detection, trajectory tracking control, face emotion and speech command recognition, and music playback, is presented to validate the effectiveness, feasibility, and robustness of the proposed method.
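
    To make the SRCN structure concrete, here is a minimal PyTorch sketch: a single CNN with shared weights is applied to every element of the input sequence (face sub-images or spectrogram frames), and an LSTM summarizes the resulting feature vectors before classification. The layer sizes, input resolution, and number of classes are illustrative assumptions, not the configuration reported in the thesis.

    import torch
    import torch.nn as nn

    class SRCNSketch(nn.Module):
        """Shared per-frame CNN followed by an LSTM over the feature sequence."""
        def __init__(self, num_classes=7, feat_dim=128, hidden_dim=64):
            super().__init__()
            # CNN feature extractor, reused (shared weights) for every time step
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU(),
            )
            # LSTM over the sequence of CNN feature vectors
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):
            # x: (batch, time, 1, H, W), a sequence of face sub-images or spectrogram frames
            b, t = x.shape[:2]
            feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # (batch, time, feat_dim)
            _, (h_n, _) = self.lstm(feats)                     # last hidden state summarizes the sequence
            return self.fc(h_n[-1])                            # class logits

    # Example: 2 sequences of 8 grayscale 48x48 frames -> logits of shape (2, 7)
    logits = SRCNSketch()(torch.randn(2, 8, 1, 48, 48))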

    Abstract (Chinese) i
    Abstract (English) ii
    Table of Contents iii
    List of Figures iv
    List of Tables v
    Chapter 1  Introduction and Literature Review 1
      1.1  Introduction 1
      1.2  Literature Review 2
    Chapter 2  System Construction and Task Statement 4
      2.1  System Construction 4
      2.2  Task Statement 8
    Chapter 3  Dynamic Facial Expression Recognition 12
      3.1  Convolutional Neural Network 12
      3.2  Long Short-Term Memory (LSTM) Model 15
      3.3  SRCN-DFER 16
      3.4  Facial Expression Training, Testing, and Databases 17
    Chapter 4  Wireless Speech Command Recognition 24
      4.1  Speech Command Preprocessing 24
      4.2  SRCN-WSCR 29
    Chapter 5  Experimental Results and Discussion 32
      5.1  Video-Based Dynamic Expression Recognition 32
      5.2  Speech Command Recognition Results 35
      5.3  Human-Robot Collaboration Task 38
    Chapter 6  Conclusions and Future Research 43
    References 44

