
Graduate Student: 朱耿佑 (Keng-yu Chu)
Thesis Title: 使用隱藏式馬可夫模型的即時手勢辨識技術於人與機器人之間的互動系統
(Applying Hidden Markov Models to Real-Time Hand Gesture Recognition Techniques for a Human-Robot Interaction System)
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 李漢銘 (Hahn-Ming Lee), 黃仲陵 (Chung-Lin Huang), 范國清 (Kuo-Chin Fan)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2009
Graduation Academic Year: 97 (ROC calendar)
Language: English
Pages: 126
Keywords: human-robot interaction, face detection, face tracking, particle filter, hand detection, hand tracking, feature extraction, gesture recognition, hidden Markov models
Chinese Abstract (translated):
In recent years, more and more researchers have devoted themselves to developing service robots. To provide a humane, easy-to-operate human-machine interface that lets users communicate and interact with robots more readily, research related to hand gesture and body movement recognition has grown rapidly. The shift from hardware keyboard input to today's touch-panel interfaces has greatly changed how people communicate with robots. Nevertheless, the most natural mode of operation and interaction requires no auxiliary objects or controllers at all; in view of this, we want to use the user's own hand and body gestures directly as the input interface.
In this thesis, we propose a robot interaction system with real-time automatic face tracking together with hand tracking and gesture recognition, consisting of four major procedures: face detection and tracking, hand detection and tracking, feature extraction, and gesture recognition. The user can control the robot and trigger the corresponding interaction events with self-defined dynamic or static gestures. For the vision system of the robot built in our experiments, we adopt a PTZ camera of the kind used for general video surveillance to capture consecutive images of the environment. First, for face detection and tracking, we extract candidate face regions from the skin-color blocks in the image (see the sketch following this paragraph), detect the initial position of the face using geometric information together with hair-color, eye, and mouth features, and then track the face dynamically with a particle filter. Next, for hand detection and tracking, we propose a new hand detection method in which the user need not deliberately wear long sleeves to cover the skin that the arms might expose. Once the initial position of the hand region is detected, we track the hand within a local region to raise the system's speed and avoid interference from other, non-hand regions; in this way we obtain the continuous motion trajectory of the hand and determine the moving direction between every two consecutive trajectory points, which serves as the main feature for dynamic gesture recognition. Finally, we use hidden Markov model classifiers to train on and recognize the user's gestures. The system is currently based on four directional dynamic gesture commands for a single hand (up, down, left, and right); extending these to two-hand combinations yields 24 gestures, and adding static gesture types gives a command set sufficient to replace part of a remote controller's functions. From these we select eight gestures and map them to commonly used robot control and interaction functions.
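To make the skin-color candidate extraction step concrete, here is a minimal Python sketch of one plausible implementation using OpenCV and NumPy; the function name, the HSV thresholds, and the minimum-area value are illustrative assumptions rather than the thesis's actual tuned skin-color model:

    import cv2
    import numpy as np

    def skin_color_candidates(frame_bgr, min_area=400):
        """Return bounding boxes of candidate skin-color regions.

        The HSV band below is a loose, illustrative skin-tone range
        (OpenCV scales H to 0-179 and S, V to 0-255); the thesis tunes
        its own skin-color model in HSV space.
        """
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
        # Remove speckle noise before connected-component analysis.
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        boxes = []
        for i in range(1, n):  # label 0 is the background
            x, y, w, h, area = stats[i]
            if area >= min_area:
                boxes.append((x, y, w, h))
        return boxes  # passed on to geometric and facial-feature filtering

Candidate boxes that survive the geometric and hair/eye/mouth tests would then seed the particle-filter tracker.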
Experimental results show that the face tracking rate of our system exceeds 97% in general situations and stays above 94% when the face is temporarily occluded; the hand tracking rate is at least 98% in general situations and no lower than 94% when the hand moves quickly; and the gesture recognition rate reaches 96% or higher. In addition, the system operates on images with a resolution of 320×240 and needs only 60 to 120 milliseconds of processing time overall. This performance is very satisfactory and encourages us to commercialize the robot in the near future.


Abstract:
In recent years, many researchers have devoted themselves to the development of service robots. To realize a human-machine interface that is natural and easy to operate, research on hand gesture recognition has grown rapidly, making it easier for humans to communicate and interact with robots. The control panel has gradually shifted from the keyboard to simple hand touch, which has dramatically changed the communication style between humans and machines. However, the most natural operation mode is to act freely, without any auxiliary objects or controllers. In view of these facts, we aim to employ the user's own hand gestures directly as the input interface.
In this thesis, we present an automatic real-time face tracking, hand tracking, and gesture recognition system installed on an interactive robot. The system comprises four main processing stages: face detection and tracking, hand detection and tracking, feature extraction, and gesture recognition. The user can employ the dynamic or static gestures we currently define to command and interact with the robot. For the robot's vision system, we adopt a PTZ camera of the kind commonly employed in video surveillance to capture the surroundings. First, for face detection and tracking, we extract candidate face regions from skin colors in the HSV color space, eliminate skin-color regions that do not belong to human faces, and exploit geometric properties, hair color, eyes, and the mouth to detect the initial position of a face; we then develop an improved particle filter to locate the face dynamically. In the hand detection and tracking procedure, we propose a new method that maps circle plates to locate the hands, which allows users to wear short-sleeved clothes. After a hand is detected and the centroid of the hand region is determined, tracking proceeds within a local search area around the hand region to acquire the hand motion trajectory. In the feature extraction stage, we compute the orientation between every two consecutive points of the trajectory and use it as the main feature for gesture recognition. Finally, in the classification stage, we train and recognize hand gestures with hidden Markov models, as sketched after this paragraph. Our system currently prescribes four basic directive gestures for a single hand: moving upward, downward, leftward, and rightward. Combining both hands yields twenty-four kinds of gestures, which, together with static gestures, are enough to replace the commands of a remote controller; from these we select eight hand gestures to represent the most commonly used commands for robot operation and interaction.
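As a minimal sketch of the feature-extraction and classification stages, the following Python code quantizes the direction between consecutive trajectory points into discrete symbols and scores the symbol sequence against per-gesture discrete HMMs with the standard forward algorithm. The function names, the 8-direction codebook size, and the models dictionary are illustrative assumptions; the (pi, A, B) parameters themselves would come from Baum-Welch training, which the sketch does not cover:

    import numpy as np

    def direction_codewords(trajectory, n_bins=8):
        """Map each consecutive pair of (x, y) hand centroids to one of
        n_bins discrete direction symbols (the HMM observation sequence)."""
        symbols = []
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
            angle = np.arctan2(y1 - y0, x1 - x0) % (2 * np.pi)
            symbols.append(int(angle // (2 * np.pi / n_bins)) % n_bins)
        return symbols

    def log_forward(pi, A, B, obs):
        """Log-likelihood of an observation sequence under a discrete HMM:
        pi (N,) initial distribution, A (N, N) transitions, B (N, M)
        emissions over M symbols. Computed in log space to avoid underflow."""
        alpha = np.log(pi) + np.log(B[:, obs[0]])
        for o in obs[1:]:
            m = alpha.max()  # log-sum-exp over previous states
            alpha = m + np.log(np.exp(alpha - m) @ A) + np.log(B[:, o])
        m = alpha.max()
        return m + np.log(np.sum(np.exp(alpha - m)))

    def classify(trajectory, models):
        """Pick the gesture whose HMM gives the highest likelihood.
        `models` maps a gesture name to its (pi, A, B) parameters."""
        obs = direction_codewords(trajectory)
        return max(models, key=lambda name: log_forward(*models[name], obs))

With one trained (pi, A, B) triple per gesture, e.g. a hypothetical models = {'up': ..., 'down': ..., 'left': ..., 'right': ...}, classify returns the most likely of the defined commands; a deployed system would additionally reject sequences whose best log-likelihood falls below a threshold.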
Experimental results reveal that the face tracking rate is above 97% in general situations and above 94% when the face suffers temporary occlusion; the hand tracking rate is no less than 98% in general situations and above 94% when the hand moves quickly; and the system achieves an average gesture recognition rate of at least 96%. In addition, with the image resolution set to 320×240 pixels, the whole system takes only 60 to 120 milliseconds per frame. This execution efficiency is very satisfactory and encourages us to commercialize the robot in the near future.

Table of Contents:
Acknowledgments
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
    1.1 Overview
    1.2 Background and motivation
    1.3 System description
    1.4 Thesis organization and system architecture
Chapter 2 Related Works
    2.1 Review of face detection and tracking
    2.2 Review of hand detection and tracking
    2.3 Review of hand gesture recognition
Chapter 3 Face Detection and Tracking
    3.1 Color space transformation
        3.1.1 Skin color detection using the HSV model
        3.1.2 Hair color detection using the YCbCr model
    3.2 Face detection
        3.2.1 Face geometry filtering
        3.2.2 Facial feature detection
    3.3 Object description and finding
    3.4 Improved particle filter
        3.4.1 Our proposed method
        3.4.2 Local search method
    3.5 Implementation of robot control
Chapter 4 Hand Detection and Tracking
    4.1 Hand region localization
        4.1.1 Scan converting circles
        4.1.2 Hand detection
    4.2 Hand tracking
    4.3 Feature extraction
Chapter 5 Gesture Recognition
    5.1 Gesture definition
    5.2 The hidden Markov model
        5.2.1 Fundamentals of HMMs
        5.2.2 The forward and backward algorithms
        5.2.3 The Viterbi algorithm
        5.2.4 The Baum-Welch algorithm
    5.3 Gesture classification
Chapter 6 Experimental Results and Discussions
    6.1 System interface description
    6.2 The results of face detection and tracking
    6.3 The results of hand detection and tracking
    6.4 The result of hand gesture trajectory extraction and recognition
    6.5 Tests on the human-robot interaction control system
Chapter 7 Conclusions and Future Works
    7.1 Conclusions
    7.2 Future works
References
