
Graduate student: Wen-che Yu (于文哲)
Thesis title: Hidden Markov Model-Based Approach for Recognizing Human Behavior and Application in Robotic Home Care
Advisor: Ming-jong Tsai (蔡明忠)
Oral examination committee: Min-fan Ricky Lee (李敏凡), Kuo-liang Chung (鍾國亮)
Degree: Master
Department: College of Engineering - Graduate Institute of Automation and Control
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar; 2013-2014)
Language: English
Number of pages: 81
Keywords: Human Behavior, Human-Robot Interaction (HRI), K-means Cluster Algorithm, Hidden Markov Model (HMM), Video Surveillance
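    In recent years, inferring human behavior from video has become an important research topic in many advanced video surveillance applications. This thesis applies a Hidden Markov Model (HMM) to detect specific human behaviors in a home environment and to support interaction with a robot. A human behavior is described as a sequence of static image frames. A Convexity Structure is used to approximate the pixels in each frame and obtain the largest binarized object. Three key features of the Convexity Structure (the width-height relation, the height ratio, and the hull perimeter ratio) then describe the static posture of the body, which converts the original image sequence into a sequence of three-dimensional vectors. Feature vectors were extracted in advance from 4000 postures that people may adopt in a home environment and grouped into six clusters with the K-means method; this converts the three-dimensional vector sequence into a simple code sequence. In the training stage, videos representing a given behavior are converted to code sequences and used to train the HMM for that behavior. In the testing stage, a video of an unknown behavior is converted to a code sequence and fed to the trained HMMs to infer which behavior the clip belongs to.
    The study analyzes and verifies six behaviors: five normal behaviors in a home environment (walking, fast walking, bending, picking up an object, and jumping) and one abnormal behavior (falling down). The implemented system uses 120 video clips of people 1.8 m and 1.4 m tall as training samples, is then verified on 240 video clips, and outputs the recognition result for each behavior. The system is combined with a humanoid robot to simulate home care: the robot responds to specific human behaviors with corresponding actions. According to the experimental results, the system reaches a recognition rate of 95% and also recognizes the abnormal behavior effectively.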


    In recent years, automatically inferring human behavior from video has become an important topic in many advanced video surveillance applications. This study applies a Hidden Markov Model (HMM)-based methodology to detect specific behaviors of a person in a domestic environment. A behavior is modeled as a series of static image frames. For each frame, a Convexity Structure encloses the shape of the human body after the blob has been segmented. From the Convexity Structure, this study defines three feature parameters, the Width Height ratio, the Height ratio, and the Convex Hull Perimeter ratio, which together form a three-dimensional feature vector. First, more than 4000 blobs are analyzed, taken from the possible static postures captured for each behavior of interest. They are then clustered into 6 feature postures using the K-means method, and a codebook is created that maps each image blob to one of the 6 feature blob types during the training and recognition stages. The time-sequential blobs are thus converted to a feature-vector sequence and then, via the codebook, to a symbol sequence. An HMM for a given behavior is learned from the symbol sequences of that behavior. The learned HMMs are then used to determine which behavior an unknown sequence represents.
    A system is implemented to automatically recognize five normal behaviors in a domestic environment (Walk, Walk Fast, Bend, Pick up an Object, Jump) and one abnormal behavior (Fall Down). The system is trained on 120 labeled video clips that record the behaviors of people 1.8 m and 1.4 m tall. Another 240 video clips are then used to verify the system. The system also includes a humanoid robot to simulate a home care situation: the robot performs a corresponding reaction to a particular human behavior. The experimental results show an overall recognition rate of 95%, and the system can detect the abnormal behavior accurately.
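
    The recognition stage described in the abstract (feature vector, codebook symbol, per-behavior HMM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record does not say how sequences are scored, so the scaled forward algorithm is assumed here as one standard choice; the toy codebook has 2 centroids instead of the 6 in the thesis; and every number, along with the names `quantize`, `log_likelihood`, `classify`, `CODEBOOK`, and `MODELS`, is hypothetical.

```python
import math

def quantize(vectors, codebook):
    """Map each 3-D feature vector to the index of its nearest centroid."""
    symbols = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        symbols.append(dists.index(min(dists)))
    return symbols

def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: returns log P(symbols | HMM)."""
    if not symbols:
        raise ValueError("symbol sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t, obs in enumerate(symbols):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def classify(symbols, models):
    """Pick the behavior whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))

# Hypothetical codebook: (width/height, height ratio, hull perimeter ratio).
CODEBOOK = [
    (0.4, 1.0, 1.0),  # symbol 0: narrow, tall blob (assumed "upright")
    (2.0, 0.3, 0.8),  # symbol 1: wide, short blob (assumed "lying")
]

# Hypothetical 2-state HMMs given as (start, transition, emission).
MODELS = {
    "walk": ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]]),
    "fall": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
}

frames = [
    (0.42, 0.98, 1.0), (0.5, 0.9, 1.0),
    (1.7, 0.4, 0.8), (1.9, 0.3, 0.8), (2.1, 0.3, 0.8),
]
symbols = quantize(frames, CODEBOOK)
print(symbols, classify(symbols, MODELS))  # [0, 0, 1, 1, 1] fall
```

    In a real system the centroids would come from K-means over the training postures and the HMM parameters from training on the symbol sequences of each behavior; here they are fixed by hand only to keep the sketch self-contained.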

    LIST OF CONTENTS
    CHINESE ABSTRACT
    ABSTRACT
    LIST OF CONTENTS
    FIGURE INDEX
    TABLE INDEX
    1. INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Objectives of this Study
        1.4 Organization of This Thesis
    2. LITERATURE REVIEW AND RELATED WORK
        2.1 The Detection in Human Posture in Sports Activity
        2.2 Detection of Behaviors and Interaction with Robot
        2.3 Detection of Specific/Non-Specific Behaviors
        2.4 Basic Concept of K-means Clustering
        2.5 Fundamentals of Hidden Markov Model (HMM)
            2.5.1 Markov Chains and Extension to Hidden Models
            2.5.2 HMM Elements and Symbol Description
            2.5.3 Learning Process of HMM
            2.5.4 Recognition Process of HMM
        2.6 Feature Extraction for Human Behavior
    3. SYSTEM IMPLEMENTATION
        3.1 System Overview
        3.2 Feature Extraction and Clustering
        3.3 Feature Definition and the Convexity-Structure
        3.4 HMM Modeling
    4. EXPERIMENTAL RESULTS
    5. CONCLUSION AND FUTURE WORK
        5.1 Conclusion
        5.2 Future Work
    6. REFERENCES
    APPENDIX

