
Author: Zhe-Wei Zhang (張哲瑋)
Title: Real-time Pose Evaluation using Dual Camera on Low Cost Embedded System (利用雙鏡頭在低成本嵌入式系統上實現即時姿勢評估)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee: Shanq-Jang Ruan (阮聖彰), Wen-Kai Tai (戴文凱), Hung-Kuo Chu (朱宏國), Chih-Yuan Yao (姚智原), Yu-Chi Lai (賴祐吉)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Pages: 67
Keywords (Chinese): 姿勢評估, 人類姿勢評估, 動作偵測, 動作評估, 連續姿勢評估
Keywords (English): Pose Evaluation, Pose Estimation, Motion Detection, Motion Estimation, Sequential Pose Evaluation
  • In recent years, human-computer interaction has flourished. Input devices such as buttons and joysticks are being replaced by direct communication between body movement and machine, which not only makes interaction more intuitive but also raises the experience to a new level. However, evaluating human body movement with cameras has always been a difficult challenge in computer vision, and it usually requires computationally powerful hardware. This thesis proposes a human pose evaluation algorithm for dance games that runs in real time and can be implemented on low-cost hardware. The method uses a low-complexity stereo vision algorithm, enhanced with optical flow, to generate a depth map of adequate precision from a dual camera, and then segments the target's bounding box from the depth map through histogram analysis. An inertia model is applied to improve the temporal continuity of the bounding boxes, which are then normalized. Finally, human pose evaluation is achieved through three bounding-box processes (an average process, a group comparing process, and a multi-frame comparing process) that make heavy use of the temporal and spatial information of consecutive bounding boxes. Most of the image processing is implemented on the GPU to reduce the CPU load, and multithreading is used to improve system stability. In our experiments, a Kinect sensor combined with the proposed pose evaluation algorithm served as the reference data, against which the dual camera was compared. The results show that the proposed method evaluates human poses effectively: the average score gap between correct and incorrect dancing is 36.1%. The dual camera also matches the performance of the Kinect sensor, with an error rate of 2.26% for correct dancing and 3.57% for incorrect dancing relative to the Kinect. Therefore, a dual camera paired with the proposed algorithm can replace the expensive Kinect sensor for human pose evaluation.
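The histogram-based object segmentation described in the abstract above (cutting the target's bounding box out of the depth map by histogram analysis) can be sketched roughly as follows. This is a NumPy illustration only; the function name, bin count, and the rule of taking the single dominant peak are assumptions, not the thesis's exact procedure:

```python
import numpy as np

def segment_foreground(disparity, bins=64):
    """Find the dominant disparity peak (taken here as the target)
    and return a binary mask plus its bounding box.
    All thresholds are illustrative, not the thesis's values."""
    valid = disparity[disparity > 0]          # ignore invalid (zero) pixels
    hist, edges = np.histogram(valid, bins=bins)
    peak = np.argmax(hist)                    # dominant disparity band
    lo, hi = edges[peak], edges[peak + 1]
    mask = (disparity >= lo) & (disparity <= hi)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    box = (xs.min(), ys.min(), xs.max(), ys.max())  # (x0, y0, x1, y1)
    return mask, box
```

A real pipeline would also suppress small peaks and refine the mask, as the thesis's segmentation-refinement step suggests.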


    In recent years, human-computer interaction has become a trend. Instead of relying on input devices such as buttons, joysticks, and keyboards, users can communicate with a computer directly through body movement or poses captured by cameras. However, human pose evaluation is a major challenge in computer vision, and it is usually achieved only with high computational demand. In this thesis, we propose a dual-camera-based real-time human pose evaluation method whose computational demand is lower than that of recent methods, so it can be implemented on low-cost hardware. First, a disparity map of moderate quality is generated with low computation by combining a block-matching algorithm with an optical-flow-based restoration process. Then, an object segmentation is obtained by analyzing the histogram of the disparity map and performing peak analysis. Next, we compute the bounding box of the segmented object in each frame and apply an inertia model to enhance the continuity of the bounding-box sequence. Finally, human pose evaluation is achieved by processing the bounding-box data with three proposed processes: the Average Process, the Group Comparing Process, and the Multi-frame Comparing Process. Most of the image processing algorithms in this thesis are implemented on the GPU to reduce the CPU load. In this study, we used a Kinect sensor as the ground truth for performance comparison. The experimental results show that the proposed method evaluates human poses efficiently: the average score difference between the correctly-dancing and incorrectly-dancing conditions is 36.1%. In addition, in the comparison between the dual camera and the Kinect sensor, the deviation is 2.26% under the correctly-dancing condition and 3.57% under the incorrectly-dancing condition. This reveals that, when applying the proposed pose evaluation method, a dual camera can achieve the same performance as a Kinect.
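The first stage above, generating a disparity map by block matching, can be sketched as a naive CPU illustration in NumPy. The window size, disparity range, and SAD cost below are illustrative choices; the thesis additionally applies an optical-flow-based restoration step and a GPU implementation, both omitted here:

```python
import numpy as np

def block_matching_disparity(left, right, block=7, max_disp=16):
    """Naive SAD block matching along horizontal scanlines of a
    rectified stereo pair. Illustrative only: O(h*w*max_disp) on CPU."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            best_cost, best_d = np.inf, 0
            for d in range(max_disp):
                cand = right[y-half:y+half+1,
                             x-d-half:x-d+half+1].astype(np.int32)
                cost = np.abs(patch - cand).sum()   # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

In practice a library matcher (e.g. OpenCV's StereoBM) would be used instead of these explicit loops; the point here is only the matching principle.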

    Chinese Abstract i
    Abstract ii
    Acknowledgement iii
    Table of Contents v
    List of Tables viii
    List of Figures ix
    1 Introduction 1
    1.1 Background and Motivation 1
    1.2 Goal 3
    1.3 Organization 3
    2 Literature Review 4
    2.1 Review of Basic Concepts 4
    2.1.1 Motion Capture 4
    2.1.2 Motion Detector 5
    2.1.3 Human Detection and Motion Analysis 6
    2.2 Review of Pose Estimation Methods 8
    2.2.1 Learning-based Method 9
    2.2.2 Rule-based Method 10
    2.2.3 Marker-based Method 10
    2.2.4 Sensor-based Method 10
    2.2.5 Depth-based Method 11
    2.3 Review of Dance Evaluation Methods 12
    3 Proposed Method 14
    3.1 Depth Map 14
    3.1.1 Basic Conception 14
    3.1.2 Dual-camera 15
    3.1.3 Stereo Matching 16
    3.2 Object Segmentation 22
    3.2.1 Analysis of Disparity Map 24
    3.2.2 Detection and Segmentation 26
    3.2.3 Refine the Segmentation 27
    3.2.4 Movement Threshold 29
    3.3 GPU Acceleration 30
    3.4 Motion Detection 31
    3.4.1 Bounding Box of Foreground 31
    3.4.2 Generate A Bounding Box from A Foreground Image 33
    3.5 Dance Evaluating Algorithm 38
    3.5.1 Add Inertia Property 38
    3.5.2 Normalize A Bounding Box 41
    3.5.3 Calculate the Similarity between Two Sequences of Bounding Boxes 43
    3.5.4 Evaluate A Dancing Score 49
    4 Experimental Results 53
    4.1 Testing Platform 53
    4.2 The Accuracy of The Dance Evaluating Algorithm 54
    4.3 Comparison between Kinect and Dual-camera 58
    5 Conclusions 64
    Reference 65
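The inertia model and bounding-box normalization listed under Sections 3.5.1 and 3.5.2 above can be illustrated with a simple exponential-smoothing stand-in. The smoothing form and the weight `alpha` are assumptions for illustration, not the thesis's exact inertia model:

```python
import numpy as np

def smooth_boxes(boxes, alpha=0.6):
    """Exponential smoothing as a stand-in for an inertia model:
    each new (x0, y0, x1, y1) box is pulled toward the previous
    smoothed box, suppressing frame-to-frame jitter."""
    smoothed = [np.asarray(boxes[0], dtype=float)]
    for b in boxes[1:]:
        prev = smoothed[-1]
        smoothed.append(alpha * np.asarray(b, dtype=float) + (1 - alpha) * prev)
    return smoothed

def normalize_box(box, frame_w, frame_h):
    """Scale a (x0, y0, x1, y1) box to [0, 1] so that boxes from the
    dual camera and the reference sequence are comparable regardless
    of resolution."""
    x0, y0, x1, y1 = box
    return (x0 / frame_w, y0 / frame_h, x1 / frame_w, y1 / frame_h)
```

The similarity computation between two normalized bounding-box sequences (Section 3.5.3) would then operate on the output of these two helpers.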

