
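Graduate student: 張哲瑋
Zhe-Wei Zhang
Thesis title: 利用雙鏡頭在低成本嵌入式系統上實現即時姿勢評估
Real-time Pose Evaluation using Dual Camera on Low Cost Embedded System
Advisor: 阮聖彰
Shanq-Jang Ruan
Oral defense committee: 阮聖彰
Shanq-Jang Ruan
Wen-Kai Tai
Hung-Kuo Chu
Chih-Yuan Yao
Yu-Chi Lai
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Number of pages: 67
Chinese keywords: 姿勢評估 (pose evaluation), 人類姿勢評估 (human pose evaluation), 動作偵測 (motion detection), 動作評估 (motion evaluation), 連續姿勢評估 (sequential pose evaluation)
English keywords: Pose Evaluation, Pose Estimation, Motion Detection, Motion Estimation, Sequential Pose Evaluation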


In recent years, human-computer interaction has become a trend. Instead of relying on input devices such as buttons, joysticks, and keyboards, users can communicate with a computer directly through cameras by moving their body or striking a pose. However, human pose evaluation remains a major challenge in computer vision and usually carries a high computational cost. In this thesis, we propose a dual-camera, real-time human pose evaluation method whose computational demand is lower than that of recent methods, so it can be implemented on low-cost hardware. First, a disparity map of moderate quality is generated at low computational cost by combining a block-matching algorithm with an optical-flow-based restoration process. Then, the object is segmented by analyzing the histogram of the disparity map and performing peak analysis. Next, we compute a bounding box from the object segmentation in each frame and apply an inertia model to improve the continuity of the bounding-box sequence. Finally, human pose evaluation is achieved by processing the bounding-box data with three proposed processes: the Average Process, the Group Comparing Process, and the Multi-frame Comparing Process. Most of the image processing algorithms in this thesis are implemented on the GPU to reduce the CPU load. In this study, we used a Kinect sensor as the ground truth for performance comparison. The experimental results show that the proposed method evaluates human poses efficiently: the average score difference between the Correctly-dancing condition and the Incorrectly-dancing condition is 36.1%. In addition, when comparing the dual camera with the Kinect sensor, the deviation is 2.26% under the Correctly-dancing condition and 3.57% under the Incorrectly-dancing condition. These results indicate that, with the proposed pose evaluation method, a dual camera can achieve performance comparable to that of a Kinect.
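
The middle stages of the pipeline described in the abstract (histogram peak analysis on a disparity map, a per-frame bounding box, and a continuity model over the box sequence) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the `band` and `alpha` parameters, the convention that disparity 0 means background, and the use of exponential smoothing as a stand-in for the inertia model are all assumptions made here for illustration.

```python
from collections import Counter


def segment_by_disparity_peak(disparity, band=1):
    """Mask pixels whose disparity lies within `band` of the strongest peak.

    Assumption: disparity 0 marks background or invalid pixels, and the
    most frequent nonzero disparity belongs to the object of interest.
    """
    hist = Counter(d for row in disparity for d in row if d > 0)
    if not hist:
        return [[False] * len(row) for row in disparity]
    # Most frequent nonzero disparity; ties go to the nearer (larger) value.
    peak = max(hist, key=lambda d: (hist[d], d))
    return [[d > 0 and abs(d - peak) <= band for d in row] for row in disparity]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels, or None."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def smooth_boxes(boxes, alpha=0.5):
    """Exponentially smooth a box sequence (a stand-in for an inertia model).

    A frame with no detection (None) reuses the previous smoothed box.
    """
    smoothed, prev = [], None
    for box in boxes:
        if box is not None:
            if prev is None:
                prev = tuple(float(v) for v in box)
            else:
                prev = tuple(p + alpha * (v - p) for p, v in zip(prev, box))
        smoothed.append(prev)
    return smoothed
```

For a single frame, `bounding_box(segment_by_disparity_peak(disparity))` yields the box of the dominant disparity layer, and `smooth_boxes` is then applied to the per-frame boxes. A smaller `alpha` gives the box more "inertia", at the cost of lagging behind fast motion.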

Chinese Abstract
Abstract
Acknowledge
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Background and Motivation
  1.2 Goal
  1.3 Organization
2 Literature Review
  2.1 Review of Basic Concepts
    2.1.1 Motion Capture
    2.1.2 Motion Detector
    2.1.3 Human Detection and Motion Analysis
  2.2 Review of Pose Estimation Methods
    2.2.1 Learning-based Method
    2.2.2 Rule-based Method
    2.2.3 Marker-based Method
    2.2.4 Sensor-based Method
    2.2.5 Depth-based Method
  2.3 Review of Dance Evaluation Methods
3 Proposed Method
  3.1 Depth Map
    3.1.1 Basic Conception
    3.1.2 Dual-camera
    3.1.3 Stereo Matching
  3.2 Object Segmentation
    3.2.1 Analysis of Disparity Map
    3.2.2 Detection and Segmentation
    3.2.3 Refine the Segmentation
    3.2.4 Movement Threshold
  3.3 GPU Acceleration
  3.4 Motion Detection
    3.4.1 Bounding Box of Foreground
    3.4.2 Generate A Bounding Box from A Foreground Image
  3.5 Dance Evaluating Algorithm
    3.5.1 Add Inertia Property
    3.5.2 Normalize A Bounding Box
    3.5.3 Calculate the Similarity between Two Sequence of Bounding Boxes
    3.5.4 Evaluate A Dancing Score
4 Experimental Results
  4.1 Testing Platform
  4.2 The Accuracy of The Dance Evaluating Algorithm
  4.3 Comparison between Kinect and Dual-camera
5 Conclusions
Reference
