
Graduate Student: Ting-Chun Liu (劉庭君)
Thesis Title: Real-time Human Pose Estimation from Single Depth Images via Low-Cost Platform (利用深度圖實現即時姿態評估系統在低成本平台)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee Members: Chih-Yuan Yao (姚智原), Hung-Kuo Chu (朱宏國)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107 (2018-19)
Language: Chinese
Number of Pages: 54
Keywords: depth map, pose estimation, low-cost

In recent years, with the rapid advancement of technology, human-computer interaction devices have been widely discussed and developed. Instead of relying on traditional peripherals such as buttons and keyboards as input devices, users can now interact and communicate with machines directly through body movement. This not only makes operation more diverse and flexible, but also lets users engage more easily and interact with machines more intuitively through body motion. However, accurately recognizing human motion through a camera remains complex and difficult with existing techniques: the recognition must be completed in real time while also reaching sufficient accuracy, so this problem has long been an actively discussed and researched topic in computer vision.
This thesis proposes a human skeleton estimation system that can be applied to dance machines or virtual-reality games and implemented on low- to mid-cost hardware. The method uses depth images as the data for analysis and training and, combined with the currently popular convolutional neural network (CNN) architecture, automatically detects 12 joints of the human body. Since this thesis works with depth images, we examined the performance and characteristics of images captured by different depth cameras and combined multiple depth-image datasets into the training set, enabling our model to locate joint positions more flexibly and accurately.
This thesis consists of four main parts. The first part discusses the input image databases. The second part covers the labeling order, positions, and adjustment of the joints. The third part examines the convolutional neural network architecture. The last part discusses how our system is evaluated against current mainstream skeleton estimation metrics, including OKS and PDJ. In the experiments, we used a RealSense camera with our architecture and ran tests in different scenarios, including indoors and in an amusement arcade, displaying the skeleton positions in real time. Evaluated with OKS and PDJ, our skeleton estimation system achieves accuracies of 93.1% and 79.0%, respectively, showing that the proposed human skeleton estimation system is highly reliable.


In recent years, with the rapid development of technology, human-computer interaction devices have become a hot topic that is widely discussed and developed. Instead of employing traditional peripherals such as buttons and keyboards as input, users can interact and communicate with the machine directly through the body. This not only makes the operation more diverse and flexible but also makes it much easier for users to get into the scenario. However, it is complicated and difficult to recognize the human body correctly through a camera: the movement must be judged in real time, and at the same time it is important to achieve high identification ability, so this has been a hot issue of discussion and research in the field of computer vision.
In this thesis, we propose a human pose estimation system that can be applied to dance machines or virtual reality (VR) games on low-cost hardware. In our study, we employed depth images as the training resource, and we used the popular deep learning method, the convolutional neural network (CNN), as our architecture to detect 12 joints of the human body automatically. Since we employed depth images, we explored the performance and characteristics of the images taken by different depth cameras and collected various depth maps as training sets, which makes our model identify the positions of joints more accurately.
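As a rough, hypothetical illustration of how depth maps from different cameras might be brought into a common format before training (the thesis's actual preprocessing is described in Chapter 4; the 4 m working range, the 224x224 target resolution, and the function name below are assumptions):

import numpy as np
import cv2  # opencv-python, assumed available

def normalize_depth(depth_mm: np.ndarray,
                    max_range_mm: float = 4000.0,
                    out_size: tuple = (224, 224)) -> np.ndarray:
    # Clip a raw 16-bit depth map (in millimeters) to a working range,
    # scale it to [0, 1], and resize it to the network input resolution
    # so frames from different depth cameras share one representation.
    depth = np.clip(depth_mm.astype(np.float32), 0.0, max_range_mm) / max_range_mm
    return cv2.resize(depth, out_size, interpolation=cv2.INTER_NEAREST)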
This study is composed of four parts. The first part describes the database of depth images we employed. The second part covers the labeling of the joints and the adjustment method used in our research. The third part discusses the CNN architecture we implemented. The last part explores the evaluation methods in use at the present time, including the OKS and PDJ metrics.
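For reference, the standard definitions of these two metrics in the pose estimation literature are sketched below (the COCO form of OKS and the common torso-normalized form of PDJ); the exact thresholds and per-joint constants used in this thesis appear in Chapter 5 and are not restated here:

% Object Keypoint Similarity (OKS): d_i is the distance between the
% predicted and ground-truth keypoint i, s the object scale, k_i a
% per-joint constant, and v_i the visibility flag.
\mathrm{OKS} = \frac{\sum_i \exp\!\left(-d_i^2 / 2 s^2 k_i^2\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}

% Percentage of Detected Joints (PDJ): a joint counts as detected when
% its prediction lies within a fraction \alpha of the torso diameter.
\mathrm{PDJ}(\alpha) = \frac{1}{N} \sum_{i=1}^{N} \delta\!\left(d_i < \alpha \cdot D_{\mathrm{torso}}\right)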
In the experimental part, we employed depth images captured by a RealSense camera as the input to our system and conducted experiments both indoors and in a video arcade, displaying the positions of the joints in real time. Finally, we employed OKS and PDJ as our validation methods to evaluate our skeleton estimation system, which achieves 93.1% and 79.0%, respectively. With this result, we deployed the trained model for real-time detection, and the results show that the proposed system is reliable.
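A minimal sketch of such a real-time capture loop, assuming the pyrealsense2 Python bindings for the RealSense camera; the predict_joints stub stands in for the trained CNN's forward pass and is purely a placeholder:

import numpy as np
import pyrealsense2 as rs  # Intel RealSense SDK Python bindings

def predict_joints(depth_norm: np.ndarray) -> np.ndarray:
    # Placeholder for the trained CNN; returns 12 (x, y) joint positions.
    return np.zeros((12, 2), dtype=np.float32)

pipeline = rs.pipeline()
config = rs.config()
# Stream 16-bit depth (millimeters) at 640x480 and 30 frames per second.
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()   # block until the next frame set
        depth_frame = frames.get_depth_frame()
        if not depth_frame:
            continue
        depth = np.asanyarray(depth_frame.get_data()).astype(np.float32)
        depth = np.clip(depth, 0.0, 4000.0) / 4000.0  # assumed 4 m range
        joints = predict_joints(depth)        # 12 estimated joint positions
finally:
    pipeline.stop()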

Table of Contents
Chinese Abstract
Abstract
1 Introduction
  1.1 Background and Motivation
  1.2 Contributions
  1.3 Thesis Organization
2 Related Work
  2.1 Human Pose Estimation from Depth Images
  2.2 Human Pose Estimation from Color Images
  2.3 Applications of Convolutional Neural Networks
3 System Architecture
4 Methodology
  4.1 Introduction to Depth Maps
    4.1.1 Depth Maps
    4.1.2 Types of Depth Cameras
    4.1.3 Stereo Vision Techniques
  4.2 Training Databases
    4.2.1 Depth Image Databases
    4.2.2 Labeling
  4.3 Deep Learning Architecture
    4.3.1 Input Images
    4.3.2 Output Images
    4.3.3 CNN Architecture
5 Experimental Results and Discussion
  5.1 Test Databases
  5.2 Accuracy Metrics
    5.2.1 Object Keypoint Similarity (OKS)
    5.2.2 Percentage of Detected Joints (PDJ)
  5.3 Comparison of Method Adjustments
    5.3.1 Results on Our Self-Collected Test Database
    5.3.2 Comparison of Training Results from Databases Captured by Different Cameras
    5.3.3 Comparison of Results at Different Stages
  5.4 Comparison with Other Work
    5.4.1 Results on the K2HPD Dataset
    5.4.2 Comparison with Related Work
6 Conclusion and Future Work
References

