Graduate student: | 陳弘展 Hung-Chan Chen |
---|---|
Thesis title: | 利用卷積神經網路剪枝技術實現實時人體姿態評估系統 Realtime Pose Estimation via Convolutional Neural Network Pruning |
Advisors: | 姚智原 Chih-Yuan Yao, 余能豪 Neng-Hao Yu |
Oral defense committee: | 朱宏國 Hung-Kuo Chu, 胡敏君 Min-Chun Hu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Publication year: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Pages: | 70 |
Chinese keywords: | 人體姿態評估, 網路剪枝, 卷積神經網路 |
English keywords: | Human Pose Estimation, Network Pruning, Convolutional Neural Network |
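Human pose estimation based on convolutional neural network architectures is one of the most widely discussed and actively developed areas of computer vision. Advances in hardware performance and the rapid development of convolutional neural networks have brought considerable gains in both prediction accuracy and speed. As the use of mobile devices and embedded systems has grown, so has the demand for deploying these networks on such platforms. Most current convolutional neural networks, however, are too complex to run properly on these devices, so compressing them to run there has become a popular research direction in recent years.
This thesis proposes a simple but efficient real-time human pose estimation system. The method uses depth images as the data for analysis and training, paired with the now very popular convolutional neural network architecture, to automatically detect the 2D/3D positions of human joints. To speed up execution, we apply network pruning to the convolutional neural network we use, gaining further speed while maintaining approximately the same accuracy.
The thesis is divided into four main parts. The first is the design of the 2D human pose estimation architecture. The second is the design of the 3D human pose estimation architecture. The third introduces network pruning and its application. The last discusses the ways of measuring the performance of current mainstream skeleton estimation systems, namely PDJ, mAP, and FPS. In the experiments, we use existing 2D and 3D depth-image skeleton detection datasets and evaluate with PDJ and mAP; the accuracy of our skeleton estimation system is 75.14% and 83.04%, respectively, and it runs at 128 FPS on a GTX 1050Ti and 40 FPS on a Pixel 5. These results show that the proposed system offers very fast inference while retaining a reasonable level of reliability.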
Human pose estimation based on convolutional neural networks is one of the most widely discussed and developed areas of computer vision research. Thanks to advances in hardware performance and the rapid development of convolutional neural networks, detection accuracy and speed have both improved considerably. With the increasing popularity of mobile devices and embedded systems, the demand for deploying these networks on such devices has gradually increased. However, most current convolutional neural networks are too complex to run on these devices. For this reason, compressing these networks so that they can run on such devices has become a popular research direction in recent years.
In this thesis, we present a simple and efficient approach to real-time human pose estimation. We use depth images as the data for analysis and training, and we employ a popular deep learning method, the convolutional neural network (CNN), as our architecture to detect the 2D/3D joints of the human body automatically. To speed up the system, we apply network pruning to the convolutional neural network we use, so that it gains further speed while maintaining approximately the same detection accuracy.
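As a minimal sketch of what filter-level pruning can look like, the Python below ranks the filters of one convolution layer by the L1 norm of their weights and keeps the strongest fraction. The L1 criterion, the `keep_ratio` parameter, and the function names are assumptions made for illustration; the abstract does not say which pruning criterion the thesis uses.

```python
from typing import List, Sequence


def l1_norm(filter_weights: Sequence[float]) -> float:
    """Sum of absolute weights of one (flattened) convolution filter."""
    return sum(abs(w) for w in filter_weights)


def select_filters_to_keep(
    filters: Sequence[Sequence[float]], keep_ratio: float
) -> List[int]:
    """Return the sorted indices of the filters with the largest L1 norms.

    `keep_ratio` is a hypothetical knob: the fraction of filters to retain.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from largest to smallest L1 norm.
    ranked = sorted(
        range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True
    )
    # Restore the original channel order among the kept filters.
    return sorted(ranked[:n_keep])


def prune_layer(filters: Sequence[Sequence[float]], keep_ratio: float):
    """Drop the weakest filters; returns (kept_filters, kept_indices)."""
    keep = select_filters_to_keep(filters, keep_ratio)
    return [list(filters[i]) for i in keep], keep


# Toy layer with four flattened filters (made-up weights).
toy_filters = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.4], [0.0, 0.03]]
pruned, kept = prune_layer(toy_filters, keep_ratio=0.5)
print(kept)
```

In a real network, the matching input channels of the following layer would also have to be removed, and the pruned model is usually fine-tuned afterwards to recover accuracy.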
This thesis is composed of four parts. The first part is the design of 2D human pose estimation. The second part is the design of 3D human pose estimation. The third part is the introduction and application of network pruning. The last part explores the metrics currently used to evaluate pose estimation systems: PDJ (Percentage of Detected Joints), mAP (mean Average Precision), and FPS (frames per second).
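To make the accuracy metric concrete, here is a hedged sketch of PDJ as it is commonly defined: a joint counts as detected when the predicted position lies within a fraction of the torso diameter from the ground truth. The 0.2 fraction, the function name, and the sample coordinates are assumptions; the abstract does not state the threshold used in the thesis.

```python
import math
from typing import Sequence


def pdj(
    predicted: Sequence[Sequence[float]],
    ground_truth: Sequence[Sequence[float]],
    torso_diameter: float,
    fraction: float = 0.2,  # assumed threshold, not taken from the thesis
) -> float:
    """Fraction of joints predicted within `fraction * torso_diameter`.

    Works for 2D or 3D joints, as long as each pair has the same dimension.
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty joint lists of equal length")
    threshold = fraction * torso_diameter
    detected = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= threshold
    )
    return detected / len(ground_truth)


# Made-up 2D example: four joints, torso diameter of 10 pixels.
gt_joints = [(0, 0), (10, 0), (0, 10), (10, 10)]
pred_joints = [(1, 0), (10, 3), (0, 10), (20, 10)]
print(pdj(pred_joints, gt_joints, torso_diameter=10.0))
```

With a threshold of 2 pixels, the first and third joints are within range and the other two are not, so half of the joints count as detected.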
In the experiments, we use existing 2D and 3D depth-image human pose estimation datasets and evaluate our system with PDJ and mAP, obtaining accuracies of 75.14% and 83.04%, respectively. The system runs at 128 FPS on a GTX 1050Ti and 40 FPS on a Pixel 5. These results show that the proposed human pose estimation system retains a reasonable degree of reliability while achieving very fast inference.