Realtime Pose Estimation via Convolutional Neural Network Pruning

Graduate student:	陳弘展 Hung-Chan Chen
Thesis title:	利用卷積神經網路剪枝技術實現實時人體姿態評估系統 Realtime Pose Estimation via Convolutional Neural Network Pruning
Advisors:	姚智原 Chih-Yuan Yao, 余能豪 Neng-Hao Yu
Oral defense committee:	朱宏國 Hung-Kuo Chu, 胡敏君 Min-Chun Hu
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar, 2020-2021)
Language:	Chinese
Number of pages:	70
Keywords (Chinese):	人體姿態評估、網路剪枝、卷積神經網路
Keywords (English):	Human Pose Estimation, Network Pruning, Convolutional Neural Network

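Human pose estimation based on convolutional neural networks is one of the most widely discussed and actively developed areas of computer vision. Advances in hardware performance and the rapid development of convolutional neural networks have brought considerable gains in both prediction accuracy and speed. As demand for mobile devices and embedded systems has grown, so has the need to deploy these networks on such platforms. Most current convolutional neural networks, however, are too complex to run properly on these devices, so compressing them to make that possible has become a popular research direction in recent years.
This thesis proposes a simple but efficient real-time human pose estimation system. The method takes depth images as the data for analysis and training and uses the now widely adopted convolutional neural network architecture to automatically detect the 2D/3D positions of human joints. To speed up execution, we apply network pruning to the convolutional neural network, gaining further speed while keeping accuracy close to that of the unpruned network.
The thesis is divided into four main parts. The first is the design of the 2D human pose estimation architecture. The second is the design of the 3D human pose estimation architecture. The third introduces network pruning and its application. The last discusses the metrics currently in mainstream use for evaluating the performance of skeleton estimation systems: PDJ, mAP, and FPS. In the experiments, we use existing 2D and 3D depth-image skeleton detection datasets and measure accuracies of 75.14% under PDJ and 83.04% under mAP; the system runs at 128 FPS on a GTX 1050Ti and 40 FPS on a Pixel 5. These results show that the proposed human skeleton estimation system retains a reasonable level of reliability while achieving very fast inference.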

Human pose estimation based on convolutional neural networks is one of the most widely discussed and developed areas of computer vision research. Thanks to advances in hardware performance and the rapid development of convolutional neural networks, detection accuracy and speed have both improved considerably. With the growing popularity of mobile devices and embedded systems, the demand for deploying these networks on such devices has gradually increased. However, most current convolutional neural networks are very complex and cannot run on these devices. For this reason, compressing these networks so that they can run on such devices has become a popular research direction in recent years.
In this thesis, we present a simple and efficient approach to a realtime human pose estimation system. We use depth images as the input for analysis and training, and we adopt a popular deep learning architecture, the convolutional neural network (CNN), to detect the 2D/3D joints of the human body automatically. To speed up the system, we apply network pruning to the convolutional neural network, obtaining a further speed increase while maintaining approximately the same detection accuracy.
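As a rough illustration of what filter-level network pruning involves, the sketch below ranks the filters of one convolutional layer by L1 norm, drops the weakest ones, and removes the matching input channels from the following layer. The abstract does not state which pruning criterion the thesis uses, so the L1-norm criterion, the nested-list weight layout, and the names `select_filters_to_keep`, `prune_layer`, and `prune_ratio` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one conv filter is [in_channels][kernel_h][kernel_w].
Filter = List[List[List[float]]]


def l1_norm(filt: Filter) -> float:
    """Sum of the absolute values of all weights in one filter."""
    return sum(abs(w) for channel in filt for row in channel for w in row)


def select_filters_to_keep(filters: Sequence[Filter], prune_ratio: float) -> List[int]:
    """Return the sorted indices of the filters that survive pruning.

    The filters with the smallest L1 norms are dropped. prune_ratio is the
    fraction of filters to remove (assumed parameter, 0 <= prune_ratio < 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_prune = int(len(filters) * prune_ratio)
    # Rank filter indices from smallest to largest L1 norm.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(ranked[n_prune:])


def prune_layer(
    filters: Sequence[Filter],
    next_layer_filters: Sequence[Filter],
    prune_ratio: float,
) -> Tuple[List[Filter], List[Filter], List[int]]:
    """Prune one conv layer and the matching input channels of the next layer."""
    keep = select_filters_to_keep(filters, prune_ratio)
    pruned = [filters[i] for i in keep]
    # Each next-layer filter loses the input channels fed by removed filters.
    pruned_next = [[f[i] for i in keep] for f in next_layer_filters]
    return pruned, pruned_next, keep


if __name__ == "__main__":
    # Four 1x1 filters with one input channel each; half of them get pruned.
    layer = [[[[0.5]]], [[[-0.1]]], [[[2.0]]], [[[0.05]]]]
    nxt = [[[[1.0]], [[2.0]], [[3.0]], [[4.0]]]]
    _, _, kept = prune_layer(layer, nxt, prune_ratio=0.5)
    print("kept filter indices:", kept)
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy; that step is omitted here.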
This thesis is composed of four parts. The first part is the design of 2D human pose estimation. The second part is the design of 3D human pose estimation. The third part is the introduction and application of network pruning. The last part examines the metrics currently used to evaluate pose estimation systems: PDJ, mAP, and FPS.
In the experimental part, we use existing 2D and 3D depth-image human pose estimation datasets. Measured with PDJ and mAP, the accuracy of our human pose estimation system is 75.14% and 83.04%, respectively, and the system runs at 128 FPS on a GTX 1050Ti and 40 FPS on a Pixel 5. These results show that the proposed human pose estimation system retains a reasonable degree of reliability while also achieving very fast inference.
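The PDJ (percentage of detected joints) metric named in the abstract is commonly computed by counting a joint as detected when its predicted position lies within a fraction of the torso diameter of the ground truth. The sketch below follows that common definition; the exact threshold and torso definition used in the thesis are not given in this record, so `fraction=0.2`, the function name `pdj`, and the sample coordinates are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pdj(
    pred: Sequence[Point],
    gt: Sequence[Point],
    torso_diameter: float,
    fraction: float = 0.2,  # assumed threshold, not taken from the thesis
) -> float:
    """Fraction of joints whose prediction lies within fraction * torso_diameter."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    threshold = fraction * torso_diameter
    # A joint counts as detected when its Euclidean error is within the threshold.
    detected = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return detected / len(gt)


if __name__ == "__main__":
    ground_truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    predicted = [(1.0, 0.0), (10.0, 5.0), (0.0, 10.0)]
    print(f"PDJ: {pdj(predicted, ground_truth, torso_diameter=10.0):.2%}")
```

The same function extends to 3D joints by passing three-element tuples, since `math.dist` accepts points of any matching dimension.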

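Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Introduction
1 Research Background and Motivation
2 Contributions
3 Thesis Organization
Related Work
1 Human Pose Estimation from Color Images
2 Human Pose Estimation from Depth Images
3 Network Compression Techniques
Methodology
Research Design
1 Introduction to the Training Datasets
1.1 Depth Image Datasets
1.2 Generating 2D/3D Ground-Truth Skeleton Coordinates for Human Pose
2 2D Human Pose Estimation
2.1 Input Images
2.2 Output Images
2.3 Basic Network Structure
3 3D Human Pose Estimation
3.1 Input & Output Information
3.2 Basic Network Structure
4 Applying Network Pruning
4.1 Network Pruning
4.2 Types of Network Pruning
4.3 Pruning Algorithm
4.4 Adjustments for 2D Human Pose Estimation
4.5 Adjustments for 3D Human Pose Estimation
Experimental Results and Analysis
1 Introduction to the Test Datasets
2 Introduction to the Accuracy Metrics
2.1 Percentage of Detected Joints
2.2 Average Precision
3 Comparison of Results for Adjustments to the Method
3.1 Results of This Work on Each Test Dataset
3.2 Cross-Comparison Tests with Different Training Sets
3.3 Comparison of Results After Pruning
3.4 Analysis of Failure Cases of the Current Model
4 Comparison with Other Work
4.1 Comparison with K2HPD-Related Literature
4.2 Comparison with ITOP-Related Literature
Conclusion and Future Work
References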

