
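Student: 陳弘展 (Hung-Chan Chen)
Thesis title: Realtime Pose Estimation via Convolutional Neural Network Pruning
Chinese title: 利用卷積神經網路剪枝技術實現實時人體姿態評估系統
Advisors: 姚智原 (Chih-Yuan Yao), 余能豪 (Neng-Hao Yu)
Oral examination committee: 朱宏國 (Hung-Kuo Chu), 胡敏君 (Min-Chun Hu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, i.e., 2020–2021)
Language: Chinese
Number of pages: 70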
Keywords: Human Pose Estimation, Network Pruning, Convolutional Neural Network
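  • Human pose estimation based on convolutional neural network (CNN) architectures is one of the most widely discussed and actively developed areas of computer vision. Thanks to advances in hardware performance and the rapid development of CNNs, both prediction accuracy and speed have improved considerably. As demand for mobile devices and embedded systems grows, so does the need to deploy these CNNs on such platforms. However, most current CNNs are too complex to run properly on these devices, so research on compressing CNNs so that they can run there has become a popular direction in recent years.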
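    This thesis proposes a simple yet efficient real-time human pose estimation system. It uses depth images as the data for analysis and training and combines them with currently popular CNN architectures to automatically detect the 2D/3D positions of human joints. To further increase running speed, we apply network pruning to the CNNs we use, obtaining an additional speedup while maintaining approximately the same accuracy.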
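The abstract does not say how the 2D network encodes joint positions; section 4.2.2 of the table of contents ("output images") suggests per-joint heatmaps. Assuming that (an assumption, not confirmed by the abstract), a minimal sketch of turning a stack of heatmaps into 2D joint coordinates by taking each map's peak could look like this. The function `decode_heatmaps` and its `stride` parameter (the ratio between input-image and heatmap resolution) are hypothetical.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray, stride: float = 1.0) -> np.ndarray:
    """Convert a (J, H, W) stack of per-joint heatmaps into (J, 3) rows of (x, y, score).

    Hypothetical helper: each joint is placed at the arg-max of its heatmap, and
    coordinates are scaled by `stride` back to input-image pixels.
    """
    num_joints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    peak = flat.argmax(axis=1)                        # flat index of each joint's maximum
    ys, xs = np.unravel_index(peak, (height, width))  # back to (row, column)
    scores = flat[np.arange(num_joints), peak]        # peak value used as a confidence score
    return np.stack([xs * stride, ys * stride, scores], axis=1)


# Usage: two joints on a 4x5 heatmap grid, heatmaps at 1/4 of the input resolution.
hm = np.zeros((2, 4, 5))
hm[0, 1, 3] = 0.9   # joint 0 peaks at row 1, column 3
hm[1, 2, 0] = 0.7   # joint 1 peaks at row 2, column 0
print(decode_heatmaps(hm, stride=4.0))  # rows (12, 4, 0.9) and (0, 8, 0.7)
```

Arg-max decoding is quantized to the heatmap grid; a differentiable soft-argmax over the normalized heatmap is a common alternative when sub-pixel accuracy or end-to-end coordinate training is wanted.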
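    The thesis is divided into four main parts: (1) the design of the 2D human pose estimation architecture; (2) the design of the 3D human pose estimation architecture; (3) an introduction to network pruning and its application; and (4) a discussion of how to evaluate performance with the mainstream metrics for skeleton estimation systems, namely PDJ, mAP, and FPS. In the experiments, we use existing 2D and 3D depth-image skeleton detection datasets; evaluated with PDJ and mAP, our skeleton estimation system reaches accuracies of 75.14% and 83.04%, respectively, and it runs at 128 FPS on a GTX 1050 Ti and 40 FPS on a Pixel 5. This shows that the proposed system retains a reasonable degree of reliability while also offering very fast inference.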


    Human pose estimation based on convolutional neural networks is one of the most widely discussed and actively developed areas of computer vision research. Benefiting from advances in hardware performance and the rapid development of convolutional neural networks, detection accuracy and speed have improved considerably. With the increasing popularity of mobile devices, embedded systems, and similar platforms, the demand for deploying these networks on such devices has also grown. However, most current convolutional neural networks are too complex to run on these devices. Consequently, research on compressing convolutional neural networks so that they can run on such devices has become a popular direction in recent years.
    In this thesis, we present a simple and efficient real-time human pose estimation system. We use depth images as the data for training and analysis, and we adopt a popular deep learning architecture, the convolutional neural network (CNN), to automatically detect the 2D/3D joints of the human body. To speed up the system, we apply network pruning to the CNNs we use, obtaining a further speedup while maintaining approximately the same detection accuracy.
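The abstract does not specify which pruning criterion the thesis uses. As an illustration of structured (filter-level) pruning, the sketch below ranks a convolution layer's filters by their L1 norm (the criterion of Li et al., "Pruning Filters for Efficient ConvNets", ICLR 2017), keeps the strongest ones, and removes the matching input channels from the next layer so that the shapes stay consistent. It is not necessarily the thesis's algorithm; `prune_filters_l1` and its parameters are hypothetical, and weights are assumed to use the (out_channels, in_channels, kH, kW) layout.

```python
import numpy as np


def prune_filters_l1(conv_w, conv_b, next_w, prune_ratio):
    """Remove the filters with the smallest L1 norm from one conv layer.

    conv_w: (C_out, C_in, kH, kW) weights of the layer being pruned
    conv_b: (C_out,) bias of that layer
    next_w: (C_next, C_out, kH2, kW2) weights of the following conv layer
    prune_ratio: fraction of filters to remove, in [0, 1)
    Returns the pruned tensors and the indices of the kept filters.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    c_out = conv_w.shape[0]
    n_keep = max(1, int(round(c_out * (1.0 - prune_ratio))))
    l1 = np.abs(conv_w).reshape(c_out, -1).sum(axis=1)  # one importance score per filter
    keep = np.sort(np.argsort(-l1)[:n_keep])             # largest norms, in original order
    # Dropping output channel k of this layer removes input channel k of the next one.
    return conv_w[keep], conv_b[keep], next_w[:, keep], keep


# Usage: 4 single-weight filters with L1 norms 0.1, 3.0, 0.5, 2.0; prune half of them.
w = np.array([0.1, -3.0, 0.5, 2.0]).reshape(4, 1, 1, 1)
b = np.zeros(4)
w_next = np.arange(8.0).reshape(2, 4, 1, 1)
w2, b2, w_next2, kept = prune_filters_l1(w, b, w_next, prune_ratio=0.5)
print(kept)           # [1 3]
print(w_next2.shape)  # (2, 2, 1, 1)
```

Because whole filters are removed, the pruned layers stay dense and become cheaper on ordinary hardware, unlike unstructured weight pruning. In practice the pruned network is fine-tuned afterwards to recover accuracy, and any BatchNorm layer between the two convolutions must be sliced with the same `keep` indices.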
    This thesis consists of four parts. The first part is the design of 2D human pose estimation. The second part is the design of 3D human pose estimation. The third part introduces network pruning and its application. The last part discusses how to evaluate the performance of pose estimation systems with the current mainstream metrics: PDJ (Percentage of Detected Joints), mAP (mean Average Precision), and FPS.
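Both accuracy metrics count a joint as correct when its prediction lies within a distance threshold of the ground truth. The sketch below assumes the common definitions, which the abstract does not spell out: for PDJ, the threshold is a fraction `alpha` of each image's torso diameter (left to the caller, since the thesis's value is not stated); for 3D mAP, the ITOP convention of a fixed 10 cm threshold, averaged over joints. The function names are hypothetical, and coordinates are assumed to be in pixels (2D) and metres (3D).

```python
import numpy as np


def _joint_errors(pred, gt):
    """Euclidean error per joint; pred and gt have shape (N, J, D)."""
    return np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1)


def pdj(pred_2d, gt_2d, torso_size, alpha):
    """Percentage of Detected Joints: error <= alpha * torso diameter of that image."""
    thresh = alpha * np.asarray(torso_size, float)[:, None]  # (N, 1), broadcast over joints
    detected = _joint_errors(pred_2d, gt_2d) <= thresh
    return detected.mean(axis=0), float(detected.mean())    # per-joint rate, overall rate


def map_3d(pred_3d, gt_3d, threshold_m=0.10):
    """ITOP-style mAP: a 3D joint is correct if its error is below threshold_m metres."""
    correct = _joint_errors(pred_3d, gt_3d) < threshold_m
    per_joint = correct.mean(axis=0)                         # precision of each joint
    return per_joint, float(per_joint.mean())                # mean over joints


# Usage: 2 images x 2 joints; torso diameter 10 px and alpha 0.2 give a 2 px threshold.
pred2 = np.array([[[1, 0], [5, 0]], [[0, 0], [0, 3]]], float)
per_joint_2d, overall_2d = pdj(pred2, np.zeros_like(pred2), torso_size=[10, 10], alpha=0.2)
print(per_joint_2d, overall_2d)  # [1. 0.] 0.5

pred3 = np.array([[[0.05, 0, 0], [0.2, 0, 0]], [[0, 0.09, 0], [0, 0, 0.11]]])
per_joint_3d, overall_3d = map_3d(pred3, np.zeros_like(pred3))
print(per_joint_3d, overall_3d)  # [1. 0.] 0.5
```

Normalizing by torso size makes PDJ insensitive to how large the person appears in the image, whereas the fixed metric threshold of the 3D score is meaningful because depth data provides real-world units.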
    In the experiments, we use existing 2D and 3D depth-image human pose estimation datasets and evaluate our system with PDJ and mAP, obtaining accuracies of 75.14% and 83.04%, respectively. The system runs at 128 FPS on a GTX 1050 Ti and at 40 FPS on a Pixel 5. These results indicate that the proposed system retains a reasonable degree of reliability while achieving very fast inference.
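The abstract reports throughput in frames per second but does not describe the timing protocol. One common way to measure it, sketched below under that assumption, is to run a few untimed warm-up frames so that one-time costs (memory allocation, kernel compilation, caches) are excluded, then divide the number of frames by the elapsed wall-clock time. `measure_fps` and its parameters are hypothetical. On a GPU, kernel launches are asynchronous, so the framework's device-synchronization call should run before the clock is read.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(infer: Callable[[Any], Any], frames: Sequence[Any], warmup: int = 10) -> float:
    """Average frames per second of `infer` over `frames`, after `warmup` untimed calls."""
    if len(frames) == 0:
        raise ValueError("frames must be non-empty")
    for frame in frames[:warmup]:
        infer(frame)                      # untimed warm-up calls
    start = time.perf_counter()
    for frame in frames:
        infer(frame)                      # timed calls (synchronize the device here for GPU work)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


# Usage with a stand-in "model" that sleeps 5 ms per frame.
fps = measure_fps(lambda frame: time.sleep(0.005), frames=list(range(50)), warmup=5)
print(f"{fps:.1f} FPS")  # at most 200, since each call sleeps at least 5 ms
```

Whether image capture and preprocessing are inside `infer` changes the result noticeably, so a reported FPS figure is most comparable when that boundary is stated.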

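    Contents: Abstract (Chinese), Abstract (English), Acknowledgements, Table of Contents, List of Figures, List of Tables
    1 Introduction
      1.1 Research Background and Motivation
      1.2 Contributions
      1.3 Thesis Organization
    2 Related Work
      2.1 Human Pose Estimation from Color Images
      2.2 Human Pose Estimation from Depth Images
      2.3 Network Compression Techniques
    3 Research Method
    4 Research Design
      4.1 Training Datasets
        4.1.1 Depth Image Datasets
        4.1.2 Generating 2D/3D Ground-Truth Skeleton Coordinates
      4.2 2D Human Pose Estimation
        4.2.1 Input Images
        4.2.2 Output Images
        4.2.3 Network Architecture
      4.3 3D Human Pose Estimation
        4.3.1 Input and Output
        4.3.2 Network Architecture
      4.4 Applying Network Pruning
        4.4.1 Network Pruning
        4.4.2 Types of Network Pruning
        4.4.3 Pruning Algorithm
        4.4.4 Adjustments for 2D Human Pose Estimation
        4.4.5 Adjustments for 3D Human Pose Estimation
    5 Experimental Results and Analysis
      5.1 Test Datasets
      5.2 Accuracy Metrics
        5.2.1 Percentage of Detected Joints
        5.2.2 Mean Average Precision
      5.3 Comparison of Method Variants
        5.3.1 Results on Each Test Dataset
        5.3.2 Cross-Testing with Different Training Sets
        5.3.3 Effect of Pruning
        5.3.4 Error Analysis of the Current Model
      5.4 Comparison with Other Methods
        5.4.1 Comparison with Related Work on K2HPD
        5.4.2 Comparison with Related Work on ITOP
    6 Conclusion and Future Work
    References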

