
Author: Richard Sugiarto (許嘉豐)
Thesis Title: Motion Learning using 3D Reconstruction Pose from Multiple Cameras with Dynamic Time Warping (使用具有動態時間扭曲的多個攝像機的3D重建姿勢進行運動學習)
Advisor: Chuan-Kai Yang (楊傳凱)
Committee: Bor-Shen Lin (林伯慎), Yuan-Cheng Lai
Degree: Master's
Department: Department of Information Management, School of Management
Thesis Publication Year: 2022
Graduation Academic Year: 110
Language: English
Pages: 53
Keywords: Human Pose, OpenPose, Intrinsic Camera Calibration, Extrinsic Camera Calibration, Dynamic Time Warping
Reference times: Clicks: 202, Downloads: 0

Motion learning is a common task nowadays. It can be done by comparing all the joints of one skeleton with those of another. A pose estimator is used to obtain each joint's location; among the many available pose estimators, such as OpenPose and DensePose, this thesis uses OpenPose as the 2D pose estimator. In general, a 2D pose estimator alone cannot capture all the pose information, because it observes the subject from only one viewing direction. To overcome this problem, a multiple-camera approach is used to gather information from multiple views. Because multiple cameras are used, each camera must be calibrated; from the calibration, 3D coordinates can be obtained. Finally, since the system produces 3D pose information, scoring can be performed. Dynamic Time Warping (DTW) is used for the scoring calculation. DTW is an algorithm that measures the similarity between two temporal sequences, and it can handle the case where the two sequences have different speeds or different numbers of frames.
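The DTW scoring described above can be sketched as follows. This is a generic textbook DTW implementation over scalar sequences, not the author's code; the thesis applies the same idea to sequences of 3D joint positions, where the per-frame cost would be a distance between poses rather than an absolute difference.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two 1-D sequences.

    The DP cell d[i][j] holds the minimal accumulated cost of aligning
    the first i frames of seq_a with the first j frames of seq_b, so
    sequences with different lengths (i.e. different speeds or frame
    counts) can still be compared.
    """
    n, m = len(seq_a), len(seq_b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # skip a frame of seq_a
                                 d[i][j - 1],      # skip a frame of seq_b
                                 d[i - 1][j - 1])  # match both frames
    return d[n][m]

# The same motion played at two different speeds aligns with zero cost:
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))  # → 0.0
```

A lower accumulated cost means the two motions are more similar, which is what makes DTW usable as a score even when the learner performs the motion faster or slower than the reference.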

Table of Contents:
- Abstract
- Acknowledgment
- Table of Contents
- List of Figures
- List of Tables
- Chapter 1. Introduction
  - 1.1 Background
  - 1.2 Contribution
  - 1.3 Research Outline
- Chapter 2. Related Works
  - 2.1 Human Pose Estimation
  - 2.2 Camera and Calibration
  - 2.3 Pose Comparison
  - 2.4 Kalman Filter
- Chapter 3. Proposed System
  - 3.1 System Architecture
  - 3.2 2D Pose Estimation
  - 3.3 Camera Projection Matrix and Calibration
  - 3.4 Noise Reduction
  - 3.5 Scoring Phase
- Chapter 4. Experimental Results
  - 4.1 Experiment Parameters
  - 4.2 Experimental Results
- Chapter 5. Conclusion and Discussion
  - 5.1 Conclusion
- References
