
Graduate Student: 宋俊賢 (Song Jyum Xian)
Thesis Title: 基於人類姿態偵測與動作辨識之觀察學習系統開發及雙臂組裝作業應用
(Development of Observational Learning System Based on Human Pose Detection and Motion Classification for Dual-arm Robotic Assembly Operations)
Advisor: 郭重顯 (Chung-Hsien Kuo)
Committee Members: 蘇順豐 (Shun-Feng Su), 鍾聖倫 (Sheng-Luen Chung), 吳世琳 (Shih-Lin Wu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2020
Graduation Academic Year: 108 (2019–2020)
Language: Chinese
Pages: 120
Chinese Keywords: 動作辨識 (motion recognition), 雙臂機器人組裝 (dual-arm robot assembly), 機器人自我學習 (robot self-learning)
English Keywords: motion recognition, dual-arm robot assembly, robot learning
Since the arrival of the Industry 4.0 era, factories have gradually shifted toward customized production. The traditional robot teaching process is tedious and time-consuming and can no longer keep up with today's low-volume, high-variety production. To improve this situation, this thesis designs a dual-arm assembly teaching system based on human skeletal pose detection and action recognition, organized around two main modules: an "observational learning system for pose detection and motion recognition" and "object recognition and dual-arm assembly application." The former arranges and records the observed assembly actions as tasks, while the latter uses a 12-axis dual-arm robot to reproduce the assembly process.
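The assembly task chain is the interface between these two modules: the observation side appends one recorded step per recognized action, and the execution side replays the chain on the robot. The thesis does not publish its data structures, so the following Python snippet is only a minimal sketch of what such a record could look like; all class, field, and action names are hypothetical, and only the four action classes named in the abstract are listed.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of an assembly "task chain" record: the observation
# module appends one entry per recognized step, and the execution module
# replays the chain on the dual-arm robot. Names are illustrative only.

# Four of the seven assembly action classes are named in the abstract;
# the remaining classes are omitted here rather than guessed.
ACTIONS = {"move", "assemble", "rotate", "grasp"}

@dataclass
class TaskStep:
    action: str          # one of the recognized assembly action classes
    obj: str             # color/type of the part identified by the vision module
    start_frame: int     # segment boundaries from the adaptive motion splitter
    end_frame: int

@dataclass
class TaskChain:
    steps: List[TaskStep] = field(default_factory=list)

    def record(self, action: str, obj: str, start: int, end: int) -> None:
        assert action in ACTIONS, f"unknown action class: {action}"
        self.steps.append(TaskStep(action, obj, start, end))

# Example: a demonstration that grasps a red screw, moves it, and screws it in.
chain = TaskChain()
chain.record("grasp", "red_screw", 0, 45)
chain.record("move", "red_screw", 46, 90)
chain.record("rotate", "red_screw", 91, 160)
```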
For action recognition, this thesis uses the OpenPose human pose estimation technique to extract the upper-body joints (chest, shoulder, elbow, and wrist keypoints) and combines them with hand vectors to form the motion features. An adaptive motion segmentation procedure then divides the demonstrated motion into individual steps, and a modified two-stream adaptive graph convolutional network with long short-term memory (2s-agcn-lstm) classifies each segmented motion into seven common assembly action classes (including moving, assembling, rotating, and grasping objects, among others). In parallel, image color clustering identifies the type of each assembly object. Finally, the action classes and objects are arranged into tasks and recorded, producing an assembly task chain. To carry the task chain out on the shop floor, this thesis uses a dual-arm robot designed in our laboratory as the reproduction platform; with eye-to-hand calibration, the robot can accurately grasp the designated assembly parts and reproduce a variety of highly dexterous assembly motions, effectively relieving the tedium of conventional assembly teaching.
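As a concrete illustration of the per-frame feature just described, the sketch below selects the upper-body joints from an OpenPose skeleton and appends a hand direction vector. It assumes keypoints already extracted in OpenPose's BODY_25 body format and 21-point hand format; the index choices (neck used as the "chest" point, wrist-to-middle-fingertip as the hand vector) and the normalization are illustrative assumptions rather than the thesis's exact feature definition.

```python
import numpy as np

# Minimal sketch of a per-frame motion feature: upper-body joints from
# OpenPose plus a hand direction vector. Assumes BODY_25 body keypoints
# (25 x [x, y, confidence]) and the 21-point hand model; indices and
# normalization are assumptions, not the thesis's exact definitions.

# BODY_25 indices (neck used as an approximation of the "chest" point)
UPPER_BODY = {"chest": 1, "r_shoulder": 2, "r_elbow": 3, "r_wrist": 4,
              "l_shoulder": 5, "l_elbow": 6, "l_wrist": 7}

def frame_feature(body_kps: np.ndarray, hand_kps: np.ndarray) -> np.ndarray:
    """Build one frame's motion feature from OpenPose body and hand keypoints."""
    joints = np.array([body_kps[i, :2] for i in UPPER_BODY.values()])
    # Normalize joint positions relative to the chest so the feature is
    # invariant to where the demonstrator stands in the image.
    chest = body_kps[UPPER_BODY["chest"], :2]
    joints = joints - chest

    # Hand direction: unit vector from the hand-model wrist (index 0)
    # toward the middle fingertip (index 12).
    v = hand_kps[12, :2] - hand_kps[0, :2]
    hand_vec = v / (np.linalg.norm(v) + 1e-8)

    return np.concatenate([joints.ravel(), hand_vec])

# Usage: stack per-frame features into a (T, D) sequence for the adaptive
# segmentation step and the action-recognition network.
body = np.random.rand(25, 3)   # placeholder for a BODY_25 skeleton
hand = np.random.rand(21, 3)   # placeholder for a 21-point hand skeleton
feat = frame_feature(body, hand)   # shape: (7*2 + 2,) = (16,)
```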
To verify the accuracy and stability of the system, this thesis conducts tests with different users, different assembly environments, different assembly durations, and different visual ranges; the motion recognition accuracy exceeds 96% in every case. Five kinds of self-made, assemblable screws and nuts in different colors serve as the assembly parts, supporting the feasibility and stability of the proposed approach.


With the advent of the Industry 4.0 era, factories have gradually moved toward customized production. The traditional robot teaching process is cumbersome and time-consuming and can no longer cope with today's small-batch, high-variety production. To improve this situation, this study designs a dual-arm assembly teaching system based on human skeletal pose detection and action recognition, divided into two main parts: an "observational learning system for pose detection and motion recognition" and "object recognition and dual-arm assembly application." To validate the proposed system, this study uses a 12-axis dual-arm robot to reproduce the demonstrated assembly process.
First, for the action recognition part, this study uses the OpenPose human pose estimation technique to extract the upper-body joints (including the chest, shoulder, elbow, and wrist keypoints) and combines them with hand vectors as the motion features. Second, an adaptive motion segmentation process divides the demonstration into individual steps, and an improved two-stream adaptive graph convolutional network with long short-term memory (2s-agcn-lstm) classifies each segmented motion feature into seven common assembly actions (including moving, assembling, rotating, and grabbing items, among others), while image color clustering identifies the types of the assembled objects. Finally, the action categories and objects are arranged into tasks and recorded, generating an assembly task chain. To carry out the assembly task chain in the workshop, this study uses the dual-arm robot designed by our laboratory as the reproduction platform, so that the eye-to-hand calibrated robot can accurately grasp designated assembly parts and reproduce various high-dexterity assembly actions, effectively resolving the cumbersome problems of assembly teaching.
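The color-based object identification step can be pictured as clustering the HSV pixels of a detected part region and mapping the dominant hue to one of the five part colors. The sketch below uses OpenCV's k-means for this; the hue ranges and color names are assumptions for illustration and are not the calibrated values or the exact clustering method used in the thesis.

```python
import cv2
import numpy as np

# Rough sketch of color-based part identification: cluster the HSV pixels
# of a part's image patch and map the dominant hue to a known part color.
# Hue ranges and color names below are illustrative assumptions.

HUE_RANGES = {          # OpenCV hue is 0..179
    "red":    [(0, 10), (170, 179)],
    "yellow": [(20, 35)],
    "green":  [(36, 85)],
    "blue":   [(86, 125)],
    "purple": [(126, 155)],
}

def dominant_color(patch_bgr: np.ndarray, k: int = 3) -> str:
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    pixels = hsv.reshape(-1, 3).astype(np.float32)

    # k-means in HSV space; the largest cluster is taken as the part color.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    biggest = np.bincount(labels.ravel()).argmax()
    hue = centers[biggest][0]

    for name, ranges in HUE_RANGES.items():
        if any(lo <= hue <= hi for lo, hi in ranges):
            return name
    return "unknown"

# Usage: crop the detected part region from the workspace camera image and
# look up which of the colored screw/nut parts it corresponds to, e.g.
#   patch = frame[y0:y1, x0:x1]
#   part_color = dominant_color(patch)
```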
To verify the accuracy and stability of the system, this study conducted tests with different users, different assembly environments, different assembly durations, and different visual ranges. The motion recognition accuracy exceeds 96% in all cases, and five self-made screws and nuts of different colors are used as assembly parts, demonstrating the feasibility and stability of the proposed approach.

Table of Contents
Advisor Recommendation Letter  i
Oral Defense Committee Approval  ii
Acknowledgments  iii
Abstract (Chinese)  iv
Abstract (English)  v
Table of Contents  vi
List of Tables  ix
List of Figures  xi
Chapter 1  Introduction  1
  1.1  Research Background and Motivation  1
  1.2  Research Objectives  3
  1.3  Literature Review  4
    1.3.1  Human Pose Recognition and Human-Robot Interaction  4
    1.3.2  Human Motion Recognition and Classification  5
    1.3.3  Robot Teaching and Assembly Operations  7
    1.3.4  Robotic Object Recognition and Object Pose Detection  9
  1.4  Thesis Organization  11
Chapter 2  System Architecture and Experimental Platform  12
  2.1  System Organization and Operation Flow  12
  2.2  Dual-arm Robot and Hardware Platform Design  13
  2.3  Experimental Environment Design and Constraints  19
Chapter 3  Observational Learning System for Pose Detection and Motion Recognition  20
  3.1  Human Pose Detection  20
    3.1.1  OpenPose Pose Extraction  20
    3.1.2  Pose Prior Method Based on LMA Criteria  22
  3.2  Hand Pose Detection  33
    3.2.1  Finger Pose Estimation  33
    3.2.2  Hand Depth Compensation Algorithm  35
    3.2.3  Hand Grasping Coordinates  39
  3.3  Action Recognition and Classification  43
    3.3.1  Motion Dataset Design  43
    3.3.2  2s-agcn Motion Recognition  45
    3.3.3  2s-agcn-lstm Motion Recognition  51
  3.4  Task Chain Generation in the Observational Learning System  54
    3.4.1  Motion Recognition and Stop Detection  54
    3.4.2  Taught Object Recognition  56
Chapter 4  Dual-arm Assembly Application  58
  4.1  Dual-arm Motion Control  58
    4.1.1  Dual-arm Coordinate System Definition  58
    4.1.2  Dual-arm Forward Kinematics  61
    4.1.3  Dual-arm Inverse Kinematics  62
    4.1.4  Motion Planning and Arm Current-Feedback Protection  63
  4.2  Machine Vision Object Recognition System  65
    4.2.1  Screw Pose Recognition Procedure  67
    4.2.2  Nut Localization Procedure  69
    4.2.3  Assembly Box Localization Procedure  69
  4.3  Robot Motion Procedures  71
    4.3.1  Robot Movement Procedure  72
    4.3.2  Bimanual Lifting and Lowering Procedure  73
    4.3.3  Concentric-Shaft Assembly and Screw-Fastening Procedures  74
    4.3.4  User Interface Design  76
Chapter 5  Experimental Results and Discussion  80
  5.1  Pose Localization Accuracy Experiments  80
  5.2  Motion Recognition Accuracy Experiments  86
  5.3  Motion Reproduction Experiments  88
Chapter 6  Conclusions and Future Work  92
  6.1  Conclusions  92
  6.2  Future Research Directions  92
References  93


Full-text availability: 2025/08/26 (campus network); not authorized for public release (off-campus network); not authorized for public release (National Central Library: Taiwan Dissertations and Theses System).