
Graduate Student: Bo-Cheng Jiang (江柏承)
Thesis Title: Study on Applications of Deep Reinforcement Learning Control to a Robot Manipulator with Series Elastic Actuator (應用深度強化學習控制於具有串聯式彈性致動器的機械手臂研究)
Advisor: Yong-Lin Kuo (郭永麟)
Committee Members: Yong-Lin Kuo (郭永麟), Ming-Jong Tsai (蔡明忠), Cheng-Hsiung Yang (楊振雄), Tsung-Liang Wu (吳宗亮)
Degree: Master
Department: Graduate Institute of Automation and Control, College of Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (2020-2021)
Language: Chinese
Number of Pages: 193
Keywords: Deep reinforcement learning, Position control, Torque control, Series elastic actuator, Disturbance observer

The rapid growth of industrial demand and the changes in the consumer market in recent years have increased the importance of high-precision, fast industrial robotic arms in automation. However, as production gradually shifts toward a high-mix, low-volume market model, control methods that rely on conventional control schemes, manual operation, and operator experience are increasingly challenged for complex and difficult-to-control robotic arms.
Therefore, this thesis applies deep reinforcement learning to control a high-dimensional nonlinear manipulator with series elastic actuators. The purpose is to effectively improve the accuracy and robustness of position control and torque control and to reduce the difficulty of the control system, so that users no longer need traditional time-consuming tuning methods that rely on user experience.
This thesis uses the deep deterministic policy gradient (DDPG) and the twin delayed deep deterministic policy gradient (TD3) as the controller algorithms, and their control performance is compared with that of PID control. In addition, a disturbance observer is added to reduce the impact of disturbances. Two types of hardware are used for the series elastic actuators: one connects a motor in series with a torsion spring and then a brake; the other connects a motor in series with a torsion spring at each joint of a three-axis robotic arm. Position control and torque control are performed on both setups. MATLAB is used to train the deep reinforcement learning controllers and to simulate the PID control. In the experiments, the trained agents are converted into a code library and imported into LabVIEW to construct the controllers, which send commands to the DC motors and receive feedback from the encoders, thereby realizing deep reinforcement learning control. Furthermore, the PID control experiments are performed with the parameters obtained in simulation, and follow-up discussions are conducted.
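As a rough illustration of the workflow summarized above (training in MATLAB, then exporting the trained agent toward LabVIEW), the sketch below sets up and trains a DDPG agent with the MATLAB Reinforcement Learning Toolbox. The Simulink model name, block path, signal dimensions, limits, sample time, and training options are illustrative assumptions rather than the thesis's actual configuration; a TD3 agent could be created the same way with rlTD3Agent.

% Sketch: training a DDPG agent for SEA position control with the MATLAB
% Reinforcement Learning Toolbox. Model name, block path, dimensions,
% limits, and training settings below are assumed for illustration only.
obsInfo = rlNumericSpec([3 1]);            % e.g., [position error; link velocity; spring deflection]
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([1 1], 'LowerLimit', -5, 'UpperLimit', 5);  % motor command (assumed range)
actInfo.Name = 'motor command';

% Environment defined by a Simulink model of the SEA plant (hypothetical names).
mdl = 'sea_position_control';
env = rlSimulinkEnv(mdl, [mdl '/RL Agent'], obsInfo, actInfo);

% Default actor and critic networks generated by the toolbox for the given specs.
agent = rlDDPGAgent(obsInfo, actInfo);
agent.AgentOptions.SampleTime = 0.01;      % assumed 10 ms control period

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 500, ...
    'MaxStepsPerEpisode', 1000, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', -10);             % stopping threshold depends on the reward scale

trainingStats = train(agent, env, trainOpts);

% Export the trained policy as a standalone MATLAB function (evaluatePolicy.m);
% this is one plausible route to a code library callable from LabVIEW.
generatePolicyFunction(agent);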
The experimental results reveal differences in the applicability of deep reinforcement learning to position control and torque control. For the high-dimensional nonlinear manipulator with series elastic actuators, the control performance is strongly affected by the design of the reward function. In addition, the experimental trends are consistent with the simulations, which verifies the feasibility and advantages of applying deep reinforcement learning to the control design of series elastic actuator systems. Finally, directions for future research are presented.
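To make the reward-design sensitivity noted above concrete, here is a minimal sketch of one possible reward shaping for SEA position control; the function name positionControlReward, the weights, and the error band are hypothetical and are not taken from the thesis.

% Illustrative reward shaping for SEA position control (assumed weights).
function r = positionControlReward(posError, jointVel, motorCmd)
    % Penalize tracking error most heavily, then velocity and control effort.
    r = -(10*posError^2 + 0.1*jointVel^2 + 0.01*motorCmd^2);
    % Small bonus when the error stays inside a tight band, to encourage
    % settling rather than oscillating around the reference.
    if abs(posError) < 0.01
        r = r + 1;
    end
end

Shifting weight between the tracking-error and control-effort terms trades settling accuracy against actuator effort, which is the kind of design choice the results above indicate the learned controllers are sensitive to.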

Abstract (Chinese) I
Abstract (English) II
Table of Contents III
List of Figures V
List of Tables IX
Chapter 1  Introduction 1
  1.1 Research Background 1
  1.2 Literature Review 2
    1.2.1 Series Elastic Actuators 2
    1.2.2 Disturbance Observers 3
    1.2.3 Deep Reinforcement Learning 4
    1.2.4 Applications of Deep Reinforcement Learning 5
  1.3 Research Motivation 6
  1.4 Research Methods 7
  1.5 Research Contributions 8
  1.6 Thesis Organization 9
Chapter 2  Deep Reinforcement Learning Control 10
  2.1 Application of the MATLAB Reinforcement Learning Toolbox to Control 10
  2.2 Deep Deterministic Policy Gradient 16
  2.3 Twin Delayed Deep Deterministic Policy Gradient 18
  2.4 Deep Reinforcement Learning Combined with PID Control 19
  2.5 Applications of Deep Reinforcement Learning 22
Chapter 3  Experimental Planning 31
  3.1 Series Elastic Actuator-Brake System Model 31
  3.2 Series Elastic Actuator-Manipulator System Model 35
  3.3 Series Elastic Actuator Hardware Architecture 37
  3.4 Control Flow Architecture 47
  3.5 Integration of Deep Reinforcement Learning with the Experimental Hardware 53
Chapter 4  Simulations and Experiments 70
  4.1 Experimental Procedure 70
  4.2 SEA-Brake Position Control 74
    4.2.1 SEA-Brake Position Control 74
    4.2.2 SEA-Brake Position Control with Disturbance Observer 80
    4.2.3 Discussion 86
  4.3 SEA-Brake Torque Control 87
    4.3.1 SEA-Brake Torque Control 87
    4.3.2 SEA-Brake Torque Control with Disturbance Observer 92
    4.3.3 Discussion 98
  4.4 SEA Manipulator Position Control 99
    4.4.1 SEA Manipulator Position Control 102
    4.4.2 SEA Manipulator Position Control with Disturbance Observer 114
    4.4.3 Discussion 130
  4.5 SEA Manipulator Torque Control 130
    4.5.1 SEA Manipulator Torque Control 130
    4.5.2 SEA Manipulator Torque Control with Disturbance Observer 145
    4.5.3 Discussion 161
Chapter 5  Conclusions and Suggestions 162
  5.1 Conclusions 162
  5.2 Future Research Directions 163
References 164
Appendix 170

Full-Text Availability: Not authorized for public release (off-campus network)
Full-Text Availability: Not authorized for public release (National Central Library: Taiwan NDLTD system)