
Student: SHARFIDEN HASSEN
Thesis Title: Mobile Robot Navigation Using Deep Reinforcement Learning
Advisor: Min-Fan Lee
Committee Members: Cheng-Hao Ko, Joni Tzuchen Tang
Degree: Master
Department: College of Engineering - Graduate Institute of Automation and Control
Publication Year: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: English
Pages: 68
Chinese Keywords: autonomous navigation, deep reinforcement learning, mobile robot, convolutional neural network model, deep Q-learning, deep Q-network, double deep Q-network, obstacle avoidance
English Keywords: Autonomous agents
Access: 450 views, 0 downloads


Abstract:
    Learning how to navigate autonomously in an unknown environment without colliding with static and dynamic obstacles is important for mobile robots. Conventional mobile robot navigation systems do not have the ability to learn autonomously. Unlike conventional approaches, this thesis proposes an end-to-end approach that applies deep reinforcement learning to autonomous mobile robot navigation in an unknown environment.
    Two types of deep Q-learning agents, namely deep Q-network (DQN) and double deep Q-network (DDQN) agents, are proposed to enable the mobile robot to autonomously learn collision avoidance and navigation in an unknown environment. The inputs to the agents are the distances from the mobile robot to obstacles, measured with a Hokuyo laser rangefinder, and the current observations of the environment from real-time video captured by an RGB-D camera. The agents' output actions are the linear and angular velocity commands for the mobile robot's motion.
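    The abstract describes the agent interface only at a high level; the following minimal sketch illustrates one plausible reading of it, with laser ranges and an RGB-D frame packed into the observation and a discrete action index mapped to a velocity command. All names, the action set, and the velocity values are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch of the agent interface described above. The state
# packs laser-range readings together with an RGB-D frame, and each
# discrete action indexes a (linear, angular) velocity command. The
# action set and all values here are illustrative assumptions.
import numpy as np

# Candidate (linear m/s, angular rad/s) commands the agent chooses among.
ACTIONS = [
    (0.25, 0.0),    # drive forward
    (0.15, 0.75),   # forward while turning left
    (0.15, -0.75),  # forward while turning right
    (0.0, 1.0),     # rotate left in place
    (0.0, -1.0),    # rotate right in place
]

def build_state(laser_ranges: np.ndarray, rgbd_frame: np.ndarray) -> dict:
    """Pack one timestep of sensor readings into the agent's observation."""
    return {
        "laser": laser_ranges.astype(np.float32),  # 1-D array of distances (m)
        "rgbd": rgbd_frame.astype(np.float32),     # H x W x 4 color + depth image
    }

def action_to_velocity(action_index: int) -> tuple:
    """Map the agent's discrete output to a velocity command for the robot."""
    return ACTIONS[action_index]

# Example: a 5-beam scan at 2 m and a dummy 48 x 64 RGB-D frame.
state = build_state(np.full(5, 2.0), np.zeros((48, 64, 4)))
linear, angular = action_to_velocity(0)  # -> (0.25, 0.0): drive straight
```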
    For autonomous mobile robot navigation in an unknown environment, the target object is first detected using a deep learning model, and navigation to the target object then proceeds using the DQN or DDQN algorithm. The simulation results show that the mobile robot can autonomously navigate, recognize, and reach the target object's location in an unknown environment without colliding with static and dynamic obstacles. Similar results are obtained in real-world experiments, though only with static obstacles. In the test simulation, the DDQN agent outperforms the DQN agent in reaching the target object's location by 5.06%.
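    The algorithmic difference between the two agents compared above lies in how each computes its bootstrap target: DQN lets the target network both select and evaluate the next action, which tends to overestimate action values, while double DQN selects the action with the online network and evaluates it with the target network. A minimal sketch of that distinction, using plain NumPy on tabular Q-arrays rather than the neural-network approximators the thesis trains:

```python
# Minimal sketch of the one algorithmic difference between the two agents:
# how the bootstrap target is computed. Plain NumPy on tabular Q-arrays is
# used for clarity; the thesis trains neural-network approximators instead.
import numpy as np

def dqn_target(q_target: np.ndarray, s_next: int, r: float, gamma: float) -> float:
    # DQN: the target network both selects and evaluates the next action,
    # which tends to overestimate action values.
    return r + gamma * float(np.max(q_target[s_next]))

def ddqn_target(q_online: np.ndarray, q_target: np.ndarray,
                s_next: int, r: float, gamma: float) -> float:
    # Double DQN: the online network selects the action, the target network
    # evaluates it, reducing the overestimation bias.
    a_best = int(np.argmax(q_online[s_next]))
    return r + gamma * float(q_target[s_next, a_best])

# Example with 4 states and 3 actions.
rng = np.random.default_rng(0)
q_online = rng.random((4, 3))
q_target = rng.random((4, 3))
print(dqn_target(q_target, s_next=2, r=1.0, gamma=0.99))
print(ddqn_target(q_online, q_target, s_next=2, r=1.0, gamma=0.99))
```

    Reducing overestimation in this way often stabilizes training, which is one common explanation for a DDQN agent edging out a DQN agent in goal-reaching tasks of this kind.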

Table of Contents:
    Acknowledgements III
    Chinese Abstract IV
    English Abstract V
    Table of Contents VI
    List of Figures VII
    List of Tables IX
    Chapter 1 Introduction 1
    Chapter 2 Background 7
    Chapter 3 Methodology 22
    Chapter 4 Experimental Results and Discussion 36
    Chapter 5 Conclusion and Future Work 53
    References 55


    Full-text release date: 2026/02/07 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan Electronic Theses and Dissertations system)