
Graduate Student: Pornsinee Eksupaphan
Thesis Title: End-to-End Deep Reinforcement Learning in Vision-based Autonomous Driving
Advisor: Shun-Feng Su
Committee Members: Sheng-Luen Chung, Chung-Hsien Kuo, Mei-Yung Chen, Yo-Ping Huang
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 65
Keywords: Autonomous driving, Deep reinforcement learning, CARLA simulator, Proximal Policy Optimization, State Representation Learning

In this thesis, methods that use deep reinforcement learning to train self-driving cars are studied. Autonomous driving has drawn great interest from researchers and companies, as such technologies promise to solve several problems and to bring about a safer and more convenient society. This research aims to develop an end-to-end deep reinforcement learning model based on a well-known reinforcement learning algorithm for continuous control, Proximal Policy Optimization (PPO), an on-policy algorithm introduced by OpenAI, so that an agent can drive safely from point A to point B without any infractions in the urban driving simulator CARLA. In this work, we combine the deep reinforcement learning method with two state representation learning methods that serve as feature extractors: (1) PPO with a Variational Autoencoder and (2) PPO with MobileNetV2. We also provide OpenAI Gym-like environments for CARLA that focus on navigating arbitrary paths created by a topological planner; this setup resembles the way navigation tools are used for real-life driving. Moreover, we propose a sub-policy method of PPO that is used in training. The results of training with the sub-policy method show that the agent is able to follow the commands of the topological planner.
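As a concrete illustration of the approach described above, the following is a minimal sketch (not the thesis code) of how a frozen MobileNetV2 backbone can serve as the state-representation module while one small actor head per high-level planner command plays the role of a PPO sub-policy. Names such as CommandConditionedPolicy, NUM_COMMANDS, and ACTION_DIM are illustrative assumptions, and PyTorch/torchvision stand in for whatever framework the thesis actually uses.

```python
# Illustrative sketch only: frozen MobileNetV2 features + per-command policy heads.
import torch
import torch.nn as nn
import torchvision

NUM_COMMANDS = 4   # assumed commands: follow lane, turn left, turn right, go straight
ACTION_DIM = 2     # assumed continuous actions: steering and throttle

class CommandConditionedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen feature extractor: the MobileNetV2 convolutional trunk.
        backbone = torchvision.models.mobilenet_v2()
        self.features = backbone.features
        for p in self.features.parameters():
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One lightweight actor head per planner command (the "sub-policies").
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(1280, 256), nn.ReLU(),
                          nn.Linear(256, ACTION_DIM), nn.Tanh())
            for _ in range(NUM_COMMANDS)
        ])
        # Shared critic for the PPO value estimate.
        self.value = nn.Sequential(nn.Linear(1280, 256), nn.ReLU(),
                                   nn.Linear(256, 1))

    def forward(self, image, command):
        # image: (B, 3, H, W) camera frame; command: (B,) planner command id.
        z = self.pool(self.features(image)).flatten(1)          # (B, 1280) state features
        means = torch.stack([head(z) for head in self.heads], dim=1)
        action_mean = means[torch.arange(image.size(0)), command]
        return action_mean, self.value(z)

# Example: one forward pass with a dummy camera frame and a "turn left" command.
policy = CommandConditionedPolicy().eval()
obs = torch.rand(1, 3, 160, 320)
action_mean, value = policy(obs, torch.tensor([1]))
```

In a Gym-style CARLA environment of the kind described above, the command id would come from the topological planner at each step, and the selected head's output would parameterize the distribution from which PPO samples the continuous steering and throttle actions.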



Table of Contents
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Research Contributions
  1.4 Thesis Overview
Chapter 2 Related Work
  2.1 Modular Approaches
  2.2 Imitation Learning
  2.3 Reinforcement Learning
    2.3.1 Markov Decision Processes
    2.3.2 RL Algorithms
  2.4 Deep Reinforcement Learning
    2.4.1 Policy Gradient Methods
    2.4.2 Trust Region Methods
    2.4.3 Clipped Surrogate Objective
  2.5 Reinforcement Learning for Autonomous Vehicles
Chapter 3 Methodology
  3.1 CARLA
  3.2 State Representation
    3.2.1 Variational Autoencoder
    3.2.2 MobileNetV2
  3.3 Implementation Details
    3.3.1 Reinforcement Architecture
    3.3.2 System Overview
Chapter 4 Experiments
  4.1 Experiment Design
    4.1.1 CARLA Environment
    4.1.2 Reward Function
  4.2 Implementation Details
    4.2.1 Hardware
    4.2.2 Software
    4.2.3 Training Details and Hyperparameter Settings
  4.3 Experiment Results
Chapter 5 Conclusions and Future Work
  5.1 Conclusion
  5.2 Future Work
References


Full Text Release Date: 2024/09/28 (campus network)
Full Text Release Date: 2026/09/28 (off-campus network)
Full Text Release Date: 2026/09/28 (National Central Library: Taiwan NDLTD system)