
Author: Hui-Cheng Chen (陳輝正)
Thesis title: Research on path tracking and rear wheel steering of articulated vehicle using deep reinforcement learning (深度強化學習應用至聯結車路徑追蹤與後輪輔助轉向功能開發)
Advisor: Liang-Kuang Chen (陳亮光)
Committee members: Mau-Pin Hsu (徐茂濱), Gee-Sern Hsu (徐繼聖)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of publication: 2021
Graduation academic year: 109 (2020-2021)
Language: Chinese
Pages: 86
Keywords (Chinese): 聯結車、路徑追蹤、後輪轉向控制、深度強化學習
Keywords (English): Articulated vehicle, path tracking, rear-wheel steering, deep reinforcement learning



The large cargo capacity of articulated vehicles gives them an important role in commercial logistics. However, they still face two problems: driver fatigue during long-haul transport and the often fatal consequences of accidents. To overcome these problems, the development of autonomous driving functions and safety systems is necessary. This thesis therefore investigates path tracking as an autonomous function and rear-wheel (second-axle) steering as a safety function. Past research has typically addressed these issues with model-based controller design, but model-based methods often suffer from two problems: difficulty of modeling and inaccuracy of model assumptions. In contrast, today's popular artificial intelligence techniques are data-based, with the advantage of not being limited by a model. This thesis therefore investigates whether a data-based artificial intelligence method can be applied to path tracking and rear-wheel (second-axle) steering of articulated vehicles. The method used is the deep deterministic policy gradient (DDPG) algorithm, which finds an optimal policy in a given environment through experience.
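The DDPG machinery mentioned above (a deterministic actor, a critic bootstrapped through target networks, and soft target updates) can be sketched on a toy problem. The 1-D state, linear actor/critic, and reward below are illustrative assumptions, not the thesis's TruckSim setup:

```python
import numpy as np

# Toy DDPG sketch: actor a = w_a * s; linear critic Q(s, a) = w_q . [s^2, s*a, a^2].
# The reward -(a + s)^2 makes a = -s optimal, so w_a should approach -1.
rng = np.random.default_rng(0)
GAMMA, TAU, LR = 0.9, 0.01, 0.02

w_a = 0.0                        # actor weight
w_q = np.zeros(3)                # critic weights
w_a_t, w_q_t = w_a, w_q.copy()   # target networks start as copies

def features(s, a):
    return np.array([s * s, s * a, a * a])

def mu(s, w):
    return w * s                 # deterministic policy

def q(s, a, w):
    return w @ features(s, a)    # linear critic

for _ in range(5000):
    s = rng.uniform(-1.0, 1.0)
    a = mu(s, w_a) + 0.2 * rng.normal()   # exploration noise
    r = -(a + s) ** 2                     # toy reward
    s2 = rng.uniform(-1.0, 1.0)           # next state

    # Critic: semi-gradient TD update toward the target-network bootstrap.
    y = r + GAMMA * q(s2, mu(s2, w_a_t), w_q_t)
    w_q += LR * (y - q(s, a, w_q)) * features(s, a)

    # Actor: deterministic policy gradient, (dQ/da) * (dmu/dw_a).
    a_pi = mu(s, w_a)
    dq_da = w_q[1] * s + 2.0 * w_q[2] * a_pi
    w_a += LR * dq_da * s

    # Soft (Polyak) target updates.
    w_a_t = TAU * w_a + (1.0 - TAU) * w_a_t
    w_q_t = TAU * w_q + (1.0 - TAU) * w_q_t

print(round(w_a, 2))  # learned policy gain, close to the optimum -1
```

The same three ingredients (TD target through target networks, chain-rule actor update, soft target tracking) carry over when the linear functions are replaced by neural networks, as in the thesis's Matlab implementation.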
Specifically, the thesis sets the scenario as high-speed obstacle avoidance. Given an avoidance path, the study is divided into two scenarios. In Scenario 1, we design a controller that steers the front wheels to track the given path. In Scenario 2, three drivers with different behaviors steer the front wheels to track the given path, and we design a controller that steers the rear wheels to assist them. Both scenarios are studied in Matlab together with the simulation software TruckSim.
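Scenario 1 is later benchmarked against an LQR controller. As a generic illustration of how a discrete-time LQR gain is obtained by Riccati iteration (the double-integrator A, B and the weights Q, R below are illustrative stand-ins, not the thesis's articulated-vehicle model):

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Iterate the discrete Riccati equation to the gain K for the
    control law u = -K x minimizing sum(x'Qx + u'Ru)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # position/velocity double integrator
B = np.array([[0.0], [dt]])
K = dlqr(A, B, Q=np.eye(2), R=np.array([[1.0]]))

# Closed loop is stable: all eigenvalues of A - B K lie inside the unit circle.
eigs = np.linalg.eigvals(A - B @ K)
print(np.all(np.abs(eigs) < 1.0))
```

The practical limitation the thesis highlights is that A and B must come from a linearized vehicle model, so the gain is only as good as the model assumptions; DDPG sidesteps that by learning from simulation data.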
The simulation results for Scenario 1 show that the controller designed with the data-based DDPG algorithm tracks an urgent avoidance path better than the model-based LQR controller, indicating that the data-based design can compensate for inaccurate assumptions in the LQR model. In Scenario 2, the rear-wheel steering controller designed with DDPG improves the tracking performance and safety of all three drivers. Moreover, a threshold defined on the critic network of DDPG adjusts the degree of rear-wheel intervention; through this threshold we can trade off tracking performance against driver feel.
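The critic-threshold gating described above can be shown as a minimal sketch. The function name, pass-through intervention rule, and toy values are assumptions for illustration, not the thesis implementation:

```python
def rear_wheel_command(q_value: float, ddpg_action: float,
                       threshold: float) -> float:
    """Gate the learned rear-steer action by the critic's appraisal.

    q_value is the critic's estimate Q(s, a_driver) of how well the
    driver's current action is expected to track the path. A higher
    threshold intervenes more often (better tracking, more noticeable
    to the driver); a lower one intervenes rarely.
    """
    if q_value >= threshold:
        return 0.0           # driver is doing fine: no assist
    return ddpg_action       # assist with the learned rear-steer angle

# The critic rates the driver's action poorly (-5.0) against a threshold
# of -2.0, so the learned rear-wheel angle passes through.
cmd = rear_wheel_command(q_value=-5.0, ddpg_action=0.03, threshold=-2.0)
print(cmd)  # → 0.03
```

Sweeping the threshold is then exactly the tracking-versus-driver-feel trade-off the abstract describes.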
The two scenarios together verify the feasibility and advantages of applying the DDPG algorithm to path tracking and rear-wheel steering of articulated vehicles. The framework used here can be extended to more diverse scenarios in the future.

Chapter 1: Introduction
1.1 Preface and Motivation
1.2 Literature Review
1.2.1 Driver Path-Tracking Models
1.2.2 Articulated-Vehicle Safety
1.2.3 Reinforcement Learning and Its Applications to Vehicles
1.3 Research Scope
1.4 Thesis Organization

Chapter 2: Algorithm Overview and Software Operation
2.1 The DDPG Algorithm
2.2 Matlab Operation
2.2.1 DDPG Implementation Framework in Matlab
2.2.2 Neural Network Architecture

Chapter 3: Vehicle Model, Environment Setup, and Reference States
3.1 Path Definition
3.2 Vehicle States and Model
3.2.1 Vehicle Model
3.2.2 Model Validation
3.2.3 Tire Characteristics
3.3 Reference-State Computation
3.3.1 Reference State A
3.3.2 Reference State B

Chapter 4: Scenario 1, Front-Wheel Steering Control
4.1 Driver Model A (Carrot-Point Tracking)
4.2 Driver Model B (Carrot-Point Tracking + LQR Feedback)
4.2.1 LQR Controller Design
4.2.2 Simulation Results
4.3 Driver Model C (Carrot-Point Tracking + DDPG Feedback)
4.3.1 Observed States and Output Actions
4.3.2 Reward Function
4.3.3 Training Process
4.3.4 Training Results
4.4 Scenario 1 Summary

Chapter 5: Scenario 2, Rear-Wheel Assist Control
5.1 Driver Setup
5.2 Rear-Wheel Assist Controller Design
5.2.1 Observed States and Output Actions
5.2.2 Reward Function
5.2.3 Training Process
5.3 Training Results
5.3.1 Reference State A
5.3.2 Reference State B
5.4 Rear-Wheel Assist Switching Mechanism
5.5 Scenario 2 Summary

Chapter 6: Conclusion

References

Appendix A. Matlab Training Parameter Settings
Appendix B. Vehicle Parameters
Appendix C. Linear Vehicle Model

[1] Kouchak, S. M. and Gaffar, A. (2017). Determinism in future cars: why autonomous trucks are easier to design. 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), San Francisco, CA, pp. 1-6. DOI: 10.1109/UIC-ATC.2017.8397598
[2] Amer, N. H., Zamzuri, H., Hudha, K. and Kadir, Z. A. (2017). Modelling and control strategies in path tracking control for autonomous ground vehicles: a review of state of the art and challenges. Journal of Intelligent and Robotic Systems, vol. 86, no. 2, pp. 225-254. DOI: 10.1007/s10846-016-0442-0
[3] Iguchi, M. (1959). Manual Control Systems. Journal of the Japan Society of Mechanical Engineers, vol. 62, no. 481, pp. 215-222. DOI: 10.1299/jsmemag.62.481_215
[4] Ashkenas, I. L. and McRuer, D. T. (1962). A theory of handling qualities derived from pilot/vehicle system considerations. Aerospace Engineering, vol. 2, pp. 83-102.
[5] McRuer, D. T. (1967). A review of quasi-linear pilot models. IEEE Trans. on Human Factors in Electronics, vol. 8, no. 3, pp. 231-249. DOI: 10.1109/THFE.1967.234304
[6] Snider, J. M. (2009). Automatic Steering Methods for Autonomous Automobile Path Tracking. Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-09-08.
[7] Rankin, A. L., Crane III, C. D., Armstrong II, D. G., Nease, A. D. and Brown, H. E. (1996). Autonomous path-planning navigation system for site characterization. Navigation and Control Technologies for Unmanned Systems, vol. 2738, pp. 176-186. DOI: 10.1117/12.241081
[8] Amidi, O. and Thorpe, C. E. (1991). Integrated mobile robot control. Mobile Robots V, vol. 1388, pp. 504-523. DOI: 10.1117/12.25494
[9] Abe, M. (2015). Vehicle Handling Dynamics: Theory and Application. Butterworth-Heinemann.
[10] MacAdam, C. C. (1981). Application of an optimal preview control for simulation of closed-loop automobile driving. IEEE Transactions on Systems, Man, and Cybernetics, vol. 11, no. 6, pp. 393-399. DOI: 10.1109/TSMC.1981.4308705
[11] Martins, F. N., Celeste, W. C., Carelli, R., Sarcinelli-Filho, M. and Bastos-Filho, T. F. (2008). An adaptive dynamic controller for autonomous mobile robot trajectory tracking. Control Engineering Practice, vol. 16, no. 11, pp. 1354-1363. DOI: 10.1016/j.conengprac.2008.03.004
[12] Ping, E. P., Hudha, K. and Jamaluddin, H. (2010). Hardware-in-the-loop simulation of automatic steering control for lane keeping manoeuvre: outer-loop and inner-loop control design. International Journal of Vehicle Safety, vol. 5, no. 1, pp. 35-59. DOI: 10.1504/IJVS.2010.035318
[13] Barbosa, F. M., Marcos, L. B., da Silva, M. M., Terra, M. H. and Junior, V. G. (2019). Robust path-following control for articulated heavy-duty vehicles. Control Engineering Practice, vol. 85, pp. 246-256. DOI: 10.1016/j.conengprac.2019.01.017
[14] Dorion, S. L., Pickard, J. G. and Vespa, S. (1989). Feasibility of anti-jackknifing systems for tractor semitrailers. SAE Transactions, vol. 98, pp. 130-144. DOI: 10.4271/891631
[15] Azad, N. L., Khajepour, A. and McPhee, J. (2005). Analysis of jackknifing in articulated steer vehicles. 2005 IEEE Vehicle Power and Propulsion Conference, Chicago, IL, pp. 86-90. DOI: 10.1109/VPPC.2005.1554559
[16] McCann, R. and Le, A. (2005). Electric motor based steering for jackknife avoidance in large trucks. 2005 IEEE Vehicle Power and Propulsion Conference, Chicago, IL, pp. 103-109. DOI: 10.1109/VPPC.2005.1554540
[17] Jujnovich, B. A. and Cebon, D. (2013). Path-following steering control for articulated vehicles. Journal of Dynamic Systems, Measurement, and Control, vol. 135, no. 3, pp. 1-15. DOI: 10.1115/1.4023396
[18] 安浩宇 (2018). Improving driving and steering stability of articulated vehicles through rear-wheel steering control (in Chinese). Unpublished master's thesis, Department of Mechanical Engineering, National Taiwan University of Science and Technology, Taipei.
[19] Tabatabaei Oreh, S. H., Kazemi, R., Azadi, S. and Zahedi, A. (2012). A new method for directional control of a tractor semi-trailer. Australian Journal of Basic and Applied Sciences, vol. 6, no. 12, pp. 396-409.
[20] Tianjun, Z. and Changfu, Z. (2009). Modelling and active safe control of heavy tractor semi-trailer. 2009 Second International Conference on Intelligent Computation Technology and Automation, Zhangjiajie, China, vol. 2, pp. 112-115. DOI: 10.1109/ICICTA.2009.264
[21] Chen, L. K. and Shieh, Y. A. (2011). Jackknife prevention for articulated vehicles using model reference adaptive control. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, vol. 225, no. 1, pp. 28-42. DOI: 10.1243/09544070JAUTO1513
[22] Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine learning, vol. 8, no. 3-4, pp. 279-292. DOI: 10.1007/BF00992698
[23] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv:1312.5602 [cs.LG].
[24] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs.LG].
[25] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D. and Riedmiller, M. (2014). Deterministic policy gradient algorithms. The 31st International Conference on Machine Learning, Beijing, China, pp. 387-395.
[26] Arvind, C. S. and Senthilnath, J. (2019). Autonomous RL: Autonomous vehicle obstacle avoidance in a dynamic environment using MLP-SARSA reinforcement learning. 2019 IEEE 5th International Conference on Mechatronics System and Robots (ICMSR), Singapore, pp. 120-124. DOI: 10.1109/ICMSR.2019.8835462
[27] Okuyama, T., Gonsalves, T. and Upadhay, J. (2018). Autonomous driving system based on deep Q learning. 2018 International Conference on Intelligent Autonomous Systems (ICoIAS), Singapore, pp. 201-205. DOI: 10.1109/ICoIAS.2018.8494053
[28] Chae, H., Kang, C. M., Kim, B., Kim, J., Chung, C. C. and Choi, J. W. (2017). Autonomous braking system via deep reinforcement learning. 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, pp. 1-6. DOI: 10.1109/ITSC.2017.8317839
[29] Kamran, D., Zhu, J. and Lauer, M. (2019). Learning Path Tracking for Real Car-like Mobile Robots From Simulation. 2019 European Conference on Mobile Robots (ECMR), Prague, Czech Republic, pp. 1-6. DOI: 10.1109/ECMR.2019.8870947
[30] Gu, W. Y., Xu, X. and Yang, J. (2017). Path following with supervised deep reinforcement learning. 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), Nanjing, China, pp. 448-452. DOI: 10.1109/ACPR.2017.30
[31] Martinsen, A. B. and Lekkas, A. M. (2018). Curved path following with deep reinforcement learning: Results from three vessel models. OCEANS 2018 MTS/IEEE Charleston, Charleston, South Carolina, pp. 1-8. DOI: 10.1109/OCEANS.2018.8604829
[32] Bejar, E. and Morán, A. (2018). Backing up control of a self-driving truck-trailer vehicle with deep reinforcement learning and fuzzy logic. 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Louisville, KY, pp. 202-207. DOI: 10.1109/ISSPIT.2018.8642777
[33] Bejar, E. and Moran, A. (2019). A Preview Neuro-Fuzzy Controller Based on Deep Reinforcement Learning for Backing Up a Truck-Trailer Vehicle. 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, pp. 1-4. DOI: 10.1109/CCECE.2019.8861534
[34] Handaoui, M., Dartois, J. E., Boukhobza, J., Barais, O. and d'Orazio, L. (2020). ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources. arXiv:2009.11208 [cs.PF].
[35] Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
[36] Csáji, B. C. (2001). Approximation with artificial neural networks. Unpublished Master’s dissertation, Faculty of Sciences, Eötvös Loránd University, Hungary.
[37] Liang, S. and Srikant, R. (2016). Why deep neural networks for function approximation?. arXiv:1610.04161 [cs.LG].
[38] Xu, B., Wang, N., Chen, T. and Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853 [cs.LG].
[39] LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature, vol. 521, no. 7553, pp. 436-444. DOI: 10.1038/nature14539
[40] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980 [cs.LG].
[41] Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory of the Brownian motion. Physical Review, vol. 36, no. 5, p. 823. DOI: 10.1103/PhysRev.36.823
[42] Pacejka, H. B. and Sharp, R. S. (1991). Shear force development by pneumatic tyres in steady state conditions: a review of modelling aspects. Vehicle System Dynamics, vol. 20, no. 3-4, pp. 121-175. DOI: 10.1080/00423119108968983
[43] Rajamani, R. (2011). Vehicle Dynamics and Control. Springer Science & Business Media.
[44] Pauwelussen, J. (2012). Dependencies of driver steering control parameters. Vehicle System Dynamics, vol. 50, no. 6, pp. 939-959. DOI: 10.1080/00423114.2011.651476
[45] Delice, I. I. and Ertugrul, S. (2007). Intelligent modeling of human driver: A survey. 2007 IEEE intelligent Vehicles Symposium, Istanbul, Turkey, pp. 648-651. DOI: 10.1109/IVS.2007.4290189
[46] Wang, Q., Oya, M., Okumura, K. and Kobayashi, T. (2007). Adaptive steering controller to improve handling stability of combined vehicles. Second International Conference on Innovative Computing, Information and Control (ICICIC 2007), Kumamoto, Japan, p. 428. DOI: 10.1109/ICICIC.2007.116
[47] Suresh, B. V., Karthik, G., Panda, P. K. and Rao, P. S. (2017). Fabrication of combined in-phase and counter-phase steering mechanism of a four wheel drive. International Journal of Engineering, Science and Technology, vol. 6, pp. 401-415.
