
Graduate Student: 林政武 (Cheng-Wu Lin)
Thesis Title: 使用深度強化學習演算法針對足底壓力中心之感測器放置優化 (Optimizing the Sensor Placement for Foot Plantar Center of Pressure using Deep Reinforcement Learning)
Advisor: 阮聖彰 (Shanq-Jang Ruan)
Committee Members: 許維君 (Wei-Chun Hsu), 林昌鴻 (Chang-Hong Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 72
Keywords (Chinese): 足底壓力、壓力中心、優化感測器擺放、深度強化學習、離散的軟性演員評論家演算法
Keywords (English): plantar pressure, center of pressure, sensor placement optimization, deep reinforcement learning, soft actor-critic discrete
Views: 267; Downloads: 0

Abstract (Chinese, translated): We study foot plantar sensor placement through deep reinforcement learning without using any prior knowledge of the foot. To apply a deep reinforcement learning algorithm to the search for sensor placements, we propose a sensor placement environment that optimizes the plantar center of pressure trajectory for a self-selected speed running task. In this environment, the agent places eight sensors on a 7 x 20 grid system, and the final pattern on the grid becomes the sensor placement result. Our results show that this method (1) generates sensor placements that achieve a lower mean squared error than human-designed placements when fitting the ground-truth data, and (2) robustly searches a very large space of possible combinations, approximately 116 quadrillion. Finally, the method is not limited to the self-selected speed running task and is also feasible for other movements.


Abstract (English): We study foot plantar sensor placement with a deep reinforcement learning algorithm, without using any prior knowledge of foot anatomical areas. To apply a reinforcement learning algorithm, we propose a sensor placement environment and reward system that aim to optimize the fit to the center of pressure (COP) trajectory during a self-selected speed running task. In this environment, the agent places eight sensors within a 7 x 20 grid coordinate system, and the final pattern becomes the sensor placement result. Our results show that this method (1) generates a sensor placement with a low mean squared error in fitting the ground-truth COP trajectory, and (2) robustly discovers an optimal sensor placement among a very large number of possible combinations, more than 116 quadrillion. The method is also feasible for tasks other than the self-selected speed running task.
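As a rough illustration of the environment described in the abstract, the following Python sketch places eight sensors on a 7 x 20 pressure grid and scores a placement by the mean squared error between the COP trajectory estimated from the selected cells and the ground-truth COP computed from the full pressure video. This is a minimal sketch under assumed details, not the thesis implementation: the names SensorPlacementEnv and cop_from_pressure, the one-sensor-per-step episode structure, and the terminal negative-MSE reward are illustrative assumptions.

```python
import numpy as np

GRID_H, GRID_W = 7, 20      # insole pressure grid from the abstract
NUM_SENSORS = 8             # number of sensors the agent may place


def cop_from_pressure(frames, mask=None):
    """Pressure-weighted centroid (COP) per frame; `mask` optionally
    restricts the computation to the selected sensor cells."""
    if mask is not None:
        frames = frames * mask                      # zero out unselected cells
    total = frames.sum(axis=(1, 2)) + 1e-8          # guard against all-zero frames
    ys, xs = np.mgrid[0:GRID_H, 0:GRID_W]
    cop_x = (frames * xs).sum(axis=(1, 2)) / total
    cop_y = (frames * ys).sum(axis=(1, 2)) / total
    return np.stack([cop_x, cop_y], axis=1)         # (T, 2) COP trajectory


class SensorPlacementEnv:
    """Minimal sketch of a sensor-placement environment: at each step the
    agent selects one of the 7*20 = 140 grid cells; once eight sensors are
    placed, the reward is the negative MSE between the COP trajectory
    estimated from the chosen cells and the ground-truth COP."""

    def __init__(self, pressure_video):
        self.video = pressure_video                 # (T, 7, 20) array
        self.true_cop = cop_from_pressure(pressure_video)
        self.reset()

    def reset(self):
        self.mask = np.zeros((GRID_H, GRID_W), dtype=np.float32)
        return self.mask.copy()

    def step(self, action):
        y, x = divmod(int(action), GRID_W)          # flat cell index -> (row, col)
        self.mask[y, x] = 1.0
        done = self.mask.sum() >= NUM_SENSORS
        reward = 0.0
        if done:                                    # sparse terminal reward
            est_cop = cop_from_pressure(self.video, self.mask)
            reward = -float(np.mean((est_cop - self.true_cop) ** 2))
        return self.mask.copy(), reward, done, {}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((100, GRID_H, GRID_W)).astype(np.float32)
    env = SensorPlacementEnv(video)
    state, done = env.reset(), False
    while not done:                                 # random-policy rollout
        state, reward, done, _ = env.step(rng.integers(GRID_H * GRID_W))
    print("placement reward:", reward)
```

Because the reward only becomes available after all eight sensors have been placed, it is sparse and delayed, which is consistent with the thesis devoting a section (2.3) to reward redistribution.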

Table of Contents:
Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
2 Sensor Placement Environment and Reward System
  2.1 Sensor Placement Environment
  2.2 Reward System
  2.3 Reward Redistribution
3 Soft Actor-Critic Discrete
  3.1 Notation
  3.2 Maximum Entropy Reinforcement Framework
  3.3 SAC-Discrete Algorithm
4 Applying SAC-Discrete for Sensor Placement Environment
  4.1 Neural Network Structure
  4.2 Testing Sensor Placement Environment with Created Video
  4.3 Tuning Temperature with Population Based Training
5 Feeding Plantar Pressure Video for Sensor Placement Environment
  5.1 Participants and Experimental Protocol
  5.2 Experimental Protocol
  5.3 Self-Selected Speed Plantar Pressure Video Collection
  5.4 Data Preprocessing
6 Results
7 Conclusions
References
Appendix A: SAC-Discrete Hyperparameters
Appendix B: PBT Hyperparameters
Appendix C: WalkinSense Sensor Placement
Appendix D: Stance Data
Letter of Authority


Full Text Release Date (campus network): 2025/08/23
Full Text Release Date (off-campus network): not authorized for public release
Full Text Release Date (National Central Library, Taiwan NDLTD system): not authorized for public release