Human locomotion with reinforcement learning using bioinspired reward reshaping strategies
Medical & Biological Engineering & Computing (IF 3.2) · Pub Date: 2021-01-08 · DOI: 10.1007/s11517-020-02309-3
Katharine Nowakowski, Philippe Carvalho, Jean-Baptiste Six, Yann Maillet, Anh Tu Nguyen, Ismail Seghiri, Loick M'Pemba, Theo Marcille, Sy Toan Ngo, Tien-Tuan Dao

Recent learning strategies such as reinforcement learning (RL) have favored the transition from applied artificial intelligence to general artificial intelligence. One of the current challenges of RL in healthcare is the development of a controller that teaches a musculoskeletal model to perform dynamic movements. Several solutions have been proposed, but investigations exploring the muscle control problem from a biomechanical point of view are still lacking. Moreover, no studies using biological knowledge to develop plausible motor control models for pathophysiological conditions make use of reward reshaping. Consequently, the objective of the present work was to design and evaluate specific bioinspired reward-function strategies for human locomotion learning within an RL framework. The deep deterministic policy gradient (DDPG) method for a single-agent RL problem was applied. A 3D musculoskeletal model (8 DoF and 22 muscles) of a healthy adult was used. A virtual interactive environment was developed and simulated using the opensim-rl library. Three reward functions were defined: walking, forward fall, and side fall. The training process was performed on Google Cloud Compute Engine. The obtained outcomes were compared with the NIPS 2017 challenge outcomes, experimental observations, and literature data. Regarding learning to walk, the simulated musculoskeletal models walked distances of 18 to 20.5 m in the best solutions. A compensation strategy of muscle activations was revealed. The soleus, tibialis anterior, and vastii muscles were the main actors of the simple forward fall. A higher intensity of muscle activation was also noted after the fall. All kinematic and muscle patterns were consistent with experimental observations and literature data. Regarding the side fall, intense muscle activation on the expected fall side, used to unbalance the body, was noted.
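The abstract describes shaping the reward around biomechanical cues (forward progress, staying upright, muscle effort). The following is a minimal, hypothetical sketch of such a bioinspired shaped reward for the walking task — the state keys, weights, and fall threshold are illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical bioinspired reward reshaping for a walking task, in the
# spirit of the paper's approach (NOT the authors' code). The state layout
# (pelvis height/velocity, per-muscle activations) is assumed for illustration.

def shaped_walking_reward(state, w_vel=1.0, w_alive=0.1, w_effort=0.05,
                          min_pelvis_height=0.7):
    """Reward = forward progress + upright bonus - muscle-effort penalty.

    state: dict with keys
      'pelvis_vx'   - forward velocity of the pelvis (m/s)
      'pelvis_y'    - pelvis height above ground (m)
      'activations' - muscle activation levels in [0, 1] (22 muscles here)
    """
    if state['pelvis_y'] < min_pelvis_height:
        return -1.0                                  # fall: terminal penalty
    progress = w_vel * state['pelvis_vx']            # encourage forward motion
    alive = w_alive                                  # small per-step upright bonus
    effort = w_effort * sum(a * a for a in state['activations'])  # energy cost
    return progress + alive - effort


# Example: an upright, forward-moving state scores positively; a fallen
# state returns the fixed penalty.
upright = {'pelvis_vx': 1.2, 'pelvis_y': 0.9, 'activations': [0.2] * 22}
fallen = {'pelvis_vx': 0.0, 'pelvis_y': 0.3, 'activations': [0.8] * 22}
print(shaped_walking_reward(upright))  # positive
print(shaped_walking_reward(fallen))   # -1.0
```

The fall tasks in the paper would use analogous terms with inverted sign conventions (e.g., rewarding a drop in pelvis height, or asymmetric activation for the side fall), though the exact forms are not given in the abstract.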
The obtained outcomes suggest that computational and human resources, together with biomechanical knowledge, are needed to develop and evaluate an efficient and robust RL solution. As a perspective, the current solutions will be extended to a larger 3D parameter space. Furthermore, a stochastic reinforcement learning model will be investigated to account for the uncertainties of the musculoskeletal model and its environment, toward a general artificial intelligence solution for human locomotion learning.




Updated: 2021-01-08