Optimizing Steering Feedback Torque with Reinforcement Learning guided by Human Reward Model
IEEE Transactions on Transportation Electrification (IF 7.2), Pub Date: 2024-03-25, DOI: 10.1109/tte.2024.3381266
Rui Zhao, Weiwen Deng, Kaibo Huang, Jiangfeng Nan, Ying Wang, Juan Ding

Steering feel plays a central role in driving, making steering feedback torque (SFT) generation an important research topic for steer-by-wire (SBW) systems. An authentic steering feel provides sensations comparable to those of conventional steering systems, while an SFT model aligned with driver preferences reduces cognitive load and fatigue, thereby improving vehicle safety. Current research on driver-preferred SFT is limited to real-world trials, an approach that is often time-intensive and suboptimal. This paper presents an innovative approach for obtaining optimal SFT with reinforcement learning (RL) guided by a human reward model. The human reward model, trained on human feedback data, is introduced to guide the optimization of the SFT model. Given the complexity of assessing human evaluations, the reward model is validated with virtual data. Building on previous data-driven SFT modeling, a fine-tuning method based on RL is presented that uses the human reward model's outputs as the reward signal. This approach retains the nonlinear predictive capabilities of the pre-trained model while aligning it with human evaluations. Results confirm the human reward model's ability to replicate human SFT evaluation standards, and models fine-tuned with it retain the nonlinear predictive capabilities of the initial network even under high-frequency inputs. Overall, this work pioneers the integration of human feedback into SFT optimization, providing novel insights for the advancement of SBW systems.
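The abstract outlines two learned components: a human reward model trained on driver feedback, and an RL fine-tuning stage that adjusts a pre-trained, data-driven SFT model to maximize that reward while preserving its nonlinear behavior. The paper gives no implementation details here, so the following is only a minimal sketch of that idea, assuming PyTorch MLPs for both networks, a REINFORCE-style policy gradient as the RL method, and a quadratic stay-close penalty toward the pre-trained model; all network sizes, inputs, and hyperparameters are illustrative assumptions rather than the authors' design.

# Minimal sketch, NOT the authors' implementation. Assumes:
#  - sft_model: a pre-trained data-driven SFT network (here a small MLP),
#  - reward_model: a human reward model already trained on human feedback data
#    (left untrained here purely to keep the sketch self-contained),
#  - REINFORCE-style fine-tuning with a penalty that keeps the fine-tuned
#    policy close to the pre-trained model's torque predictions.
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))

state_dim = 3                         # e.g. steering angle, steering rate, vehicle speed
sft_model = mlp(state_dim, 1)         # pre-trained SFT model, used as the policy mean
reward_model = mlp(state_dim + 1, 1)  # human reward model: (state, torque) -> score
pretrained = copy.deepcopy(sft_model) # frozen copy used as a regularization anchor
for p in pretrained.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(sft_model.parameters(), lr=1e-4)
log_std = torch.zeros(1, requires_grad=True)   # exploration noise (learned)
optimizer.add_param_group({"params": [log_std]})
beta = 0.1                                     # weight of the stay-close penalty

for step in range(1000):
    # Sample a batch of steering states (stand-in for logged driving data).
    states = torch.rand(256, state_dim) * 2 - 1

    # Gaussian policy centered on the SFT model's torque prediction.
    mean = sft_model(states)
    dist = torch.distributions.Normal(mean, log_std.exp())
    torque = dist.sample()

    # Score sampled torques with the human reward model, minus a penalty
    # for drifting away from the pre-trained model's prediction.
    with torch.no_grad():
        reward = reward_model(torch.cat([states, torque], dim=-1)).squeeze(-1)
        reward = reward - beta * (torque - pretrained(states)).squeeze(-1).pow(2)
        advantage = reward - reward.mean()

    # REINFORCE update: increase the likelihood of high-reward torques.
    loss = -(dist.log_prob(torque).squeeze(-1) * advantage).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The stay-close penalty is one simple way to realize the abstract's claim that fine-tuning preserves the pre-trained model's nonlinear predictive capability; the actual paper may use a different RL algorithm or regularization.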

Updated: 2024-03-25