Dependability Analysis of Deep Reinforcement Learning based Robotics and Autonomous Systems,arXiv - CS - Software Engineering

arXiv - CS - Software Engineering Pub Date : 2021-09-14 , DOI: arxiv-2109.06523
Yi Dong, Xingyu Zhao, Xiaowei Huang

While Deep Reinforcement Learning (DRL) provides transformational capabilities for the control of Robotics and Autonomous Systems (RAS), the black-box nature of DRL and the uncertain deployment environments of RAS pose new challenges to its dependability. Although many existing works impose constraints on the DRL policy to ensure successful completion of the mission, they are far from adequate for assessing a DRL-driven RAS holistically across all dependability properties. In this paper, we formally define a set of dependability properties in temporal logic and construct a Discrete-Time Markov Chain (DTMC) to model the dynamics of risks and failures of a DRL-driven RAS interacting with a stochastic environment. We then perform Probabilistic Model Checking on the designed DTMC to verify those properties. Our experimental results show that the proposed method is effective as a holistic assessment framework and that it uncovers conflicts between properties that may require trade-offs during training. Moreover, we find that standard DRL training cannot improve dependability properties, so bespoke optimisation objectives concerning them are required. Finally, our method offers a novel dependability analysis for the Sim-to-Real challenge of DRL.
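To make the DTMC-based verification step more concrete, the following is a minimal sketch of checking one reachability property, roughly "what is the probability of eventually reaching a failure state?", on a toy chain. The state names (`safe`, `risky`, `failed`, `done`), the transition probabilities, and the function `reach_probability` are all hypothetical and chosen for illustration; they are not taken from the paper, and a real analysis would normally use a dedicated probabilistic model checker rather than hand-written value iteration.

```python
# Toy DTMC: each state maps to its outgoing transition probabilities.
# All states and numbers here are illustrative assumptions, not the paper's model.
TRANSITIONS = {
    "safe":   {"safe": 0.7, "risky": 0.2, "done": 0.1},
    "risky":  {"safe": 0.5, "risky": 0.3, "failed": 0.2},
    "failed": {"failed": 1.0},  # absorbing: mission failure
    "done":   {"done": 1.0},    # absorbing: mission completed
}


def reach_probability(transitions, target, tol=1e-12, max_iter=10_000):
    """Return, for every state, the probability of eventually reaching `target`.

    Uses value iteration from below: the target state is fixed at 1, every
    other state starts at 0 and is repeatedly replaced by the expected value
    of its successors until the largest change falls under `tol`.
    """
    prob = {s: (1.0 if s == target else 0.0) for s in transitions}
    for _ in range(max_iter):
        new = {
            s: 1.0 if s == target else sum(p * prob[t] for t, p in row.items())
            for s, row in transitions.items()
        }
        delta = max(abs(new[s] - prob[s]) for s in transitions)
        prob = new
        if delta < tol:
            break
    return prob


# Sanity check that every row is a probability distribution.
for state, row in TRANSITIONS.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, state

failure_prob = reach_probability(TRANSITIONS, "failed")
print(f"P(eventually failed | start in safe) = {failure_prob['safe']:.4f}")
```

For these illustrative numbers the failure probability from `safe` solves to 4/11, about 0.36. A dependability requirement could then be phrased as a bound on this value, for example that it stays below some threshold, which is the kind of quantitative property probabilistic model checking evaluates.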

Updated: 2021-09-15