Testing self-healing cyber-physical systems under uncertainty with reinforcement learning: an empirical study
Empirical Software Engineering (IF 3.5), Pub Date: 2021-04-01, DOI: 10.1007/s10664-021-09941-z
Tao Ma, Shaukat Ali, Tao Yue

Self-healing is becoming an essential feature of Cyber-Physical Systems (CPSs). CPSs with this feature are named Self-Healing CPSs (SH-CPSs). SH-CPSs detect and recover from errors caused by hardware or software faults at runtime and handle uncertainties arising from their interactions with environments. Therefore, it is critical to test whether SH-CPSs can still behave as expected under uncertainty. By testing an SH-CPS in various conditions and learning from the testing results, reinforcement learning algorithms can gradually optimize their testing policies and apply those policies to detect failures, i.e., cases in which the SH-CPS fails to behave as expected. However, there is insufficient evidence about which reinforcement learning algorithms perform best at testing SH-CPSs' behaviors, including their self-healing behaviors, under uncertainty. To this end, we conducted an empirical study to evaluate the performance of 14 combinations of reinforcement learning algorithms, pairing two value-function-learning-based methods for operation invocations with seven policy-optimization-based algorithms for introducing uncertainties. Experimental results reveal that the 14 combinations achieved similar coverage of system states and transitions, and that the combination of Q-learning and Uncertainty Policy Optimization (UPO) detected the most failures among the 14. On average, the Q-learning and UPO combination discovered twice as many failures as the others, while taking 52% less time to find a failure. Regarding scalability, the time and space costs of the value-function-learning-based methods grow as the number of states and transitions of the system under test increases. In contrast, increasing the system's complexity has little impact on the policy-optimization-based algorithms.
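To illustrate the general idea of using value function learning to steer test generation, below is a minimal tabular Q-learning sketch. It is not the paper's framework: the state machine, operation names, and reward function are hypothetical stand-ins for an SH-CPS model, and the paper's UPO algorithm for injecting uncertainties is not reproduced here. The sketch only shows how a reward for reaching a failure-revealing state lets the learned policy bias operation invocations toward failure detection.

```python
import random

# Hypothetical abstract model of a system under test: each state maps an
# operation invocation to a next state. "failure" stands for an observed
# case in which the system fails to behave as expected.
ACTIONS = ["op_a", "op_b"]
TRANSITIONS = {
    "idle":    {"op_a": "active",  "op_b": "idle"},
    "active":  {"op_a": "healing", "op_b": "idle"},
    "healing": {"op_a": "failure", "op_b": "idle"},
    "failure": {"op_a": "failure", "op_b": "failure"},
}

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: expected discounted reward of invoking each operation.
    q = {s: {a: 0.0 for a in ACTIONS} for s in TRANSITIONS}
    for _ in range(episodes):
        state = "idle"
        for _ in range(10):  # bounded test-episode length
            if rng.random() < epsilon:  # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(q[state], key=q[state].get)
            nxt = TRANSITIONS[state][action]
            reward = 1.0 if nxt == "failure" else 0.0  # reward failure detection
            # Standard Q-learning update rule.
            q[state][action] += alpha * (
                reward + gamma * max(q[nxt].values()) - q[state][action]
            )
            state = nxt
            if state == "failure":
                break  # failure found; end this test episode
    return q

q = q_learning()
# The greedy policy should chain op_a invocations toward the failure state.
policy = {s: max(acts, key=acts.get) for s, acts in q.items()}
print(policy["idle"], policy["active"], policy["healing"])  # op_a op_a op_a
```

In this toy setting, the reward propagates back through the Q-table so that the greedy policy selects the operation sequence leading to the failure state, which is the intuition behind using value function learning for operation invocations in the study.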




Updated: 2021-04-01