A systematic study of reward for reinforcement learning based continuous integration testing
Journal of Systems and Software ( IF 3.5 ) Pub Date : 2020-12-01 , DOI: 10.1016/j.jss.2020.110787
Yang Yang , Zheng Li , Liuliu He , Ruilian Zhao

Abstract Continuous integration (CI) testing is characterized by continually changing test cases, limited execution time, and fast feedback, so classical test prioritization approaches are no longer suitable. Because CI testing is essentially a continuous decision process, reinforcement learning (RL) has been suggested for prioritizing test cases, in which the reward plays a crucial role. In this paper, we conduct a systematic study of the reward function and the reward strategy in CI testing. For the reward function, the whole historical execution information of test cases is used, taking into account both the number of failures and their distribution. To further account for the validity of historical information, a time-window based approach is proposed that uses only partial historical information. For the reward strategy, i.e., how the reward is assigned, three strategies are introduced: total reward, partial reward, and fuzzy reward. An empirical study is conducted on four industrial-level programs. The results reveal that the reward function with historical information improves Recall by 13.21% on average compared with the existing TF (Test Case Failure) reward function, and that the fuzzy reward strategy is more flexible, improving NAPFD (Normalized Average Percentage of Faults Detected) by 3.43% on average compared with the other two strategies.
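To make the idea of a history-based, time-window reward more concrete, the sketch below shows one plausible way such a reward could be computed for a single test case. This is a minimal illustration under our own assumptions, not the paper's actual formulation: the function name history_reward, the 0.5/0.5 weighting, and the linear recency weights are hypothetical choices; only the ingredients (failure count, failure distribution, and a sliding window over recent CI cycles) come from the abstract.

```python
from typing import List


def history_reward(exec_history: List[int], window: int = 10) -> float:
    """Hypothetical reward for one test case based on its recent
    execution history within a sliding time window.

    exec_history: per-cycle verdicts, 1 = failed, 0 = passed,
                  ordered from oldest to newest.
    window:       number of most recent CI cycles considered.
    """
    recent = exec_history[-window:]
    if not recent:
        return 0.0

    n = len(recent)

    # Failure count: how often the test failed within the window.
    fail_ratio = sum(recent) / n

    # Failure distribution: weight later (more recent) failures more
    # heavily, so a test that failed in the last cycle earns a higher
    # reward than one that failed only at the start of the window.
    recency_weighted = sum(v * (i + 1) for i, v in enumerate(recent)) / sum(
        i + 1 for i in range(n)
    )

    # Combine the two signals; the equal 0.5/0.5 mix is an arbitrary
    # illustrative choice, not a value taken from the paper.
    return 0.5 * fail_ratio + 0.5 * recency_weighted


# Example: a test that failed in the two most recent cycles receives a
# higher reward than one whose failures lie at the start of the window.
print(history_reward([0, 0, 0, 0, 1, 1]))  # recent failures
print(history_reward([1, 1, 0, 0, 0, 0]))  # old failures
```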
