Robust risk-averse multi-armed bandits with application in social engagement behavior of children with autism spectrum disorder while imitating a humanoid robot
Information Sciences (IF 8.1), Pub Date: 2021-05-29, DOI: 10.1016/j.ins.2021.05.067
Azra Aryania, Hadi S. Aghdasi, Rasoul Heshmati, Andrea Bonarini

The stochastic multi-armed bandit problem is a standard model for the exploration–exploitation trade-off in sequential decision problems. In clinical trials, which are sensitive to outlier data, the goal is to learn a risk-averse policy that balances exploration, exploitation, and safety. In this paper, we present a risk-averse multi-armed bandit algorithm to solve a decision-making problem based on the social engagement behaviors of children with Autism Spectrum Disorder (ASD). The algorithm runs while children interact with a humanoid robot and imitate a sequence of the robot's movements. It is based on the Best Empirical Sampled Average algorithm, with Entropic Value-at-Risk as the risk measure, and decides on the sequence of movements that best improves the social engagement behaviors of children with ASD while they imitate the robot. We provide a detailed experimental analysis comparing the proposed algorithm with several well-known risk-averse multi-armed bandit algorithms on artificial scenarios and on our real-world problem. The experimental results show that the proposed algorithm outperforms its competitors in terms of robustness, risk avoidance, and cumulative regret, promoting the social engagement behaviors of children with ASD when imitating a robot's movements.
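The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: a two-armed Best Empirical Sampled Average (BESA) step in which the empirical mean is replaced by an empirical Entropic Value-at-Risk (EVaR) of the losses. It assumes the standard definition EVaR_{1-alpha}(L) = inf over z > 0 of (1/z) * log(E[exp(z L)] / alpha), approximates the infimum on a fixed grid, and treats negative reward as loss. The function names, the grid, the arm distributions, and the tie-breaking rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def empirical_evar(losses, alpha=0.05, z_grid=None):
    """Approximate EVaR at level 1 - alpha of an empirical loss sample.

    Uses EVaR(L) = inf_{z>0} (1/z) * log(E[exp(z L)] / alpha), with the
    infimum replaced by a minimum over a log-spaced grid (an assumption).
    """
    losses = np.asarray(losses, dtype=float)
    if z_grid is None:
        z_grid = np.logspace(-3, 2, 200)
    zl = np.outer(z_grid, losses)              # shape (len(z_grid), n)
    shift = zl.max(axis=1, keepdims=True)      # log-sum-exp stabilisation
    log_mgf = shift[:, 0] + np.log(np.mean(np.exp(zl - shift), axis=1))
    return float(np.min((log_mgf - np.log(alpha)) / z_grid))


def besa_evar_choose(rewards_a, rewards_b, alpha, rng):
    """One BESA-style decision between two arms; returns 0 or 1.

    Both histories are subsampled (without replacement) to the size of the
    shorter one, and the arm with the lower EVaR of the loss (-reward) wins.
    Ties go to the arm that has been played less.
    """
    n = min(len(rewards_a), len(rewards_b))
    sub_a = rng.choice(rewards_a, size=n, replace=False)
    sub_b = rng.choice(rewards_b, size=n, replace=False)
    risk_a = empirical_evar(-sub_a, alpha)
    risk_b = empirical_evar(-sub_b, alpha)
    if risk_a == risk_b:
        return 0 if len(rewards_a) <= len(rewards_b) else 1
    return 0 if risk_a < risk_b else 1


def simulate(horizon=300, alpha=0.05, seed=0):
    """Run the sketch on two hypothetical Gaussian arms; return pull counts."""
    rng = np.random.default_rng(seed)
    # (mean, std): arm 0 is safe, arm 1 has a slightly higher mean but heavy spread.
    arms = [(0.50, 0.1), (0.55, 1.0)]
    history = [[rng.normal(m, s)] for m, s in arms]   # pull each arm once
    for _ in range(horizon - len(arms)):
        k = besa_evar_choose(history[0], history[1], alpha, rng)
        m, s = arms[k]
        history[k].append(rng.normal(m, s))
    return [len(h) for h in history]


print(simulate())
```

In this toy setting a risk-averse rule should favor the low-variance arm even though its mean is slightly lower, which is the kind of trade-off between reward and safety the abstract describes. Extending the two-arm step to more arms (for example by a tournament between pairs) is left out of the sketch.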




Updated: 2021-06-11