Work in Progress: Temporally Extended Auxiliary Tasks, arXiv - CS - Artificial Intelligence


Work in Progress: Temporally Extended Auxiliary Tasks
arXiv - CS - Artificial Intelligence, Pub Date: 2020-04-01, DOI: arxiv-2004.00600
Craig Sherstan, Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor

Predictive auxiliary tasks have been shown to improve performance in numerous reinforcement learning works; however, this effect is still not well understood. The primary purpose of the work presented here is to investigate the impact that an auxiliary task's prediction timescale has on the agent's policy performance. We consider auxiliary tasks which learn to make on-policy predictions using temporal difference learning. We test the impact of prediction timescale using a specific form of auxiliary task in which the input image is used as the prediction target, which we refer to as temporal difference autoencoders (TD-AE). We empirically evaluate the effect of TD-AE on the A2C algorithm in the VizDoom environment using different prediction timescales. While we do not observe a clear relationship between the prediction timescale and performance, we make the following observations: 1) using auxiliary tasks allows us to reduce the trajectory length of the A2C algorithm, 2) in some cases temporally extended TD-AE performs better than a straight autoencoder, 3) performance with auxiliary tasks is sensitive to the weight placed on the auxiliary loss, and 4) despite this sensitivity, auxiliary tasks improved performance without extensive hyper-parameter tuning. Our overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length and that, while the approach is promising, further study is required to fully understand the relationship between auxiliary task prediction timescale and the agent's performance.
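The abstract does not give the TD-AE update itself, so the following is only a minimal sketch of how a temporal-difference prediction target over image pixels and a weighted auxiliary loss might be combined. It assumes a one-step TD target in which the current observation acts as the signal being accumulated and a discount `gamma` sets the prediction timescale (roughly 1 / (1 - gamma) steps, a common convention). The function names, the use of flat pixel lists, the mean squared error, and the parameter `aux_weight` are all hypothetical and are not taken from the paper.

```python
from typing import List


def td_ae_target(obs: List[float], next_pred: List[float], gamma: float) -> List[float]:
    """Per-pixel one-step TD target: obs + gamma * next_pred (assumed form).

    With gamma = 0 the target is just the input image, i.e. a plain autoencoder.
    """
    if len(obs) != len(next_pred):
        raise ValueError("obs and next_pred must have the same length")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return [x + gamma * p for x, p in zip(obs, next_pred)]


def aux_loss(pred: List[float], target: List[float]) -> float:
    """Mean squared error between the prediction and the TD target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(policy_loss: float, pred: List[float],
               target: List[float], aux_weight: float) -> float:
    """Policy (e.g. A2C) loss plus the weighted auxiliary loss."""
    return policy_loss + aux_weight * aux_loss(pred, target)
```

In this sketch, `aux_weight` plays the role of the weight on the auxiliary loss that the abstract reports performance to be sensitive to, and sweeping `gamma` corresponds to varying the prediction timescale.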


Updated: 2020-04-20
