End-to-End Active Object Tracking and Its Real-World Deployment via Reinforcement Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2019-02-14, DOI: 10.1109/tpami.2019.2899570
Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang

We study active object tracking, where a tracker takes visual observations (i.e., frame sequences) as input and produces the corresponding camera control signals as output (e.g., move forward, turn left). Conventional methods tackle the tracking and camera control tasks separately, and the resulting system is difficult to tune jointly. These methods also require significant human effort for image labeling and expensive trial-and-error system tuning in the real world. To address these issues, we propose in this paper an end-to-end solution via deep reinforcement learning. A ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, both of which are crucial for successful training. The tracker trained in simulators (ViZDoom and Unreal Engine) demonstrates good generalization to unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. The system is robust and can recover tracking after occasional loss of the target. We also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios. We demonstrate successful examples of such transfer via experiments on the VOT dataset and the deployment of a real-world robot driven by the proposed active tracker trained in simulation.
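To make the frame-to-action pipeline concrete, below is a minimal sketch (not the authors' released code) of a ConvNet-LSTM actor-critic of the kind the abstract describes, written in PyTorch. The input resolution (84x84 grayscale), the layer sizes, the discrete action count, and the reward constants d, c, lam, and A are all illustrative assumptions; the shaping reward shown is one plausible form that is maximal when the target sits at a desired distance directly ahead of the camera and decays with positional and angular error.

import math
import torch
import torch.nn as nn

class ConvLSTMTracker(nn.Module):
    """ConvNet-LSTM actor-critic mapping a frame to camera-control logits."""
    def __init__(self, num_actions: int = 6, hidden_size: int = 256):
        super().__init__()
        # ConvNet encoder: 84x84 grayscale frame -> feature vector
        # (resolution and layer sizes are illustrative, not the paper's).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),   # -> 16x20x20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # -> 32x9x9
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, hidden_size), nn.ReLU(),
        )
        # LSTM cell carries temporal context across the frame sequence.
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        self.policy = nn.Linear(hidden_size, num_actions)  # actor: action logits
        self.value = nn.Linear(hidden_size, 1)             # critic: state value

    def forward(self, frame, carry):
        # frame: (B, 1, 84, 84); carry: (h, c) LSTM state from the previous step
        feat = self.encoder(frame)
        h, c = self.lstm(feat, carry)
        return self.policy(h), self.value(h), (h, c)

def shaping_reward(x, y, omega, d=3.0, c=5.0, lam=0.5, A=1.0):
    """One plausible distance/angle shaping reward (all constants illustrative).

    (x, y) is the target position in the camera's local frame and omega its
    relative yaw; the reward peaks at A when the target sits d units directly
    ahead and decays with positional and angular error.
    """
    return A - (math.hypot(x - d, y) / c + lam * abs(omega))

# Usage: roll the policy over a frame sequence, sampling one action per step.
net = ConvLSTMTracker(num_actions=6)
h = c = torch.zeros(1, 256)
frame = torch.zeros(1, 1, 84, 84)  # placeholder observation
logits, value, (h, c) = net(frame, (h, c))
action = torch.distributions.Categorical(logits=logits).sample()

In this reading, the paper's environment augmentation roughly amounts to randomizing the simulated scenes across episodes (e.g., object appearance, background, and starting configuration) so the learned policy does not overfit a single environment.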
