End-to-End Learning Deep CRF models for Multi-Object Tracking, IEEE Transactions on Circuits and Systems for Video Technology


End-to-End Learning Deep CRF models for Multi-Object Tracking
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.3), Pub Date: 2021-01-01, DOI: 10.1109/tcsvt.2020.2975842
Jun Xiang, Guohan Xu, Chao Ma, Jianhua Hou

By bundling multiple complex sub-problems into a unified framework, end-to-end deep learning frameworks reduce the need for hand engineering or per-component parameter tuning, and optimize different modules jointly to ensure the generalization of the whole deep architecture. Despite tremendous success in numerous computer vision tasks, end-to-end learning for multi-object tracking (MOT), especially for the assignment problem in data association, has received surprisingly little attention, mainly due to the lack of available training data. Furthermore, it is challenging to discriminate target objects under mutual occlusions or to reduce identity switches in crowded scenes. To tackle these challenges, this paper proposes learning deep conditional random field (CRF) networks that model the assignment costs as unary potentials and the long-term dependencies among detection results as pairwise potentials. Specifically, we use a bidirectional long short-term memory (LSTM) network to encode the long-term dependencies. We pose the CRF inference as a recurrent neural network learning process using the standard gradient descent algorithm, where unary and pairwise potentials are jointly optimized in an end-to-end manner. Extensive experiments are conducted on the challenging MOT datasets, including MOT15, MOT16 and MOT17, and the results show that the proposed algorithm performs favorably against state-of-the-art methods.
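The idea of treating CRF inference as a fixed number of recurrent steps can be illustrated with a generic mean-field update. The sketch below is not the authors' network: the abstract does not give the form of the potentials, so the Potts-style pairwise penalty (two detections in one frame are discouraged from sharing a track), the hand-set numbers, and the names `softmax`, `mean_field` and `argmax_labels` are all assumptions made for illustration. In the paper the potentials are learned, with a bidirectional LSTM encoding the pairwise dependencies.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(unary, pairwise, num_iters=5):
    """Approximate CRF marginals with unrolled mean-field steps.

    unary[i][k]    : assumed assignment cost of detection i to track k
    pairwise[i][j] : assumed penalty when detections i and j share a track
    Each iteration plays the role of one recurrent step.
    """
    n, k = len(unary), len(unary[0])
    # Step 0: marginals from the unary costs alone.
    q = [softmax([-c for c in row]) for row in unary]
    for _ in range(num_iters):
        new_q = []
        for i in range(n):
            scores = []
            for label in range(k):
                # Expected penalty for sharing `label` with other detections.
                penalty = sum(pairwise[i][j] * q[j][label]
                              for j in range(n) if j != i)
                scores.append(-unary[i][label] - penalty)
            new_q.append(softmax(scores))
        q = new_q  # parallel update of all marginals
    return q


def argmax_labels(q):
    """Most probable track index for each detection."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Toy example: both detections prefer track 0 when judged by unary cost only.
unary = [[0.2, 1.0], [0.4, 0.5]]
pairwise = [[0.0, 2.0], [2.0, 0.0]]

unary_only = argmax_labels(mean_field(unary, pairwise, num_iters=0))
with_pairwise = argmax_labels(mean_field(unary, pairwise, num_iters=5))
print(unary_only, with_pairwise)
```

In this toy case the unary costs alone assign both detections to the same track, while a few mean-field steps with the pairwise penalty separate them. Because every step is a differentiable function of the potentials, such an unrolled loop can in principle be trained by gradient descent together with the networks that produce the potentials, which is the general property the abstract relies on.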


Updated: 2021-01-01
