Short-term anchor linking and long-term self-guided attention for video object detection
Image and Vision Computing (IF 4.7) Pub Date: 2021-04-18, DOI: 10.1016/j.imavis.2021.104179
Daniel Cores , Víctor M. Brea , Manuel Mucientes

We present a new network architecture that exploits the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates these short-term enhanced box features to exploit long-term spatio-temporal information. This module is the first in the video object detection domain to take advantage of geometric features over the long term. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that accounts for the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics to prove its robustness in a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet.
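
As a rough illustration of the short-term anchor linking step, the sketch below matches proposals across two frames through their shared anchor index and blends the linked box features. It assumes a PyTorch-style pipeline; the helper names (link_by_anchor, aggregate_linked_features) and the cosine-similarity weighting are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def link_by_anchor(anchor_ids_ref, anchor_ids_sup):
    """Match proposals across two frames that were decoded from the same
    anchor box. anchor_ids_*: (N,) / (M,) long tensors holding the anchor
    index of each proposal. Returns linked (ref_idx, sup_idx) pairs."""
    # Proposals are linked iff they originate from the same anchor.
    match = anchor_ids_ref[:, None] == anchor_ids_sup[None, :]
    return match.nonzero(as_tuple=True)

def aggregate_linked_features(feat_ref, feat_sup, ref_idx, sup_idx):
    """Blend linked support-frame features feat_sup (M, C) into the
    reference proposals feat_ref (N, C), weighting each link by cosine
    similarity (an assumed aggregation rule)."""
    out = feat_ref.clone()
    for i in ref_idx.unique():
        sup = feat_sup[sup_idx[ref_idx == i]]                 # (k, C) linked features
        w = F.softmax(F.cosine_similarity(feat_ref[i][None], sup, dim=1), dim=0)
        out[i] = 0.5 * feat_ref[i] + 0.5 * (w[:, None] * sup).sum(0)
    return out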
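The long-term attention over the short-term enhanced box features could look roughly like the relation-style module below, where the relative geometry of each box pair biases the attention weights. All module and parameter names are assumptions for illustration; the paper's self-guided formulation may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryAwareAttention(nn.Module):
    """Attention from reference proposals over a long-term feature memory,
    with a bias computed from relative box geometry."""
    def __init__(self, dim=256, geo_dim=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Maps the relative geometry of each (reference, memory) box pair
        # to a scalar attention bias.
        self.geo = nn.Sequential(nn.Linear(geo_dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.scale = dim ** -0.5

    def forward(self, ref_feat, mem_feat, ref_box, mem_box):
        # ref_feat: (N, C), mem_feat: (M, C); boxes given as (cx, cy, w, h).
        app = (self.q(ref_feat) @ self.k(mem_feat).t()) * self.scale      # (N, M) appearance scores
        rel = torch.cat([
            ref_box[:, None, :2] - mem_box[None, :, :2],                  # center offsets
            ref_box[:, None, 2:].log() - mem_box[None, :, 2:].log(),      # log scale ratios
        ], dim=-1)                                                        # (N, M, 4)
        attn = F.softmax(app + self.geo(rel).squeeze(-1), dim=-1)
        return ref_feat + attn @ self.v(mem_feat)                         # residual aggregation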
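Finally, the double head consumes both the reference-frame (spatial) features and the temporally aggregated features. The minimal sketch below assumes one task per feature source, with classification driven by the aggregated cue and box regression by the spatial cue; the actual head design and fusion in the paper may differ.

import torch.nn as nn

class SpatioTemporalDoubleHead(nn.Module):
    """Two sibling output branches fed from different feature sources
    (an assumed split for illustration)."""
    def __init__(self, dim=256, num_classes=31):   # e.g., 30 VID classes + background
        super().__init__()
        self.cls = nn.Linear(dim, num_classes)     # classification from temporal context
        self.reg = nn.Linear(dim, 4)               # localization from the reference frame
    def forward(self, spatial_feat, temporal_feat):
        return self.cls(temporal_feat), self.reg(spatial_feat)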



Updated: 2021-04-26