POLO: Learning Explicit Cross-Modality Fusion for Temporal Action Localization, IEEE Signal Processing Letters

IEEE Signal Processing Letters (IF 3.2), Pub Date: 2021-02-24, DOI: 10.1109/lsp.2021.3061289
Binglu Wang, Le Yang, Yongqiang Zhao

Temporal action localization aims to discover action instances in untrimmed videos. RGB and optical flow are two widely used feature modalities for this task: RGB chiefly captures appearance, while flow mainly depicts motion. Given RGB and flow features, previous methods adopt either an early fusion or a late fusion paradigm to mine the complementarity between them. Early fusion concatenates the raw RGB and flow features and lets the network achieve complementarity implicitly, but it partly discards the particularity of each modality. Late fusion maintains two independent branches to explore the particularity of each modality, but it fuses only the localization results, which is insufficient to mine their complementarity. In this work, we propose exPlicit crOss-modaLity fusiOn (POLO) to effectively utilize the complementarity between the two modalities while thoroughly exploring the particularity of each. POLO performs cross-modality fusion by estimating an attention weight from the RGB modality and applying it to the flow modality (and vice versa), so that each modality supplies complementary information to the other. Assisted by these attention weights, POLO learns from RGB and flow features independently and explores the particularity of each modality. Extensive experiments on two benchmarks demonstrate the favorable performance of POLO.
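The following is a minimal sketch contrasting the three fusion paradigms described above. It treats a video as a list of per-snippet feature vectors. It is not the authors' implementation, because the abstract does not specify the attention architecture. The linear-projection-plus-sigmoid attention head (`temporal_attention`), the weight vectors, the late-fusion weight `w`, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Features = List[List[float]]  # one feature vector per video snippet


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def early_fusion(rgb: Features, flow: Features) -> Features:
    # Early fusion: concatenate raw RGB and flow features per snippet,
    # leaving a single downstream network to combine them implicitly.
    return [r + f for r, f in zip(rgb, flow)]


def late_fusion(rgb_scores: List[float], flow_scores: List[float],
                w: float = 0.5) -> List[float]:
    # Late fusion: two independent branches each produce per-snippet scores,
    # and only these final scores are combined (weight w is an assumption).
    return [w * r + (1.0 - w) * f for r, f in zip(rgb_scores, flow_scores)]


def temporal_attention(feats: Features, weight: List[float],
                       bias: float = 0.0) -> List[float]:
    # Hypothetical attention head: a linear projection followed by a sigmoid,
    # giving one attention weight in (0, 1) per snippet.
    return [sigmoid(sum(x * w for x, w in zip(vec, weight)) + bias)
            for vec in feats]


def cross_modality_fusion(rgb: Features, flow: Features,
                          w_rgb: List[float], w_flow: List[float]
                          ) -> Tuple[Features, Features]:
    # Attention estimated from RGB re-weights the flow features, and
    # attention estimated from flow re-weights the RGB features.
    att_from_rgb = temporal_attention(rgb, w_rgb)
    att_from_flow = temporal_attention(flow, w_flow)
    flow_out = [[a * v for v in vec] for a, vec in zip(att_from_rgb, flow)]
    rgb_out = [[a * v for v in vec] for a, vec in zip(att_from_flow, rgb)]
    # The two streams stay separate, so each can still feed its own branch.
    return rgb_out, flow_out


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    flow = [[2.0, 4.0], [6.0, 8.0]]
    print(early_fusion(rgb, flow))
    print(late_fusion([0.9, 0.1], [0.7, 0.3]))
    rgb_out, flow_out = cross_modality_fusion(rgb, flow, [1.0, -1.0], [0.5, 0.5])
    print(rgb_out)
    print(flow_out)
```

In this sketch, `cross_modality_fusion` returns two separate feature streams rather than one concatenated tensor. This mirrors the abstract's claim that each modality keeps its own branch (its particularity) while borrowing attention from the other (its complementarity). How POLO actually parameterizes and trains the attention is not stated in the abstract.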

Updated: 2021-02-24