VideoDG: Generalizing Temporal Relations in Videos to Novel Domains.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2022-10-04, DOI: 10.1109/tpami.2021.3116945
Zhiyu Yao, Yunbo Wang, Jianmin Wang, Philip S. Yu, Mingsheng Long

This paper introduces video domain generalization, a setting in which most video classification networks degenerate because they are never exposed to target domains with divergent distributions. We observe that global temporal features generalize poorly: under temporal domain shift, videos from unseen domains may exhibit an unexpected absence or misalignment of temporal relations. This finding motivates us to tackle video domain generalization by learning local-relation features at different timescales, which are more generalizable, and exploiting them along with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture, the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. Built on these pyramid features, the second contribution is a new and robust adversarial data augmentation approach that bridges different video domains by improving the diversity and quality of the augmented data. We construct three video domain generalization benchmarks, in which domains are divided according to different datasets, different consequences of actions, or different camera views. VideoDG consistently outperforms combinations of previous video classification models and existing domain generalization methods on all benchmarks.
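The abstract gives no implementation details, but its two ideas can be illustrated concretely. The PyTorch sketch below is a minimal reading of ours, not the authors' code: LocalRelation, RelationPyramid, adversarial_augment, the mean-pooling fusion, and the FGSM-style perturbation are all hypothetical stand-ins, and the paper's cross-relation features and its actual augmentation strategy are omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LocalRelation(nn.Module):
        # Relation feature over a sliding window of `window` consecutive frames.
        def __init__(self, dim, window):
            super().__init__()
            self.window = window
            self.proj = nn.Sequential(nn.Linear(dim * window, dim), nn.ReLU())

        def forward(self, x):                       # x: (B, T, D) frame features
            segs = x.unfold(1, self.window, 1)      # (B, T-w+1, D, w)
            segs = segs.transpose(2, 3).flatten(2)  # (B, T-w+1, w*D)
            return self.proj(segs)                  # (B, T-w+1, D)

    class RelationPyramid(nn.Module):
        # Fuses local relations at several timescales with one global relation.
        def __init__(self, dim=256, num_classes=10, windows=(2, 3, 4)):
            super().__init__()
            self.locals = nn.ModuleList(LocalRelation(dim, w) for w in windows)
            self.head = nn.Linear(dim * (len(windows) + 1), num_classes)

        def forward(self, feats):                   # feats: (B, T, D)
            global_rel = feats.mean(dim=1)          # whole-clip (global) relation
            local_rels = [m(feats).mean(dim=1) for m in self.locals]
            return self.head(torch.cat(local_rels + [global_rel], dim=-1))

    def adversarial_augment(model, feats, labels, eps=0.05):
        # FGSM-style perturbation of frame features: a generic stand-in for the
        # paper's adversarial data augmentation, showing only how augmented
        # samples could be derived from gradients of the classification loss.
        feats = feats.clone().detach().requires_grad_(True)
        F.cross_entropy(model(feats), labels).backward()
        augmented = (feats + eps * feats.grad.sign()).detach()
        model.zero_grad()
        return augmented

    model = RelationPyramid()
    feats = torch.randn(8, 16, 256)                 # 8 clips, 16 frames, 256-d
    labels = torch.randint(0, 10, (8,))
    aug = adversarial_augment(model, feats, labels)
    logits = model(torch.cat([feats, aug]))         # train on real + augmented

Mixing the perturbed clips with the originals during training is the sense in which adversarial augmentation can "bridge" domains: the perturbed samples act as proxies for unseen distributions.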

Updated: 2021-10-01