Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction, International Journal of Computer Vision

International Journal of Computer Vision (IF 11.6), Pub Date: 2021-10-05, DOI: 10.1007/s11263-021-01519-y
G. Bellitto₁, F. Proietto Salanitri₁, S. Palazzo₁, D. Giordano₁, C. Spampinato₁, F. Rundo₂

In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated from features extracted at different abstraction levels. We extend this base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For domain adaptation, we encourage the model to learn general hierarchical features in an unsupervised manner by applying gradient reversal at multiple scales, which enhances generalization on datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely priors, smoothing and batch normalization) that specialize the learned features on individual datasets in order to maximize performance. Our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is equipped with the domain-specific modules, performance improves further: it outperforms state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaches the second-best results on the other two. When we instead test it in an unsupervised domain adaptation setting, by enabling the hierarchical gradient reversal layers, we obtain performance comparable to that of supervised state-of-the-art models. Source code, trained models and example outputs are publicly available at https://github.com/perceivelab/hd2s.
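To make the gradient reversal idea concrete, the following is a minimal, dependency-free sketch and not the authors' implementation (their code is in the repository linked above). It assumes the usual formulation of a gradient reversal layer: the forward pass is the identity, while the backward pass multiplies the gradient coming from a domain classifier by a negative factor. The class name `GradientReversal`, the helper `reverse_at_all_scales`, the scale labels and the factor `lam` are all hypothetical names chosen for illustration.

```python
class GradientReversal:
    """Toy gradient reversal layer (illustrative, not the paper's code).

    Forward: identity on the features.
    Backward: gradient is scaled by -lam, so the feature extractor is
    pushed to make domains harder to tell apart.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed reversal strength (hypothetical parameter)

    def forward(self, features):
        # Features pass through unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Flip the sign (and scale) of the incoming gradient.
        return [-self.lam * g for g in grad_from_domain_classifier]


def reverse_at_all_scales(grads_per_scale, lam=1.0):
    """Apply the same reversal to the gradients of every feature scale."""
    grl = GradientReversal(lam)
    return {scale: grl.backward(g) for scale, g in grads_per_scale.items()}


# Made-up gradients from per-scale domain classifiers.
domain_grads = {"low": [0.5, -1.0], "mid": [2.0], "high": [-0.25, 0.0]}
reversed_grads = reverse_at_all_scales(domain_grads, lam=2.0)
```

In a real framework this behaviour would be implemented as a custom autograd operation attached to each feature level; the sketch only shows the sign flip that each such layer performs.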

Updated: 2021-10-06
