CLSA: A Contrastive Learning Framework With Selective Aggregation for Video Rescaling
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2023-02-10, DOI: 10.1109/tip.2023.3242774
Yuan Tian, Yichao Yan, Guangtao Zhai, Li Chen, Zhiyong Gao

Video rescaling has recently drawn extensive attention for its practical applications such as video compression. Compared to video super-resolution, which focuses on upscaling bicubic-downscaled videos, video rescaling methods jointly optimize a downscaler and an upscaler. However, the inevitable loss of information during downscaling makes the upscaling procedure still ill-posed. Furthermore, the network architecture of previous methods mostly relies on convolution to aggregate information within local regions, which cannot effectively capture the relationships between distant locations. To address the above two issues, we propose a unified video rescaling framework by introducing the following designs. First, we propose to regularize the information of the downscaled videos via a contrastive learning framework, where, in particular, hard negative samples for learning are synthesized online. With this auxiliary contrastive learning objective, the downscaler tends to retain more information that benefits the upscaler. Second, we present a selective global aggregation module (SGAM) to efficiently capture long-range redundancy in high-resolution videos, where only a few representative locations are adaptively selected to participate in the computationally heavy self-attention (SA) operations. SGAM enjoys the efficiency of the sparse modeling scheme while preserving the global modeling capability of SA. We refer to the proposed framework as Contrastive Learning framework with Selective Aggregation (CLSA) for video rescaling. Comprehensive experimental results show that CLSA outperforms video rescaling and rescaling-based video compression methods on five datasets, achieving state-of-the-art performance.
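
The auxiliary contrastive objective described in the abstract can be pictured with a short sketch. The PyTorch snippet below is only an illustration under assumed names and shapes: info_nce_loss is a standard InfoNCE loss, and synthesize_hard_negatives mimics the idea of synthesizing hard negatives online by mixing the learned downscaler's features with plain bicubic-downscaled features; the paper's actual synthesis strategy and loss weighting are not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the downscaled-video feature toward
    the HR-derived positive and push it away from synthesized hard negatives."""
    anchor = F.normalize(anchor, dim=-1)          # (B, D)
    positive = F.normalize(positive, dim=-1)      # (B, D)
    negatives = F.normalize(negatives, dim=-1)    # (B, K, D)

    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature        # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature   # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)                         # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)        # positive sits at class index 0

def synthesize_hard_negatives(lr_feat, bicubic_feat, num_neg=8):
    """Hypothetical online hard-negative synthesis: interpolate between the
    learned downscaler's features and bicubic features so the negatives stay
    close to, but less informative than, the anchor."""
    alphas = torch.rand(lr_feat.size(0), num_neg, 1, device=lr_feat.device)
    return alphas * lr_feat.unsqueeze(1).detach() + (1 - alphas) * bicubic_feat.unsqueeze(1)
```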
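Similarly, the selective aggregation idea, attending from every location to only a few adaptively selected representative locations, might be sketched as follows. The linear scorer, the top-k selection, and the use of nn.MultiheadAttention are illustrative assumptions rather than the paper's SGAM implementation; the point is that the key/value set shrinks from N locations to k, so the attention cost drops from O(N²) to O(N·k).

```python
import torch
import torch.nn as nn

class SelectiveGlobalAggregation(nn.Module):
    """Sketch of a selective aggregation block: a lightweight scorer picks the
    top-k spatial locations, and full-resolution queries attend only to the
    selected keys/values."""

    def __init__(self, dim, num_selected=64, num_heads=4):
        super().__init__()
        # dim must be divisible by num_heads
        self.scorer = nn.Linear(dim, 1)            # importance score per location
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.k = num_selected

    def forward(self, x):                          # x: (B, N, C) flattened H*W tokens
        scores = self.scorer(x).squeeze(-1)        # (B, N)
        idx = scores.topk(self.k, dim=1).indices   # indices of representative locations
        kv = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))  # (B, k, C)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return x + out                             # residual aggregation

# Example: 64x64 = 4096 tokens attend to only 64 selected locations.
# sgam = SelectiveGlobalAggregation(dim=64)
# y = sgam(torch.randn(2, 4096, 64))
```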

Updated: 2023-02-10