Real-Time Video Super-Resolution by Joint Local Inference and Global Parameter Estimation
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-05-06, DOI: arxiv-2105.02794
Noam Elron, Alex Itskovich, Shahar S. Yuval, Noam Levy

State-of-the-art video super-resolution (SR) techniques are based on deep learning, but they perform poorly on real-world videos (see Figure 1). The reason is that training image-pairs are commonly created by downscaling a high-resolution image to produce a low-resolution counterpart. Deep models are therefore trained to undo downscaling and do not generalize to super-resolving real-world images. Several recent publications present techniques for improving the generalization of learning-based SR, but all are ill-suited for real-time application. We present a novel approach to synthesizing training data by simulating two digital-camera image-capture processes at different scales. Our method produces image-pairs in which both images have the properties of natural images. Training an SR model using this data leads to far better generalization to real-world images and videos. In addition, deep video-SR models are characterized by a high operations-per-pixel count, which prohibits their application in real time. We present an efficient CNN architecture that enables real-time application of video SR on low-power edge devices. We split the SR task into two sub-tasks: a control flow, which estimates global properties of the input video and adapts the weights and biases of a processing CNN, and the processing CNN itself, which performs the actual pixel processing. Since the processing CNN is tailored to the statistics of the input, its capacity can be kept low while it remains effective. Also, since video statistics evolve slowly, the control flow operates at a much lower rate than the video frame rate. This reduces the overall computational load by as much as two orders of magnitude. This framework of decoupling the adaptivity of the algorithm from the pixel processing can be applied to a large family of real-time video enhancement applications, e.g., video denoising, local tone-mapping, and stabilization.
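The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`estimate_params`, `process_frame`, `run_video`), the variance-based statistic, and the 3x3 sharpening kernel are all hypothetical stand-ins, and upscaling and learned networks are omitted entirely. The sketch only shows the scheduling idea: a slow control path refreshes the parameters of a cheap per-pixel filter once every `control_period` frames.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]   # grayscale frame, row-major
Kernel = List[List[float]]  # 3x3 filter weights


def estimate_params(frame: Frame) -> Tuple[Kernel, float]:
    """Hypothetical control path: map a global statistic to filter parameters."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    # Assumption for illustration: sharpen less when the global variance is high.
    strength = 1.0 / (1.0 + var)
    kernel = [
        [0.0, -strength, 0.0],
        [-strength, 1.0 + 4.0 * strength, -strength],
        [0.0, -strength, 0.0],
    ]
    bias = 0.0
    return kernel, bias


def process_frame(frame: Frame, kernel: Kernel, bias: float) -> Frame:
    """Per-pixel path: a single 3x3 convolution with edge replication."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * frame[yy][xx]
            out[y][x] = acc
    return out


def run_video(frames: Sequence[Frame], control_period: int) -> Tuple[List[Frame], int]:
    """Filter every frame, but refresh the parameters only every control_period frames."""
    outputs: List[Frame] = []
    control_calls = 0
    kernel: Kernel = []
    bias = 0.0
    for i, frame in enumerate(frames):
        if i % control_period == 0:
            # Slow path: runs at a fraction of the frame rate.
            kernel, bias = estimate_params(frame)
            control_calls += 1
        # Fast path: runs on every frame with the most recent parameters.
        outputs.append(process_frame(frame, kernel, bias))
    return outputs, control_calls
```

With `control_period = 5`, ten input frames trigger the control path twice while the per-pixel filter runs ten times. In a real system the control path would typically be far more expensive per call than the pixel path, which is why running it at a lower rate could reduce the overall load.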

Updated: 2021-05-07