Steered Mixture-of-Experts for Light Field Images and Video: Representation and Coding
IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-03-01, DOI: 10.1109/tmm.2019.2932614
Ruben Verhack, Thomas Sikora, Glenn Van Wallendael, Peter Lambert

Research in light field (LF) processing has increased substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid-range bitrates with respect to the subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate at the same quality. At least equally important is the fact that our method inherently provides desirable functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
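The idea of kernels forming a continuous approximation can be illustrated with a small sketch. It is not the authors' implementation. It assumes a 2-D image, constant-valued experts, and Gaussian gating in which each kernel's normalized response weights its expert. The abstract specifies none of these details, and all names (`KERNELS`, `gaussian_weight`, `reconstruct`, `render`) are hypothetical. Because each sample depends only on its own coordinate and the kernel list, any position can be evaluated independently, and the same model can be sampled on a grid of any resolution.

```python
import math

# Hypothetical toy model: two kernels on the unit square (assumed values).
# "precision" is the inverse covariance of a steered 2-D Gaussian, given
# as a symmetric 2x2 matrix; "value" is a constant expert (assumption).
KERNELS = [
    {"mu": (0.25, 0.5), "precision": ((50.0, 0.0), (0.0, 50.0)),
     "prior": 0.5, "value": 0.0},   # dark region on the left
    {"mu": (0.75, 0.5), "precision": ((50.0, 0.0), (0.0, 50.0)),
     "prior": 0.5, "value": 1.0},   # bright region on the right
]


def gaussian_weight(x, y, kernel):
    """Unnormalized Gaussian response of one kernel at position (x, y)."""
    dx = x - kernel["mu"][0]
    dy = y - kernel["mu"][1]
    (a, b), (_, c) = kernel["precision"]
    quad = a * dx * dx + 2.0 * b * dx * dy + c * dy * dy
    return kernel["prior"] * math.exp(-0.5 * quad)


def reconstruct(x, y, kernels):
    """Gated mixture: expert values weighted by normalized kernel responses."""
    weights = [gaussian_weight(x, y, k) for k in kernels]
    total = sum(weights)
    if total == 0.0:
        # Far from every kernel the responses underflow; use the nearest one.
        nearest = min(
            kernels,
            key=lambda k: (x - k["mu"][0]) ** 2 + (y - k["mu"][1]) ** 2,
        )
        return nearest["value"]
    return sum(w * k["value"] for w, k in zip(weights, kernels)) / total


def render(width, height, kernels):
    """Sample the continuous model at pixel centers of a width x height grid."""
    return [
        [reconstruct((i + 0.5) / width, (j + 0.5) / height, kernels)
         for i in range(width)]
        for j in range(height)
    ]


if __name__ == "__main__":
    low = render(4, 2, KERNELS)    # coarse sampling
    high = render(16, 8, KERNELS)  # same model, finer sampling grid
    print(len(low[0]), len(high[0]))
```

Rendering at 4x2 and at 16x8 uses the same two kernels; only the sampling grid changes, which is the sense in which interpolation and super-resolution come with the model in this sketch.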

Updated: 2020-03-01