Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding
arXiv - CS - Computational Complexity. Pub Date: 2021-02-09, DOI: arxiv-2102.04993
Marc Górriz, Saverio Blasi, Alan F. Smeaton, Noel E. O'Connor, Marta Mrak

Neural networks can be successfully used to improve several modules of advanced video coding schemes. In particular, compression of colour components was shown to greatly benefit from the use of machine learning models, thanks to the design of appropriate attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to be complex and computationally intensive, and may be difficult to deploy in a practical video coding pipeline. This work focuses on reducing the complexity of such methodologies, to design a set of simplified and cost-effective attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. The resulting simplified architecture is still capable of outperforming state-of-the-art methods. Moreover, a collection of simplifications is presented in this paper, to further reduce the complexity overhead of the proposed prediction architecture. Thanks to these simplifications, a reduction in the number of parameters of around 90% is achieved with respect to the original attention-based methodologies. Simplifications include a framework for reducing the overhead of the convolutional operations, a simplified cross-component processing model integrated into the original architecture, and a methodology to perform integer-precision approximations with the aim of obtaining fast and hardware-aware implementations. The proposed schemes are integrated into the Versatile Video Coding (VVC) prediction pipeline, retaining the compression efficiency of state-of-the-art chroma intra-prediction methods based on neural networks, while offering different directions for significantly reducing coding complexity.
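
To make the phrase "exploit specific samples in the reference region" concrete, the following is a minimal toy sketch of attention-weighted chroma prediction. It is not the architecture from the paper: the function names, the scalar luma inputs, the negative squared-difference similarity and the `temperature` parameter are all assumptions chosen for illustration, standing in for the learned features a real network would use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query_luma, boundary_luma, temperature):
    """Attention of one block position over all reference samples.

    Assumption: similarity is the negative squared luma difference,
    a hand-written stand-in for learned feature similarity.
    """
    scores = [-((query_luma - ref) ** 2) / temperature for ref in boundary_luma]
    return softmax(scores)


def predict_chroma(block_luma, boundary_luma, boundary_chroma, temperature=0.01):
    """Predict one chroma value per block position.

    Each prediction is a convex combination of the reference chroma
    samples, weighted by how closely the co-located luma values match.
    """
    predictions = []
    for query_luma in block_luma:
        weights = attention_weights(query_luma, boundary_luma, temperature)
        predictions.append(sum(w * c for w, c in zip(weights, boundary_chroma)))
    return predictions


if __name__ == "__main__":
    # Hypothetical normalised samples, not data from the paper.
    boundary_luma = [0.1, 0.5, 0.9]
    boundary_chroma = [0.2, 0.4, 0.8]
    block_luma = [0.1, 0.9]
    print(predict_chroma(block_luma, boundary_luma, boundary_chroma))
```

Because the weights are a softmax, each predicted value stays within the range of the reference chroma samples; a small `temperature` makes the prediction follow the best-matching reference sample, while a large one approaches a plain average of the boundary.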

Updated: 2021-02-10