Abstract
The newest video coding standard, Versatile Video Coding (VVC), was recently published. Although it greatly improves on the performance of its predecessor, High Efficiency Video Coding (HEVC), blocking artifacts remain under its more flexible block partitioning structures. To reduce blocking artifacts and improve the quality of reconstructed video frames, this paper proposes an explicit self-attention-based multimodality convolutional neural network (CNN). Because the loss scales of different coding units (CUs) can differ considerably, the network adaptively adjusts the restoration of each CU according to the CU partition structure and the texture of the reconstructed frame. The proposed method exploits the CU partition map by treating it as a separate modality and combining it with the attention mechanism. Moreover, the unfiltered reconstructed image is used to enhance the attention branch, which forms an explicit self-attention model. A densely integrated multi-stage fusion is then developed, in which the attention branch is densely fused into the main filtering CNN to adaptively adjust the overall image recovery scale. The proposed method is analyzed thoroughly, with an ablation study of each module. Experimental results show that the proposed method achieves state-of-the-art performance under the all intra (AI) configuration, with 7.24% BD-rate savings on average compared with the VVC reference software (VTM).
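As a rough illustration of two ideas named in the abstract, the sketch below builds a CU partition map and applies a per-pixel attention weight to a restoration residual. It is a minimal sketch under stated assumptions, not the authors' network: the boundary-marking form of the map, the `(top, left, height, width)` CU layout, the rule `restored = rec + attention * residual`, and the function names `cu_partition_map` and `attention_scaled_restore` are all hypothetical. In the proposed method the attention values are produced by a learned branch fed with the partition map and the unfiltered reconstruction; here they are simply supplied by hand.

```python
from typing import List, Tuple

# Hypothetical CU description: (top, left, height, width) in pixels.
Rect = Tuple[int, int, int, int]
Image = List[List[float]]


def cu_partition_map(height: int, width: int, cus: List[Rect]) -> Image:
    """Return a map holding 1.0 on CU boundary pixels and 0.0 elsewhere.

    Assumption: boundaries are marked on the outermost pixels of each CU.
    """
    pmap = [[0.0] * width for _ in range(height)]
    for top, left, h, w in cus:
        bottom, right = top + h - 1, left + w - 1
        for x in range(left, right + 1):  # top and bottom edges
            pmap[top][x] = 1.0
            pmap[bottom][x] = 1.0
        for y in range(top, bottom + 1):  # left and right edges
            pmap[y][left] = 1.0
            pmap[y][right] = 1.0
    return pmap


def attention_scaled_restore(rec: Image, residual: Image, attention: Image) -> Image:
    """Scale the residual per pixel: restored = rec + attention * residual.

    Assumption: a stand-in for "adaptively adjusting the recovery scale";
    the attention map would normally come from a learned branch.
    """
    return [
        [r + a * d for r, d, a in zip(rec_row, res_row, att_row)]
        for rec_row, res_row, att_row in zip(rec, residual, attention)
    ]


# Usage: one 4x4 CU covering a 4x4 frame, and a one-row restoration example.
pmap = cu_partition_map(4, 4, [(0, 0, 4, 4)])
restored = attention_scaled_restore([[10.0, 10.0]], [[2.0, 2.0]], [[0.0, 0.5]])
```

With an attention weight of 0 a pixel is left as reconstructed, while larger weights apply more of the predicted correction, which is one simple way to picture CU-dependent restoration strength.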
Acknowledgements
This work was partly supported by the National Key R&D Program of China (2018YFE0203900), the National Natural Science Foundation of China (No. 61901083 and 62001092), Project of Quzhou Municipal Government (2020D011), and SDU QILU Young Scholars program.
Additional information
Menghu Jia and Yanbo Gao are co-first authors.
Cite this article
Jia, M., Gao, Y., Li, S. et al. An explicit self-attention-based multimodality CNN in-loop filter for versatile video coding. Multimed Tools Appl 81, 42497–42511 (2022). https://doi.org/10.1007/s11042-021-11214-2
DOI: https://doi.org/10.1007/s11042-021-11214-2