
An explicit self-attention-based multimodality CNN in-loop filter for versatile video coding

  • 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Multimedia Tools and Applications

Abstract

The newest video coding standard, Versatile Video Coding (VVC), was published recently. Although it greatly improves on the previous High Efficiency Video Coding (HEVC) standard, blocking artifacts remain under its more flexible block partitioning structures. To reduce these artifacts and improve the quality of the reconstructed video frame, this paper proposes an explicit self-attention-based multimodality convolutional neural network (CNN). Because the loss scales of different coding units (CUs) can differ considerably, the network adaptively adjusts the restoration of each CU according to the CU partition structure and the texture of the reconstructed frame. The proposed method exploits the CU partition map by treating it as a separate modality and combining it with the attention mechanism. Moreover, the unfiltered reconstructed image is also used to enhance the attention branch, which forms an explicit self-attention model. A densely integrated multi-stage fusion is then developed, in which the attention branch is densely fused into the main filtering CNN to adaptively adjust the overall image recovery scale. A thorough analysis of the proposed method is provided, with an ablation study on each module. Experimental results show that the proposed method achieves state-of-the-art performance under the all intra (AI) configuration, with 7.24% BD-rate savings on average compared with the VVC reference software (VTM).
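As a rough illustration of the two ideas named in the abstract (a CU partition map used as an extra input modality, and an attention signal that rescales the correction per pixel), the following toy sketch may help. It is not the paper's network: the boundary-marking encoding of the partition map, the `(x, y, w, h)` CU format, the function names, and the use of the map itself as a stand-in for the learned attention weights are all assumptions made for illustration.

```python
# Illustrative sketch only. The map encoding and the fusion rule below are
# assumptions, not the architecture described in the paper.

def cu_partition_map(height, width, cus):
    """Return a height x width grid with 1.0 on CU boundaries, 0.0 elsewhere.

    `cus` is a list of (x, y, w, h) rectangles in pixels (hypothetical format).
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y, w, h in cus:
        for col in range(x, x + w):
            grid[y][col] = 1.0              # top edge of the CU
            grid[y + h - 1][col] = 1.0      # bottom edge of the CU
        for row in range(y, y + h):
            grid[row][x] = 1.0              # left edge of the CU
            grid[row][x + w - 1] = 1.0      # right edge of the CU
    return grid


def fuse(reconstructed, residual, attention):
    """Scale a predicted residual per pixel by an attention weight and add it
    back to the unfiltered reconstruction."""
    return [
        [r + a * d for r, d, a in zip(rec_row, res_row, att_row)]
        for rec_row, res_row, att_row in zip(reconstructed, residual, attention)
    ]


# Toy 8x8 frame: one 4x8 CU on the left, two 4x4 CUs on the right.
cus = [(0, 0, 4, 8), (4, 0, 4, 4), (4, 4, 4, 4)]
pmap = cu_partition_map(8, 8, cus)

# Flat reconstruction and a constant "predicted" residual (made-up values).
rec = [[100.0] * 8 for _ in range(8)]
res = [[2.0] * 8 for _ in range(8)]

# Stand-in for learned attention: correct only where a CU boundary lies.
out = fuse(rec, res, pmap)
```

In this toy setting the correction is applied only along CU boundaries, where blocking artifacts would be expected. In the method the abstract describes, the per-pixel scaling is instead learned by an attention branch that takes both the partition map and the unfiltered reconstruction as input and is fused into the main filtering CNN at multiple stages.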



Acknowledgements

This work was partly supported by the National Key R&D Program of China (2018YFE0203900), the National Natural Science Foundation of China (No. 61901083 and 62001092), Project of Quzhou Municipal Government (2020D011), and SDU QILU Young Scholars program.

Author information

Corresponding author

Correspondence to Yanbo Gao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Menghu Jia and Yanbo Gao are co-first authors.

About this article

Cite this article

Jia, M., Gao, Y., Li, S. et al. An explicit self-attention-based multimodality CNN in-loop filter for versatile video coding. Multimed Tools Appl 81, 42497–42511 (2022). https://doi.org/10.1007/s11042-021-11214-2

  • DOI: https://doi.org/10.1007/s11042-021-11214-2
