Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
arXiv - CS - Multimedia Pub Date : 2020-09-07 , DOI: arxiv-2009.03411
Wei Zhou and Zhibo Chen

In recent years, deep learning has achieved promising success in multimedia quality assessment, especially image quality assessment (IQA). However, because videos exhibit more complex temporal characteristics, relatively little work has exploited powerful deep convolutional neural networks (DCNNs) for video quality assessment (VQA). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features using pre-trained deep learning models, without fine-tuning or training from scratch. The composite features are computed from distorted video frames as well as frame difference maps, from both global and local views. The aggregated features are then fed to a regression model to predict perceptual video quality. Finally, experimental results demonstrate that the proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
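The abstract does not specify the network, patch layout, or regressor, so the following is only a minimal sketch of the described pipeline: frame-difference maps for the temporal stream, global and local views of each map, a frozen feature extractor (here a stand-in of pooled image statistics in place of a pre-trained DCNN), and aggregation into one feature vector for a downstream regressor. All function names and the crop/pooling choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def frame_difference_maps(frames):
    """Temporal stream: absolute differences between consecutive frames.

    frames: array of shape (T, H, W), grayscale video.
    Returns (T-1, H, W) difference maps capturing temporal distortion.
    """
    return np.abs(np.diff(frames.astype(np.float64), axis=0))

def global_and_local_views(frame, patch=32):
    """Global view = whole frame; local view = center crop.

    The center crop is a placeholder for the paper's local-patch scheme.
    """
    h, w = frame.shape
    cy, cx = h // 2, w // 2
    local = frame[cy - patch // 2: cy + patch // 2,
                  cx - patch // 2: cx + patch // 2]
    return frame, local

def extract_features(img):
    """Stand-in for a frozen pre-trained DCNN feature extractor.

    A real implementation would pool activations from e.g. a
    torchvision model with requires_grad disabled; here we use
    simple pooled statistics so the sketch stays self-contained.
    """
    return np.array([img.mean(), img.std(), np.abs(img).max()])

def deepstq_style_features(frames):
    """Aggregate local+global features over frames and difference maps."""
    diffs = frame_difference_maps(frames)
    parts = []
    for stream in (frames, diffs):          # spatial and temporal streams
        glob = np.stack([extract_features(global_and_local_views(f)[0])
                         for f in stream]).mean(axis=0)
        loc = np.stack([extract_features(global_and_local_views(f)[1])
                        for f in stream]).mean(axis=0)
        parts.extend([glob, loc])
    return np.concatenate(parts)            # one vector per video
```

The resulting per-video vector would then be mapped to a quality score by a learned regressor (e.g. support vector regression trained on subjective scores), matching the "feature aggregation followed by regression" step the abstract describes.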

Updated: 2020-09-09