How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric
arXiv - CS - Multimedia Pub Date : 2020-03-24 , DOI: arxiv-2003.11100
Helard Martinez and Andrew Hines and Mylene C. Q. Farias

The development of audio-visual quality assessment models poses a number of challenges for obtaining accurate predictions. One of these challenges is modelling the complex interaction between audio and visual stimuli and how human users interpret this interaction. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) addresses this problem from a machine learning perspective. The metric receives two sets of audio and video feature descriptors and produces a low-dimensional set of features used to predict the audio-visual quality. A basic implementation of NAViDAd produced accurate predictions when tested on a range of different audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained with different configurations to gain a better understanding of the metric's functionality. The results presented in this study provide important feedback on the real capacity of the metric's architecture and will help in eventually developing a much better audio-visual quality metric.
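As an illustration of the general approach described in the abstract (not the authors' implementation), the sketch below shows a minimal autoencoder that compresses concatenated audio and video feature descriptors into a low-dimensional latent vector, followed by a small regression head that maps that latent vector to a quality score. All layer sizes, descriptor dimensions, and module names are hypothetical.

```python
# Minimal sketch, assuming concatenated audio/video descriptors as input.
# This is NOT the NAViDAd code; it only illustrates the general pipeline:
# descriptors -> autoencoder -> low-dimensional features -> quality score.
import torch
import torch.nn as nn


class FeatureAutoencoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 32):
        super().__init__()
        # Encoder: progressively reduces descriptor dimensionality.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Decoder: mirrors the encoder so the model can be pre-trained
        # to reconstruct the input descriptors (unsupervised stage).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class QualityRegressor(nn.Module):
    """Maps the low-dimensional latent features to a quality score."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(latent_dim, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, z):
        return self.head(z)


# Hypothetical example: 128-dim audio descriptors and 256-dim video
# descriptors for a batch of 8 samples, concatenated feature-wise.
audio_feats = torch.randn(8, 128)
video_feats = torch.randn(8, 256)
x = torch.cat([audio_feats, video_feats], dim=1)

ae = FeatureAutoencoder(in_dim=x.shape[1])
reg = QualityRegressor()

recon, latent = ae(x)    # reconstruction target: x itself
scores = reg(latent)     # supervised target: subjective quality scores
print(recon.shape, latent.shape, scores.shape)
```

An ablation study of the kind described in the abstract would then vary this structure, for example by removing encoder layers or changing the latent dimensionality, and compare the resulting prediction accuracy across databases.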

Updated: 2020-03-26