Emulating Perceptual Evaluation of Voice Using Scattering Transform Based Features, IEEE/ACM Transactions on Audio, Speech, and Language Processing



Emulating Perceptual Evaluation of Voice Using Scattering Transform Based Features
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 5-27-2022, DOI: 10.1109/taslp.2022.3178239
Juan Manuel Miramont ₁ , Marcelo Alejandro Colominas ₁ , Gaston Schlotthauer ₁


Voice health is traditionally assessed by methods that rely on the perception of a clinician, who integrates auditory and visual cues to reach a conclusion about the voice under evaluation. However, these assessments suffer from inter-professional variability because of their subjective nature, which is why more objective, computation-based methods are of interest. Two examples of such subjective tasks are the classification of voices into three types according to their periodicity, also termed voice typing, and the evaluation of six aspects of voice quality by means of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol. In this paper, two approaches are introduced, one to emulate each of those tasks, both based on simple features extracted from scattering transform coefficients and on support vector machines. First, a system for automatic voice typing was trained, and its classification performance was evaluated in intra- and inter-dataset trials using two widely known corpora. Accuracies above 80%, comparable to the state of the art, were found in all the experiments conducted. Second, a multidimensional, multioutput regression chain model was used to automatically grade the voice quality features of the CAPE-V protocol, obtaining errors and correlation coefficients comparable to those found for three human raters.
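
To make the "regression chain" idea in the abstract concrete, here is a minimal sketch rather than the authors' implementation. In a chain, one regressor is fitted per target, and each regressor receives the input features plus the targets that come earlier in the chain. The abstract does not give model details, so everything below is an assumption for illustration: the class names `NearestNeighbourRegressor` and `RegressionChain` are hypothetical, a 1-nearest-neighbour regressor stands in for the support vector machines so the example needs only the standard library, and the toy numbers are invented (two targets instead of the six CAPE-V dimensions).

```python
from math import dist


class NearestNeighbourRegressor:
    """1-nearest-neighbour regressor: a dependency-free stand-in for an SVM."""

    def fit(self, X, y):
        # Store copies of the training points and their target values.
        self.X = [tuple(x) for x in X]
        self.y = list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            point = tuple(x)
            nearest = min(range(len(self.X)), key=lambda i: dist(self.X[i], point))
            out.append(self.y[nearest])
        return out


class RegressionChain:
    """One regressor per target; each sees the features plus earlier targets."""

    def __init__(self, make_regressor):
        self.make_regressor = make_regressor
        self.models = []

    def fit(self, X, Y):
        self.models = []
        augmented = [list(x) for x in X]
        for k in range(len(Y[0])):
            target = [row[k] for row in Y]
            self.models.append(self.make_regressor().fit(augmented, target))
            # During training, the true earlier targets become extra inputs.
            augmented = [a + [t] for a, t in zip(augmented, target)]
        return self

    def predict(self, X):
        augmented = [list(x) for x in X]
        preds = [[] for _ in X]
        for model in self.models:
            step = model.predict(augmented)
            for p, a, s in zip(preds, augmented, step):
                p.append(s)
                a.append(s)  # at prediction time, feed the chain its own outputs
        return preds


# Toy data (invented): two features per voice, two quality grades per voice.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
Y = [[10.0, 5.0], [40.0, 30.0], [80.0, 70.0]]

chain = RegressionChain(NearestNeighbourRegressor).fit(X, Y)
pred = chain.predict([[0.9, 1.1]])
```

The design point of a chain is that later targets can exploit correlations with earlier ones, which is plausible for perceptual grades that tend to move together. The order of the targets is a modelling choice that the abstract does not specify.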


Updated: 2024-08-28
