A comparative study of fundamental frequency stability between speech and singing
Speech Communication ( IF 2.4 ) Pub Date : 2021-02-11 , DOI: 10.1016/j.specom.2021.02.003
Beatriz Raposo de Medeiros , Joao Paulo Cabral , Alexsandro R. Meireles , Andre A. Baceti

Speaking and singing are mechanisms of vocal production that have distinct articulatory properties and consequently produce sounds that are normally perceived as different. Several papers indicate that the tonal stability characteristic of singing, which is associated with pre-defined fundamental frequency (f0) target tones, i.e., musical notes, is an important factor differentiating it from speech, where f0 targets are not pre-defined and f0 variability is greater. However, these papers are mainly grounded in perceptual experiments, and little has been done to demonstrate this difference in terms of acoustic measurements. The aim of this paper is to compare measures of f0 variability between singing and speech in order to test the hypothesis that singing has lower f0 variability, as would be expected from its higher tonal stability. To perform this comparison, we built a database of parallel singing and speech recordings. In a first experiment, the two signals were compared using common statistical measures of f0 variability over linguistic units (syllable and phone) that have been used in other works, specifically measures based on f0 variance. Although the results were not conclusive with respect to the hypothesis, a more detailed analysis performed in this first experiment allowed us to find characteristic f0 effects in both the speech and the singing data that should be taken into account in our subsequent study of f0 stability. Thus, another experiment was conducted on the same recorded data but using a different statistical analysis of f0 variance that takes these factors into account. In contrast with the first experiment, the results confirmed the hypothesis of higher f0 stability in singing. The final experiment in this work used a deep neural network classifier to test whether speech and singing can be differentiated directly from the f0 values measured at the syllable level, without using statistical measures. The results are consistent with the positive results of the second experiment. The findings of this research are important for better understanding the acoustic properties of intonation that make it possible to distinguish spoken from sung sounds. They also provide cues for deriving suitable f0 models for applications that depend on the modality used, such as synthesis or transformation of speech/singing signals.
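As a rough illustration of what a syllable-level f0 variance measure can look like, the following Python sketch computes the variance of f0 within each syllable and averages it per recording. It is not the authors' procedure: the semitone conversion, the 100 Hz reference, the treatment of unvoiced frames as 0 Hz, and all function names and f0 values are assumptions made for this example.

```python
import math
from statistics import mean, pvariance


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert f0 in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_f0_variance(f0_frames_hz):
    """Population variance of f0 (in semitones) over the voiced frames of one syllable.

    Frames with f0 <= 0 (or None) are treated as unvoiced and skipped.
    Returns None when fewer than two voiced frames remain.
    """
    voiced = [hz_to_semitones(f) for f in f0_frames_hz if f and f > 0]
    if len(voiced) < 2:
        return None
    return pvariance(voiced)


def mean_syllable_variance(syllables):
    """Average the per-syllable f0 variance over all syllables that have one."""
    variances = [v for v in map(syllable_f0_variance, syllables) if v is not None]
    return mean(variances) if variances else None


# Hypothetical f0 tracks (Hz), one inner list per syllable; not data from the paper.
sung = [[220.0, 221.0, 219.5, 220.5], [246.0, 247.0, 0.0, 246.5]]
spoken = [[180.0, 195.0, 215.0, 205.0], [210.0, 190.0, 0.0, 170.0]]

sung_var = mean_syllable_variance(sung)
spoken_var = mean_syllable_variance(spoken)
print(f"sung: {sung_var:.4f} st^2, spoken: {spoken_var:.4f} st^2")
```

Working in semitones rather than Hz keeps the measure comparable across speakers with different pitch ranges. A plain variance of this kind corresponds only loosely to the first experiment described above, whose results were inconclusive; the abstract does not specify the revised analysis used in the second experiment.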




Updated: 2021-02-23