Convex weighting criteria for speaking rate estimation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1). Pub Date: 2015-09-01. DOI: 10.1109/taslp.2015.2434213
Yishan Jiao, Visar Berisha, Ming Tu, Julie Liss

Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose speaking rate estimation as the problem of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal, and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Instead, the proposed method learns an optimal weighting function that can be applied directly to the time-frequency features of a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting function, and an adaptation strategy that customizes the approach to a particular speaker using minimal training data. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech, the results show a high correlation between the estimated speaking rate and the ground-truth values.
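The abstract does not spell out the two cost functions, so what follows is only a minimal Python sketch of the general idea under stated assumptions. Each frame t is assumed to carry a vector x_t of band energies (toy, hypothetical values), the density is taken to be the linear weighting d(t) = w · x_t, and w is fitted by ridge-regularized least squares against known per-segment unit counts. That criterion is an illustrative stand-in chosen because it is convex; it is not claimed to be either of the paper's criteria. The abstract says the integral yields the speaking rate over the interval; here the integral is read as a count of units (assumed to be syllables) and divided by the interval length to report units per second. All function names and the frame step are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def integrate_features(frames: Matrix) -> Vector:
    """Sum the per-frame feature vectors of a segment (discrete integral over frames)."""
    n_bands = len(frames[0])
    return [sum(frame[f] for frame in frames) for f in range(n_bands)]


def density(frames: Matrix, w: Vector) -> Vector:
    """Assumed linear density: d[t] = sum_f w[f] * x_t[f] for every frame t."""
    return [sum(wf * xf for wf, xf in zip(w, frame)) for frame in frames]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def learn_weights(segments: List[Matrix], counts: Vector, lam: float = 1e-3) -> Vector:
    """Stand-in convex criterion: minimize sum_i (w . s_i - c_i)^2 + lam * ||w||^2."""
    s = [integrate_features(seg) for seg in segments]  # one integrated vector per segment
    n_bands = len(s[0])
    # Normal equations of the ridge problem: (S^T S + lam I) w = S^T c
    ata = [[sum(si[j] * si[k] for si in s) + (lam if j == k else 0.0)
            for k in range(n_bands)] for j in range(n_bands)]
    atc = [sum(si[j] * ci for si, ci in zip(s, counts)) for j in range(n_bands)]
    return solve(ata, atc)


def speaking_rate(frames: Matrix, w: Vector, frame_step_s: float) -> float:
    """Integrate the density to a unit count, then divide by the interval length (units/s)."""
    count = sum(density(frames, w))
    return count / (len(frames) * frame_step_s)


# Toy usage: counts are generated from hypothetical "true" weights, then recovered.
true_w = [0.5, 0.2, 0.1]
segments = [
    [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [0.0, 2.0, 1.0]],
    [[2.0, 2.0, 2.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 3.0], [4.0, 0.0, 1.0], [1.0, 1.0, 1.0], [2.0, 0.0, 0.0]],
    [[5.0, 3.0, 0.0]],
]
counts = [sum(density(seg, true_w)) for seg in segments]
w = learn_weights(segments, counts, lam=1e-6)
rate = speaking_rate(segments[2], w, frame_step_s=0.25)  # 4 frames of 0.25 s = 1 s
print([round(v, 3) for v in w])  # [0.5, 0.2, 0.1]
print(round(rate, 3))  # 4.4
```

Because the assumed density is linear in w, integrating the features first and weighting afterwards gives the same count, which is why `learn_weights` needs only one summed vector per segment and reduces to a small, strictly convex linear system.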
