Prediction of creaky speech by recurrent neural networks using psychoacoustic roughness
IEEE Journal of Selected Topics in Signal Processing (IF 7.5), Pub Date: 2020-02-01, DOI: 10.1109/jstsp.2019.2949422
Julian Villegas , Konstantin Markov , Jeremy Perkins , Seunghun J. Lee

The use of a psychoacoustic roughness model as a predictor of creaky voice is reported. We found that the temporal profile of roughness in vocalic segments can predict the presence of creakiness in speech. Using a simple bidirectional recurrent neural network (RNN), we were able to predict the presence of creakiness in vocalic segments from roughness traces alone, with an accuracy similar to that of RNNs trained on input data with at least 12 dimensions (including the amplitude difference between the first two harmonics, residual peak prominence, etc.). Training RNNs on roughness combined with the multidimensional input data improved the predictor's performance, but not significantly. Likewise, augmenting the dataset with time derivatives of the input features did not improve the predictor's performance. The proposed roughness-based predictor eases the interpretation and comparison of creakiness across corpora, and suggests that roughness prediction models could be used successfully to classify creaky intervals in speech.
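To make the described pipeline concrete, here is a minimal sketch, assuming a segment-level classifier: a bidirectional RNN reads the roughness trace of one vocalic segment and outputs a probability that the segment is creaky. This is not the authors' implementation. The abstract does not state the recurrent cell type, hidden size, or how time derivatives were computed, so the plain Elman (tanh) cells, the central-difference delta, and all names (`deltas`, `rnn_pass`, `BiRNNCreakClassifier`) are hypothetical choices for illustration. The weights are random and untrained, so the output is meaningless until the model is fitted on labeled segments (for example with a binary cross-entropy loss).

```python
import math
import random


def deltas(trace):
    """First-order time derivative of a 1-D trace: central difference in the
    interior, one-sided difference at the two edges (hypothetical choice)."""
    n = len(trace)
    if n < 2:
        return [0.0] * n
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((trace[hi] - trace[lo]) / (hi - lo))
    return out


def make_params(input_dim, hidden_dim, seed):
    """Random (untrained) weights for one Elman RNN direction."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W": mat(hidden_dim, input_dim),
            "U": mat(hidden_dim, hidden_dim),
            "b": [0.0] * hidden_dim}


def rnn_pass(seq, p):
    """Run h_t = tanh(W x_t + U h_{t-1} + b) over seq; return the last h."""
    hidden_dim = len(p["b"])
    h = [0.0] * hidden_dim
    for x in seq:
        # The comprehension reads the previous h before it is rebound.
        h = [math.tanh(sum(w * xi for w, xi in zip(p["W"][i], x))
                       + sum(u * hj for u, hj in zip(p["U"][i], h))
                       + p["b"][i])
             for i in range(hidden_dim)]
    return h


class BiRNNCreakClassifier:
    """Hypothetical bidirectional RNN: roughness trace -> P(creaky)."""

    def __init__(self, hidden_dim=8, use_deltas=False, seed=0):
        self.use_deltas = use_deltas
        input_dim = 2 if use_deltas else 1
        self.fwd = make_params(input_dim, hidden_dim, seed)      # left-to-right
        self.bwd = make_params(input_dim, hidden_dim, seed + 1)  # right-to-left
        rng = random.Random(seed + 2)
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.c = 0.0

    def features(self, roughness):
        """Per-frame inputs: roughness alone, or roughness plus its delta."""
        if self.use_deltas:
            return [[r, d] for r, d in zip(roughness, deltas(roughness))]
        return [[r] for r in roughness]

    def predict_proba(self, roughness):
        seq = self.features(roughness)
        h_f = rnn_pass(seq, self.fwd)        # summary of the forward pass
        h_b = rnn_pass(seq[::-1], self.bwd)  # summary of the backward pass
        # Logistic output layer on the concatenated final states.
        z = sum(v * h for v, h in zip(self.v, h_f + h_b)) + self.c
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    trace = [0.10, 0.12, 0.35, 0.60, 0.58, 0.20]  # hypothetical roughness values
    clf = BiRNNCreakClassifier(hidden_dim=8, use_deltas=True)
    p = clf.predict_proba(trace)
    print(f"P(creaky) = {p:.3f}, creaky = {p >= 0.5}")
```

Setting `use_deltas=True` mirrors the derivative-augmentation experiment mentioned in the abstract; per the reported results, one would not expect it to help much over roughness alone.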
