Automatic detection of Voice Onset Time in voiceless plosives using gated recurrent units,Digital Signal Processing



Automatic detection of Voice Onset Time in voiceless plosives using gated recurrent units
Digital Signal Processing (IF 2.9), Pub Date: 2020-05-27, DOI: 10.1016/j.dsp.2020.102779
T. Arias-Vergara, P. Argüello-Vélez, J.C. Vásquez-Correa, E. Nöth, M. Schuster, M.C. González-Rátiva, J.R. Orozco-Arroyave

Voice Onset Time (VOT) has been used by researchers as an acoustic measure to understand the impact of different motor speech disorders on speech production. However, VOT values are usually obtained manually, which is expensive and time-consuming. In this paper we propose a method for the automatic detection of VOT based on pre-trained Recurrent Neural Networks with Gated Recurrent Units (GRUs). Speech recordings from 50 native Spanish speakers from Colombia (25 male) are considered for the experiments. The recordings include the diadochokinesis task /pa-ta-ka/, which is typically used to evaluate motor speech disorders such as those caused by Parkinson's disease. Additionally, the diadochokinesis task allows us to train a system to detect the VOT of voiceless plosive sounds in intermediate positions. Acoustic analysis is performed by extracting different temporal and spectral features from the recordings. According to the results, it is possible to detect the VOT with F1-score values of 0.66 for /p/, 0.75 for /t/, and 0.78 for /k/ when the predicted values are compared with the manual VOT labels.
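
To make the evaluation metric concrete, the following is a minimal sketch of how an F1-score could be computed when predicted VOT labels are compared against manual ones. The abstract does not say at what granularity the comparison is made, so the frame-level binary labelling (1 = frame inside a VOT segment), the function name `frame_f1`, and the toy label sequences are all assumptions made for illustration, not the authors' evaluation procedure.

```python
def frame_f1(manual, predicted):
    """F1-score for binary frame labels (assumed: 1 = frame inside a VOT segment)."""
    if len(manual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives and false negatives frame by frame.
    tp = sum(1 for m, p in zip(manual, predicted) if m == 1 and p == 1)
    fp = sum(1 for m, p in zip(manual, predicted) if m == 0 and p == 1)
    fn = sum(1 for m, p in zip(manual, predicted) if m == 1 and p == 0)
    if tp == 0:
        # No correctly detected frame: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: ten frames, manual VOT spans frames 2-5,
# the prediction is shifted one frame earlier.
manual = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
score = frame_f1(manual, predicted)
print(f"F1 = {score:.2f}")
```

In this toy case three frames are detected correctly, one is a false alarm and one is missed, so precision and recall are both 0.75. A one-frame shift in a short segment already lowers the score noticeably, which is one reason per-plosive F1 values can differ when the typical VOT duration differs.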
