Detecting anticipatory information in speech with signal chopping
Journal of Phonetics (IF 1.9), Pub Date: 2020-07-20, DOI: 10.1016/j.wocn.2020.100996
Sam Tilsen

Most analyses of articulatory processes in speech assume that word-form-related changes in the state of the vocal tract have well-defined beginnings and ends. But how do we determine the precise moments in time when these beginnings and ends occur? More specifically, when should we expect information related to the sound categories of a word to be present in acoustic and articulatory signals? The framework of Articulatory Phonology/Task Dynamics predicts that the earliest time such information becomes available is when the first articulatory gesture of a word becomes active, which closely corresponds to when a movement is initiated. Alternatively, a recent extension of the Articulatory Phonology model holds that gestures may influence the state of the vocal tract after they have been retrieved from memory, but before they become active and before canonical movement initiation. This paper presents evidence that, indeed, anticipatory information is available much earlier than is typically assumed: the identity of a syllable-onset gesture can be predicted from articulatory and acoustic data quite early, in some cases nearly half a second before movement initiation. Likewise, the identity of a coda gesture can be predicted during the period of time typically associated with an onset consonant. These findings were obtained with a novel analysis method called signal chopping, which was paired with deep-neural-network-based classification. In this approach, articulatory and acoustic signals are systematically truncated in space and time, and a network training/test procedure is repeated on the chopped signals. By analyzing the effects of chopping on classification accuracy, gesture-specific information can be spatiotemporally localized.
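The core of the signal-chopping procedure described above can be illustrated with a minimal sketch. The code below is a hypothetical toy reconstruction, not the paper's implementation: it uses synthetic one-dimensional "articulatory" signals with a built-in anticipatory component, a simple nearest-centroid classifier in place of the deep neural network, and invented names (`make_signals`, `chop_and_classify`, the `onset` parameter). The point is only to show the logic: truncate every signal at a given endpoint, re-run the train/test procedure, and watch how classification accuracy changes with the truncation point.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_signals(n_per_class=40, length=200, onset=120):
    """Synthetic 1-D signals for two gesture classes (hypothetical data).

    The class means diverge strongly only after `onset` samples
    (the 'canonical movement'), plus a weak anticipatory shift in
    the 40 samples before the onset.
    """
    X, y = [], []
    for label in (0, 1):
        sign = 1.0 if label else -1.0
        for _ in range(n_per_class):
            sig = rng.normal(0.0, 1.0, length)
            sig[onset:] += 2.0 * sign          # class-specific movement
            sig[onset - 40:onset] += 0.5 * sign  # weak anticipatory information
            X.append(sig)
            y.append(label)
    return np.array(X), np.array(y)

def chop_and_classify(X, y, chop_end):
    """Truncate all signals at `chop_end` samples, then estimate
    classification accuracy with a nearest-centroid classifier
    under leave-one-out cross-validation (a stand-in for the
    paper's DNN training/test procedure)."""
    Xc = X[:, :chop_end]
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i  # hold out signal i
        c0 = Xc[mask & (y == 0)].mean(axis=0)
        c1 = Xc[mask & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(Xc[i] - c1) < np.linalg.norm(Xc[i] - c0))
        correct += pred == y[i]
    return correct / n

X, y = make_signals()
for chop_end in (60, 100, 160, 200):
    acc = chop_and_classify(X, y, chop_end)
    print(f"chopped at sample {chop_end:3d}: accuracy = {acc:.2f}")
```

In this sketch, accuracy stays near chance when the signals are chopped before any class-specific information exists, rises once the anticipatory window is included, and is highest when the full movement is available; localizing where accuracy departs from chance is what lets the method place gesture-specific information in time.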



