Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses.
Journal of Phonetics (IF 2.440), Pub Date: 2018-10-24
Andrew R Plummer, Patrick F Reidy

Low-dimensional representations of speech data, such as formant values extracted by linear predictive coding analysis or spectral moments computed from whole spectra viewed as probability distributions, have been instrumental in both phonetic and phonological analyses over the last few decades. In this paper, we present a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low-dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information as well as culture-specific socio-indexical information that is expressed by talkers of a given speech community. We demonstrate the basic mechanics of the framework by carrying out an analysis of children's productions of sibilant fricatives relative to those of adults in their speech community using the phoneigen package - a publicly available implementation of the framework. We focus the demonstration on enumerating the steps for constructing manifolds from data and then using them to map the data to a low-dimensional space, explicating how manifold structure affects the learned low-dimensional representations, and comparing the use of these representations against standard acoustic features in a phonetic analysis. We conclude with a discussion of the framework's underlying assumptions, its broader modeling potential, and its position relative to recent advances in the field of representation learning.
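As a rough illustration of the two kinds of representation the abstract contrasts, the sketch below computes spectral moments by treating a power spectrum as a probability distribution over frequency, then maps high-dimensional data to low-dimensional coordinates through a neighborhood graph. The graph step uses Laplacian eigenmaps, a generic manifold-learning technique chosen here only as an assumption: the abstract does not say which construction the phoneigen package uses, and the function names `spectral_moments` and `laplacian_eigenmap`, along with their parameters, are hypothetical and not part of phoneigen.

```python
import numpy as np


def spectral_moments(freqs, power):
    """Centroid, standard deviation, skewness, and excess kurtosis of a spectrum.

    The power spectrum is normalized to sum to 1 so that it can be read as a
    probability distribution over the frequency bins.
    """
    p = power / power.sum()
    centroid = np.sum(freqs * p)
    dev = freqs - centroid
    sd = np.sqrt(np.sum(dev ** 2 * p))
    skewness = np.sum(dev ** 3 * p) / sd ** 3
    excess_kurtosis = np.sum(dev ** 4 * p) / sd ** 4 - 3.0
    return centroid, sd, skewness, excess_kurtosis


def laplacian_eigenmap(X, n_components=2, n_neighbors=5):
    """Embed the rows of X with Laplacian eigenmaps (illustrative choice only).

    Assumes the k-nearest-neighbor graph is connected; otherwise more than one
    eigenvalue is zero and the leading coordinates only separate components.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is excluded.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)
    # Symmetric, unweighted k-nearest-neighbor adjacency matrix.
    nearest = np.argsort(sq, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Symmetric normalized graph Laplacian: I - D^(-1/2) W D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; skip the trivial first one.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]
```

In this sketch, each row of `X` would stand for one high-dimensional token (for example, a whole spectrum of a sibilant fricative), and the returned columns play the role of learned low-dimensional coordinates that could be compared against the spectral moments of the same tokens.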

Updated: 2019-11-01