Methods for eliciting, annotating, and analyzing databases for child speech development.
Computer Speech & Language (IF 4.3), Pub Date: 2017-09-26, DOI: 10.1016/j.csl.2017.02.010
Mary E Beckman, Andrew R Plummer, Benjamin Munson, Patrick F Reidy

Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. 
Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.

Updated: 2019-11-01