Children's speaker verification in low and zero resource conditions

Digital Signal Processing (IF 2.9), Pub Date: 2021-06-07, DOI: 10.1016/j.dsp.2021.103115
S. Shahnawazuddin, Waquar Ahmad, Nagaraj Adiga, Avinash Kumar

This paper presents our work on developing an automatic speaker verification (ASV) system for child speakers. For most languages, children's speech data for training an ASV system are either unavailable (zero-resource) or very limited (low-resource), which makes developing an ASV system under these conditions very challenging. To overcome this issue, we study the effectiveness of in-domain and out-of-domain data augmentation. In-domain augmentation synthesizes additional training data by modifying the speed and pitch of children's speech. Out-of-domain augmentation instead uses a limited amount of adults' speech. Because the acoustic attributes of adult and child speech differ, using adults' speech directly leads to severe acoustic mismatch. To address this drawback, the adults' speech is first passed through voice conversion (VC), implemented in this work with a cycle-consistent generative adversarial network (CycleGAN), which renders it perceptually similar to children's speech. The voice-converted adult data can then be used for augmentation with minimal acoustic mismatch. To evaluate the proposed augmentation techniques experimentally, an x-vector-based ASV architecture is employed; the role of i-vectors is also studied. Data augmentation significantly reduces both the equal error rate (EER) and the minimum decision cost function (minDCF) in low- and zero-resource conditions. In addition, i-vectors are found to be superior to x-vectors for modeling speaker characteristics in this setting. Finally, we present a detailed study of how the effect of data augmentation varies with the age of the child speakers.
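The abstract reports its gains in terms of EER and minDCF without defining them. As a rough illustration (not the authors' evaluation code), the Python sketch below computes both metrics from lists of target (same-speaker) and non-target (different-speaker) trial scores. The cost parameters `p_target=0.01`, `c_miss=1` and `c_fa=1` are assumed values, a commonly used operating point; the abstract does not say which one the paper uses. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def error_rates(
    target: Sequence[float], nontarget: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, p_miss, p_fa) for every candidate threshold.

    A trial is accepted when its score is >= the threshold. This is a simple
    O(N^2) sweep for illustration; real toolkits sort the scores once instead.
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    # Every observed score is a candidate threshold; +inf rejects all trials.
    thresholds = sorted(set(target) | set(nontarget)) + [float("inf")]
    rates = []
    for t in thresholds:
        p_miss = sum(s < t for s in target) / len(target)  # true speakers rejected
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)  # impostors accepted
        rates.append((t, p_miss, p_fa))
    return rates


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    _, p_miss, p_fa = min(
        error_rates(target, nontarget), key=lambda r: abs(r[1] - r[2])
    )
    return (p_miss + p_fa) / 2


def min_dcf(
    target: Sequence[float],
    nontarget: Sequence[float],
    p_target: float = 0.01,  # assumed target prior; not stated in the abstract
    c_miss: float = 1.0,     # assumed cost of a miss
    c_fa: float = 1.0,       # assumed cost of a false alarm
) -> float:
    """Normalized minimum detection cost over all candidate thresholds."""
    dcf = min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        for _, p_miss, p_fa in error_rates(target, nontarget)
    )
    # Normalize by the cheaper trivial system (accept all or reject all).
    return dcf / min(c_miss * p_target, c_fa * (1 - p_target))


if __name__ == "__main__":
    target = [0.9, 0.8, 0.7, 0.4]     # made-up same-speaker scores
    nontarget = [0.6, 0.3, 0.2, 0.1]  # made-up different-speaker scores
    print(equal_error_rate(target, nontarget))  # 0.25
    print(min_dcf(target, nontarget))           # 0.25
```

Sweeping every observed score as a threshold keeps the sketch short. The EER is read off the closest threshold rather than interpolated, so on small trial lists it can differ slightly from values reported by dedicated toolkits.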


Updated: 2021-06-15
