

Deep neural networks for automatic speech processing: a survey from large corpora to limited data
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2022-08-17, DOI: 10.1186/s13636-022-00251-w
Vincent Roger, Jérôme Farinas, Julien Pinquier

Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require large amounts of training data. Hence, training state-of-the-art frameworks for under-resourced speech tasks is difficult. One example is the limited amount of data available to model impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In this paper, we focus on the following speech processing tasks: automatic speech recognition, speaker identification, and emotion recognition. To assess the problem of limited data, we first investigate state-of-the-art automatic speech recognition systems, as this is the hardest task (due to the wide variability in each language). Next, we provide an overview of techniques and tasks requiring less data. In the last section, we investigate few-shot techniques by interpreting under-resourced speech as a few-shot problem. In that sense, we propose an overview of few-shot techniques and discuss the possibility of using such techniques for the speech problems addressed in this survey. Admittedly, the reviewed techniques are not well adapted to large datasets. Nevertheless, some promising results from the literature encourage the use of such techniques for speech processing.


Updated: 2022-08-17
