Hybrid-task learning for robust automatic speech recognition
Computer Speech & Language (IF 4.3), Pub Date: 2020-05-03, DOI: 10.1016/j.csl.2020.101103
Gueorgui Pironkov, Sean U. N. Wood, Stéphane Dupont

In order to properly train an automatic speech recognition system, speech with its annotated transcriptions is most often required. The amount of real annotated data recorded in noisy and reverberant conditions is extremely limited, especially compared to the amount of data that can be simulated by adding noise to clean annotated speech. Thus, using both real and simulated data is important in order to improve robust speech recognition, as this increases the amount and diversity of training data (thanks to the simulated data) while also benefiting from a reduced mismatch between training and operation of the system (thanks to the real data). Another promising method for speech recognition in noisy and reverberant conditions is multi-task learning. The idea is to train one acoustic model to simultaneously solve at least two tasks that are different but related, with speech recognition being the main task. A successful auxiliary task consists of generating clean speech features using a regression loss (as a denoising auto-encoder). However, this auxiliary task uses clean speech as targets, which implies that real data cannot be used. In order to tackle this problem, a Hybrid-Task Learning system is proposed. This system switches frequently between multi- and single-task learning depending on whether the input is simulated or real data, respectively. Having a hybrid architecture allows us to benefit from both real and simulated data while using a denoising auto-encoder as the auxiliary task of a multi-task setup. We show that the relative improvement brought by the proposed hybrid-task learning architecture can reach up to 4.4% compared to the traditional single-task learning approach on the CHiME4 database. We also demonstrate the benefits of the hybrid approach compared to multi-task learning or adaptation.
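The switching behaviour described in the abstract can be illustrated with a small sketch. The example below is not the authors' code: it assumes a PyTorch implementation, illustrative layer sizes, a frame-level senone classification target, and an arbitrary auxiliary-loss weight. It shows a shared encoder feeding an ASR head and a denoising head; simulated utterances (with clean reference features) update both heads, while real utterances fall back to the single ASR loss.

```python
# Minimal hybrid-task sketch (illustrative only; sizes, names, and weights are assumptions).
import torch
import torch.nn as nn

FEAT_DIM, HID_DIM, NUM_STATES = 40, 256, 2000  # assumed feature / senone dimensions

class HybridTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared acoustic encoder used by both tasks.
        self.shared = nn.Sequential(nn.Linear(FEAT_DIM, HID_DIM), nn.ReLU(),
                                    nn.Linear(HID_DIM, HID_DIM), nn.ReLU())
        self.asr_head = nn.Linear(HID_DIM, NUM_STATES)    # main task: senone posteriors
        self.denoise_head = nn.Linear(HID_DIM, FEAT_DIM)  # auxiliary task: clean features

    def forward(self, noisy_feats):
        h = self.shared(noisy_feats)
        return self.asr_head(h), self.denoise_head(h)

model = HybridTaskModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
aux_weight = 0.1  # assumed weight of the denoising auxiliary loss

def training_step(noisy_feats, senone_targets, clean_feats=None):
    """One hybrid-task step: clean_feats is None for real data (single-task ASR),
    and holds the clean reference features for simulated data (multi-task)."""
    logits, denoised = model(noisy_feats)
    loss = ce(logits, senone_targets)
    if clean_feats is not None:  # simulated utterance: add the denoising regression loss
        loss = loss + aux_weight * mse(denoised, clean_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors stand in for a batch of frames from simulated and real utterances.
frames = 8
sim_loss = training_step(torch.randn(frames, FEAT_DIM),
                         torch.randint(NUM_STATES, (frames,)),
                         clean_feats=torch.randn(frames, FEAT_DIM))
real_loss = training_step(torch.randn(frames, FEAT_DIM),
                          torch.randint(NUM_STATES, (frames,)))
```

In this reading, the "switching" is simply which losses are active on each mini-batch, so real and simulated data can be interleaved freely while sharing one acoustic model.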




Updated: 2020-05-03