Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
arXiv - CS - Sound, Pub Date: 2020-07-03, DOI: arxiv-2007.01836
Pavel Denisov, Ngoc Thang Vu

Spoken language understanding is typically based on pipeline architectures that combine speech recognition and natural language understanding steps. These components are optimized independently so that each can use the data available for it, but the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings to process acoustic features. In particular, we extend the embedding model with the encoder of a pretrained speech recognition system in order to construct end-to-end spoken language understanding systems. Our proposed method is based on a teacher-student framework across the speech and text modalities that aligns the acoustic and semantic latent spaces. Experimental results on three benchmarks show that our system reaches performance comparable to the pipeline architecture without using any training data, and outperforms it on two of the three benchmarks after fine-tuning with ten examples per class.
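The cross-modal teacher-student idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The teacher vectors stand in for fixed text embeddings, the student is a single linear layer over mean-pooled acoustic frames, and the mean squared error loss is an assumption, since the abstract does not name the distance used for alignment. All function names and data below are hypothetical.

```python
import random

def mean_pool(frames):
    """Average a sequence of frame vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def project(weights, vec):
    """Toy student: a linear map from acoustic space into the teacher's space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_step(weights, pooled, teacher_vec, lr):
    """One gradient step on the MSE (assumed loss) between student and teacher.

    The teacher vector is treated as a fixed target; only the student
    weights are updated. Returns the loss measured before the update.
    """
    student_vec = project(weights, pooled)
    n = len(teacher_vec)
    for j in range(n):
        grad_out = 2.0 * (student_vec[j] - teacher_vec[j]) / n
        for i in range(len(pooled)):
            weights[j][i] -= lr * grad_out * pooled[i]
    return mse(student_vec, teacher_vec)

def train(pairs, in_dim, out_dim, epochs=200, lr=0.1, seed=0):
    """Fit the student on (acoustic frames, teacher embedding) pairs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for frames, teacher_vec in pairs:
            total += distill_step(weights, mean_pool(frames), teacher_vec, lr)
        history.append(total / len(pairs))
    return weights, history

# Hypothetical paired data: each utterance has acoustic frames and the
# teacher's embedding of its transcript.
pairs = [
    ([[1.0, 0.0, 0.5], [1.0, 0.0, 0.5]], [1.0, -1.0]),
    ([[0.0, 1.0, 0.5], [0.0, 1.0, 0.5]], [-1.0, 1.0]),
]
weights, history = train(pairs, in_dim=3, out_dim=2)
print(f"first-epoch loss {history[0]:.4f}, last-epoch loss {history[-1]:.2e}")
```

Once the student output matches the teacher's embedding space, a classifier built on text embeddings could in principle be applied to speech input directly, which is one way to read the abstract's claim of working without task-specific training data.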

Updated: 2020-08-13