Tight Integrated End-to-End Training for Cascaded Speech Translation, arXiv - CS - Computation and Language



arXiv - CS - Computation and Language. Pub Date: 2020-11-24, DOI: arxiv-2011.12167
Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney

A cascaded speech translation model relies on discrete, non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between the ASR and MT models. Direct speech translation is an alternative that avoids error propagation; however, its performance often lags behind that of the cascade system. To use an intermediate representation while preserving end-to-end trainability, previous studies have proposed two-stage models that pass the hidden vectors of the recognizer into the decoder of the MT model and ignore the MT encoder. This work explores the feasibility of collapsing all cascade components into a single end-to-end trainable model by optimizing all parameters of the ASR and MT models jointly, without ignoring any learned parameters. It is a tightly integrated method that passes renormalized source word posterior distributions as a soft decision instead of one-hot vectors, which enables backpropagation. It therefore provides both transcriptions and translations and achieves strong consistency between them. Our experiments on four tasks with different data scenarios show that the model outperforms cascade models by up to 1.8% in BLEU and 2.0% in TER and is superior to direct models.
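
The soft-decision idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. The abstract does not say how the posteriors are renormalized, so the sharpening exponent `gamma` is an assumption, as are the function names and the three-word vocabulary with two-dimensional embeddings. The sketch contrasts a one-hot (argmax) lookup, which discards the rest of the distribution, with an expected embedding that is a smooth function of the ASR posterior and could therefore carry gradients back to the recognizer.

```python
def renormalize(posterior, gamma=1.0):
    """Raise each probability to the power gamma and rescale to sum to 1.

    gamma is a hypothetical sharpening knob: gamma = 1 leaves the
    distribution unchanged, larger values push it toward one-hot.
    """
    powered = [p ** gamma for p in posterior]
    total = sum(powered)
    return [p / total for p in powered]


def soft_embedding(posterior, embedding_table):
    """Soft decision: posterior-weighted sum of the embedding rows."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(posterior, embedding_table))
        for d in range(dim)
    ]


def one_hot_embedding(posterior, embedding_table):
    """Hard decision: look up only the most probable word (argmax)."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return list(embedding_table[best])


# Toy ASR posterior over a three-word source vocabulary (made-up numbers).
posterior = [0.6, 0.3, 0.1]
# Toy MT source embedding table, one row per vocabulary entry.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

sharpened = renormalize(posterior, gamma=2.0)
soft = soft_embedding(sharpened, embedding_table)
hard = one_hot_embedding(posterior, embedding_table)
print("sharpened posterior:", sharpened)
print("soft MT input:", soft)
print("hard MT input:", hard)
```

In this sketch, a large `gamma` makes the soft input approach the one-hot input, so the exponent acts as a dial between the fully soft decision and the behavior of a conventional cascade.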


Updated: 2020-11-25
