


Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
arXiv - CS - Sound, Pub Date: 2021-04-29, DOI: arxiv-2104.14470
Ha Nguyen, Yannick Estève, Laurent Besacier

Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed. They consist of incrementally encoding a speech input (in a source language) and decoding the corresponding text (in a target language) with the best possible trade-off between latency and translation quality. This paper investigates two key aspects of end-to-end simultaneous speech translation: (a) how to efficiently encode the continuous speech flow, and (b) how to segment the speech flow in order to alternate optimally between reading (R: encoding input) and writing (W: decoding output) operations. We extend our previously proposed end-to-end online decoding strategy and show that while replacing BLSTM with ULSTM encoding degrades performance in offline mode, it actually improves both efficiency and performance in online mode. We also measure the impact of different methods of segmenting the speech signal (using fixed interval boundaries, oracle word boundaries, or randomly set boundaries) and show that, surprisingly, our best end-to-end online decoding strategy on our English-German speech translation setup is the one that alternates R/W operations on fixed-size blocks.
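To make the segmentation idea concrete, the following is a minimal Python sketch of how fixed and random boundaries over a sequence of speech frames could drive an alternating R/W schedule. It is an illustration under stated assumptions, not the authors' implementation: the function names (fixed_boundaries, random_boundaries, read_write_schedule), the frame counts, and the writes_per_read parameter are all hypothetical, and the abstract does not say how many W operations follow each R. Oracle word boundaries are omitted because they would require a forced alignment that is not described here.

```python
import random
from typing import List, Tuple


def fixed_boundaries(n_frames: int, block: int) -> List[int]:
    """Exclusive end indices of consecutive fixed-size blocks of frames."""
    if n_frames <= 0 or block <= 0:
        raise ValueError("n_frames and block must be positive")
    ends = list(range(block, n_frames, block))
    ends.append(n_frames)  # the last block may be shorter than `block`
    return ends


def random_boundaries(n_frames: int, n_segments: int, seed: int = 0) -> List[int]:
    """Exclusive end indices of `n_segments` segments with random cut points."""
    if not 1 <= n_segments <= n_frames:
        raise ValueError("need 1 <= n_segments <= n_frames")
    rng = random.Random(seed)
    # Draw distinct interior cut points, then close the final segment.
    cuts = sorted(rng.sample(range(1, n_frames), n_segments - 1))
    return cuts + [n_frames]


def read_write_schedule(ends: List[int], writes_per_read: int = 1) -> List[Tuple[str, int]]:
    """Alternate one READ per segment with a fixed number of WRITEs.

    Each action is tagged with the number of frames encoded so far.
    Assumption: a constant number of writes per read; a real decoder
    would also keep writing after the last read until end of sentence.
    """
    schedule: List[Tuple[str, int]] = []
    for end in ends:
        schedule.append(("R", end))
        schedule.extend(("W", end) for _ in range(writes_per_read))
    return schedule


if __name__ == "__main__":
    ends = fixed_boundaries(n_frames=10, block=4)
    print(ends)
    print(read_write_schedule(ends))
```

In this sketch the only difference between strategies is where the segment ends fall; the R/W loop itself is unchanged, which is one simple way to compare segmentation methods under the same decoding policy.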


Updated: 2021-04-30
