Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
arXiv - CS - Sound. Pub Date: 2021-04-29. DOI: arxiv-2104.14470. Authors: Ha Nguyen, Yannick Estève, Laurent Besacier
Boosted by the simultaneous translation shared task at IWSLT 2020, promising
end-to-end online speech translation approaches were recently proposed. They
consist of incrementally encoding a speech input (in a source language) and
decoding the corresponding text (in a target language) with the best possible
trade-off between latency and translation quality. This paper investigates two
key aspects of end-to-end simultaneous speech translation: (a) how to
efficiently encode the continuous speech flow, and (b) how to segment the
speech flow in order to alternate optimally between reading (R: encoding input)
and writing (W: decoding output) operations. We extend our previously proposed
end-to-end online decoding strategy and show that while replacing BLSTM
(bidirectional LSTM) encoding with ULSTM (unidirectional LSTM) encoding
degrades performance in offline mode, it actually improves both efficiency and
performance in online mode. We also measure the impact of different methods to
segment the speech signal (using fixed interval boundaries, oracle word
boundaries, or randomly set boundaries) and show that, on our English-German
speech translation setup, our best end-to-end online decoding strategy is,
surprisingly, the one that alternates R/W operations on fixed-size blocks.
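
The segmentation strategies and the R/W alternation named in the abstract can be illustrated with a small sketch. This is not the authors' decoder: the function names (`fixed_boundaries`, `random_boundaries`, `rw_schedule`), the frame-based units, and the `writes_per_read` parameter are all assumptions introduced here for illustration. Oracle word boundaries are left out because they would require a word-level alignment that the abstract does not describe.

```python
import random
from typing import List, Optional, Tuple


def fixed_boundaries(n_frames: int, block: int) -> List[int]:
    """Exclusive end indices of consecutive blocks of `block` frames.

    The last block may be shorter than `block`.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    if n_frames <= 0:
        return []
    ends = list(range(block, n_frames, block))
    ends.append(n_frames)
    return ends


def random_boundaries(n_frames: int, n_segments: int, seed: int = 0) -> List[int]:
    """Exclusive end indices of `n_segments` segments with random cut points."""
    if not 1 <= n_segments <= n_frames:
        raise ValueError("need 1 <= n_segments <= n_frames")
    rng = random.Random(seed)
    # Draw distinct interior cut points, then close with the utterance end.
    cuts = sorted(rng.sample(range(1, n_frames), n_segments - 1))
    return cuts + [n_frames]


def rw_schedule(
    boundaries: List[int], writes_per_read: int = 1
) -> List[Tuple[str, Optional[int]]]:
    """Alternate R (read up to a boundary) and W (emit tokens) actions.

    ("R", end) means: encode input frames up to `end`.
    ("W", k) means: decode k target tokens.
    ("W", None) after the final read means: decode until end of sequence.
    This alternation rule is a hypothetical policy, not the paper's.
    """
    schedule: List[Tuple[str, Optional[int]]] = []
    for i, end in enumerate(boundaries):
        schedule.append(("R", end))
        is_last = i == len(boundaries) - 1
        schedule.append(("W", None if is_last else writes_per_read))
    return schedule


if __name__ == "__main__":
    for action in rw_schedule(fixed_boundaries(10, 4), writes_per_read=2):
        print(action)
```

With fixed-size blocks the boundaries depend only on the frame count, so no boundary detector is needed at inference time; the random variant takes a seed so that runs are reproducible.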
Updated: 2021-04-30