RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer
arXiv - CS - Computation and Language. Pub Date: 2021-06-09, DOI: arxiv-2106.04833
Xingshan Zeng, Liangyou Li, Qun Liu

End-to-end simultaneous speech translation (SST), which directly translates speech in one language into text in another language in real time, is useful in many scenarios but has not been fully investigated. In this work, we propose RealTranS, an end-to-end model for SST. To bridge the modality gap between speech and text, RealTranS gradually downsamples the input speech with interleaved convolution and unidirectional Transformer layers for acoustic modeling, and then maps the speech features into the text space with a weighted-shrinking operation and a semantic encoder. In addition, to improve model performance in simultaneous scenarios, we propose a blank penalty that improves the shrinking quality and a Wait-K-Stride-N strategy that allows local reranking during decoding. Experiments on public and widely used datasets show that RealTranS with the Wait-K-Stride-N strategy outperforms prior end-to-end models as well as cascaded models across diverse latency settings.
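
The abstract gives no code, but the encoder it describes (gradual downsampling via interleaved strided convolutions and unidirectional Transformer layers) can be illustrated with a minimal PyTorch sketch. The class name, hyperparameters (model dimension, number of blocks, kernel size), and the causal-mask construction below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class InterleavedConvTransformerEncoder(nn.Module):
    """Sketch: interleave strided convolutions (gradual downsampling)
    with unidirectional (causally masked) Transformer layers."""

    def __init__(self, in_dim=80, d_model=256, n_heads=4, n_blocks=3):
        super().__init__()
        self.input_proj = nn.Linear(in_dim, d_model)
        # Each block halves the time resolution before self-attention.
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1)
            for _ in range(n_blocks)
        )
        self.transformers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=1024, batch_first=True)
            for _ in range(n_blocks)
        )

    def forward(self, x):
        # x: (batch, time, in_dim) acoustic features, e.g. log-mel filterbanks.
        x = self.input_proj(x)
        for conv, transformer in zip(self.convs, self.transformers):
            # Conv1d expects (batch, channels, time); stride 2 downsamples.
            x = conv(x.transpose(1, 2)).transpose(1, 2)
            # Upper-triangular -inf mask makes self-attention unidirectional,
            # so the encoder can run incrementally in a simultaneous setting.
            t = x.size(1)
            causal_mask = torch.triu(
                torch.full((t, t), float("-inf"), device=x.device), diagonal=1)
            x = transformer(x, src_mask=causal_mask)
        return x  # (batch, time / 2**n_blocks, d_model)
```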

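The Wait-K-Stride-N strategy mentioned above generalizes the standard wait-k policy: after an initial wait of k source steps, the decoder emits n target tokens per stride instead of one, which is what enables local reranking within each stride. The following read/write schedule is a hedged sketch of that idea; the function name and the chunk granularity are hypothetical, and the paper defines its exact policy over the shrunk speech units.

```python
def wait_k_stride_n_schedule(num_src_chunks, k, n):
    """Sketch of a wait-k-stride-n read/write schedule.

    Read the first k source chunks, then alternate: write n target tokens,
    read n more chunks. Returns a list of ("READ", i) / ("WRITE", j) actions;
    in practice decoding continues to EOS after the full input is read.
    """
    actions = []
    read, written = 0, 0
    while read < num_src_chunks:
        # Read until k + written source chunks are available (or input ends).
        target_reads = min(num_src_chunks, k + written)
        while read < target_reads:
            actions.append(("READ", read))
            read += 1
        # Emit a stride of n target tokens; emitting several tokens at once
        # is what allows local reranking within the stride.
        for _ in range(n):
            actions.append(("WRITE", written))
            written += 1
    return actions
```

For example, wait_k_stride_n_schedule(6, k=3, n=2) yields READ 0-2, WRITE 0-1, READ 3-4, WRITE 2-3, READ 5, WRITE 4-5; with n=1 this reduces to the usual wait-k schedule.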
Updated: 2021-06-10