Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks
International Journal of Computer Vision (IF 11.6), Pub Date: 2020-01-02, DOI: 10.1007/s11263-019-01281-2
Stephanie Stoll, Necati Cihan Camgoz, Simon Hadfield, Richard Bowden

We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first translate spoken language sentences into sign pose sequences by combining an NMT network with a Motion Graph. The resulting pose information is then used to condition a generative model that produces photo-realistic sign language video sequences. This is the first approach to continuous sign video generation that does not use a classical graphical avatar. We evaluate the translation abilities of our approach on the PHOENIX14T Sign Language Translation dataset. We set a baseline for text-to-gloss translation, reporting a BLEU-4 score of 16.34/15.26 on dev/test sets. We further demonstrate the video generation capabilities of our approach for both multi-signer and high-definition settings qualitatively and quantitatively using broadcast quality assessment metrics.
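
For readers unfamiliar with the BLEU-4 figure quoted in the abstract, the following is a minimal sketch of how a corpus-level BLEU-4 score can be computed over gloss sequences. It follows the standard BLEU definition (geometric mean of clipped 1- to 4-gram precisions times a brevity penalty) with a single reference per sentence and no smoothing. The abstract does not say which scoring tool or settings the authors used, so this is an illustration only; the function names `ngrams` and `corpus_bleu4` and the gloss tokens in the usage note are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4 on a 0-100 scale.

    Assumptions (not taken from the paper): one reference per hypothesis,
    whitespace-tokenised gloss lists, uniform n-gram weights, no smoothing.
    """
    matched = [0] * 4  # clipped n-gram matches for n = 1..4
    total = [0] * 4    # hypothesis n-gram counts for n = 1..4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any zero precision makes the score zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    # Brevity penalty: penalise hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

As a usage example, scoring the made-up gloss sequence `["MORGEN", "REGEN", "NORD", "WIND", "STARK"]` against the reference `["MORGEN", "REGEN", "NORD", "WIND", "SCHWACH"]` gives n-gram precisions of 4/5, 3/4, 2/3 and 1/2, so the score is `100 * 0.2 ** 0.25`, roughly 66.9. Scores from this sketch are not directly comparable to the 16.34/15.26 reported above unless tokenisation and smoothing match the authors' setup.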

Updated: 2020-01-02