Exploring Unsupervised Pretraining Objectives for Machine Translation
arXiv - CS - Computation and Language. Pub Date: 2021-06-10. DOI: arxiv-2106.05634
Christos Baziotis, Ivan Titov, Alexandra Birch, Barry Haddow

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English$\leftrightarrow$German, English$\leftrightarrow$Nepali and English$\leftrightarrow$Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models using a series of probes and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to a few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT, which also requires models with strong cross-lingual abilities.
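
The contrast drawn in the abstract, between masking and corruptions that keep the input looking like a real (full) sentence, can be illustrated with a minimal sketch. The function names, corruption ratios, shuffle window, and the use of random vocabulary items in place of a context-aware replacement model are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import random

random.seed(0)

MASK = "<mask>"


def mask_corruption(tokens, mask_ratio=0.35):
    """MLM-style corruption for seq2seq pretraining: a fraction of tokens is
    replaced with a mask symbol; the decoder reconstructs the original."""
    corrupted = list(tokens)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n_mask):
        corrupted[i] = MASK
    return corrupted


def reorder_corruption(tokens, window=3):
    """Corruption that keeps every original token: locally shuffle tokens
    within small windows, so the input still looks like a full sentence."""
    corrupted = list(tokens)
    for start in range(0, len(corrupted), window):
        chunk = corrupted[start:start + window]
        random.shuffle(chunk)
        corrupted[start:start + window] = chunk
    return corrupted


def replace_corruption(tokens, vocab, replace_ratio=0.35):
    """Corruption that swaps a fraction of tokens for other vocabulary items.
    Here the replacements are random; a context-aware model would normally
    propose plausible substitutes instead."""
    corrupted = list(tokens)
    n_rep = max(1, int(len(tokens) * replace_ratio))
    for i in random.sample(range(len(tokens)), n_rep):
        corrupted[i] = random.choice(vocab)
    return corrupted


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    vocab = ["cat", "red", "runs", "under", "a", "small", "tree"]
    print("original :", " ".join(sentence))
    print("masked   :", " ".join(mask_corruption(sentence)))
    print("reordered:", " ".join(reorder_corruption(sentence)))
    print("replaced :", " ".join(replace_corruption(sentence, vocab)))
```

In all three cases the pretraining target is the uncorrupted sentence; the difference is that masking leaves visibly artificial placeholder tokens in the encoder input, whereas reordering and replacement produce inputs that resemble real sentences, which is the distinction the paper investigates.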

Updated: 2021-06-11