On the Linguistic Representational Power of Neural Machine Translation Models
Computational Linguistics (IF 9.3), Pub Date: 2020-03-01, DOI: 10.1162/coli_a_00367
Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Despite the recent success of deep neural networks in natural language processing (NLP) and other spheres of artificial intelligence (AI), their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word structure captured within the learned representations, an important aspect in translating morphologically rich languages? (ii) Do the representations capture long-range dependencies and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena? (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects of NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained in an end-to-end fashion, without any direct supervision during training, learn a non-trivial amount of linguistic information. Notable findings include the following: (i) word morphology and part-of-speech information are captured at the lower layers of the model; (ii) in contrast, lexical semantics and non-local syntactic and semantic dependencies are better represented at the higher layers of the model; (iii) representations learned over characters are more informed about word morphology than those learned over subword units; and (iv) representations learned by multilingual models are richer than those learned by bilingual models.
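
At a high level, this kind of evaluation rests on diagnostic (probing) classifiers: hidden states are extracted from a frozen, trained NMT encoder or decoder, and a lightweight classifier is trained to predict a linguistic property (for example, a POS or morphological tag) from each token's representation; held-out classification accuracy is then read as an extrinsic measure of how much of that property a given layer encodes. The sketch below illustrates the idea only; the `encoder.encode` interface and its output shape are assumptions for illustration, not the authors' actual code.

```python
# Minimal probing-classifier sketch. Assumes a hypothetical `encoder` object
# whose encode(sentence) returns an array of shape [num_layers, num_tokens, dim].
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def probe_layer(encoder, sentences, labels, layer):
    """Train a diagnostic classifier on frozen activations from one layer.

    `sentences` is a list of tokenized sentences; `labels` holds one
    linguistic tag per token (e.g. POS or morphological tag). Higher
    held-out accuracy is taken as evidence that the layer encodes
    that linguistic property.
    """
    feats, targets = [], []
    for sent, sent_labels in zip(sentences, labels):
        states = encoder.encode(sent)      # assumed: [num_layers, len(sent), dim]
        feats.extend(states[layer])        # one vector per token
        targets.extend(sent_labels)
    X, y = np.array(feats), np.array(targets)
    split = int(0.9 * len(X))              # simple train/test split for the sketch
    clf = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
    return accuracy_score(y[split:], clf.predict(X[split:]))
```

Running such a probe per layer (and per translation unit, encoder vs. decoder, or bilingual vs. multilingual model) yields the layer-wise comparisons summarized in the findings above.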

Last updated: 2020-03-01