

Detecting light verb constructions across languages
Natural Language Engineering (IF 2.5), Pub Date: 2019-07-15, DOI: 10.1017/s1351324919000330
István Nagy T., Anita Rácz, Veronika Vincze

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses, typically denoting an event or an action. They exhibit special linguistic features, especially when regarded in a multilingual context. In this paper, we focus on the automatic detection of LVCs in raw text in four languages: English, German, Spanish, and Hungarian. First, we analyze the characteristics of LVCs from a linguistic point of view based on parallel corpus data. Then, we provide a standardized (i.e., language-independent) representation of LVCs that can be used in machine learning experiments. Next, we experiment with identifying LVCs in different languages: we exploit language adaptation techniques, which demonstrate that data from an additional language can be successfully employed to improve the performance of supervised LVC detection for a given language. As several annotated corpora from several domains are available for English and Hungarian, we also investigate the effect of simple domain adaptation techniques in reducing the gap between domains. Furthermore, we combine domain adaptation techniques with language adaptation techniques for these two languages. Our results show that both out-of-domain and additional-language data can improve performance. We believe that our language adaptation method may have practical implications in several fields of natural language processing, especially in machine translation.
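The abstract does not say which adaptation techniques the authors used. As an illustration only, the sketch below shows one well-known simple technique, feature augmentation (Daumé III, 2007), applied to a language-independent feature representation. Each feature is copied into a shared version, which a classifier can learn from all data, and a version specific to one language or domain. The function name `augment`, the feature names, and the `en`/`hu` tags are hypothetical and are not taken from the paper.

```python
from typing import Dict

Features = Dict[str, float]


def augment(features: Features, domain: str) -> Features:
    """Return a shared copy plus a domain-specific copy of every feature.

    Assumption: 'domain' may be a language code or a corpus domain; the
    paper's actual adaptation method is not described in the abstract.
    """
    augmented: Features = {}
    for name, value in features.items():
        # Shared copy: its weight is learned from all languages/domains.
        augmented[f"shared::{name}"] = value
        # Specific copy: its weight is learned from this language/domain only.
        augmented[f"{domain}::{name}"] = value
    return augmented


# Hypothetical language-independent features for a verb + noun candidate.
english_candidate = {"verb_is_frequent_light_verb": 1.0, "noun_is_deverbal": 1.0}
hungarian_candidate = {"verb_is_frequent_light_verb": 1.0, "noun_is_deverbal": 0.0}

en_row = augment(english_candidate, "en")
hu_row = augment(hungarian_candidate, "hu")

# Only the shared copies overlap between the two languages.
common_keys = sorted(set(en_row) & set(hu_row))
print(common_keys)
```

Rows augmented this way can be pooled into a single training set for any supervised classifier: evidence that holds in both languages accumulates on the shared copies, while language-specific behavior is absorbed by the specific copies.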


Updated: 2019-07-15
