Lexical ambiguity detection in professional discourse, Information Processing & Management

Lexical ambiguity detection in professional discourse
Information Processing & Management (IF 7.4), Pub Date: 2022-07-06, DOI: 10.1016/j.ipm.2022.103000
Yang Liu, Alan Medlar, Dorota Głowacka

Professional discourse is the language used by specialists, such as lawyers, doctors and academics, to communicate the knowledge and assumptions associated with their respective fields. Professional discourse can be especially difficult for non-specialists to understand due to the lexical ambiguity of commonplace words that have a different or more specific meaning within a specialist domain. This phenomenon also makes it harder for specialists to communicate with the general public, because specialists are similarly unaware of the potential for misunderstandings.

In this article, we present an approach for detecting domain terms that are lexically ambiguous with respect to everyday English. We demonstrate the efficacy of our approach with three case studies in statistics, law and biomedicine. In all case studies, we identify domain terms with a precision@100 greater than 0.9, outperforming the best-performing baseline by 18.1–91.7%. Most importantly, we show this ranking is broadly consistent with semantic differences. Our results highlight the difficulties that existing semantic difference methods have in the cross-domain setting, where they rank non-domain terms highly due to noise or biases in the data. We additionally show that our approach generalizes to short phrases, and we investigate its data efficiency by varying the number of labeled examples.
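For readers unfamiliar with the precision@100 metric quoted above, the following is a minimal sketch of how precision@k is commonly computed for a ranked list of candidate terms. It is not the authors' evaluation code. The function name `precision_at_k` and the toy term lists are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_terms: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of the top-k ranked terms that are labeled relevant.

    Uses the common convention of dividing by k even when fewer than
    k terms are available in the ranking.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in relevant)
    return hits / k


# Hypothetical toy data: a ranking produced by some detector, and a
# hand-labeled set of ambiguous domain terms (illustrative only).
ranked = ["significant", "power", "table", "normal", "the"]
domain_terms = {"significant", "power", "normal", "bias"}

# Three of the top four ranked terms are in the labeled set.
print(precision_at_k(ranked, domain_terms, k=4))  # 0.75
```

Under this definition, a precision@100 greater than 0.9 means that more than 90 of the 100 highest-ranked terms were judged to be domain terms.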

Updated: 2022-07-07
