Network measures: A new paradigm towards reliable novel word sense detection
Information Processing & Management (IF 7.4), Pub Date: 2019-11-28, DOI: 10.1016/j.ipm.2019.102173
Abhik Jana, Animesh Mukherjee, Pawan Goyal

In this era of digitization, with the fast flow of information on the web, words are increasingly used to denote new meanings. Novel sense detection is therefore a crucial and challenging task for any natural language processing application that depends on an efficient semantic representation of words. With the recent availability of large amounts of digitized text, automated analysis of language evolution has become possible. Given corpora from two different time periods, the main focus of our work is to detect, with high precision, the words that have acquired a novel sense. We pose this problem as a binary classification task: deciding whether a new sense of a target word has emerged. This paper presents a proposal based on network features to improve the precision of this task. For a candidate word where a new sense has been detected by comparing the sense clusters induced at two different time periods, we further compare the network properties of the subgraphs induced from the novel sense clusters across these two time periods. Using the mean fractional change in edge density, structural similarity and average path length as features in a Support Vector Machine (SVM) classifier, manual evaluation gives precision values of 0.86 and 0.74 for new sense detection when tested on 2 distinct time-point pairs, compared with precision values in the range of 0.23-0.32 when the proposed scheme is not used. The outlined method can therefore be used as a new post-hoc step to improve the precision of novel word sense detection in a robust and reliable way wherever the underlying framework uses a graph structure. Another important observation is that, even though our proposal is a post-hoc step, it can also be used in isolation, where it still achieves a precision of 0.54-0.62. Finally, we also show that our method is able to detect well-known historical shifts in 80% of cases.
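To make the feature computation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each time period is represented as an undirected word network stored as an adjacency map, takes a sense cluster to be a set of words, and defines fractional change as (new - old) / old. The abstract does not say how the mean is taken or how structural similarity is defined, so both are left out; all function names and the toy data are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induced_subgraph(adj, nodes):
    """Restrict the network to the words of one sense cluster."""
    nodes = set(nodes)
    return {u: adj.get(u, set()) & nodes for u in nodes}


def edge_density(sub):
    """Fraction of possible undirected edges that are present."""
    n = len(sub)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in sub.values()) / 2
    return m / (n * (n - 1) / 2)


def average_path_length(sub):
    """Mean shortest-path length over connected node pairs (0.0 if there are none)."""
    total, pairs = 0, 0
    for src in sub:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in sub[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def fractional_change(old, new):
    """(new - old) / old; returns 0.0 when old is zero (an assumption of this sketch)."""
    return (new - old) / old if old else 0.0


def cluster_features(adj_old, adj_new, cluster):
    """Fractional change of each network measure for one cluster across two periods."""
    old = induced_subgraph(adj_old, cluster)
    new = induced_subgraph(adj_new, cluster)
    return {
        "edge_density": fractional_change(edge_density(old), edge_density(new)),
        "average_path_length": fractional_change(
            average_path_length(old), average_path_length(new)
        ),
    }


# Toy data (hypothetical): the cluster is a sparse chain in the earlier period
# and fully connected in the later one.
cluster = {"a", "b", "c", "d"}
adj_old = build_adj([("a", "b"), ("b", "c"), ("c", "d")])
adj_new = build_adj(combinations(sorted(cluster), 2))
features = cluster_features(adj_old, adj_new, cluster)
print(features)
```

In a pipeline of the kind the abstract describes, feature values like these, computed for each candidate word, would be passed to a binary classifier such as an SVM to decide whether the detected novel sense is genuine.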




Updated: 2020-04-21