Graph-based exploration and clustering analysis of semantic spaces
Applied Network Science (IF 1.3), Pub Date: 2019-11-13, DOI: 10.1007/s41109-019-0228-y
Alexander Veremyev, Alexander Semenov, Eduardo L. Pasiliao, Vladimir Boginski

The goal of this study is to demonstrate how tools and concepts from network science and graph theory can be used to explore and compare the semantic spaces of word embeddings and lexical databases. Specifically, we construct semantic networks based on word2vec representations of words, which are “learnt” from large text corpora (Google news, Amazon reviews), and “human built” word networks derived from two well-known lexical databases: WordNet and Moby Thesaurus. We compare “global” characteristics (e.g., degrees, distances, clustering coefficients) and “local” characteristics (e.g., most central nodes and community-type dense clusters) of the considered networks. Our observations suggest that human-built networks possess more intuitive global connectivity patterns, whereas the local characteristics of machine-built networks (in particular, dense clusters) provide much richer information on the contextual usage and perceived meanings of words. This reveals interesting structural differences between human-built and machine-built semantic networks. To our knowledge, this is the first study that uses graph theory and network science in the considered context; we therefore also provide examples and discuss potential research directions that may motivate further work on the synthesis of lexicographic and machine-learning-based tools and lead to new insights in this area.
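
To make the "global" characteristics mentioned above concrete, the following is a minimal sketch of how a semantic network might be built from word vectors and then summarized by node degree and local clustering coefficient. The abstract does not state how edges are defined, so the cosine-similarity threshold used here is an assumption, as are the toy vectors and the helper names `build_graph` and `local_clustering`; none of them are taken from the paper.

```python
import math
from itertools import combinations

# Toy 2-D "embeddings": hypothetical values, NOT real word2vec vectors.
vectors = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "pet": (0.8, 0.3),
    "car": (0.1, 1.0),
    "bus": (0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_graph(vectors, threshold):
    """Connect two words when their cosine similarity is >= threshold.

    Thresholding is an assumed edge rule for illustration only.
    """
    adj = {word: set() for word in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


graph = build_graph(vectors, threshold=0.95)
degrees = {word: len(nbrs) for word, nbrs in graph.items()}
clustering = {word: local_clustering(graph, word) for word in graph}
```

With these toy vectors, "cat", "dog" and "pet" form a fully connected triangle, while "car" and "bus" are linked only to each other, so the two groups end up as separate dense clusters. A real analysis would load trained embeddings and would likely use a graph library rather than hand-written adjacency sets.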

Updated: 2019-11-13