A Comparative Study on Structural and Semantic Properties of Sentence Embeddings
arXiv - CS - Computation and Language Pub Date : 2020-09-23 , DOI: arxiv-2009.11226
Alexander Kalinowski and Yuan An

Sentence embeddings encode natural language sentences as low-dimensional dense vectors. A great deal of effort has gone into using sentence embeddings to improve several important natural language processing (NLP) tasks. Relation extraction is one such task: it aims to identify structured relations defined in a knowledge base from unstructured text. A promising and more efficient approach would be to embed both the text and the structured knowledge in low-dimensional spaces and discover semantic alignments or mappings between them. Although a number of techniques have been proposed in the literature for embedding both sentences and knowledge graphs, little is known about the structural and semantic properties of these embedding spaces with respect to relation extraction. In this paper, we investigate these properties by evaluating the extent to which sentences carrying similar senses are embedded in nearby subspaces, and whether that structure can be exploited to align sentences to a knowledge graph. We propose a set of experiments that use a widely used large-scale data set for relation extraction and focus on a set of key sentence embedding methods. These methods cover a wide variety of techniques, ranging from simple combinations of word embeddings to transformer-based BERT-style models. We additionally provide the code for reproducing these experiments at https://github.com/akalino/semantic-structural-sentences. Our experimental results show that different embedding spaces exhibit the structural and semantic properties to different degrees. These results provide useful information for developing embedding-based relation extraction methods.
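The abstract's central question, whether sentences carrying similar senses are embedded close together, can be illustrated with a minimal sketch. This is not the authors' evaluation procedure (their code is at the repository linked above); it is an assumed, simplified check that compares the mean cosine similarity of sentence pairs sharing a relation label against pairs with different labels. The function names, the toy vectors, and the relation labels `born_in` and `works_for` are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def intra_inter_similarity(embeddings, labels):
    """Mean cosine similarity of same-label pairs and of different-label pairs."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    sims = cosine_similarity_matrix(X)
    same = y[:, None] == y[None, :]
    # Strict upper triangle: count each unordered pair once, skip self-pairs.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    intra = sims[same & upper].mean()
    inter = sims[~same & upper].mean()
    return intra, inter


# Toy 3-dimensional "sentence embeddings" with made-up values and
# hypothetical relation labels; real embeddings would come from an encoder.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
toy_labels = ["born_in", "born_in", "works_for", "works_for"]

intra, inter = intra_inter_similarity(toy_embeddings, toy_labels)
print(f"intra-relation: {intra:.3f}, inter-relation: {inter:.3f}")
```

Under this assumed measure, a larger gap between the intra-relation and inter-relation averages would suggest that the embedding space groups sentences by relation more strongly.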

Updated: 2020-09-24