A Comparative Study on Structural and Semantic Properties of Sentence Embeddings
arXiv - CS - Computation and Language. Pub Date: 2020-09-23. DOI: arxiv-2009.11226. Alexander Kalinowski and Yuan An
Sentence embeddings encode natural language sentences as low-dimensional
dense vectors. A great deal of effort has been put into using sentence
embeddings to improve several important natural language processing tasks.
Relation extraction is one such NLP task, aiming to identify structured
relations defined in a knowledge base from unstructured text. A promising and
more efficient approach would be to embed both the text and structured
knowledge in low-dimensional spaces and discover semantic alignments or
mappings between them. Although a number of techniques have been proposed in
the literature for embedding both sentences and knowledge graphs, little is
known about the structural and semantic properties of these embedding spaces in
terms of relation extraction. In this paper, we investigate the aforementioned
properties by evaluating the extent to which sentences carrying similar senses
are embedded in close-proximity sub-spaces, and whether we can exploit that
structure to align sentences to a knowledge graph. We propose a set of
experiments using a widely-used large-scale data set for relation extraction
and focusing on a set of key sentence embedding methods. We additionally
provide the code for reproducing these experiments at
https://github.com/akalino/semantic-structural-sentences. These embedding
methods cover a wide variety of techniques ranging from simple word embedding
combination to transformer-based BERT-style models. Our experimental results
show that different embedding spaces have different degrees of strength for the
structural and semantic properties. These results provide useful information
for developing embedding-based relation extraction methods.
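The simplest method in the range the abstract describes — forming a sentence embedding by combining word embeddings — can be sketched as mean pooling over word vectors, after which cosine similarity indicates whether similar-sense sentences land near each other. This is a minimal illustration with tiny hand-made stand-in vectors, not trained embeddings or the paper's actual data.

```python
# Sketch: sentence embeddings via word-vector averaging, then a cosine
# check that similar-sense sentences sit close together. The word vectors
# below are toy stand-ins, NOT trained embeddings from the paper.
import numpy as np

word_vectors = {
    "paris":   np.array([0.90, 0.10, 0.00]),
    "capital": np.array([0.80, 0.20, 0.10]),
    "france":  np.array([0.85, 0.15, 0.05]),
    "cats":    np.array([0.00, 0.90, 0.40]),
    "sleep":   np.array([0.10, 0.80, 0.50]),
}

def embed(sentence):
    """Mean-pool the word vectors of the sentence's tokens."""
    vecs = [word_vectors[w] for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = embed("paris capital france")
s2 = embed("france capital")
s3 = embed("cats sleep")

print(cosine(s1, s2))  # similar sense: high similarity
print(cosine(s1, s3))  # different sense: lower similarity
```

The BERT-style methods the abstract contrasts with this baseline replace the static lookup table with contextual token representations, but the pooling-and-compare pattern is the same.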
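The "semantic alignments or mappings" between a text embedding space and a knowledge-graph embedding space can, in the simplest linear case, be learned as an orthogonal map via the Procrustes solution. The sketch below uses random toy data with a hidden rotation — it is one standard alignment technique, not necessarily the procedure used in the paper.

```python
# Sketch: aligning a sentence-embedding space to a KG-embedding space with
# an orthogonal linear map (Procrustes), assuming paired vectors exist.
# All data here is synthetic toy data.
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: 50 sentence vectors (dim 8) and their KG counterparts,
# generated as a hidden random rotation of the sentence vectors plus noise.
S = rng.normal(size=(50, 8))                       # sentence embeddings
Q_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # hidden rotation
K = S @ Q_true + 0.01 * rng.normal(size=(50, 8))   # KG embeddings

# Orthogonal Procrustes: W = argmin ||S W - K||_F  s.t.  W^T W = I,
# solved in closed form from the SVD of S^T K.
U, _, Vt = np.linalg.svd(S.T @ K)
W = U @ Vt

# The learned map should recover the hidden rotation almost exactly.
err = np.linalg.norm(S @ W - K) / np.linalg.norm(K)
print(err)  # small relative error
```

How well such a simple map works in practice depends on the structural properties of the two spaces — which is precisely what the paper's experiments probe.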
Updated: 2020-09-24