Transfer learning for fine-grained entity typing
Knowledge and Information Systems (IF 2.7) Pub Date: 2021-02-13, DOI: 10.1007/s10115-021-01549-5
Feng Hou , Ruili Wang , Yi Zhou

Fine-grained entity typing (FGET) classifies mentions of entities into hierarchical, fine-grained semantic types. Existing FGET approaches face two main issues. First, training corpora for FGET are normally labeled automatically, which inevitably introduces noise. Existing approaches either tweak noisy labels directly with heuristics or algorithmically retreat to parent types; both yield coarse-grained type labels instead of fine-grained ones. Second, existing approaches usually use recurrent neural networks to generate feature representations of mention phrases and their contexts, which perform relatively poorly on long contexts and out-of-vocabulary (OOV) words. In this paper, we propose a transfer learning-based approach to extract more effective feature representations and offset label noise. More precisely, we adopt three transfer learning schemes: (i) transferring sub-word embeddings to generate better embeddings for OOV words; (ii) using a pre-trained language model to generate more effective context features; (iii) using a pre-trained topic model to transfer topic-type relatedness through topic anchors and to select among confusable fine-grained types at inference time. The pre-trained topic model can offset label noise without retreating to coarse-grained types. Experimental results demonstrate the effectiveness of our transfer learning approach for FGET.
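Scheme (i) can be illustrated with a FastText-style composition: an OOV word's embedding is built by averaging the vectors of its character n-grams. This is a minimal sketch under that assumption; the n-gram table, dimension, and n-gram range below are illustrative, not taken from the paper.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, with boundary markers (FastText-style)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def oov_embedding(word, ngram_vectors, dim=8):
    """Average the vectors of the word's known n-grams; zeros if none match."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

rng = np.random.default_rng(0)
# Toy "pre-trained" n-gram table; in practice this would come from a
# sub-word embedding model trained on a large corpus.
table = {g: rng.normal(size=8)
         for g in char_ngrams("typing") + char_ngrams("entity")}

# "typist" never appears in the table, but it shares n-grams such as
# "<ty", "typ", "ypi" with "typing", so it still gets a non-zero vector.
vec = oov_embedding("typist", table)
```

Because the vector is composed from shared sub-word units rather than looked up whole, morphological relatives of in-vocabulary words land near them in embedding space, which is the property the abstract relies on for OOV mentions.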




Updated: 2021-02-15