Few-Shot Relation Extraction Towards Special Interests
Big Data Research (IF 3.3), Pub Date: 2021-09-16, DOI: 10.1016/j.bdr.2021.100273
Siqi Fan¹, Binbin Zhang², Silin Zhou¹, Menghan Wang¹, Ke Li¹

With the continuous development of natural language processing, relation extraction (RE) has been intensively studied and performs well at extracting relations from unstructured English and modern Chinese texts. In this paper, we study relation extraction from a special type of text: Chinese textual descriptions of Han Dynasty Stone Reliefs (HanDSR). We aim to develop an efficient relation extractor for special interests that needs only a small number of samples. The problem is challenging because the texts contain many rare words, mix modern and ancient Chinese within the same sentence, and have no domain corpus. To address these problems, we propose a relation extraction method based on dependency parsing that incorporates HanDSR information into the basic parser. To exploit the dependency-tree representation, we design five dependency semantic path patterns (DSPPs) to extract relation triples of special interest. In addition, following the annotation format of the Penn Chinese Treebank 8.0, we build the HanDSR Treebank, which contains 4190 sentences and 28124 dependency trees. It addresses the lack of a domain-specific corpus and can be used to extract relations from such texts. Extensive experiments on the HanDSR dataset demonstrate the accuracy and efficiency of our solution: the results show that our proposal significantly outperforms the rule-based relation extraction model in both effectiveness and efficiency.
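The abstract does not list the five DSPPs, so the following is only a minimal sketch of the general idea of pattern-based extraction over dependency paths, not the authors' method. Assumptions, all hypothetical: a parse is a list of `Token` records with 1-based `head` indices (0 for the root); arc labels follow Stanford-style Chinese dependencies (`nsubj`, `dobj`, `prep`, `pobj`); the two entries in `PATTERNS` are illustrative stand-ins; and entity positions are given in advance. A path is the label sequence from entity A up to the lowest common ancestor (`^` marks an upward step) and down to entity B, and the ancestor word names the relation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the head token; 0 marks the root
    deprel: str  # label of the arc from the head to this token

# HYPOTHETICAL patterns, not the paper's five DSPPs.
# "^" marks a step up from a dependent to its head.
PATTERNS = {
    ("nsubj^", "dobj"),          # A is the subject, B the object of the same verb
    ("nsubj^", "prep", "pobj"),  # A is the subject, B the object of a preposition on the verb
}

def path_to_root(tokens, i):
    """Token indices from i up to the root, starting with i itself."""
    path = [i]
    while tokens[path[-1] - 1].head != 0:
        path.append(tokens[path[-1] - 1].head)
    return path

def dependency_path(tokens, a, b):
    """Return (label path from a to b, index of their lowest common ancestor)."""
    up_a, up_b = path_to_root(tokens, a), path_to_root(tokens, b)
    lca = next(n for n in up_a if n in up_b)  # first shared node walking up from a
    ups = [tokens[n - 1].deprel + "^" for n in up_a[: up_a.index(lca)]]
    downs = [tokens[n - 1].deprel for n in reversed(up_b[: up_b.index(lca)])]
    return tuple(ups + downs), lca

def extract_triples(tokens, entity_ids):
    """Emit (head entity, relation word, tail entity) for each ordered pair whose path matches."""
    triples = []
    for a in entity_ids:
        for b in entity_ids:
            if a == b:
                continue
            path, lca = dependency_path(tokens, a, b)
            if path in PATTERNS:
                triples.append((tokens[a - 1].form, tokens[lca - 1].form, tokens[b - 1].form))
    return triples

# Toy parse of "羽人 手持 仙草" ("a feathered immortal holds a magic herb").
sent = [Token(1, "羽人", 2, "nsubj"), Token(2, "手持", 0, "root"), Token(3, "仙草", 2, "dobj")]
print(extract_triples(sent, [1, 3]))  # [('羽人', '手持', '仙草')]
```

Matching whole label paths keeps the extractor rule-like and needs no labeled training pairs, which plausibly suits a setting with few samples; a real system would substitute the paper's own patterns and its HanDSR-adapted parser for these toy pieces.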




Updated: 2021-09-28