A patent keywords extraction method using TextRank model with prior public knowledge
Complex & Intelligent Systems (IF 5.8), Pub Date: 2021-03-29, DOI: 10.1007/s40747-021-00343-8
Zhaoxin Huang, Zhenping Xie

For large collections of patent texts, extracting keywords in an unsupervised way is an important problem. Existing methods analyze only the information contained in the patent texts themselves. This study proposes an improved TextRank model that makes use of prior public knowledge. Two networks are built first: (1) a TextRank network for each patent text, and (2) a prior knowledge network based on public dictionary data, whose edges represent the prior interpretation relationships among the dictionary words that appear in dictionary entries. An improved node rank evaluation formula is then designed for the TextRank networks of patent texts, incorporating the prior interpretation information from the prior knowledge network. Finally, patent keywords are extracted by selecting the top-k node words with the highest rank values. In the experiments, a patent text clustering task is used to evaluate the proposed method in several comparison experiments. The results show that the new method performs markedly better than existing methods on unsupervised patent keyword extraction.
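The abstract does not state the improved rank formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a standard co-occurrence TextRank graph and swaps in a personalized-PageRank-style teleport term as a stand-in for the prior knowledge contribution. The function names, the `dictionary` structure (headword mapped to the words used in its entry), and the way the prior score is computed are all hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected TextRank graph: words within `window` positions are linked."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def prior_from_dictionary(words, dictionary):
    """Hypothetical prior score: 1 + number of other text words linked to the
    word in the dictionary network (either word appears in the other's entry)."""
    words = set(words)
    prior = {}
    for w in words:
        linked = sum(
            1
            for u in words
            if u != w and (u in dictionary.get(w, ()) or w in dictionary.get(u, ()))
        )
        prior[w] = 1.0 + linked
    return prior


def textrank_with_prior(graph, prior=None, damping=0.85, iterations=50):
    """Weighted TextRank whose teleport term is biased by `prior`.

    Assumption: the prior enters as a personalization vector; the paper's
    actual formula may differ. With no prior, this is plain TextRank.
    """
    nodes = list(graph)
    if not nodes:
        return {}
    prior = prior or {}
    total = sum(prior.get(w, 0.0) for w in nodes)
    if total > 0:
        bias = {w: prior.get(w, 0.0) / total for w in nodes}
    else:
        bias = {w: 1.0 / len(nodes) for w in nodes}
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        rank = {
            w: (1.0 - damping) * bias[w]
            + damping * sum(rank[v] * graph[v][w] / out_weight[v] for v in graph[w])
            for w in nodes
        }
    return rank


def top_k_keywords(rank, k):
    """Return the k words with the highest rank (ties broken alphabetically)."""
    ordered = sorted(rank.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered[:k]]


if __name__ == "__main__":
    # Toy tokens and a toy dictionary, both invented for illustration.
    tokens = "lithium battery electrode coating improves battery electrode life".split()
    dictionary = {"battery": {"electrode", "lithium"}, "electrode": {"coating"}}
    graph = build_cooccurrence_graph(tokens, window=2)
    prior = prior_from_dictionary(tokens, dictionary)
    print(top_k_keywords(textrank_with_prior(graph, prior), k=3))
```

Treating the prior as a teleport bias keeps the rank values normalized and reduces to ordinary TextRank when no dictionary information is available, which makes the two variants easy to compare on a clustering task like the one described above.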




Updated: 2021-03-30