Text segmentation for patent claim simplification via Bidirectional Long-Short Term Memory and Conditional Random Field

Computational Intelligence (IF 1.8), Pub Date: 2021-05-14, DOI: 10.1111/coin.12455
Boting Geng

Text simplification is vital for comprehending patent claims because of their complex syntactic structures and lengthy sentences. Almost all patent analysis practitioners are unable to understand the essence of a patent directly and intuitively, even when common natural language processing (NLP) tools are applied to parse the claim paragraphs or sentences. Such general-purpose text analysis tools are almost useless, or even crash, when applied to some complex paragraphs of patent claims. It is therefore necessary to propose a patent-text-oriented simplification approach that helps patent researchers grasp the essence of a patent quickly and intuitively. Motivated by this, we propose in this article a deep-learning-based simplification method that segments patent claims into shorter, comprehensible sentences for downstream patent analysis tasks. The proposed approach has two stages. In the first stage, a conditional random field (CRF), a machine learning approach, decomposes syntactically complex paragraphs into coarse-grained sentences with simplified structures and complete semantics. In the second stage, a deep learning architecture, bidirectional long short-term memory (Bi-LSTM)-CRF, segments the coarse-grained, lengthy sentences from the first stage into fine-grained, shorter sentences. Compared with a series of baselines, our Bi-LSTM-CRF-based patent segmentation architecture achieves higher precision, recall, and F1 than the other methods.
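The abstract does not state how segment boundaries are encoded or scored, so the following is only a minimal sketch under stated assumptions: each token carries a "B" (begins a segment) or "I" (inside a segment) tag, and precision, recall, and F1 are computed over the positions tagged "B". The function names, the tag scheme, and the example claim are hypothetical, and the sketch does not implement the CRF or the Bi-LSTM-CRF itself. It only shows how a tagger's output could be turned into segments and evaluated.

```python
from typing import List, Set, Tuple


def split_by_tags(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Cut a token list into segments; a "B" tag opens a new segment.

    Assumption: a B/I tag scheme. The paper's actual scheme is not given.
    """
    segments: List[List[str]] = []
    for token, tag in zip(tokens, tags):
        if tag == "B" or not segments:
            segments.append([token])
        else:
            segments[-1].append(token)
    return segments


def boundary_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over the token positions tagged "B"."""
    gold_b: Set[int] = {i for i, t in enumerate(gold) if t == "B"}
    pred_b: Set[int] = {i for i, t in enumerate(pred) if t == "B"}
    tp = len(gold_b & pred_b)  # boundaries found in both gold and prediction
    precision = tp / len(pred_b) if pred_b else 0.0
    recall = tp / len(gold_b) if gold_b else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical claim fragment, whitespace-tokenized (12 tokens).
tokens = "A device comprising a housing , wherein the housing holds a sensor".split()
# Hypothetical gold tags: segments start at "A" and at "wherein".
gold = ["B", "I", "I", "I", "I", "I", "B", "I", "I", "I", "I", "I"]
# Hypothetical prediction with one spurious boundary at "comprising".
pred = ["B", "I", "B", "I", "I", "I", "B", "I", "I", "I", "I", "I"]

for segment in split_by_tags(tokens, pred):
    print(" ".join(segment))
precision, recall, f1 = boundary_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example the prediction recovers both gold boundaries but adds a spurious one, so recall stays at 1.0 while precision drops to 2/3. In a two-stage setup like the one described, the same scoring could in principle be applied to the output of each stage separately.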

Updated: 2021-05-14
