An information entropy and latent Dirichlet allocation approach to noise patent filtering
Advanced Engineering Informatics ( IF 8.8 ) Pub Date : 2021-01-13 , DOI: 10.1016/j.aei.2020.101243
Janghyeok Yoon , Byeongki Jeong , Mujin Kim , Changyong Lee

Defining valid patents in a particular technological field is an indispensable step in patent analysis. To minimise the risk of missing valid patents, domain experts manually exclude irrelevant patents, known as noise patents, from an initial patent set derived using a loose retrieval query. However, this task has become time-consuming and labour intensive due to the increasing number of patents and rising complexity of technological knowledge. This study proposes a semi-automated approach to noise patent filtering based on information entropy theory and latent Dirichlet allocation. The proposed approach comprises four discrete steps: (1) structuring patents using a term-weighting method; (2) recommending noise patent seeds based on the information quantity of patents in terms of focal keyword groups; (3) measuring text similarities for patent clustering using latent Dirichlet allocation; and (4) identifying potential noise patent clusters with respect to the noise patent seeds. Our case study confirms that the proposed approach is valuable as a complementary noise patent filtering tool that will enable domain experts to focus more on their own knowledge-intensive tasks such as prior art analysis and research and development (R&D) strategy formulation.
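Step (2) of the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: patents are represented as token lists, focal keyword groups are hand-written sets, "information quantity" is approximated by the share of tokens that hit any focal group plus the Shannon entropy of the hits across groups, and the function names (`group_weights`, `shannon_entropy`, `rank_noise_seed_candidates`) are hypothetical. Patents with little focal-keyword coverage are ranked first as candidate noise patent seeds for an expert to confirm.

```python
import math
from collections import Counter


def group_weights(tokens, keyword_groups):
    """Count how many of a patent's tokens fall into each focal keyword group."""
    counts = Counter(tokens)
    return {
        group: sum(counts[term] for term in terms)
        for group, terms in keyword_groups.items()
    }


def shannon_entropy(weights):
    """Shannon entropy (in bits) of the normalised weight distribution."""
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def rank_noise_seed_candidates(patents, keyword_groups):
    """Return (patent_id, coverage, entropy) tuples, most noise-like first.

    Assumption: a patent that barely touches the focal keyword groups
    (low coverage, low entropy) is a likelier noise patent seed.
    """
    scored = []
    for patent_id, tokens in patents.items():
        weights = group_weights(tokens, keyword_groups)
        coverage = sum(weights.values()) / max(len(tokens), 1)
        entropy = shannon_entropy(list(weights.values()))
        scored.append((patent_id, coverage, entropy))
    return sorted(scored, key=lambda row: (row[1], row[2]))


# Toy data, invented for this example only.
keyword_groups = {
    "sensing": {"sensor", "detector"},
    "vehicle": {"vehicle", "car"},
}
patents = {
    "P1": ["sensor", "vehicle", "car", "detector"],
    "P2": ["shampoo", "bottle", "cap", "sensor"],
    "P3": ["shampoo", "bottle"],
}

ranking = rank_noise_seed_candidates(patents, keyword_groups)
for patent_id, coverage, entropy in ranking:
    print(patent_id, round(coverage, 2), round(entropy, 2))
```

In this toy set, the patents about shampoo bottles rank ahead of the one that covers both focal groups. In the approach the abstract describes, such seeds would then be compared against patent clusters built from latent Dirichlet allocation topic similarities, which this sketch does not attempt.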




Updated: 2021-01-14