A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese
Mobile Networks and Applications (IF 2.3), Pub Date: 2021-01-03, DOI: 10.1007/s11036-020-01725-x
Shuai Liu, Tenghui He, Jianhua Dai

Chinese word segmentation is an important research direction in elementary mathematics knowledge extraction. Segmentation speed directly affects downstream applications, and segmentation accuracy directly affects the research built on it. Among the machine learning methods for extracting basic mathematical knowledge points, the Conditional Random Field (CRF) model handles new word discovery well and is increasingly used in knowledge extraction for basic mathematics. This article first introduces the traditional CRF process for named entity recognition. It then proposes CRF++, an improved algorithm for the conditional random field model. Because the recognition rate of named entities with traditional machine learning methods is not high, it proposes a post-processing method for entity recognition that automatically generates a dictionary. After mathematical entities are identified, a pruning strategy combining the Viterbi algorithm with rules is proposed to achieve a higher recognition rate for elementary mathematical entities. Finally, several methods of disambiguation after entity recognition are introduced.
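To make the decoding step concrete, here is a minimal sketch of Viterbi decoding over B/M/E/S segmentation tags, with hard rules expressed as forbidden transitions. It is an illustration under stated assumptions, not the authors' pruning strategy: the function names `viterbi` and `segment`, the tag set, and every score below are hypothetical toy values; in practice a trained CRF would supply the emission and transition scores.

```python
import math

# Assumed tag set: Begin, Middle, End of a multi-character word, Single-character word.
TAGS = ["B", "M", "E", "S"]

def viterbi(emissions, transitions, tags=TAGS):
    """Return the highest-scoring tag sequence.

    emissions:   list of dicts (one per character) mapping tag -> score;
                 a tag missing from a dict is treated as forbidden there.
    transitions: dict mapping (prev_tag, tag) -> score;
                 a missing pair is treated as forbidden (rule-based pruning).
    """
    if not emissions:
        return []
    neg_inf = -math.inf
    # best[i][t] is the best score of any path that ends with tag t at position i.
    best = [{t: emissions[0].get(t, neg_inf) for t in tags}]
    back = []  # back[i][t] is the best previous tag for tag t at position i + 1
    for scores in emissions[1:]:
        prev_row = best[-1]
        row, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: prev_row[p] + transitions.get((p, t), neg_inf))
            row[t] = prev_row[prev] + transitions.get((prev, t), neg_inf) + scores.get(t, neg_inf)
            ptr[t] = prev
        best.append(row)
        back.append(ptr)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def segment(text, tags):
    """Cut text into words: a word closes on an E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

# Rules as allowed transitions only (all scored 0 in this toy example).
allowed = [("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")]
transitions = {pair: 0.0 for pair in allowed}

# Hypothetical per-character scores for "三角形面积" ("triangle area").
text = "三角形面积"
emissions = [
    {"B": 2.0, "S": 1.0},
    {"M": 2.0, "E": 1.0},
    {"E": 2.0, "M": 0.5},
    {"B": 2.0, "S": 1.0},
    {"E": 2.0, "S": 1.0},
]

tags = viterbi(emissions, transitions)
words = segment(text, tags)
```

Leaving a transition out of the table is one simple way to combine rules with Viterbi search: paths that violate a rule score negative infinity and can never be selected, so they are pruned during decoding rather than filtered afterwards.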




Updated: 2021-01-03