Two Improved Topic Word Detection Algorithms
International Journal of Software Engineering and Knowledge Engineering (IF 0.9), Pub Date: 2020-10-15, DOI: 10.1142/s0218194020400173
Zehao Yu

Topic word extraction is the task of identifying single-word or multi-word expressions that represent the main topics of a document. This paper proposes two improved algorithms for extracting and discovering topic words, the Rapid Topic word Detection (RTD) algorithm and the CategoryTextRank (CTextRank) algorithm, which obtain information effectively by extracting and filtering the topic words in a text. The algorithms overcome a shortcoming of traditional topic word discovery algorithms, which require deep linguistic knowledge or domain-specific or language-specific annotated corpora. Both algorithms can process short and long texts. Their biggest advantage is that they are unsupervised: they need no training and process text directly to obtain topic words. With the two algorithms, accuracy, recall, and F-measure improve greatly, and the results compare favorably with previously published results on the Inspec and SemEval datasets. The first algorithm, RTD, improves the metrics relative to PositionRank and TextRank; the second, CTextRank, improves them relative to TextRank, SingleRank, and TF-IDF.
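The abstract reports recall and F-measure without saying how they are scored. As a rough illustration only, the sketch below computes exact-match precision, recall, and F-measure between extracted topic words and a gold-standard list, which is a common convention for keyphrase benchmarks such as Inspec and SemEval. The function name `evaluate_topic_words`, the lowercase normalization, and the reading of the abstract's "accuracy rate" as precision are all assumptions, not details taken from the paper.

```python
def evaluate_topic_words(predicted, gold):
    """Exact-match precision, recall and F-measure for one document.

    Hypothetical helper: the paper's actual scoring protocol
    (stemming, top-k cutoff, averaging) is not given in the abstract.
    """
    # Normalize by trimming whitespace and lowercasing (an assumption).
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}

    hits = len(pred & ref)  # phrases that match a gold phrase exactly
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if hits else 0.0
    )
    return precision, recall, f_measure


if __name__ == "__main__":
    extracted = ["topic word", "TextRank", "graph"]
    reference = ["topic word", "textrank", "keyphrase", "corpus"]
    print(evaluate_topic_words(extracted, reference))
```

Scores computed this way are usually averaged over all documents in a dataset, and published comparisons often apply stemming before matching, so absolute numbers depend on those choices.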

Updated: 2020-10-15