Research paper classification systems based on TF-IDF and LDA schemes
Human-centric Computing and Information Sciences (IF 3.9), Pub Date: 2019-08-26, DOI: 10.1186/s13673-019-0192-7
Sang-Woon Kim, Joon-Min Gil

With the advance of computer and information technologies, numerous research papers have been published both online and offline, and as new research fields are continually created, users have considerable trouble finding and categorizing the research papers that interest them. To overcome these limitations, this paper proposes a research paper classification system that clusters research papers into meaningful classes in which papers are very likely to share similar subjects. The proposed system extracts representative keywords from the abstract of each paper and extracts topics using the Latent Dirichlet Allocation (LDA) scheme. The K-means clustering algorithm is then applied to group the full set of papers into clusters of papers with similar subjects, based on the term frequency-inverse document frequency (TF-IDF) values of each paper.
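
The TF-IDF and K-means steps described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the four sample abstracts, the whitespace tokenizer, the function names (`tokenize`, `tfidf_vectors`, `kmeans`), and the fixed initial centroids are all assumptions made for illustration, and the LDA topic-extraction step is omitted entirely.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # tf = relative frequency in the document, idf = log(N / df).
        vectors.append([(counts[w] / total) * math.log(n_docs / df[w])
                        for w in vocab])
    return vocab, vectors

def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, iterations=20):
    """Plain K-means; centroids start at the documents in init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

# Hypothetical abstracts, invented for this example.
abstracts = [
    "neural network image classification",
    "deep neural network image recognition",
    "database query index optimization",
    "database transaction query processing",
]
vocab, vectors = tfidf_vectors(abstracts)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

With these sample abstracts, the two neural-network papers fall into one cluster and the two database papers into the other. A real system would likely use random or k-means++ initialization and choose the number of clusters by some validity measure; the fixed starting centroids here only keep the example deterministic.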

Updated: 2019-08-26