Stamantic clustering: Combining statistical and semantic features for clustering of large text datasets
Expert Systems with Applications (IF 8.5), Pub Date: 2021-02-18, DOI: 10.1016/j.eswa.2021.114710
Vivek Mehta, Seema Bawa, Jasmeet Singh

Document clustering is a heavily researched problem in text mining. Approaches based on statistical features alone, or on semantic features alone, have been used extensively to solve it. However, techniques that combine the advantages of both types of features have received far less attention. In particular, as textual data grows rapidly in size, there is a need for an approach that combines both types of features to give more accurate results within an acceptable amount of time. This paper proposes a document clustering technique that combines the effectiveness of statistical features (using TF-IDF) and semantic features (using lexical chains). It is designed to use fewer features while maintaining comparable or even better accuracy for the task of document clustering.
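To make the idea of combining the two feature types concrete, the following is a minimal sketch and not the authors' method. It computes smoothed TF-IDF weights by hand, adds one feature per "lexical chain", and compares documents by cosine similarity. The `CHAINS` table, the `alpha` weight, and all function names are hypothetical and chosen only for illustration; a real system would presumably derive chains from a lexical resource such as WordNet.

```python
import math
from collections import Counter

# Hypothetical toy "lexical chains": hand-written groups of related words.
# This is an assumption for illustration, not the paper's chain construction.
CHAINS = {
    "vehicle": {"car", "truck", "automobile"},
    "fruit": {"apple", "banana", "orange"},
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def chain_vectors(docs):
    """Return one {chain feature: fraction of tokens in that chain} dict per document."""
    vectors = []
    for d in docs:
        toks = tokenize(d)
        vec = {}
        for name, words in CHAINS.items():
            hits = sum(1 for t in toks if t in words)
            if hits:
                vec["chain:" + name] = hits / len(toks)
        vectors.append(vec)
    return vectors


def combine(stat_vecs, sem_vecs, alpha=1.0):
    """Merge statistical and semantic features; alpha scales the semantic part."""
    combined = []
    for s, m in zip(stat_vecs, sem_vecs):
        vec = dict(s)
        vec.update({k: alpha * v for k, v in m.items()})
        combined.append(vec)
    return combined


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


docs = ["the car is fast", "an automobile is quick", "banana and apple"]
stat = tfidf_vectors(docs)
both = combine(stat, chain_vectors(docs))
print(cosine(stat[0], stat[1]), cosine(both[0], both[1]))
```

In this toy corpus the first two documents share only the word "is", so their TF-IDF similarity is low; the shared chain feature raises it, while the third document stays dissimilar to both. The resulting vectors could then be passed to any standard clustering algorithm.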




Updated: 2021-03-03