Semantic string operation for specializing AHC algorithm for text clustering
Annals of Mathematics and Artificial Intelligence (IF 1.2), Pub Date: 2020-01-13, DOI: 10.1007/s10472-019-09687-x
Taeho Jo

This article proposes a modified AHC (Agglomerative Hierarchical Clustering) algorithm that clusters string vectors, instead of numerical vectors, as an approach to text clustering. Two facts motivate this research: string-vector-based algorithms were applied successfully to text clustering in previous works, and combining text clustering with word clustering is expected to produce a synergy effect. We define an operation on string vectors called semantic similarity and modify the AHC algorithm to use it as its similarity metric. The proposed AHC algorithm is empirically validated as the better approach for clustering news articles and opinions. Further operations on string vectors still need to be defined and characterized mathematically so that more advanced machine learning algorithms can be modified in the same way.
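The abstract does not spell out how semantic similarity is computed, so the following is only a minimal sketch of the general idea of running AHC over string vectors, not the author's definition. It assumes that a string vector is a fixed-length ordered tuple of words, that word-to-word similarities come from a hypothetical lookup table (`WORD_SIM`, with made-up values), that two vectors are compared by averaging position-wise word similarities, and that clusters are compared by average link. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical word-to-word similarities (symmetric); values are invented for illustration.
WORD_SIM = {
    frozenset(("stock", "market")): 0.8,
    frozenset(("game", "team")): 0.7,
}


def word_similarity(a, b):
    """Similarity of two words: 1.0 if identical, else a table lookup (0.0 if absent)."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def semantic_similarity(u, v):
    """Assumed operation: mean position-wise word similarity of two equal-length string vectors."""
    if len(u) != len(v):
        raise ValueError("string vectors must have the same length")
    return sum(word_similarity(a, b) for a, b in zip(u, v)) / len(u)


def cluster_similarity(c1, c2):
    """Average-link similarity between two clusters of string vectors (an assumed choice)."""
    total = sum(semantic_similarity(u, v) for u in c1 for v in c2)
    return total / (len(c1) * len(c2))


def ahc(vectors, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the most similar pair."""
    clusters = [[v] for v in vectors]
    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy string vectors standing in for encoded texts.
docs = [
    ("stock", "market", "bank"),
    ("market", "stock", "bank"),
    ("game", "team", "coach"),
    ("team", "game", "coach"),
]
result = ahc(docs, n_clusters=2)
```

On this toy input the two finance-like vectors end up in one cluster and the two sports-like vectors in the other. The only change relative to a conventional AHC over numerical vectors is the similarity function, which is the modification the abstract describes at a high level.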

Updated: 2020-01-13