An efficient generic approach for automatic taxonomy generation using HMMs
Pattern Analysis and Applications (IF 3.7), Pub Date: 2020-09-18, DOI: 10.1007/s10044-020-00918-0
Sylvain Iloga, Olivier Romain, Maurice Tchuenté

Taxonomies are essential tools for fast information retrieval and classification of knowledge. Many existing techniques for automatic taxonomy generation depend strongly on the specific properties of a particular domain and are consequently hard to apply to other domains. Some attempts have been made to design taxonomies for multiple domains. Unfortunately, they induce high hierarchical classification error rates for some datasets. The automatic design of a taxonomy requires the capability of measuring the similarity between classes. More precisely, two classes being close intuitively implies that some elements of one class are scattered in the neighborhood of some elements of the other class. This observation is used in this paper to propose a new generic technique for automatic taxonomy generation. A topological analysis of the neighborhood of each instance is first performed. The results of this analysis are used to initialize and train a hidden Markov model (HMM) for each class. The model of a given class c captures the frequencies of the classes found in the neighborhood of the instances of c, from the most dominant class to the least dominant. The similarities between these models are finally used to derive a taxonomy. Hierarchical classification experiments conducted on 20 datasets from various domains showed an average accuracy of \(97.22\%\) and a standard deviation of \(4.11\%\). Comparison results revealed that the proposed approach outperforms existing work, with accuracy gains reaching \(38.62\%\) for one dataset.
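
The neighborhood analysis mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how neighborhoods are defined, so the use of k nearest neighbors, the Euclidean distance, the parameter `k`, and the function name `neighborhood_class_profile` are all assumptions made for illustration. The sketch covers only the step of counting which classes appear around the instances of each class, ordered from most to least dominant; the HMM training and the taxonomy derivation are not shown.

```python
from collections import Counter
from math import dist


def neighborhood_class_profile(points, labels, k=3):
    """For each class c, return the frequencies of the classes found among
    the k nearest neighbors of the instances of c, most dominant first.

    Hypothetical helper: k-NN with Euclidean distance is an assumption,
    not a detail taken from the paper.
    """
    counts_per_class = {}
    for i, (p, c) in enumerate(zip(points, labels)):
        # Indices of all other instances, sorted by distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        counts = counts_per_class.setdefault(c, Counter())
        counts.update(labels[j] for j in others[:k])

    # Turn raw counts into frequencies, ordered from most to least dominant.
    profiles = {}
    for c, counts in counts_per_class.items():
        total = sum(counts.values())
        profiles[c] = [(cls, n / total) for cls, n in counts.most_common()]
    return profiles


# Toy data: classes "a" and "b" are interleaved, class "z" is far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.3, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["a", "b", "a", "b", "z", "z", "z"]
profile = neighborhood_class_profile(points, labels, k=2)
```

In this toy data, the profiles of "a" and "b" are dominated by each other's instances, while "z" only ever sees "z". Under the intuition stated in the abstract, "a" and "b" would be candidates for grouping under a common parent before either is grouped with "z".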




Updated: 2020-09-20