Machine learning techniques for XML (co-)clustering by structure-constrained phrases, Information Retrieval Journal



Information Retrieval Journal (IF 2.5), Pub Date: 2017-08-04, DOI: 10.1007/s10791-017-9314-x
Gianni Costa, Riccardo Ortale

A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain: non-negative matrix (tri-)factorization, co-clustering, and automatic transactional clustering. A novel class of XML features approximates structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments on real-world benchmark XML corpora show that the effectiveness of all three approaches improves when contextualized n-grams of suitable length are used. This confirms the validity of the method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated.
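The idea of n-grams contextualized by root-to-leaf paths can be illustrated with a minimal sketch. The abstract does not give the paper's exact feature definition, so the details here are assumptions: the function name `contextualized_ngrams`, whitespace tokenization, lowercasing, the `/`-joined path notation, and the choice to ignore text inside non-leaf elements are all hypothetical simplifications.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def contextualized_ngrams(xml_text, n=2):
    """Count (root-to-leaf path, n-gram) pairs in one XML document.

    Assumed simplification: only the text of leaf elements is used,
    and tokens are obtained by lowercasing and splitting on whitespace.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def visit(element, path):
        path = path + (element.tag,)
        children = list(element)
        if children:
            # Inner node: keep descending, extending the path.
            for child in children:
                visit(child, path)
            return
        # Leaf node: slide a window of n tokens over its text.
        tokens = (element.text or "").lower().split()
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            counts[("/".join(path), ngram)] += 1

    visit(root, ())
    return counts


# Hypothetical toy document: the same bigram occurs under two different paths.
doc = (
    "<paper><title>XML clustering methods</title>"
    "<body><sec>XML clustering works</sec></body></paper>"
)
features = contextualized_ngrams(doc, n=2)
for (path, ngram), count in sorted(features.items()):
    print(path, "|", ngram, "|", count)
```

In this sketch the bigram "xml clustering" yields two distinct features, one under `paper/title` and one under `paper/body/sec`, which is the sense in which the path contextualizes the phrase. Stacking such counts over a corpus would give a document-by-feature matrix of the kind that factorization or co-clustering methods take as input.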


Updated: 2017-08-04
