Machine learning techniques for XML (co-)clustering by structure-constrained phrases
Information Retrieval Journal (IF 2.5), Pub Date: 2017-08-04, DOI: 10.1007/s10791-017-9314-x
Gianni Costa, Riccardo Ortale

A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain, namely non-negative matrix (tri-)factorization, co-clustering and automatic transactional clustering. A novel class of XML features approximately captures structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments over real-world benchmark XML corpora show that the effectiveness of all three approaches improves when contextualized n-grams of suitable length are used. This confirms the validity of the devised method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated.
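To make the feature class more concrete, here is a minimal Python sketch of one plausible reading of "n-grams contextualized by root-to-leaf paths": each feature pairs the tag path from the document root to a leaf element with an n-gram drawn from that leaf's text. The function names `tokenize` and `contextualized_ngrams`, the lowercase word tokenizer, and the choice to ignore mixed content and tail text are all assumptions made for illustration; this is not the authors' implementation.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def contextualized_ngrams(xml_string, n=2):
    """Return a Counter of (root-to-leaf path, n-gram) features.

    Hypothetical helper: for every leaf element, the text is tokenized and
    each run of n consecutive tokens is paired with the slash-joined tag
    path from the root down to that leaf. Text inside non-leaf elements and
    tail text are ignored in this simplified sketch.
    """
    root = ET.fromstring(xml_string)
    features = Counter()

    def visit(elem, path):
        path = path + (elem.tag,)
        children = list(elem)
        if not children:  # leaf element: emit its contextualized n-grams
            tokens = tokenize(elem.text or "")
            for i in range(len(tokens) - n + 1):
                features[("/".join(path), tuple(tokens[i:i + n]))] += 1
        for child in children:
            visit(child, path)

    visit(root, ())
    return features


if __name__ == "__main__":
    doc = ("<article><title>XML clustering by phrases</title>"
           "<body><sec><p>structure constrained phrases help</p></sec></body>"
           "</article>")
    for (path, gram), count in sorted(contextualized_ngrams(doc, n=2).items()):
        print(path, gram, count)
```

In this sketch the same word under different paths (here "phrases" in `article/title` and in `article/body/sec/p`) yields distinct features, which is the intent of contextualizing n-grams by structure. The resulting counts could serve as rows of a document-by-feature matrix for factorization or co-clustering, while `set(features)` is one possible way to view a document as a transaction for transactional clustering.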

Updated: 2017-08-04