Customizable HMM-based measures to accurately compare tree sets, Pattern Analysis and Applications



Pattern Analysis and Applications (IF 3.7) Pub Date: 2021-03-31, DOI: 10.1007/s10044-021-00971-3
Sylvain Iloga

Trees have attracted considerable interest for decades because of the many emerging applications that represent data as trees. Several techniques have been developed to compare two trees, but there is a serious lack of metrics for comparing weighted trees. Existing approaches also do not allow the user to explicitly specify the targeted node properties on which the comparison should be performed. Furthermore, existing techniques do not specifically address the problem of comparing two tree sets. This paper attempts to solve these problems by first proposing a distance and a similarity for comparing two finite sets of rooted ordered trees, which may be labeled or unlabeled and weighted or unweighted. To achieve this goal, a hidden Markov model (HMM) is associated with each tree set for each targeted node property. The model associated with a tree set T for the targeted node property p learns how much the nodes of the trees in T satisfy property p. The resulting models are then compared to derive a distance and a similarity between the two sets of trees. These measures are then generalized to the comparison of unrooted and unordered trees. Flat classification experiments were carried out on two synthetic databases, FirstLast-L and FirstLast-LW, which are available online. Each contains four classes of 100 rooted ordered trees whose specific and non-trivial node properties are clearly defined. When the distance proposed in this paper is used as the metric for the Nearest Neighbor classifier, a perfect accuracy of 100% is obtained on both databases. This performance is 41% higher than the accuracy obtained when the widespread tree Edit distance is used on FirstLast-L.
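The abstract does not give the HMM construction or the formula used to compare models, so the following is only a rough sketch of the surrounding idea: summarise each tree set by how strongly its nodes satisfy a chosen property, then plug a set-to-set distance into a nearest-neighbor classifier. The tree encoding, the `property_rate` summary, and the `set_distance` function are all assumptions made for illustration. In particular, `set_distance` is a simple stand-in and is not the HMM-based distance proposed in the paper.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumed encoding: a rooted ordered tree is (label, [children]).
Tree = Tuple[str, list]
Property = Callable[[Tree], bool]


def nodes(tree: Tree) -> Iterator[Tree]:
    """Yield every node of the tree in preorder."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def property_rate(tree_set: Sequence[Tree], prop: Property) -> float:
    """Fraction of all nodes in the tree set that satisfy prop."""
    total = hits = 0
    for tree in tree_set:
        for node in nodes(tree):
            total += 1
            hits += bool(prop(node))
    return hits / total if total else 0.0


def set_distance(a: Sequence[Tree], b: Sequence[Tree],
                 props: Sequence[Property]) -> float:
    """Stand-in distance: mean absolute gap between property rates.

    This is NOT the paper's HMM-based distance; it only mimics the idea
    of comparing two tree sets through targeted node properties.
    """
    gaps = [abs(property_rate(a, p) - property_rate(b, p)) for p in props]
    return sum(gaps) / len(gaps)


def nearest_neighbor(query: Sequence[Tree],
                     labeled_sets: List[Tuple[Sequence[Tree], str]],
                     props: Sequence[Property]) -> str:
    """Return the label of the labeled tree set closest to the query."""
    closest = min(labeled_sets,
                  key=lambda item: set_distance(query, item[0], props))
    return closest[1]


def is_leaf(node: Tree) -> bool:
    """Example targeted node property: the node has no children."""
    return not node[1]


def leaf(label: str) -> Tree:
    return (label, [])


bushy = [("r", [leaf("a"), leaf("b"), leaf("c")])]    # 3 of 4 nodes are leaves
chain = [("r", [("a", [("b", [leaf("c")])])])]        # 1 of 4 nodes is a leaf
query = [("r", [leaf("x"), leaf("y")])]               # 2 of 3 nodes are leaves

predicted = nearest_neighbor(query, [(bushy, "bushy"), (chain, "chain")],
                             [is_leaf])
print(predicted)
```

A single tree is treated here as a one-element tree set, which lets the same distance serve both set comparison and per-tree classification. Swapping `set_distance` for a model-based distance would leave `nearest_neighbor` unchanged.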


Updated: 2021-03-31
