当前位置: X-MOL 学术IEEE Trans. Inform. Theory › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Lossless Compression of Binary Trees with Correlated Vertex Names.
IEEE Transactions on Information Theory ( IF 2.5 ) Pub Date : 2019-09-21 , DOI: 10.1109/tit.2018.2851224
Abram Magner 1 , Krzysztof Turowski 1 , Wojciech Szpankowski 1
Affiliation  

Compression schemes for advanced data structures have become a central modern challenge. Information theory has traditionally dealt with conventional data such as text, images, or video. In contrast, most data available today is multitype and context-dependent. To meet this challenge, we have recently initiated a systematic study of advanced data structures such as unlabeled graphs [8]. In this paper, we continue this program by considering trees with statistically correlated vertex names. Trees come in many forms, but here we deal with binary plane trees (where order of subtrees matters) and their non-plane version (where order of subtrees doesn't matter). Furthermore, we assume that each name is generated by a known memoryless source (horizontal independence), but a symbol of a vertex name depends in a Markovian sense on the corresponding symbol of the parent vertex name (vertical Markovian dependency). Such a model is closely connected to models of phylogenetic trees. While in general the problem of multimodal compression and associated analysis can be extremely complicated, we find that in this natural setting, both the entropy analysis and optimal compression are analytically tractable. We evaluate the entropy for both types of trees. For the plane case, with or without vertex names, we find that a simple two-stage compression scheme is both efficient and optimal. We then present efficient and optimal compression algorithms for the more complicated non-plane case.

中文翻译:

具有相关顶点名称的二叉树的无损压缩。

高级数据结构的压缩方案已成为现代的主要挑战。信息理论传统上处理的是常规数据,例如文本,图像或视频。相反,当今可用的大多数数据是多类型的且取决于上下文。为了应对这一挑战,我们最近启动了对高级数据结构(如未标记图)的系统研究[8]。在本文中,我们通过考虑具有统计相关顶点名称的树来继续执行此程序。树有多种形式,但是这里我们处理二进制平面树(子树的顺序重要)和它们的非平面版本(子树的顺序无关紧要)。此外,我们假设每个名称都是由已知的无记忆源(水平独立性)生成的,但是顶点名称的符号在马尔可夫意义上取决于父顶点名称的对应符号(垂直马尔可夫相关性)。这样的模型与系统发育树的模型紧密相关。虽然一般来说,多峰压缩和相关分析的问题可能非常复杂,但我们发现在这种自然环境下,熵分析和最佳压缩在分析上都是易于处理的。我们评估两种树的熵。对于具有或不具有顶点名称的平面情况,我们发现简单的两阶段压缩方案既有效又最优。然后,我们针对更复杂的非平面情况提出了有效且最佳的压缩算法。虽然一般来说,多峰压缩和相关分析的问题可能非常复杂,但我们发现在这种自然环境下,熵分析和最佳压缩在分析上都是易于处理的。我们评估两种树的熵。对于具有或不具有顶点名称的平面情况,我们发现简单的两阶段压缩方案既有效又最优。然后,我们针对更复杂的非平面情况提出了有效且最佳的压缩算法。虽然一般来说,多峰压缩和相关分析的问题可能非常复杂,但我们发现在这种自然环境下,熵分析和最佳压缩在分析上都是易于处理的。我们评估两种树的熵。对于具有或不具有顶点名称的平面情况,我们发现简单的两阶段压缩方案既有效又最优。然后,我们针对更复杂的非平面情况提出了有效且最佳的压缩算法。我们发现简单的两阶段压缩方案既有效又最优。然后,我们针对更复杂的非平面情况提出了有效且最佳的压缩算法。我们发现简单的两阶段压缩方案既有效又最优。然后,我们针对更复杂的非平面情况提出了有效且最佳的压缩算法。
更新日期:2019-11-01
down
wechat
bug