Compression for Quadratic Similarity Queries, IEEE Transactions on Information Theory

Compression for Quadratic Similarity Queries
IEEE Transactions on Information Theory (IF 2.2), Pub Date: 2015-05-01, DOI: 10.1109/tit.2015.2402972
Amir Ingber, Thomas Courtade, Tsachy Weissman

The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold, the identification rate, which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.
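
The identification-rate threshold described above can be illustrated with a small numerical sketch. Everything here is an illustrative assumption rather than the authors' construction. We take the quadratic similarity measure to be the normalized squared distance d(x, y) = ||x - y||^2 / n. We model the database sequence and the query as independent i.i.d. Gaussian vectors with variances var_x and var_y. We also assume the compressed-domain answer is either "no" or "maybe", where a truly similar pair must never receive "no". Under these assumptions, a spherical-cap covering argument suggests that P(maybe) vanishes exactly when R > log2(2 sigma_x sigma_y / (sigma_x^2 + sigma_y^2 - D)). That argument keeps ||x|| and a direction u within angle theta of x, where sin(theta) = 2^(-R) for an ideal covering code at rate R. We believe this agrees with the Gaussian identification rate the paper characterizes, but the paper should be consulted for the exact statement and conditions. The function names (identification_rate, idealized_encode, answer_query, maybe_fraction) are hypothetical. idealized_encode does not build a real codebook; it only places u at the angle an ideal code would achieve.

```python
import numpy as np


def identification_rate(var_x, var_y, D):
    """Heuristic identification rate in bits per symbol.

    Assumes (hypothetically) X ~ N(0, var_x), Y ~ N(0, var_y), i.i.d. and independent,
    with similarity meaning ||x - y||^2 / n <= D.
    """
    sx, sy = np.sqrt(var_x), np.sqrt(var_y)
    c = (var_x + var_y - D) / (2 * sx * sy)  # cosine a similar pair must reach
    if c >= 1:
        return 0.0  # similar pairs are (asymptotically) impossible
    if c <= 0:
        return float("inf")  # similar pairs are typical, so P(maybe) cannot vanish
    return float(np.log2(1 / c))


def idealized_encode(x, rate, rng):
    """Hypothetical stand-in for a rate-R spherical covering code.

    Keeps ||x|| exactly and returns a unit vector u at angle theta from x,
    with sin(theta) = 2**(-rate). A real scheme would pick u from about
    2**(n * rate) codewords; here u is simply constructed at that angle.
    """
    norm_x = np.linalg.norm(x)
    x_hat = x / norm_x
    w = rng.standard_normal(x.shape)
    w -= (w @ x_hat) * x_hat  # make w orthogonal to x
    w /= np.linalg.norm(w)
    theta = np.arcsin(2.0 ** (-rate))
    u = np.cos(theta) * x_hat + np.sin(theta) * w
    return norm_x, u, theta


def answer_query(norm_x, u, theta, y, D):
    """Return True for 'maybe' and False for 'no'; never 'no' on a similar pair."""
    n = y.size
    norm_y = np.linalg.norm(y)
    # ||x - y||^2 / n <= D  <=>  cos(angle(x, y)) >= c_xy
    c_xy = (norm_x**2 + norm_y**2 - n * D) / (2 * norm_x * norm_y)
    if c_xy > 1:
        return False  # no x with this norm can be similar to y
    if c_xy <= -1:
        return True  # every x with this norm is similar to y
    angle_uy = np.arccos(np.clip(u @ y / norm_y, -1.0, 1.0))
    # Triangle inequality on the sphere: angle(u, y) <= theta + angle(x, y).
    return bool(angle_uy <= np.arccos(c_xy) + theta)


def maybe_fraction(rate, n=1000, trials=200, var_x=1.0, var_y=1.0, D=1.0, seed=0):
    """Monte Carlo estimate of P(maybe) for the idealized scheme."""
    rng = np.random.default_rng(seed)
    maybes = 0
    for _ in range(trials):
        x = rng.normal(0.0, np.sqrt(var_x), n)
        y = rng.normal(0.0, np.sqrt(var_y), n)
        norm_x, u, theta = idealized_encode(x, rate, rng)
        answer = answer_query(norm_x, u, theta, y, D)
        if np.sum((x - y) ** 2) / n <= D and not answer:
            raise AssertionError("false negative")  # excluded by construction
        maybes += int(answer)
    return maybes / trials


if __name__ == "__main__":
    print(identification_rate(1.0, 1.0, 1.0))  # 1.0
    for R in (0.5, 2.0):
        print(R, maybe_fraction(R))
```

With var_x = var_y = D = 1, the heuristic threshold is 1 bit per symbol. maybe_fraction at R = 0.5 should therefore report a "maybe" fraction near 1, and at R = 2 a fraction near 0. The sketch only illustrates the threshold behavior. It does not compute the reliability exponent or model the robust scheme for general sources.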

Updated: 2015-05-01
