The Rate-Distortion Risk in Estimation From Compressed Data, IEEE Transactions on Information Theory

IEEE Transactions on Information Theory (IF 2.2), Pub Date: 2021-03-26, DOI: 10.1109/tit.2021.3068981
Alon Kipnis, Stefano Rini, Andrea J. Goldsmith

Consider the problem of estimating a latent signal from a lossy compressed version of the data when the compressor is agnostic to the relation between the signal and the data. This situation arises in a host of modern applications when data is transmitted or stored prior to determining the downstream inference task. Given a bitrate constraint and a distortion measure between the data and its compressed version, let us consider the joint distribution achieving Shannon’s rate-distortion (RD) function. Given an estimator and a loss function associated with the downstream inference task, define the RD risk as the expected loss under the RD-achieving distribution. We provide general conditions under which the operational risk in estimating from the compressed data is asymptotically equivalent to the RD risk. The main theoretical tools to prove this equivalence are transportation-cost inequalities in conjunction with properties of compression codes achieving Shannon’s RD function. Whenever such equivalence holds, a recipe for designing estimators from datasets undergoing lossy compression without specifying the actual compression technique emerges: design the estimator to minimize the RD risk. Our conditions are simplified in the special cases of discrete memoryless or multivariate normal data. For these scenarios, we derive explicit expressions for the RD risk of several estimators and compare them to the optimal source coding performance associated with full knowledge of the relation between the latent signal and the data.
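As a rough illustration of the RD risk defined in the abstract, the sketch below works through a scalar Gaussian toy case. Everything specific in it is an assumption made for illustration and is not taken from the paper: the additive model X = theta + W with independent zero-mean Gaussian theta and W, the squared-error distortion and loss, the linear estimator a * Y, and all function and parameter names. Under squared-error distortion, the RD-achieving distribution for a Gaussian X with variance var_x at distortion D can be written as the forward test channel Y = (1 - D/var_x) X + N, with N independent Gaussian noise of variance D (1 - D/var_x), and D = var_x * 2^(-2R) at R bits per sample.

```python
import math


def gaussian_distortion(var_x: float, rate_bits: float) -> float:
    """Distortion-rate function of a scalar Gaussian source under squared error."""
    return var_x * 2.0 ** (-2.0 * rate_bits)


def rd_risk_linear(a: float, var_theta: float, var_noise: float, rate_bits: float) -> float:
    """Expected squared loss of the estimator a * Y under the RD-achieving distribution.

    Assumed (hypothetical) model: X = theta + W, theta ~ N(0, var_theta),
    W ~ N(0, var_noise), and Y drawn from the Gaussian test channel for X.
    """
    var_x = var_theta + var_noise
    d = gaussian_distortion(var_x, rate_bits)
    gain = 1.0 - d / var_x           # forward test-channel gain on X
    cov_theta_y = gain * var_theta   # E[theta * Y], since E[theta * X] = var_theta
    var_y = var_x - d                # E[Y^2] under the test channel
    return var_theta - 2.0 * a * cov_theta_y + a * a * var_y


def best_linear_coefficient(var_theta: float, var_noise: float) -> float:
    """Coefficient minimizing the RD risk over linear estimators: E[theta Y] / E[Y^2]."""
    return var_theta / (var_theta + var_noise)


if __name__ == "__main__":
    a_star = best_linear_coefficient(1.0, 1.0)
    for rate in (0.0, 1.0, 2.0, 8.0):
        print(rate, rd_risk_linear(a_star, 1.0, 1.0, rate))
```

In this toy case the minimizing coefficient does not depend on the rate, and the resulting risk decreases from var_theta at zero rate toward the uncompressed minimum mean squared error as the rate grows. The paper's general conditions, and its comparison with optimal source coding, cover far broader settings than this sketch.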

Updated: 2021-04-23
