Minimax Estimation of Functionals of Discrete Distributions
IEEE Transactions on Information Theory (IF 2.5). Pub Date: 2015-05-01. DOI: 10.1109/tit.2015.2412945
Jiantao Jiao, Kartik Venkat, Yanjun Han, Tsachy Weissman

We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite-dimensional parameters, and elaborate on the case of discrete distributions, where the support size S is unknown and may be comparable with or even much larger than the number of observations n. We treat the regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation of the functional, whereas in the smooth regime we apply a bias-corrected version of the maximum likelihood estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: 1) the entropy $H(P) = \sum_{i=1}^{S} -p_i \ln p_i$ and 2) $F_\alpha(P) = \sum_{i=1}^{S} p_i^\alpha$, $\alpha > 0$. We obtain the minimax $L_2$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also demonstrate that the sample complexity for estimating $F_\alpha(P)$, $0 < \alpha < 1$, is $n \asymp S^{1/\alpha}/\ln S$, which can be achieved by our estimator but not by the MLE. For $1 < \alpha < 3/2$, we show that the minimax $L_2$ rate for estimating $F_\alpha(P)$ is $(n \ln n)^{-2(\alpha-1)}$ for infinite support size, while the maximum $L_2$ rate for the MLE is $n^{-2(\alpha-1)}$. In all the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE (plug-in rule) with $n \ln n$ samples, which we term "effective sample size enlargement." We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with various existing approaches and demonstrate that our approach reduces running time and improves accuracy. Moreover, we show that the minimax rate-optimal mutual information estimator yielded by our framework leads to significant performance boosts over the Chow-Liu algorithm in learning graphical models. The wide use of information measure estimation suggests that the insights and estimators obtained in this paper could be broadly applicable.
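The entropy case makes the smooth-regime part of the recipe concrete. Below is a minimal Python sketch, not the paper's estimator: it implements only the plug-in MLE and the classical Miller-Madow first-order bias correction, which is the simplest instance of the kind of bias correction applied in the smooth regime. The paper's full estimator additionally replaces the plug-in rule with an unbiased estimate of a best polynomial approximation on the nonsmooth (small-probability) symbols, which this sketch deliberately omits; function names and the toy experiment are illustrative assumptions.

```python
import numpy as np

def entropy_mle(counts: np.ndarray) -> float:
    """Plug-in (MLE) estimate H(p_hat) = -sum_i p_hat_i ln p_hat_i, in nats."""
    n = counts.sum()
    p = counts[counts > 0] / n
    return float(-(p * np.log(p)).sum())

def entropy_miller_madow(counts: np.ndarray) -> float:
    """Classical first-order bias correction (illustrative, not the paper's scheme).

    The plug-in rule underestimates H(P) by roughly (S - 1) / (2n); adding
    (S_observed - 1) / (2n) is the standard partial fix.
    """
    n = counts.sum()
    s_obs = int((counts > 0).sum())
    return entropy_mle(counts) + (s_obs - 1) / (2 * n)

# Toy experiment in the regime the paper targets: support size S comparable
# to the sample size n, where the plug-in bias dominates the error.
rng = np.random.default_rng(0)
S = n = 1000
counts = np.bincount(rng.integers(0, S, size=n), minlength=S)
print(f"true H       = {np.log(S):.3f}")  # uniform distribution: H = ln S
print(f"plug-in MLE  = {entropy_mle(counts):.3f}")
print(f"Miller-Madow = {entropy_miller_madow(counts):.3f}")
```

Even this crude correction shrinks the bias noticeably, but plug-in-type rules still need on the order of $S$ samples; reaching the $n \asymp S/\ln S$ sample complexity quoted above requires the polynomial-approximation treatment of the nonsmooth regime described in the abstract.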
