Non-Bayesian Parametric Missing-Mass Estimation
IEEE Transactions on Signal Processing ( IF 5.4 ) Pub Date : 2022-06-27 , DOI: 10.1109/tsp.2022.3186176
Shir Cohen 1 , Tirza Routtenberg 1 , Lang Tong 2

We consider the classical problem of missing-mass estimation, which deals with estimating the total probability of unseen elements in a sample. The missing-mass estimation problem has various applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem since it tends to overestimate the probability of the observed elements. Similarly, the constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared-error (MSE) of unbiased estimators of the entire probability mass function (pmf) vector, does not provide a relevant bound for missing-mass estimation. In this paper, we introduce a non-Bayesian parametric model of the problem of missing-mass estimation. We introduce the concept of missing-mass unbiasedness by using the Lehmann unbiasedness definition. Based on missing-mass unbiasedness, we derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB). The proposed mmCCRB can be used for system design and for the performance evaluation of existing estimators. Moreover, based on the mmCCRB, we propose a new method to improve estimators by an iterative missing-mass Fisher-scoring method. Finally, we demonstrate via numerical simulations that the biased mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, asymptotic profile maximum likelihood (aPML), Good-Turing, and Laplace estimators. We also show that the mmMSE and missing-mass bias of the Laplace estimator are reduced by using the new missing-mass Fisher-scoring method.
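
To make the quantities named in the abstract concrete, the following is a minimal Python sketch of three of the baseline estimators it lists, using their textbook forms rather than the paper's own formulation. It assumes a finite alphabet of known size (`alphabet_size` is a parameter introduced here for illustration) and takes the Laplace estimator to be add-one smoothing. The function names are hypothetical, and the aPML estimator, the mmCCRB, and the Fisher-scoring method are not shown.

```python
from collections import Counter


def missing_mass_estimates(sample, alphabet_size):
    """Return textbook missing-mass estimates for an i.i.d. sample.

    Assumption: the alphabet is finite and its size is known.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(sample)
    distinct = len(counts)  # number of symbols observed at least once
    if alphabet_size < distinct:
        raise ValueError("alphabet_size is smaller than the number of observed symbols")
    # Number of symbols observed exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        # CML: p_i = N_i / n, so every unseen symbol gets probability 0.
        "cml": 0.0,
        # Good-Turing: fraction of the sample made up of singletons.
        "good_turing": singletons / n,
        # Laplace (add-one): p_i = (N_i + 1) / (n + M), summed over unseen symbols.
        "laplace": (alphabet_size - distinct) / (n + alphabet_size),
    }


def true_missing_mass(sample, pmf):
    """Total probability, under a known pmf, of the symbols absent from the sample."""
    seen = set(sample)
    return sum(p for symbol, p in pmf.items() if symbol not in seen)


# Usage: seven draws from a six-symbol alphabet; "e" and "f" are never observed.
sample = ["a", "a", "b", "c", "c", "c", "d"]
estimates = missing_mass_estimates(sample, alphabet_size=6)
uniform_pmf = {symbol: 1 / 6 for symbol in "abcdef"}
actual = true_missing_mass(sample, uniform_pmf)
```

In this toy sample the CML estimate of the missing mass is exactly zero even though two symbols are unseen, which illustrates why the abstract calls the CML estimator inappropriate for this problem.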

Updated: 2022-06-27