Two-Component Mixture Model in the Presence of Covariates, Journal of the American Statistical Association

Two-Component Mixture Model in the Presence of Covariates
Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-04-06, DOI: 10.1080/01621459.2021.1888739
Nabarun Deb, Sujayam Saha, Adityanand Guntuboyina, Bodhisattva Sen

Abstract

In this article, we study a generalization of the two-groups model in the presence of covariates—a problem that has recently received much attention in the statistical literature due to its applicability in multiple hypotheses testing problems. The model we consider allows for infinite dimensional parameters and offers flexibility in modeling the dependence of the response on the covariates. We discuss the identifiability issues arising in this model and systematically study several estimation strategies. We propose a tuning parameter-free nonparametric maximum likelihood method, implementable via the expectation-maximization algorithm, to estimate the unknown parameters. Further, we derive the rate of convergence of the proposed estimators—in particular we show that the finite sample Hellinger risk for every ‘approximate’ nonparametric maximum likelihood estimator achieves a near-parametric rate (up to logarithmic multiplicative factors). In addition, we propose and theoretically study two ‘marginal’ methods that are more scalable and easily implementable. We demonstrate the efficacy of our procedures through extensive simulation studies and relevant data analyses—one arising from neuroscience and the other from astronomy. We also outline the application of our methods to multiple testing. The companion R package NPMLEmix implements all the procedures proposed in this article.
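The abstract does not spell out the model, so the following is only a toy sketch of the general idea of fitting a covariate-dependent two-groups model with the expectation-maximization algorithm. It assumes a fully parametric simplification that is not taken from the article: a known N(0, 1) null density, an N(mu, 1) alternative with unknown mean, and a logistic link for the non-null probability. It is not the nonparametric maximum likelihood estimator studied in the article and does not use the NPMLEmix API; the names `em_two_groups`, `normal_pdf`, and `sigmoid` are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean=0.0):
    """Density of N(mean, 1) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)


def sigmoid(t):
    """Logistic function, clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def posterior_nonnull(y, X, beta, mu):
    """P(observation i is non-null | y_i, x_i) under the toy model."""
    p = sigmoid(X @ beta)
    f0 = normal_pdf(y)        # assumed known null density N(0, 1)
    f1 = normal_pdf(y, mu)    # assumed alternative density N(mu, 1)
    return p * f1 / (p * f1 + (1.0 - p) * f0)


def em_two_groups(y, X, n_iter=200, newton_steps=5):
    """EM for y_i | x_i ~ (1 - p_i) N(0, 1) + p_i N(mu, 1),
    with p_i = sigmoid(x_i' beta). Returns (beta, mu, posterior weights)."""
    n, d = X.shape
    beta = np.zeros(d)
    mu = float(np.mean(y) + np.std(y))  # crude nonzero starting value
    for _ in range(n_iter):
        # E-step: posterior probability of the non-null component.
        w = posterior_nonnull(y, X, beta, mu)
        # M-step for mu: weighted mean of the responses.
        mu = float(np.sum(w * y) / np.sum(w))
        # M-step for beta: a few Newton steps on the weighted logistic
        # log-likelihood sum_i [w_i log p_i + (1 - w_i) log(1 - p_i)].
        # Because the inner solve is truncated, this is strictly a
        # generalized EM iteration.
        for _ in range(newton_steps):
            p = sigmoid(X @ beta)
            grad = X.T @ (w - p)
            hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(d)
            beta = beta + np.linalg.solve(hess, grad)
    return beta, mu, posterior_nonnull(y, X, beta, mu)
```

In this sketch, one minus the returned posterior weight plays the role of a covariate-adjusted local false discovery rate, which is one way such a fit could feed into a multiple testing procedure. The design matrix `X` is expected to include an intercept column.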

Updated: 2021-04-06
