Optimal allocation of Monte Carlo simulations to multiple hypothesis tests
Statistics and Computing (IF 1.6), Pub Date: 2019-10-05, DOI: 10.1007/s11222-019-09906-9
Georg Hahn

Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests, since the analytical p-values underlying the hypotheses are usually unknown. This article considers the allocation of a pre-specified total number of Monte Carlo simulations \(K \in \mathbb{N}\) (i.e., permutations or draws from a bootstrap distribution) to a given number of \(m \in \mathbb{N}\) hypotheses in order to approximate their p-values \(p \in [0,1]^m\) optimally, in the sense that the allocation minimises the total expected number of misclassified hypotheses. A misclassification occurs if the decision on a single hypothesis, obtained with an approximated p-value, differs from the decision that would be obtained if its p-value were known analytically. The contribution of this article is threefold. First, under the assumption that p is known and \(K \in \mathbb{R}\), and using a normal approximation of the Binomial distribution, the optimal real-valued allocation of K simulations to m hypotheses is derived when correcting for multiplicity with the Bonferroni correction, both when the p-value estimates are computed with and without a pseudo-count. Computational subtleties arising in the former case are discussed. Second, with the help of an algorithm based on simulated annealing, empirical evidence is given that the optimal integer allocation is likely of the same form as the optimal real-valued allocation, and that the two seem to coincide asymptotically. Third, an empirical study on simulated and real data demonstrates that a recently proposed sampling algorithm based on Thompson sampling asymptotically mimics the optimal (real-valued) allocation when the p-values are unknown and thus estimated at runtime.
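The objective described above can be illustrated with a minimal Python sketch. This is not the article's derivation. It makes four assumptions:

- A hypothesis is rejected when its (estimated) p-value is at most the Bonferroni threshold tau = alpha/m.
- The exceedance count S ~ Binomial(k, p) is replaced by a normal distribution with the same mean and variance, without continuity correction.
- p_hat = S/k without a pseudo-count and p_hat = (S+1)/(k+1) with one.
- A hypothesis with no simulations (and no pseudo-count) has misclassification probability 0.5, the k -> 0 limit of the approximation.

The allocation step is a plain greedy heuristic that hands out one simulation at a time. Each simulation goes to the hypothesis whose approximate misclassification probability drops the most. It is neither the closed-form real-valued allocation derived in the article nor its simulated-annealing search. The function names and the default alpha = 0.05 are hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def misclassification_prob(p: float, k: int, tau: float,
                           pseudo_count: bool = False) -> float:
    """Normal-approximation probability that the Monte Carlo decision differs
    from the decision based on the exact p-value p (assumed rule: reject iff
    the p-value is <= tau). S ~ Binomial(k, p) counts exceedances among k
    simulations; p_hat = S / k, or (S + 1) / (k + 1) with a pseudo-count."""
    if k <= 0 and not pseudo_count:
        # S / 0 is undefined; use the k -> 0 limit of the approximation, Phi(0).
        return 0.5
    # p_hat <= tau rewritten as S <= threshold on the count scale.
    threshold = tau * (k + 1) - 1.0 if pseudo_count else tau * k
    sd = math.sqrt(k * p * (1.0 - p))  # standard deviation of S
    if sd == 0.0:
        # p in {0, 1}, or k == 0 with a pseudo-count: S equals k * p exactly.
        prob_reject = 1.0 if k * p <= threshold else 0.0
    else:
        # S approximated by N(k p, k p (1 - p)), no continuity correction.
        prob_reject = normal_cdf((threshold - k * p) / sd)
    # Exact decision is "reject" when p <= tau, so an error means not rejecting.
    return prob_reject if p > tau else 1.0 - prob_reject


def expected_misclassifications(p_values: Sequence[float], alloc: Sequence[int],
                                tau: float, pseudo_count: bool = False) -> float:
    """Approximate total expected number of misclassified hypotheses."""
    return sum(misclassification_prob(p, k, tau, pseudo_count)
               for p, k in zip(p_values, alloc))


def greedy_allocation(p_values: Sequence[float], K: int, alpha: float = 0.05,
                      pseudo_count: bool = False) -> List[int]:
    """Hypothetical greedy heuristic: give each of the K simulations to the
    hypothesis whose approximate misclassification probability drops most."""
    m = len(p_values)
    tau = alpha / m  # Bonferroni-corrected threshold
    alloc = [0] * m

    def gain(i: int) -> float:
        k = alloc[i]
        return (misclassification_prob(p_values[i], k, tau, pseudo_count)
                - misclassification_prob(p_values[i], k + 1, tau, pseudo_count))

    # Max-heap of gains (negated for heapq's min-heap), one entry per hypothesis.
    heap = [(-gain(i), i) for i in range(m)]
    heapq.heapify(heap)
    for _ in range(K):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return alloc


if __name__ == "__main__":
    p = [0.0001, 0.001, 0.004, 0.02, 0.3]  # illustrative values, not data from the article
    alloc = greedy_allocation(p, K=10_000, alpha=0.05)
    print(alloc, sum(alloc))
    print(expected_misclassifications(p, alloc, tau=0.05 / len(p)))
```

Without a pseudo-count, each term of this approximate objective has the form Phi(-a sqrt(k)) and is convex in k. Under this approximation, the one-at-a-time greedy rule should therefore minimise the approximate objective over integer allocations. With a pseudo-count it is only a heuristic. In the example, the hypothesis with p = 0.3 lies far from tau = 0.01 and is resolved after relatively few draws. The greedy rule then spends most of the budget on the p-values closest to the threshold.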
