Distributions and Power of Optimal Signal-Detection Statistics in Finite Case
IEEE Transactions on Signal Processing ( IF 4.6 ) Pub Date : 2020-01-16 , DOI: 10.1109/tsp.2020.2967179
Hong Zhang , Jiashun Jin , Zheyang Wu

For detecting weak and sparse signals from a set of n input p-values, the Higher Criticism (HC) type statistics, the Berk-Jones (B-J) type statistics, and the phi-divergence statistics are equally asymptotically optimal as n goes to infinity. However, they can perform very differently in practical data analysis, where n is always finite and may be very small. To address this problem in a broader context, this paper introduces a general family of goodness-of-fit statistics, called gGOF, which unifies a broad range of signal-detection statistics, including these optimal ones. Efficient and accurate analytical calculations of the distributions of the gGOF statistics are provided under arbitrary i.i.d. continuous models for the null and alternative hypotheses. Based on these calculations, a systematic power study reveals that in the finite case, the number of signals is often more relevant than the signal proportion. The HC and the reverse HC have advantages for relatively sparser and denser signals, respectively, while the B-J is more robust. A general framework is given for applying gGOF to data analysis based on generalized linear models. An application to an SNP-set-based genome-wide association study (GWAS) of Crohn's disease shows that these optimal statistics have good potential for detecting novel disease genes with weak SNP effects. The calculations are implemented in the R package SetTest, published on CRAN.
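
The abstract names the HC, B-J, and phi-divergence statistics without defining them. The sketch below is a rough illustration using the standard forms from the literature, not the paper's gGOF definition and not the SetTest API. It computes HC in the Donoho-Jin form and the one-sided phi-divergence statistics of Jager and Wellner, for which s = 1 gives the Berk-Jones statistic and s = 2 gives HC²/2. All function names (`sorted_pvalues`, `higher_criticism`, `k_s`, `phi_divergence`, `berk_jones`), the restriction to s > 0, the absence of any truncation of the index range, and the demo p-values are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def sorted_pvalues(pvalues: Sequence[float]) -> List[float]:
    """Return the p-values in ascending order, requiring each to lie in (0, 1)."""
    p = sorted(float(x) for x in pvalues)
    if not p or p[0] <= 0.0 or p[-1] >= 1.0:
        raise ValueError("need a non-empty set of p-values strictly inside (0, 1)")
    return p


def higher_criticism(pvalues: Sequence[float]) -> float:
    """HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))), no truncation."""
    p = sorted_pvalues(pvalues)
    n = len(p)
    return max(
        math.sqrt(n) * ((i + 1) / n - pi) / math.sqrt(pi * (1.0 - pi))
        for i, pi in enumerate(p)
    )


def k_s(u: float, v: float, s: float) -> float:
    """Jager-Wellner divergence K_s(u, v) between Bernoulli(u) and Bernoulli(v).

    Hypothetical helper; this sketch only supports s > 0 so that u = 1 is safe.
    """
    if s <= 0.0:
        raise ValueError("this sketch only supports s > 0")
    if s == 1.0:
        # Limit s -> 1: Kullback-Leibler divergence, with 0 * log(0) taken as 0.
        total = u * math.log(u / v) if u > 0.0 else 0.0
        if u < 1.0:
            total += (1.0 - u) * math.log((1.0 - u) / (1.0 - v))
        return total
    numerator = 1.0 - u**s * v ** (1.0 - s) - (1.0 - u) ** s * (1.0 - v) ** (1.0 - s)
    return numerator / (s * (1.0 - s))


def phi_divergence(pvalues: Sequence[float], s: float) -> float:
    """One-sided statistic max_i n * K_s(i/n, p_(i)) over indices with p_(i) < i/n."""
    p = sorted_pvalues(pvalues)
    n = len(p)
    terms = [n * k_s((i + 1) / n, pi, s) for i, pi in enumerate(p) if pi < (i + 1) / n]
    return max(terms, default=0.0)


def berk_jones(pvalues: Sequence[float]) -> float:
    """One-sided Berk-Jones statistic: the s = 1 member of the family."""
    return phi_divergence(pvalues, 1.0)


if __name__ == "__main__":
    # Made-up p-values for illustration only: a few small ones among null-looking ones.
    p = [0.001, 0.004, 0.02, 0.3, 0.5, 0.7, 0.9]
    hc = higher_criticism(p)
    print(f"HC = {hc:.3f}, B-J = {berk_jones(p):.3f}")
    # For s = 2 the family reproduces HC^2 / 2 whenever HC > 0.
    print(math.isclose(phi_divergence(p, 2.0), hc**2 / 2, rel_tol=1e-9))
```

Because the empirical CDF equals i/n at p_(i) and stays flat until the next order statistic, evaluating the one-sided divergences only at the sorted p-values is enough here. Practical versions often restrict the range of i (for example, Donoho and Jin's i ≤ α0·n); this sketch omits any such truncation.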

Updated: 2020-01-16