On the relation between the true and sample correlations under Bayesian modelling of gene expression datasets, Statistical Applications in Genetics and Molecular Biology



On the relation between the true and sample correlations under Bayesian modelling of gene expression datasets
Statistical Applications in Genetics and Molecular Biology (IF 0.8), Pub Date: 2018-07-16, DOI: 10.1515/sagmb-2017-0068
Royi Jacobovic

Predicting cancer prognosis and metastatic potential immediately after the initial diagnosis is a major challenge in current clinical research. The relevance of such a predictive signature is clear: it would spare many patients the agony and toxic side effects of adjuvant chemotherapy that is prescribed to them automatically and sometimes carelessly. Motivated by this issue, several previous works presented a Bayesian model which led to the following conclusion: thousands of samples are needed to generate a robust gene list for predicting outcome. This conclusion rests on several statistical assumptions, including asymptotic independence of the sample correlations. The current work makes two main contributions. (1) It shows that while the assumptions of the Bayesian model discussed in previous papers seem non-restrictive, they are in fact quite strong. To demonstrate this point, it is shown that some standard sparse and Gaussian models are not among the models that are mathematically consistent with these assumptions. (2) It shows that the empirical Bayes methodology that was applied to test the relevant assumptions does not detect severe violations, and consequently the required sample size might be overestimated. Finally, we suggest that, under some regularity conditions, the current theoretical results may be used to develop a new method for testing the asymptotic independence assumption.


Updated: 2018-07-16
