Rank-based Bayesian variable selection for genome-wide transcriptomic analyses
Statistics in Medicine (IF 1.8), Pub Date: 2022-07-18, DOI: 10.1002/sim.9524
Emilie Eliseussen¹, Thomas Fleischer², Valeria Vitelli¹

Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume that only a subset of non-noisy features contributes to the data structure. However, the task is particularly hard in an unsupervised setting, and ad hoc a priori variable selection remains a very common approach despite its evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based unsupervised transcriptomic analysis. Using data rankings instead of the actual continuous measurements makes the conclusions more robust than those obtained with classical statistical methods, and embedding variable selection into the inferential tasks makes the analysis fully reproducible. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows a fully probabilistic analysis and a coherent quantification of uncertainty. Simulation studies demonstrate the versatility and robustness of the proposed method across a variety of scenarios, as well as its superiority over several competitors when the data dimension or the data-generating process is varied. We use the novel approach to analyze genome-wide RNAseq gene expression data from ovarian cancer patients: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the usefulness of the method for signature discovery in cancer genomics. Moreover, the ability to quantify uncertainty plays a key role in the subsequent biological investigation.
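To make the two ingredients named above more concrete (replacing continuous measurements with rankings, and the Mallows model on rankings), here is a minimal sketch of the standard Mallows likelihood with the footrule distance. It is not the authors' variable-selection extension: priors, MCMC sampling and the selection of genes are all omitted. The helper names (`to_ranks`, `footrule`, `log_partition`, `mallows_loglik`), the choice to rank genes within each sample with rank 1 for the highest expression, and the α/n scaling of the distance are assumptions made for illustration. The normalizing constant is computed by brute force over all n! permutations, which is only feasible for a handful of genes; genome-wide analyses need an approximation instead.

```python
import itertools
import math


def to_ranks(values):
    """Hypothetical helper: rank one sample's genes, 1 = highest expression.

    Ties are broken by original position (Python's sort is stable).
    """
    order = sorted(range(len(values)), key=lambda j: -values[j])
    ranks = [0] * len(values)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def footrule(r, rho):
    """Footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def log_partition(alpha, n):
    """log Z_n(alpha) by enumerating all n! permutations (tiny n only).

    For the footrule distance Z does not depend on the consensus ranking,
    so the identity permutation is used as the reference.
    """
    identity = list(range(1, n + 1))
    terms = [-alpha / n * footrule(p, identity) for p in itertools.permutations(identity)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def mallows_loglik(rankings, rho, alpha):
    """Log-likelihood of N independent rankings under Mallows(rho, alpha).

    Assumed density: exp(-alpha / n * d(R, rho)) / Z_n(alpha).
    """
    n = len(rho)
    total_distance = sum(footrule(r, rho) for r in rankings)
    return -alpha / n * total_distance - len(rankings) * log_partition(alpha, n)


# Toy data (made up for illustration): 3 samples x 4 genes.
expression = [
    [5.1, 2.0, 8.3, 0.4],
    [4.7, 1.5, 9.0, 0.9],
    [6.0, 0.8, 7.2, 1.1],
]
rankings = [to_ranks(row) for row in expression]
print(rankings)  # [[2, 3, 1, 4], [2, 3, 1, 4], [2, 4, 1, 3]]

# A consensus close to the observed rankings gets a higher likelihood.
print(mallows_loglik(rankings, [2, 3, 1, 4], alpha=1.0))
print(mallows_loglik(rankings, [1, 2, 3, 4], alpha=1.0))
```

Because only the ranks enter the likelihood, any monotone transformation of a sample's expression values (for example a log transform or a different normalization) leaves `rankings` unchanged, which is one way to see the robustness argument made in the abstract.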

Updated: 2022-07-18