Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness. IEEE/ACM Transactions on Computational Biology and Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2018-07-23, DOI: 10.1109/tcbb.2018.2858814
Ali Foroughi pour, Lori Dalton

Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violate its modeling assumptions, and (2) we provide guidelines for setting input parameters to achieve robust performance. Contribution (1) addresses an important and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. To accomplish both tasks, we present a battery of simulations that run OBF with different inputs and challenge each assumption it makes. In particular, we examine the robustness of OBF to incorrect input parameters, violations of the independence assumption, and imbalanced sample sizes, and we address the Gaussianity assumption by evaluating performance on an extensive family of non-Gaussian distributions. Throughout, we discuss the advantages and disadvantages of different priors and optimization criteria. Finally, we evaluate the utility of OBF for biomarker discovery on acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF successfully identifies well-known biomarkers for these diseases that rank low under the moderated t-test.

Updated: 2020-03-07
