R/PY-SUMMA: An R/Python Package for Unsupervised Ensemble Learning for Binary Classification Problems in Bioinformatics.
Journal of Computational Biology (IF 1.7), Pub Date: 2020-09-04, DOI: 10.1089/cmb.2019.0348
Mehmet Eren Ahsen, Robert Vogel, Gustavo A Stolovitzky

The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two obstacles, however, may keep machine learning from reaching its full potential in these fields: the resulting models often generalize poorly, and some applications have only a limited number of labeled data sets. To address these problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA). Because it is an ensemble method, SUMMA generalizes more robustly than the individual predictions it combines. Because it is unsupervised, SUMMA does not require labeled data. SUMMA receives as input the predictions of a diverse set of models and estimates their classification performance even when labeled data are unavailable. It then uses these performance estimates to combine the predictions into an ensemble model. SUMMA can be applied to a variety of binary classification problems in bioinformatics, including but not limited to gene network inference, cancer diagnostics, drug response prediction, somatic mutation calling, and differential expression calling. In this application note, we introduce the R/PY-SUMMA packages, available in R or Python, that implement the SUMMA algorithm.
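
The abstract does not spell out how performance can be estimated without labels. As a rough illustration of the general idea (rank each model's predictions, infer per-model weights from how the models co-vary, then form a weighted ensemble), a minimal Python sketch follows. It assumes that the models err independently given the true class, in which case the off-diagonal covariance of their ranks is approximately rank one and its factor tracks how far each model is from chance. The function names, the least-squares fitting procedure, and the synthetic data are illustrative assumptions; this is not the R/PY-SUMMA API and may differ from the published algorithm.

```python
import numpy as np


def to_ranks(scores):
    """Rank each model's scores (one column per model); higher score -> higher rank."""
    n_samples, n_models = scores.shape
    order = np.argsort(scores, axis=0)           # sample indices sorted by score
    ranks = np.empty((n_samples, n_models))
    for j in range(n_models):
        ranks[order[:, j], j] = np.arange(1, n_samples + 1)
    return ranks


def estimate_weights(ranks, n_sweeps=100):
    """Fit cov[i, j] ~ v[i] * v[j] on off-diagonal entries only; no labels are used."""
    hollow = np.cov(ranks, rowvar=False)
    np.fill_diagonal(hollow, 0.0)                # diagonal carries no pairwise information
    eigvals, eigvecs = np.linalg.eigh(hollow)    # eigenvalues in ascending order
    v = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))  # start from leading eigenpair
    for _ in range(n_sweeps):                    # coordinate-wise least-squares updates
        for i in range(v.size):
            v[i] = hollow[i] @ v / (v @ v - v[i] ** 2)
    # The sign is not identifiable without labels; assume most models beat chance.
    return v if v.sum() >= 0 else -v


def weighted_rank_ensemble(scores):
    """Combine models as a weighted sum of centered ranks; returns (ensemble, weights)."""
    ranks = to_ranks(scores)
    weights = estimate_weights(ranks)
    centered = ranks - ranks.mean(axis=0)
    return centered @ weights, weights


# Synthetic demo (hypothetical data): four models of decreasing quality, the last random.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 1000)                 # used only to simulate the scores
signal = np.array([2.0, 1.0, 0.5, 0.0])          # class separation of each model
scores = labels[:, None] * signal + rng.normal(size=(2000, 4))

ensemble, weights = weighted_rank_ensemble(scores)
print(np.round(weights / weights.max(), 2))      # relative weight of each model
```

In this toy setting the estimated weights should be largest for the most accurate model, even though `labels` is never passed to `weighted_rank_ensemble`. The sign convention (most models are better than random) is a modeling assumption that would need checking on real data.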

Updated: 2020-09-14