mlf-core: a framework for deterministic machine learning, arXiv - CS - Mathematical Software


arXiv - CS - Mathematical Software. Pub Date: 2021-04-15. DOI: arxiv-2104.07651
Lukas Heumos, Philipp Ehmele, Kevin Menden, Luis Kuhn Cuellar, Edmund Miller, Steffen Lemke, Gisela Gabernet, Sven Nahnsen

Machine learning has grown rapidly in recent years. However, previous studies have highlighted a reproducibility crisis in machine learning. The reasons for irreproducibility are manifold. Major machine learning libraries default to non-deterministic algorithms based on atomic operations. Fixing all random seeds alone is therefore not sufficient for deterministic machine learning. To overcome this shortcoming, various machine learning libraries have released deterministic counterparts to the non-deterministic algorithms. We evaluated the effect of these algorithms on determinism and runtime. Based on these results, we formulated a set of requirements for reproducible machine learning and developed a new software solution, the mlf-core ecosystem, which helps machine learning projects meet and maintain these requirements. We applied mlf-core to develop fully reproducible models in various biomedical fields, including a single-cell autoencoder with TensorFlow, a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and a liver cancer classifier based on gene expression profiles with XGBoost.
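The abstract's point that fixed seeds alone do not guarantee determinism can be illustrated with a small sketch. It is not taken from the paper or from mlf-core. It assumes the usual explanation for why atomic operations break determinism: parallel atomic additions may accumulate values in a different order on each run, and floating-point addition is not associative. The helper `accumulate` and the two orderings are hypothetical names chosen for illustration, and the example uses only the Python standard library.

```python
def accumulate(values, order):
    """Sum `values` in the given index order.

    Hypothetical helper: `order` stands in for the arrival order of
    atomic additions, which a random seed does not control.
    """
    total = 0.0
    for index in order:
        total += values[index]
    return total


values = [0.1, 0.2, 0.3]

# Same inputs, two different accumulation orders.
forward = accumulate(values, [0, 1, 2])
backward = accumulate(values, [2, 1, 0])

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False
```

The two sums differ only in the last bit, but such differences can compound over many training steps, which is one plausible reason identical seeds can still lead to different models.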


Updated: 2021-04-16
