mlf-core: a framework for deterministic machine learning
arXiv - CS - Mathematical Software. Pub Date: 2021-04-15. DOI: arxiv-2104.07651. Authors: Lukas Heumos, Philipp Ehmele, Kevin Menden, Luis Kuhn Cuellar, Edmund Miller, Steffen Lemke, Gisela Gabernet, Sven Nahnsen
Machine learning has grown extensively in recent years. However,
previous studies have highlighted a reproducibility crisis in machine
learning. The reasons for irreproducibility are manifold. Major machine
learning libraries default to the usage of non-deterministic algorithms based
on atomic operations. Solely fixing all random seeds is not sufficient for
deterministic machine learning. To overcome this shortcoming, various machine
learning libraries released deterministic counterparts to the non-deterministic
algorithms. We evaluated the effect of these algorithms on determinism and
runtime. Based on these results, we formulated a set of requirements for
reproducible machine learning and developed a new software solution, the
mlf-core ecosystem, which helps machine learning projects meet and maintain these
requirements. We applied mlf-core to develop fully reproducible models in
various biomedical fields including a single cell autoencoder with TensorFlow,
a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and a
liver cancer classifier based on gene expression profiles with XGBoost.
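
The claim that fixing all random seeds is not sufficient can be illustrated with a minimal sketch. It rests on an assumption not spelled out in the abstract: that the non-determinism of atomic-operation-based algorithms comes from the order in which floating-point partial results are accumulated, which can differ from run to run. The helper names `accumulate` and `deterministic_sum` are hypothetical and are not part of mlf-core or of any machine learning library. No random numbers are involved at all, so no seed could remove the discrepancy.

```python
def accumulate(values, order):
    """Sum `values` in the given index order.

    Each `order` stands in for one possible scheduling of atomic adds
    (an assumption made for illustration; real schedulers are opaque).
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


def deterministic_sum(values, arrival_order):
    """Hypothetical deterministic counterpart: ignore the arrival order
    and always accumulate in ascending index order."""
    return accumulate(values, sorted(arrival_order))


# Identical inputs in every "run"; only the accumulation order differs.
values = [1e16, 1.0, -1e16, 1.0]

run_a = accumulate(values, [0, 1, 2, 3])  # 1.0: the first 1.0 is absorbed by 1e16
run_b = accumulate(values, [0, 2, 1, 3])  # 2.0: the large terms cancel first

# The fixed-order variant returns the same value for any arrival order.
fixed_a = deterministic_sum(values, [0, 2, 1, 3])
fixed_b = deterministic_sum(values, [3, 1, 2, 0])
```

Here `run_a` and `run_b` differ because floating-point addition is not associative, while `fixed_a` and `fixed_b` agree. Enforcing a fixed order gives up the freedom to process contributions as they arrive, which is one plausible reason why the abstract evaluates runtime alongside determinism.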
Updated: 2021-04-16