Synergy of physics-based reasoning and machine learning in biomedical applications: towards unlimited deep learning with limited data
Advances in Physics: X (IF 6), Pub Date: 2019-03-14, DOI: 10.1080/23746149.2019.1582361
Valeriy Gavrishchaka 1, Olga Senyukova 2, Mark Koepke 1

Technological advances make it possible to collect vast amounts of data (Big Data) in science and industry, including the biomedical field. Increased computational power allows the collected data to be analyzed quickly with statistical and machine-learning approaches. However, the incompleteness of historical data and the curse of dimensionality diminish the practical value of purely data-driven approaches, especially in biomedicine. Advances in deep learning (DL) frameworks based on deep neural networks (DNN) have improved accuracy in image recognition, natural language processing, and other applications, yet severe data limitations and/or the absence of problems relevant for transfer learning drastically reduce the advantages of DNN-based DL. Our earlier works demonstrate that hierarchical data representation can alternatively be implemented without NN, using boosting-like algorithms that utilize existing domain knowledge, tolerate significant data incompleteness, and boost the accuracy of low-complexity models within the classifier ensemble, as illustrated in physiological-data analysis. Beyond their obvious use in initial-factor selection, existing simplified models can be effectively employed to generate realistic synthetic data for later DNN pre-training. We review existing machine learning approaches, focusing on limitations caused by training-data incompleteness. We outline our hybrid framework, which leverages existing domain-expert models/knowledge, boosting-like model combination, DNN-based DL, and other machine learning algorithms to drastically reduce training-data requirements. Application of this framework is illustrated in the context of physiological-data analysis.
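To make the idea of boosting the accuracy of low-complexity models within a classifier ensemble more concrete, here is a minimal sketch. It is not the authors' algorithm: it uses standard discrete AdaBoost with one-feature threshold rules ("stumps") as a stand-in for boosting-like combination of simple models. The data are randomly generated toy points, and every name in the block (`fit_stump`, `adaboost`, `accuracy`, and so on) is hypothetical and chosen only for illustration.

```python
import math
import random


def stump_predict(x, j, thr, pol):
    """Low-complexity base model: threshold rule on a single feature j."""
    return 1 if pol * (x[j] - thr) >= 0 else -1


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, j, thr, pol) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err


def adaboost(X, y, rounds=30):
    """Discrete AdaBoost: reweight samples so later stumps focus on past mistakes."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        (j, thr, pol), err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, pol))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, j, thr, pol))
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    return sum(predict(ensemble, x) == yi for x, yi in zip(X, y)) / len(y)


# Toy data (an assumption for illustration): the true boundary is diagonal,
# so no single axis-aligned stump can represent it exactly.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(120)]
y = [1 if a + b > 0 else -1 for a, b in X]

ensemble = adaboost(X, y, rounds=30)
acc_single = accuracy(ensemble[:1], X, y)  # first stump alone
acc_boost = accuracy(ensemble, X, y)       # full weighted ensemble
print(f"single stump: {acc_single:.2f}, boosted ensemble: {acc_boost:.2f}")
```

In the setting the abstract describes, the base models would presumably be thresholds on domain-expert indicators rather than raw features; the toy features here only illustrate how a weighted combination of weak rules can approximate a boundary that none of them captures alone.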




Updated: 2019-03-14