Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction
Metabolic Engineering (IF 8.4), Pub Date: 2021-07-07, DOI: 10.1016/j.ymben.2021.07.003
Jeffrey J Czajka 1 , Tolutola Oyetunde 2 , Yinjie J Tang 1

Predicting bioproduction titers from microbial hosts has been challenging due to complex interactions between microbial regulatory networks, stress responses, and suboptimal cultivation conditions. This study integrated knowledge mining, feature extraction, genome-scale modeling (GSM), and machine learning (ML) to develop a model for predicting Yarrowia lipolytica chemical titers (e.g., organic acids and terpenoids). First, Y. lipolytica production data, including cultivation conditions, genetic engineering strategies, and product information, were manually collected from the literature (~100 papers) and stored as either numerical (e.g., substrate concentrations) or categorical (e.g., bioreactor modes) variables. For each case recorded, central pathway fluxes were estimated using GSMs and flux balance analysis (FBA) to provide metabolic features. Second, an ML ensemble learner was trained to predict strain production titers. Accurate predictions on the test data were obtained for instances with production titers >1 g/L (R² = 0.87). However, the model was less predictive for low-performance strains (0.01–1 g/L, R² = 0.29), potentially due to biosynthesis bottlenecks not captured in the features. Feature ranking indicated that the FBA fluxes, the number of enzyme steps, the substrate inputs, and thermodynamic barriers (i.e., Gibbs free energy of reaction) were the most influential factors. Third, the model was evaluated on other oleaginous yeasts; the results indicated conserved features for some hosts that could potentially be exploited by transfer learning. The platform was also designed to assist computational strain design tools (such as OptKnock) in screening genetic targets for improved microbial production in light of experimental conditions.
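The abstract reports the coefficient of determination separately for high-titer and low-titer test instances. A minimal sketch of that kind of stratified scoring is given below, assuming the standard definition R² = 1 - SS_res / SS_tot computed on raw titers; the abstract does not say whether titers were transformed before scoring. The function names, the handling of the 1 g/L boundary, and the numbers are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def stratified_r2(y_true, y_pred, threshold=1.0):
    """Score predictions separately above and at-or-below a titer threshold (g/L).

    Assumption: instances exactly at the threshold go to the low group.
    """
    pairs = list(zip(y_true, y_pred))
    high = [(t, p) for t, p in pairs if t > threshold]
    low = [(t, p) for t, p in pairs if t <= threshold]
    return {
        "high": r_squared(*zip(*high)),
        "low": r_squared(*zip(*low)),
    }


# Made-up titers (g/L) for illustration only; not data from the paper.
measured = [0.05, 0.2, 0.6, 0.9, 5.0, 12.0, 30.0, 55.0]
predicted = [0.4, 0.1, 0.3, 0.5, 6.0, 10.0, 33.0, 52.0]

scores = stratified_r2(measured, predicted)
print(scores)
```

Splitting the test set this way shows how the same absolute error counts for far more against a low-titer group, whose total variance is small, which is one reason a single pooled R² can hide poor performance on weak producers.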




Updated: 2021-07-14