Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-07-08, DOI: 10.1021/acs.jcim.0c00489
Japheth E Gado, Gregg T Beckham, Christina M Payne

Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method's predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
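
The general idea of pairing resampling with an ensemble can be illustrated with a minimal sketch. It is not the TOMER package or its API, and the abstract does not specify which resampling strategies or base learners the authors use. Everything here is an assumption made for illustration: the function names (`oversample_rare`, `fit_bagged`, `predict`, `mse`), plain random oversampling as the resampling step, a one-feature least-squares line as the base learner, and the oversampling factor. Only the 85 °C threshold is taken from the abstract.

```python
import random
from statistics import mean


def oversample_rare(xs, ys, threshold=85.0, factor=5, seed=0):
    """Random oversampling: draw extra copies (with replacement) of samples
    whose target exceeds `threshold`, so that the rare group ends up
    `factor` times its original size. All names here are hypothetical."""
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(ys) if y > threshold]
    extra = [rng.choice(rare) for _ in range(len(rare) * (factor - 1))]
    idx = list(range(len(ys))) + extra
    return [xs[i] for i in idx], [ys[i] for i in idx]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant model if x has no spread
    return slope, my - slope * mx


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Bagging ensemble: fit one base learner per bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble prediction: average of the base learners' predictions."""
    return mean(slope * x + intercept for slope, intercept in models)


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Synthetic, imbalanced toy data (NOT the paper's data set): targets
    # cluster near 37 and a curved trend makes the high tail hard to fit.
    rng = random.Random(42)
    xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
    ys = [37.0 + 15.0 * x + 4.0 * x * abs(x) + rng.gauss(0.0, 3.0) for x in xs]

    baseline = fit_bagged(xs, ys)
    resampled = fit_bagged(*oversample_rare(xs, ys))

    # Compare errors on the rare high-target range only.
    high = [(x, y) for x, y in zip(xs, ys) if y > 85.0]
    if high:
        hx, hy = zip(*high)
        for name, models in (("baseline", baseline), ("resampled", resampled)):
            print(name, mse(hy, [predict(models, x) for x in hx]))
```

Reporting the error separately for the >85 °C range, as the abstract does, is what makes the effect of resampling visible, because an overall metric is dominated by the densely populated 30 to 50 °C region.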

Updated: 2020-08-24