Optimization of a Soft Ensemble Vote Classifier for the Prediction of Chimeric Virus-Like Particle Solubility and Other Biophysical Properties
Frontiers in Bioengineering and Biotechnology ( IF 5.7 ) Pub Date : 2020-07-31 , DOI: 10.3389/fbioe.2020.00881
Philipp Vormittag 1 , Thorsten Klamp 2 , Jürgen Hubbuch 1

Chimeric virus-like particles (cVLPs) are protein-based nanostructures applied as investigational vaccines against infectious diseases, cancer, and immunological disorders. Low solubility of cVLP vaccine candidates is a challenge that can halt their development. Solubility of cVLPs is typically assessed empirically, which is time- and material-intensive. Predicting cVLP solubility in silico can help reduce this effort. Protein aggregation by hydrophobic interaction is an important factor driving protein insolubility. In this article, a recently developed soft ensemble vote classifier (sEVC) for the prediction of cVLP solubility, based on 91 amino acid hydrophobicity scales from the literature, was used. Optimization algorithms were developed to boost model performance, and the model was redesigned as a regression tool for the ammonium sulfate concentration required for cVLP precipitation. The present dataset consists of 568 cVLPs, created by insertion of 71 different peptide sequences using eight different insertion strategies. Two optimization algorithms were developed that (I) modified the sEVC to correct systematic misclassification associated with the different insertion strategies, and (II) modified the amino acid hydrophobicity scale tables to improve classification. The second algorithm was additionally used to synthesize scales from random vectors. Compared to the unmodified model, the Matthews correlation coefficient (MCC) and accuracy of the test set predictions increased from 0.63 and 0.81 to 0.77 and 0.88, respectively, for the best models. This improvement over the literature scales is suggested to result from a decreased correlation between the synthesized scales. In the synthesized scales, tryptophan was identified as the most hydrophobic amino acid, i.e., the amino acid most problematic for cVLP solubility, which is supported by previous literature findings.
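To make the terms soft ensemble vote, accuracy, and MCC concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how each ensemble member is built, so the sketch assumes that every hydrophobicity scale yields one member that maps the mean hydrophobicity of an inserted peptide to a probability of insolubility via a logistic function; the two toy scales, the threshold, and the steepness are illustrative placeholders, not literature values.

```python
import math

# Toy scales: illustrative placeholder values, NOT taken from any literature scale.
TOY_SCALES = [
    {"W": 1.0, "L": 0.8, "A": 0.3, "K": -0.9, "D": -1.0},
    {"W": 0.9, "L": 1.0, "A": 0.1, "K": -1.0, "D": -0.8},
]


def mean_hydrophobicity(peptide, scale):
    """Average scale value over all residues of the peptide."""
    return sum(scale[aa] for aa in peptide) / len(peptide)


def member_probability(peptide, scale, threshold=0.0, steepness=4.0):
    """Assumed member model: logistic of (mean hydrophobicity - threshold)."""
    x = mean_hydrophobicity(peptide, scale)
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))


def soft_vote(peptide, scales=TOY_SCALES):
    """Soft vote: average the members' probabilities of 'insoluble'."""
    probs = [member_probability(peptide, s) for s in scales]
    return sum(probs) / len(probs)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """MCC from the binary confusion matrix; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch, a peptide is labeled insoluble when `soft_vote(peptide) >= 0.5`. Averaging probabilities rather than counting hard votes lets confident members outweigh uncertain ones, which is the usual motivation for soft voting.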
As a case study, the sEVC was redesigned as a regression tool and applied to determine ammonium sulfate concentrations for the precipitation of cVLPs. This was evaluated with a small dataset of ten cVLPs, resulting in an R² of 0.69. In summary, we propose optimization algorithms that improve sEVC model performance for the prediction of cVLP solubility and allow for the synthesis of amino acid scale tables, and we further evaluate the sEVC as a regression tool to predict cVLP-precipitating ammonium sulfate concentrations.
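The R² reported for the regression case study is the coefficient of determination. The sketch below shows how such a value can be computed for any set of observed and predicted values, using an ordinary least-squares line as a stand-in predictor. The abstract does not describe how the sEVC output is mapped to an ammonium sulfate concentration, so the line fit and the function names here are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (assumed stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Here `observed` would hold the measured precipitating concentrations and `predicted` the model outputs for the same cVLPs. With only ten data points, as in the case study, a single outlier can shift R² noticeably, so the value is best read as indicative.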

Updated: 2020-07-31