Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets.
Biochimica et Biophysica Acta (BBA) - General Subjects (IF 3). Pub Date: 2020-01-16, DOI: 10.1016/j.bbagen.2020.129535
Michelle P Aranha 1 , Catherine Spooner 2 , Omar Demerdash 3 , Bogdan Czejdo 2 , Jeremy C Smith 1 , Julie C Mitchell 4

Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data already exist for peptide:MHC (p:MHC) binding prediction. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. Recursive feature elimination and two-way forward feature selection were also performed. Although lower in sensitivity than current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to those of the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model were properties of the anchor positions, residues 5 and 9 of the peptide sequence. In contrast, residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on the specific physicochemical properties that determine binding affinities.
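As a toy illustration of the three metrics compared above, the following minimal Python sketch computes sensitivity, specificity, and precision from binary binder (1) / non-binder (0) labels. The names `BinaryMetrics` and `binary_metrics` are hypothetical, the example labels are made up rather than taken from the study, and returning 0.0 when a denominator is zero is an assumed convention; this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN): fraction of true binders recovered
    specificity: float  # TN / (TN + FP): fraction of non-binders correctly rejected
    precision: float    # TP / (TP + FP): fraction of predicted binders that truly bind


def _safe_ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: report 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Tally a confusion matrix (1 = binder, 0 = non-binder) and derive metrics."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return BinaryMetrics(
        sensitivity=_safe_ratio(tp, tp + fn),
        specificity=_safe_ratio(tn, tn + fp),
        precision=_safe_ratio(tp, tp + fp),
    )


if __name__ == "__main__":
    # Made-up labels: 4 true binders, 6 true non-binders.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(binary_metrics(truth, predicted))
```

With these toy labels the classifier misses half of the true binders (sensitivity 0.5) but never calls a non-binder a binder (specificity and precision 1.0), which is the general shape of a low-sensitivity, high-specificity, high-precision trade-off.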

Updated: 2020-01-17