Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions.
Molecular Biology and Evolution (IF 11.0), Pub Date: 2019-11-21, DOI: 10.1093/molbev/msz276
Jing Li 1, Sen Zhang 1, Bo Li 2, Yi Hu 1, Xiao-Ping Kang 1, Xiao-Yan Wu 1, Meng-Ting Huang 1,3, Yu-Chang Li 1, Zhong-Peng Zhao 4, Cheng-Feng Qin 1, Tao Jiang 1,3

Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). The timing of and the potential IAVs involved in the next pandemic are currently unpredictable. We aim to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed for their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. Then, the human adaptation of IAV sequences was predicted based on the characterized (d)nts. Respectively, 9, 12, 11, 13, 10 and 9 human-adaptive (d)nts were optimized for the six segments. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The results of the confusion matrix and the area under the receiver operating characteristic curve indicated a high performance of the ML models to predict human adaptation of IAVs. Our model performed well in predicting the human adaptation of the swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction using large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, it allows the prediction of pandemic influenza.
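
The feature decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the following are assumptions: the 12 nts are taken to be the four bases counted at each of the three codon positions, the 48 dnts are the 16 dinucleotides starting at codon position 1, 2, or 3 (the last one bridging into the next codon), and counts are normalized to frequencies within each position. The function name `codon_position_features` and the feature keys such as `A1` or `AT12` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def codon_position_features(cds: str) -> dict:
    """Return 12 mononucleotide and 48 dinucleotide frequencies by codon position.

    Assumed definitions (not confirmed by the abstract):
      - mononucleotide key "A1" = frequency of A at codon position 1
      - dinucleotide key "AT12" = frequency of AT spanning codon positions 1-2;
        "AT31" spans position 3 and position 1 of the following codon
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop a trailing partial codon

    mono = Counter()
    di = Counter()
    for i, base in enumerate(seq):
        pos = i % 3 + 1  # codon position: 1, 2 or 3
        if base in BASES:  # skip ambiguous symbols such as N
            mono[(base, pos)] += 1
        pair = seq[i : i + 2]
        if len(pair) == 2 and pair[0] in BASES and pair[1] in BASES:
            di[(pair, pos)] += 1

    features = {}
    for pos in (1, 2, 3):
        nxt = pos % 3 + 1  # position of the second base in the pair
        mono_total = sum(mono[(b, pos)] for b in BASES)
        di_total = sum(di[(p, pos)] for p in PAIRS)
        for b in BASES:
            features[f"{b}{pos}"] = mono[(b, pos)] / mono_total if mono_total else 0.0
        for p in PAIRS:
            features[f"{p}{pos}{nxt}"] = di[(p, pos)] / di_total if di_total else 0.0
    return features


if __name__ == "__main__":
    feats = codon_position_features("ATGGCC")  # toy sequence, not real IAV data
    print(len(feats), feats["A1"], feats["GG31"])
```

Each coding sequence then becomes a 60-dimensional vector, and stacking such vectors for the human and avian sequences would give the kind of matrix on which PCA or a classifier could be run. The feature selection that yields the 9 to 13 optimized (d)nts per segment is not reproduced here.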

Updated: 2020-04-17