VEPAD - Predicting the effect of variants associated with Alzheimer's disease using machine learning.
Computers in Biology and Medicine (IF 7.0), Pub Date: 2020-08-05, DOI: 10.1016/j.compbiomed.2020.103933
Uday Rangaswamy 1 , S Akila Parvathy Dharshini 1 , Dhanusha Yesudhas 1 , M Michael Gromiha 2

Introduction

Alzheimer's disease (AD) is a complex, heterogeneous disease that affects neuronal cells over time and is prevalent among neurodegenerative diseases. Next-generation sequencing (NGS) techniques are widely used to develop high-throughput screening methods for identifying biomarkers and variants, which aid early diagnosis and treatment.

Objective

The primary purpose of this study is to develop a machine learning classification model for predicting the deleterious effect of variants with respect to AD.

Methods

We constructed a dataset of 20,401 deleterious and 37,452 control variants from the Genome-Wide Association Study (GWAS) and Genotype-Tissue Expression (GTEx) portals, respectively. Recursive feature elimination with cross-validation (RFECV), followed by forward feature selection, was used to select the important features, and a random forest classifier was used to distinguish deleterious from neutral variants.
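The selection-then-classification pipeline described above can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification`), the class weights merely mimic the roughly 65:35 control-to-deleterious ratio, and the fold count, tree count, and feature counts are hypothetical values chosen to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the variant feature matrix (hypothetical data).
# Label 1 plays the role of "deleterious", label 0 the role of "control".
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=2,
    weights=[0.65, 0.35], random_state=0,
)

# Assumed settings: 3 folds and 25 trees, chosen only for speed.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)


def make_forest():
    """Return a fresh random forest with fixed, illustrative settings."""
    return RandomForestClassifier(n_estimators=25, random_state=0)


# Step 1: recursive feature elimination with cross-validation (RFECV).
rfecv = RFECV(
    estimator=make_forest(), step=1, cv=cv,
    scoring="accuracy", min_features_to_select=4,
)
X_rfe = rfecv.fit_transform(X, y)

# Step 2: forward feature selection on the features that survived RFECV.
sfs = SequentialFeatureSelector(
    make_forest(), n_features_to_select=3,
    direction="forward", cv=cv, scoring="accuracy",
)
X_sel = sfs.fit_transform(X_rfe, y)

# Step 3: fit the final random forest on the selected features.
model = make_forest().fit(X_sel, y)

# Map the final selection back to column indices of the original matrix.
selected = np.flatnonzero(rfecv.support_)[sfs.get_support()]
print("Features kept by RFECV:", rfecv.n_features_)
print("Final feature indices:", selected.tolist())
```

Running forward selection after RFECV narrows an already pruned feature set further; the abstract does not state how many features were retained at either stage, so the counts used here are placeholders.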

Results

Our method achieved an accuracy of 81.21% on 10-fold cross-validation and 70.63% on a test set of 5785 variants. On the same test set, CADD and FATHMM achieved accuracies in the range of 54%–62%.
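The two accuracy figures come from different evaluation protocols: a mean over 10 cross-validation folds and a single score on held-out variants. A minimal sketch of both follows, again on synthetic data. The abstract does not say how the test set was drawn; since 5785 is roughly 10% of the 57,853 variants in total, the sketch assumes a stratified 10% hold-out, and the tree count is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (
    StratifiedKFold, cross_val_score, train_test_split,
)

# Synthetic stand-in for the selected variant features (hypothetical data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0,
)

# Assumption: a stratified 10% hold-out plays the role of the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# 10-fold cross-validation accuracy, computed on the training portion only.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(forest, X_train, y_train, cv=cv, scoring="accuracy")
cv_accuracy = cv_scores.mean()

# Held-out accuracy: refit on all training data, score on unseen samples.
forest.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, forest.predict(X_test))

print(f"10-fold CV accuracy: {cv_accuracy:.4f}")
print(f"Held-out accuracy:   {test_accuracy:.4f}")
```

A gap between the two numbers, such as the roughly ten-point drop reported above, is easier to interpret when the held-out samples never take part in feature selection or model fitting.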

Conclusion

Our model is freely available as the Variant Effect Predictor for Alzheimer's Disease (VEPAD) at http://web.iitm.ac.in/bioinfo2/vepad/. VEPAD can be used to predict the effect of new variants associated with AD.




Updated: 2020-08-19