TagSNP-set selection for genotyping using integrated data
Future Generation Computer Systems (IF 7.5), Pub Date: 2020-09-09, DOI: 10.1016/j.future.2020.09.007
Shudong Wang, Gaowei Liu, Xinzeng Wang, Yuanyuan Zhang, Sicheng He, Yulin Zhang

Single-nucleotide polymorphisms (SNPs) are vital for identifying genetic-level variation in complex diseases. The information carried by SNPs on adjacent genes or on the same gene can be represented by a small number of tagSNPs, called a tag SNP-set or tagSNP-set. In this work, we propose a novel selection method, TagSNP-set Selection by Optimal Iteration with Linkage Disequilibrium (TSOILD), and develop a quantitative tagSNP-set prediction method, the Physical Distance-Linkage Disequilibrium Prediction Method (PDLDPM). To verify the validity of TSOILD and PDLDPM, we generate a large amount of test data with the simulation software HAPGEN2. In the experiments, TSOILD improves prediction accuracy by 6.73%, 3.19%, 6.52% and 1.72% over Random Sampling, the Genetic Algorithm (GA), the Greedy Algorithm and the TagSNP-Set Selection Method with Maximum Information (TSMI), respectively. In addition, three prediction methods, PDLDPM, Linkage Coverage, and selection of tag SNPs to maximize prediction accuracy (STAMPA), are used to evaluate the tagSNP-sets selected by Random Sampling, GA, the Greedy Algorithm and TSMI. The results show that PDLDPM performs better than the other two prediction methods. These methods provide effective assistance for studying the genetic-level variation of complex diseases.
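
The abstract does not spell out how TSOILD or PDLDPM work, so the following is only a minimal sketch of the general idea behind linkage-disequilibrium-based tagSNP selection, in the spirit of a generic greedy baseline. It is not the authors' method. The function names `r_squared` and `greedy_tag_snps`, the 0/1 haplotype encoding, and the r² threshold of 0.8 are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """LD r^2 between two SNPs given as equal-length lists of 0/1 alleles."""
    n = len(x)
    px = sum(x) / n                      # frequency of allele 1 at SNP x
    py = sum(y) / n                      # frequency of allele 1 at SNP y
    pxy = sum(a * b for a, b in zip(x, y)) / n  # frequency of the 1-1 haplotype
    d = pxy - px * py                    # linkage disequilibrium coefficient D
    denom = px * (1 - px) * py * (1 - py)
    # A monomorphic SNP has no defined r^2; treat it as uncorrelated (assumption).
    return 0.0 if denom == 0 else d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tag selection (hypothetical baseline, not TSOILD).

    haplotypes: list of rows, one per haplotype, each a list of 0/1 alleles.
    Returns SNP indices chosen as tags so that every SNP is either a tag
    or has r^2 >= threshold with at least one tag.
    """
    n_snps = len(haplotypes[0])
    cols = [[row[j] for row in haplotypes] for j in range(n_snps)]

    # covers[i] = SNPs that SNP i can represent, including itself.
    covers = {i: {i} for i in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(cols[i], cols[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the untagged SNP covering the most untagged SNPs;
        # ties go to the lowest index because the candidates are sorted.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


if __name__ == "__main__":
    # Toy data: SNP 0 and SNP 1 are identical; SNP 2 and SNP 3 are complementary.
    haps = [
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 1, 0, 1],
        [1, 1, 1, 0],
    ]
    print(greedy_tag_snps(haps))
```

In the toy data, SNPs 0 and 1 are in perfect LD with each other, as are SNPs 2 and 3, so two tags are enough to represent all four SNPs. Real data would come from genotype or haplotype panels such as those simulated with HAPGEN2, and the threshold would need to be tuned for the study.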




Updated: 2020-09-24