Privacy-preserving decision tree for epistasis detection
Cybersecurity (IF 3.9), Pub Date: 2019-02-18, DOI: 10.1186/s42400-019-0025-z
Qingfeng Chen, Xu Zhang, Ruchang Zhang

Epistasis, the interaction between gene loci, is a widespread genetic phenomenon. In genome-wide association studies (GWAS), detecting epistasis in complex diseases is a major challenge. Although many approaches based on statistics, machine learning, and information entropy have been proposed for epistasis detection, privacy preservation for single nucleotide polymorphism (SNP) data has been largely ignored. This paper therefore proposes a novel two-stage approach. In the first stage, a fusion strategy combines and ranks the SNP importance scores obtained from Relief and mutual information, yielding a candidate set of SNPs; this avoids missing SNPs with strong interactions. In the second stage, a differentially private decision tree is applied to search for SNPs. Compared with heuristic methods, this achieves efficient epistasis detection for complex diseases while preserving privacy. The recognition rate on the simulated data set exceeds 90%. In addition, several susceptibility loci, including rs380390 and rs1329428, are found in the real data set for age-related macular degeneration (AMD). These results suggest that the method is promising for epistasis detection.
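
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fusion rule or the privacy mechanism, so the rank-sum fusion, the single-nearest-neighbour Relief variant, the Laplace-noised class counts, and all function names (`mutual_information`, `relief_scores`, `candidate_snps`, `noisy_class_counts`) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def relief_scores(rows, labels):
    """Basic Relief: one nearest hit and one nearest miss per sample (Hamming distance)."""
    n, m = len(rows), len(rows[0])
    weights = [0.0] * m
    for i in range(n):
        hit = miss = None  # (distance, index) of nearest same-class / other-class sample
        for j in range(n):
            if j == i:
                continue
            d = sum(u != v for u, v in zip(rows[i], rows[j]))
            if labels[j] == labels[i]:
                if hit is None or d < hit[0]:
                    hit = (d, j)
            elif miss is None or d < miss[0]:
                miss = (d, j)
        if hit is None or miss is None:
            continue
        for f in range(m):
            # Reward features that differ from the miss, penalise those that differ from the hit.
            weights[f] += (int(rows[i][f] != rows[miss[1]][f])
                           - int(rows[i][f] != rows[hit[1]][f]))
    return [w / n for w in weights]


def ranks(scores):
    """Rank 0 is the highest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    out = [0] * len(scores)
    for pos, k in enumerate(order):
        out[k] = pos
    return out


def candidate_snps(rows, labels, top_k):
    """Stage 1 (assumed rank-sum fusion): keep the top_k SNP indices by combined rank."""
    m = len(rows[0])
    mi = [mutual_information([r[f] for r in rows], labels) for f in range(m)]
    rl = relief_scores(rows, labels)
    fused = [a + b for a, b in zip(ranks(mi), ranks(rl))]
    return sorted(range(m), key=lambda k: fused[k])[:top_k]


def noisy_class_counts(labels, epsilon, rng):
    """Stage 2 building block (assumed): class counts with Laplace(1/epsilon) noise.

    The difference of two Exp(rate=epsilon) draws is Laplace with scale 1/epsilon,
    which suits a counting query of sensitivity 1.
    """
    counts = Counter(labels)
    return {c: counts[c] + rng.expovariate(epsilon) - rng.expovariate(epsilon)
            for c in sorted(counts)}


# Toy data (hypothetical): 8 samples, 3 SNPs coded 0/1/2; SNP 0 determines the label.
toy_rows = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1],
            [2, 0, 0], [2, 1, 0], [2, 0, 1], [2, 1, 1]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(candidate_snps(toy_rows, toy_labels, top_k=1))  # prints [0]
print(noisy_class_counts(toy_labels, epsilon=1.0, rng=random.Random(0)))
```

A differentially private tree would typically use noisy counts like these, or an exponential mechanism, when choosing splits and labelling leaves. How the privacy budget is divided across tree levels is a separate design decision that this sketch does not address.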

Updated: 2019-02-18