Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores
Journal of Computational Biology (IF 1.7), Pub Date: 2021-05-20, DOI: 10.1089/cmb.2020.0445
Brooks Paige 1, 2 , James Bell 1 , Aurélien Bellet 3 , Adrià Gascón 1, 4 , Daphne Ezer 1, 4, 5

Some organizations, such as 23andMe and the UK Biobank, have large genomic databases that they re-use for multiple genome-wide association studies. Even research studies that compile smaller genomic databases often use them to investigate many related traits. It is common for a study to report a genetic risk score (GRS) model for each trait within the publication. Here, we show that under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases, which constitutes a reconstruction attack. In particular, if two GRS models are trained on largely overlapping sets of participants, it is often possible to determine the genotype of each individual who was used to train one GRS model but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of single nucleotide polymorphisms within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included in or excluded from one part of the study.
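
To make the leakage mechanism described in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' procedure. It assumes, purely for illustration, that both GRS models are ordinary least-squares fits for the same trait with an intercept term, that the two training sets differ by exactly one individual, and that the attacker knows the SNP co-occurrence (Gram) matrix of the shared individuals exactly. The function names `reconstruct_genotype` and `demo` are hypothetical. Under these assumptions, subtracting the two sets of normal equations shows that the Gram matrix times the coefficient difference is a scalar multiple of the extra individual's genotype vector.

```python
import numpy as np


def reconstruct_genotype(gram, beta_without, beta_with):
    """Recover the extra individual's genotype from two OLS coefficient vectors.

    gram is X^T X over the shared individuals, with the intercept column first
    (assumed known exactly here, which is an idealization).
    """
    # Subtracting the normal equations of the two fits gives
    #   gram @ (beta_with - beta_without) = x * residual,
    # where x is the extra individual's feature vector and residual is that
    # individual's prediction error under the model that includes them.
    v = gram @ (beta_with - beta_without)
    # The intercept entry of x is 1, so v[0] equals the residual itself.
    x = v / v[0]
    # Genotypes are allele counts, so round to the nearest integer.
    return np.rint(x[1:]).astype(int)


def demo(seed=0, n_shared=200, n_snps=8):
    """Simulate a toy private database and run the reconstruction."""
    rng = np.random.default_rng(seed)
    # Genotypes coded as minor-allele counts 0, 1, or 2; last row is the target.
    snps = rng.integers(0, 3, size=(n_shared + 1, n_snps))
    X = np.column_stack([np.ones(n_shared + 1), snps]).astype(float)
    true_beta = rng.normal(size=n_snps + 1)
    y = X @ true_beta + rng.normal(scale=0.5, size=n_shared + 1)

    # Model 1 is trained without the target, model 2 with the target.
    X_shared, y_shared = X[:n_shared], y[:n_shared]
    beta_without = np.linalg.lstsq(X_shared, y_shared, rcond=None)[0]
    beta_with = np.linalg.lstsq(X, y, rcond=None)[0]

    gram = X_shared.T @ X_shared
    recovered = reconstruct_genotype(gram, beta_without, beta_with)
    return snps[n_shared], recovered


if __name__ == "__main__":
    target, recovered = demo()
    print("true genotype:     ", target)
    print("recovered genotype:", recovered)
```

In this idealized setting the recovery is exact up to floating-point error. An attacker who can only estimate the co-occurrence matrix would see correspondingly noisier results, which is consistent with the abstract's remark that the attack's accuracy depends on the quality of that estimate.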

Updated: 2021-05-22