Semi-Parallel logistic regression for GWAS on encrypted data.
BMC Medical Genomics ( IF 2.1 ) Pub Date : 2020-07-21 , DOI: 10.1186/s12920-020-0724-z
Miran Kim 1 , Yongsoo Song 2 , Baiyu Li 3 , Daniele Micciancio 3

The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. Privacy concerns, however, have become a major hurdle for data management and utilization. Homomorphic encryption is one of the most powerful cryptographic primitives that can address these privacy and security issues. It supports computation on encrypted data, so that we can aggregate data and perform arbitrary computations in an untrusted cloud environment without leaking sensitive information. This paper presents a secure outsourcing solution to assess logistic regression models for quantitative traits to test their associations with genotypes. We adapt the semi-parallel training method by Sikorska et al., which builds a logistic regression model for covariates, followed by one-step parallelizable regressions on all individual single nucleotide polymorphisms (SNPs). In addition, we modify our underlying approximate homomorphic encryption scheme to improve performance. We evaluated the performance of our solution through experiments on a real-world dataset. It achieves the best performance among homomorphic encryption systems for GWAS analysis in terms of both complexity and accuracy. For example, given a dataset consisting of 245 samples, each of which has 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform logistic-regression-based genome-wide association analysis on encrypted data. We demonstrate the feasibility and scalability of our solution.
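
To make the semi-parallel idea concrete, the following is a minimal plaintext sketch in NumPy, not the authors' encrypted implementation. It assumes a binary phenotype `y`, a covariate matrix `X` that already contains an intercept column, and a genotype matrix `S` with one column per SNP. The function names (`fit_covariates`, `semi_parallel_snp_tests`) are hypothetical, and the one-step update shown is the standard single Newton step for each SNP starting from the covariate-only fit, which is our reading of the method rather than something the abstract spells out. Homomorphic encryption and the approximations it requires are omitted entirely.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_covariates(X, y, iters=25):
    """Fit logistic regression on the covariates only, via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                    # diagonal of the weight matrix W
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])        # X^T W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def semi_parallel_snp_tests(X, S, y):
    """One Newton step per SNP, vectorized over all SNP columns of S.

    Returns the one-step effect estimate and the z-score for each SNP.
    """
    beta = fit_covariates(X, y)              # step 1: shared covariate model
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)
    XtW = X.T * w                            # X^T W, shape (k, n)
    hess = XtW @ X                           # X^T W X
    # Step 2: remove the (weighted) covariate component from every SNP at once.
    S_star = S - X @ np.linalg.solve(hess, XtW @ S)
    num = S_star.T @ (y - p)                 # score for each SNP
    den = np.einsum("ij,ij->j", S_star * w[:, None], S_star)  # s*^T W s*
    b = num / den                            # one-step effect estimate
    z = num / np.sqrt(den)                   # b divided by its standard error
    return b, z
```

The covariate model is fitted once, and every SNP then reuses the same fitted probabilities and weights, so the per-SNP work reduces to a few matrix products that can run in parallel across all SNPs.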

Updated: 2020-07-21