Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies
arXiv - CS - Cryptography and Security. Pub Date: 2021-01-21, DOI: arxiv-2101.08879
Anisa Halimi, Leonard Dervishi, Erman Ayday, Apostolos Pyrgelis, Juan Ramon Troncoso-Pastoriza, Jean-Pierre Hubaux, Xiaoqian Jiang, Jaideep Vaidya

Providing provenance in scientific workflows is essential for reproducibility and auditability. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the results of statistical tests conducted by a researcher while protecting the privacy of the individuals in the researcher's dataset. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and the results of another statistical study (using publicly available datasets) to distinguish between correct statistics and incorrect ones. As a use case, we apply the proposed framework to genome-wide association studies (GWAS), in which the goal is to identify point mutations (variants) that are highly associated with a given phenotype. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of only a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata in the GWAS use case and show that the provided metadata does not increase the existing privacy risk that comes from sharing the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.
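
To make the idea of a locally differentially private noisy dataset more concrete, the following Python sketch perturbs genotype values with k-ary randomized response and then recovers an unbiased estimate of the minor allele frequency from the noisy reports. This is an illustration under stated assumptions, not the mechanism used in the paper: the abstract does not say which perturbation the authors apply, and the function names, the genotype encoding (0, 1, 2 minor-allele copies), and the privacy parameter value are all hypothetical.

```python
import math
import random

# Assumed encoding: number of minor-allele copies at a single variant.
GENOTYPES = (0, 1, 2)


def randomize_genotype(value, epsilon, rng):
    """k-ary randomized response: report the true genotype with
    probability e^eps / (e^eps + k - 1), otherwise report one of the
    other genotypes uniformly at random. This satisfies eps-local
    differential privacy for a single reported value."""
    k = len(GENOTYPES)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([g for g in GENOTYPES if g != value])


def estimate_frequencies(noisy_values, epsilon):
    """Unbiased estimate of the true genotype frequencies.
    The observed frequency of g is q + (p - q) * f_g, so we invert it."""
    k = len(GENOTYPES)
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true value
    q = 1.0 / (e + k - 1)    # probability of reporting a specific other value
    n = len(noisy_values)
    return {g: (noisy_values.count(g) / n - q) / (p - q) for g in GENOTYPES}


def minor_allele_frequency(freqs):
    """Minor allele frequency from genotype frequencies (two alleles per person)."""
    return (freqs[1] + 2 * freqs[2]) / 2


# Hypothetical usage: a researcher perturbs genotypes before sharing them,
# and a verifier estimates the allele frequency from the noisy copy.
rng = random.Random(0)
true_genotypes = [0] * 12000 + [1] * 6000 + [2] * 2000
noisy = [randomize_genotype(g, 2.0, rng) for g in true_genotypes]
estimated_maf = minor_allele_frequency(estimate_frequencies(noisy, 2.0))
```

A verifier could compare an estimate of this kind with the statistic the researcher reports and flag large discrepancies. How the paper actually combines the noisy data with results from public datasets is not specified in the abstract, so that step is left out here.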

Updated: 2021-01-25