Decentralized genomics audit logging via permissioned blockchain ledgering
BMC Medical Genomics (IF 2.1), Pub Date: 2020-07-21, DOI: 10.1186/s12920-020-0720-3
Nicholas D. Pattengale, Corey M. Hudson

One of the tasks in the iDASH Secure Genome Analysis Competition in 2018 was to develop blockchain-based immutable logging and querying for a cross-site genomic dataset access audit trail. The specific challenge was to design a time- and space-efficient structure and mechanism for storing and retrieving genomic data access logs, based on MultiChain version 1.0.4 (https://www.multichain.com/).

Our technique uses the MultiChain stream application programming interface (API), which allows MultiChain to be treated as a key-value store, and employs a two-level index that naturally supports efficient queries on single-clause constraints. The scheme also supports queries containing conjunctions of clause constraints, answered with heuristic and binary search techniques, as well as timestamp range queries. Notably, the complexity of all our techniques is independent of the size of the inserted data set, except for timestamp range queries, whose cost scales logarithmically with input size. We implemented our insertion and querying techniques in Python, using the MultiChain library Savoir (https://github.com/dxmarkets/savoir), and comprehensively tested our implementation on a benchmark of datasets of varying sizes. We also tested a port of our challenge submission to a newer version of MultiChain (2.0 beta), which natively supports multiple indices.

We presented creative and efficient techniques for storing and querying log file data in MultiChain 1.0.4 and 2.0 beta. We demonstrated that it is feasible to use a permissioned blockchain ledger for genomic query log data when the data volume is on the order of hundreds of megabytes and query times of dozens of minutes are acceptable. We also demonstrated that the evolution of the ledger platform from MultiChain 1 to 2 yielded a 30%-40% increase in insertion efficiency. All source code for this challenge is available at https://github.com/sandialabs/idash2018task1/ under a BSD-3 license.
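The idea of a two-level index over a key-value stream can be illustrated with a minimal Python sketch. It is not the authors' code (that is in the repository linked above) and makes several assumptions: `InMemoryStream` and `TwoLevelLogIndex` are hypothetical names, the stream is an in-memory stand-in for a MultiChain stream (a real deployment would publish and list items through the MultiChain API, for example via Savoir), and the key layout (`rec:<id>` for full records, `field=value` for references), the field names, and the in-memory sorted timestamp list are all choices made for illustration.

```python
import bisect
import json
from collections import defaultdict


class InMemoryStream:
    """Hypothetical stand-in for a MultiChain stream: an append-only store
    of (key, data) items that can be listed by key. Not the MultiChain API."""

    def __init__(self):
        self._items = defaultdict(list)

    def publish(self, key, data):
        # Items are only ever appended, mirroring an append-only ledger.
        self._items[key].append(data)

    def items_for_key(self, key):
        return list(self._items.get(key, []))


class TwoLevelLogIndex:
    """Illustrative two-level index (assumed layout, not the authors' code).

    Level 1: "field=value" keys hold the ids of matching records.
    Level 2: "rec:<id>" keys hold the full JSON-encoded log record.
    """

    def __init__(self, stream):
        self.stream = stream
        # Timestamps kept sorted in memory, with record ids in the same order.
        self._times = []
        self._time_ids = []

    def insert(self, record_id, record):
        # Level 2: store the full record once, under its own id.
        self.stream.publish("rec:" + record_id, json.dumps(record))
        # Level 1: one small reference per indexed (non-timestamp) field.
        for field, value in record.items():
            if field != "timestamp":
                self.stream.publish(f"{field}={value}", record_id)
        # Keep the timestamp list sorted for range queries.
        pos = bisect.bisect_right(self._times, record["timestamp"])
        self._times.insert(pos, record["timestamp"])
        self._time_ids.insert(pos, record_id)

    def fetch(self, record_id):
        # Follow a level-1 reference to the full record.
        return json.loads(self.stream.items_for_key("rec:" + record_id)[0])

    def ids_where(self, field, value):
        # Single-clause query: one key lookup; its cost depends on the number
        # of matches, not on the total number of records inserted.
        return self.stream.items_for_key(f"{field}={value}")

    def ids_in_time_range(self, start, end):
        # Inclusive range query: two binary searches, O(log n) plus output size.
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._time_ids[lo:hi]


if __name__ == "__main__":
    idx = TwoLevelLogIndex(InMemoryStream())
    idx.insert("r1", {"timestamp": 100, "user": "alice", "activity": "view"})
    idx.insert("r2", {"timestamp": 200, "user": "bob", "activity": "view"})
    idx.insert("r3", {"timestamp": 300, "user": "alice", "activity": "download"})
    print(idx.ids_where("user", "alice"))   # ['r1', 'r3']
    print(idx.ids_in_time_range(150, 300))  # ['r2', 'r3']
```

In this layout a single-clause query is a single key lookup, while a timestamp range query needs two binary searches, which mirrors the logarithmic scaling described in the abstract. For brevity the sketch keeps timestamps in a Python list, so each insertion shifts elements in O(n) time; only the query side is meant to be illustrative here.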

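Conjunctive queries can be sketched in the same spirit. The function below is a hypothetical illustration of one heuristic plus binary-search strategy, not necessarily the one used in the challenge submission: it fetches the id list for each single-clause constraint, starts from the shortest list (the most selective clause), and keeps a candidate only if a binary search finds it in every other list. `conjunctive_query`, `contains_sorted`, and the `lookup` callback (standing in for a single-clause index query) are assumed names.

```python
import bisect
from typing import Callable, Dict, List


def contains_sorted(sorted_ids: List[str], target: str) -> bool:
    # Binary search membership test: O(log m) for a list of length m.
    i = bisect.bisect_left(sorted_ids, target)
    return i < len(sorted_ids) and sorted_ids[i] == target


def conjunctive_query(clauses: Dict[str, str],
                      lookup: Callable[[str, str], List[str]]) -> List[str]:
    """Hypothetical helper: ids of records matching every field=value clause."""
    if not clauses:
        # An empty conjunction is treated as matching nothing in this sketch.
        return []
    # One single-clause lookup per constraint, deduplicated and sorted.
    postings = [sorted(set(lookup(field, value)))
                for field, value in clauses.items()]
    # Heuristic: drive the scan from the most selective (shortest) list.
    postings.sort(key=len)
    smallest, rest = postings[0], postings[1:]
    # Keep a candidate only if binary search finds it in every other list.
    return [rid for rid in smallest
            if all(contains_sorted(other, rid) for other in rest)]


if __name__ == "__main__":
    index = {
        ("user", "alice"): ["r1", "r3", "r7"],
        ("activity", "view"): ["r1", "r2", "r5", "r7", "r9"],
    }
    lookup = lambda f, v: index.get((f, v), [])
    print(conjunctive_query({"user": "alice", "activity": "view"}, lookup))  # ['r1', 'r7']
```

Starting from the shortest list bounds the number of binary searches by the size of the most selective clause, which is why this ordering is a common heuristic for intersecting several index lists.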
Updated: 2020-07-21