Protein Construction-Based Data Partitioning Scheme for Alignment of Protein Macromolecular Structures Through Distributed Querying in Federated Databases. IEEE Transactions on NanoBioscience



IEEE Transactions on NanoBioscience (IF 3.9), Pub Date: 2019-07-22, DOI: 10.1109/tnb.2019.2930494
Dariusz Mrozek, Jacek Kwiendacz, Bozena Malysiak-Mrozek

Exploring the characteristics of 3D protein structures by querying relational databases that store them can be challenging, because the data must conform to a particular database schema. However, this approach also brings several advantages: extensive database searches can be expressed in the declarative SQL language, regular backup mechanisms protect the data against hardware failure, and the data can be secured against unauthorized access. Relational databases do not provide exploration methods specific to protein data and its biological semantics, such as searches based on protein structural patterns. Their use in this domain is therefore still rare and requires the development of dedicated methods that speed up data exploration. In this paper, we show a novel data partitioning scheme for distributing data across database clusters that can be used for performing sophisticated explorations of 3D protein structures. The data partitioning scheme relies on protein construction, which requires data preprocessing but results in shorter exploration times when querying federated databases. We solve the problem of finding proteins in an Oracle relational database on the basis of the similarity of their 3D structures, using distributed PAR-P3D-SQL queries. Since 3D protein structure similarity searching is one of the most time-consuming exploration processes that can be performed on protein data, we make use of a distributed environment of Oracle federated databases, distributed query processing, and dedicated load balancing methods to accelerate the exploration. Test results confirm that the exploration process speeds up significantly, in proportion to the number of database nodes in the federated environment.
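The abstract does not spell out the partitioning rule or the load balancing method, so the following is only a minimal sketch of the general idea of spreading protein records over several database nodes and merging per-node results. It assumes a per-protein cost estimate (for example, a residue count) and swaps in a generic greedy balancing heuristic; it is not the paper's scheme. The names `partition_by_cost`, `scatter_gather`, and `query_node`, and the protein identifiers in the usage example, are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition_by_cost(costs, n_nodes):
    """Assign each protein to the currently least-loaded node, largest first.

    `costs` maps a protein identifier to an assumed cost estimate
    (e.g. residue count). This is a generic greedy heuristic, not the
    construction-based scheme described in the paper.
    """
    heap = [(0, node) for node in range(n_nodes)]  # (current load, node index)
    heapq.heapify(heap)
    partitions = [[] for _ in range(n_nodes)]
    # Sort by descending cost; ties are broken by identifier for determinism.
    for protein_id, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        partitions[node].append(protein_id)
        heapq.heappush(heap, (load + cost, node))
    return partitions


def scatter_gather(partitions, query_node, top_k):
    """Run `query_node` on every partition in parallel and merge the hits.

    `query_node` is a stand-in for a per-node similarity query; it takes a
    list of protein identifiers and returns (protein_id, score) pairs.
    """
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_results = list(pool.map(query_node, partitions))
    merged = [hit for part in partial_results for hit in part]
    # Higher score means more similar in this sketch.
    return sorted(merged, key=lambda hit: -hit[1])[:top_k]


if __name__ == "__main__":
    costs = {"P1": 300, "P2": 120, "P3": 280, "P4": 100}  # made-up values
    parts = partition_by_cost(costs, n_nodes=2)
    scores = {"P1": 0.2, "P2": 0.9, "P3": 0.5, "P4": 0.7}  # made-up scores
    print(parts)
    print(scatter_gather(parts, lambda ids: [(i, scores[i]) for i in ids], top_k=2))
```

In a real deployment each call to `query_node` would issue a query against one database node; here it is a plain Python callable so the sketch stays self-contained.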


Updated: 2019-11-01
