High performance computing approach for DNA motif discovery
CSI Transactions on ICT, Pub Date: 2019-08-17, DOI: 10.1007/s40012-019-00235-w
Deepti D. Shrimankar

Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is identifying regulatory elements, especially the binding sites for transcription factors in deoxyribonucleic acid (DNA). These binding sites are short DNA segments called motifs: short, recurring patterns in DNA sequences that are presumed to have a biological function. Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic sequences became available. Recent advances in genome sequencing and in high-throughput gene expression analysis have enabled the development of computational methods for motif discovery, and as a result a large number of motif-finding algorithms have been implemented and applied to various motif models over the past decade. Because regulatory elements are frequently short and variable, identifying them computationally is difficult. Nevertheless, significant advances have been made in computational methods for modeling and detecting DNA regulatory elements. Detecting regulatory elements across a large set of regulatory regions remains a challenging problem in computational genomics, and the computational methods that extract this biologically meaningful information have high computational requirements. High performance computing offers a promising way to meet these requirements. Designing a parallel algorithm that detects regulatory elements using correlation with gene expression data, and implementing it with Open MPI and OpenMP, will lead to significant runtime savings on a distributed system. Solving computationally intensive problems on high performance computing architectures can significantly reduce run time when proper task distribution, a suitable scheduling strategy, and appropriate parallel computing paradigms are used.
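As a hedged illustration of what "short, recurring patterns" means computationally, here is a minimal exhaustive-enumeration sketch in Python. It is not the method from the paper, which correlates candidate elements with gene expression data; the function names `kmers` and `candidate_motifs`, the exact-match k-mer model, and the `quorum` threshold (minimum number of sequences that must contain a pattern) are assumptions made for illustration only.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """Return the distinct length-k substrings (k-mers) of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_motifs(sequences: list, k: int, quorum: int) -> dict:
    """Hypothetical exact-match motif filter.

    For every k-mer, count how many sequences contain it at least once
    (its "support"), and keep only those found in >= quorum sequences.
    """
    support = Counter()
    for seq in sequences:
        # A set, so each k-mer counts at most once per sequence.
        support.update(kmers(seq.upper(), k))
    return {motif: count for motif, count in support.items() if count >= quorum}


if __name__ == "__main__":
    # Toy promoter-like sequences sharing the pattern ACGTC.
    seqs = ["TTGACGTCAA", "ACGTCATTTT", "GGACGTCGGG"]
    print(candidate_motifs(seqs, k=5, quorum=3))  # {'ACGTC': 3}
```

Real motif finders go further: they tolerate mismatches within a site and score statistical over-representation against a background model, which is where the computational cost the abstract describes comes from.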
Deploying more cluster computers can bridge the speed gap between architectures and will reduce the number of concurrent jobs that can be allocated to the system.
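The task-distribution idea in the abstract can be sketched on a single machine. The Python below is a hypothetical stand-in, not the paper's code: it uses the standard-library `multiprocessing.Pool` in place of Open MPI processes or OpenMP threads, and the helpers `split_blocks`, `count_support`, and `parallel_support` are illustrative names. It shows a static block decomposition (each worker receives a contiguous slice of sequences, much as an MPI rank might) followed by a sum reduction of the partial k-mer counts.

```python
from collections import Counter
from functools import partial
from multiprocessing import Pool


def count_support(chunk: list, k: int) -> Counter:
    """Per-worker step: for each k-mer, count the sequences in this chunk containing it."""
    support = Counter()
    for seq in chunk:
        # Set of distinct k-mers, so each sequence contributes at most 1 per k-mer.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def split_blocks(items: list, n: int) -> list:
    """Static block distribution: n contiguous chunks whose sizes differ by at most 1."""
    q, r = divmod(len(items), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def parallel_support(sequences: list, k: int, workers: int = 4) -> Counter:
    """Scatter chunks to worker processes, then reduce the partial counts by summing."""
    blocks = split_blocks(sequences, workers)
    with Pool(processes=workers) as pool:
        partials = pool.map(partial(count_support, k=k), blocks)
    total = Counter()
    for part in partials:
        total.update(part)  # reduction step, analogous to a sum-reduce across ranks
    return total


if __name__ == "__main__":
    seqs = ["TTGACGTCAA", "ACGTCATTTT", "GGACGTCGGG"]
    support = parallel_support(seqs, k=5, workers=2)
    print(support["ACGTC"])  # 3
```

Because each sequence lands in exactly one block and contributes each k-mer at most once, summing the partial counters yields the same support counts as a serial pass. When sequence lengths vary widely, a dynamic schedule (for example `pool.imap_unordered` over smaller chunks) would be the natural alternative to this static split.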

Updated: 2019-08-17