GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering.
GigaScience ( IF 9.2 ) Pub Date : 2019-12-01 , DOI: 10.1093/gigascience/giz150
Milad Miladi 1 , Eteri Sokhoyan 1 , Torsten Houwaart 2 , Steffen Heyne 3 , Fabrizio Costa 4 , Björn Grüning 1, 5 , Rolf Backofen 1, 5, 6

BACKGROUND: RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available.

RESULTS: Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs.

CONCLUSIONS: By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery in the presence of biological and methodological biases. Finally, we uncover prominent targets of the double-stranded RNA-binding protein Roquin-1, such as BCOR's 3' untranslated region, which contains multiple binding stem-loops that are evolutionarily conserved.

Updated: 2019-12-06