Identification of novel RNA design candidates by clustering the extended RNA-As-Graphs library.
Biochimica et Biophysica Acta (BBA) - General Subjects (IF 2.8), Pub Date: 2020-01-16, DOI: 10.1016/j.bbagen.2020.129534
Swati Jain, Qiyao Zhu, Amiel S P Paz, Tamar Schlick

BACKGROUND: We re-evaluate our RNA-As-Graphs clustering approach, using our expanded graph library and new RNA structures, to identify potential RNA-like topologies for design. Our coarse-grained approach represents RNA secondary structures as tree and dual graphs, with vertices and edges corresponding to RNA helices and loops. The graph-theoretical framework facilitates graph enumeration, partitioning, and clustering approaches to study RNA structure and its applications.

METHODS: Clustering graph topologies based on features derived from graph Laplacian matrices and known RNA structures allows us to classify topologies as 'existing' or hypothetical, and the latter as 'RNA-like' or 'non-RNA-like'. Here we update our list of existing tree graph topologies and the RAG-3D database of atomic fragments to include newly determined RNA structures. We then use linear and quadratic regression, optionally with dimensionality reduction, to derive graph features, and apply several clustering algorithms to our tree-graph library and recently expanded dual-graph library to classify the topologies into the three groups.

RESULTS: The unsupervised PAM and K-means clustering approaches correctly classify 72-77% of all existing graph topologies and 75-82% of newly added ones as RNA-like. For supervised k-NN clustering, the cross-validation accuracy ranges from 57% to 81%.

CONCLUSIONS: Using linear regression with unsupervised clustering, or quadratic regression with supervised clustering, provides better accuracies than supervised/linear clustering. All accuracies are better than random, especially for newly added existing topologies, thus lending credibility to our approach.

GENERAL SIGNIFICANCE: Our updated RAG-3D database and motif classification by clustering present new RNA substructures and RNA-like motifs as novel design candidates.
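
The abstract does not spell out how features are derived from the graph Laplacian, so the following is only a minimal sketch under stated assumptions. It builds the Laplacian L = D - A of a small tree graph, computes its eigenvalues, and fits a polynomial to the eigenvalues against their index, taking the fit coefficients as the feature vector. The two 4-vertex example trees, the function names, and the choice to regress eigenvalue on index are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def laplacian(n_vertices, edges):
    """Return the graph Laplacian L = D - A for an undirected edge list."""
    adjacency = np.zeros((n_vertices, n_vertices))
    for u, v in edges:
        adjacency[u, v] += 1
        adjacency[v, u] += 1
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def laplacian_spectrum(n_vertices, edges):
    """Eigenvalues of the (symmetric) Laplacian, in ascending order."""
    return np.linalg.eigvalsh(laplacian(n_vertices, edges))


def regression_features(n_vertices, edges, degree=1):
    """Assumed feature definition: coefficients of a polynomial fit of
    eigenvalue versus eigenvalue index (degree=1 linear, degree=2 quadratic).
    Coefficients are returned highest power first."""
    eigenvalues = laplacian_spectrum(n_vertices, edges)
    index = np.arange(1, len(eigenvalues) + 1)
    return np.polyfit(index, eigenvalues, degree)


# Hypothetical examples: the two distinct tree topologies on 4 vertices.
PATH_EDGES = [(0, 1), (1, 2), (2, 3)]   # linear chain
STAR_EDGES = [(0, 1), (0, 2), (0, 3)]   # one central vertex

if __name__ == "__main__":
    for name, edges in [("path", PATH_EDGES), ("star", STAR_EDGES)]:
        print(name, laplacian_spectrum(4, edges), regression_features(4, edges))
```

Feature vectors of this kind, one per topology, could then be passed to any standard clustering routine (for example K-means or PAM) or to a k-NN classifier trained on the topologies already labeled as existing.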

Updated: 2020-01-17