Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters
Molecular Ecology Resources ( IF 7.7 ) Pub Date : 2021-06-28 , DOI: 10.1111/1755-0998.13455
Filip Wierzbicki 1, 2 , Florian Schwarz 1, 2 , Odontsetseg Cannalonga 1 , Robert Kofler 1

In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble, as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics do not take TEs into account. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and the internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effects of assembly algorithm, polishing, read length, coverage and residual polymorphisms, and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
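
As an illustration of metric (a), the fraction of completely assembled piRNA clusters could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the abstract does not state how "completely assembled" is decided, so the criterion used here (both flanking anchors of a cluster map to the same contig) is an assumption, and the names `ClusterFlanks`, `is_completely_assembled` and `fraction_completely_assembled`, as well as the example data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClusterFlanks:
    """Contigs hit by the two flanking anchors of one annotated piRNA cluster.

    Hypothetical record type; None means the flank could not be placed
    in the assembly.
    """
    cluster_id: str
    left_contig: Optional[str]
    right_contig: Optional[str]


def is_completely_assembled(cluster: ClusterFlanks) -> bool:
    # ASSUMED criterion (not taken from the abstract): both flanks are
    # placed, and they are placed on the same contig, so the cluster
    # between them is spanned by one contiguous sequence.
    return (
        cluster.left_contig is not None
        and cluster.left_contig == cluster.right_contig
    )


def fraction_completely_assembled(clusters: Iterable[ClusterFlanks]) -> float:
    """Share of annotated clusters that meet the assumed criterion."""
    clusters = list(clusters)
    if not clusters:
        raise ValueError("at least one annotated cluster is required")
    complete = sum(is_completely_assembled(c) for c in clusters)
    return complete / len(clusters)


# Made-up example data, for illustration only.
example = [
    ClusterFlanks("cluster_A", "contig_1", "contig_1"),  # spanned
    ClusterFlanks("cluster_B", "contig_2", "contig_7"),  # split over contigs
    ClusterFlanks("cluster_C", "contig_3", None),        # one flank missing
    ClusterFlanks("cluster_D", "contig_4", "contig_4"),  # spanned
]
fraction = fraction_completely_assembled(example)
print(f"{fraction:.2f}")
```

In this toy input two of the four clusters satisfy the assumed criterion. A real pipeline would derive the flank placements from alignments of cluster-flanking sequences to the assembly, which is outside the scope of this sketch.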

Updated: 2021-06-28