Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
Genome Biology (IF 12.3), Pub Date: 2019-12-01, DOI: 10.1186/s13059-019-1905-y
Shujun Ou 1, Weija Su 2, Yi Liao 3, Kapeel Chougule 4, Jireh R A Agda 5, Adam J Hellinga 5, Carlos Santiago Blanco Lugo 5, Tyler A Elliott 5, Doreen Ware 4,6, Thomas Peterson 2, Ning Jiang 7, Candice N Hirsch 8, Matthew B Hufford 1

Background
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotating each class of TEs, but their relative performance has not been systematically compared. Moreover, species lacking a curated TE library need a comprehensive pipeline that produces a non-redundant TE library and, from it, whole-genome TE annotations.

Results
We benchmark existing programs against a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted-repeat transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, false discovery rate (FDR), and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered, non-redundant TE library for annotating both structurally intact and fragmented elements. EDTA also deconvolutes the nested TE insertions frequently found in highly repetitive genomic regions. Tests on other model species with curated TE libraries (maize and Drosophila) show that EDTA is robust across both plant and animal species.

Conclusions
The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
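The six performance metrics named in the abstract all derive from the four confusion-matrix counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below uses the standard textbook definitions; the unit being counted (assumed here to be base pairs compared against the curated reference annotation) and the names `Confusion` and `metrics` are illustrative assumptions, not EDTA's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts. The unit (e.g. base pairs) is an assumption."""
    tp: int  # called TE by the tool and by the curated reference
    fp: int  # called TE by the tool only
    tn: int  # called TE by neither
    fn: int  # called TE by the reference only


def _ratio(num: int, den: int) -> float:
    # Return 0.0 rather than raising ZeroDivisionError on an empty denominator.
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    """Standard definitions of the six benchmark metrics (hypothetical helper)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),      # recall, true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),      # true negative rate
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fdr": _ratio(c.fp, c.tp + c.fp),              # equals 1 - precision when tp + fp > 0
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),  # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the paper.
    example = Confusion(tp=80, fp=20, tn=890, fn=10)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Writing F1 as 2TP / (2TP + FP + FN) is algebraically the same as 2PR / (P + R) for precision P and sensitivity R, and it avoids computing the intermediate ratios twice.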

Updated: 2019-12-01