A benchmark of transposon insertion detection tools using real data
Mobile DNA (IF 4.9), Pub Date: 2019-12-30, DOI: 10.1186/s13100-019-0197-9
Pol Vendrell-Mir 1, Fabio Barteri 1, Miriam Merenciano 2, Josefa González 2, Josep M Casacuberta 1, Raúl Castanera 1

Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms, as they are easier to detect reliably using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage improves sensitivity, but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TE. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different sizes and complexity. We discuss the possible choice of tools depending on the goals of the study and show that an appropriate combination of tools could be an option for most approaches, increasing sensitivity while maintaining good precision.
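
To make the two benchmark metrics concrete, here is a minimal sketch of how sensitivity and precision could be computed for one tool's calls against a set of validated insertions, and for the union of two tools' calls. It is not the authors' pipeline: the `Insertion` record, the `benchmark` function, the 5 bp matching window, and the toy coordinates are all assumptions made for illustration, since the abstract does not state the matching criteria.

```python
from typing import Iterable, NamedTuple, Tuple


class Insertion(NamedTuple):
    """Hypothetical record for one TE insertion call."""
    chrom: str
    pos: int
    family: str


def matches(pred: Insertion, truth: Insertion, window: int) -> bool:
    """A call matches a validated insertion if it is on the same chromosome,
    from the same TE family, and within `window` bp (assumed criterion)."""
    return (
        pred.chrom == truth.chrom
        and pred.family == truth.family
        and abs(pred.pos - truth.pos) <= window
    )


def benchmark(
    predicted: Iterable[Insertion],
    validated: Iterable[Insertion],
    window: int = 5,
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for a set of calls.

    sensitivity = validated insertions that were detected / all validated
    precision   = calls that hit a validated insertion / all calls
    """
    predicted = list(predicted)
    validated = list(validated)
    found = sum(any(matches(p, t, window) for p in predicted) for t in validated)
    correct = sum(any(matches(p, t, window) for t in validated) for p in predicted)
    sensitivity = found / len(validated) if validated else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy data, invented for illustration only (not from the paper).
validated = [
    Insertion("Chr1", 1000, "MITE"),
    Insertion("Chr1", 5000, "LTR"),
    Insertion("Chr2", 300, "MITE"),
    Insertion("Chr3", 9000, "LTR"),
]
tool_a = [Insertion("Chr1", 1002, "MITE"), Insertion("Chr1", 5000, "LTR")]
tool_b = [
    Insertion("Chr1", 5000, "LTR"),
    Insertion("Chr2", 298, "MITE"),
    Insertion("Chr2", 7000, "MITE"),  # false positive
]

# Combining tools: take the union of their calls (exact duplicates collapse).
combined = set(tool_a) | set(tool_b)

print("tool A  :", benchmark(tool_a, validated))
print("tool B  :", benchmark(tool_b, validated))
print("combined:", benchmark(combined, validated))
```

In this toy case the union recovers more validated insertions than either tool alone, while the single false positive from the second tool carries over into the combined set, which is the kind of trade-off the abstract describes.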

Updated: 2019-12-30