RepetDB: a unified resource for transposable element references.
Mobile DNA ( IF 4.7 ) Pub Date : 2019-01-29 , DOI: 10.1186/s13100-019-0150-y
Joëlle Amselem 1 , Guillaume Cornut 1 , Nathalie Choisne 1 , Michael Alaux 1 , Françoise Alfama-Depauw 1 , Véronique Jamilloux 1 , Florian Maumus 1 , Thomas Letellier 1 , Isabelle Luyten 1 , Cyril Pommier 1 , Anne-Françoise Adam-Blondon 1 , Hadi Quesneville 1

BACKGROUND Thanks to their ability to move around and replicate within genomes, transposable elements (TEs) are perhaps the most important contributors to genome plasticity and evolution. Their detection and annotation are considered essential in any genome sequencing project. The number of fully sequenced genomes is rapidly increasing with improvements in high-throughput sequencing technologies. A fully automated de novo annotation process for TEs is therefore required to cope with the deluge of sequence data. However, all automated procedures are error-prone, and an automated procedure for TE identification and classification would be no exception. It is therefore crucial to provide not only the TE reference sequences, but also evidence justifying their classification, at the scale of the whole genome. A few TE databases already exist, but none provides evidence to justify TE classification. Moreover, biological information about the sequences remains globally poor.

RESULTS We present here the RepetDB database developed in the framework of GnpIS, a genetic and genomic information system. RepetDB is designed to store and retrieve detected, classified and annotated TEs in a standardized manner. RepetDB is an implementation with extensions of InterMine, an open-source data warehouse framework used here to store, search, browse, analyze and compare all the data recorded for each TE reference sequence. InterMine can display diverse information for each sequence and allows simple to very complex queries. Finally, TE data are displayed via a worldwide data discovery portal. RepetDB is accessible at urgi.versailles.inra.fr/repetdb.

CONCLUSIONS RepetDB is designed to be a TE knowledge base populated with full de novo TE annotations of complete (or near-complete) genome sequences. Indeed, the description and classification of TEs facilitates the exploration of specific TE families, superfamilies or orders across a large range of species. It also makes possible cross-species searches and comparisons of TE family content between genomes.

Updated: 2019-11-01