DBTree: Very large phylogenies in portable databases, Methods in Ecology and Evolution

Methods in Ecology and Evolution (IF 6.6), Pub Date: 2020-01-26, DOI: 10.1111/2041-210x.13337
Rutger A. Vos 1,2


Growing numbers of large phylogenetic syntheses are being published: sometimes as part of a hypothesis-testing framework, sometimes to present novel methods of phylogenetic inference, and sometimes as a snapshot of the diversity within a database. The methods commonly used to reuse these trees in scripting environments have their limitations.
I present a toolkit that transforms data presented in the most commonly used format for such trees into a database schema that facilitates quick topological queries. Specifically, the need for recursive traversal commonly presented by schemata based on adjacency lists is largely obviated. This is accomplished by computing pre‐ and post‐order indexes and node heights on the topology as it is being ingested.
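The indexing idea can be illustrated with a minimal Python sketch. It is not the toolkit's own code: the toy tree, the function names, and the `node` table with its column names are assumptions made for illustration, and "height" is assumed here to mean cumulative branch length from the root. A single depth-first pass assigns each node a pre-order (`left`) and post-order (`right`) index, after which all descendants of a node can be selected with a range comparison instead of a recursive walk over parent links.

```python
import sqlite3

# Hypothetical toy tree, each node written as (name, branch_length, children).
TREE = ("root", 0.0, [
    ("AB", 1.0, [("A", 1.0, []), ("B", 2.0, [])]),
    ("C", 3.0, []),
])


def index_tree(tree):
    """Return rows (id, parent, left, right, name, length, height).

    'left' is assigned when a node is first visited (pre-order) and
    'right' when the traversal leaves it (post-order). 'height' is
    assumed to be the summed branch length from the root.
    """
    rows = []
    counter = 0   # shared counter for left/right indexes
    next_id = 0   # primary key generator

    def visit(node, parent_id, parent_height):
        nonlocal counter, next_id
        name, length, children = node
        next_id += 1
        node_id = next_id
        counter += 1
        left = counter
        height = parent_height + length
        for child in children:
            visit(child, node_id, height)
        counter += 1
        right = counter
        rows.append((node_id, parent_id, left, right, name, length, height))

    visit(tree, None, 0.0)
    return rows


def build_db(rows):
    """Load the indexed rows into an in-memory SQLite table (assumed schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, '
        '"left" INTEGER, "right" INTEGER, name TEXT, length REAL, height REAL)'
    )
    con.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return con


def descendants(con, name):
    """All descendants of the named node, found without recursion."""
    sql = (
        'SELECT d.name FROM node AS a JOIN node AS d '
        'ON d."left" > a."left" AND d."right" < a."right" '
        'WHERE a.name = ? ORDER BY d."left"'
    )
    return [row[0] for row in con.execute(sql, (name,))]


if __name__ == "__main__":
    con = build_db(index_tree(TREE))
    print(descendants(con, "AB"))
```

The traversal in this sketch is itself recursive, which is fine for a toy tree but would hit Python's recursion limit on very deep topologies; an explicit stack would be needed there. The point is only that recursion happens once, at ingestion, and not at query time.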
The resulting toolkit provides several command line tools to do the transformation and to extract subtrees from the resulting database files. In addition, reusable library code with object–relational mappings for programmatic access is provided. To demonstrate the utility of the general approach I also provide database files for trees published by Open Tree of Life, Greengenes, D‐PLACE, PhyloTree, the NCBI taxonomy and a recent estimate of plant phylogeny.
The database files that the toolkit produces are highly portable (either as SQLite or tabular text) and can readily be queried, for example, in the R environment. Programming languages with mature frameworks for object‐relational mapping and phylogenetic tree analysis, such as Python, can use these facilities to make much larger phylogenies conveniently accessible to researcher programmers.
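As a rough illustration of the kind of topological query such a file could support, the following Python sketch finds the most recent common ancestor (MRCA) of two tips and their patristic distance using only the stored indexes. The table name `node`, its columns, the hard-coded toy rows, and the helper names are assumptions for this example and are not taken from the toolkit; "height" is again assumed to be distance from the root.

```python
import sqlite3

# Hypothetical pre-indexed toy tree:
# (id, parent, left, right, name, length, height)
ROWS = [
    (1, None, 1, 10, "root", 0.0, 0.0),
    (2, 1, 2, 7, "AB", 1.0, 1.0),
    (3, 2, 3, 4, "A", 1.0, 2.0),
    (4, 2, 5, 6, "B", 2.0, 3.0),
    (5, 1, 8, 9, "C", 3.0, 3.0),
]


def open_toy_db():
    """Create an in-memory database standing in for a real database file."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, '
        '"left" INTEGER, "right" INTEGER, name TEXT, length REAL, height REAL)'
    )
    con.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return con


def mrca(con, tip1, tip2):
    """Name of the most recent common ancestor of two named nodes.

    Every common ancestor spans both nodes' index ranges; the most
    recent one is the spanning node with the largest 'left' index.
    The comparison is inclusive, so a node is its own MRCA.
    """
    sql = (
        'SELECT a.name FROM node AS a, node AS t1, node AS t2 '
        'WHERE t1.name = ? AND t2.name = ? '
        'AND a."left" <= min(t1."left", t2."left") '
        'AND a."right" >= max(t1."right", t2."right") '
        'ORDER BY a."left" DESC LIMIT 1'
    )
    row = con.execute(sql, (tip1, tip2)).fetchone()
    return row[0] if row else None


def height(con, name):
    """Stored height (assumed: distance from the root) of a named node."""
    row = con.execute("SELECT height FROM node WHERE name = ?", (name,)).fetchone()
    return row[0]


def patristic_distance(con, tip1, tip2):
    """Path length between two nodes, derived from the stored heights."""
    ancestor = mrca(con, tip1, tip2)
    return height(con, tip1) + height(con, tip2) - 2 * height(con, ancestor)


if __name__ == "__main__":
    con = open_toy_db()
    print(mrca(con, "A", "B"), patristic_distance(con, "A", "B"))
```

Because the logic lives in plain SQL, the same statements should be usable from any SQLite client, although the column names would have to be adapted to whatever schema the actual database files use.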

Updated: 2020-01-26
