Data Integration in Bayesian Phylogenetics
Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2022-09-28, DOI: 10.1146/annurev-statistics-033021-112532
Gabriel W Hassler 1 , Andrew Magee 2 , Zhenyu Zhang 2 , Guy Baele 3 , Philippe Lemey 3 , Xiang Ji 4 , Mathieu Fourment 5 , Marc A Suchard 1, 2, 6

Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g., DNA), time, location (both continuous and discrete), and environmental covariates (e.g., social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances, as well as unsolved problems and areas of active research.

Updated: 2022-09-28