Automatic inference of demographic parameters using Generative Adversarial Networks
bioRxiv - Genomics Pub Date : 2021-02-09 , DOI: 10.1101/2020.08.05.237834
Zhanpeng Wang , Jiaping Wang , Michael Kourakos , Nhung Hoang , Hyong Hark Lee , Iain Mathieson , Sara Mathieson

Population genetics relies heavily on simulated data for validation, inference, and intuition. In particular, since the evolutionary "ground truth" for real data is always limited, simulated data is crucial for training supervised machine learning methods. Simulation software can accurately model evolutionary processes, but requires many hand-selected input parameters. As a result, simulated data often fails to mirror the properties of real genetic data, which limits the scope of methods that rely on it. Here, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method, pg-gan, is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project, and show that we can accurately recapitulate the features of real data.
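The adversarial idea in the abstract can be illustrated with a toy sketch. This is not the pg-gan code. Everything in it is an assumption made for illustration: the "simulator" draws from a Gaussian with a single parameter `theta`, the discriminator is a small logistic regression, and the generator is updated by accepting or rejecting random proposals, chosen here because a simulator is generally not differentiable. The names `simulate`, `train_discriminator`, `prob_real`, and `fit` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n):
    """Toy stand-in for a population genetic simulator: n draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)

def train_discriminator(real, fake, steps=200, lr=0.1):
    """Fit a logistic regression (label 1 = real, 0 = simulated) by gradient descent."""
    x = np.concatenate([real, fake])
    X = np.column_stack([np.ones(len(x)), x])  # intercept plus the raw value
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def prob_real(w, x):
    """Discriminator's probability that each value in x comes from the real data."""
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x)))

def fit(real, theta0=0.0, rounds=200, step=0.2, n=500):
    """Alternate discriminator training with accept/reject updates of theta."""
    theta = theta0
    for _ in range(rounds):
        # Retrain the discriminator against the current simulator setting.
        w = train_discriminator(real, simulate(theta, n))
        # Propose a nearby parameter; keep it only if its simulations
        # look more "real" to the discriminator than the current ones.
        proposal = theta + rng.normal(0.0, step)
        if prob_real(w, simulate(proposal, n)).mean() > prob_real(w, simulate(theta, n)).mean():
            theta = proposal
    return theta

real_data = simulate(3.0, 500)  # pretend "real" data with a hidden parameter of 3.0
estimate = fit(real_data)
print(f"estimated theta: {estimate:.2f}")
```

In this sketch the estimate drifts toward the hidden value because, once the simulated data match the real data, the discriminator can no longer separate them and no proposal looks clearly better. A realistic version would replace the Gaussian with an evolutionary simulator and the logistic regression with a neural network over genotype data.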

Updated: 2021-02-10