Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection.
Molecular Biology and Evolution (IF 10.7), Pub Date: 2020-05-27, DOI: 10.1093/molbev/msaa134
Xiaoheng Cheng, Michael DeGiorgio

Abstract
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.


Updated: 2020-11-21