MrIML: Multi-response interpretable machine learning to model genomic landscapes
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-08-27, DOI: 10.1111/1755-0998.13495
Nicholas M Fountain-Jones 1 , Christopher P Kozakiewicz 2 , Brenna R Forester 3 , Erin L Landguth 4 , Scott Carver 1 , Michael Charleston 1 , Roderick B Gagne 5 , Brandon Greenwell 6 , Simona Kraberger 7 , Daryl R Trumbo 3 , Michael Mayer 8 , Nicholas J Clark 9 , Gustavo Machado 10

We introduce a new R package “MrIML” (“Mister iml”; Multi-response Interpretable Machine Learning). MrIML provides a powerful and interpretable framework that enables users to harness recent advances in machine learning to quantify multilocus genomic relationships, to identify loci of interest for future landscape genetics studies, and to gain new insights into adaptation across environmental gradients. Relationships between genetic variation and environment are often nonlinear and interactive; these characteristics have been challenging to address using traditional landscape genetic approaches. Our package helps capture this complexity and offers functions that fit and interpret a wide range of highly flexible models that are routinely used for single-locus landscape genetics studies but are rarely extended to estimate response functions for multiple loci. To demonstrate the package's broad functionality, we test its ability to recover landscape relationships from simulated genomic data. We also apply the package to two empirical case studies. In the first, we model genetic variation of North American balsam poplar (Populus balsamifera, Salicaceae) populations across environmental gradients. In the second case study, we recover the landscape and host drivers of feline immunodeficiency virus genetic variation in bobcats (Lynx rufus). The ability to model thousands of loci collectively and compare models from linear regression to extreme gradient boosting, within the same analytical framework, has the potential to be transformative. The MrIML framework is also extendable and not limited to modelling genetic variation; for example, it can quantify the environmental drivers of microbiomes and coinfection dynamics.

Updated: 2021-11-05