Regarding the F-word: The effects of data filtering on inferred genotype-environment associations
Molecular Ecology Resources (IF 7.7), Pub Date: 2021-02-10, DOI: 10.1111/1755-0998.13351
Collin W Ahrens 1 , Rebecca Jordan 2 , Jason Bragg 3 , Peter A Harrison 4 , Tara Hopley 5 , Helen Bothwell 6 , Kevin Murray 6 , Dorothy A Steane 2, 4 , John W Whale 1 , Margaret Byrne 5 , Rose Andrew 7 , Paul D Rymer 1

Genotype-environment association (GEA) methods have become part of the standard landscape genomics toolkit, yet we know little about how best to filter genotyping-by-sequencing data to provide robust inferences about environmental adaptation. In many cases, default filtering thresholds for minor allele frequency and missing data are applied regardless of sample size, with unknown impacts on the results and negative consequences for management strategies. Here, we investigate the effects of filtering on GEA results and the potential implications for assessing adaptation to the environment. We use empirical and simulated data sets derived from two widespread tree species to assess the effects of filtering on GEA outputs. Critically, we find that the filtering thresholds for missing data and minor allele frequency affect the identification of true positives: even slight adjustments to these thresholds can change the rate of true positive detection. Using conservative thresholds for missing data and minor allele frequency substantially reduces the size of the data set, lessening the power to detect adaptive variants (i.e., simulated true positives) under both strong and weak selection. Strength of selection was nonetheless a good predictor of GEA detection, although even some SNPs under strong selection went undetected. False positive rates varied with species and GEA method, and filtering significantly affected predictions of adaptive capacity in downstream analyses. We make several recommendations regarding filtering for GEA methods. Ultimately, there is no filtering panacea, but some choices are better than others, depending on the study system, the availability of genomic resources, and the desired objectives.
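To make the two filters discussed above concrete, here is a minimal sketch of per-SNP missing-data and minor allele frequency (MAF) filtering. It is not the authors' pipeline. It assumes diploid individuals, genotypes coded as 0/1/2 copies of the alternate allele, `None` for a missing call, and inclusive thresholds; the function names and the toy matrix are hypothetical. It illustrates how tightening both thresholds shrinks the retained SNP set.

```python
from typing import List, Optional

# Hypothetical coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_fraction(snp: List[Genotype]) -> float:
    """Proportion of individuals with no genotype call at this SNP."""
    return sum(g is None for g in snp) / len(snp)


def minor_allele_frequency(snp: List[Genotype]) -> float:
    """MAF from called genotypes only, assuming diploid individuals."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(matrix: List[List[Genotype]], max_missing: float, min_maf: float) -> List[int]:
    """Return indices of SNPs (rows) passing both thresholds (inclusive, by assumption)."""
    return [
        i
        for i, snp in enumerate(matrix)
        if missing_fraction(snp) <= max_missing and minor_allele_frequency(snp) >= min_maf
    ]


# Toy matrix: 4 SNPs (rows) x 10 individuals (columns).
toy = [
    [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],                    # MAF 0.45, no missing data
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],                    # rare allele: MAF 0.05
    [1, None, 2, None, 0, 1, None, 1, 0, 2],           # 30% missing, MAF 0.5
    [None, None, None, None, None, None, 1, 0, 0, 0],  # 60% missing
]

print(filter_snps(toy, max_missing=0.5, min_maf=0.01))  # lenient:      [0, 1, 2]
print(filter_snps(toy, max_missing=0.1, min_maf=0.1))   # conservative: [0]
```

The conservative settings discard both the rare-allele SNP and the SNP with moderate missingness, mirroring the reduction in data set size the abstract links to lower detection power. Real tools differ in how they define and apply these thresholds (for example, some express missingness as the proportion of data present), so check the conventions of whichever software you use.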

Updated: 2021-02-10