Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen
bioRxiv - Genomics Pub Date: 2021-03-22, DOI: 10.1101/2020.10.23.352468
Thomas Badet, Simone Fouché, Fanny E. Hartmann, Marcello Zala, Daniel Croll

Species harbor extensive structural variation underpinning recent adaptive evolution and major disease phenotypes. Most sequence rearrangements are generated non-randomly along the genome through non-allelic recombination and transposable element activity. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a major fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionally driven by rearrangements. We used machine-learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Low-copy LINE and Gypsy retrotransposons explain most inversion, indel and duplication events. We retrain our model on Arabidopsis thaliana and show that our modelling approach can be extended to more complex genomes. Finally, we analyzed complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations were enriched for newly generated structural variation consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation predicted the position of >74% of the newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
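
The workflow the abstract describes (summarize sequence features along the chromosome, flag regions as unstable, then ask what share of newly observed variants fall in the flagged regions) can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' pipeline: the abstract does not specify the model type, the window size, or the full list of 30 features, so the function names `window_features` and `fraction_recovered`, the two features used (GC content and gene count), the threshold rule standing in for a trained model, and all data below are assumptions made up for the example.

```python
def window_features(sequence, gene_starts, window_size):
    """Return one feature dict per non-overlapping window of the sequence."""
    features = []
    for start in range(0, len(sequence), window_size):
        chunk = sequence[start:start + window_size].upper()
        # Count only unambiguous bases so that N's do not dilute GC content.
        called = sum(chunk.count(base) for base in "ACGT")
        gc = (chunk.count("G") + chunk.count("C")) / called if called else 0.0
        # Gene density proxy: number of gene start coordinates in the window.
        genes = sum(start <= g < start + window_size for g in gene_starts)
        features.append({"start": start, "gc": gc, "genes": genes})
    return features


def fraction_recovered(variant_positions, flagged_starts, window_size):
    """Fraction of variant positions that fall inside a flagged window."""
    if not variant_positions:
        return 0.0
    flagged = set(flagged_starts)
    hits = sum(
        (pos // window_size) * window_size in flagged
        for pos in variant_positions
    )
    return hits / len(variant_positions)


# Toy data, invented for illustration only.
toy_sequence = "ATATATATATGCGCGCGCGCATATGCGCAT"
toy_genes = [12, 15]
feats = window_features(toy_sequence, toy_genes, window_size=10)

# Hypothetical stand-in for a trained model: an arbitrary threshold rule.
# It is NOT a finding of the paper; a real model would be fit to data.
flagged = [f["start"] for f in feats if f["gc"] < 0.5 and f["genes"] == 0]

# Share of (invented) new variant positions that land in flagged windows.
recovered = fraction_recovered([3, 14, 25, 27], flagged, window_size=10)
print(flagged, recovered)
```

In a real analysis the threshold rule would be replaced by a classifier trained on species-wide structural variation calls, and `fraction_recovered` would be evaluated on variants held out from training, such as those arising in the pedigree.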

Updated: 2021-03-22