Machine learning approaches to identify core and dispensable genes in pangenomes
The Plant Genome (IF 3.9), Pub Date: 2021-09-17, DOI: 10.1002/tpg2.20135
Alan E Yocca 1,2, Patrick P Edger 2,3

A gene in a given taxonomic group is either present in every individual (core) or absent from at least one individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, determining whether a gene belongs to the core or dispensable portion of the genome requires constructing a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content of two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack adequate genomic resources.
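
The core/dispensable definition in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical presence/absence table (gene name mapped to one boolean per sequenced individual); the gene names, the data, and the `classify_genes` helper are illustrative only and are not taken from the paper.

```python
from typing import Dict, List


def classify_genes(presence: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label a gene 'core' if it is present in every individual,
    otherwise 'dispensable' (absent from at least one individual)."""
    return {
        gene: "core" if all(calls) else "dispensable"
        for gene, calls in presence.items()
    }


# Hypothetical presence/absence calls across four individuals (assumed data).
presence = {
    "geneA": [True, True, True, True],
    "geneB": [True, False, True, True],
    "geneC": [False, False, True, False],
}

labels = classify_genes(presence)
print(labels)
```

Labels of this kind require a full pangenome to compute directly; the model described in the abstract aims to predict them from a single annotated reference genome instead.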

Updated: 2021-09-17