Biotechnology Advances (IF 16.0), Pub Date: 2021-08-27, DOI: 10.1016/j.biotechadv.2021.107822
Tulio L Campos, Pasi K Korhonen, Andreas Hofmann, Robin B Gasser, Neil D Young
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the ‘elegant worm’ (Caenorhabditis elegans; Nematoda) and the ‘vinegar fly’ (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake an historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows the reliable prediction of essential genes from extensive, publicly available ‘omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could ‘open the door’ to fundamental and applied research areas. Evidence of some commonality in the essential gene complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets are or become available for proper feature engineering, and for the robust training and validation of algorithms.
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
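Title: Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications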