Machine learning-based approaches for disease gene prediction.
Briefings in Functional Genomics (IF 2.5), Pub Date: 2020-06-22, DOI: 10.1093/bfgp/elaa013
Duc-Hau Le

Disease gene prediction is an essential problem in biomedical research. Early work relied on annotation-based approaches. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods have become prominent. In parallel, machine learning techniques, which formulate the problem as classification, have also been proposed. Here, we first present a roadmap of machine learning-based methods for disease gene prediction. Initially, the problem was usually approached as binary classification, in which the positive and negative training sets consist of disease genes and non-disease genes, respectively. Disease genes are those known to be associated with diseases, whereas non-disease genes were randomly selected from genes not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining non-disease genes, more realistic approaches have been proposed, such as unary and semi-supervised classification. More recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed. Second, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were also analyzed and discussed.
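The weakness of the binary formulation described above can be illustrated with a small sketch. It is not taken from the paper: the gene identifiers, the function names `build_binary_training_set` and `count_contaminated`, and the toy split into "known" and "hidden" disease genes are all hypothetical and chosen only for illustration. The sketch labels known disease genes as positives, draws negatives at random from the remaining genes, and counts how many sampled "negatives" are in fact undiscovered disease genes.

```python
import random


def build_binary_training_set(known_disease_genes, all_genes, n_negatives, seed=0):
    """Label known disease genes 1 and a random sample of the other genes 0.

    Hypothetical helper: it mirrors the sampling scheme described in the
    abstract, not any specific method reviewed in the paper.
    """
    known = set(known_disease_genes)
    # Genes with no known disease association; some may be true disease genes.
    unlabeled = sorted(g for g in all_genes if g not in known)
    rng = random.Random(seed)
    negatives = rng.sample(unlabeled, n_negatives)
    labels = {g: 1 for g in known}
    labels.update({g: 0 for g in negatives})
    return labels, negatives


def count_contaminated(negatives, hidden_disease_genes):
    """Count sampled negatives that are actually (undiscovered) disease genes."""
    return len(set(negatives) & set(hidden_disease_genes))


# Toy data (assumed): 100 genes, 10 known disease genes, 10 undiscovered ones.
all_genes = [f"G{i}" for i in range(100)]
known_disease = all_genes[:10]
hidden_disease = all_genes[10:20]

labels, negatives = build_binary_training_set(known_disease, all_genes, n_negatives=10)
print("mislabelled negatives:", count_contaminated(negatives, hidden_disease))
```

Any nonzero count means the classifier is trained on wrong negative labels. Unary (one-class) and semi-supervised formulations sidestep this by not treating the unlabeled genes as confirmed negatives.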

Updated: 2020-06-22