Supervised learning algorithms in the classification of plant populations with different degrees of kinship, Brazilian Journal of Botany


Brazilian Journal of Botany (IF 1.6) Pub Date: 2021-02-04, DOI: 10.1007/s40415-021-00703-1
Leandro Skowronski, Paula Martin de Moraes, Mario Luiz Teixeira de Moraes, Wesley Nunes Gonçalves, Michel Constantino, Celso Soares Costa, Wellington Santos Fava, Reginaldo B. Costa

Population discrimination and the classification of individuals are of great importance for genetic improvement, population studies and the conservation of genetic diversity. Multivariate approaches are often used for these tasks, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown promise for such procedures, but these methods still need further evaluation and comparison. The present study therefore evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM) and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data from populations with different degrees of genetic similarity. The data were derived from simulated genotypic information for different populations subjected to a backcrossing scheme. Mean accuracies over 30 repetitions of each classification method were compared using the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on machine learning algorithms showed superior results to the Fisher and Anderson discriminant functions, obtaining high accuracy even where similarity between populations was higher. The kNN, Random Forest, SVM and Naive Bayes algorithms presented the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at the 96.88% similarity condition between populations). Thus, the present work confirms that ML techniques demonstrate greater accuracy in the discrimination and classification of populations without the limitations of statistical techniques.
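The statistical comparison described in the abstract (per-repetition accuracies ranked across methods, a Friedman test, then a Nemenyi post-hoc comparison) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the three placeholder methods and the synthetic accuracies are all assumptions, the Friedman statistic is computed without a tie correction, and the critical values (5.991 for chi-square with 2 degrees of freedom, 2.343 for the Nemenyi test with 3 methods, both at alpha = 0.05) are standard table values hard-coded for this toy case.

```python
import math
import random


def average_ranks(scores):
    """Mean rank of each method over all repetitions.

    scores: list of rows, one per repetition; each row holds one accuracy
    per method. Rank 1 is the highest accuracy; ties share the average rank.
    """
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the run of methods tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1
    return [s / len(scores) for s in rank_sums]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction), df = k - 1."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )


def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference between two mean ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


if __name__ == "__main__":
    # Synthetic placeholder accuracies for three hypothetical methods over
    # 30 repetitions. These are NOT results from the paper.
    rng = random.Random(0)
    centers = [0.95, 0.93, 0.85]
    scores = [[rng.gauss(c, 0.02) for c in centers] for _ in range(30)]

    ranks = average_ranks(scores)
    stat = friedman_statistic(scores)
    cd = nemenyi_cd(k=3, n=30, q_alpha=2.343)

    print("mean ranks:", [round(r, 2) for r in ranks])
    print("Friedman chi-square:", round(stat, 2), "(critical value 5.991)")
    print("Nemenyi critical difference:", round(cd, 3))
```

Under this scheme, two methods are considered significantly different when their mean ranks differ by more than the critical difference, and the Nemenyi step is only meaningful once the Friedman test has rejected the hypothesis that all methods perform alike. Comparing the eight methods named in the abstract would need the corresponding critical values for eight groups instead of the three-method values used here.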


Updated: 2021-02-04
