Can statistical learning models make early selection among sugarcane families easier and still efficient? (Crop Science)

Crop Science (IF 2.0), Pub Date: 2020-09-16, DOI: 10.1002/csc2.20334
Édimo Fernando Alves Moreira, Marcio Henrique Pereira Barbosa, Luiz Alexandre Peternelli

The selection of genotypes at the early stages is one of the main challenges facing sugarcane (Saccharum officinarum L.) breeding programs. The present work aimed to compare classification techniques, namely, logistic regression (LR), k‐nearest neighbor (KNN), random forests (RF), and support vector machine (SVM), against the selection among families of sugarcane via artificial neural networks (ANN) and via a procedure based on the weighing of the plots. The data used in this work were obtained from 110 families. In the families, the number of stalks (NS), stalk diameter (SD), and stalk height (SH) were collected, in addition to the actual yield, expressed in tons of cane per hectare (TCH). We considered the NS, SD, and SH as explanatory variables for the training of the classifiers. The response was the indicator Y, with Y = 0 if the family is not selected via TCH and Y = 1 otherwise. To increase the efficiency in training, we produced synthetic data based on the simulation of NS, SD, SH, and TCH values. Two models were also considered: a full model with all the predictors and a reduced model without the SH. We used the apparent error rate (AER) and the true positive rate (TPR) for the evaluation of the classifiers. All classifiers presented low values for the AER and high values for the TPR in both models. The best performance was observed for the SVM. The reduced model should be preferred, since its performance is very close to that of the full model and its operation is more straightforward.
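The abstract names the AER and the TPR but does not define them. The sketch below assumes the usual definitions: the AER as the fraction of families whose predicted label differs from the TCH-based label, and the TPR as the share of families selected via TCH (Y = 1) that the classifier also selects. The function names and the ten labels are made up for illustration and are not taken from the paper.

```python
def apparent_error_rate(y_true, y_pred):
    """Fraction of families whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def true_positive_rate(y_true, y_pred):
    """Share of truly selected families (Y = 1) that the classifier also selects."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        raise ValueError("y_true contains no selected families (Y = 1)")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Hypothetical labels for ten families (illustrative only, not the paper's data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]  # selection via TCH
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]  # selection by some classifier

aer = apparent_error_rate(y_true, y_pred)
tpr = true_positive_rate(y_true, y_pred)
print(f"AER = {aer:.2f}, TPR = {tpr:.2f}")  # AER = 0.20, TPR = 0.75
```

In this toy case two of the ten families are misclassified and three of the four truly selected families are recovered, which shows why a low AER and a high TPR together indicate a classifier that agrees closely with selection via TCH.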

Updated: 2020-09-16
