Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach.
Bioinformatics ( IF 5.8 ) Pub Date : 2019-12-15 , DOI: 10.1093/bioinformatics/btz421
Camilo L M Morais 1 , Marfran C D Santos 2 , Kássio M G Lima 2 , Francis L Martin 1

MOTIVATION Data splitting is a fundamental step in building classification models from spectral data, especially in biomedical applications. It is performed after pre-processing and before model construction, and consists of dividing the samples into at least a training set, used for model construction, and a test set, used for model validation. Two of the most widely used data-splitting methods are random selection (RS) and the Kennard-Stone (KS) algorithm: the former splits the samples at random, whereas the latter is based on the Euclidean distances between samples. We propose the Morais-Lima-Martin (MLM) algorithm as an alternative method to improve data splitting in classification models. MLM modifies the KS algorithm by adding a random-mutation factor.

RESULTS The performance of RS, KS and MLM is compared on simulated data and on six real-world biospectroscopic applications using principal component analysis with linear discriminant analysis (PCA-LDA). MLM gave better predictive performance than RS and KS, particularly in terms of sensitivity and specificity, and its classification was better balanced. RS showed the poorest predictive performance, followed by KS, which predicted with good accuracy but with relatively unbalanced sensitivities and specificities. These findings demonstrate the potential of the new MLM algorithm as a sample selection method for classification applications, compared with the standard methods commonly applied to this type of data.

AVAILABILITY AND IMPLEMENTATION The MLM algorithm is freely available for MATLAB at https://doi.org/10.6084/m9.figshare.7393517.v1.
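
To make the idea of a distance-based split with a random-mutation factor more concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. The `kennard_stone` function follows the standard KS selection rule on Euclidean distances. The abstract does not say how the mutation is applied, so the swap step in `split_with_mutation`, the function names, and the `mutation_rate` and `seed` parameters are all assumptions made for illustration.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Select n_train (>= 2) sample indices with the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected)


def split_with_mutation(X, n_train, mutation_rate=0.1, seed=0):
    """KS split followed by a random swap between training and test sets.

    ASSUMPTION: the swap below is only one plausible reading of a
    "random-mutation factor"; the abstract does not specify the mechanism.
    """
    rng = np.random.default_rng(seed)
    train = kennard_stone(X, n_train)
    test = np.setdiff1d(np.arange(len(X)), train)
    # Number of samples to exchange, capped by the size of the test set.
    n_swap = min(int(round(mutation_rate * len(train))), len(test))
    out_idx = rng.choice(len(train), size=n_swap, replace=False)
    in_idx = rng.choice(len(test), size=n_swap, replace=False)
    # Exchange the chosen training samples with the chosen test samples.
    train[out_idx], test[in_idx] = test[in_idx], train[out_idx]
    return train, test
```

With `mutation_rate=0` the sketch reduces to a plain KS split, which makes it easy to compare the two behaviours on the same data. For classification data, the split would typically be applied within each class so that class proportions are preserved; that choice is also an assumption here rather than something stated in the abstract.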

Updated: 2020-01-13