A modified genetic algorithm and weighted principal component analysis based feature selection and extraction strategy in agriculture, Knowledge-Based Systems



Knowledge-Based Systems (IF 7.2), Pub Date: 2021-09-03, DOI: 10.1016/j.knosys.2021.107460
K. Aditya Shastry, Sanjay H.A.


Data pre-processing transforms raw data into a format suitable for machine learning (ML) techniques. Feature selection (FS) and feature extraction (FeExt) are significant components of data pre-processing. FS identifies the relevant features that enhance the accuracy of a model. Because agricultural data contain diverse features related to climate, soil, and fertilizer, FS is especially important: irrelevant features may adversely affect the predictions of the resulting model. FeExt, in turn, derives new attributes from the existing ones. These new features carry the information of the original attributes without the redundancy. With these points in mind, this work proposes a hybrid feature selection and feature extraction strategy for agricultural data sets. A modified Genetic Algorithm (m-GA) was developed with a fitness function based on “Mutual Information” (MutInf) and “Root Mean Square Error” (RtMSE) to choose the features that most affect the target attribute (crop yield in this case). The selected features were then subjected to feature extraction using “weighted principal component analysis” (wgt-PCA). The extracted features were fed into different ML models, viz. “Regression” (Reg), “Artificial Neural Networks” (ArtNN), “Adaptive Neuro Fuzzy Inference System” (ANFIS), “Ensemble of Trees” (EnT), and “Support Vector Regression” (SuVR). Trials on 3 benchmark and 8 real-world farming datasets showed that the hybrid feature selection and extraction technique achieved significant improvements in R-squared (Rsq), RtMSE, and “mean absolute error” (MAE) compared with FS and FeExt methods such as Correlation Analysis (CA), Singular Value Decomposition (SiVD), Genetic Algorithm (GA), and wgt-PCA.
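
The abstract does not state how MutInf and RtMSE are combined in the fitness function, nor how the wgt-PCA weights are defined, so the following is only a minimal sketch under explicit assumptions and not the authors' implementation. It assumes a binary feature mask as the chromosome, a hypothetical fitness `alpha * mean MI - (1 - alpha) * normalised RMSE`, a histogram estimate of mutual information, an in-sample linear fit for the error term, and a weighted PCA that simply scales each standardised feature by its normalised mutual information before an ordinary SVD. All function names, parameters, and the synthetic data are illustrative.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information (in nats) between two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def rmse_of_linear_fit(X, y):
    """RMSE of a least-squares fit with intercept (in-sample, for brevity)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))

def fitness(mask, X, y, alpha=0.5):
    """ASSUMED fitness: reward mean MI with the target, penalise RMSE scaled by std(y)."""
    if not mask.any():
        return -np.inf  # an empty feature set is never acceptable
    cols = np.flatnonzero(mask)
    mi = np.mean([mutual_information(X[:, j], y) for j in cols])
    rmse = rmse_of_linear_fit(X[:, cols], y) / np.std(y)
    return alpha * mi - (1 - alpha) * rmse

def select_features(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Plain GA over boolean masks: truncation selection, one-point crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut  # flip each bit with probability p_mut
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]

def weighted_pca(X, weights, n_components=2):
    """ASSUMED variant: scale standardised features by weights, then PCA via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    Zw = Z * weights
    _, _, Vt = np.linalg.svd(Zw, full_matrices=False)
    return Zw @ Vt[:n_components].T

# Synthetic stand-in for a yield data set: only the first two features matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

best = select_features(X, y)
cols = np.flatnonzero(best)
w = np.array([mutual_information(X[:, j], y) for j in cols])
components = weighted_pca(X[:, cols], w / w.sum(), n_components=min(2, len(cols)))
print("selected feature indices:", cols, "extracted shape:", components.shape)
```

The array `components` plays the role of the extracted features that would then be passed to a regression model. In a real study the error term would be computed on held-out data instead of in-sample, and the balance parameter `alpha` would need tuning.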


Updated: 2021-09-10
