Improved data clustering methods and integrated A-FP algorithm for crop yield prediction, Distributed and Parallel Databases

Improved data clustering methods and integrated A-FP algorithm for crop yield prediction
Distributed and Parallel Databases ( IF 1.5 ) Pub Date : 2021-07-15 , DOI: 10.1007/s10619-021-07350-1
P. Suvitha Vani, S. Rathi

Big data analysis is the process of gathering, managing, and analyzing large volumes of data to uncover patterns and other valuable information. Agriculture is a significant application area for big data. Big data analysis of agricultural data can draw on data from both internal systems and outside sources, such as weather, soil, and crop data. Although big data analysis has driven advances in many industries, it has not yet been used extensively in agriculture. Several machine learning techniques have been developed to cluster data for crop yield prediction. However, these techniques suffer from low accuracy and poor clustering quality. To improve clustering accuracy with less complexity, a Proximity Likelihood Maximization Data Clustering (PLMDC) technique is developed for both sparsely and densely distributed agricultural big data, with the aim of improving the accuracy of crop yield prediction for farmers. In this process, unnecessary data is first cleansed from the sparse and dense agricultural data using a logical linear regression model. The clustering method is then executed based on similarity and a weight-based Manhattan distance. A genetic algorithm (GA) with a suitable fitness function is applied to select features from the clustered data. Finally, a decision support system built on the A-FP growth algorithm predicts crop yields from the selected features, such as weather features and crop features. The proposed PLMDC technique achieves better clustering accuracy on both sparsely and densely distributed data, with minimum time and space complexity. Based on the observed results, the PLMDC technique is more efficient than existing methods.
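The abstract does not give the PLMDC formulas, so the following is only a minimal sketch of the one ingredient it names explicitly: a weight-based Manhattan distance used to assign records to clusters. The function names, feature columns, centroids, and weights are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def weighted_manhattan(x: Sequence[float], y: Sequence[float],
                       w: Sequence[float]) -> float:
    """Sum of per-feature absolute differences, each scaled by its weight."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    return sum(wi * abs(xi - yi) for xi, yi, wi in zip(x, y, w))


def assign_to_nearest(points: Sequence[Sequence[float]],
                      centroids: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[int]:
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda k: weighted_manhattan(p, centroids[k], weights))
        for p in points
    ]


# Hypothetical records: (rainfall in mm, mean temperature in deg C, soil pH).
records = [(820.0, 27.0, 6.5), (790.0, 26.0, 6.8), (310.0, 33.0, 7.9)]
# Hypothetical cluster centres, e.g. a "wet" and a "dry" growing profile.
centroids = [(800.0, 26.5, 6.6), (300.0, 34.0, 8.0)]
# Assumed weights, picked only so that no single feature scale dominates.
weights = (0.01, 1.0, 2.0)

labels = assign_to_nearest(records, centroids, weights)
```

Here the first two records fall into the first cluster and the third into the second. How the weights are derived, and how similarity and likelihood maximization enter the assignment, would have to come from the full paper.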

Updated: 2021-07-15
