Rough fuzzy model based feature discretization in intelligent data preprocess
Journal of Cloud Computing ( IF 3.7 ) Pub Date : 2021-01-18 , DOI: 10.1186/s13677-020-00216-4
Qiong Chen , Mengxing Huang

Feature discretization is an important preprocessing technique for massive data in industrial control. By transforming continuous features into discrete ones, it improves the efficiency of edge-cloud computing and helps meet the requirements of high-quality cloud services. Compared with other discretization methods, rough-set-based discretization has achieved good results in many applications because it makes full use of the known knowledge base without requiring any prior information. However, the equivalence classes of a rough set are ordinary (crisp) sets, which makes it difficult to describe the fuzzy components of the data, and accuracy is low for some complex data types in big data environments. We therefore propose a rough fuzzy model based discretization algorithm (RFMD). First, we use fuzzy c-means clustering to obtain the membership of each sample in each category. Then, we fuzzify the equivalence classes of the rough set using the obtained memberships, and build the fitness function of a genetic algorithm on the rough fuzzy model to select the optimal discrete breakpoints on the continuous features. Finally, we compare the proposed method on remote sensing datasets with discretization algorithms based on rough sets, information entropy, and the chi-square test. The experimental results verify the effectiveness of our method.
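
The abstract names the ingredients of the pipeline but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' RFMD implementation. It assumes the standard fuzzy c-means update rules, takes the lower approximation of a fuzzy cluster on an equivalence class to be the minimum membership over that class (the usual Dubois-Prade rough fuzzy set definition), and uses a hypothetical fitness: the size-weighted average, over equivalence classes, of the best lower-approximation membership. The function names, the fitness formula, and the toy data are all illustrative, and the genetic search over breakpoints is omitted; only the scoring of a candidate breakpoint set is shown.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (U, centers); U[i, k] is the
    membership of sample i in cluster k, and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center, floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def discretize(X, breakpoints):
    """breakpoints[j] is a sorted, non-empty list of cut values for feature j.
    Returns one integer interval code per sample and feature."""
    return np.column_stack(
        [np.digitize(X[:, j], breakpoints[j]) for j in range(X.shape[1])]
    )

def rough_fuzzy_fitness(codes, U):
    """Hypothetical fitness (an assumption, not taken from the paper):
    samples with identical codes form one equivalence class; each class
    contributes its size times the largest lower-approximation membership."""
    _, class_id = np.unique(codes, axis=0, return_inverse=True)
    class_id = class_id.ravel()
    total = 0.0
    for e in np.unique(class_id):
        members = U[class_id == e]
        lower = members.min(axis=0)  # lower approximation, one value per cluster
        total += members.shape[0] * lower.max()
    return total / U.shape[0]

# Toy data (illustrative only): one feature, two well-separated groups.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
U, _ = fuzzy_c_means(X, c=2)
fit_good = rough_fuzzy_fitness(discretize(X, [[2.5]]), U)   # cut between the groups
fit_bad = rough_fuzzy_fitness(discretize(X, [[0.15]]), U)   # cut inside one group
print(fit_good > fit_bad)
```

Under these assumptions, a breakpoint that separates the two groups scores higher than one that splits a group, because every equivalence class then sits almost entirely inside a single fuzzy cluster. A genetic algorithm would use such a score to rank candidate breakpoint sets.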

Updated: 2021-01-18