A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization
Automatic Control and Computer Sciences (IF 0.6), Pub Date: 2021-01-14, DOI: 10.3103/s0146411620060097
Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract

Parallel optimization is currently one of the important research topics in data mining. Taking the parallelization of CART as an example, this paper proposes a parallel data mining algorithm based on segmentation and pruning optimization, named SSP-OGini-PCCP. To address the problem of choosing the best CART segmentation (split) point, it designs an S-SP model without data association. To compute the Gini index efficiently, it designs a parallel OGini calculation method. To improve the efficiency of pruning, it proposes a synchronous PCCP pruning strategy. These three components (optimal segmentation calculation, Gini index calculation, and pruning) are key parts of parallel data mining, and each is studied in depth. The SSP-OGini-PCCP-based data mining method is tested on a distributed cluster simulation system built on SPARK. The experimental results show that the method significantly improves the efficiency of data classification and decision making, meeting the high demands of contemporary mass data processing.
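The abstract does not describe how the S-SP model or the OGini method work internally. As a point of reference only, here is a minimal sketch of the standard CART criterion they build on: the Gini impurity 1 - sum_k p_k^2 and the search for the threshold that minimises the weighted Gini of a binary split on one numeric feature. It assumes the data are already divided into partitions (plain Python lists standing in for hypothetical Spark partitions) and relies on class counts being additive, so each partition's counts can be computed independently and then merged by summation, a common way to parallelize split evaluation. This is not the authors' OGini or S-SP algorithm, and the names `Partition`, `local_split_counts`, `merge_counts`, and `best_split` are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for one worker's slice of the data: (feature_value, class_label) pairs.
Partition = List[Tuple[float, str]]
# For each candidate threshold: (class counts with x <= t, class counts with x > t).
SplitCounts = Dict[float, Tuple[Counter, Counter]]


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node, given its class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def local_split_counts(part: Partition, thresholds: Sequence[float]) -> SplitCounts:
    """'Map' step: left/right class counts of one partition for every candidate threshold."""
    return {
        t: (Counter(y for x, y in part if x <= t), Counter(y for x, y in part if x > t))
        for t in thresholds
    }


def merge_counts(a: SplitCounts, b: SplitCounts) -> SplitCounts:
    """'Reduce' step: class counts are additive, so partial results merge by summation."""
    return {t: (a[t][0] + b[t][0], a[t][1] + b[t][1]) for t in a}


def best_split(partitions: List[Partition]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the split x <= threshold with the lowest weighted Gini."""
    values = sorted({x for part in partitions for x, _ in part})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    # Each partition is processed independently, then the partial counts are summed.
    merged = reduce(merge_counts, (local_split_counts(p, thresholds) for p in partitions))
    n = sum(len(p) for p in partitions)
    best_t, best_score = thresholds[0], float("inf")
    for t, (left, right) in merged.items():
        n_left, n_right = sum(left.values()), sum(right.values())
        # Weighted Gini of the two child nodes.
        score = n_left / n * gini(left) + n_right / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

For example, with the two partitions `[(1.0, "a"), (2.0, "a"), (8.0, "b")]` and `[(3.0, "a"), (9.0, "b"), (10.0, "b")]`, `best_split` returns threshold 5.5 with weighted Gini 0.0, the same result as for a single partition holding all six rows. The brute-force count per threshold keeps the sketch short; a practical implementation would sort once and sweep cumulative counts, or bin feature values, to avoid rescanning each partition for every threshold.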




Updated: 2021-01-15