MapReduce based improved quick reduct algorithm with granular refinement using vertical partitioning scheme
Knowledge-Based Systems (IF 8.8), Pub Date: 2019-10-14, DOI: 10.1016/j.knosys.2019.105104
Pandu Sowkuntla , P.S.V.S. Sai Prasad

Over the last few decades, rough sets have evolved into an essential technology for feature subset selection, carried out through reduct computation in categorical decision systems. In recent years, with the proliferation of MapReduce for distributed/parallel algorithms, several scalable MapReduce-based reduct computation algorithms have been developed for large-scale decision systems. Existing MapReduce-based approaches partition the dataset horizontally (division in the object space) across the nodes of the cluster, which requires a complicated shuffle and sort phase. In this work, we propose MR_IQRA_VP, an algorithm designed around vertical partitioning of the dataset (division in the attribute space), which simplifies the shuffle and sort phase of the MapReduce framework. MR_IQRA_VP is a distributed/parallel implementation of the Improved Quick Reduct Algorithm (IQRA_IG) and is implemented using the iterative MapReduce framework of Apache Spark. We carried out an extensive comparative study, experimenting on benchmark decision systems, against existing horizontal-partitioning-based reduct computation algorithms. Through experimental analysis, along with theoretical validation, we establish that MR_IQRA_VP is suitable for, and scales to, datasets with a large attribute space and a moderate object space, which are prevalent in bioinformatics and web mining.
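The abstract does not spell out MR_IQRA_VP's internals, so the following is only a single-machine sketch, under stated assumptions, of how a greedy Quick Reduct search can be organised over a vertical partitioning. Plain Python dicts of columns stand in for cluster partitions (no Spark code). Each partition holds a subset of attribute columns; a "map" step scores its own unchosen attributes by positive-region size against a per-object granule label (assumed to be broadcast), and a "reduce" step keeps the global best. Removing objects that already fall in the positive region after each step is our assumed reading of "granular refinement", and ties (including zero-gain steps) go to the alphabetically smaller attribute name. All function names (`split_vertically`, `consistent_count`, `local_best`, `quick_reduct_vp`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Column = Sequence[Hashable]


def split_vertically(columns: Dict[str, Column], n_parts: int) -> List[Dict[str, Column]]:
    """Hypothetical vertical partitioning: deal attribute columns round-robin into n_parts."""
    names = sorted(columns)
    return [{name: columns[name] for name in names[p::n_parts]} for p in range(n_parts)]


def consistent_count(keys: Sequence[Hashable], decisions: Sequence[Hashable]) -> int:
    """Positive-region size: objects whose key group carries a single decision value."""
    labels: Dict[Hashable, set] = defaultdict(set)
    sizes: Dict[Hashable, int] = defaultdict(int)
    for key, d in zip(keys, decisions):
        labels[key].add(d)
        sizes[key] += 1
    return sum(sizes[k] for k, ds in labels.items() if len(ds) == 1)


def local_best(part: Dict[str, Column], chosen: List[str], objs: List[int],
               granule: List[Hashable], dec: Column) -> Optional[Tuple[int, str]]:
    """'Map' step on one partition: (negated gain, name) of its best unchosen attribute."""
    dec_sub = [dec[i] for i in objs]
    cands = [
        (-consistent_count([(granule[i], col[i]) for i in objs], dec_sub), name)
        for name, col in part.items()
        if name not in chosen
    ]
    return min(cands) if cands else None


def quick_reduct_vp(columns: Dict[str, Column], dec: Column, n_parts: int = 2) -> List[str]:
    n = len(dec)
    parts = split_vertically(columns, n_parts)
    names = sorted(columns)
    # Target: positive-region size when all conditional attributes are used.
    target = consistent_count([tuple(columns[a][i] for a in names) for i in range(n)], dec)
    reduct: List[str] = []
    granule: List[Hashable] = [()] * n  # granule label per object (assumed broadcast)
    objs = list(range(n))               # objects not yet in the positive region
    covered = 0
    while covered < target:
        # 'Map' over partitions, then 'reduce' to the global best; ties go to the smaller name.
        local = [local_best(p, reduct, objs, granule, dec) for p in parts]
        _, best = min(r for r in local if r is not None)
        reduct.append(best)
        for i in objs:  # refine each remaining object's granule with the chosen attribute
            granule[i] = (granule[i], columns[best][i])
        # Assumed granular refinement: drop objects whose granule is now decision-consistent.
        labels: Dict[Hashable, set] = defaultdict(set)
        for i in objs:
            labels[granule[i]].add(dec[i])
        keep = [i for i in objs if len(labels[granule[i]]) > 1]
        covered += len(objs) - len(keep)
        objs = keep
    return reduct


if __name__ == "__main__":
    cols = {"a": [0, 0, 1, 1, 0], "b": [0, 1, 0, 1, 0], "c": [1, 1, 0, 0, 0]}
    dec = [0, 1, 1, 1, 0]
    print(quick_reduct_vp(cols, dec))  # ['a', 'b']
```

In this arrangement each partition needs only its own columns, the decision column, and one granule label per object, so the only data exchanged per iteration is a single (gain, name) pair from each partition. That illustrates why attribute-space partitioning can keep the shuffle light, but it is an illustration of the idea, not the authors' implementation.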




Updated: 2020-01-16