ASAP-DM: a framework for automatic selection of analytic platforms for data mining
SICS Software-Intensive Cyber-Physical Systems, Pub Date: 2019-08-17, DOI: 10.1007/s00450-019-00408-7
Manuel Fritz, Osama Muazzen, Michael Behringer, Holger Schwarz

The plethora of analytic platforms makes it increasingly difficult to select the platform that best fits the data mining task at hand, the dataset, and additional user-defined criteria. Analysts in particular, who focus on the analytics domain, have difficulty keeping up with the latest developments. In this work, we introduce the ASAP-DM framework, which lets analysts use several platforms seamlessly, while programmers can easily add platforms to the framework. Furthermore, we investigate how to predict a platform based on specific criteria, such as the lowest runtime or resource consumption during the execution of a data mining task. We formulate this task as an optimization problem that can be solved by today's classification algorithms. We evaluate the proposed framework on several analytic platforms, such as Spark, Mahout, and WEKA, along with several data mining algorithms for classification, clustering, and association rule discovery. Our experiments show that the automatic selection process can save up to 99.71% of the execution time by automatically choosing a faster platform.
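
The idea of treating platform selection as a classification problem can be illustrated with a minimal sketch. It is not the ASAP-DM implementation: the abstract does not say which features or classifiers the framework uses, so the meta-features (log10 of the row count, log10 of the attribute count), the k-nearest-neighbor classifier, the name `predict_platform`, and all records in `HISTORY` are assumptions invented for illustration.

```python
import math

# Hypothetical past runs: (meta-features of the dataset, platform that was fastest).
# Meta-features are (log10 of row count, log10 of attribute count).
# All values are made up for illustration and are NOT results from the paper.
HISTORY = [
    ((3.0, 1.0), "WEKA"),
    ((3.5, 1.3), "WEKA"),
    ((4.0, 1.2), "WEKA"),
    ((6.0, 1.0), "Spark"),
    ((6.5, 1.7), "Spark"),
    ((7.0, 1.3), "Spark"),
]


def predict_platform(features, history=HISTORY, k=3):
    """Return the majority label among the k past runs nearest to `features`."""
    # Sort past runs by Euclidean distance to the new dataset's meta-features.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], features))[:k]
    # Count votes per platform label among the k nearest runs.
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # A small dataset lands near the toy WEKA records, a large one near Spark.
    print(predict_platform((3.2, 1.1)))
    print(predict_platform((6.8, 1.4)))
```

In this toy setup the label is the platform with the lowest observed runtime. A different criterion, such as resource consumption, would only change how the training labels are assigned.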

Updated: 2019-08-17