Fitness evaluation reuse for accelerating GPU-based evolutionary induction of decision trees
The International Journal of High Performance Computing Applications (IF 3.5), Pub Date: 2020-09-15, DOI: 10.1177/1094342020957393
Krzysztof Jurczuk, Marcin Czajkowski, Marek Kretowski

Decision trees (DTs) are among the most popular white-box machine-learning techniques. Traditionally, DTs are induced with a top-down greedy search, which may lead to sub-optimal solutions. An emerging alternative is evolutionary induction, inspired by biological evolution. It searches for the tree structure and the tests simultaneously, which results in less complex DTs with at least comparable prediction performance. However, the evolutionary search is computationally expensive, and applying it effectively to big data mining requires both algorithmic and technological progress. In this paper, noting that many trees or their parts reappear during the evolution, we propose a reuse strategy. A fixed number of recently processed individuals (DTs) is stored in a so-called repository. The part of each repository entry related to fitness calculations is kept on the CPU side to limit CPU/GPU memory transactions. The rest of the entry (the tree structure) resides on the GPU side to speed up the search for similar DTs. Because the most time-demanding task of the induction is the evaluation of DTs, the GPU first searches the repository for similar DTs to reuse. If the search fails, the GPU has to evaluate the DT from the ground up. Large artificial and real-life datasets and various repository strategies are tested. The results show that reusing information from previous generations can further accelerate the original GPU-based solution, and the gain is especially visible for large-scale data. To give an idea of the overall acceleration scale, the proposed solution can process even billions of objects in a few hours on a single GPU workstation.
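
The reuse idea in the abstract can be illustrated with a minimal, CPU-only Python sketch. It is a simplification under stated assumptions, not the authors' implementation: the repository here matches only identical trees (the paper searches for similar DTs on the GPU), eviction is least-recently-used (the paper tests various repository strategies that the abstract does not detail), and the tuple-based tree encoding and all names (`FitnessRepository`, `evaluate_with_reuse`, `evaluate_from_scratch`) are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

# Hypothetical tree encoding (an assumption for this sketch):
#   leaf:  ("leaf", class_label)
#   split: ("split", feature_index, threshold, left_subtree, right_subtree)
# Nested tuples are hashable, so a whole tree can serve as a dictionary key.
Tree = Tuple


class FitnessRepository:
    """Fixed-size store of recently processed trees and their fitness."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[Tree, float]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, tree: Tree) -> Optional[float]:
        """Return the stored fitness for an identical tree, or None."""
        if tree not in self._entries:
            self.misses += 1
            return None
        # Mark the entry as recently used so it is evicted last.
        self._entries.move_to_end(tree)
        self.hits += 1
        return self._entries[tree]

    def store(self, tree: Tree, fitness: float) -> None:
        """Insert an entry, evicting the least recently used one if full."""
        self._entries[tree] = fitness
        self._entries.move_to_end(tree)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


def evaluate_with_reuse(
    tree: Tree,
    repository: FitnessRepository,
    evaluate_from_scratch: Callable[[Tree], float],
) -> float:
    """Try the repository first; fall back to a full evaluation on a miss."""
    cached = repository.lookup(tree)
    if cached is not None:
        return cached
    fitness = evaluate_from_scratch(tree)  # the expensive step in the paper
    repository.store(tree, fitness)
    return fitness
```

In an evolutionary loop, each individual would be passed through `evaluate_with_reuse` instead of being evaluated directly, so trees that reappear across generations skip the expensive pass over the training data. The `hits` and `misses` counters give a rough measure of how often reuse succeeds for a given repository size.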

Updated: 2020-09-15