F-PENN— Forest path encoding for neural networks
Information Fusion (IF 14.7). Pub Date: 2021-06-23. DOI: 10.1016/j.inffus.2021.06.005
Yoni Cohen , Gilad Katz , Lior Rokach

Deep neural nets (DNNs) mostly tend to outperform other machine learning (ML) approaches when the training data is abundant, high-dimensional, sparse, or consists of raw data (e.g., pixels). For datasets with other characteristics—for example, dense tabular numerical data—algorithms such as Gradient Boosting Machines and Random Forest often achieve comparable or better performance at a fraction of the time and resources. These differences suggest that combining these approaches has the potential to yield superior performance. Existing attempts to combine DNNs with other ML approaches, which usually consist of feeding the output of the latter into the former, often do not produce positive results. We argue that this lack of improvement stems from the fact that the final classifications fail to provide the DNN with an understanding of the other algorithm's decision-making process (i.e., its "logic"). In this study we present F-PENN, a novel approach for combining decision forests and DNNs. Instead of providing the final output of the forest (or its trees) to the DNN, we provide the paths traveled by each sample. This information, when fed to the neural net, yields a significant improvement in performance. We demonstrate the effectiveness of our approach by conducting an extensive evaluation on 56 datasets and comparing F-PENN to four leading baselines: DNNs, Gradient Boosted Decision Trees (GBDT), Random Forest, and DeepFM. We show that F-PENN outperforms the baselines on 69%–89% of datasets and achieves an overall average error reduction of 16%–26%.
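The core idea—encoding the nodes a sample visits in each tree and feeding that encoding to a neural network—can be illustrated with off-the-shelf components. The sketch below is a simplified, hypothetical illustration of this style of pipeline, not the authors' F-PENN implementation; it uses scikit-learn's `decision_path`, which returns a binary indicator of the forest nodes each sample traverses, and a small MLP as a stand-in for the DNN.

```python
# Hypothetical sketch of forest-path encoding; NOT the authors' F-PENN code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small synthetic tabular dataset for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Fit a decision forest on the training data.
forest = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X_tr, y_tr)

# 2) Extract the path each sample travels through the forest.
#    decision_path returns a sparse binary matrix: entry (i, j) is 1 if
#    sample i passes through node j (nodes of all trees concatenated).
paths_tr, _ = forest.decision_path(X_tr)
paths_te, _ = forest.decision_path(X_te)

# 3) Feed the path encoding (here concatenated with the raw features)
#    to a neural network instead of the forest's final predictions.
Z_tr = np.hstack([X_tr, paths_tr.toarray()])
Z_te = np.hstack([X_te, paths_te.toarray()])

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(Z_tr, y_tr)
acc = net.score(Z_te, y_te)
```

The key design point is step 2: rather than passing the forest's class votes (a few scalars), the network receives the full node-indicator vector, which exposes *which* decision regions each sample fell into—the forest's "logic" rather than only its verdict.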




Updated: 2021-06-23