A forest-based algorithm for selecting informative variables using Variable Depth Distribution,Engineering Applications of Artificial Intelligence


A forest-based algorithm for selecting informative variables using Variable Depth Distribution
Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2020-11-24, DOI: 10.1016/j.engappai.2020.104073
Sergii Voronov, Daniel Jung, Erik Frisk

Predictive maintenance of technical systems and their components is a promising approach to optimize system usage and reduce downtime. Various sensor data are logged during system operation for different purposes, but they are sometimes not directly related to the degradation of a specific component. Variable selection algorithms are necessary to reduce model complexity and improve the interpretability of diagnostic and prognostic algorithms. This paper presents a forest-based variable selection algorithm that measures a variable's importance by analyzing its distribution in the decision tree structure, called the Variable Depth Distribution. The proposed algorithm is developed for datasets with correlated variables, which pose problems for existing forest-based variable selection methods. It is evaluated and analyzed using three case studies: survival analysis of lead–acid batteries in heavy-duty vehicles, engine misfire detection, and a simulated prognostics dataset. The results show the usefulness of the proposed algorithm relative to existing forest-based methods and its ability to identify important variables in different applications. As an example, the battery prognostics case study shows that using only 17% of the variables achieves predictive performance similar to using all measured signals.
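The abstract does not say how the Variable Depth Distribution is computed or how it is turned into an importance score. As a rough illustration of the underlying idea only, the sketch below counts the depths at which each variable is used for splitting across the trees of a forest. The array-based tree layout, the function names (`split_depths`, `forest_depth_distribution`, `mean_depth`), and the use of mean depth as a summary are all assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict

LEAF = -1  # assumed marker for "no child" in an array-based tree layout


def split_depths(children_left, children_right, feature):
    """Return {variable index: Counter({depth: count})} for one tree.

    Node 0 is the root; a node is a leaf when its left child is LEAF.
    """
    depths = defaultdict(Counter)
    stack = [(0, 0)]  # (node id, depth of that node)
    while stack:
        node, depth = stack.pop()
        if children_left[node] == LEAF:
            continue  # leaves do not split on any variable
        depths[feature[node]][depth] += 1
        stack.append((children_left[node], depth + 1))
        stack.append((children_right[node], depth + 1))
    return depths


def forest_depth_distribution(trees):
    """Pool per-tree depth counts over a forest of (left, right, feature) triples."""
    pooled = defaultdict(Counter)
    for left, right, feat in trees:
        for var, counts in split_depths(left, right, feat).items():
            pooled[var].update(counts)  # Counter.update adds counts
    return pooled


def mean_depth(counts):
    """Average split depth of one variable (hypothetical summary statistic)."""
    total = sum(counts.values())
    return sum(d * c for d, c in counts.items()) / total


# Toy tree: the root splits on variable 0, its left child splits on variable 1.
left = [1, 3, LEAF, LEAF, LEAF]
right = [2, 4, LEAF, LEAF, LEAF]
feat = [0, 1, -2, -2, -2]  # -2 marks "no split variable" at leaves
dist = forest_depth_distribution([(left, right, feat)])
```

A variable that tends to split near the root has its counts concentrated at small depths. With scikit-learn, each fitted tree in a forest exposes arrays in this layout as `tree_.children_left`, `tree_.children_right`, and `tree_.feature`, so the same functions could be applied to the members of `estimators_`.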


Updated: 2020-11-25
