Boosting meta-learning with simulated data complexity measures
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-09-30, DOI: 10.3233/ida-194803
Luís P.F. Garcia 1, Adriano Rivolli 2, Edesio Alcoba 3, Ana C. Lorena 4, André C.P.L.F. de Carvalho 3

Meta-Learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, for Meta-Learning to be computationally efficient, extracting the meta-feature values must itself be cheap: the time required to extract the meta-features has to be weighed against the time spent running all the algorithms. One class of measures with successful results in the characterization of classification datasets estimates the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes, and the distribution of the instances within the classes. However, extracting these measures from datasets usually has a high computational cost. In this paper, we propose an empirical approach designed to decrease the cost of computing the data complexity measures while preserving their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems that use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, while the computational cost of obtaining them is significantly reduced.
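
The idea of predicting a costly complexity measure from cheaper meta-features can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian datasets, the two cheap meta-features in `simple_meta_features`, the leave-one-out 1-NN error standing in for a neighborhood-based complexity measure, and the k-nearest-neighbor meta-regressor `knn_regress`. The paper's actual meta-features, measures, and learners may differ.

```python
import math
import random


def make_dataset(rng, n, d, sep):
    """Synthetic two-class data: class means differ by `sep` on every feature."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(label * sep, 1.0) for _ in range(d)] for label in y]
    return X, y


def simple_meta_features(X, y):
    """Cheap descriptors, O(n*d). An illustrative choice, not the paper's set."""
    n, d = len(X), len(X[0])
    gaps = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        c0 = [v for v, c in zip(col, y) if c == 0]
        c1 = [v for v, c in zip(col, y) if c == 1]
        # Gap between the class means on feature j, scaled by the overall std.
        gaps.append(abs(sum(c1) / len(c1) - sum(c0) / len(c0)) / std)
    return [max(gaps), sum(gaps) / d]


def loo_1nn_error(X, y):
    """Costly measure, O(n^2*d): leave-one-out 1-NN error rate."""
    errors = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(xi, X[j]))
        errors += y[nearest] != y[i]
    return errors / len(X)


def knn_regress(train_F, train_t, query, k=3):
    """Meta-regressor: mean target of the k nearest meta-examples."""
    order = sorted(range(len(train_F)),
                   key=lambda i: math.dist(query, train_F[i]))
    nearest = order[:k]
    return sum(train_t[i] for i in nearest) / len(nearest)


# Build a small meta-base: one row of cheap meta-features per dataset,
# with the costly complexity measure as the regression target.
rng = random.Random(0)
meta_F, meta_t = [], []
for _ in range(40):
    X, y = make_dataset(rng, n=60, d=2, sep=rng.uniform(0.0, 3.0))
    meta_F.append(simple_meta_features(X, y))
    meta_t.append(loo_1nn_error(X, y))

# Hold out the last 10 datasets; for these, the measure is predicted from
# the cheap meta-features instead of being computed directly.
train_F, train_t = meta_F[:30], meta_t[:30]
preds = [knn_regress(train_F, train_t, f) for f in meta_F[30:]]
mae = sum(abs(p - t) for p, t in zip(preds, meta_t[30:])) / len(preds)
print(f"mean absolute error of the predicted measure: {mae:.3f}")
```

In this sketch the saving comes from replacing the quadratic-time `loo_1nn_error` with linear-time meta-feature extraction plus a lookup in the meta-base; the predicted values would then be used as meta-features by a downstream algorithm-recommendation system.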

Updated: 2020-10-04