Dataset2Vec: learning dataset meta-features
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-02-25, DOI: 10.1007/s10618-021-00737-9
Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka

Meta-learning, or learning to learn, is a machine learning approach that uses prior learning experiences to speed up learning on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets. These are traditionally estimated as engineered dataset statistics, which require expert domain knowledge tailored to every meta-task. In this paper, we first propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, specifically as a set of predictor/target pairs, and a DeepSet architecture is then employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data, called dataset similarity learning, that aims to predict whether two batches stem from the same dataset or from different ones. In an experiment on a large-scale hyperparameter optimization task for 120 UCI datasets with varying schemas as a meta-learning task, we show that the meta-features of Dataset2Vec outperform the expert-engineered meta-features, demonstrating for the first time the usefulness of learned meta-features for datasets with varying schemas.
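
The hierarchical-set idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the three untrained MLPs (`f`, `g`, `h`), their layer sizes, the use of mean pooling, and the exponential-distance `similarity` score are all assumptions chosen for illustration. The sketch only shows how a DeepSet-style encoder can map datasets with different numbers of predictors and targets to vectors of the same length.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random, untrained MLP weights (hypothetical layer sizes)."""
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def embed_dataset(X, Y, f, g, h):
    """Embed a dataset with predictors X (N, M) and targets Y (N, T).

    The dataset is treated as a set (over predictor/target columns)
    of sets (over instances) of scalar pairs (x, y).
    """
    # All (x[n, m], y[n, t]) pairs, arranged as shape (M, T, N, 2).
    pairs = np.stack(
        np.broadcast_arrays(X.T[:, None, :], Y.T[None, :, :]), axis=-1)
    inner = mlp(f, pairs).mean(axis=2)        # pool over instances
    outer = mlp(g, inner).mean(axis=(0, 1))   # pool over column pairs
    return mlp(h, outer)                      # final meta-feature vector

def similarity(a, b, gamma=1.0):
    """Assumed similarity score in (0, 1]: 1.0 for identical embeddings."""
    return float(np.exp(-gamma * np.linalg.norm(a - b)))

rng = np.random.default_rng(0)
f = init_mlp(rng, [2, 16, 16])
g = init_mlp(rng, [16, 16, 16])
h = init_mlp(rng, [16, 16, 8])

# Two synthetic datasets with different schemas (4 vs. 7 predictors).
A = embed_dataset(rng.normal(size=(50, 4)), rng.normal(size=(50, 1)), f, g, h)
B = embed_dataset(rng.normal(size=(30, 7)), rng.normal(size=(30, 2)), f, g, h)
print(A.shape, B.shape, similarity(A, B))
```

Because both pooling steps are means, the embedding does not depend on the order of rows or columns, and both datasets map to a vector of length 8 despite their different schemas. In a trained model, a score like `similarity` could be fitted so that batches drawn from the same dataset score higher than batches from different ones, which is the auxiliary task the abstract describes.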




Updated: 2021-02-26