Automating Outlier Detection via Meta-Learning
arXiv - CS - Machine Learning Pub Date : 2020-09-22 , DOI: arxiv-2009.10606
Yue Zhao, Ryan A. Rossi, Leman Akoglu

Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art," because model evaluation is infeasible without (i) hold-out data with labels and (ii) a universal objective function. In this work, we develop MetaOD, the first principled, data-driven approach to model selection for OD, based on meta-learning. MetaOD capitalizes on the past performance of a large body of detection models on existing outlier detection benchmark datasets and carries this prior experience over to automatically select an effective model for a new dataset. To capture task similarity, we introduce specialized meta-features that quantify the outlying characteristics of a dataset. Through comprehensive experiments, we show that MetaOD selects a detection model that significantly outperforms the most popular outlier detectors (e.g., LOF and iForest) as well as various state-of-the-art unsupervised meta-learners, while being extremely fast. To foster reproducibility and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
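The general idea of reusing past performance through dataset similarity can be illustrated with a deliberately simplified sketch. This is not MetaOD's actual algorithm: it swaps in a plain nearest-neighbor lookup, and the two meta-features, the dataset names (`bench_1`, `bench_2`), the detector names (`detector_A`, `detector_B`), and the performance numbers are all hypothetical placeholders chosen only to make the example run.

```python
import math


def meta_features(values):
    """Toy meta-features for a 1-D dataset (an assumption, not the paper's set):
    the largest absolute z-score and the fraction of points with |z| > 2."""
    n = len(values)
    mean = sum(values) / n
    # Fall back to 1.0 when the data are constant, to avoid dividing by zero.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n) or 1.0
    z = [abs(v - mean) / std for v in values]
    return (max(z), sum(1 for s in z if s > 2.0) / n)


def select_model(new_features, train_features, performance):
    """Return the most similar past dataset and its best-performing model."""
    nearest = min(
        train_features,
        key=lambda name: math.dist(new_features, train_features[name]),
    )
    scores = performance[nearest]
    return nearest, max(scores, key=scores.get)


# Hypothetical benchmark datasets: one with a single extreme point, one evenly spread.
past_datasets = {
    "bench_1": [0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.0, 5.0],
    "bench_2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
}
train_features = {name: meta_features(v) for name, v in past_datasets.items()}

# Placeholder performance scores (illustrative values, not measured results).
performance = {
    "bench_1": {"detector_A": 0.9, "detector_B": 0.6},
    "bench_2": {"detector_A": 0.5, "detector_B": 0.7},
}

# A new, unlabeled dataset whose shape resembles bench_1.
new_data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 20.0]
print(select_model(meta_features(new_data), train_features, performance))
```

No labels from the new dataset are used here; the choice rests entirely on the meta-features and the stored history, which is the property the abstract emphasizes. The nearest-neighbor rule is only the simplest possible stand-in for a meta-learner.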

Updated: 2020-09-23