A Unified Framework for Automatic Distributed Active Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-11-23, DOI: 10.1109/tpami.2021.3129793
Xu Chen, Brett Wujek

We propose a novel unified framework for automated distributed active learning (AutoDAL) that addresses several challenging problems in active learning: limited labeled data, imbalanced datasets, automatic hyperparameter selection, and scalability to big data. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes and jointly optimizing the hyperparameters of both the classification and query selection stages. For dense datasets, a clustering-based uncertainty sampling with maximum entropy (CME) loss is applied in the optimization. For sparse and imbalanced datasets, a shrinkage optimized KL-divergence regularization and local selection based active learning (SOAR) loss is further adapted in AutoDAL. The optimization is solved efficiently by iteratively executing a genetic algorithm (GA) refined with a local generating set search (GSS) and solving an integer linear programming (ILP) problem. Moreover, we propose an efficient distributed active learning algorithm that scales to big data. The proposed AutoDAL algorithm is applied to classification on multiple benchmark datasets and two real-world datasets, an electrocardiogram (ECG) dataset and a credit fraud detection dataset. We demonstrate that AutoDAL achieves significantly better performance than several state-of-the-art AutoML approaches and active learning algorithms.
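
As a rough illustration of the general idea behind clustering-based uncertainty sampling mentioned in the abstract, the sketch below scores unlabeled points by the entropy of their predicted class probabilities and queries the most uncertain point from each cluster. It is not the paper's CME loss or the AutoDAL algorithm: the function names, the toy probabilities, and the cluster assignments are all assumptions made for this example.

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(p), axis=1)

def select_queries(probs, cluster_ids, budget):
    """Hypothetical helper: keep the highest-entropy point of each cluster,
    then return up to `budget` of those points, most uncertain first."""
    entropy = prediction_entropy(probs)
    best = {}  # cluster id -> index of its most uncertain point
    for idx, c in enumerate(cluster_ids):
        if c not in best or entropy[idx] > entropy[best[c]]:
            best[c] = idx
    ranked = sorted(best.values(), key=lambda i: entropy[i], reverse=True)
    return ranked[:budget]

# Toy predicted probabilities for five unlabeled points (made-up values).
probs = np.array([
    [0.98, 0.02],
    [0.55, 0.45],
    [0.50, 0.50],
    [0.90, 0.10],
    [0.70, 0.30],
])
cluster_ids = [0, 0, 1, 1, 2]  # assumed output of some clustering step

queries = select_queries(probs, cluster_ids, budget=2)
print(queries)  # indices of the points to send for labeling
```

Taking one candidate per cluster keeps the queried points spread across the data instead of concentrating them in a single uncertain region, which is the usual motivation for combining clustering with uncertainty sampling.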

Updated: 2021-11-23