Large deviations in the perceptron model and consequences for active learning
Machine Learning: Science and Technology (IF 6.013), Pub Date: 2021-07-13, DOI: 10.1088/2632-2153/abfbbb
H. Cui, L. Saglietti, L. Zdeborová

Active learning (AL) is a branch of machine learning that deals with problems where unlabeled data is abundant yet obtaining labels is expensive. The learning algorithm can query a limited number of samples to obtain the corresponding labels, which are subsequently used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground-truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations of the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then provide optimal achievable performance bounds for any AL algorithm. We show that the optimal learning performance can be efficiently approached by simple message-passing AL algorithms. We also compare with the performance of some other popular active learning strategies.
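As an illustration of the pool-based setting described in the abstract, the sketch below sets up a teacher-student perceptron with a random Gaussian pool and labels given by the sign of a random teacher, then compares labeling a random subset with a simple uncertainty-sampling heuristic. This is a minimal assumed demo only: the names (`pool_size`, `dim`, `budget`), the plain perceptron learner, and the uncertainty heuristic are choices made for the example, not the paper's replica analysis or its message-passing algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

dim, pool_size, budget = 50, 2000, 200           # input dimension, pool size, label budget (assumed values)
X = rng.standard_normal((pool_size, dim))        # random-matrix pool of unlabeled samples
w_teacher = rng.standard_normal(dim)             # single-layer (perceptron) teacher
y = np.sign(X @ w_teacher)                       # ground-truth labels generated by the teacher

def train_perceptron(X_lab, y_lab, epochs=50):
    """Plain perceptron updates on the labeled subset (stand-in for the supervised-learning step)."""
    w = np.zeros(X_lab.shape[1])
    for _ in range(epochs):
        for x, t in zip(X_lab, y_lab):
            if np.sign(x @ w) != t:
                w += t * x
    return w

def generalization_error(w, n_test=10000):
    """Estimate the test error against the teacher on fresh Gaussian inputs."""
    X_test = rng.standard_normal((n_test, dim))
    y_test = np.sign(X_test @ w_teacher)
    return np.mean(np.sign(X_test @ w) != y_test)

# Baseline 1: label a uniformly random subset of the pool.
idx_rand = rng.choice(pool_size, size=budget, replace=False)
w_rand = train_perceptron(X[idx_rand], y[idx_rand])

# Baseline 2: greedy uncertainty sampling -- repeatedly query the pool sample
# closest to the current decision boundary (a popular AL heuristic used here
# only as a point of comparison).
labeled = list(rng.choice(pool_size, size=10, replace=False))   # small random seed set
for _ in range(budget - len(labeled)):
    w = train_perceptron(X[labeled], y[labeled], epochs=10)
    margins = np.abs(X @ w)
    margins[labeled] = np.inf                    # never re-query an already labeled sample
    labeled.append(int(np.argmin(margins)))
w_al = train_perceptron(X[labeled], y[labeled])

print(f"random selection error:     {generalization_error(w_rand):.3f}")
print(f"uncertainty sampling error: {generalization_error(w_al):.3f}")
```

In the paper's framing, the large-deviation analysis bounds the best test error reachable by any selection of a labeled subset of a given size from this pool; heuristics like the uncertainty sampler above can then be judged against that bound.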




Updated: 2021-07-13