Provable training set debugging for linear regression, Machine Learning


Provable training set debugging for linear regression
Machine Learning (IF 7.5), Pub Date: 2021-08-16, DOI: 10.1007/s10994-021-06040-4
Xiaomin Zhang₁, Xiaojin Zhu₁, Po-Ling Loh₂

We investigate problems in penalized M-estimation, inspired by applications in machine learning debugging. Data are collected from two pools: one containing data with possibly contaminated labels, and the other known to contain only cleanly labeled points. We first formulate a general statistical algorithm for identifying buggy points and provide rigorous theoretical guarantees when the data follow a linear model. We then propose an algorithm for selecting the tuning parameter of our Lasso-based algorithm, also with theoretical guarantees. Next, we consider a two-person “game” played between a bug generator and a debugger, where the debugger can augment the contaminated data set with cleanly labeled versions of points in the original data pool. We develop and analyze a debugging strategy formulated as a mixed integer linear program (MILP). Finally, we provide empirical results to verify our theoretical results and the utility of the MILP strategy.
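
As a rough illustration of what a Lasso-based debugging step could look like, the sketch below assumes a mean-shift model y = Xβ + γ + noise for the contaminated pool (γ is nonzero only on buggy points) and y_clean = X_clean β + noise for the clean pool. The abstract does not state the objective, so the weighting of the two pools, the function name `debug_lasso`, and the parameters `lam` and `eta` are all assumptions made for this example, not the authors' exact formulation. It minimizes (1/2n)‖y − Xβ − γ‖² + (η/2m)‖y_clean − X_clean β‖² + λ‖γ‖₁ by alternating a least-squares update for β with soft-thresholding for γ.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def debug_lasso(X, y, X_clean, y_clean, lam, eta=1.0, n_iter=200):
    """Hypothetical sketch: flag buggy labels via an L1 penalty on
    per-point mean shifts gamma (assumed model, not from the source).

    Returns (beta, gamma); indices with gamma != 0 are flagged as buggy.
    """
    n, m = len(y), len(y_clean)
    gamma = np.zeros(n)
    # Weight the clean pool so both pools fit into one least-squares solve.
    w = np.sqrt(eta * n / m)
    A = np.vstack([X, w * X_clean])
    for _ in range(n_iter):
        # beta step: ordinary least squares on shift-corrected labels.
        b = np.concatenate([y - gamma, w * y_clean])
        beta, *_ = np.linalg.lstsq(A, b, rcond=None)
        # gamma step: soft-threshold the residuals of the contaminated pool.
        gamma = soft_threshold(y - X @ beta, n * lam)
    return beta, gamma
```

Points whose estimated shift is nonzero, `np.flatnonzero(gamma)`, would be the candidates reported as buggy. The choice of `lam` controls how large a residual must be before a point is flagged, which is why a principled tuning procedure matters.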

Updated: 2021-08-19
