Mitigating Class-Boundary Label Uncertainty to Reduce Both Model Bias and Variance
ACM Transactions on Knowledge Discovery from Data (IF 3.6). Pub Date: 2021-03-05. DOI: 10.1145/3429447
Matthew Almeida, Yong Zhuang, Wei Ding, Scott E. Crouter, Ping Chen

The study of model bias and variance with respect to decision boundaries is critically important in supervised learning and artificial intelligence. There is generally a tradeoff between the two: fine-tuning the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization to unseen data (i.e., higher variance). By focusing only on classification-boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handling inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual entities and labeling is performed by human annotators. Uncertainty in the labels of the training data can undermine classification: extending a boundary to accommodate an inaccurately labeled point increases both bias and variance. Our method reduces both by estimating the pointwise label uncertainty of the training set and adjusting the training sample weights accordingly, so that samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples contribute less to the objective function of the model’s learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, where the data present many labeling challenges, we show that this approach improves model performance and reduces model variance.
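The core idea, estimating per-sample label uncertainty and converting it into sample weights so that uncertain points contribute less to the training objective, can be illustrated with a short sketch. The snippet below is not the authors' method; the k-nearest-neighbor label-disagreement proxy and the linear weight mapping are assumptions chosen only for illustration of uncertainty-based sample weighting.

```python
# Minimal sketch of uncertainty-based sample weighting (illustrative only).
# The k-NN disagreement score and the 1 - disagreement weighting are
# assumptions for this example, not the method described in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Toy data with overlapping classes and flipped labels, so some points
# near the class boundary carry "uncertain" labels.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           class_sep=0.8, flip_y=0.05, random_state=0)

# Uncertainty proxy: fraction of a point's k nearest neighbors whose label
# disagrees with the point's own label (high near noisy class boundaries).
k = 15
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)                       # idx[:, 0] is the point itself
disagreement = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

# Map uncertainty to weights: low uncertainty -> weight near 1,
# high uncertainty -> weight near 0, so uncertain samples pull less
# on the decision boundary through the weighted objective.
weights = 1.0 - disagreement

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=weights)            # weighted training objective
print("weighted-fit training accuracy:", clf.score(X, y))
```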

Updated: 2021-03-05