Pointly-supervised scene parsing with uncertainty mixture
Computer Vision and Image Understanding (IF 4.5) Pub Date: 2020-07-11, DOI: 10.1016/j.cviu.2020.103040
Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang

Pointly-supervised learning is an important topic for scene parsing, as dense annotation is extremely expensive and hard to scale. The state-of-the-art method harvests pseudo labels by applying thresholds to softmax outputs (logits). There are two issues with this practice: (1) the softmax output does not necessarily reflect the confidence of the network output, and (2) there is no principled way to decide on the optimal threshold, and tuning thresholds can be time-consuming for deep neural networks. Our method, by contrast, builds upon uncertainty measures instead of logits and requires no threshold tuning. We motivate the method with a large-scale analysis of the distribution of uncertainty measures, using strong models and challenging databases. This analysis leads to the discovery of a statistical phenomenon we call uncertainty mixture. Specifically, for each category, the distribution of uncertainty measures over unlabeled points is a mixture of two components (certain vs. uncertain samples). This uncertainty-mixture phenomenon is surprisingly ubiquitous in real-world datasets like PascalContext and ADE20k. Inspired by this discovery, we propose to decompose the distribution of uncertainty measures with a Gamma mixture model, leading to a principled method for harvesting reliable pseudo labels. Beyond that, we assume the uncertainty measures for labeled points are always drawn from the certain component, which amounts to a regularized Gamma mixture model. We provide a thorough theoretical analysis of this model, showing that it can be solved with an EM-style algorithm with a convergence guarantee. Our method is also empirically successful. On PascalContext and ADE20k, we achieve clear margins over the baseline, notably with no threshold tuning in the pseudo-label generation procedure. On the absolute scale, since our method combines well with strong baselines, we reach new state-of-the-art performance on both datasets.
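The core idea of decomposing per-category uncertainty measures into certain vs. uncertain components can be sketched with a two-component Gamma mixture fitted by EM. The sketch below is illustrative only and not the paper's exact (regularized) algorithm: the M-step uses weighted moment matching for the Gamma shape and scale, a common simplification of the exact maximum-likelihood update, and the component names and threshold on posterior responsibility are our own assumptions.

```python
import numpy as np
from scipy.stats import gamma


def fit_gamma_mixture(x, n_iter=100):
    """EM for a 2-component Gamma mixture over positive uncertainty measures.

    M-step: weighted moment matching (shape = m^2/v, scale = v/m),
    a simplification of the exact Gamma MLE update.
    E-step: posterior responsibilities under the current parameters.
    """
    # Initialize responsibilities by splitting at the median.
    hi = (x > np.median(x)).astype(float)
    resp = np.stack([1.0 - hi, hi], axis=1)
    pi = np.array([0.5, 0.5])
    shapes, scales = np.ones(2), np.ones(2)
    for _ in range(n_iter):
        # M-step: per-component weighted mean/variance -> Gamma(shape, scale).
        for k in range(2):
            w = resp[:, k]
            m = np.average(x, weights=w)
            v = np.average((x - m) ** 2, weights=w)
            shapes[k] = m * m / max(v, 1e-12)
            scales[k] = v / max(m, 1e-12)
            pi[k] = w.mean()
        # E-step: posterior responsibility of each component for each point.
        like = np.stack(
            [pi[k] * gamma.pdf(x, a=shapes[k], scale=scales[k]) for k in range(2)],
            axis=1,
        )
        resp = like / like.sum(axis=1, keepdims=True).clip(1e-300)
    return pi, shapes, scales, resp


# Synthetic uncertainty measures: a low-uncertainty (certain) and a
# high-uncertainty (uncertain) component.
rng = np.random.default_rng(0)
x = np.concatenate([rng.gamma(2.0, 0.5, 3000), rng.gamma(9.0, 1.0, 1000)])
pi, shapes, scales, resp = fit_gamma_mixture(x)
certain = int(np.argmin(shapes * scales))  # component with the smaller mean
mask = resp[:, certain] > 0.5              # points harvested as pseudo labels
```

Selecting points by posterior responsibility under the certain component replaces a hand-tuned threshold on softmax scores: the decision boundary falls out of the fitted mixture rather than a grid search.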




Updated: 2020-07-17