Incorporating Side Information by Adaptive Convolution
International Journal of Computer Vision (IF 19.5), Pub Date: 2020-07-02, DOI: 10.1007/s11263-020-01345-8
Di Kang, Debarun Dhar, Antoni B. Chan

Computer vision tasks often come with side information that helps solve the task. For example, in crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems that use traditional hand-crafted features, it has not been fully utilized in deep-learning-based counting systems. To incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), in which the convolution filter weights adapt to the current scene context via the side information. In particular, we model the filter weights as a low-dimensional manifold within the high-dimensional space of filter weights. The filter weights are generated by a learned “filter manifold” sub-network whose input is the side information. With the help of side information and adaptive weights, the ACNN can disentangle the variations related to the side information and extract discriminative features related to the current context (e.g., camera perspective, noise level, blur kernel parameters). We demonstrate the effectiveness of ACNN with side information on three tasks: crowd counting, corrupted digit recognition, and image deblurring. Our experiments show that ACNN improves performance over a plain CNN with a similar number of parameters, and achieves performance similar to or better than the state of the art on the crowd counting task. Since existing crowd counting datasets do not contain ground-truth side information, we collect a new dataset with the ground-truth camera angle and height as the side information. We also perform ablation experiments, mainly for crowd counting, to study how helpful the side information is and how the placement of the adaptive convolutional layers affects performance, in order to gain insight into ACNNs.
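The idea of generating filter weights from side information can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the sub-network's architecture, so the two-layer MLP, the tanh activation, the layer sizes, the random untrained weights, and all function names below (`make_filter_manifold`, `generate_filter`, `adaptive_conv`) are assumptions chosen only to show the data flow from side information to filter to feature map.

```python
import math
import random


def make_filter_manifold(side_dim, hidden_dim, kernel_size, seed=0):
    """Randomly initialise a tiny 2-layer MLP (hypothetical architecture).

    In a real ACNN these weights would be learned; here they are random
    so that the sketch stays self-contained.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden_dim, side_dim),
        "b1": [0.0] * hidden_dim,
        "W2": matrix(kernel_size * kernel_size, hidden_dim),
        "b2": [0.0] * (kernel_size * kernel_size),
        "k": kernel_size,
    }


def generate_filter(params, side_info):
    """Map a side-information vector to a k x k filter (a point on the 'manifold')."""
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, side_info)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    flat = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["W2"], params["b2"])
    ]
    k = params["k"]
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, as used in CNN conv layers."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def adaptive_conv(params, image, side_info):
    """Filter the image with weights generated from the side information."""
    return conv2d_valid(image, generate_filter(params, side_info))


# Toy usage: the same image is filtered differently under two
# made-up (normalised) camera angle/height settings.
params = make_filter_manifold(side_dim=2, hidden_dim=4, kernel_size=3)
image = [[float((r * 5 + c) % 7) for c in range(5)] for r in range(5)]
low_camera = [0.2, 0.5]   # hypothetical side information
high_camera = [0.8, 0.1]  # hypothetical side information
out_low = adaptive_conv(params, image, low_camera)
out_high = adaptive_conv(params, image, high_camera)
```

Because the side information has only a few dimensions, the generated filters vary along a correspondingly low-dimensional surface in the space of all possible filter weights, which is the intuition behind the "filter manifold" described in the abstract. In a trained network, the sub-network's parameters would be learned jointly with the rest of the model rather than drawn at random.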


