CRF with deep class embedding for large scale classification,Computer Vision and Image Understanding


CRF with deep class embedding for large scale classification
Computer Vision and Image Understanding (IF 4.3), Pub Date: 2019-11-06, DOI: 10.1016/j.cviu.2019.102865
Eran Goldman, Jacob Goldberger

This paper presents a novel deep learning architecture for classifying structured objects in ultrafine-grained datasets, where classes may not be clearly distinguishable by their appearance but rather by their context. We model sequences of images as linear-chain CRFs, and jointly learn the parameters from both local-visual features and neighboring class information. The visual features are learned by convolutional layers, whereas class-structure information is reparametrized by factorizing the CRF pairwise potential matrix. This forms a context-based semantic similarity space, learned alongside the visual similarities, and dramatically increases the learning capacity of contextual information. This new parametrization, however, forms a highly nonlinear objective function which is challenging to optimize. To overcome this, we develop a novel surrogate likelihood which allows for a local likelihood approximation of the original CRF with integrated batch-normalization. This model overcomes the difficulties of existing CRF methods to learn the contextual relationships thoroughly when there is a large number of classes and the data is sparse. The performance of the proposed method is illustrated on a huge dataset that contains images of retail-store product displays, and shows significantly improved results compared to linear CRF parametrization, unnormalized likelihood optimization, and RNN modeling. We also show improved results on a standard OCR dataset.
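To make the factorized pairwise potential concrete, the following is a minimal sketch and not the authors' implementation. The abstract states only that the CRF pairwise potential matrix is factorized; the specific form A = U Vᵀ, the names `U`, `V`, `pairwise_from_embeddings`, and `crf_log_likelihood`, and the embedding dimension `d` are assumptions made for illustration. The sketch computes the exact linear-chain CRF log-likelihood with the forward algorithm; it does not include the convolutional feature extractor, the surrogate likelihood, or the batch normalization described above.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def pairwise_from_embeddings(U, V):
    """Assumed low-rank factorization: A[i, j] = U[i] . V[j].

    U, V: (K, d) class embeddings, so A has K*K entries but only 2*K*d parameters.
    """
    return U @ V.T


def crf_log_likelihood(unary, pairwise, labels):
    """Exact log p(labels | unary, pairwise) for a linear-chain CRF.

    unary:    (T, K) local scores (e.g. visual logits per position)
    pairwise: (K, K) transition scores between neighboring classes
    labels:   length-T integer class sequence
    """
    labels = np.asarray(labels)
    T, _ = unary.shape

    # Unnormalized score of the given label path.
    score = unary[np.arange(T), labels].sum()
    score += pairwise[labels[:-1], labels[1:]].sum()

    # Forward recursion for the log-partition function.
    alpha = unary[0]
    for t in range(1, T):
        # alpha[i] + pairwise[i, j], reduced over the previous class i.
        alpha = unary[t] + logsumexp(alpha[:, None] + pairwise, axis=0)
    log_z = logsumexp(alpha, axis=0)

    return score - log_z
```

With K classes and embedding dimension d much smaller than K, the factorized form needs 2Kd parameters instead of K², which is one plausible reading of why such a parametrization helps when there are many classes and sparse data.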


Updated: 2020-01-04
