Deep Mixture of Diverse Experts for Large-Scale Visual Recognition,IEEE Transactions on Pattern Analysis and Machine Intelligence


Deep Mixture of Diverse Experts for Large-Scale Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 4-20-2018, DOI: 10.1109/tpami.2018.2828821
Tianyi Zhao, Qiuyu Chen, Zhenzhong Kuang, Jun Yu, Wei Zhang, Jianping Fan

In this paper, a deep mixture of diverse experts algorithm is developed to learn a huge (mixture) network more efficiently for large-scale visual recognition. First, a two-layer ontology is constructed to assign large numbers of atomic object classes to a set of task groups according to the similarities of their learning complexities, where a certain degree of inter-group task overlap is allowed to enable sufficient inter-group message passing. Second, one base deep CNN with M+1 outputs is learned for each task group to recognize its M atomic object classes and to identify one special "not-in-group" class, where the network structure (number of layers and number of units in each layer) of a well-designed deep CNN (such as AlexNet, VGG, GoogleNet, or ResNet) is directly used to configure each base deep CNN. To enhance the separability of the atomic object classes in the same task group, two approaches are developed to learn more discriminative base deep CNNs: (a) our deep multi-task learning algorithm, which exploits the inter-class visual similarities; (b) our two-layer network cascade approach, which improves the accuracy rates for the hard object classes to a certain degree while maintaining the high accuracy rates for the easy ones. Finally, all these complementary base deep CNNs with diverse but overlapping outputs are combined to generate a mixture network with a larger output space for recognizing tens of thousands of atomic object classes. Our experimental results demonstrate that the deep mixture of diverse experts algorithm achieves very competitive results on large-scale visual recognition.
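
The final combination step can be illustrated with a minimal sketch. The abstract does not state how the overlapping outputs are fused, so the rule used here (a softmax over each expert's M+1 outputs, followed by averaging the scores of every class that appears in more than one task group) is an assumption made for illustration, not the authors' method. The names `softmax` and `combine_experts` and the toy group layout are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(groups, logits_per_group, num_classes):
    """Fuse per-group expert outputs into one score per global class.

    groups[g] lists the global class ids handled by expert g (M entries).
    logits_per_group[g] holds M + 1 logits; the last one is the
    "not-in-group" output, which is dropped after the softmax.
    Assumption: classes shared by several groups get the mean of their scores.
    """
    score_sum = [0.0] * num_classes
    votes = [0] * num_classes
    for class_ids, logits in zip(groups, logits_per_group):
        if len(logits) != len(class_ids) + 1:
            raise ValueError("each expert needs M + 1 outputs for M classes")
        probs = softmax(logits)
        # The "not-in-group" mass has already lowered the in-group
        # probabilities, so an expert that rejects the input adds little.
        for class_id, prob in zip(class_ids, probs[:-1]):
            score_sum[class_id] += prob
            votes[class_id] += 1
    return [s / v if v else 0.0 for s, v in zip(score_sum, votes)]


# Toy example: 4 global classes, two task groups that overlap on class 2.
groups = [[0, 1, 2], [2, 3]]
logits_per_group = [
    [2.0, 0.0, 0.0, -2.0],  # expert 0 favours class 0
    [0.0, 0.0, 5.0],        # expert 1 says "not-in-group"
]
scores = combine_experts(groups, logits_per_group, num_classes=4)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

In this toy run the second expert puts most of its probability on "not-in-group", so its in-group classes receive small scores and the prediction is driven by the first expert.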


Updated: 2024-08-22
