Conditional information gain networks as sparse mixture of experts
Pattern Recognition (IF 8) Pub Date: 2021-07-10, DOI: 10.1016/j.patcog.2021.108151
Ufuk Can Bicici, Lale Akarun

Deep neural network models owe their representational power and high performance in classification tasks to their large number of learnable parameters. Running such models in resource-limited environments is therefore problematic. Models employing conditional computing aim to reduce the computational burden while retaining performance on par with more complex neural network models. In this paper, we propose a new model, the Conditional Information Gain Network (CIGN). A CIGN is a neural tree in which parts of the tree can be skipped conditionally, based on routing mechanisms inserted into the architecture. These routing mechanisms are driven by differentiable information gain objectives. The CIGN groups semantically similar samples in the leaves, so that simpler classifiers can focus on differentiating between similar classes. This lets the CIGN attain high classification performance with lighter models. We further improve the basic CIGN by proposing a sparse mixture-of-experts scheme for difficult-to-classify samples, which may otherwise be routed to suboptimal branches. If a sample's routing confidence exceeds a specific threshold, the sample may be routed to multiple child nodes, and the classification decision is then taken as the mixture of these expert decisions. We learn the optimal routing thresholds by Bayesian optimization over a validation set, minimizing a weighted loss that combines the classification accuracy and the number of multiply-accumulate operations (MACs). We demonstrate the effectiveness of CIGN models enhanced with the sparse mixture-of-experts approach through extensive tests on the MNIST, Fashion MNIST, CIFAR 100 and UCI-USPS datasets, as well as comparisons with methods from the literature. The CIGN models retain high generalization performance, on par with a thick unconditional model, while keeping the computational burden at the level of a much thinner model.
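To make the routing mechanism described above concrete, the following is a minimal NumPy sketch of the thresholded routing and mixture step as we read it from the abstract: a sample always follows its most likely child, any additional child whose routing probability exceeds a threshold is also activated, and the activated leaves' class posteriors are mixed using the renormalized routing probabilities. This is an illustrative sketch under our own assumptions, not the authors' implementation; names such as `route_sample`, `mixture_prediction`, and the toy numbers are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def route_sample(routing_logits, threshold):
    """Return the child nodes a sample is sent to, plus its routing probabilities.

    The sample always goes to its most likely child; any other child whose
    routing probability exceeds `threshold` is also activated (sparse
    mixture of experts for hard-to-route samples)."""
    p = softmax(routing_logits)
    best = int(np.argmax(p))
    children = {best}
    for k, pk in enumerate(p):
        if k != best and pk >= threshold:
            children.add(k)
    return sorted(children), p

def mixture_prediction(leaf_posteriors, routing_probs, active_children):
    """Mix the class posteriors of the activated leaves, weighted by their
    renormalized routing probabilities."""
    w = routing_probs[active_children]
    w = w / w.sum()
    preds = np.stack([leaf_posteriors[c] for c in active_children])
    return (w[:, None] * preds).sum(axis=0)

# Toy example: a 3-way routing node and 10 classes.
rng = np.random.default_rng(0)
logits = np.array([2.0, 1.6, -1.0])            # ambiguous between children 0 and 1
leaf_posteriors = softmax(rng.normal(size=(3, 10)))
active, p = route_sample(logits, threshold=0.3)
print("activated children:", active)           # -> [0, 1]
print("mixture posterior:", mixture_prediction(leaf_posteriors, p, active))

# The thresholds themselves are tuned (per the abstract, by Bayesian
# optimization over a validation set) to minimize a weighted objective that
# trades classification accuracy against multiply-accumulate (MAC) cost,
# e.g. loss = (1 - accuracy) + lambda_mac * mac_count (form assumed here).
```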

Updated: 2021-07-25