Mutual Information Regularized Feature-Level Frankenstein for Discriminative Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2021-05-04, DOI: 10.1109/tpami.2021.3077397
Xiaofeng Liu, Yang Chao, Jane J. You, C.-C. Jay Kuo, Bhagavatula Vijayakumar

Deep learning recognition approaches can potentially perform better if we can extract a discriminative representation that controllably separates nuisance factors. In this paper, we propose a novel approach to explicitly enforce the extracted discriminative representation $\boldsymbol{d}$, the extracted latent variation $\boldsymbol{l}$ (e.g., background, unlabeled nuisance attributes), and the semantic variation label vector $\boldsymbol{s}$ (e.g., labeled expressions/pose) to be independent of and complementary to each other. We cast this problem as an adversarial game in the latent space of an auto-encoder. Specifically, with the to-be-disentangled $\boldsymbol{s}$, we propose to equip an end-to-end conditional adversarial network with the ability to decompose an input sample into $\boldsymbol{d}$ and $\boldsymbol{l}$. However, we argue that maximizing the cross-entropy loss of semantic variation prediction from $\boldsymbol{d}$ is not sufficient to remove the impact of $\boldsymbol{s}$ from $\boldsymbol{d}$, and that uniform-target and entropy regularization are necessary. A collaborative mutual information regularization framework is further proposed to avoid unstable adversarial training. It minimizes a differentiable mutual information between the variables to enforce independence. The proposed discriminative representation inherits the desired tolerance property guided by prior knowledge of the task. Our proposed framework achieves top performance on diverse recognition tasks, including digit classification, large-scale face recognition on the LFW and IJB-A datasets, and face recognition tolerant to changes in lighting, makeup, disguise, etc.
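
The abstract's claim about cross-entropy can be illustrated with a toy calculation. The sketch below is not the paper's loss; the abstract does not give the exact terms. It assumes a hypothetical 4-class semantic label and two hand-picked predictor outputs, and shows that simply maximizing cross-entropy against the true label rewards a confidently wrong (still peaked, hence still informative) prediction, whereas a cross-entropy against a uniform target is smallest, and the prediction entropy largest, when the predictor outputs a uniform distribution.

```python
import math

def cross_entropy(target, pred):
    """H(target, pred) = -sum_k target_k * log(pred_k)."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def entropy(pred):
    """Shannon entropy of a predicted distribution."""
    return -sum(p * math.log(p) for p in pred if p > 0)

K = 4                                  # hypothetical number of semantic classes
true_label = [1.0, 0.0, 0.0, 0.0]      # one-hot label for s (assumed example)
uniform = [1.0 / K] * K

# Two illustrative outputs of a predictor that guesses s from d.
confident_wrong = [0.01, 0.97, 0.01, 0.01]  # peaked on a wrong class
uninformative = uniform                      # carries no preference at all

# Maximizing cross-entropy against the true label prefers the peaked output.
ce_wrong = cross_entropy(true_label, confident_wrong)
ce_flat = cross_entropy(true_label, uninformative)

# A uniform-target loss is minimized by the uninformative output instead.
ut_wrong = cross_entropy(uniform, confident_wrong)
ut_flat = cross_entropy(uniform, uninformative)

print("CE vs. true label:   wrong=%.3f flat=%.3f" % (ce_wrong, ce_flat))
print("CE vs. uniform:      wrong=%.3f flat=%.3f" % (ut_wrong, ut_flat))
print("prediction entropy:  wrong=%.3f flat=%.3f"
      % (entropy(confident_wrong), entropy(uninformative)))
```

In this toy setting the uniform-target value for the flat prediction equals log K, which is also the maximum possible prediction entropy, so both regularizers point toward the same uninformative predictor.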

Updated: 2021-05-04