GMFAD: Towards Generalized Visual Recognition via Multilayer Feature Alignment and Disentanglement
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2020-09-01. DOI: 10.1109/tpami.2020.3020554
Haoliang Li, Shiqi Wang, Renjie Wan, Alex C. Kot

Deep learning based approaches, which have repeatedly been shown to benefit visual recognition tasks, usually rest on the strong assumption that training and test data are drawn from similar feature spaces and distributions. However, this assumption does not always hold in practical visual recognition scenarios. Inspired by the hierarchical organization of deep feature representations, in which features become progressively more abstract at higher layers, we propose to tackle this problem with a novel feature learning framework, called GMFAD, that achieves better generalization capability in a multilayer perceptron manner. We first learn feature representations at the shallow layer, where underlying factors shared across domains (a subset of which may be relevant to each particular domain) can be explored. In particular, we propose to align the divergence between domain pair(s) by considering both inter-dimension and inter-sample correlations, which many cross-domain visual recognition methods have largely ignored. Subsequently, to learn more abstract information that further benefits transferability, we propose to conduct feature disentanglement at the deep feature layer. Extensive experiments on different visual recognition tasks demonstrate that our framework learns more transferable feature representations than state-of-the-art baselines.
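The abstract does not specify the exact alignment losses, so the following is only a minimal sketch of how the two kinds of shallow-layer alignment it mentions could be realized: inter-dimension correlation alignment via CORAL-style covariance matching, and inter-sample correlation alignment via normalized Gram (sample-similarity) matrices. Function names, the normalization choices, and the equal-batch-size assumption are illustrative, not the paper's actual formulation.

```python
import numpy as np

def inter_dimension_loss(Xs, Xt):
    """Align inter-dimension correlations between two domains.

    CORAL-style sketch: penalize the squared difference between the
    feature covariance matrices of source Xs and target Xt, each of
    shape (n_samples, d).
    """
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False)
    Ct = np.cov(Xt, rowvar=False)
    return float(np.sum((Cs - Ct) ** 2)) / (4.0 * d * d)

def inter_sample_loss(Xs, Xt):
    """Align inter-sample correlations between two domains.

    Sketch: compare Frobenius-normalized Gram matrices, which capture
    pairwise sample similarity within each batch. Assumes both batches
    have the same number of samples so the Gram matrices are comparable.
    """
    Gs = Xs @ Xs.T
    Gt = Xt @ Xt.T
    Gs = Gs / np.linalg.norm(Gs)
    Gt = Gt / np.linalg.norm(Gt)
    return float(np.sum((Gs - Gt) ** 2))

# Toy check: a target batch from a rescaled distribution should incur
# a larger inter-dimension loss than one from the same distribution.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(64, 16))
Xt_same = rng.normal(size=(64, 16))
Xt_shift = rng.normal(size=(64, 16)) * 3.0
print(inter_dimension_loss(Xs, Xt_same) < inter_dimension_loss(Xs, Xt_shift))
```

In a full pipeline, such losses would be added to the task loss and minimized jointly with the shallow-layer feature extractor; the deep-layer disentanglement stage the abstract describes is a separate component not sketched here.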
