Adversarial Attribute-Text Embedding for Person Search with Natural Language Query, IEEE Transactions on Multimedia



IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-07-01, DOI: 10.1109/tmm.2020.2972168
Zheng-Jun Zha, Jiawei Liu, Di Chen, Feng Wu

The emerging task of person search with natural language query aims to retrieve a target pedestrian from a text description of that pedestrian. It is more widely applicable than person search with an image or video query, i.e., person re-identification. In this paper, we propose a novel Adversarial Attribute-Text Embedding (AATE) network for person search with text query. In particular, a cross-modal adversarial learning module is proposed to learn discriminative and modality-invariant visual-textual features. It consists of a cross-modal learner and a modality discriminator, which play a min-max game in an adversarial manner. The former improves intra-modality discrimination and inter-modality invariance in order to confuse the modality discriminator. The latter distinguishes features from different modalities and thereby boosts the learning of modality-invariant features. Moreover, a visual attribute graph convolutional network is proposed to learn visual attributes of pedestrians, which are more descriptive, interpretable, and robust than pedestrian appearance features. A hierarchical text embedding network, consisting of multi-stacked bidirectional LSTMs and a textual attention block, is developed to extract effective textual features from text descriptions of pedestrians. Extensive experimental results on two challenging benchmarks demonstrate the effectiveness of the proposed approach.
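The min-max game between the cross-modal learner and the modality discriminator can be illustrated with a minimal sketch. The abstract does not give the paper's loss functions, so everything below is an assumption: a hypothetical linear discriminator, a standard binary cross-entropy objective, and a learner whose adversarial term is simply the negated discriminator loss. The learner's intra-modality discrimination objective is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminator_prob(feature, weights, bias):
    """Hypothetical linear modality discriminator: P(feature is visual)."""
    return sigmoid(sum(w * x for w, x in zip(weights, feature)) + bias)

def discriminator_loss(visual_feats, text_feats, weights, bias, eps=1e-12):
    """Binary cross-entropy with visual features labelled 1 and textual features labelled 0.

    The discriminator tries to minimise this value.
    """
    loss = 0.0
    for f in visual_feats:
        loss -= math.log(discriminator_prob(f, weights, bias) + eps)
    for f in text_feats:
        loss -= math.log(1.0 - discriminator_prob(f, weights, bias) + eps)
    return loss / (len(visual_feats) + len(text_feats))

def learner_adversarial_loss(visual_feats, text_feats, weights, bias):
    """Assumed adversarial term for the learner: the negated discriminator loss.

    Minimising it pushes the two modalities toward features the
    discriminator cannot tell apart.
    """
    return -discriminator_loss(visual_feats, text_feats, weights, bias)

# Toy 2-D features (made up for illustration only).
weights, bias = [2.0, 0.0], 0.0
separable = discriminator_loss([[1.0, 0.0]], [[-1.0, 0.0]], weights, bias)
invariant = discriminator_loss([[1.0, 0.0]], [[1.0, 0.0]], weights, bias)
print(separable < math.log(2) <= invariant)
```

In this toy setting, modality-separable features let the discriminator push its loss below log 2, whereas identical (modality-invariant) features keep the loss at or above log 2 for this discriminator, which is the outcome the learner is working toward.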


Updated: 2020-07-01
