Robust deep alignment network with remote sensing knowledge graph for zero-shot and generalized zero-shot remote sensing image scene classification
Introduction
Benefiting from rapid advances in aerospace, sensor and communication technologies, human beings have entered an era of remote sensing (RS) big data (Chi et al., 2016, Li et al., 2021a, Lobry et al., 2020). Accurate automatic classification of these massive RS images is a basic but important task for mining the value of RS big data (Cheng and Han, 2016, Gu et al., 2019, Li et al., 2020, Marcos et al., 2018). As the spatial resolution of RS imagery improves, pixel-level and object-level classification methods show great limitations (Blaschke, 2010, Li et al., 2016, Cheng et al., 2017). As a consequence, more attention has been given to scene-level RS image classification due to its stable classification performance and its wide applications in natural disaster monitoring (Cheng et al., 2013), multimodal data fusion (Gerke et al., 2014), functional zone classification (Zhang et al., 2018), object detection (Tao et al., 2019a, Tao et al., 2019b), and image retrieval (Demir and Bruzzone, 2016, Li et al., 2018).
To date, deep learning (LeCun et al., 2015) has greatly improved RS image scene classification (Li et al., 2021c, Zhang et al., 2016). However, current deep learning models perform well only when each scene category has sufficient samples. In the era of RS big data, the number of RS scene categories is growing explosively, and it is unrealistic to collect sufficient RS image samples and label them for all categories at once. Hence, identifying RS image scenes that never appear in the training stage has important practical value (Li et al., 2017a). Inspired by humans’ inference ability, embedding prior knowledge into the learning process is an ideal way to address this issue (Li et al., 2021b).
In the literature, the development of zero-shot learning (ZSL) (Larochelle et al., 2008, Palatucci et al., 2009, Ji et al., 2020) in recent years has provided promising solutions for recognizing samples from unseen categories. By leveraging prior knowledge of both seen and unseen categories as auxiliary information, ZSL learns from samples of seen categories to identify samples from unseen categories. Generally, the semantic information of seen and unseen classes reflects human common sense: it is universal and available in both the training and testing stages, whereas image samples of unseen classes do not exist in the training stage. Hence, how semantics are expressed is key to the performance of ZSL. For example, one can recognize a zebra image from images of tigers, pandas and horses, combined with semantic information such as tiger stripes, panda colors and horse shapes. This intuition also shows the indispensable role of semantic information in the ZSL task. As an extension of ZSL, generalized zero-shot learning (GZSL) learns from samples of seen categories to recognize both seen and unseen samples in the testing stage, which is a more challenging but practical task. In the field of computer vision, many ZSL and GZSL methods have been proposed. In contrast, ZSL and GZSL are rarely discussed in the field of RS (Sumbul et al., 2017). Compared with the computer vision field, the following characteristics of the RS field limit the development of ZSL and GZSL. On the one hand, the names of RS scene categories are often domain specific. If the semantic representations of RS scene categories are generated by directly applying a general natural language processing model (e.g., Word2Vec) to the names of RS scene categories, they cannot reflect the intrinsic semantic information of RS categories.
On the other hand, RS image scenes, which present large intraclass differences and large interclass similarities, generally have more complex appearances than natural images in the computer vision field. Consequently, ZSL and GZSL methods that achieve excellent results in computer vision cannot be directly extended to the RS domain. Overall, zero-shot and generalized zero-shot RS image scene classification deserves much more exploration.
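The ZSL inference idea described above (recognizing an unseen class via its semantic description) can be sketched as nearest-neighbor search in a semantic space. The projection matrix, feature dimensions and class names below are all hypothetical; in a real system the projection would be learned from seen-category data:

```python
import numpy as np

def zsl_classify(visual_feat, class_semantics, W):
    """Project a visual feature into the semantic space and return the
    unseen class whose semantic vector is most similar (cosine)."""
    projected = W @ visual_feat                      # visual -> semantic space
    projected = projected / np.linalg.norm(projected)
    names, sims = [], []
    for name, sem in class_semantics.items():
        sem = sem / np.linalg.norm(sem)
        names.append(name)
        sims.append(float(projected @ sem))
    return names[int(np.argmax(sims))]

# Toy example: 4-D visual features, 3-D semantic vectors for two unseen classes.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))                      # stand-in for a learned projection
unseen = {"airport": rng.standard_normal(3), "beach": rng.standard_normal(3)}
x = rng.standard_normal(4)                           # visual feature of a test scene
print(zsl_classify(x, unseen, W))
```

Because unseen classes never contribute training images, the only bridge between them and the visual domain is the semantic vector, which is why its quality matters so much.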
With the aforementioned considerations, this paper focuses on zero-shot and generalized zero-shot RS image scene classification. The quality of the semantic representation of categories plays an important role in ZSL and GZSL (Li et al., 2017a, Li et al., 2017b, Li et al., 2017c). To generate high-quality semantic representations of RS scene categories, this paper constructs a new remote sensing knowledge graph (RSKG) based on domain prior knowledge from human experts, where RSKG fully considers the rich connections between RS scene elements. To the best of our knowledge, this paper is the first to propose calculating the Semantic Representations of RS scene categories by representation learning of RSKG (SR-RSKG). Based on SR-RSKG, this paper proposes a new deep alignment network (DAN) with a series of well-designed constraints, which robustly matches visual features and semantic representations in the latent space, to address zero-shot and generalized zero-shot RS image scene classification. Experimental results on an integrated RS image scene dataset show that the proposed SR-RSKG is superior to traditional knowledge types (e.g., Word2Vec (Mikolov et al., 2013), BERT (Devlin et al., 2018), and manually annotated attribute vectors). In addition, the proposed DAN outperforms state-of-the-art methods under both the ZSL and GZSL settings. The major contributions of this paper are summarized as follows.
- 1)
To the best of our knowledge, this paper, for the first time, proposes to generate the semantic representations of RS scene categories by representation learning of RSKG. Extensive experiments verify its superiority compared with traditional prior knowledge types. The constructed RSKG will be made publicly available along with this paper.
- 2)
By pursuing the stable cross-modal alignment of the same category and scattered distribution of different categories, this paper proposes a novel DAN to robustly match visual features and semantic features in the latent space. Extensive experiments show that the proposed DAN outperforms the existing methods under both the ZSL and GZSL settings.
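Representation learning over a knowledge graph such as RSKG is commonly done with translational embedding models; the reference list includes TransE ("Translating embeddings for modeling multi-relational data"), whose margin-based update can be sketched as follows. The entities, relation and hyperparameters are purely illustrative, not the paper's actual RSKG configuration:

```python
import numpy as np

def transe_step(ent, rel, triple, neg_tail, lr=0.01, margin=1.0):
    """One margin-based SGD step of TransE: push h + r close to t
    and away from a corrupted tail t'."""
    h, r, t = triple
    d_pos = ent[h] + rel[r] - ent[t]
    d_neg = ent[h] + rel[r] - ent[neg_tail]
    loss = margin + np.linalg.norm(d_pos) - np.linalg.norm(d_neg)
    if loss > 0:
        g_pos = d_pos / (np.linalg.norm(d_pos) + 1e-9)
        g_neg = d_neg / (np.linalg.norm(d_neg) + 1e-9)
        ent[h] -= lr * (g_pos - g_neg)
        rel[r] -= lr * (g_pos - g_neg)
        ent[t] += lr * g_pos
        ent[neg_tail] -= lr * g_neg
    return max(loss, 0.0)

rng = np.random.default_rng(1)
ent = {e: rng.standard_normal(8) for e in ["airport", "runway", "beach"]}
rel = {"contains": rng.standard_normal(8)}
# Train on the triple (airport, contains, runway), corrupting the tail with "beach".
for _ in range(200):
    transe_step(ent, rel, ("airport", "contains", "runway"), "beach")
```

After training, the learned entity vectors for scene categories serve as their semantic representations, so categories connected by many shared relations in the graph end up close in the embedding space.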
The remainder of this paper is organized as follows. Section 2 discusses the related works. Section 3 introduces the construction process of RSKG and depicts representation learning of RSKG. Section 4 introduces the DAN model in detail. Section 5 summarizes the experimental results. Finally, the conclusion is detailed in Section 6.
Section snippets
Related work
In this section, we briefly review the most relevant works in the literature that include semantic representations of RS scene categories and zero-shot RS image scene classification.
Representation learning of remote sensing knowledge graph
In this section, we first introduce the construction process of RSKG and then discuss representation learning of RSKG.
Robust deep alignment network for zero-shot and generalized zero-shot remote sensing image scene classification
Section 4.1 introduces the definition of ZSL and GZSL. In Section 4.2, we clarify the robust deep alignment network for zero-shot and generalized zero-shot RS image scene classification. In addition, we introduce the process of classifying RS image scenes from unseen categories.
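The full network lies beyond this snippet, but the alignment objective stated in the introduction (stable cross-modal alignment within a category, scattered distribution across categories) can be sketched as a margin-based pairwise loss. All names and the margin value are hypothetical:

```python
import numpy as np

def alignment_loss(visual, semantic, labels, margin=1.0):
    """Pull visual/semantic embeddings of the same class together;
    push cross-class pairs at least `margin` apart."""
    loss, n = 0.0, len(labels)
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(visual[i] - semantic[j])
            if labels[i] == labels[j]:
                loss += d ** 2                      # same class: align
            else:
                loss += max(0.0, margin - d) ** 2   # different class: scatter
    return loss / (n * n)

# Perfectly aligned, well-separated latent embeddings give zero loss.
v = np.array([[0.0, 0.0], [3.0, 0.0]])
s = np.array([[0.0, 0.0], [3.0, 0.0]])
print(alignment_loss(v, s, ["airport", "beach"]))   # → 0.0
```

Minimizing such a loss jointly over visual and semantic encoders drives both modalities into a shared latent space where unseen-category semantics can act as class prototypes.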
Experimental analysis and discussion
In this section, we design extensive experiments to evaluate our proposed approach. In Section 5.1, we introduce the experimental settings. Then, we analyze the sensitivity of critical parameters in our proposed approach in Section 5.2. Finally, we compare our method with the state-of-the-art methods in Section 5.3.
Conclusion
Driven by the increasing practical demands of ZSL and GZSL in the RS field, this paper mainly focuses on zero-shot and generalized zero-shot RS image scene classification. Considering that natural language processing models based on generalized corpora have poor performance in describing RS-oriented scene categories appropriately, this paper, for the first time, proposes to generate semantic representations of RS scene categories through representation learning of RSKG and applies them to
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0505003; the National Natural Science Foundation of China under Grant 41971284; the State Key Program of the National Natural Science Foundation of China under Grants 42030102 and 92038301; the Foundation for Innovative Research Groups of the Natural Science Foundation of Hubei Province under Grant 2020CFA003; the China Postdoctoral Science Foundation under Grants 2016M590716 and
References (67)
- Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing (2010)
- A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing (2016)
- YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence (2013)
- Accurate cloud detection in high-resolution remote sensing imagery by weakly supervised deep learning. Remote Sensing of Environment (2020)
- Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation. ISPRS Journal of Photogrammetry and Remote Sensing (2021)
- Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS Journal of Photogrammetry and Remote Sensing (2018)
- Spatial information inference net: Road extraction using road-specific contextual information. ISPRS Journal of Photogrammetry and Remote Sensing (2019)
- Linking OpenStreetMap with knowledge graphs—link discovery for schema-agnostic volunteered geographic information. Future Generation Computer Systems (2021)
- Integrating bottom-up classification and top-down feedback for improving urban land-cover and functional-zone mapping. Remote Sensing of Environment (2018)
- PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS Journal of Photogrammetry and Remote Sensing (2018)
- DBpedia: A nucleus for a web of open data. The Semantic Web
- Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics
- Freebase: A collaboratively created graph database for structuring human knowledge
- Translating embeddings for modeling multi-relational data. Proceedings of Neural Information Processing Systems
- A simple framework for contrastive learning of visual representations. International Conference on Machine Learning, PMLR
- Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA. International Journal of Remote Sensing
- Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE
- Big data for remote sensing: Challenges and opportunities. Proceedings of the IEEE
- A conceptual framework for modelling spatial relations. Information Technology and Control
- Hashing-based scalable remote sensing image search and retrieval in large archives. IEEE Transactions on Geoscience and Remote Sensing
- Convolutional 2D knowledge graph embeddings. Proceedings of the AAAI Conference on Artificial Intelligence
- Creativity inspired zero-shot learning
- Introducing Wikidata to the linked data web
- A survey on deep learning-driven remote sensing image scene understanding: Scene classification, scene retrieval and scene-guided object detection. Applied Sciences
- Deep multimodal representation learning: A survey. IEEE Access
- Deep residual learning for image recognition
- Momentum contrast for unsupervised visual representation learning
- Relation network for multilabel aerial image classification. IEEE Transactions on Geoscience and Remote Sensing
- Deep ranking for image zero-shot multi-label classification. IEEE Transactions on Image Processing
- Semantic autoencoder for zero-shot learning