Deep-STaR: Classification of image time series based on spatio-temporal representations
Graphical abstract
Introduction
The multiplicity of sensors, coupled with society's appetite (e.g., industrial, scientific, leisure) for image content, leads to the production of massive amounts of visual data. These data have to be processed, analyzed and understood automatically for indexing or classification purposes. In some cases, this visual data takes the form of temporal data, when the sensors produce images of a scene at different times.
Such data sources are varied and many applications could benefit from them. In remote sensing, optical satellite sensors image certain regions every week. These data are used for environmental studies or land-cover mapping. For example, the Sentinel-2 Earth Observation satellite constellation provides image sequences over the same geographical area with high spatial, spectral and temporal resolutions around the globe (Drusch et al., 2012). In medicine, radiology imaging devices are used to follow, month after month, the evolution of a pathology in a patient for longitudinal studies (Madhyastha et al., 2018). In biology, a camera fixed on a microscope can be employed to analyze cell development (Stuurman and Vale, 2016), etc.
The produced data carry rich spatial and temporal information that must be taken into account to understand particular phenomena that are not observable from a single image of the sequence (e.g., seasonal development of vegetation from satellite images, tumor remission in medicine) (Ren et al., 2009, Sumpter and Bulpitt, 2000, Weng et al., 2019).
Whether considering a stack of images or a video, we will denote these data as Image Time Series (ITS) in the following. An ITS is basically a set of images of the same scene, ordered chronologically. It can be encoded as a data-cube with two spatial dimensions and one temporal dimension. An ITS can be acquired with one or multiple sensors, to obtain a longer data series with a higher temporal frequency.
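To fix ideas, the data-cube encoding can be illustrated with a small toy example (synthetic values and hypothetical names; a sketch, not the paper's code). A temporal slice of the cube is one image of the sequence, while reading across the temporal axis at a fixed position yields the time series of a single pixel:

```python
# Toy ITS as a data-cube indexed as cube[t][y][x]: T acquisition dates,
# each an H x W single-band image (values here are synthetic).
T, H, W = 4, 3, 3
cube = [[[t * 100 + y * W + x for x in range(W)] for y in range(H)]
        for t in range(T)]

# A single image of the sequence is one temporal slice of the cube...
frame0 = cube[0]

# ...while the time series of one pixel (y, x) is read across the temporal axis.
def pixel_time_series(cube, y, x):
    return [frame[y][x] for frame in cube]

print(pixel_time_series(cube, 1, 2))  # → [5, 105, 205, 305]
```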
In this work, we consider a classification task where, given an ITS representing a scene or a particular object, a class label potentially linked to an evolution along time, has to be predicted. In addition, depending on the scene, moving objects or deformable content can be represented.
The analysis of an ITS generally requires the extraction, from image pixels, of visual features that are as discriminating as possible. In the literature, some approaches focus mainly on the temporal aspect. They consider an ITS as a set of independent pixels, each characterized by its time series (i.e., temporal pixels) and classified individually. In a supervised classification scheme, this has the advantage of providing many learning examples to train a model. The spatial aspect of the data is then totally ignored. Nevertheless, in various applications, this aspect is necessary to discriminate certain complex classes. The joint study of the spatial and temporal domains may allow a finer analysis and a better understanding of some phenomena which can characterize the studied objects of interest and their evolution. In this context, some approaches combine spatial and temporal features. Often, the two domains are processed independently and a fusion is performed for the decision. There are also approaches that directly take into account spatio-temporal features computed from the data-cube, e.g., convolutional features obtained from a Deep Neural Network (DNN). Such features are then natively spatio-temporal, but training such models is expensive.
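The pixel-wise point of view described above amounts to unrolling the data-cube into H·W independent temporal pixels. A minimal sketch of this decomposition (synthetic values, hypothetical names):

```python
# Decompose a T x H x W data-cube into H*W independent pixel time series,
# i.e. the "temporal pixel" view that discards the spatial structure.
T, H, W = 3, 2, 2
cube = [[[10 * t + y * W + x for x in range(W)] for y in range(H)]
        for t in range(T)]

temporal_pixels = [
    [cube[t][y][x] for t in range(T)]  # one length-T series per pixel
    for y in range(H)
    for x in range(W)
]

# Each series can then be classified individually: many training samples,
# but the spatial neighborhood of every pixel is ignored.
print(len(temporal_pixels))   # → 4 (one series per pixel)
print(temporal_pixels[0])     # series of pixel (0, 0): [0, 10, 20]
```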
The problem studied in this article is the extraction of spatio-temporal features from ITS and their involvement in a deep classification procedure. Our methodological contribution is twofold:
- we propose Deep-STaR, a method dedicated to ITS classification. We investigate novel planar representations of the ITS data, involving both temporal and spatial information. The original spatial dimension of the ITS is embedded in a structure that tries to preserve the spatial configuration of the pixels. This spatial structure is coupled with the original temporal domain of the ITS in a (planar) spatio-temporal representation, leading to a novel way to structure the ITS that eases computation and interpretation. This new representation is used to feed a Convolutional Neural Network (CNN) to learn spatio-temporal features, ultimately leading to classification decisions;
- we investigate an attention mechanism, integrated in our system, providing a semantic map that explains the decision. Its main originality is to embed the attention information in the original ITS dimensions. This constitutes an added value with respect to the state of the art, since attention has mainly been studied in the spatial or temporal domains alone.
The remainder of this article is organized as follows. Section 2 introduces related works. In Section 3, we present the Deep-STaR method: first, the proposed spatio-temporal representations; second, the proposed attention mechanism. Sections 4 and 5 present an experimental study in remote sensing and a discussion of the results, coupled with a comparative study. Finally, conclusions and perspectives are given in Section 6.
Related works for ITS analysis
Numerous approaches exist in the literature for ITS analysis, depending on the task and the application field (e.g., remote sensing, medical imaging, video analysis). We focus here on the features and the adopted point of view (i.e., dimension). We distinguish three groups of approaches, presented hereinafter: (1) those treating an ITS as a set of pixel time series, (2) those integrating spatial information in the analysis, and (3) those exploiting spatio-temporal features more directly.
Deep-STaR: ITS analysis from spatio-temporal representations
This section presents the methodological foundation of the proposed Deep-STaR method, which predicts a semantic (class) label from an input ITS. Fig. 1 illustrates the workflow of Deep-STaR.
The ITS can be either a (rectangular) patch representing a complete scene (see left of Fig. 1), or only a region of interest (ROI), i.e., a connected set of pixels in the image domain (see Fig. 7). We assume that all pixels of the patch/ROI share the same label. In the following, we will use
Experimental study in remote sensing
Deep-STaR is evaluated on a remote sensing application. Recent Earth Observation satellite constellations sense masses of satellite image time series (SITS). The Sentinel-2 constellation provides image sequences over a given geographical area with high spatial, spectral and temporal resolutions. Such imaging data are useful for agricultural and environmental policy makers, since they enable, for example, the large-scale monitoring of agricultural crop fields to check annual farmers' declarations.
Results and discussion
We discuss here the classification results on the remote sensing application, obtained with the local MS-STR and global G-STR approaches, and present comparisons with selected competitive methods from the state of the art. We finally provide visual results obtained with the attention mechanisms.
Conclusion
In this work, we have proposed the Deep-STaR method, designed for image time series classification. Thanks to a remodeling of the image time series into a planar spatio-temporal representation, the spatial relationships between pixels are partially preserved without losing the temporal information, and native spatio-temporal features are learned while training a classical CNN. The use of a CNN makes it possible to benefit from pre-trained weights, learned on ImageNet and fine-tuned with specific data. Two
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The French ANR supported this work under Grant ANR-17-CE23-0015.
References (52)
- Pedestrian detection and tracking using temporal differencing and HOG features. Comput. Electr. Eng. (2014).
- Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sens. Environ. (2012).
- DuPLO: A dual view point deep learning architecture for time series classification. ISPRS J. Photogramm. Remote Sens. (2019).
- Current methods and limitations for longitudinal fMRI analysis across development. Dev. Cogn. Neurosci. (2018).
- Spatio-temporal reasoning for the classification of satellite image time series. Pattern Recognit. Lett. (2012).
- State-of-the-art on spatio-temporal information-based video retrieval. Pattern Recognit. (2009).
- Learning spatio-temporal patterns for predicting object behaviour. Image Vis. Comput. (2000).
- Detecting trend and seasonal changes in satellite image time series. Remote Sens. Environ. (2010).
- Fourier analysis of multi-temporal AVHRR data applied to a land cover classification. Int. J. Remote Sens. (1994).
- Spatio-temporal data mining: A survey of problems and methods. ACM Comput. Surv. (2018).
- The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov.
- Dense bag-of-temporal-SIFT-words for time series classification.
- Automatic analysis of the difference image for unsupervised change detection. IEEE Trans. Geosci. Remote Sens.
- Alternative algorithm for Hilbert's space-filling curve. IEEE Trans. Comput.
- Deep spatio-temporal random fields for efficient video segmentation.
- Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks.
- Urban land cover analysis from satellite image time series based on temporal stability.
- Image time series classification based on a planar spatio-temporal data representation.
- Digital change detection methods in ecosystem monitoring: A review. Int. J. Remote Sens.
- A method for the analysis of small crop fields in Sentinel-2 dense time series. IEEE Trans. Geosci. Remote Sens.
- End-to-end learning of deep spatio-temporal representations for satellite image time series classification.
- Change detection in VHR images based on morphological attribute profiles. IEEE Geosci. Remote Sens. Lett.
- Deep learning for time series classification: A review. Data Min. Knowl. Discov.
- Spatiotemporal multiplier networks for video action recognition.
- Unsupervised feature learning from temporal data.
- Large-scale semantic classification: Outcome of the first year of Inria aerial image labeling benchmark.
1. All authors have contributed equally to the different steps of the elaboration of the paper.