Identify autism spectrum disorder via dynamic filter and deep spatiotemporal feature extraction

https://doi.org/10.1016/j.image.2021.116195

Highlights

  • An effective model is proposed to identify children with autism spectrum disorder.

  • Dynamic filters are introduced to customize feature maps for each scanpath.

  • Field of view (FoV) maps are proposed to extract fixation-specific features.

  • A module based on scanpath and saliency prediction is proposed to generate spatiotemporal features.

Abstract

Early intervention and treatment are crucial for individuals with autism spectrum disorder (ASD). However, it is challenging to identify individuals with ASD at an early age, i.e. under 3 years old, due to the lack of an effective and objective identification method. Mainstream clinical diagnosis relies on long-term observation of children's behaviors, which is time-consuming and expensive, so accurately and quickly distinguishing children with ASD in early childhood has become a critical issue. In this paper, we propose an eye-movement-based model to identify children with ASD. Specifically, children are asked to freely observe a set of images while their eye movements are recorded for analysis. Both the observed image and the eye movements are input into our model, where they are processed by an embedding layer, dynamic filters and an LSTM block, respectively. Eventually, spatiotemporal features are extracted to determine whether the eye movements belong to a child with ASD or a typically developing child. Experiments on the Saliency4ASD dataset demonstrate that the proposed model achieves state-of-the-art performance in identifying children with ASD.

Introduction

ASD is a heritable neurodevelopmental disorder that persists across the lifespan [1]. Since people with ASD show developmental delays and often cannot take care of themselves, the disorder has a remarkably negative impact on families. The currently accepted and effective treatment is intervention at an early age [2], so accurately and quickly diagnosing children with ASD in early childhood has become a critical issue. Diagnosis, however, is quite difficult because of the disorder's complicated causes. At present, diagnosis mainly depends on subjective and time-consuming clinical judgment, e.g. observation [3] and interviews [4], which is limited by scarce medical services and tends to be inconsistent. There is thus an urgent need for an effective and objective method to identify individuals with ASD.

Researchers have applied eye trackers to collect eye movement data from subjects viewing a set of images, and found that the gazed points (namely fixations) differ significantly between neurodevelopmental disorder groups and typically developing groups, which indicates that individuals with neurodevelopmental disorders have an atypical visual attention pattern [5], [6]. Individuals with ASD prefer idiosyncratic objects (e.g. metalwork and keyboards) and hand-related utensils (e.g. scissors and bottles) [7], [8], [9]. Conversely, they tend to avoid eye contact with humans and pay reduced attention to faces [10], [11]. Consequently, eye movement data have been used as a biomarker to identify individuals with ASD [12], [13].

In recent years, the rapid development of deep learning has remarkably shaped many research directions and has also improved eye-movement-related research. Since deep learning demands large amounts of data, several eye movement datasets have been built and released [14], [15], [16]. There are also eye movement datasets that focus on specific populations, such as individuals with ASD [17], [18] and people of different ages [19]. The success of deep learning is also pushing ASD identification forward. For example, with deep-learning-based face detection models, eye movement data on face areas are analyzed and exploited to identify individuals with ASD [20]. Similarly, automatically predicted saliency values at fixation locations are exploited to classify a fixation sequence as belonging to either an individual with ASD or a typically developing (TD) individual [21], [22]; a toy illustration of this idea follows this paragraph. Moreover, deep visual features extracted by well-trained deep neural networks can be combined with eye movement data to identify people with ASD [23], [24]. However, due to inadequate mining of eye movement data characteristics and insufficient combination of eye movement data with deep features, the accuracy of existing identification methods leaves room for improvement.
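To make the saliency-at-fixation idea concrete, the toy sketch below samples a predicted saliency map at each fixation to form a per-fixation feature sequence, in the spirit of [21], [22]. The sampling scheme, function name and variables are illustrative assumptions, not the exact methods of those papers.

```python
# Toy illustration of the saliency-at-fixation idea: sample a predicted
# saliency map at each fixation to obtain a 1-D feature sequence that a
# classifier could consume. Details here are assumptions for illustration.
import numpy as np

def saliency_features(saliency_map, fixations):
    """saliency_map: (H, W) array in [0, 1]; fixations: list of (row, col)."""
    return np.array([saliency_map[r, c] for r, c in fixations])

sal = np.random.rand(480, 640)                         # stand-in for a model's prediction
seq = saliency_features(sal, [(100, 320), (240, 50)])  # per-fixation saliency values
```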

The proposed model takes a scanpath, which consists of a sequence of fixations, and the viewed image as inputs, and then predicts whether the viewer is an individual with ASD or a TD. Specifically, the scanpath is used repeatedly to fully capture the visual behavior pattern of an observer, and is combined with the visual feature maps extracted by a deep neural network (DNN) to better identify people with ASD. The contributions of this paper are two-fold:

  • We propose to convolve dynamic filters with the deep visual feature maps, so as to transform the universal feature maps, which are extracted from the observed image by a pre-trained visual feature encoder, into sample-specific feature maps. The dynamic filter generator learns to turn eye movement data into dynamic convolution kernels that adjust the responses in the feature maps according to the eye movements (a minimal sketch of this idea follows the list).

  • We propose a module that effectively extracts spatiotemporal features of a scanpath via field of view (FoV) maps. The spatial features are extracted from the field of view of each fixation and the deep feature maps, and the temporal features are computed from the spatial features by an LSTM (Long Short-Term Memory) block. The spatiotemporal features, which vary as the observer's attention shifts, are discriminative for ASD identification.
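As a concrete illustration of the first contribution, the minimal PyTorch sketch below maps an eye-movement embedding to per-sample convolution kernels and applies them to the universal feature maps with a grouped convolution. All module names and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the dynamic-filter idea: an eye-movement embedding is
# mapped to per-sample depthwise convolution kernels, which are convolved
# with the universal visual feature maps to yield sample-specific maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterGenerator(nn.Module):
    def __init__(self, embed_dim=128, channels=256, ksize=3):
        super().__init__()
        self.channels, self.ksize = channels, ksize
        # predict one k x k kernel per feature channel (depthwise filtering)
        self.fc = nn.Linear(embed_dim, channels * ksize * ksize)

    def forward(self, eye_embedding, feature_maps):
        b, c, h, w = feature_maps.shape
        kernels = self.fc(eye_embedding).view(b * c, 1, self.ksize, self.ksize)
        # grouped conv applies each sample's own kernels to its own channels
        x = feature_maps.reshape(1, b * c, h, w)
        out = F.conv2d(x, kernels, padding=self.ksize // 2, groups=b * c)
        return out.view(b, c, h, w)  # sample-specific feature maps

gen = DynamicFilterGenerator()
feat = torch.randn(2, 256, 28, 28)  # universal feature maps from the encoder
emb = torch.randn(2, 128)           # embedded eye-movement indicators
specific = gen(emb, feat)           # shape: (2, 256, 28, 28)
```

Folding the batch into the channel dimension and setting groups=b*c is a common trick for realizing per-sample dynamic convolution in a single batched call.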

The rest of this paper is organized as follows. Related work on ASD identification and on applications of dynamic filters is reviewed in Section 2. The proposed model is elaborated in Section 3. The experimental setup and results analysis are presented in Section 4. Section 5 summarizes the conclusions.

Section snippets

Previous work

Saliency4ASD [25], organized at IEEE ICME 2019, is a grand challenge aimed at promoting research on the visual attention of children with ASD. The challenge had two tracks: (1) predicting visual attention maps, a.k.a. saliency maps, of individuals with ASD, and (2) distinguishing children with ASD from typically developing children.

In the first track, we proposed a fully convolutional network that exploits multi-level feature maps to effectively predict the visual attention of children with ASD [26]. In [26] …

Proposed model

As shown in Fig. 1, given an image I and a scanpath P as inputs, the proposed model predicts whether the owner of the scanpath is an individual with ASD or a TD. First, a visual feature encoder encodes the image to obtain its visual feature maps. Second, the scanpath is reused in three modules, i.e. the eye movement embedding, the dynamic filter generator and the FoV maps generator, to fully capture the visual behavior pattern of an observer. Specifically, in the first module, the intention of using …
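The sketch below illustrates how the FoV-based spatiotemporal branch could work, assuming a Gaussian field-of-view map centered at each fixation, masked average pooling for the spatial feature, and an LSTM over the fixation sequence. These choices, and all names and dimensions, are assumptions for illustration rather than the paper's exact formulation.

```python
# Hedged sketch of FoV-based spatiotemporal feature extraction: each fixation
# yields a Gaussian field-of-view map that weights the feature maps; the
# weighted maps are pooled into one spatial feature vector per fixation, and
# an LSTM aggregates the sequence into a temporal feature for classification.
import torch
import torch.nn as nn

def fov_map(h, w, fx, fy, sigma=0.1):
    """Gaussian field-of-view map centered at a fixation (fx, fy in [0, 1])."""
    ys = torch.linspace(0, 1, h).view(h, 1)
    xs = torch.linspace(0, 1, w).view(1, w)
    return torch.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))

class SpatiotemporalBranch(nn.Module):
    def __init__(self, channels=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, 2)  # ASD vs. TD logits

    def forward(self, feature_maps, scanpath):
        # feature_maps: (C, H, W); scanpath: list of (fx, fy) fixations
        c, h, w = feature_maps.shape
        feats = []
        for fx, fy in scanpath:
            m = fov_map(h, w, fx, fy)                         # (H, W)
            weighted = feature_maps * m                       # fixation-specific response
            feats.append(weighted.sum(dim=(1, 2)) / m.sum())  # (C,) pooled vector
        seq = torch.stack(feats).unsqueeze(0)                 # (1, T, C)
        _, (h_n, _) = self.lstm(seq)
        return self.cls(h_n[-1])                              # (1, 2)

branch = SpatiotemporalBranch()
logits = branch(torch.randn(256, 28, 28), [(0.3, 0.4), (0.6, 0.5), (0.7, 0.2)])
```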

Experiments

In this section, we introduce the dataset, metrics and implementation details in Section 4.1. Then we conduct an ablation study to demonstrate the indispensability of each module in Section 4.2. Finally, we compare our model with state-of-the-art models in Section 4.3.

Conclusions

We propose an image-level model that identifies a scanpath as belonging to either an individual with ASD or a typically developing child. First, the viewed image is encoded by a visual feature encoder. Then, the scanpath is utilized in three modules. The first module is the eye movement embedding layer, which computes eye movement indicators and embeds them into a feature vector. The second module generates dynamic filters based on the scanpath. The dynamic filters transform the universal visual …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant No. 61771301.

References (49)

  • Thapar A. et al., Neurodevelopmental disorders, Lancet Psychiatr. (2017)
  • Wang S. et al., Atypical visual saliency in autism spectrum disorder quantified through model-based eye tracking, Neuron (2015)
  • Zalla T. et al., Reduced saccadic inhibition of return to moving eyes in autism spectrum disorders, Vis. Res. (2016)
  • Bradshaw J. et al., Feasibility and effectiveness of very early intervention for infants at-risk for autism spectrum disorder: A systematic review, J. Autism Dev. Disord. (2015)
  • Lord C. et al., Autism diagnostic observation schedule: A standardized observation of communicative and social behavior, J. Autism Dev. Disord. (1989)
  • Lord C. et al., Autism diagnostic interview-revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders, J. Autism Dev. Disord. (1994)
  • Sasson N.J. et al., Visual attention to competing social and object images by preschool children with autism spectrum disorder, J. Autism Dev. Disord. (2014)
  • Sweeney J.A. et al., Eye movements in neurodevelopmental disorders, Curr. Opin. Neurol. (2004)
  • Sasson N.J. et al., Affective responses by adults with autism are reduced to social images but elevated to images related to circumscribed interests, PLoS One (2012)
  • Duan H. et al., Learning to predict where the children with ASD look
  • Birmingham E. et al., Comparing social attention in autism and amygdala lesions: Effects of stimulus and task condition, Soc. Neurosci. (2011)
  • Duan H. et al., Visual attention analysis and prediction on human faces for children with autism spectrum disorder, ACM Trans. Multimed. Comput. Commun. Appl. (2019)
  • Murias M. et al., Validation of eye-tracking measures of social attention as a potential biomarker for autism clinical trials, Autism Res. (2018)
  • Freedman E.G. et al., Eye movements, sensorimotor adaptation and cerebellar-dependent learning in autism: Toward potential biomarkers and subphenotypes, Eur. J. Neurosci. (2018)
  • Xu J. et al., Predicting human gaze beyond pixels, J. Vis. (2014)
  • Borji A. et al., CAT2000: A large scale fixation dataset for boosting saliency research (2015)
  • Che Z. et al., How is gaze influenced by image transformations? Dataset and model, IEEE Trans. Image Process. (2020)
  • Duan H., Zhai G., Min X., Che Z., Fang Y., Yang X., Gutiérrez J., Le Callet P., A dataset of eye movements for the...
  • Carette R. et al., Visualization of eye-tracking patterns in autism spectrum disorder: Method and dataset
  • Bucher A. et al., Age differences in emotion perception in a multiple target setting: An eye-tracking study, Emotion (2019)
  • Liu W. et al., Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework, Autism Res. (2016)
  • Startsev M. et al., Classifying autism spectrum disorder based on scanpaths and saliency
  • Tao Y. et al., SP-ASDNet: CNN-LSTM based ASD classification model using observer scanpaths
  • Jiang M. et al., Learning visual attention to identify people with autism spectrum disorder, IEEE Int. Conf. Comput. Vis. (2017)