Research Report

Cortical networks of dynamic scene category representation in the human brain
Introduction
A primary aim of visual neuroscience is to shed light on how the human brain represents diverse information in natural scenes. Behavioral research on scene perception suggests that humans categorize scenes to more efficiently process the wealth of information they contain (Greene & Oliva, 2009; Konkle, Brady, Alvarez, & Oliva, 2010; Rousselet, Joubert, & Fabre-Thorpe, 2005). It is therefore likely that information about scene categories is represented across cortex. Consistent with this notion, previous neuroimaging studies have demonstrated that the category of a visual scene can be classified among a limited number of basic categories (e.g., beaches, forests, mountains) based on blood-oxygen-level-dependent (BOLD) responses in classical scene-selective regions (parahippocampal place area, PPA; retrosplenial complex, RSC; occipital place area, OPA), the object-selective lateral occipital complex (LO), and anterior visual cortex (Epstein & Morgan, 2012; Jung, Larsen, & Walther, 2018; Walther, Caddigan, Fei-Fei, & Beck, 2009; Walther, Chai, Caddigan, Beck, & Fei-Fei, 2011). A common approach in these studies was to operationally divide visual scenes into a few non-overlapping categories. However, natural scene categories can show varying degrees of statistical correlation, and a real-world scene might fall under several distinct categories. Moreover, because these studies used static scenes, they could not address how dynamic scene categories are represented in the human brain.
To examine the statistics of natural scene categories, a recent study (Stansbury, Naselaris, & Gallant, 2013) used a data-driven algorithm to derive a broad set of scene categories while also taking potential similarities between categories into account. In this approach, each scene category is defined as a list of presence probabilities for a large array of constituent objects that appear within natural scenes. Once the algorithm has learned a set of categories, the likelihood that a given scene belongs to each learned category can be inferred from the objects within the scene. This scene-category model was reported to yield better predictions of single-voxel BOLD responses in classical face- and scene-selective areas than an alternative model based on the presence of a few diagnostic objects that frequently appeared in the presented natural images (Stansbury et al., 2013). This result raises the possibility that object co-occurrence statistics, above and beyond the individual objects present in scenes, form the basis of scene category definitions.
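The category-learning scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes scikit-learn's LatentDirichletAllocation as a stand-in for the data-driven algorithm, and uses synthetic scene-by-object counts in place of real annotations:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy scene-by-object count matrix: each row is a scene, each column an
# object label; entries count how often that object appears in the scene.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 30))

# Learn a small set of latent "scene categories": each learned category is
# a probability distribution over the constituent object labels.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(X)

# For a new scene, infer the likelihood that it belongs to each learned
# category from the objects it contains.
new_scene = rng.poisson(1.0, size=(1, 30))
category_probs = lda.transform(new_scene)  # shape (1, 5); rows sum to 1
```

Here `lda.components_` plays the role of the category definitions (object presence probabilities per category), and `transform` plays the role of the inference step that assigns category likelihoods to a given scene.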
Stansbury et al. defined categories of static scenes via their constituent objects and, like many prior studies on scene representation (Epstein & Morgan, 2012; Jung et al., 2018; Walther et al., 2009, 2011), focused on category responses in classical scene-selective regions. Yet several recent studies imply that much of anterior visual cortex may be organized by differential voxel tuning for actions within visual scenes (Tarhan & Konkle, 2020; Çukur, Huth, Nishimoto, & Gallant, 2016). In fact, real-world scenes contain dynamic interactions between objects and actions that give rise to more elaborate categories (Greene, Baldassano, Esteva, Beck, & Fei-Fei, 2016), and such scenes have been reported to elicit widespread responses across visual cortex (Deen, Koldewyn, Kanwisher, & Saxe, 2015; Epstein & Baker, 2019; Isik, Koldewyn, Beeler, & Kanwisher, 2017; Maguire et al., 1998). Therefore, it is likely that natural scene categories based on the co-occurrence of objects and actions are represented across broadly distributed networks in the human brain.
Here, we sought to learn high-level features that capture scene-category information in dynamic visual scenes, and to examine how this information is represented across cerebral cortex. We first recorded BOLD responses while subjects viewed a large set of natural movies that contained 5252 distinct objects and actions. To identify scene-category features, we employed a statistical learning algorithm that learned a large set of categories on the basis of the co-occurrence statistics of objects and actions in the natural world. We then used the learned scene categories within a voxelwise modeling framework (Huth, Nishimoto, Vu, & Gallant, 2012; Nishimoto et al., 2011; Çukur et al., 2016; Çukur, Nishimoto, Huth, & Gallant, 2013) to estimate scene-category tuning profiles in single voxels across cerebral cortex. Subsequently, we performed a clustering analysis in order to reveal large-scale networks of brain regions that differ in their scene-category tuning.
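The voxelwise modeling step can be sketched in simplified form. The sketch below assumes ridge regression as the regularized linear fit (a common choice in this framework) and uses simulated features and responses rather than real BOLD data; the fitted weights per voxel correspond to that voxel's category tuning profile:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 300, 50, 20, 10

# Stimulus features (e.g., category likelihoods per time point from the
# learned scene-category model) and simulated BOLD responses per voxel.
F_train = rng.standard_normal((n_train, n_feat))
F_test = rng.standard_normal((n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y_train = F_train @ W_true + 0.5 * rng.standard_normal((n_train, n_vox))
Y_test = F_test @ W_true + 0.5 * rng.standard_normal((n_test, n_vox))

# Fit a separate regularized linear model for every voxel; the weight
# vector of each voxel is its scene-category tuning profile.
model = Ridge(alpha=1.0).fit(F_train, Y_train)
tuning = model.coef_  # shape (n_vox, n_feat)

# Evaluate prediction performance on held-out data: correlation between
# predicted and measured responses, one score per voxel.
Y_pred = model.predict(F_test)
r = np.array([np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1]
              for v in range(n_vox)])
```

In the actual study the regularization level would be selected by cross-validation and the features would be convolved with a hemodynamic response model; those steps are omitted here for brevity.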
Subjects
Five healthy human subjects (all male, ages 25–32 years) with normal or corrected-to-normal vision participated in this study. MRI data were collected in five separate scan sessions: three sessions for the main experiment, one session for acquiring functional localizers, and one session for acquiring anatomical data. Experimental protocols were approved by the Committee for the Protection of Human Subjects at the University of California, Berkeley. All subjects gave written informed consent.
Results
To investigate the nature of high-level scene information that is represented across the cerebral cortex, we recorded BOLD responses while subjects passively viewed 2 h of natural movies. We used voxelwise modeling to assess scene representations in single voxels. We fit a scene-category model to measure tuning for scene categories (e.g., an urban street, a forest) that reflect co-occurrence statistics of objects and actions in natural scenes. Model performance was evaluated by calculating
Discussion
The aim of this study was to investigate representation of dynamic visual scenes across the human brain. To do this, we fit a scene-category model to measure voxelwise tuning for hundreds of scene categories, where categories were learned inductively as statistical ensembles of objects and actions in natural scenes. We find that this scene-category model explains a significant portion of the response variance broadly across cerebral cortex. We then performed cluster analysis on voxelwise tuning
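The clustering step mentioned above can be sketched as follows. This is an illustrative stand-in, not the study's analysis: it assumes k-means clustering on the voxelwise tuning vectors, with synthetic tuning profiles drawn from three well-separated groups:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulated tuning profiles: one row per voxel, one column per scene
# category. Three groups of voxels prefer different categories.
offsets = np.zeros((3, 8))
offsets[0, 0] = offsets[1, 3] = offsets[2, 6] = 3.0
tuning = np.vstack([rng.standard_normal((100, 8)) + off for off in offsets])

# Group voxels with similar category tuning; each cluster is a candidate
# large-scale network with a distinct tuning profile.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(tuning)
labels = km.labels_  # one cluster assignment per voxel
```

In practice the number of clusters would be chosen by a stability or model-selection criterion rather than fixed in advance.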
CRediT author statement
Emin Çelik: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – Review & Editing, Visualization.
Umit Keles: Conceptualization, Methodology, Software, Formal analysis, Writing – Original Draft, Visualization.
İbrahim Kiremitçi: Validation, Formal analysis.
Jack L. Gallant: Writing – Review & Editing.
Tolga Çukur: Conceptualization, Investigation, Resources, Writing – Review & Editing, Supervision, Project Administration, Funding Acquisition.
Open practices
The study in this article earned Open Data and Open Materials badges for transparent practices. Data for this study can be found at https://crcns.org/data-sets/vc/vim-2 and https://crcns.org/data-sets/vc/vim-4.
Acknowledgments
The authors declare no competing financial interests. The work was supported in part by a National Eye Institute Grant (EY019684), by a Marie Curie Actions Career Integration Grant (PCIG13-GA-2013-618101), by a European Molecular Biology Organization Installation Grant (IG 3028), by a TUBA GEBIP 2015 fellowship, and by a Science Academy BAGEP 2017 award. We thank D. Stansbury, A. Huth, and S. Nishimoto for assistance in various aspects of this research. We report how we determined our sample
References (100)
- et al. (2019). Spatially informed voxelwise modeling for naturalistic fMRI experiments. NeuroImage.
- et al. (2001). Neuroimaging of cognitive functions in human parietal cortex. Current Opinion in Neurobiology.
- et al. (2017). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage.
- (2008). Parahippocampal and retrosplenial contributions to human spatial navigation. Trends in Cognitive Sciences.
- et al. (2012). Neural responses to visual scenes reveals inconsistencies between fMRI adaptation and multivoxel pattern analysis. Neuropsychologia.
- et al. (2016). Neural representation of scene boundaries. Neuropsychologia.
- et al. (2009). Recognition of natural scenes from global properties: Seeing the forest without representing the trees. Cognitive Psychology.
- et al. (2001). The lateral occipital complex and its role in object recognition. Vision Research.
- et al. (2019). Variational autoencoder: An unsupervised model for encoding and decoding fMRI activity in visual cortex. NeuroImage.
- et al. (2011). A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron.