Are emotional objects visually salient? The Emotional Maps Database

https://doi.org/10.1016/j.jvcir.2021.103221

Highlights

  • We present a database of manual selections of meaningful regions in emotional images.

  • We measure similarity between meaning and saliency maps.

  • High-arousing and negative content is less salient than low-arousing and positive content.

  • People agree more when selecting meaningful regions in high-arousing and negative images.

Abstract

The visual system prioritizes emotional content in natural scenes, but it is unclear whether emotional objects are also systematically more salient. We compare emotional maps, created by averaging multiple manual selections of the most meaningful regions in images of negative, positive, and neutral affective valence, with saliency maps generated by the Graph-Based Visual Saliency, Proto-object, and SalGAN models. We found that the similarity between emotional maps and saliency maps is modulated by the scenes’ arousal and valence ratings: the more negative and arousing the content, the less salient it was. At the same time, negative and high-arousing content was the easiest for participants to identify, as shown by the highest inter-individual agreement in the selections. Our results support the “affective gap” hypothesis, i.e., the decoupling of emotional meaning from an image’s formal features. The Emotional Maps Database created for this study, which has proven useful in gaze fixation prediction, is available online for scientific use.

Introduction

Most images depicting real-life scenes are composed of more informative regions (e.g., foreground objects, human figures, facial displays of emotion), which attract visual attention, and less informative ones (e.g., blank walls, background objects, homogeneous surfaces), which are mostly ignored [1]. In many cases, a detail within a scene creates meaning and evokes emotions in the viewer: a wound or a smile can transform an otherwise neutral scene into an intensely emotional one. In this research, we analyze the distribution of meaningful content within images commonly used in studies of emotion. In particular, we explore how visually salient and how clearly delimited the key elements are in positive, negative, and neutral scenes. We also provide a set of “emotional meaning maps” for commonly used emotional image databases.

Emotion-evoking stimuli rapidly attract attention and are processed in a prioritized way [2], [3], [4], [5], [6], [7], [8], [9]. To study this prioritization (as well as other aspects of emotional processing), emotional images depicting real-life scenes have been used in a multitude of studies over the past 40 years [10], [11], [12]. To achieve better experimental control over emotion induction, researchers in the social sciences use databases that provide emotionally charged photographs with standardized ratings of emotional arousal and valence, obtained through large-sample evaluation [10]. Creators of these databases often adopt the dimensional concept of emotions [13], which assumes that emotion can be described along two basic dimensions: arousal and valence. Valence determines whether a stimulus is pleasant or unpleasant, while arousal determines whether it is calming or exciting. Both dimensions of a stimulus, be it an image, sound, or word, can be conveniently assessed with the Self-Assessment Manikin Scale, a graphical 9-point rating scale devised by Bradley and Lang [10] that has become a standard tool for evaluating emotional stimuli. This approach is especially advantageous when regression models are used, as in this study [14].
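
As a toy illustration (our assumption, consistent with standard practice for SAM data but not detailed in this excerpt), an image’s normative valence and arousal scores are simply the means of the individual 9-point responses; the rating values below are invented:

    import numpy as np

    # Hypothetical SAM responses for one image
    # (1 = unpleasant/calm, 9 = pleasant/arousing)
    valence_ratings = np.array([2, 3, 2, 1, 3, 2])
    arousal_ratings = np.array([7, 8, 6, 8, 7, 9])

    valence = valence_ratings.mean()  # low valence -> negative image
    arousal = arousal_ratings.mean()  # high arousal -> exciting image
    print(f"valence M = {valence:.2f}, arousal M = {arousal:.2f}")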

Databases of natural emotional images include EmoPics [15], the Geneva Affective Picture Database (GAPED) [16], the International Affective Picture System (IAPS) [17], and the Nencki Affective Picture System (NAPS) [18]. All of them contain natural, real-life images depicting a broad cross-section of scene categories, including people, social interactions, animals, artificial objects, landscapes, interiors, erotica, and food.

The distribution of meaningful content within these scenes has rarely been analyzed systematically, even though detailed information on scene composition is often of primary interest. For example, in eye-tracking studies, information on the location of key objects is routinely used to select regions of interest, a preparatory step necessary for more refined analyses of fixation patterns [19], [20]. Meaning-driven regions of interest have also been employed in eye-tracking studies involving the presentation of emotional images [5], [21], [22], [23], [24], [25], [26], [27], [28]. In computer vision, the location of the most important emotional region can serve as ground truth for DNN algorithms [29], [30].

The selection of the most informative or meaningful regions of an image can be performed either manually or algorithmically. The algorithmic approach has been implemented in a variety of visual saliency models (for reviews, see [31], [32], [33]), including machine learning approaches, e.g., [34], [35], [36], [37]. Some saliency models rely on low-level local features such as edges or luminance and color contrast, e.g., [38], [39]; others are based on the analysis of higher-order features such as objecthood, e.g., [40], [41]; still more specialized ones incorporate pre-trained information about typical scene composition and its elements, e.g., [34], [35], [38], [42], [43], or even combine several models into multi-stage learning, e.g., [37]. The main advantages of these approaches are their ease of use, efficiency, and repeatability.
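
To illustrate the low-level-feature family, the following minimal Python sketch computes a crude center-surround luminance-contrast map. It is not an implementation of any of the cited models (GBVS, Proto-object, SalGAN); it only conveys the general idea of deriving saliency from simple local features:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def luminance_saliency(rgb, sigma_center=2, sigma_surround=16):
        """Crude [0, 1] saliency map from center-surround luminance contrast."""
        lum = rgb[..., :3].mean(axis=-1)                 # rough luminance channel
        center = gaussian_filter(lum, sigma_center)      # fine-scale response
        surround = gaussian_filter(lum, sigma_surround)  # coarse-scale response
        sal = np.abs(center - surround)                  # local contrast
        return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)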

Yet, in the case of emotional images, visual saliency seems to be a relatively poor predictor of attention engagement, and thus, presumably, a poor predictor of the distribution of the most meaningful regions within a scene. Experiments by Humphrey, Underwood, and Lambert [24] and by Niu and colleagues [5] showed that emotional visual objects attract attention irrespective of their visual saliency. Moreover, Pilarczyk and Kuniecki [25] showed that visual saliency alone, when decoupled from meaning, does not attract attention better than chance, particularly in the case of emotional images. The primacy of meaning over visual saliency in attention guidance has also been confirmed in studies that did not employ emotional images [44], [45], [46], [47], [48]. Still, meaning and visual saliency are intertwined to some degree: Elazary and Itti [49] showed that interesting objects within neutral scenes also tend to be visually salient, as measured with Graph-Based Visual Saliency (GBVS) [50], a purely bottom-up algorithm based on simple visual features.

When the algorithmic approach is not sufficient, researchers resort to manual segmentation of a scene. However, when it comes to databases of emotional images featuring demarcation of the most meaningful region, researchers are, to our knowledge, limited to EMOd [29] and EmotionROI [30], both originating from the computer vision community. EMOd comprises 1019 images (321 from IAPS [17] and 698 from the Internet) and features outlines of the most dominant objects along with their categorizations, providing semantic segmentation. However, since the object markings and descriptions were made by only three participants, EMOd does not provide high-resolution meaning maps and does not allow a pixel-by-pixel representation of emotional load analogous to saliency maps.

In contrast, the EmotionROI image database by Peng and colleagues [51] comes with an accompanying Emotion Stimuli Map (ESM), created by averaging selections of the regions that best capture the emotional meaning of an image [30], made by Mechanical Turk participants. EmotionROI consists of 1980 images collected from Flickr by the authors using search keywords matching the six universal emotions identified by Ekman and Friesen [52] (anger, disgust, fear, joy, sadness, surprise). Apart from the ESM, EmotionROI also provides valence and arousal evaluations for each image, made using a tool modeled on the Self-Assessment Manikin Scale [10].
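
A minimal sketch of how a map of the ESM/EMD kind can be derived from manual selections, assuming binary per-participant masks; the Gaussian smoothing step is our assumption for illustration, not necessarily part of either database’s pipeline:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def meaning_map(masks, sigma=10):
        """masks: (n_participants, H, W) binary selections of the key region."""
        agreement = masks.mean(axis=0)              # fraction of people marking each pixel
        smooth = gaussian_filter(agreement, sigma)  # soften hard selection edges
        return smooth / (smooth.max() + 1e-8)       # normalize like a saliency map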

The aim of our project was twofold. First, we aimed to create a database of meaning maps showing the distribution of emotional content in photographs rigorously standardized in terms of emotional valence and arousal. The Emotional Maps Database (EMD) is conceived as a complementary tool for the established and widely used sets of emotional images: EmoPics [15], the Geneva Affective Picture Database (GAPED) [16], the International Affective Picture System (IAPS) [17], and the Nencki Affective Picture System (NAPS) [18]. Second, we wanted to explore the similarity between meaning maps and visual saliency maps in emotional images. To this end, we compared our EMD maps (as well as ESM maps, see Appendix A) with maps generated by three saliency models: GBVS [50], Proto-objects [41], and SalGAN [53]. We also explored the participants’ agreement in selecting the most meaningful region in emotional scenes.
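
Two widely used metrics for comparing such maps are the Pearson correlation coefficient (CC) and histogram intersection (SIM); the sketch below shows both. These are plausible standard choices offered for illustration, not necessarily the exact measures used in this study (those are specified in the Methods):

    import numpy as np

    def cc(map_a, map_b):
        """Pearson correlation between two maps (higher = more similar)."""
        a = (map_a - map_a.mean()) / (map_a.std() + 1e-8)
        b = (map_b - map_b.mean()) / (map_b.std() + 1e-8)
        return (a * b).mean()

    def sim(map_a, map_b):
        """Histogram intersection of maps normalized to sum to 1."""
        a = map_a / (map_a.sum() + 1e-8)
        b = map_b / (map_b.sum() + 1e-8)
        return np.minimum(a, b).sum()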

Our main contributions are as follows:

  1) We present a new database of 950 meaning maps for images commonly used in emotion research. The maps represent the spatial distribution of emotionally charged regions selected by a large group of participants; thus, they have properties similar to saliency maps and can be directly compared with them.

  2) We provide a comprehensive analysis of how the emotional characteristics of images, i.e., the valence (pleasant–unpleasant) and arousal (calm–arousing) dimensions, influence the similarity of meaning and saliency distributions. We analyze saliency models employing different definitions of saliency and different detection mechanisms, and we use several similarity measures to compare saliency and meaning maps.

  3) We find that, despite being clearly delineated and easily detected by human participants, high-arousing and negative objects are detected less effectively by saliency models.

Section snippets

Participants

We recruited 296 participants (244 women, aged 18–52, M = 22, SD = 4). The participants were required to have normal or corrected-to-normal eyesight and intact color vision. They were recruited through the Jagiellonian University advertisement mailing system, and the majority were local university students. For participation in the study, they received course credit or payment. All participants gave informed consent prior to the…

Results

The relationship between valence, arousal, and the similarity between emotional meaning maps and saliency maps was highly significant for all saliency models and all similarity measures, as evidenced by the F values and associated probabilities of all regression models (Table 2). Valence was positively related to saliency in all models except Proto-object (Fig. 3), meaning that the most meaningful regions in negative images tend to be relatively less visually salient than those in positive ones. Arousal…
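
For concreteness, a minimal sketch of such a regression on simulated (not real) data, with per-image similarity scores regressed on normative valence and arousal via ordinary least squares in statsmodels:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    valence = rng.uniform(1, 9, 950)   # 9-point SAM means, one per image
    arousal = rng.uniform(1, 9, 950)
    # Fabricated similarity scores with a positive valence effect
    # and a negative arousal effect, plus noise
    similarity = 0.03 * valence - 0.02 * arousal + rng.normal(0.5, 0.1, 950)

    X = sm.add_constant(np.column_stack([valence, arousal]))
    model = sm.OLS(similarity, X).fit()
    print(model.summary())             # F value, p-values, coefficients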

Discussion

In this study, we compared the emotional meaning maps (created from participants’ manual selections of key objects) with maps generated automatically by three saliency models, using three different similarity measures. We investigated how the emotional valence and arousal of an image influence the similarity between the human-made and algorithm-generated maps, and how they affect agreement between participants in selecting the images’ key regions.

Comparing the emotional meaning maps with…

CRediT authorship contribution statement

Joanna Pilarczyk: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Project administration, Validation, Visualization, Writing – original draft, Writing – review & editing. Weronika Janeczko: Investigation, Data curation, Writing – original draft, Writing – review & editing. Radosław Sterna: Data curation, Writing – original draft, Writing – review & editing. Michał Kuniecki: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Science Center in Poland (grant numbers 2012/07/E/HS6/01046 and 2017/25/B/HS6/00758). During work on the paper, Radosław Sterna was supported by funding from the budget for science in the years 2019–2023 as a research project (project number DI2018 015848) under the Diamond Grant program financed by the Ministry of Education and Science of Poland. We would like to thank Piotr Wójcik for technical support in the development of the computer tool for…

References (70)

  • J. Stoll et al., Overt attention in natural scenes: Objects dominate features, Vision Res. (2015)
  • J. Markovic et al., Tuning to the significant: Neural and genetic processes underlying affective enhancement of visual perception and memory, Behav. Brain Res. (2014)
  • K. Grill-Spector et al., The lateral occipital complex and its role in object recognition, Vision Res. (2001)
  • J.M. Henderson et al., Eye movements and visual memory: Detecting changes to saccade targets in scenes, Perception & Psychophysics (2003)
  • A. Keil et al., Early modulation of visual perception by emotional arousal: evidence from steady-state visual evoked brain potentials, Cognitive, Affective, & Behavioral Neuroscience (2003)
  • M. Kuniecki et al., The color red attracts attention in an emotional context. An ERP... (2015)
  • E. McSorley et al., The time course of implicit affective picture processing: An eye movement study, Emotion (2013)
  • Y. Niu et al., Affective salience can reverse the effects of stimulus-driven salience on eye movements in complex scenes, Front. Psychol. (2012)
  • L. Nummenmaa et al., Eye movement assessment of selective attentional capture by emotional pictures, Emotion (2006)
  • A. Öhman et al., Emotion drives attention: Detecting the snake in the grass, Emotion (2001)
  • M. Diano et al., Amygdala response to emotional stimuli without awareness: facts and interpretations, Front. Psychol. (2017)
  • W.L. Libby et al., Pupillary and cardiac activity during visual attention, Psychophysiology (1973)
  • J.A. Russell, A circumplex model of affect, J. Pers. Soc. Psychol. (1980)
  • S. Zhao et al., Affective Image Content... (2018)
  • M. Wessa et al., EmoPicS: subjective and psychophysiological evaluation of new imagery for clinical biopsychological research, Z. Klin. Psychol. Psychother. Suppl. (2010)
  • E.S. Dan-Glauser et al., The Geneva affective picture database (GAPED): a new 730-picture database focusing on valence and normative significance, Behavior Research Methods (2011)
  • P.J. Lang et al., International affective picture system (IAPS): affective ratings of pictures and instruction manual (2008)
  • A. Marchewka et al., The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database, Behavior Research Methods (2014)
  • J.L. Orquin et al., Threats to the validity of eye-movement research in psychology, Behavior Research Methods (2018)
  • T. Pedale et al., Enhanced insular/prefrontal connectivity when resisting from emotional distraction during visual search, Brain Struct. Funct. (2019)
  • D.J. Acunzo et al., No emotional “pop-out” effect in natural scene viewing, Emotion (2011)
  • K. Humphrey et al., Salience of the lambs: A test of the saliency map hypothesis with... (2012)
  • J. Pilarczyk, M. Kuniecki, Emotional content of an image attracts attention more than visually salient... (2014)
  • M. Kuniecki et al., Effects of scene properties and emotional valence on brain activations: a fixation-related fMRI study, Front. Hum. Neurosci. (2017)
  • S. Fan et al., Emotional attention: A study of image sentiment and visual attention
