
Cognition

Volume 197, April 2020, 104151

The temporal dynamics of infants' joint attention: Effects of others' gaze cues and manual actions

https://doi.org/10.1016/j.cognition.2019.104151

Highlights

  • Joint attention was studied with high-resolution eye tracking and naturalistic social stimuli.

  • 8- and 12-month-old infants and adults looked more at hands-and-objects than at the actors' faces.

  • Critically, allocation of attention was moderated by age and social cues (gaze direction & actions).

  • Age differences were primarily related to the time course of fixations following a shift in social cues.

  • It is concluded that joint attention is neither a monolithic process nor one that develops all at once.

Abstract

Infants' development of joint attention shows significant advances between 9 and 12 months of age, but we still need to learn much more about how infants coordinate their attention with others during this process. The objective of this study was to use eye tracking to systematically investigate how 8- and 12-month-old infants as well as adults dynamically select their focus of attention while observing a social partner demonstrate infant-directed actions. Participants were presented with 16 videos of actors performing simple infant-directed actions from a first-person perspective. Looking times to faces as well as hands-and-objects were calculated for participants at each age, and developmental differences were observed, although all three groups looked more at hands-and-objects than at faces. In order to assess whether visual attention was coordinated with the actors' behaviors, we compared participants' looking at faces and objects in response to gaze direction as well as infant-directed actions vs. object-directed actions. By presenting video stimuli that involved continuously changing actions, we were able to document that the likelihood of joint attention changes in both real and developmental time. Overall, adults' and 12-month-old infants' visual attention was modulated by gaze cues as well as actions, whereas this was only partially true for 8-month-old infants. Our results reveal that joint attention is neither a monolithic process nor one that develops all at once.

Introduction

Social attention to others' eyes, faces, and actions is foundational to how we communicate, learn about the social and physical world, regulate emotions, and develop attachments with others. Beginning at birth, infants attend preferentially to faces, and are most sensitive to the presence of eyes in a face (Acerra, Burnod, & de Schonen, 2002; Batki, Baron-Cohen, Wheelwright, Connellan, & Ahluwalia, 2000; Johnson & Morton, 1991). In addition, newborn infants prefer to orient to faces displaying direct gaze (Farroni, Csibra, Simion, & Johnson, 2002), and show a rudimentary form of gaze following (Farroni, Massaccesi, Pividori, & Johnson, 2004). Some evidence suggests that newborns recognize their mother's face (e.g., Bushnell, 2001), and these recognition abilities continue to develop over the first few months (Nelson, 2001). Beginning around 10 weeks of age, infants fixate more consistently on the internal features of a face than on the external features and contours, especially when the face is speaking (Haith, Bergman, & Moore, 1977; Hunnius & Geuze, 2004). By three months, infants begin to differentiate faces based on the social categories of gender and race (Kelly et al., 2005; Quinn, Yahr, Kuhn, Slater, & Pascalis, 2002).

Face perception continues to improve over the next few months, because infants often engage in dyadic interactions with their caregivers, ensuring that faces are a prominent part of their visual experience (Lock & Zukow-Goldring, 2010; Lockman, 2000). Once they can sit without support and coordinate their reaches toward objects, infants' reliance on interactions with other people for stimulation begins to decline. By around six months of age, infants are much more likely to divide their attention between exploring objects with their eyes and hands and interacting with social partners (Lock & Zukow-Goldring, 2010). For the next few months they typically distribute their attention to either objects or social partners, but they still must learn to share their attention to a common referent with someone else. It is not until 9 to 12 months of age (at least according to most social-cognitive theorists) that infants attribute intentional states to social partners, enabling them to engage in triadic interactions (Tomasello, 2008; Woodward, 2009), such as participating with others in joint attention to objects and establishing common ground (Bakeman & Adamson, 1984; Carpenter, Nagell, & Tomasello, 1998), pointing to objects communicatively (Carpenter et al., 1998), and expecting social partners to express interest in shared referents (Liszkowski, Carpenter, & Tomasello, 2007).

In order for infants to develop these skills, they must first learn to coordinate their attention to their social partner with their attention to objects (Bertenthal, Boyer, & Harding, 2014). Although it is well established that this developmental transition occurs, little is known about how a preference for faces gives way to a more distributed view of the social world that includes not only faces, but bodies and actions, as well as objects. In general, attention is the front-end of encoding and interpreting all stimulus information encountered in the environment, and thus it is essential not only for learning to recognize and discriminate faces, but for learning to recognize others' actions as well. How do infants decide where to look from moment to moment when confronted with not only a dyadic partner but also an assortment of objects, other people, and events in their optic arrays? Early on, infants' orienting to stimuli in the environment is primarily under exogenous, stimulus-driven control, but over time they also begin to develop endogenous control over their attention (Johnson, 2011; Mundy & Jarrold, 2010). As such, they begin modulating their attention in response to the actions of their social partner as well as the context (Bertenthal & Boyer, 2015). Indeed, this is exactly what is necessary for infants to follow the gaze direction of a social partner during shared attention. If infants could not modulate their attention, then they would simply continue to be guided by their bias for faces, but the development of joint attention suggests otherwise.

Although there has been considerable research investigating the social cognitive prerequisites for joint attention, such as shared intentions or common ground (Tomasello, 2008), much less is known about how and when infants begin to dynamically coordinate their social attention among faces, actions, and objects. One reason for the sparseness of relevant findings is that most studies obviate the need for infants to choose between different stimulus cues. Infants are typically presented with a specific sequence of events, such as an actor eliciting an infant's attention, and then looking or pointing in a specific direction, followed by an object appearing either in that direction or the opposite direction; infants merely have to attend to the stimuli in the order they appear and not choose when and what to look at (e.g., Bertenthal et al., 2014; Gredebäck, Fikke, & Melinder, 2010; Senju & Csibra, 2008). In more naturalistic situations, such as an infant interacting with a caregiver in a cluttered room among a set of objects over a more extended period of time, the caregiver might alternate between gazing at the child and the objects and jointly playing with those objects or showing them to the child. The question then becomes, how much are infants' looking behaviors guided by attention to the face or by attention to the manual actions of the caregiver, the orientation of her face, her body posture, or changes in her object-directed actions? This is a critical question because infants' systematic selection of social information in triadic interactions may not only precede but also catalyze their appreciation of shared intentions or common ground. In other words, these preferences establish new opportunities for social interaction and social learning, which might very well contribute to their social-cognitive development. It is for this reason that we sought to study how infants distribute their attention during social interactions.

Recent advances in infants' eye tracking research offer important opportunities for systematically investigating how infants allocate their attention to social and non-social stimuli. Most studies, however, still rely on presenting highly scripted and repetitive actions to infants in experimental paradigms involving a live, digital image or movie of a social partner looking or reaching toward an object following an ostensive cue, such as eye contact with the viewer (e.g., Daum, Ulber, & Gredebäck, 2013; Senju & Csibra, 2008; Woodward, 1998). During the past decade, Frank and colleagues (Frank, Vul, & Johnson, 2009; Frank, Vul, & Saxe, 2012) made some important progress in studying infants' and toddlers' social attention to more naturalistic visual scenes. For instance, Frank et al. (2012) measured the visual fixations of infants and toddlers between 3 and 30 months of age while viewing short videos of objects, faces, children playing with toys, and complex social scenes involving more than one person. The results revealed that the youngest infants looked primarily at faces, and eyes in particular, but older infants and toddlers distributed their gaze more flexibly and looked more at the mouth and also significantly more at the hands, especially when the hands were engaged in actions on objects. One important question that could not be addressed by these studies is whether children's attention is directed differently to people observed from a first-person as opposed to a third-person perspective.

A more recent study by Elsabbagh et al. (2014) also examined infants' relative distribution of fixations to the eyes and mouth when viewing a social partner (observed from a first-person perspective) with the eyes, mouth, or hands moving or expressing multiple communicative signals (e.g., “peek-a-boo”). Consistent with previous studies, infants between 7 and 15 months of age looked at the eyes more than the mouth, but this difference was contextually modulated, such that when only the mouth moved, infants looked more at the mouth than when only the eyes moved. Taken together, these last few studies suggest that by sometime during the latter half of the first year infants' social attention is controlled both by stimulus-driven factors, such as sensory salience (e.g., contrast, color, orientation, and motion) and social salience (e.g., faces), and by more endogenous or goal-directed factors that can exert control over looking behavior.

The objective of the current study was to move beyond these generalizations in order to better understand how infants dynamically select their focus of attention while observing people who appear to be interacting with them. This dynamic selection of where to look is a prerequisite for joint attention. During direct gaze there is an opportunity for eye contact and communication with the social partner, whereas during averted gaze there is an opportunity for joint attention toward another person or object (Farroni, Mansfield, Lai, & Johnson, 2003; Senju & Csibra, 2008; Senju, Csibra, & Johnson, 2008). Previous eye tracking studies were restricted to reporting where infants directed their attention based on first-order stimulus information, such as faces or objects in the scene (e.g., Jones & Klin, 2013). As such, these studies ignored how contextual and social cues, such as gaze direction or actions, might orient infants to look toward a specific location. These second-order cues result in a more complex and probabilistic process, because the observer decides where to look not only as a function of the region of interest (e.g., faces, objects) but also in response to other actions as well as knowledge of the preceding events. For example, the likelihood of looking at someone's face during a conversation is much higher if that individual's gaze is oriented directly toward you as opposed to looking toward another object (Kleinke, 1986; Senju & Hasegawa, 2005). If, however, the social partner is also waving her hands or manipulating an object while looking toward you, the likelihood of looking at the face and establishing eye contact with the social partner decreases. In typical social interactions, the cues for where to look will often compete and this is especially true for young infants outside of the lab. This is the reason that we sought to study how infants guide their visual attention during more naturalistic social situations.

We measured infants' eye gaze to dynamic social scenes. Unlike the stimuli used in the studies by Frank and colleagues, ours were not movies of people or cartoon characters shown from a third-person perspective, with infants simply watching a movie. Instead, our stimuli were created to show different actors socially engaged with an observer viewed from a first-person perspective. Although the stimuli were videos, they were designed to simulate naturalistic situations that could occur between a social partner and an infant. As such, each of the 16 videos presented one of five female actors talking and demonstrating a sequence of simple actions, such as putting a shirt on a stuffed animal. Since our primary goal was to conduct a detailed analysis of the changing focus of attention during joint attention, it was especially important to include both people and objects. Contrary to conventional wisdom, a few recent studies suggest that infants do not always look at the social partner's eyes or face during joint attention; instead, they focus primarily on sharing attention to the same object-directed actions (Deák, Krasno, Jasso, & Triesch, 2018; Deák, Krasno, Triesch, Lewis, & Sepeta, 2014; Franchak, Kretch, Soska, & Adolph, 2011; Yu & Smith, 2013). Thus, it was especially important for us to include not only people and their gestures, but object-directed actions as well.

Three age groups were tested: 8- and 12-month-old infants, and adults. The two infant groups were selected to straddle the age at which joint attention develops, and adults were included to enable a comparison of the infants' performance with more mature visual scanning behavior. Our goal was to assess the degree to which developmental changes in shifting attention to faces vs. objects were a function of the direction of head and eye gaze as well as of infant-directed and object-directed actions.

We hypothesized that 12-month-old infants and adults would systematically sustain or shift attention as a function of the actors' gaze direction and actions, whereas 8-month-old infants' attentional focus would be less predictable from the actors' social cues. This prediction for 8-month-old infants was predicated on a number of specific findings: Most of the current evidence suggests that infants do not respond to gaze cues as referential prior to 9 months of age, and thus they are less likely to systematically respond to gaze direction during observation of the actions of a social partner (e.g., Johnson, Ok, & Luo, 2007; Senju et al., 2008; Woodward, 2003). There is, however, a caveat to this finding. Infants as young as 3 to 4 months of age will shift their attention in the direction of averted gaze if the target consists of moving hands and objects (Amano, Kezuka, & Yamamoto, 2004; Deák et al., 2018). Accordingly, we expected 8-month-old infants to respond to averted gaze more like 12-month-old infants when this gaze was coupled with object-directed actions. Less clear was how participants in all three age groups would respond to social cues that were incongruent (e.g., gaze directed at the viewer while performing an object-directed action). As we will discuss, object-directed actions were often the best predictor of when infants would share attention with the actors in the videos.

Section snippets

Participants

Twenty-two eight-month-old infants (M = 243.0 days, SD = 8.7 days; 11 females, 11 males), 20 twelve-month-old infants (M = 371.6 days, SD = 8.7 days; 7 females, 13 males), and 20 adults (10 females, 10 males) comprised the sample for this study. Two additional eight-month-old infants were tested but were excluded due to fussiness or our inability to calibrate the eye-tracking system and record valid data. Parents provided consent for their child's participation and all infants received a

Results

The main goal of this study was to test whether infants and adults modulated their attention to faces and objects as a function of gaze direction and action type. In order to address this question, it was necessary to first determine how visual attention should be measured. Although most developmental studies measure visual attention in terms of total duration of looking, we opted to measure attention exclusively in terms of visual fixations. Our eyes scan the visual world via saccadic
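As a concrete illustration of fixation-based measurement (and not of the analysis pipeline actually used in this study), a dispersion-threshold routine in the spirit of standard I-DT algorithms can segment raw gaze samples into fixations. The sketch below is a minimal, hypothetical example; the sample format and the 100 ms minimum-duration and 35-pixel dispersion thresholds are assumptions for illustration only.

    # Minimal, hypothetical sketch of dispersion-threshold (I-DT style) fixation
    # detection. Inputs are gaze samples as (time_s, x_px, y_px) tuples; the
    # duration and dispersion thresholds are illustrative assumptions, not
    # parameters reported in the paper.

    def detect_fixations(samples, min_duration=0.100, max_dispersion=35.0):
        fixations = []
        start = 0
        while start < len(samples):
            window = [samples[start]]
            end = start
            # Grow the window while the combined x/y spread stays small.
            while end + 1 < len(samples):
                candidate = window + [samples[end + 1]]
                xs = [s[1] for s in candidate]
                ys = [s[2] for s in candidate]
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    break
                window = candidate
                end += 1
            duration = window[-1][0] - window[0][0]
            if duration >= min_duration:
                # Record the fixation's onset, duration, and centroid.
                cx = sum(s[1] for s in window) / len(window)
                cy = sum(s[2] for s in window) / len(window)
                fixations.append({"onset": window[0][0], "duration": duration,
                                  "x": cx, "y": cy})
                start = end + 1
            else:
                start += 1
        return fixations

Counting and timing fixations within areas of interest (e.g., face vs. hands-and-objects regions) would then operate on the output of a routine like this, rather than on raw gaze samples or total looking duration.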

Discussion

A prerequisite for joint attention is that both infants and adults coordinate their focus of attention with the gaze direction and actions of their social partner. Most previous eye tracking studies presented faces in isolation, which obviated the need for joint attention. By contrast, this study presented videos of actors appearing to interact with observers so that we could precisely measure how direction of gaze and actions affect the spatiotemporal patterning of infants' gaze during more

Conclusions

This study adopted a hybrid approach to studying joint attention, combining high spatial resolution eye tracking with more naturalistic social stimuli. Our results reveal that joint attention is neither a monolithic process nor one that develops all at once. Indeed, it is not even possible to suggest that infants respond to different gaze cues in the same way. Our results suggest important processing differences between direct and averted gaze in triggering joint attention. Moreover, object-directed

Author contributions

Ty W. Boyer: Conceptualization, Methodology, Investigation, Software, Data Curation, Writing – Reviewing and Editing.

Samuel M. Harding: Formal analysis, Visualization, Software, Data Curation, Writing – Reviewing and Editing.

Bennett I. Bertenthal: Supervision, Conceptualization, Writing – Original draft preparation.

Acknowledgements

Portions of these data were previously presented at the biennial meetings of the International Society for Infant Studies, New Orleans, LA, May 2016, and the meetings of the Psychonomic Society, New Orleans, LA, November 2018. This research was supported in part by funds from NIH Grant (U54 RR025215) to the third author. The authors wish to thank the parents and children who participated, and Jimeisha Brooks, Sloan Fulton, Jessica Luke, Keeley Newsom, and Ian Nolan for assistance in coding the

References (67)

  • B. Siposova et al. (2019). A new look at joint attention and common knowledge. Cognition.

  • A.L. Woodward (1998). Infants selectively encode the goal object of an actor's reach. Cognition.

  • F. Acerra et al. (2002). Modelling aspects of face processing in early infancy. Developmental Science.

  • D. Amso et al. (2014). An eye tracking investigation of developmental change in bottom-up attention orienting to faces in cluttered natural scenes. PLoS One.

  • R. Bakeman et al. (1984). Coordinating attention to people and objects in mother-infant and peer-infant interaction. Child Development.

  • C. Berger et al. (2012). GazeAlyze: A MATLAB toolbox for the analysis of eye movement data. Behavior Research Methods.

  • B.I. Bertenthal et al. Development of social attention in human infants.

  • B.I. Bertenthal et al. (2014). When do infants begin to follow a point? Developmental Psychology.

  • E. Birmingham et al. (2008). Gaze selection in complex social scenes. Visual Cognition.

  • I.W.R. Bushnell (2001). Mother's face recognition in newborn infants: Learning and memory. Infant and Child Development.

  • G. Butterworth et al. (1991). What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology.

  • M. Carpenter et al. (1998). Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development.

  • L.B. Cohen (1972). Attention-getting and attention-holding processes of infant visual preferences. Child Development.

  • G. Csibra (2010). Recognizing communicative intentions in infancy. Mind & Language.

  • M.M. Daum et al. (2013). The development of pointing perception in infancy: Effects of communicative signals on covert shifts of attention. Developmental Psychology.

  • G.O. Deák et al. (2018). What leads to shared attention? Maternal cues and infant responses during object play. Infancy.

  • G.O. Deák et al. (2014). Watch the hands: Infants can learn to follow gaze by seeing adults manipulate objects. Developmental Science.

  • M. Elsabbagh et al. (2014). What you see is what you get: Contextual modulation of face scanning in typical and atypical development. Social Cognitive and Affective Neuroscience.

  • T. Farroni et al. (2002). Eye contact detection in humans from birth. Proceedings of the National Academy of Sciences of the United States of America.

  • T. Farroni et al. (2004). Gaze following in newborns. Infancy.

  • J.M. Franchak et al. (2011). Head-mounted eye tracking: A new method to describe infant looking. Child Development.

  • M.C. Frank et al. (2012). Measuring the development of social attention using free-viewing. Infancy.

  • G. Gredebäck et al. (2010). The development of joint visual attention: A longitudinal study of gaze following during interactions with mothers and strangers. Developmental Science.