Facial metrics generated from manually and automatically placed image landmarks are highly correlated

https://doi.org/10.1016/j.evolhumbehav.2020.09.002

Abstract

Research on social judgments of faces often investigates relationships between measures of face shape taken from images (facial metrics), and either perceptual ratings of the faces on various traits (e.g., attractiveness) or characteristics of the photographed individual (e.g., their health). A barrier to carrying out this research using large numbers of face images is the time it takes to manually position the landmarks from which these facial metrics are derived. Although research in face recognition has led to the development of algorithms that can automatically position landmarks on face images, the utility of such methods for deriving facial metrics commonly used in research on social judgments of faces has not yet been established. Thus, across two studies, we investigated the correlations between four facial metrics commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and facial width to height ratio) when measured from manually and automatically placed landmarks. In the first study, in two independent sets of open access face images, we found that facial metrics derived from manually and automatically placed landmarks were typically highly correlated, in both raw and Procrustes-fitted representations. In study two, we investigated the potential for automatic landmark placement to differ between White and East Asian faces. We found that two metrics, facial width to height ratio and sexual dimorphism, were better approximated by automatic landmarks in East Asian faces. However, this difference was small, and easily corrected with outlier detection. These data validate the use of automatically placed landmarks for calculating facial metrics to use in research on social judgments of faces, but we urge caution in their use. We also provide a tutorial for the automatic placement of landmarks on face images.

Introduction

The human face is an important social stimulus. From a multitude of signals within faces, we can infer information about an individual that is often critical for social interaction, such as their age (Imai & Okami, 2019) and sex (Burton et al., 1993). People also make inferences regarding social traits, such as attractiveness (Rhodes, 2006), health (Jones, 2018), and trustworthiness (Sutherland et al., 2013), from facial characteristics. Although the veracity of these perceptions is often questionable, they can influence important social outcomes, such as hiring and voting decisions and romantic partner choice (Todorov et al., 2015).

Researchers investigating social judgments of faces will often take specific shape measurements from face images and examine associations between these measurements and either perceived or physical characteristics of the photographed individual. For example, many studies have used this facial metric approach to investigate putative relationships between sexual dimorphism, distinctiveness, bilateral asymmetry, or facial width to height ratio (fWHR) and ratings of traits such as attractiveness, health, or dominance of face images (Holzleitner et al., 2014; Jones, 2018; Komori et al., 2009, Komori et al., 2011; Said & Todorov, 2011; Scheib et al., 1999). Other studies have used this approach to investigate putative relationships between these metrics and qualities of the photographed individuals such as their physical health, hormonal profile, or body size (Cai et al., 2019; Geniole et al., 2014; Lefevre et al., 2013; Wolffhechel et al., 2015). This approach has been invaluable for providing insights into the nature of the relationships among facial shape, person perception, and physical condition and, in doing so, has helped identify factors that drive social judgments of faces.
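
To make the facial metric approach concrete, a metric such as fWHR can be computed directly from 2-D landmark coordinates. The sketch below is a minimal Python illustration; the landmark indices and the specific operationalization (bizygomatic width divided by brow-to-upper-lip height) are assumptions for illustration, not the exact definition used in any particular study.

```python
import numpy as np

def fwhr(landmarks, left_zygion, right_zygion, brow, upper_lip):
    """Facial width-to-height ratio from a (n_points, 2) landmark array.

    The four index arguments are hypothetical: studies differ in which
    landmarks define bizygomatic width and upper-face height.
    """
    width = np.linalg.norm(landmarks[right_zygion] - landmarks[left_zygion])
    height = np.linalg.norm(landmarks[brow] - landmarks[upper_lip])
    return width / height

# Toy landmark configuration: face 100 px wide, upper face 80 px tall
pts = np.array([[0.0, 50.0],    # 0: left zygion
                [100.0, 50.0],  # 1: right zygion
                [50.0, 0.0],    # 2: mid-brow
                [50.0, 80.0]])  # 3: upper lip midpoint
print(fwhr(pts, 0, 1, 2, 3))  # 100 / 80 = 1.25
```

The other metrics discussed here (sexual dimorphism, distinctiveness, asymmetry) are likewise functions of landmark coordinates, which is why the quality of landmark placement propagates directly into every downstream analysis.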

A significant barrier to addressing these research questions, and more importantly, addressing them well, is the length of time it takes to manually place the landmarks that are essential for calculating these facial metrics. Indeed, this cost may explain why studies investigating relationships among measured face shape and perceived or physical characteristics of the photographed individual are often underpowered (Cai et al., 2019; Holzleitner et al., 2014). Manual placement of landmarks on face images is also arguably a barrier to the reproducibility of facial metrics, since some research demonstrates that different people place key landmarks in different locations on face images (Geniole et al., 2014; Grammer & Thornhill, 1994; Rikowski & Grammer, 1999; Scheib et al., 1999). With many open face image sets now available (for a comprehensive list of open access face image sets, see https://rystoli.github.io/FSTC.html), these issues represent a significant obstacle to research progress. In addition, these landmarks are often used to create facial averages that individual images can be warped between to test effects on perception (Scott, Kramer, Jones, & Ward, 2013; Sutherland et al., 2017), highlighting how central manual landmarking is to many avenues of face perception research.

An alternative approach to manual placement of landmarks is to use fully automated landmark placement. Computer vision research has developed powerful face recognition algorithms trained to place landmarks quickly, automatically, and reproducibly, using regression tree methods (King, 2009). While they have seen extensive use in computer vision work (Baddar, Son, Kim, Kim, & Ro, 2016; Damer et al., 2019; Özseven & Düğenci, 2017; Schroff, Kalenichenko, & Philbin, 2015), these methods have not yet been validated for use in social perception research. Given that these automatically placed landmarks capture shape information vital for facial recognition (Juhong & Pintavirooj, 2017; Shi, Samal, & Marx, 2006), they may capture equally well the metrics of interest to social perception. If validated for measurement of facial metrics, automatic landmark placement would substantially decrease the time cost that manual landmark placements require, produce fully reproducible facial metrics, and ultimately improve the quality of research using facial metrics to investigate social perception.

In light of the above, in our first study, we investigated the correlations between four facial metrics that are commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) derived from manually and automatically placed landmarks. As these shape-dependent measures are sensitive to scaling, translation, and rotation, we also examined the correlations between metrics derived from manual and automatic landmarks after submitting the landmarks to a Generalized Procrustes Analysis (GPA; see Kleisner, Chvátalová, & Flegr, 2014; Mitteroecker, Windhager, Müller, & Schaefer, 2015). Finally, to investigate the generalizability of our results across image sets, we examined these correlations in two independent open-access image sets (DeBruine & Jones, 2017; DeBruine & Jones, 2020). In our second study, we investigated whether facial metrics generated from manual and automatic landmarks show any systematic biases when measured on faces of different ethnicities, to test whether automatic methods can generalize to different study populations without introducing the biases that can be present in facial detection algorithms (O'Toole, Phillips, An, & Dunlop, 2012).
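
The Procrustes fitting step can be sketched in a few lines of NumPy. The function below is a pairwise version of the superimposition used in GPA (translation removed, centroid size normalized, optimal rotation found by singular value decomposition); the random landmark data and the specific transformation applied are assumptions for demonstration only.

```python
import numpy as np

def procrustes_fit(X, Y):
    """Superimpose two (k, 2) landmark configurations: remove translation,
    scale both to unit centroid size, and rotate Y onto X. This is the
    pairwise building block of a Generalized Procrustes Analysis."""
    Xc = X - X.mean(axis=0)            # remove translation
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)       # unit centroid size
    Yc = Yc / np.linalg.norm(Yc)
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    return Xc, Yc @ (U @ Vt)           # optimal rotation of Yc onto Xc

rng = np.random.default_rng(0)
manual = rng.normal(size=(68, 2))      # toy "manual" landmark configuration
theta = 0.3                            # arbitrary rotation angle
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
# "automatic" landmarks: same shape, but shifted, scaled, and rotated
auto = 2.5 * manual @ rot.T + np.array([10.0, -4.0])
Xn, Y_aligned = procrustes_fit(manual, auto)
print(np.abs(Y_aligned - Xn).max())    # near zero: shapes coincide after fitting
```

Because the two toy configurations differ only in scale, position, and orientation, the residual after fitting is essentially zero; with real manual and automatic landmarks, the residual reflects genuine placement differences.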

Section snippets

Method

All data and analyses (including code for calculating facial metrics) can be found on the Open Science Framework (osf.io/5e3qp). Analyses were conducted using Python 3.6 and JupyterLab notebooks that detail the measurements and statistical analysis. We have also provided a tutorial notebook for the automatic landmarking of faces, also available on the Open Science Framework.

Image sets

The first open access image set used in our study was the Face Research Lab London Set (DeBruine & Jones, 2017). This image set

Study Two - Testing for potential biases in automatic landmark placement

We have demonstrated that strong correlations emerge between commonly used facial metrics measured from manual and automatically placed landmarks. Aside from errors in automatic landmarking on certain faces, automatic placement appears to be accurate and capable of deriving metrics of interest. However, automatic landmark placement of the kind leveraged here is a critical step in face detection and recognition algorithms (Damer et al., 2019; Juhong & Pintavirooj, 2017; Köstinger, Wohlhart,
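
The outlier correction mentioned in the Abstract can be sketched as a robust screen on the manual-versus-automatic differences: faces where automatic landmarking failed produce large discrepancies and can be excluded or re-landmarked by hand. The median-based z-score, the 3.5 threshold, and the toy data below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def flag_landmark_outliers(manual_metric, auto_metric, z_thresh=3.5):
    """Flag faces whose manual-vs-automatic metric difference is extreme,
    using a modified z-score based on the median absolute deviation (MAD),
    which is robust to the very outliers being detected."""
    diff = np.asarray(auto_metric) - np.asarray(manual_metric)
    med = np.median(diff)
    mad = np.median(np.abs(diff - med))
    z = 0.6745 * (diff - med) / mad    # 0.6745 scales MAD to ~sigma
    return np.abs(z) > z_thresh

# Toy fWHR values for six faces; automatic landmarking fails on face 4
manual = np.array([1.80, 1.85, 1.90, 1.78, 1.83, 1.88])
auto   = np.array([1.81, 1.84, 1.91, 1.77, 2.60, 1.87])
print(np.flatnonzero(flag_landmark_outliers(manual, auto)))  # [4]
```

A MAD-based score is preferable to an ordinary z-score here because a single gross landmarking failure inflates the sample standard deviation enough to mask itself.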

Discussion

The current studies used several independent image sets to investigate the correlations between four facial metrics commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) when they were derived from manually and automatically placed landmarks, as well as estimating the degree of bias that may occur if these landmarking procedures are used on faces of different ethnicities.

Fig. 2 highlights the main finding that, across both image sets and

References (43)

  • W.J. Baddar et al. A deep facial landmarks detection with facial contour and facial components constraint.
  • A.M. Burton et al. What's the difference between men and women? Evidence from facial measurement. Perception (1993).
  • J.G. Cavazos et al. Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? ArXiv:1912.07398 [Cs] (2020).
  • N. Damer et al. Detecting face morphing attacks by analyzing the directed distances of facial landmarks shifts.
  • D.P. Danel et al. A cross-cultural study of sex-typicality and averageness: Correlation between frontal and lateral measures of human faces. American Journal of Human Biology (2018).
  • A. Das et al. Mitigating bias in gender, age and ethnicity classification: A multi-task convolution neural network approach.
  • L. DeBruine. WebMorph.
  • L. DeBruine et al. 3DSK face set with webmorph templates (2020).
  • L.M. DeBruine et al. Face research lab London set (2017).
  • B.A. Efraty et al. Facial component-landmark detection. Face and Gesture (2011).
  • F. Fang et al. A systematic review of inter-ethnic variability in facial dimensions. Plastic and Reconstructive Surgery (2011).
Cited by (8)

    • Is facial structure an honest cue to real-world dominance and fighting ability in men? A pre-registered direct replication of Třebický et al. (2013)

      2022, Evolution and Human Behavior
      Citation Excerpt:

      Jones, Schild, and Jones (2020) indeed noted an outstanding r = 0.91 correlation between automatic and manual landmarking procedures, but, as it might be argued that this is not an entirely perfect correlation, we further provided our research assistants with standardised information via: (1) an in-person education session to each research assistant, which comprised information on the exact landmarking locations, which were then (2) repeated via email, and research assistants were then (3) provided with the previously mentioned UFC fighter facial landmarking order image seen in Fig. S2 in the Supplemental Material to further ensure standardisation, and (4) we then finally re-examined each of the facial images to ensure their precision. It is noted, however, that future studies—where feasible—are encouraged to follow automatic landmarking procedures employed by Jones et al. (2020). After landmarking procedures, we then used the Geomorph package in R (Adams & Otárola-Castillo, 2013) to import and analyse the landmark data.

    • Is facial width-to-height ratio reliably associated with social inferences?

      2021, Evolution and Human Behavior
      Citation Excerpt:

      The three sets of manually measured fWHRs were all highly intercorrelated with one another (rs from 0.92–0.96), so we took an average across the three manual measurements for each face. Additionally, we used a recently-published automated method to calculate fWHR (Jones et al., 2020). Because the automated procedure has only been validated on two sets of faces, we compared the fWHR scores obtained from the manual measurements to the scores provided by the automated procedure, so that we could further validate the automated method and examine any potential ethnicity bias in the automated procedure.

    • Face templates for the Chicago Face Database

      2023, Behavior Research Methods