Facial metrics generated from manually and automatically placed image landmarks are highly correlated

https://doi.org/10.1016/j.evolhumbehav.2020.09.002

Abstract

Research on social judgments of faces often investigates relationships between measures of face shape taken from images (facial metrics), and either perceptual ratings of the faces on various traits (e.g., attractiveness) or characteristics of the photographed individual (e.g., their health). A barrier to carrying out this research using large numbers of face images is the time it takes to manually position the landmarks from which these facial metrics are derived. Although research in face recognition has led to the development of algorithms that can automatically position landmarks on face images, the utility of such methods for deriving facial metrics commonly used in research on social judgments of faces has not yet been established. Thus, across two studies, we investigated the correlations between four facial metrics commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and facial width to height ratio) when measured from manually and automatically placed landmarks. In the first study, in two independent sets of open access face images, we found that facial metrics derived from manually and automatically placed landmarks were typically highly correlated, in both raw and Procrustes-fitted representations. In study two, we investigated the potential for automatic landmark placement to differ between White and East Asian faces. We found that two metrics, facial width to height ratio and sexual dimorphism, were better approximated by automatic landmarks in East Asian faces. However, this difference was small, and easily corrected with outlier detection. These data validate the use of automatically placed landmarks for calculating facial metrics to use in research on social judgments of faces, but we urge caution in their use. We also provide a tutorial for the automatic placement of landmarks on face images.

Introduction

The human face is an important social stimulus. From a multitude of signals within faces, we can infer information about an individual that is often critical for social interaction, such as their age (Imai & Okami, 2019) and sex (Burton et al., 1993). People also make inferences regarding social traits, such as attractiveness (Rhodes, 2006), health (Jones, 2018), and trustworthiness (Sutherland et al., 2013), from facial characteristics. Although the veracity of these perceptions is often questionable, they can influence important social outcomes, such as hiring and voting decisions and romantic partner choice (Todorov et al., 2015).

Researchers investigating social judgments of faces will often take specific shape measurements from face images and examine associations between these measurements and either perceived or physical characteristics of the photographed individual. For example, many studies have used this facial metric approach to investigate putative relationships between sexual dimorphism, distinctiveness, bilateral asymmetry, or facial width to height ratio (fWHR) and ratings of traits such as attractiveness, health, or dominance of face images (Holzleitner et al., 2014; Jones, 2018; Komori et al., 2009, Komori et al., 2011; Said & Todorov, 2011; Scheib et al., 1999). Other studies have used this approach to investigate putative relationships between these metrics and qualities of the photographed individuals such as their physical health, hormonal profile, or body size (Cai et al., 2019; Geniole et al., 2014; Lefevre et al., 2013; Wolffhechel et al., 2015). This approach has been invaluable for providing insights into the nature of the relationships among facial shape, person perception, and physical condition and, in doing so, has helped identify factors that drive social judgments of faces.
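
To make the facial metric approach concrete, a metric such as fWHR can be computed directly from 2-D landmark coordinates. The sketch below is a minimal Python illustration; the landmark indices and the specific operationalization (bizygomatic width divided by brow-to-upper-lip height) are assumptions for illustration, not the exact definition used in any particular study.

```python
import numpy as np

def fwhr(landmarks, left_zygion, right_zygion, brow, upper_lip):
    """Facial width-to-height ratio from a (n_points, 2) landmark array.

    The four index arguments are hypothetical: studies differ in which
    landmarks define bizygomatic width and upper-face height.
    """
    width = np.linalg.norm(landmarks[right_zygion] - landmarks[left_zygion])
    height = np.linalg.norm(landmarks[brow] - landmarks[upper_lip])
    return width / height

# Toy landmark configuration: face 100 px wide, upper face 80 px tall
pts = np.array([[0.0, 50.0],    # 0: left zygion
                [100.0, 50.0],  # 1: right zygion
                [50.0, 0.0],    # 2: mid-brow
                [50.0, 80.0]])  # 3: upper lip midpoint
print(fwhr(pts, 0, 1, 2, 3))  # 100 / 80 = 1.25
```

The other metrics discussed here (sexual dimorphism, distinctiveness, asymmetry) are likewise functions of landmark coordinates, which is why the quality of landmark placement propagates directly into every downstream analysis.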

A significant barrier to addressing these research questions, and more importantly, addressing them well, is the length of time it takes to manually place the landmarks that are essential for calculating these facial metrics. Indeed, this cost may explain why studies investigating relationships among measured face shape and perceived or physical characteristics of the photographed individual are often underpowered (Cai et al., 2019; Holzleitner et al., 2014). Manual placement of landmarks on face images is also arguably a barrier to the reproducibility of facial metrics, since some research demonstrates that different people place key landmarks in different locations on face images (Geniole et al., 2014; Grammer & Thornhill, 1994; Rikowski & Grammer, 1999; Scheib et al., 1999). With many open face image sets now available (for a comprehensive list of open access face image sets, see https://rystoli.github.io/FSTC.html), these issues represent a significant obstacle to research progress. In addition, these landmarks are often used to create facial averages that individual images can be warped between to test effects on perception (Scott, Kramer, Jones, & Ward, 2013; Sutherland et al., 2017), highlighting how central manual landmarking is to many avenues of face perception research.

An alternative approach to manual placement of landmarks is to use fully automated landmark placement. Computer vision research has developed powerful face recognition algorithms trained to place landmarks quickly, automatically, and reproducibly, using regression tree methods (King, 2009). While they have seen extensive use in computer vision work (Baddar, Son, Kim, Kim, & Ro, 2016; Damer et al., 2019; Özseven & Düğenci, 2017; Schroff, Kalenichenko, & Philbin, 2015), these methods have not yet been validated for use in social perception research. Given that these automatically placed landmarks capture shape information vital for facial recognition (Juhong & Pintavirooj, 2017; Shi, Samal, & Marx, 2006), they may capture equally well the metrics of interest to social perception. If validated for measurement of facial metrics, automatic landmark placement would substantially decrease the time cost that manual landmark placements require, produce fully reproducible facial metrics, and ultimately improve the quality of research using facial metrics to investigate social perception.

In light of the above, in our first study, we investigated the correlations between four facial metrics that are commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) derived from manually and automatically placed landmarks. As these shape-dependent measures are sensitive to scaling, translation, and rotation, we also examined the correlations between metrics derived from manual and automatic landmarks after submitting the landmarks to a Generalized Procrustes Analysis (GPA; see Kleisner, Chvátalová, & Flegr, 2014; Mitteroecker, Windhager, Müller, & Schaefer, 2015). Finally, to investigate the generalizability of our results across image sets, we examined these correlations in two independent open-access image sets (DeBruine & Jones, 2017; DeBruine & Jones, 2020). In our second study, we investigated whether facial metrics generated from manual and automatic landmarks show any systematic biases when measured on faces of different ethnicities, to test whether automatic methods can generalize to different study populations without introducing the biases that can be present in facial detection algorithms (O'Toole, Phillips, An, & Dunlop, 2012).
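
The Procrustes fitting step can be sketched in a few lines of NumPy. The function below is a pairwise version of the superimposition used in GPA (translation removed, centroid size normalized, optimal rotation found by singular value decomposition); the random landmark data and the specific transformation applied are assumptions for demonstration only.

```python
import numpy as np

def procrustes_fit(X, Y):
    """Superimpose two (k, 2) landmark configurations: remove translation,
    scale both to unit centroid size, and rotate Y onto X. This is the
    pairwise building block of a Generalized Procrustes Analysis."""
    Xc = X - X.mean(axis=0)            # remove translation
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)       # unit centroid size
    Yc = Yc / np.linalg.norm(Yc)
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    return Xc, Yc @ (U @ Vt)           # optimal rotation of Yc onto Xc

rng = np.random.default_rng(0)
manual = rng.normal(size=(68, 2))      # toy "manual" landmark configuration
theta = 0.3                            # arbitrary rotation angle
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
# "automatic" landmarks: same shape, but shifted, scaled, and rotated
auto = 2.5 * manual @ rot.T + np.array([10.0, -4.0])
Xn, Y_aligned = procrustes_fit(manual, auto)
print(np.abs(Y_aligned - Xn).max())    # near zero: shapes coincide after fitting
```

Because the two toy configurations differ only in scale, position, and orientation, the residual after fitting is essentially zero; with real manual and automatic landmarks, the residual reflects genuine placement differences.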

Section snippets

Method

All data and analyses (including code for calculating facial metrics) can be found on the Open Science Framework (osf.io/5e3qp). Analyses were conducted using Python 3.6 and JupyterLab notebooks that detail the measurements and statistical analysis. We have also provided a tutorial notebook for the automatic landmarking of faces, also available on the Open Science Framework.

Image sets

The first open access image set used in our study was the Face Research Lab London Set (DeBruine & Jones, 2017). This image set

Study Two - Testing for potential biases in automatic landmark placement

We have demonstrated that strong correlations emerge between commonly used facial metrics measured from manual and automatically placed landmarks. Aside from errors in automatic landmarking on certain faces, automatic placement appears to be accurate and capable of deriving metrics of interest. However, automatic landmark placement of the kind leveraged here is a critical step in face detection and recognition algorithms (Damer et al., 2019; Juhong & Pintavirooj, 2017; Köstinger, Wohlhart,
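
The outlier correction mentioned in the Abstract can be sketched as a robust screen on the manual-versus-automatic differences: faces where automatic landmarking failed produce large discrepancies and can be excluded or re-landmarked by hand. The median-based z-score, the 3.5 threshold, and the toy data below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def flag_landmark_outliers(manual_metric, auto_metric, z_thresh=3.5):
    """Flag faces whose manual-vs-automatic metric difference is extreme,
    using a modified z-score based on the median absolute deviation (MAD),
    which is robust to the very outliers being detected."""
    diff = np.asarray(auto_metric) - np.asarray(manual_metric)
    med = np.median(diff)
    mad = np.median(np.abs(diff - med))
    z = 0.6745 * (diff - med) / mad    # 0.6745 scales MAD to ~sigma
    return np.abs(z) > z_thresh

# Toy fWHR values for six faces; automatic landmarking fails on face 4
manual = np.array([1.80, 1.85, 1.90, 1.78, 1.83, 1.88])
auto   = np.array([1.81, 1.84, 1.91, 1.77, 2.60, 1.87])
print(np.flatnonzero(flag_landmark_outliers(manual, auto)))  # [4]
```

A MAD-based score is preferable to an ordinary z-score here because a single gross landmarking failure inflates the sample standard deviation enough to mask itself.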

Discussion

The current studies used several independent image sets to investigate the correlations between four facial metrics commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) when they were derived from manually and automatically placed landmarks, as well as estimating the degree of bias that may occur if these landmarking procedures are used on faces of different ethnicities.

Fig. 2 highlights the main finding that, across both image sets and

References (43)

  • W.J. Baddar et al. A deep facial landmarks detection with facial contour and facial components constraint.
  • A.M. Burton et al. What's the difference between men and women? Evidence from facial measurement. Perception (1993).
  • J.G. Cavazos et al. Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? ArXiv:1912.07398 [Cs] (2020).
  • N. Damer et al. Detecting face morphing attacks by analyzing the directed distances of facial landmarks shifts.
  • D.P. Danel et al. A cross-cultural study of sex-typicality and averageness: Correlation between frontal and lateral measures of human faces. American Journal of Human Biology (2018).
  • A. Das et al. Mitigating bias in gender, age and ethnicity classification: A multi-task convolution neural network approach.
  • L. DeBruine. WebMorph.
  • L. DeBruine et al. 3DSK face set with webmorph templates (2020).
  • L.M. DeBruine et al. Face research lab London set (2017).
  • B.A. Efraty et al. Facial component-landmark detection. Face and Gesture (2011).
  • F. Fang et al. A systematic review of inter-ethnic variability in facial dimensions. Plastic and Reconstructive Surgery (2011).
Cited by (8)

    • Is facial structure an honest cue to real-world dominance and fighting ability in men? A pre-registered direct replication of Třebický et al. (2013)

      2022, Evolution and Human Behavior
      Citation Excerpt:

      Jones, Schild, and Jones (2020) indeed noted an outstanding r = 0.91 correlation between automatic and manual landmarking procedures, but, as it might be argued that this is not an entirely perfect correlation, we further provided our research assistants with standardised information via: (1) an in-person education session to each research assistant, which comprised information on the exact landmarking locations, which were then (2) repeated via email, and research assistants were then (3) provided with the previously mentioned UFC fighter facial landmarking order image seen in Fig. S2 in the Supplemental Material to further ensure standardisation, and (4) we then finally re-examined each of the facial images to ensure their precision. It is noted, however, that future studies—where feasible—are encouraged to follow automatic landmarking procedures employed by Jones et al. (2020). After landmarking procedures, we then used the Geomorph package in R (Adams & Otárola-Castillo, 2013) to import and analyse the landmark data.

    • Is facial width-to-height ratio reliably associated with social inferences?

      2021, Evolution and Human Behavior
      Citation Excerpt:

      The three sets of manually measured fWHRs were all highly intercorrelated with one another (rs from 0.92–0.96), so we took an average across the three manual measurements for each face. Additionally, we used a recently-published automated method to calculate fWHR (Jones et al., 2020). Because the automated procedure has only been validated on two sets of faces, we compared the fWHR scores obtained from the manual measurements to the scores provided by the automated procedure, so that we could further validate the automated method and examine any potential ethnicity bias in the automated procedure.

    • Face templates for the Chicago Face Database

      2023, Behavior Research Methods