Facial metrics generated from manually and automatically placed image landmarks are highly correlated
Introduction
The human face is an important social stimulus. From a multitude of signals within faces, we can infer information about an individual that is often critical for social interaction, such as their age (Imai & Okami, 2019) and sex (Burton et al., 1993). People also make inferences regarding social traits, such as attractiveness (Rhodes, 2006), health (Jones, 2018), and trustworthiness (Sutherland et al., 2013), from facial characteristics. Although the veracity of these perceptions is often questionable, they can influence important social outcomes, such as hiring and voting decisions and romantic partner choice (Todorov et al., 2015).
Researchers investigating social judgments of faces will often take specific shape measurements from face images and examine associations between these measurements and either perceived or physical characteristics of the photographed individual. For example, many studies have used this facial metric approach to investigate putative relationships between sexual dimorphism, distinctiveness, bilateral asymmetry, or facial width to height ratio (fWHR) and ratings of traits such as attractiveness, health, or dominance of face images (Holzleitner et al., 2014; Jones, 2018; Komori et al., 2009, Komori et al., 2011; Said & Todorov, 2011; Scheib et al., 1999). Other studies have used this approach to investigate putative relationships between these metrics and qualities of the photographed individuals such as their physical health, hormonal profile, or body size (Cai et al., 2019; Geniole et al., 2014; Lefevre et al., 2013; Wolffhechel et al., 2015). This approach has been invaluable for providing insights into the nature of the relationships among facial shape, person perception, and physical condition and, in doing so, has helped identify factors that drive social judgments of faces.
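To make the facial metric approach concrete, consider fWHR, which is commonly operationalized as bizygomatic width divided by upper-face height, both taken from landmark coordinates. The sketch below is illustrative only: the landmark names are hypothetical, and the paper's exact landmark scheme and measurement code are on its OSF page.

```python
def fwhr(landmarks):
    """Facial width-to-height ratio from 2D landmark coordinates.

    Width: horizontal distance between the left and right zygion
    (cheekbone) landmarks. Height: vertical distance between the
    brow midpoint and the top of the upper lip. The landmark keys
    used here are hypothetical, chosen for illustration.
    """
    width = landmarks["zygion_right"][0] - landmarks["zygion_left"][0]
    height = landmarks["lip_top"][1] - landmarks["brow_mid"][1]
    return width / height

# Toy coordinates in image (x, y) space, with y increasing downwards:
points = {
    "zygion_left": (0.0, 55.0),
    "zygion_right": (140.0, 55.0),
    "brow_mid": (70.0, 40.0),
    "lip_top": (70.0, 115.0),
}
ratio = fwhr(points)  # 140 / 75
```

The same handful of landmarks, however placed, feeds directly into the metric, which is why the reliability of landmark placement matters so much for this literature.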
A significant barrier to addressing these research questions, and, more importantly, to addressing them well, is the length of time it takes to manually place the landmarks that are essential for calculating these facial metrics. Indeed, this cost may explain why studies investigating relationships between measured face shape and perceived or physical characteristics of the photographed individual are often underpowered (Cai et al., 2019; Holzleitner et al., 2014). Manual placement of landmarks on face images is also arguably a barrier to the reproducibility of facial metrics, since research demonstrates that different people place key landmarks in different locations on face images (Geniole et al., 2014; Grammer & Thornhill, 1994; Rikowski & Grammer, 1999; Scheib et al., 1999). With many open-access face image sets now available (for a comprehensive list, see https://rystoli.github.io/FSTC.html), these issues represent a significant obstacle to research progress. In addition, these landmarks are often used to create facial averages toward which individual images can be warped to test effects on perception (Scott, Kramer, Jones, & Ward, 2013; Sutherland et al., 2017), highlighting how central manual landmarking is to many avenues of face perception research.
An alternative to manual placement of landmarks is fully automated landmark placement. Computer vision research has developed powerful face recognition algorithms trained to place landmarks quickly, automatically, and reproducibly, using regression tree methods (King, 2009). While these methods have seen extensive use in computer vision work (Baddar, Son, Kim, Kim, & Ro, 2016; Damer et al., 2019; Özseven & Düğenci, 2017; Schroff, Kalenichenko, & Philbin, 2015), they have not yet been validated for use in social perception research. Given that automatically placed landmarks capture shape information vital for facial recognition (Juhong & Pintavirooj, 2017; Shi, Samal, & Marx, 2006), they may capture equally well the metrics of interest in social perception research. If validated for measurement of facial metrics, automatic landmark placement would substantially reduce the time cost of manual landmarking, produce fully reproducible facial metrics, and ultimately improve the quality of research that uses facial metrics to investigate social perception.
In light of the above, in our first study, we investigated the correlations between four facial metrics that are commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) derived from manually and automatically placed landmarks. As these shape-dependent measures are sensitive to scaling, translation, and rotation, we also examined the correlations between metrics derived from the manual and automatic landmarks after submitting both sets to a Generalized Procrustes Analysis (GPA; see Kleisner, Chvátalová, & Flegr, 2014; Mitteroecker, Windhager, Müller, & Schaefer, 2015). Finally, to investigate the generalizability of our results across image sets, we investigated these correlations in two independent open-access image sets (DeBruine & Jones, 2017; DeBruine & Jones, 2020). In our second study, we investigated whether the facial metrics generated from manual and automatic landmarks show any systematic biases when measured on faces of different ethnicities, to test whether automatic methods may be generalizable to different study populations without introducing biases that can be present in facial detection algorithms (O'Toole, Phillips, An, & Dunlop, 2012).
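GPA removes exactly these nuisance components: each landmark configuration is centred, scaled to unit size, and then iteratively rotated onto the current mean shape. The following is a minimal numpy sketch of that procedure, not the authors' implementation (their code is on the OSF page):

```python
import numpy as np

def generalized_procrustes(shapes, iters=10):
    """Align a set of landmark configurations by GPA.

    shapes: array of shape (n_shapes, n_landmarks, 2).
    Removes translation (centring) and scale (unit Frobenius norm),
    then iteratively rotates each configuration onto the mean shape
    using the SVD solution to the orthogonal Procrustes problem.
    """
    shapes = shapes - shapes.mean(axis=1, keepdims=True)      # centre
    shapes = shapes / np.linalg.norm(shapes, axis=(1, 2), keepdims=True)
    mean = shapes[0]                                          # initial reference
    for _ in range(iters):
        aligned = []
        for s in shapes:
            u, _, vt = np.linalg.svd(s.T @ mean)              # optimal rotation
            aligned.append(s @ (u @ vt))
        shapes = np.array(aligned)
        new_mean = shapes.mean(axis=0)
        mean = new_mean / np.linalg.norm(new_mean)            # renormalize mean
    return shapes
```

After alignment, two configurations that differ only in position, size, and orientation become (numerically) identical, so any remaining differences reflect shape alone, which is what the shape-dependent metrics above are meant to capture.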
Method
All data and analyses (including code for calculating facial metrics) can be found on the Open Science Framework (osf.io/5e3qp). Analyses were conducted using Python 3.6 and JupyterLab notebooks that detail the measurements and statistical analysis. We have also provided a tutorial notebook for automatic landmarking of faces, also available on the Open Science Framework.
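The core comparison in these analyses reduces to correlating a metric computed twice for the same faces, once per landmarking method. A toy sketch with invented values (the real analyses, on real image sets, are in the OSF notebooks):

```python
import numpy as np

# Hypothetical fWHR values for five faces, measured from manually and
# automatically placed landmarks (numbers invented for illustration).
manual_fwhr = np.array([1.82, 1.90, 1.75, 1.88, 1.79])
auto_fwhr = np.array([1.80, 1.93, 1.74, 1.86, 1.81])

# Pearson correlation between the two sets of measurements: a value
# near 1 indicates the automatic landmarks recover the manual metric.
r = np.corrcoef(manual_fwhr, auto_fwhr)[0, 1]
```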
Image sets
The first open access image set used in our study was the Face Research Lab London Set (DeBruine & Jones, 2017). This image set
Study Two - Testing for potential biases in automatic landmark placement
We have demonstrated that strong correlations emerge between commonly used facial metrics measured from manual and automatically placed landmarks. Aside from errors in automatic landmarking on certain faces, automatic placement appears to be accurate and capable of deriving metrics of interest. However, automatic landmark placement of the kind leveraged here is a critical step in face detection and recognition algorithms (Damer et al., 2019; Juhong & Pintavirooj, 2017; Köstinger, Wohlhart,
Discussion
The current studies used several independent image sets to investigate the correlations between four facial metrics commonly used in social perception research (sexual dimorphism, distinctiveness, bilateral asymmetry, and fWHR) when they were derived from manually and automatically placed landmarks, as well as estimating the degree of bias that may occur if these landmarking procedures are used on faces of different ethnicities.
Fig. 2 highlights the main finding that, across both image sets and
References (43)
- No evidence that facial attractiveness, femininity, averageness, or coloration are cues to susceptibility to infectious illnesses in a university sample of young adult women. Evolution and Human Behavior (2019).
- Fearless dominance mediates the relationship between the facial width-to-height ratio and willingness to cheat. Personality and Individual Differences (2014).
- The influence of shape and colour cue classes on facial health perception. Evolution and Human Behavior (2018).
- Averageness or symmetry: Which is more important for facial attractiveness? Acta Psychologica (2009).
- Telling facial metrics: Facial width is associated with testosterone levels in men. Evolution and Human Behavior (2013).
- Demographic effects on estimates of automatic face recognition performance. Image and Vision Computing (2012).
- Facial cues to depressive symptoms and their associated personality attributions. Psychiatry Research (2013).
- How effective are landmarks and their geometry for face recognition? Computer Vision and Image Understanding (2006).
- Social inferences from faces: Ambient images generate a three-dimensional model. Cognition (2013).
- Review on the effects of age, gender, and race demographics on automatic face recognition. The Visual Computer (2018).
- A deep facial landmarks detection with facial contour and facial components constraint.
- What's the difference between men and women? Evidence from facial measurement. Perception.
- Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? ArXiv:1912.07398 [Cs].
- Detecting face morphing attacks by analyzing the directed distances of facial landmarks shifts.
- A cross-cultural study of sex-typicality and averageness: Correlation between frontal and lateral measures of human faces. American Journal of Human Biology.
- Mitigating bias in gender, age and ethnicity classification: A multi-task convolution neural network approach.
- WebMorph.
- 3DSK face set with webmorph templates.
- Face research lab London set.
- Facial component-landmark detection. Face and Gesture.
- A systematic review of inter-ethnic variability in facial dimensions. Plastic and Reconstructive Surgery.
Cited by (8)
- Assessing the attractiveness of human face based on machine learning. Procedia Computer Science (2023).
- Is facial structure an honest cue to real-world dominance and fighting ability in men? A pre-registered direct replication of Třebický et al. (2013). Evolution and Human Behavior (2022). Citation excerpt: "Jones, Schild, and Jones (2020) indeed noted an outstanding r = 0.91 correlation between automatic and manual landmarking procedures, but, as it might be argued that this is not an entirely perfect correlation, we further provided our research assistants with standardised information via: (1) an in-person education session to each research assistant, which comprised information on the exact landmarking locations, which were then (2) repeated via email, and research assistants were then (3) provided with the previously mentioned UFC fighter facial landmarking order image seen in Fig. S2 in the Supplemental Material to further ensure standardisation, and (4) we then finally re-examined each of the facial images to ensure their precision. It is noted, however, that future studies, where feasible, are encouraged to follow automatic landmarking procedures employed by Jones et al. (2020). After landmarking procedures, we then used the Geomorph package in R (Adams & Otárola-Castillo, 2013) to import and analyse the landmark data."
- Is facial width-to-height ratio reliably associated with social inferences? Evolution and Human Behavior (2021). Citation excerpt: "The three sets of manually measured fWHRs were all highly intercorrelated with one another (rs from 0.92–0.96), so we took an average across the three manual measurements for each face. Additionally, we used a recently-published automated method to calculate fWHR (Jones et al., 2020). Because the automated procedure has only been validated on two sets of faces, we compared the fWHR scores obtained from the manual measurements to the scores provided by the automated procedure, so that we could further validate the automated method and examine any potential ethnicity bias in the automated procedure."
- Assessing the Roles of Symmetry, Prototypicality, and Sexual Dimorphism of Face Shape in Health Perceptions. Adaptive Human Behavior and Physiology (2024).
- Face templates for the Chicago Face Database. Behavior Research Methods (2023).
- DeepSmile: Anomaly Detection Software for Facial Movement Assessment. Diagnostics (2023).