Introduction

Our appraisal of the trustworthiness of strangers from their facial appearance is an important aspect of daily social life. A brief exposure to an unknown face is sufficient for us to evaluate its trustworthiness (Willis and Todorov, 2006). This trustworthiness evaluation involves the activation of the amygdala, a subcortical brain region that plays a crucial role in generating fear responses. For example, patients with bilateral amygdala lesions show impairments in discriminating between trustworthy- and untrustworthy-looking faces (Adolphs et al., 1998). Further, brain imaging studies report that amygdala activation increases linearly as the trustworthiness of faces decreases (Engell et al., 2007; Winston et al., 2002). Notably, the amygdala’s activity decreases with facial familiarity (Gobbini and Haxby, 2007) and shows no activation in response to one’s own face at either the subliminal or supraliminal level (Ota and Nakano, 2021a, 2021b). If amygdala activation is negatively correlated with trustworthiness, we may perceive faces that resemble our own as trustworthy. Correspondingly, many studies have demonstrated the impact of self-resemblance on inferences of trustworthiness using faces created by digitally morphing the self-face with other faces (Bailenson et al., 2008; DeBruine, 2002, 2005; Verosky and Todorov, 2010). For example, a composite face containing the self-face increased prosocial behavior in a trust game (DeBruine, 2002) and biased candidate preference in a political election (Bailenson et al., 2008). However, in reality, a stranger’s face never contains half of one’s own face. It is also possible that the mere exposure effect of the self-face (Zajonc, 1968) or the self-positivity bias (Lin et al., 2003) produces this effect. Therefore, it is necessary to investigate whether this effect is observed for natural, unfamiliar faces encountered in real life.

Composite face stimuli incorporating the self-face have been used in previous studies because of the difficulty of quantitatively assessing the similarity between the self-face and other faces. Facial similarity is affected by various factors, including the shape, size, and arrangement of facial parts, and it is very difficult to determine which factors contribute to facial similarity evaluation and to what extent. To address this problem, the current study used deep convolutional neural networks (DCNNs) for human face recognition to assess the similarity between the self-face and other faces. DCNNs have been extensively studied in many application fields, such as industrial process monitoring and fault detection (Huang et al., 2020), credit default prediction (Xu et al., 2021), and facial recognition (Baltrušaitis et al., 2018; Schroff et al., 2015). For face recognition in particular, DCNNs learn to minimize the distance between the feature vectors of the same person; in other words, the smaller the distance between two feature vectors, the more similar the faces are as a whole. A previous study compared couples’ facial similarity as estimated by human judgment and by a DCNN facial recognition algorithm and confirmed that the two yielded similar results (Tea-Makorn and Kosinski, 2020). Recently, researchers have obtained highly discriminative features for face recognition by incorporating a margin into the loss function (Deng et al., 2019; Wang et al., 2018), such as the additive angular margin loss (ArcFace) (Deng et al., 2019). By adding an angular margin to the loss during optimization, the DCNN learns to maximize the separation between different identities, enabling more accurate person identification. In this study, we used this state-of-the-art DCNN for face recognition to compute the face dissimilarity distance on a large dataset of natural faces.
To eliminate the effects of race and age on the appraisal of trustworthiness, we limited this study’s models and evaluators to individuals of the same race, age range, and social class and examined whether people whose faces resembled one’s own were perceived as more trustworthy than others.

To the best of our knowledge, this is the first study to demonstrate the relationship between facial similarity estimated from deep-learning feature vectors and trustworthiness as perceived by people. The results of our study have the potential to be used for a wide range of applications in online society. For example, the automatic estimation of feature vectors from faces could enable personalized suggestions and matching, such as finding partners in peer-to-peer (P2P) lending, member matching on social networking services (SNS), and creating more trustworthy avatars.

Methods

We created a facial dataset comprising the faces of 200 Japanese college students (100 male and 100 female students aged 19–24 years). Images of the full frontal face with a neutral expression, without eyeglasses or accessories, were taken using a smartphone camera. Subsequently, the face area was trimmed to fit a square shape. All photo images were 512 × 512 pixels in size and converted to gray scale, with the mean intensity adjusted to 128 and the standard deviation of intensity set to 50. Another group of 30 Japanese college students (15 male and 15 female students aged 19–24 years) participated in the study as evaluators. Their facial photographs were taken and processed in the same manner as those of the facial dataset. The chosen sample sizes of the face dataset and evaluators are similar to those in previous publications on the trustworthiness evaluation of human faces (Adolphs et al., 1998; Todorov et al., 2008).

The review board of Osaka University approved the experimental protocol (FBS30-4), and our procedures followed the guidelines outlined by the Declaration of Helsinki. All participants provided written informed consent prior to the experiment.

The evaluators participated in a behavioral experiment to rate the trustworthiness of the 200 faces in the dataset (Fig. 1A). For this purpose, they were positioned in front of a monitor display (EIZO, 24 inch, 1920 × 1080 pixels) at a distance of 65 cm. In each trial, following the presentation of a fixation cross for 1 s against a gray background, a facial photograph was presented for 0.5 s. Subsequently, the evaluators were asked to rate the face’s trustworthiness on a scale of 1 (very untrustworthy) to 7 (very trustworthy) using a universal serial bus numeric keypad. The input number was converted to a range from −3 (very untrustworthy) to +3 (very trustworthy). To appraise trustworthiness, the raters were instructed to imagine entrusting their money to the person whose face was presented. After the rating was entered, the next trial started 1.5 s later. Half of the raters first assessed the female faces and then the male faces; the remaining raters evaluated the facial images in the reverse order. Within each group, the facial images were presented in a random order. Stimulus presentation was controlled using MATLAB 2019a (Mathworks).

Fig. 1: Face similarity map depicting the distribution of face models and evaluators.

A Protocol of the face evaluation task. After fixation, a face photo was presented for 0.5 s, and participants were asked to rate its trustworthiness. B Calculation of a 512-dimensional embedding vector for each face using the pretrained Deep Convolutional Neural Network (ArcFace). C Face dissimilarity distance (L2 distance) matrix for all face model-evaluator pairs. D A t-distributed stochastic neighbor embedding (tSNE) map compressing the 512 dimensions of the embedding vectors of all participants into two dimensions.

Facial similarity was calculated using a pretrained Deep Convolutional Neural Network (DCNN) for face recognition supervised by the ArcFace loss (Deng et al., 2019). The pretrained ArcFace model, based on ResNet50 (He et al., 2016) and trained on the MS-Celeb-1M dataset (Guo et al., 2016), is available at https://github.com/mobilesec/arcface-tensorflowlite. This neural network computes a 512-dimensional embedding vector for each face. We computed the vectors of all 230 faces (the 200 faces in the dataset and the 30 raters), calculated the L2 distance between the vectors of each pair, and used it as the face dissimilarity distance. Photo images resized to 112 × 112 pixels were input to the model, and each embedding vector was L2 normalized. To visualize the high-dimensional data, we compressed them into two dimensions using t-distributed stochastic neighbor embedding (tSNE).
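The dissimilarity measure itself is simple once the embeddings exist: normalize each vector to unit length and take the Euclidean (L2) distance between pairs. The following minimal sketch illustrates this computation with random stand-in vectors (the actual ArcFace outputs would replace them); the vector names here are hypothetical.

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit length, as done for each embedding in the paper."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_distance(a, b):
    """Euclidean (L2) distance between two embeddings: the face dissimilarity distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 512-dimensional vectors standing in for real ArcFace embeddings.
random.seed(0)
emb_a = l2_normalize([random.gauss(0, 1) for _ in range(512)])
emb_b = l2_normalize([random.gauss(0, 1) for _ in range(512)])

dist = l2_distance(emb_a, emb_b)
# For unit vectors, the L2 distance necessarily lies in [0, 2].
assert 0.0 <= dist <= 2.0
```

Because the embeddings are unit-normalized, this L2 distance is monotonically related to the cosine (angular) distance that the ArcFace loss optimizes.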

The relationship between the face model’s trustworthiness and the face dissimilarity distance between the rater and model was analyzed using a linear regression model (y = beta*x + b, with the trustworthiness score as y and the face dissimilarity distance as x). Subsequently, the effect of sex on the slope was evaluated by analysis of covariance (ANCOVA). To examine whether the averaged face was evaluated as more trustworthy than the individual face models, we created an averaged face by averaging the embedding vectors within each same-sex group and calculated the dissimilarity distance between each face model and the averaged face. We then analyzed the correlation between each face model’s dissimilarity distance from the averaged face and its mean trustworthiness score. All statistical analyses were conducted using MATLAB 2019a (Mathworks).
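The regression step is an ordinary least-squares fit of trustworthiness on distance. As a minimal sketch (the data here are hypothetical placeholder pairs, not the study's measurements), the slope and intercept can be computed from the usual closed-form solution:

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = beta*x + b; returns (beta, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx          # slope
    b = my - beta * mx        # intercept
    return beta, b

# Hypothetical (dissimilarity distance, trustworthiness) pairs with a mild
# negative trend, standing in for the rater-model pairs in the study.
x = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
y = [0.5, 0.4, 0.1, 0.0, -0.2, -0.3]
beta, b = linear_fit(x, y)
assert beta < 0  # smaller distance (more self-resemblance) -> higher trustworthiness
```

A negative beta in this setup corresponds to the paper's finding that trustworthiness falls as the face dissimilarity distance grows.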

Results

Initially, we created a dataset of the frontal faces with emotionally neutral expressions of 200 (100 male and 100 female) Japanese college students (aged 19–24 years). A second group of 30 (15 male and 15 female) Japanese college students (19–24 years) rated the trustworthiness of the faces on a 7-point scale (Fig. 1A). The trustworthiness score was converted to a range from −3 (very untrustworthy) to +3 (very trustworthy) so that neutral was 0, for later analysis. The mean trustworthiness was 0.09 ± 0.82 for female faces and −0.02 ± 0.82 for male faces, and there was no significant difference between the two values (two-sample t-test, t198 = −0.98, p = 0.33). In addition, there was no significant difference in trustworthiness values between female and male raters (female faces: female rater 0.14 ± 0.39, male rater 0.03 ± 0.35, t28 = 0.8, p = 0.4; male faces: female rater 0.03 ± 0.3, male rater −0.08 ± 0.53, t28 = 0.73, p = 0.47).

Subsequently, to calculate the facial dissimilarity distance between the face dataset and the raters, we computed a 512-dimensional embedding vector for each face using the pretrained DCNN for face recognition (ArcFace (Deng et al., 2019), see Fig. 1B). We then calculated the L2 distance between the vectors of each pair. Figure 1C depicts the L2 distance matrix for all face model-rater pairs. It further indicates that the face dissimilarity distance between faces of the opposite sex is greater than that between faces of the same sex. A t-distributed stochastic neighbor embedding (tSNE) map, which visualizes high-dimensional data in a low-dimensional space, likewise revealed a clear separation by sex in the 512-dimensional embedding vectors (Fig. 1D). In addition, the raters’ faces were dispersed within the distribution of the same-sex face models. Therefore, we utilized this L2 distance as an index of the face dissimilarity distance between the rater and face model and analyzed the relationship between self-resemblance and trustworthiness ratings.

Next, we analyzed the relationship between the face dissimilarity distance and perceived trustworthiness using a linear regression model. A significant negative slope was detected (beta = −0.18, t5,996 = −2.34, p = 0.019, R2 = 0.001). We further examined the effect of sex on this negative correlation. As shown in Fig. 2, the face dissimilarity distance between the face model and evaluator was greater when the two were of the opposite sex (blue line) than when they were of the same sex (red line). For same-sex pairs, perceived trustworthiness increased markedly with a decrease in the face dissimilarity distance. By contrast, perceived trustworthiness did not change with the face dissimilarity distance for opposite-sex pairs. ANCOVA confirmed a significant difference in slope between the same-sex and opposite-sex conditions (F5,996 = 3.85, p = 0.049, η² = 0.001; same sex, beta = −0.36, t2,998 = −3.15, p = 0.0016, R2 = 0.003; opposite sex, beta = −0.04, t2,998 = −0.36, p = 0.72, R2 = 0.0004). In addition, there was no significant difference in slope between the male and female evaluators (F5,996 = 1.24, p = 0.27, η² = 0.000) or between the male and female models (F5,996 = 0.95, p = 0.33, η² = 0.000).

Fig. 2: Relationship between trustworthiness and face dissimilarity distance.

The red line represents the face dissimilarity distance between the evaluator and face model when both were of the same sex, and the blue line represents the distance between the two when they were of opposite sexes.

To examine how the performance of the DCNN used to estimate facial similarity affects the correlation between facial similarity and trust level, we calculated the facial dissimilarity distance using FaceNet512 (Schroff et al., 2015) (https://pypi.org/project/deepface/), which does not introduce a margin in the loss function. With this network, the facial dissimilarity distance was not significantly correlated with the perceived trust level (beta = 0.01, t5996 = 1.71, p = 0.09). We further examined the effect of center cropping of the dataset using the ArcFace network (https://github.com/peteryuX/arcface-tf2) and found that without center cropping, the significant correlation between facial similarity and trust level disappeared (with center cropping, beta = −0.4, t5996 = −2.40, p = 0.017; without center cropping, beta = −0.14, t5996 = −0.8, p = 0.42). These results suggest that, to examine the relationship between facial similarity and trust level, it is critical to use a neural network that accurately represents the relative relationships of facial similarity, for example by introducing a margin in the loss function and centrally cropping the dataset.

We further examined the effect of dataset size on the correlation between the face dissimilarity distance and perceived trustworthiness. From the original dataset consisting of 200 people, we randomly selected 50, 100, and 150 people 1000 times each, and calculated the correlation coefficient in each dataset. The results confirmed that when the number of people in the dataset exceeded 150, the 95% confidence interval of the correlation coefficient fell below 0 (from −0.06 to −0.005; see Supplementary Fig. 1).
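The subsampling procedure above can be sketched as repeated random draws of a subset of models, recomputing the correlation each time, and reading off a percentile confidence interval. The data below are synthetic placeholders with a weak negative trend, not the study's measurements, and the per-model pairing is a simplification of the full rater-by-model design:

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

random.seed(1)
# 200 hypothetical models: (mean distance, mean trustworthiness) with noise.
models = [(d, -0.2 * d + random.gauss(0, 0.5))
          for d in (random.uniform(0.8, 1.6) for _ in range(200))]

def ci_for_subset_size(k, n_resamples=1000):
    """Draw k models without replacement n_resamples times; 95% CI of r."""
    rs = []
    for _ in range(n_resamples):
        sub = random.sample(models, k)
        rs.append(pearson_r([d for d, _ in sub], [t for _, t in sub]))
    rs.sort()
    return rs[int(0.025 * n_resamples)], rs[int(0.975 * n_resamples)]

lo50, hi50 = ci_for_subset_size(50)
lo150, hi150 = ci_for_subset_size(150)
# Larger subsets yield tighter confidence intervals for the correlation.
assert (hi150 - lo150) < (hi50 - lo50)
```

As in the paper, the interval narrows as the subset grows; whether it excludes zero depends on the strength of the underlying effect.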

To examine the effect of the number of trust levels on the correlation between facial dissimilarity distance and trust level, we performed a sensitivity analysis. We merged the 7 levels (−3 to +3) of the trust scale into 3 levels ([−3, −2], [−1, 0, +1], [+2, +3]) or 5 levels (−3, [−2, −1], 0, [+1, +2], +3) and computed Pearson’s correlation coefficient r for each. We consistently observed a significant negative correlation for all scales (3 levels, r = −0.028, p = 0.028; 5 levels, r = −0.028, p = 0.03; 7 levels, r = −0.03, p = 0.019); however, the negative correlation was strongest for the 7-level scale (see Supplementary Fig. 2).
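The level-merging step can be expressed as a simple mapping from the original 7-point score to the coarser scales described above. This is a sketch of the groupings as stated in the text; the function name and the merged output codes (−1..+1 and −2..+2) are our own illustrative choices:

```python
def merge_levels(score, n_levels):
    """Collapse a 7-point trust score (-3..+3) into 3 or 5 merged levels,
    following the groupings described in the sensitivity analysis."""
    if n_levels == 7:
        return score
    if n_levels == 5:
        # (-3), (-2, -1), (0), (+1, +2), (+3)  ->  coded -2..+2
        return {-3: -2, -2: -1, -1: -1, 0: 0, 1: 1, 2: 1, 3: 2}[score]
    if n_levels == 3:
        # (-3, -2), (-1, 0, +1), (+2, +3)  ->  coded -1..+1
        return {-3: -1, -2: -1, -1: 0, 0: 0, 1: 0, 2: 1, 3: 1}[score]
    raise ValueError(f"unsupported number of levels: {n_levels}")

assert merge_levels(-3, 3) == -1
assert merge_levels(-2, 5) == -1
assert merge_levels(2, 3) == 1
```

The correlation analysis is then rerun on the merged scores exactly as on the original 7-point scores.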

One possibility is that this phenomenon occurred simply because the face with the smallest average distance from all evaluators was rated the most trustworthy. To examine this possibility, we averaged the embedding vectors across the same-sex face models for each sex and defined these vectors as the embedding vectors of the average faces (indicated by the green triangles and circles in Fig. 1B). Then, we calculated the face dissimilarity distance from the average face to each face model and analyzed the association between this distance and the trustworthiness score of each face model (Fig. 3). No significant correlation was found between trustworthiness scores and the distance from the average face for either female or male faces (female: r = −0.08, p = 0.44; male: r = 0.02, p = 0.82).
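Constructing the average face in embedding space amounts to taking the element-wise mean of the same-sex vectors and measuring each model's distance to it. A minimal sketch with tiny 3-dimensional toy vectors (standing in for the 512-dimensional ArcFace embeddings):

```python
import math

def mean_vector(vectors):
    """Element-wise mean of embeddings: the 'average face' in embedding space."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def l2_distance(a, b):
    """Euclidean (L2) distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-d embeddings standing in for same-sex face models.
faces = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
avg_face = mean_vector(faces)
dists = [l2_distance(f, avg_face) for f in faces]
# In this symmetric toy example, every face is equidistant from the average.
assert all(abs(d - dists[0]) < 1e-9 for d in dists)
```

In the actual analysis, each model's distance to the average face is correlated with its mean trustworthiness score; the paper found no significant correlation.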

Fig. 3: Relationship between the trustworthiness score and the face dissimilarity distance from the average face for each sex.

Datasets for A female and B male faces.

Discussion

By examining a large number of unaltered natural face images and measuring their objective facial similarity using a state-of-the-art DCNN, the current study demonstrated that people automatically assess the similarity of a stranger’s face to their own and perceive faces that resemble their own as more trustworthy than those that do not. However, this phenomenon was observed only when the individual and the stranger were of the same sex. Thus, our results suggest that, in real life, people flexibly change their social judgment strategies depending on the stranger’s sex.

Why does self-resemblance affect the perception of trustworthiness? Brain imaging studies report that the amygdala’s activity increases when a person views an untrustworthy-looking face (Engell et al., 2007; Winston et al., 2002). Moreover, patients with bilateral lesions of the amygdala evaluate every face as trustworthy (Adolphs et al., 1998). These findings suggest that when the amygdala is not activated, people perceive a face as trustworthy. Accordingly, the results of the current study imply that the amygdala is activated by faces that do not resemble our own, whereas it is suppressed by faces that do. This raises the question of why a face that resembles our own does not activate the amygdala. According to a previous study, the extent to which a person looks caring and attractive accounts for more than eighty percent of the variance in trustworthiness judgments (Todorov et al., 2008). On the other hand, the amygdala is activated when a person encounters fearful (Adolphs et al., 1994) or unattractive faces (Gobbini and Haxby, 2007; Ota and Nakano, 2021a). Based on these considerations, we speculate that a face resembling one’s own produces positive valence, such as feelings of security and attractiveness, which suppresses the amygdala’s activity and causes the face to be perceived as more trustworthy.

Several psychological accounts could explain how self-resemblance produces positive valence. First, repeated exposure to an object induces a positive affective response to it; this is known as the mere exposure effect (Zajonc, 1968). Since we frequently view our own face in mirrors and pictures, the mere exposure effect of the self-face may increase our familiarity with self-resembling faces and produce a positive bias toward them. However, in real life, we spend far more time looking at other people’s faces than at our own. Therefore, the average face, which balances the facial characteristics of many people, should benefit from a generalized mere exposure effect (Rhodes et al., 2005) and be perceived as trustworthy. The current study, however, revealed that similarity to the average face had no effect on trustworthiness ratings. Thus, the mere exposure effect cannot adequately explain the impact of self-resemblance on trustworthiness.

Another argument is that self-similarity by itself increases attractiveness, independent of familiarity. Byrne proposed this similarity-attraction theory based on his findings that people who have similar attitudes and beliefs are highly likely to be attracted to each other (Byrne and Nelson, 1965). We are also attracted to those who are similar to us in terms of physical characteristics, social class, and personality. Moreover, incidental similarities, such as sharing a birth date, a first name, or similar fingerprints, increase prosocial behaviors (Burger et al., 2004). According to Byrne’s theory, we prefer people who are similar to us because we expect them to have similar thoughts and values and because we feel secure in their company. Since a sense of security is a particularly important consideration in trustworthiness assessments, our preference for people who resemble us may increase our trust in them.

In the field of evolutionary psychology, similarity-based preferences are explained by Hamilton’s theory of inclusive fitness (Hamilton, 1964). Hamilton proposed that insects and animals exhibit numerous prosocial behaviors toward their kin to increase the survival chances of genes similar to their own. The theory also applies to human beings, who can detect kin using olfactory cues (Porter and Moore, 1981) and treat their kin preferentially in crisis situations (Shavit et al., 1994). Earlier studies using composite faces of the self-face with another face reported that self-resemblance promotes parental investment (DeBruine, 2004; Platek et al., 2002) and altruistic behavior in human beings (DeBruine, 2002), and they proposed that human beings use facial resemblance as a kinship cue. Since one-half of the composite face comprised the self-face, the composite probably closely resembled a kin’s face. However, real-world human communities are so large that the likelihood of a stranger being one’s kin is overwhelmingly small. If we decided whom to trust based solely on kinship, we would trust only very few people, because most of the people we meet are not our kin. In this respect, the current study presented many strangers’ faces of the same generation to raters, and the raters evaluated approximately half of these strangers as trustworthy. Therefore, it is unlikely that they judged trustworthiness based on whether the faces belonged to their kin.

Finally, a positive association between self-resemblance and trustworthiness was observed only for faces of the same sex. This may be because a different psychological process is involved in the trustworthiness assessment of opposite-sex faces. Regarding mate selection, Winch proposed the complementary needs theory, according to which people tend to prefer a mate whose needs are opposite and complementary to their own (Winch et al., 1954). This theory may also apply to face preferences for the opposite sex. For instance, women’s masculinity preferences are stronger in cultures where poor health is particularly harmful to survival (DeBruine et al., 2010). Therefore, we suppose that the antagonism between self-similarity and self-dissimilarity preferences diminishes the impact of self-resemblance on trustworthiness ratings for faces of the opposite sex. Irrespective of whether this inference is true, the current study provides new insight into how people flexibly switch strategies in their social judgment of a stranger depending on whether he or she is of the same or the opposite sex.

In this study, we limited the dataset to students aged 19–24 years to eliminate factors such as race, age, and social status that might influence judgments of trustworthiness. As a result, we successfully demonstrated a significant correlation between facial similarity and trust level in a dataset of 200 people. However, because of the limited size of the dataset, it is unclear whether the association between facial similarity and trust level is a general phenomenon across ages. Further investigation with a larger dataset that includes a wide range of age groups is required.

Conclusions

Using facial similarity estimated by state-of-the-art deep learning, this study demonstrated that people tend to trust faces that are similar to their own, with particularly strong correlations for judgments of same-sex faces. We also demonstrated the importance of carefully selecting a neural network that accurately represents the relative facial similarity between persons. If a person’s perceived trustworthiness can be predicted from facial similarity using artificial intelligence, this may enable a wide range of social applications, such as online P2P lending and the creation of trustworthy avatars.