Introduction

Over the last three decades, databases have proven their value in several domains, and psycholinguistic norms have been collected for a variety of stimuli: words (e.g., the CELEX Database of Baayen et al., 1993; Bird et al., 2001; Coltheart, 1981; Cortese & Fugett, 2004; Gilhooly & Logie, 1980a, 1980b; Morrison et al., 1997; Paivio et al., 1968; Spreen & Schulz, 1966; Stuart et al., 2003); pictures and objects (Alario & Ferrand, 1999; Carroll & White, 1973; Cycowicz et al., 1997; Masterson & Druks, 1998; Szekely et al., 2004); drawings (Snodgrass & Vanderwart, 1980; Vitkovitch & Tyrrell, 1995 but see Alario et al., 2004, for a review of object naming studies); and actions (Bonin et al., 2004; Cuetos & Alija, 2003; Fiez & Tranel, 1997; Masterson & Druks, 1998; Schwitter et al., 2004; Szekely et al., 2005).

Recently, databases of faces have emerged because many researchers use photographs of persons as stimuli to study diverse psychological phenomena: emotion expression identification (Ebner & Johnson, 2009); stereotyping and prejudice (Blair et al., 2004; Livingston & Brewer, 2002); recognition of faces partially occluded with masks (Carbon, 2020; Freud et al., 2020); or interpersonal attraction (Cloutier et al., 2008; Graziano et al., 1993). Some of the databases provide high-resolution standardized pictures of male and female faces, with different ages, ethnicity, and facial emotional expressions (e.g., 10k US Adult Faces Database - Bainbridge et al., 2013; The Chicago Face Database - Ma et al., 2015; FACES Database – Ebner et al., 2010; KDEF Database – Goeleven et al., 2008). These databases can also include physical facial measures (e.g., face size, lip thickness) and subjective attributes (e.g., attractiveness, trustworthiness). Other databases consist of computer-generated faces with several traits also characterized (e.g., competence, dominance, or threat).

Also, the number of databases with pictures of celebrities is steadily increasing, and this growth spans several countries (France - Bonin et al., 2008; Spain - Marful et al., 2018; Italy - Bizzozero et al., 2007; Rizzo et al., 2002; England - Smith-Spark et al., 2006). This type of stimulus is essential because it allows us to study familiarity using human faces. When we think about familiar faces, we usually refer to our relatives, friends, or co-workers, but using the faces of each participant’s family and friends would require creating a unique set of stimuli every time the experiment is run, which would be impractical. Therefore, since highly familiar persons differ from one individual to the next, celebrities’ pictures are widely used as stimuli to emulate such a cohort of familiar persons.

Celebrities’ pictures have been widely applied as stimuli in a variety of areas, including forensic psychology (Greene & Fraser, 2002), neuroscience (Ishai et al., 2002), and cognitive psychology (Cleary & Specker, 2007). For example, these stimuli have been used to understand which facial characteristics are essential for facial identification (e.g., the presence or absence of eyebrows; Sadr et al., 2003).

In the human memory field, this type of stimulus has been used for decades (e.g., Greene & Hodges, 1996). More recently, celebrities’ pictures were used in destination memory procedures, a research line investigating the capacity to monitor to whom a person delivers a piece of information (Gopie et al., 2010; Gopie & MacLeod, 2009). Celebrities stand in for the people we know and to whom we frequently transmit information.

Several models were developed to understand face processing. Those that have gained broad acceptance (Burton et al., 1990; Valentine et al., 1996) hold that seeing a familiar face, such as a celebrity photo, activates a previously stored representation at the face recognition unit (FRU) level. This level contains representations of the attributes of familiar faces. These representations are stored independently of perceptual features that could hamper recognition, such as face position, lighting, or angle.

According to these models, four sets of units work cumulatively to allow face recognition: face recognition units (FRUs), person identification nodes (PINs), semantic information units (SIUs), and lexical input. These models are based on everyday face recognition, in which the first stage is face familiarity. This happens at the PIN level: if a PIN’s activation exceeds a given threshold, a face can be recognized without the ability to name it correctly. Afterwards, the semantic information associated with the face becomes available (e.g., a person’s occupation); however, the name has still not been retrieved. According to these models, general semantic details are easier to access than people’s names, making this general information available earlier in the processing stream. This moment in the processing may give rise to a tip-of-the-tongue (TOT) state for a person’s name, in which people have access only to information related to the target word they are trying to retrieve (e.g., Schwartz & Metcalfe, 2011). In the final stage, retrieval of the person’s name may finally occur.

When using faces of celebrities as stimuli, it is important to ensure that the photographs presented to the participants are recognizable. More importantly, participants should be able to name them adequately, to ensure that familiarity with the celebrities has been attained. Although naming a face is more difficult than recognizing it, Brédart (1993) demonstrated a positive relationship between celebrity names’ rated familiarity and naming accuracy. So, even though face naming is more challenging for participants, a naming task should be included to ensure celebrity familiarity.

In addition to the development of models that explain face recognition and face naming, some variables that influence these processes have also been identified: familiarity, distinctiveness, and age-of-acquisition. Familiar faces are recognized more accurately than unfamiliar faces (Klatzky & Forrest, 1984), and distinctive faces are better recognized than less distinctive or more typical ones (Knapp et al., 2006). Also observed was an accuracy advantage in face naming for early acquired over late-acquired famous faces (Smith-Spark & Moore, 2009).

With celebrities’ pictures serving as stimuli in such a wide variety of studies, the collection of normative data to characterize familiar faces is fundamental. However, the selection of famous people is constrained by the geographic and socio-cultural contexts in which the studies are conducted. Although some celebrities are indeed universally known, most are famous only in particular contexts. For example, only 2% of the famous people present in the British norms (Smith-Spark et al., 2006) later appeared in the French norms (Bonin et al., 2008). This justifies the contribution that this work can make to experimental studies with celebrities conducted in Portugal. With that objective in mind, data concerning the face recognition, naming, familiarity, facial distinctiveness, and age-of-acquisition (AoA) of 160 celebrities, Portuguese and international, male and female, were gathered from a sample of Portuguese young adults aged between 18 and 25 years.

The data collection took place in two different studies. In study 1, participants were asked to recognize and name celebrity faces (e.g., Cristiano Ronaldo). In study 2, celebrity names were rated for AoA, familiarity, and distinctiveness. Also, possible relationships between these variables were analyzed and presented.

Study 1

The first phase in the construction of the celebrity database involved selecting famous people as stimuli. With that in mind, the first step was to determine which categories of fame (e.g., actors) would be used. The authors chose a diversity of categories to encompass as many fields of fame as possible, an approach also used in previous celebrity databases (Bonin et al., 2008; Smith-Spark et al., 2006). Ten categories were identified, both for national and international celebrities: actors, comedians, football coaches, sports players, athletes, TV hosts, musicians/singers, politicians, influential personalities, and royalty members. After searching the categories on the Internet and considering the celebrity names that appeared most frequently in 2019, 160 celebrities were obtained. In study 1, the recognition and naming rates were collected for those 160 celebrities.

Method

Participants

This study involved 379 participants. Since the main aim of these norms is their use in studies with young adults, we defined age as an inclusion criterion, with a range of 18 to 25 years. However, as the experiment was carried out online, a wide range of participants’ ages was obtained (18 to 62). Of the 379 participants recruited, we excluded 173 who were over 25 years old. The final sample comprised 206 participants (48 males, 156 females, and two others), aged between 18 and 25 years (M = 22.29; SD = 2.11). The selected participants were randomly assigned to four groups, with each one recognizing and naming a distinct group of famous persons (see Table 1).

Table 1 Demographic characteristics of the groups of participants

Design

We applied a 2 × 2 within-subjects design, in which each participant was required to recognize and name celebrities’ faces from all levels of both factors: background (International and Portuguese) and sex (male and female). To avoid the influence of fatigue, each participant answered only one of the four lists, each containing 40 of the 160 faces. While performing the task, the participant gave two different answers: an identification decision, in which the participant simply indicated whether they recognized the face presented; and a naming response, in which the participant wrote the name of the celebrity shown. Our dependent variables are the correct recognition rate and naming rate of the celebrities.

Materials

The final set of pictures included in the dataset was researched and culled from various Internet resources. All were frontal pictures, measured 9 cm², and were converted into black and white (when necessary) using Adobe Photoshop CC (Adobe Systems Incorporated, 2014). Given that context affects face recognition (Deffler et al., 2015) and that our objective was to assess the recognition of faces regardless of context, the faces were presented against a background without any distinguishable features.

A total of 160 photos were organized according to two orthogonal categories: Portuguese or International; male and female. All photos are available at https://osf.io/rvc62/?view_only=1f1dd2371d7d4b548c2583e875dcc093.

Lastly, as mentioned before, to reduce the total time that the participants spent on the online response procedure, four sets of 40 photos each were created. Each set contained ten International female celebrities, ten International male, ten Portuguese female, and ten Portuguese male celebrities.
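The balanced construction of the four sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors actually followed: the stimulus identifiers are hypothetical placeholders, the function name `build_sets` is invented here, and dealing each shuffled cell round-robin is simply one way of obtaining ten faces per background-by-sex cell in every set.

```python
import random
from collections import Counter

BACKGROUNDS = ("International", "Portuguese")
SEXES = ("female", "male")


def build_sets(celebrities, n_sets=4, seed=0):
    """Split stimuli into n_sets lists balanced on background x sex."""
    rng = random.Random(seed)
    sets = [[] for _ in range(n_sets)]
    for background in BACKGROUNDS:
        for sex in SEXES:
            cell = [c for c in celebrities
                    if c["background"] == background and c["sex"] == sex]
            rng.shuffle(cell)
            # Deal the shuffled cell round-robin so each set gets an equal share.
            for i, celeb in enumerate(cell):
                sets[i % n_sets].append(celeb)
    return sets


# Hypothetical placeholder stimuli: 40 per background x sex cell, 160 in total.
celebrities = [
    {"id": f"{b[:3]}-{s[0]}-{i:02d}", "background": b, "sex": s}
    for b in BACKGROUNDS for s in SEXES for i in range(40)
]

sets = build_sets(celebrities)
for photo_set in sets:
    counts = Counter((c["background"], c["sex"]) for c in photo_set)
    print(len(photo_set), dict(counts))
```

Each of the four resulting lists holds 40 stimuli, ten from each of the four cells, and no stimulus appears in more than one list.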

Procedure

Online or web surveys have been increasingly used in psychological research due to their advantages, such as low cost (see Couper, 2000 for a review). We created an online questionnaire in Qualtrics software (Qualtrics, 2015) since it allows direct access to many participants. The link to the questionnaire was placed on the social network Facebook, and an invitation with the questionnaire link was sent via e-mail to all of the students attending a university in the north of Portugal.

At the beginning of the experiment, the informed consent and the questionnaire’s instructions were presented to the participants. Afterwards, participants had to respond to two questions regarding each of the faces. The first was a yes/no question asking whether the participant recognized the presented face. The second was a text entry box in which the participant was asked to write the name of the person presented. Participants could also leave the box blank and skip to the next stimulus if they recognized the celebrity but could not name them.

Each participant was randomly assigned to one of the four manipulated sets of photos and rated 40 out of 160 images. The 40 pictures of each set were presented in random order.

Analyses

To make sure that participants were engaged in the task, we excluded all participants who did not recognize at least 50% of the Portuguese celebrities presented (i.e., ten of the 20 faces shown) and 50% of the International celebrities (ten of 20 faces); 21 participants were excluded on this basis.

Answers were considered wrong when the participants wrote the character’s name and not the actor’s name, for example, when Johnny Depp was addressed as Jack Sparrow (his character from the movie “Pirates of the Caribbean”).
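The engagement criterion and the naming rule described above can be illustrated with a short Python sketch. Several details are assumptions made for the example only: the response records are hypothetical, a participant is treated as engaged only when the 50% threshold is met for both backgrounds, and names are compared by a case-insensitive exact match, which the text does not specify (real scoring would likely need to tolerate spelling variants).

```python
def recognition_share(responses, background):
    """Proportion of 'yes' recognition answers for one background."""
    shown = [r for r in responses if r["background"] == background]
    return sum(r["recognized"] for r in shown) / len(shown)


def is_engaged(responses, threshold=0.5):
    """Assumed rule: the threshold must be met for both backgrounds."""
    return all(
        recognition_share(responses, background) >= threshold
        for background in ("Portuguese", "International")
    )


def naming_is_correct(typed, actor_name):
    """Assumed matching rule: exact match, ignoring case and outer spaces."""
    return typed.strip().casefold() == actor_name.strip().casefold()


# Hypothetical participant: 12 of 20 Portuguese and 9 of 20 International
# faces recognized, so the International share (45%) misses the threshold.
responses = (
    [{"background": "Portuguese", "recognized": i < 12} for i in range(20)]
    + [{"background": "International", "recognized": i < 9} for i in range(20)]
)

print(is_engaged(responses))                              # False
print(naming_is_correct("Jack Sparrow", "Johnny Depp"))   # False
print(naming_is_correct(" johnny depp ", "Johnny Depp"))  # True
```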

Results and discussion

Study 1 aimed to provide norms regarding face recognition and face naming of a set of 160 celebrities (both Portuguese and International), supplying rates of recognition and naming tasks for each face. Descriptive statistics corresponding to the recognition rate and naming rate are presented in Table 2.

Table 2 Descriptive statistics for recognition and naming rate

The celebrity with the lowest recognition rate was recognized by 39.22% of the participants, and the celebrity with the lowest naming rate was named by 21.57%. Some celebrities reached 100% in both the recognition and naming rates. Also, as expected, the faces’ average naming rate was lower than the average recognition rate. This result is in line with the face processing models (Burton et al., 1990; Valentine et al., 1996), which establish that face naming is more difficult than recognition.

Regarding recognition rate, 142 of the presented faces (N = 160) had a recognition rate between 80% and 100%, 13 between 60% and 80%, four between 40% and 60%, and only one face had a recognition rate between 20% and 40%. Considering naming rate, 80 of the presented faces had a naming rate between 80% and 100%, 55 between 60% and 80%, 19 between 40% and 60%, and only six faces had a naming rate between 20% and 40%. The face naming rate and the face recognition rate by background (i.e., International or National) and sex are available in Table 3.

Table 3 Number of faces as a function of recognition rate by background and sex

Table 3 shows that 142 faces had a recognition rate between 80% and 100%, but only 80 faces had a naming rate between 80% and 100%, which emphasizes the higher level of difficulty in the naming task. The large number of faces accurately identified (i.e., with a recognition rate of at least 80%) makes this database an important asset for paradigms that intend to use well-known faces.

Study 2

As discussed previously, it is essential to control for several variables known to influence celebrity recognition. To this end, each famous person presented in study 1 was also rated for AoA, familiarity, and distinctiveness in study 2. These variables were selected based on previous research and celebrities’ face norms (Bonin et al., 2008; Smith-Spark et al., 2006).

Method

Participants

This study involved 180 participants (37 males, 143 females; M = 22.68; SD = 2.42). As in study 1, participants above 25 or below 18 years were excluded (i.e., 37 participants). Participants were randomly assigned to one of four groups, with each group rating 40 different celebrities. Participants were instructed to rate the familiarity, facial distinctiveness, and AoA of each celebrity name presented. The demographic characteristics of the four groups are presented in Table 4.

Table 4 Demographic characteristics of the participants

Design

We applied a within-subjects design, in which the independent variables were the same as the ones used in study 1. Each participant was presented with both Portuguese and International, as well as male and female, celebrities. However, unlike study 1, in which we evaluated the recognition and naming of celebrity faces, each participant answered three new measures previously considered in other celebrity face databases (Bonin et al., 2008; Marful et al., 2018; Smith-Spark et al., 2006). Familiarity, expressing the number of times the participants had heard about the celebrity throughout their life, was measured using a Likert scale ranging from 1 (never) to 7 (more than once every day). Facial distinctiveness, evaluating whether the celebrity’s facial features were easy or hard to recognize, was measured using a Likert scale ranging from 1 (typical, hard-to-spot face) to 7 (distinctive, easy-to-spot face). Finally, AoA was measured by asking participants to write, in a text entry box, the age in years at which they believed they had first become aware of each famous person.

Materials

The celebrities presented in study 2 were the same as the ones shown in study 1. However, instead of photos, a celebrity name was presented in each trial. As in study 1, to reduce the total time that the participants spent in their participation, four sets of 40 names were created. The distribution of the sets was the same as performed for study 1.

Procedure

The COVID-19 pandemic has made the use of online/web surveys even more pertinent: they offer an alternative in which social contact is limited, in addition to the advantages mentioned before, such as the low cost of applying these research procedures (for a review see Couper, 2000). The online questionnaire was created in the Qualtrics software (Qualtrics, 2015). Participants were recruited by sharing the questionnaire link on social networks and by e-mail with all the students attending a university in the north of Portugal.

At the beginning of the study, the informed consent and the questionnaire’s instructions were presented to the participants. In each of the trials, participants were exposed to a celebrity name and had to answer three different questions: how many times in their lifetime they had heard, seen, read about, or otherwise been reminded of each of the celebrities (familiarity); how easy each celebrity would be to recognize from just his or her facial features (distinctiveness); and how old they were when they had first become aware of each famous person (AoA). When participants did not know the celebrity, they were instructed to leave the AoA question blank or merely write a “0” to signal that they did not recognize the celebrity’s name. Each participant was randomly assigned to one of the four lists of 40 celebrities’ names.
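The convention for unknown celebrities has a practical consequence when AoA answers are aggregated: a blank or "0" entry is a signal, not an age. The sketch below shows one way to handle this in Python. It assumes that such entries are treated as missing and left out of the mean, which the text does not state explicitly; the function names and the sample answers are hypothetical.

```python
def parse_aoa(raw):
    """Return the age in years, or None when the answer flags an unknown name."""
    text = raw.strip()
    if text in ("", "0"):
        return None  # blank or "0" means the celebrity was not recognized
    return float(text)


def mean_aoa(raw_answers):
    """Mean AoA over valid answers only (assumption: missing ones are dropped)."""
    ages = [age for age in map(parse_aoa, raw_answers) if age is not None]
    return sum(ages) / len(ages) if ages else None


# Hypothetical answers from four participants for one celebrity name.
print(mean_aoa(["12", "", "0", "14"]))  # 13.0
```

Counting the "0" entries as real ages would pull the mean AoA downward, so the choice of how to treat them should be reported alongside the norms.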

Results and discussion

Descriptive statistics regarding familiarity, facial distinctiveness, and AoA are presented in Table 5. In our study, the level of familiarity (M = 2.84) was slightly lower than in other celebrity databases (Smith-Spark et al., 2006: M = 3.92; Bonin et al., 2008: M = 2.98; Marful et al., 2018: M = 3.16). On the other hand, the facial distinctiveness mean (M = 5.17) was higher than in other similar databases (Smith-Spark et al., 2006: M = 3.67; Marful et al., 2018: M = 4.89). Nevertheless, our results are very similar to those presented by Bonin et al. (2008), who obtained a mean of 5.41 in their study.

Table 5 Descriptive statistics for familiarity, facial distinctiveness, and AoA

Regarding the AoA mean (M = 13.27), we also compared our values with other celebrity databases. Although our data collection procedure differed from the one used by Bonin et al. (2008), while matching the one used by Marful et al. (2018), our mean was similar to those reported in the other databases, where the AoA means were around 12 to 13 years old.

It is also interesting to note that most faces were judged as distinctive (i.e., half of the celebrities had a facial distinctiveness rate above 5.30 on a Likert scale from 1 to 7). The mean ratings (i.e., recognition, naming, familiarity, distinctiveness, and AoA) for each celebrity are presented in Table 7.

To detect possible relationships between our dependent variables, we ran bivariate Pearson correlations considering recognition rate, naming rate, familiarity, facial distinctiveness, and AoA. The software used for the data analysis was JASP 0.11.1 (JASP Team, 2019). Table 6 shows a comparison of the results obtained in our study with the results of the same correlations obtained in other celebrities’ databases (Bonin et al., 2008; Marful et al., 2018; Smith-Spark et al., 2006).

Table 6 Bivariate (Pearson) correlations: Comparison with other celebrity databases
Table 7 Recognition rate, naming rate, familiarity, facial distinctiveness and AoA of 160 celebrities

In our study, we found that the recognition rate was positively and significantly correlated with familiarity, r(158) = .39, p < .001, naming rate, r(158) = .72, p < .001, and facial distinctiveness, r(158) = .54, p < .001. Recognition rates were higher for more familiar and more distinctive faces. Additionally, participants who were capable of naming a celebrity answered “yes” in the recognition task. This result was expected, since participants first needed to recognize a celebrity in order to name them. Furthermore, a negative correlation was observed between recognition rate and AoA, r(158) = −.16, p = .05. This last result reveals that celebrities acquired early in life were better recognized than those acquired later in life.

Again, and like other celebrity databases (Bonin et al., 2008; Marful et al., 2018; Smith-Spark et al., 2006), we did not find a significant correlation between familiarity and AoA, but found that familiarity (known as subjective frequency in some studies) was positively correlated with facial distinctiveness, r(158) = .78, p < .001, and naming rate, r(158) = .62, p < .001. Celebrities who were more frequently encountered were perceived as more distinctive and were more easily named. We also found that naming rate was positively correlated with facial distinctiveness, r(158) = .78, p < .001, and negatively with AoA, r(158) = −.31, p < .001, which is consistent with what was observed in the other databases mentioned. Celebrities acquired early in life and distinctive celebrities were more easily named. Lastly, a significant negative correlation was found between AoA and facial distinctiveness, r(158) = −.311, p < .001. Famous people acquired early in life were considered more facially distinctive.
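For readers who want to reproduce this kind of analysis on the item-level norms, the following Python sketch computes a Pearson correlation and the associated t statistic. It is a generic illustration, not the JASP procedure used in the study; the function names are invented here and the three-point data set is a toy example. The degrees of freedom are n − 2, which is why correlations over 160 celebrities are reported as r(158).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


# Toy data with a perfect linear relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0

# A correlation of .72 over 160 items is tested on 158 degrees of freedom.
t_value, df = t_from_r(0.72, 160)
print(df)  # 158
```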

Two multiple regression analyses were also carried out to determine which variables predict the recognition and naming rates. First, a multiple regression was run with face recognition rate as the dependent variable and familiarity, AoA, and facial distinctiveness as predictors. Second, we ran a multiple regression with the same predictors but with face naming rate as the dependent variable. The first regression showed a significant overall effect of familiarity, AoA, and facial distinctiveness on the recognition rate, F(3, 156) = 21.89, p < .001, R² = .30, meaning that the three predictors, taken together, are related to the face recognition rate. However, upon further examination, only facial distinctiveness, t(156) = 5.51, p < .001, was a significant predictor in the model, meaning that facial distinctiveness is the only one of the three variables that predicts the face recognition rate.

In the second multiple regression analysis, we also found a significant overall effect of familiarity, AoA, and facial distinctiveness on the naming rate, F(3, 156) = 80.33, p < .001, R² = .61. Upon further examination, only facial distinctiveness, t(156) = 8.21, p < .001, was a significant predictor in the model.
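The overall F test of a multiple regression is tied to the coefficient of determination, which makes the reported values easy to sanity-check. The Python sketch below applies the standard identity F = (R-squared / k) / ((1 − R-squared) / (n − k − 1)), with k = 3 predictors and n = 160 celebrities, so the denominator has 156 degrees of freedom. The function name is invented for this example, and the inputs are the two-decimal R-squared values printed above; because those are rounded, the results are close to, but not identical with, the reported F values of 21.89 and 80.33.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic of a regression with k predictors and n cases."""
    df_model = k
    df_error = n - k - 1
    return (r2 / df_model) / ((1.0 - r2) / df_error)


# Recognition rate model: rounded R-squared of .30.
print(f"{f_from_r2(0.30, 160, 3):.2f}")  # 22.29

# Naming rate model: rounded R-squared of .61.
print(f"{f_from_r2(0.61, 160, 3):.2f}")  # 81.33
```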

General discussion

The data reported in this article will allow researchers to select stimuli that are highly recognizable to Portuguese young adults. They will also make it easier to match the listed stimuli on five important attributes: the database provides ratings of AoA, familiarity, facial distinctiveness, recognition rate, and naming rate for each celebrity.

These norms can be expected to stand as a valuable tool for different research areas, since celebrities’ pictures have been widely applied as stimuli in a variety of areas, including human memory (Gopie & MacLeod, 2009). In an applied context, this database can be used in tests for the assessment of patients with traumatic brain injury, aphasias, amnesia, and/or dementia. Tests of famous face naming are particularly useful in the early detection of some diseases, since these patients typically present sensitivity for names at the onset of the disease (i.e., have greater difficulty in the naming of familiar faces; Semenza et al., 2000). However, before tests are performed with different age groups and/or psychiatric populations, this set of face images should first be validated for them (Rizzo et al., 2002).

Although this study adopts an extensive set of celebrities from a wide range of categories (e.g., actors, comedians, athletes, politicians, among others), some limitations are inherent to face databases built on celebrities. As mentioned before, the first limitation is also one of the motivations for this study: celebrity norms are highly constrained by geographic context (Marful et al., 2018) because some celebrities are famous only in particular countries. Knowledge and fame of celebrities vary between countries, which on the one hand justifies the need for databases appropriate to each population, but on the other hand limits the use of this database to studies with Portuguese participants. This database will therefore support studies conducted in Portugal, since it was validated for use with this population. However, the international celebrity pictures used in this study (and made available through the OSF) could later be validated for other nations, giving the current database the potential for universal application, as long as the validation is carried out thoroughly.

Nevertheless, as stated before, and despite the relevance of familiar faces for psychological research, there are no normative studies for celebrities in Portugal, which explains the relevance and importance of this database. It is also important to underline that this database should be used with young adults, since our sample consisted of participants aged 18 to 25 years. However, this limitation does not weaken the database, as most studies are conducted at universities and, therefore, with young adults. In future research, collecting data from older adults would allow the database to be applied across age groups.

Nevertheless, this study provides a validated database of celebrity faces for use in the Portuguese population and marks an important first step in standardizing procedures that use this type of stimulus. Lastly, it provides a set of pictures to be used in future studies that aim to add those crucial measures to the data provided in this study.