Evolutionary theory has generated numerous productive research programs in human psychology (Buss, 2015), including the study of individual differences (Buss & Penke, 2015). Most research in evolutionary personality psychology has taken, as its starting point, the well-established Five Factor Model (FFM; McCrae & John, 1992) or HEXACO (Lee & Ashton, 2004) structure. More recent proposals (Lukaszewski et al., 2020) have called for focusing on domain-specific psychological adaptations (e.g., anger, shame, jealousy), leaving as an open question the variable patterns of covariation among narrow response tendencies that comprise the broad dimensions typically described as personality traits. Other evolutionarily oriented researchers have gone in the opposite direction, exploring the adaptive significance and explanatory power of higher-order personality dimensions (Figueredo et al., 2007; Rushton et al., 2008). In previous research, we have drawn upon all of these evolutionary approaches (Chua et al., 2016; Lukaszewski, 2015; Manson, 2015). Here, we test adaptationist hypotheses linking early life experiences to the general factor of personality (GFP) and to metatrait alpha.

The GFP construct can be traced back to the work of Webb (1915). A general personality factor, based on shared variance among the dimensions of the FFM/Big Five, was explored empirically by Figueredo et al. (2004). Based on factor analysis of three samples, Musek (2007) argued that the GFP is analogous to the general factor of intelligence (g), and that it reflects a psychobiological disposition that manifests as high positive affect, satisfaction with life, and self-esteem, and low levels of negative affect. Twenty-first century research on the GFP construct built on earlier work (Digman, 1997) that suggested the existence of two higher-order personality factors, Stability or metatrait alpha (with positive loadings of agreeableness, conscientiousness, and emotional stability) and Plasticity or metatrait beta (with positive loadings of extraversion and openness). As described in more detail below, evolutionarily informed research on the GFP focused on its potential as an indicator of human life history strategy (LHS) variation. However, debate continues regarding the validity and interpretation of the GFP construct. The goal of the present study is to contribute to this debate in two important ways. First, we test the validity of the GFP using two approaches to measure personality traits, self-report and other-ratings by strangers from video “thin slices” (Ambady & Rosenthal, 1992). Second, we examine whether the GFP can be interpreted as social effectiveness in the service of a slow LHS, by testing whether development of the GFP (and its components) is facultatively calibrated to childhood environmental harshness.

What Does the General Factor of Personality Mean?

Interpreted substantively from an evolutionary perspective, the GFP has been described as a product of natural selection for the ability to navigate the unique ultra-social environment of our species (Figueredo et al., 2004; Musek, 2017; Rushton et al., 2008; van der Linden et al., 2016). With respect to the Big Five, the central claim is that individuals higher in extraversion, openness, agreeableness, conscientiousness, and emotional stability are more socially effective (Dunkel & van der Linden, 2014; van der Linden et al., 2016), and therefore more desirable as mates, friends, and leaders. Supporting this hypothesis, research has found associations between the GFP and criterion outcomes including the popularity and likeability of adolescents (van der Linden et al., 2010), several aspects of job performance (Pelt et al., 2017), leadership outcomes (Do & Minbashian, 2020; Wu et al., 2020), higher relationship quality as reflected in daily social experience (Pelt et al., 2020) and (negatively) criminal behavior (Watters et al., 2020). Even among the forager-horticulturalist Tsimane’ people, a GFP extracted from self- and spouse-ratings was positively associated with interviewer-rated social engagement (van der Linden et al., 2018). Furthermore, the GFP overlaps substantially with trait emotional intelligence (EI), and moderately with ability EI (van der Linden et al., 2017). Taken together, these results support the hypothesis that the GFP taps a capacity that is central to human life.

If this view is correct, what accounts for persistent variation in the GFP? An evolutionarily informed theoretical interpretation of the GFP construes it as an indicator of variation in human LHS (Dunkel & Decker, 2010; Figueredo et al., 2004, 2007; Rushton et al., 2008). A mid-level theory from evolutionary biology (MacArthur & Wilson, 1967; Pianka, 1970), life history theory (LHT) draws attention to the trade-offs that organisms face in allocating limited energy among the competing demands of growth, reproduction (including mating and parenting), and bodily maintenance and repair. The basic model of human LHS variation (Del Giudice, 2018) posits a unidimensional continuum between “slower” and “faster” strategies (Del Giudice, 2020). A slower LHS prioritizes somatic effort (i.e., investment in future reproduction) over reproductive effort, parental effort over mating effort, and quality of offspring over quantity of offspring, whereas a faster LHS prioritizes the opposite. Humans depend strongly on unrelated conspecifics as sources of both material and informational resources (Hill et al., 2009; Kaplan et al., 2000). Furthermore, people’s investments in social relationships often entail incurring short-term costs in pursuit of long-term benefits (Gurven et al., 2000). The trade-offs entailed by the “ultra-social” human niche (Hill et al., 2009) generate individual differences in cooperativeness. Indeed, anti-social personality configurations, such as psychopathy, may actually be biologically adaptive under some circumstances (Glenn et al., 2011). Therefore, the GFP, conceptualized as social effectiveness, is a plausible slow LHS indicator.

According to models of the ontogenetic calibration of human LHS (Belsky et al., 1991; Ellis et al., 2009), harsh childhood environments (indexed by, for example, father-absence and neighborhood violence) promote the development of faster LHS, because they provide reliable cues of high rates of extrinsic mortality. Specifically, evolved developmental programs are expected to be sensitive to environmental indicators that the marginal return on investment in increased long-term survival is likely to be low, for reasons outside the individual’s control. An alternative line of theoretical reasoning holds that harsh early environments tend to damage individuals’ health, increasing their age-specific risk of death, and thereby increasing the relative advantages of a faster LHS (Nettle et al., 2013). Both external and internal predictive adaptive responses have been found to influence fast LH-associated risky and aggressive behavior (Ellis et al., 2020).

Critiques of the GFP: Construct Validity

Psychologists have disputed the GFP’s reality as a psychological construct on several grounds. According to the most common critique, the GFP is a methodological artifact resulting from social desirability bias in self-report responding (Anusic et al., 2009; Bäckström et al., 2009; Chang et al., 2012; but see Dunkel et al., 2016). However, statistically controlling for social desirability bias does not eliminate the correlations among the Big Five traits (reviewed by Musek, 2017).

Another critique of the GFP points to the way in which it is extracted. The GFP is often estimated as the first unrotated factor of a set of personality measures, yet it is mathematically possible to extract a substantial general factor even when the underlying measures do not share a large portion of their variance (Revelle & Wilt, 2013). When narrower personality facets are allowed to cross-load on more than one major personality dimension, higher-order factors (the GFP, the Big Two; Digman, 1997) do not emerge (Ashton et al., 2009). To the extent that the GFP is a valid construct, it should be extractable from any valid, comprehensive set of personality dimensions (Rushton & Irwing, 2011). However, regarding the theoretically and empirically well-grounded HEXACO structure, inconsistent results have emerged, with some finding support for the GFP (Rushton & Irwing, 2011; Veselka et al., 2009) and others not finding support (de Vries, 2011). Considering the overall body of theoretical arguments and empirical results, the debate regarding the construct validity of the GFP remains unresolved.

Critiques of the GFP as a Life History Strategy Indicator

With respect to the Big Five personality dimensions, and their expected associations with LHS, there are firm theoretical and empirical grounds for inferring that conscientiousness, agreeableness and emotional stability (comprising the metatrait alpha or Stability in Digman’s (1997) “Big Two”) are indicators of a slower LHS. However, the picture is less clear with regard to extraversion and openness (de Vries et al., 2016; Del Giudice, 2014, 2018), which comprise the beta or Plasticity metatrait. Conscientiousness can be defined as self-control in the pursuit of long-term goals (Nettle, 2006), and Agreeableness as a propensity toward altruistic behavior (Denissen & Penke, 2008). Both these traits have straightforward relevance to human slow LHS. Low levels of emotional stability may reflect low somatic effort devoted to mental health (Figueredo et al., 2004, 2007), and some evidence indicates that emotional stability is negatively related to short-term mating orientation (Banai & Pavela, 2015) or positively related to long-term mating orientation (Holtzman & Strube, 2013b). However, one of the foundational formulations of the LHS-based approach to human individual differences (Rushton, 1985) proposed that higher levels of extraversion are associated with a faster LHS. Research has shown that extraversion is sometimes positively associated with preference for short-term mating (Holtzman & Strube, 2013a; Schmitt & Shackelford, 2008; Simpson & Gangestad, 1991; Wright & Reise, 1997), and sometimes unrelated to it (Bourdage et al., 2007; Manson, 2015; Strouts et al., 2016), but never negatively associated with it. Del Giudice (2014, 2018) has argued that the interpersonal warmth and gregariousness facets of extraversion are slow LHS indicators whereas the excitement-seeking and dominance-striving facets are fast LHS indicators; likewise, the intellect facet of openness is a slow LHS indicator whereas the imagination facet is a fast LHS indicator. Manson (2017) found some support for these predictions.

More generally, metatraits alpha and beta are on firmer empirical ground than the GFP, because of their cross-cultural linguistic generality (Saucier et al., 2014) and their demonstrated associations with neurobiological and motivational processes. Metatrait alpha is associated with serotonergic function, while metatrait beta is associated with dopaminergic function (DeYoung, 2010). The cybernetic function of metatrait alpha can be summarized as the protection of goals, interpretations, and strategies from disruption by impulses, while the cybernetic function of metatrait beta can be summarized as the creation of new goals, interpretations, and strategies (DeYoung, 2015). Based on the argument that the basic or core features of a human slow LH consist of high levels of affiliation, cooperation, and preference for long-term mateships, and low levels of impulsivity, risk-taking, and sensation-seeking (Del Giudice, 2018), metatrait alpha is a plausible slow LH indicator. In contrast, high levels of extraversion and openness can be components of narrower subtypes of both slow and fast LHS (Del Giudice, 2018).

The Present Study: Evaluating the Validity of the General Factor of Personality

The present study aims to evaluate the validity of the GFP using two different forms of measurement: self-reported personality data and other-ratings by strangers. Our goal is to elucidate discrepancies among multi-rater assessments of the GFP, while simultaneously testing the role that the GFP plays in the ontogenetic calibration of human LHS. That is, if harsh childhood environments promote faster LHS, and if the GFP is an indicator of human LHS, then harsher childhood environments will be associated with lower levels of the GFP in young adults. Supporting this hypothesis using a genetically sensitive research design, Dunkel et al. (2018a, 2018b) found that, within monozygotic twin pairs, the twin who reported receiving more parental affection tended to score significantly higher on the GFP. Controlling for the GFP, parental affection remained significantly positively associated with metatrait alpha, whereas its associations with beta turned negative. In a replication, Dunkel, van der Linden et al. (2018) found that the relationship between recalled parental affection and the GFP held only for MZ twins who were raised together rather than apart, suggesting that relative (intra-familial) parental affection received, rather than absolute level, drives this effect.

When self-report instruments are used to measure both present-day personality and recalled childhood experience, results that apparently support this hypothesis could result spuriously from the common effects of socially desirable response bias. In other words, people who view their own personality through rose-colored spectacles might do the same with respect to their childhood experiences. We used two analytic methods to address this problem. First, we measured socially desirable response bias with a validated instrument, and included it as a covariate in predictive models to address critiques that question whether the GFP is simply a by-product of social desirability. Second, we measured personality traits using both self-report and other-ratings by strangers from video “thin slices” (Ambady & Rosenthal, 1992). Stranger-ratings are an appropriate method, because psychometrically assessed LHS (Dunkel et al., 2016) and at least some major dimensions of personality (Borkenau & Liebler, 1995; Carney et al., 2007) are discernible by raters from thin slices (Ambady & Rosenthal, 1992). Indeed, some evidence indicates that stranger-rated Big Five traits have greater criterion validity than self-ratings (Connelly & Ones, 2010; Oh et al., 2010). However, the criterion accuracy of trait judgments increases with length of acquaintance (Paulhus & Bruce, 1992). In a meta-analysis, Gnambs (2013) found that a GFP could be identified from ratings by short-term acquaintances, but not from ratings by longer-term acquaintances, suggesting that the former is an artifact, reflecting normative ratings of an average individual rather than valid ratings of any particular individual. A study that incorporated both self- and peer-ratings (Danay & Ziegler, 2011) found that either method alone extracted a GFP, but that a multirater nested model failed to extract a GFP. Riemann and Kandler (2010) also found no support for the GFP in their multimethod data. Based on a meta-analytic multitrait-multimethod approach, Chang et al. (2012) reported that correlations among the Big Five traits are largely attributable to artifactual common method variance. However, Rushton et al. (2009) found support for the GFP in their multitrait-multimethod data, as did van der Linden et al.’s (2018) study of the Tsimane’, which incorporated both self- and spouse-ratings. In summary, results have been mixed with respect to whether the GFP is robust to multi-rater assessment.

Hypotheses Tested

Among the hypotheses tested in this study, several are pre-registered at https://osf.io/82kpj/:

H1

Cues of environmental harshness during early development lead to a faster LHS, as indexed psychometrically by a lower level of the GFP (with positive loadings of extraversion, conscientiousness, agreeableness, openness, and emotional stability).

H1a:

This effect is robust to whether the Big Five traits are measured by self- or stranger-report.

H2

(alternative to H1): Of the Big Five traits, only conscientiousness, agreeableness and emotional stability cohere, as metatrait alpha (Digman, 1997), reflecting a LH factor that is sensitive to early environmental harshness.

Another of our pre-registered hypotheses was that the effects of harsh childhood environments on the GFP would be mediated through developmental timing. However, analyses described by Chua et al. (2020) failed to find evidence that developmental timing mediated associations between harsh early environments and a latent factor comprised of a set of hypothesized psychometric LH indicators (paranoia, cyclothymic tendencies, trust, sociosexual orientation, and temporal discounting). We therefore dropped this hypothesis from the present study.

To reiterate, the present study aims to contribute, in two ways, to unresolved debates regarding the adaptive significance of personality variation. First, we evaluate the validity of the GFP using multi-rater assessments to determine whether different methods of extracting the GFP produce consistent results. Second, we examine whether two GFPs (self- and stranger-rated) covary with harshness of childhood environments, as predicted by the hypothesis that the GFP is a LHS indicator.

Method

Participants

Participants were undergraduates at Oklahoma State University who received course credit for participating. Of the 386 individuals who were recruited, data from 20 were discarded because of technical errors. These include the 16 cases reported as discarded by Chua et al. (2020) plus four other participants for whom the videotaped interview was missing. The sample was 55.2% female, with a mean (± SD) age of 19.4 ± 1.8 years. The sample’s ethnic composition (based on self-identification) was 70.2% White, 8.4% Native American, 6.0% African-American, 5.7% Latino, 3.8% Asian-American, and 5.7% multi-racial or “other.” All materials and procedures, except for the video-based personality scoring, were approved by Oklahoma State University’s institutional review board (Approval #AS14132). The video-based personality scoring was approved by UCLA’s IRB (Approval #17-001766).

Measures

The measures described below comprise a subset of the complete set of measures used with this sample. A list of all measurements is available from the authors on request.

Neighborhood Stress

The City Stress Inventory (CSI; Ewart & Suchday, 2002) was administered retrospectively for two timepoints, early childhood (0–7 years) and adolescence (13–18 years), to capture any potential changes in environment during the lifespan. The CSI comprises two subscales, neighborhood disorder (11 items) and exposure to violence (7 items). The former assesses how often individuals experienced various types of disorder within their neighborhood (e.g., “I saw strangers who were drunk or high hanging out near my home”). The latter assesses how often individuals experienced specific types of violence within their neighborhoods (e.g., “A family member was attacked or beaten”). Both subscales were anchored on a 4-point scale (1 = “never” to 4 = “often”), with higher values indicating more frequent occurrences. Items from both subscales were summed into a single score. The two timepoints were highly correlated (Chua et al., 2020) and were therefore collapsed to create a single composite for neighborhood stress.

Father Closeness

Levels of father closeness were assessed retrospectively at three timepoints: early childhood (0–7 years), adolescence (13–18 years), and current adulthood (present), using a single-question item. The wording of this item was adapted to each timepoint (e.g., “How close were you to your father (or other father-figure) when you were 0–7 years old?”). This item was measured on a bipolar 7-point scale (1 = “not very close” to 7 = “very close”), and responses were highly correlated across the three timepoints (Chua et al., 2020). We therefore collapsed these into a single composite father closeness measure. Higher values indicated higher levels of closeness to fathers.

Self-rated Big Five dimensions

Participants completed the 50-item Big Five inventory from the International Personality Item Pool (https://ipip.ori.org/newBigFive5broadKey.htm). Responses to these items are made on a five-point scale. Each Big Five dimension is tapped by 10 items. Twenty-four of the 50 items are reverse-keyed.
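As an illustration of how reverse-keyed items on a five-point scale are typically handled, the following minimal sketch reverse-scores flagged items and averages within a dimension. The item labels, the example answers, the 1–5 numeric coding, and the use of a mean (rather than a sum) are assumptions made for illustration; they are not taken from the IPIP key or from the authors' scoring scripts.

```python
from statistics import mean

# Assumed numeric coding of the five-point response scale.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse(response: int) -> int:
    """Reverse-score one response so that 1<->5, 2<->4, and 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - response


def score_dimension(responses: dict, reverse_keyed: set) -> float:
    """Mean item score for one dimension after reverse-scoring flagged items."""
    scored = [
        reverse(value) if item in reverse_keyed else value
        for item, value in responses.items()
    ]
    return mean(scored)


# Hypothetical item labels and answers (a shortened 4-item illustration).
answers = {"e1": 4, "e2": 2, "e3": 5, "e4": 1}
extraversion = score_dimension(answers, reverse_keyed={"e2", "e4"})
```

In the actual instrument each dimension would have 10 items, with the reverse-keyed set taken from the published key.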

Socially Desirable Response Bias

Participants completed the Crowne–Marlowe Social Desirability Scale (Crowne & Marlowe, 1960), a 33-item instrument. The items ask for a True or False response to statements that are “too good to be true,” e.g. “I’m always willing to admit it when I make a mistake.” Fifteen items are reverse-keyed. Previous research (e.g. Dunkel et al., 2016) has used this instrument to assess the extent to which variation in socially desirable response bias accounts for variation in the GFP. Figueredo et al. (2005), testing hypotheses about life history strategy and higher-order personality dimensions, controlled for social desirability using a similar instrument, the Lie Scale of the Eysenck Personality Questionnaire (Eysenck & Eysenck, 1975).

Stranger-Rated (Thin Slice) Big Five Dimensions

The same 50-item Big Five IPIP inventory that the participants had used for self-rating was used by raters to describe their personalities based on brief videotaped interviews.

Procedure

Participants

Participants were recruited for a single laboratory session. First, each participant completed a battery of questionnaires on a laboratory computer. Next, participants were directed into a separate room where standardized photos were taken. Participants were then interviewed by trained research assistants. Interview durations averaged 131 s (SD = 32 s, range 35–307 s). Participants were asked to describe themselves for up to two minutes and answered questions about their hometown, major subject, post-graduation plans, and hobbies. During interviews, participants were videotaped using a Canon Rebel SL1/EOS (Model # 8575B003). Interviewers were not in the frame of the video, although their questions were clearly audible.

Raters

Each interview video was viewed by four raters (undergraduates enrolled at UCLA), who were instructed to describe the participant’s personality using the same 50-item instrument that the participants had used for self-report. Raters were instructed to make their best guesses regarding each participant’s overall behavioral, emotional, and cognitive patterns, not just their behavior during the interviews themselves. A total of nine raters contributed ratings (M = 165 videos per rater, range 21–350).

Data Analysis

The video-based personality ratings had a crossed structure (each rater rated a subset of the participants where each participant was rated by a subset of the raters and these sets overlapped to varying degrees). We therefore assessed inter-rater reliability for each Big Five dimension (averaged across its constituent items) by calculating intraclass correlations (ICC[3,1] and ICC[3,k]) (Shrout & Fleiss, 1979). ICC[3,1] yields the interrater reliability of individual ratings, whereas ICC[3,k] (k = number of raters) yields the interrater reliability of mean ratings (here, each participant’s mean rating across raters). ICC[3,k] coefficients greater than 0.75 are considered “excellent,” whereas coefficients from 0.60 to 0.74 are considered “good” (Cicchetti, 1994). Each participant’s score on each stranger-rated Big Five dimension was calculated as the mean score across that participant’s four raters.
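The two coefficients can be sketched from the mean squares of a two-way (targets × raters) ANOVA. The sketch below assumes a complete matrix in which every rater scores every participant, which is simpler than the partially overlapping design described above; the function name and the toy ratings are illustrative and are not the authors' data or code.

```python
def icc3(ratings):
    """Return (ICC[3,1], ICC[3,k]) for a complete targets x raters matrix.

    Assumes no missing cells: each row is one target, each column one rater.
    """
    n = len(ratings)       # number of targets (participants)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)              # between-targets mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    single = (bms - ems) / (bms + (k - 1) * ems)  # reliability of one rating
    average = (bms - ems) / bms                   # reliability of the k-rater mean
    return single, average


# Toy ratings: 6 hypothetical targets, each scored by the same 4 raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_mean = icc3(toy)
```

Because rater main effects are removed before the ratio is formed, both coefficients index consistency rather than absolute agreement, and the mean-rating coefficient is always at least as large as the single-rating one when reliability is positive.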

We used structural equation modeling (SEM) in Mplus 8.4 (Muthén & Muthén, 2019) to test hypotheses H1 and H2. First, we ran a bifactor measurement model (Reise, 2012) in which the five self-reported and five stranger-rated Big Five dimension scores served as the ten indicators. The two group factors were Self-Report and Stranger-Rating and the general factor was the GFP. The goal of this analysis was to isolate the two method factors from the GFP. This model failed to converge.

Therefore, we ran separate measurement models of the self-reported and stranger-rated GFP. The fit of these models was assessed using the following indices: Chi-Squared statistic, comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA). Our next step, as planned, was to run separate SEM models assessing the associations between the predictors (neighborhood stress and father closeness) and each GFP (self-reported and stranger-rated) controlling for socially desirable response bias.
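For readers less familiar with these indices, the sketch below shows the standard formulas that relate them to the model chi-square. It is not a reproduction of the Mplus output. The model chi-square and degrees of freedom echo one value reported in the Results; N = 366 is an assumption inferred from 386 recruited minus 20 discarded; and the baseline (independence) model values are hypothetical, because they are not reported. Software packages differ in whether N or N − 1 appears in the RMSEA denominator, so small discrepancies from reported values are to be expected.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    model_excess = max(chi2 - df, 0.0)
    worst_excess = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 if worst_excess == 0 else 1.0 - model_excess / worst_excess


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index; can fall outside the 0-1 range."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


# Model values as reported for one model; N and baseline values are assumptions.
example_rmsea = rmsea(chi2=52.52, df=5, n=366)
example_cfi = cfi(chi2=52.52, df=5, chi2_base=230.0, df_base=10)  # hypothetical baseline
example_tli = tli(chi2=52.52, df=5, chi2_base=230.0, df_base=10)  # hypothetical baseline
```

Because the TLI is not bounded, values above 1 or below 0 can occur when the model chi-square is very small or very large relative to its degrees of freedom.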

To test hypothesis H2 (that only metatrait alpha [Digman, 1997], but not the GFP, comprises a LHS personality indicator that is affected by environmental harshness during early development), we first ran measurement models consisting of metatrait alpha indicators separately for the self-reported and stranger-rated Big Five traits. We treated that higher-order factor as an outcome variable to be predicted by childhood neighborhood stress and father closeness. H2 predicts that the higher-order factor comprised of conscientiousness, agreeableness and emotional stability (i.e., metatrait alpha) will be related to childhood neighborhood stress and father closeness.

Results

Reliability and Descriptive Statistics

Table 1 contains descriptive statistics (mean, SD, Cronbach’s alpha) for each self-reported and stranger-rated Big Five dimension, and for the Crowne–Marlowe Social Desirability Scale. We report descriptive statistics for the entire sample, and also stratified by sex. Table 2 shows ICC[3,1] and ICC[3,9] coefficients for the stranger-rated Big Five dimensions. Only emotional stability showed somewhat less than adequate inter-rater reliability, with the 95% CI of ICC[3,9] including values below the “good” range.

Table 1 Descriptive statistics of social desirability response bias and Big Five Measures
Table 2 Inter-rater reliability (intraclass correlation) of stranger-rated Big Five Dimensions

Bivariate Correlations

Table 3 shows correlations among father closeness, childhood neighborhood stress, socially desirable response bias, and self-reported and stranger-rated Big Five dimensions. For four of the five personality dimensions, the correlation between the self-reported and stranger-rated scores was significant at p < 0.01, and this correlation was greater than that of any other self-reported dimension with that stranger-rated dimension, and vice versa. For example, the correlation of self-reported and stranger-rated agreeableness was r = 0.25. No other self-reported dimension was as strongly correlated with stranger-rated agreeableness, and no other stranger-rated dimension was as strongly correlated with self-reported agreeableness. The exceptional dimension was emotional stability, for which the self-stranger rating correlation was near zero. In contrast, stranger-rated emotional stability was strongly positively correlated with self-reported extraversion. Social desirability response bias was positively related to self-reported emotional stability, agreeableness, and conscientiousness. Table 4 shows these correlations stratified by sex.
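The comparison described in this paragraph, in which a same-trait self-stranger correlation is checked against every other entry in its row and column, can be sketched as follows. The cross-correlation values below are hypothetical placeholders chosen only to mimic the qualitative pattern reported (they are not the Table 3 values, apart from the agreeableness correlation of 0.25 quoted in the text), and the function name is illustrative.

```python
TRAITS = ["E", "A", "C", "ES", "O"]


def convergent_flags(cross):
    """For each trait, report whether the same-trait correlation exceeds every
    other correlation in its row (self-report) and column (stranger rating)."""
    size = len(TRAITS)
    flags = {}
    for i, trait in enumerate(TRAITS):
        same_trait = cross[i][i]
        row_others = [cross[i][j] for j in range(size) if j != i]
        col_others = [cross[j][i] for j in range(size) if j != i]
        flags[trait] = all(same_trait > r for r in row_others + col_others)
    return flags


# Rows: self-reported E, A, C, ES, O. Columns: stranger-rated E, A, C, ES, O.
# Hypothetical values, for illustration only.
cross_corr = [
    [0.40, 0.10, 0.05, 0.30, 0.12],
    [0.08, 0.25, 0.10, 0.05, 0.06],
    [0.02, 0.09, 0.22, 0.04, 0.03],
    [0.06, 0.03, 0.07, 0.02, 0.01],
    [0.11, 0.05, 0.02, 0.08, 0.28],
]
flags = convergent_flags(cross_corr)
```

With these placeholder values, only emotional stability fails the check, mirroring the pattern described in the text.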

Table 3 Correlations among childhood environment measures, social desirability response bias, and self-reported and stranger-rated Big Five dimensions
Table 4 Correlations among childhood environment measures, social desirability response bias, and self-reported and stranger-rated Big Five dimensions, stratified by sex

Measurement Models of Self-reported and Stranger-Rated GFP

A GFP comprised of the self-reported Big Five dimensions showed good model fit (χ2(5) = 5.82, p = 0.32; CFI = 0.99; TLI = 0.97, RMSEA = 0.02). All Big Five dimensions except conscientiousness loaded significantly positively on the GFP. In contrast, a GFP comprised of the stranger-rated Big Five dimensions showed poor model fit (χ2(5) = 52.52, p < 0.001; CFI = 0.78; TLI = 0.56, RMSEA = 0.16). Examination of modification indices showed that the greatest change in χ2 would result from including an error covariance between stranger-rated extraversion and stranger-rated emotional stability. Adding this covariance to the model still did not yield acceptable model fit (χ2(4) = 30.50, p < 0.001; CFI = 0.88; TLI = 0.70, RMSEA = 0.14). Finally, a four-factor stranger-rated model, dropping the emotional stability dimension (which showed somewhat low inter-rater reliability), also showed poor fit (χ2(2) = 88.85, p < 0.001; CFI = 0.57; TLI = − 0.30, RMSEA = 0.35).

We also tried to fit the stranger-rated data to a two-factor model in which agreeableness, conscientiousness, and emotional stability loaded on metatrait alpha, while extraversion and openness loaded on metatrait beta, and there was no GFP. This model produced a Heywood case, or a negative variance estimate for metatrait alpha, even after adding the error covariance between extraversion and emotional stability.

Because the stranger-rated GFP showed poor model fit, we calculated an alternative stranger-rated GFP score by using the weights of each Big Five dimension on the GFP as reported in van der Linden et al.’s (2010) meta-analysis of 212 samples of Big Five intercorrelation matrices (ES = 0.62, E = 0.57, O = 0.42, A = 0.57, C = 0.63). These weights were multiplied by the raw trait scores, summed, and standardized to create a single stranger-rated GFP composite score. For completeness, the self-reported GFP score was calculated in the same way. We found that self-reported GFP and stranger-rated GFP were significantly positively correlated, r = 0.27, p < 0.001.
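The weighting procedure described above can be sketched as follows. The weights are those quoted in the text; the participant scores are hypothetical, and the use of the sample (n − 1) standard deviation for standardization is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev

# Weights quoted in the text from van der Linden et al.'s (2010) meta-analysis.
WEIGHTS = {"ES": 0.62, "E": 0.57, "O": 0.42, "A": 0.57, "C": 0.63}


def gfp_composite(sample):
    """Weighted sum of raw Big Five scores per person, z-standardized across
    the sample (sample SD assumed)."""
    raw = [sum(WEIGHTS[t] * person[t] for t in WEIGHTS) for person in sample]
    centre, spread = mean(raw), stdev(raw)
    return [(value - centre) / spread for value in raw]


# Three hypothetical participants with mean-item trait scores.
toy_sample = [
    {"ES": 3.0, "E": 3.0, "O": 3.0, "A": 3.0, "C": 3.0},
    {"ES": 4.0, "E": 4.0, "O": 4.0, "A": 4.0, "C": 4.0},
    {"ES": 2.0, "E": 2.0, "O": 2.0, "A": 2.0, "C": 2.0},
]
gfp_scores = gfp_composite(toy_sample)
```

Because the weights are fixed in advance rather than estimated from the sample, this composite can be computed even when a latent GFP model fits poorly, which is the situation described for the stranger-rated data.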

Tests of Models Linking Childhood Conditions to the GFP

Self-reported GFP

Because the self-reported Big Five dimensions showed good model fit as a GFP, we added paths from the hypothesized predictors (neighborhood stress and father closeness), controlling for social desirability, to the latent GFP. However, we found that including socially desirable response bias actually diminished model fit (χ2(17) = 50.70, p < 0.001; CFI = 0.78; TLI = 0.67, RMSEA = 0.08). For this reason, we no longer included socially desirable response bias in this or the subsequent models.

Rather, we ran a SEM excluding social desirability (Fig. 1). The overall model showed excellent fit (χ2(13) = 13.73, p = 0.39; CFI = 0.99; TLI = 0.98, RMSEA = 0.01). A statistically significant relationship was found between father closeness and the GFP, such that participants who were closer to their fathers had higher GFP scores (path coefficient = 0.21, p = 0.01, 95% CI [0.05, 0.36]). Neighborhood stress was not significantly associated with the GFP (path coefficient = − 0.16, p = 0.11, 95% CI [− 0.35, 0.04]). These results were not sensitive to how the self-reported GFP was calculated. Specifically, similar coefficients were obtained when we ran the SEM using scores calculated from the weights of each Big Five dimension on the GFP as reported in van der Linden et al.’s (2010) meta-analysis. We also conducted an invariance test, using multi-group SEM analysis, and found that the model varied by sex, χ2 (2) = 16.90, p < 0.05 (Figs. 2 and 3).

Fig. 1 Cues of environmental harshness during early development and association with GFP (Both Sexes). Note: *p < 0.05, **p < 0.001. Dashed lines indicate paths that are not statistically significant

Fig. 2 Cues of environmental harshness during early development and association with GFP (Males Only). Note: *p < 0.05, **p < 0.001. Dashed lines indicate paths that are not statistically significant

Fig. 3 Cues of environmental harshness during early development and association with GFP (Females Only). Note: *p < 0.05, **p < 0.001. Dashed lines indicate paths that are not statistically significant

Stranger-rated GFP

Because stranger-rated GFP was calculated as an observed composite from the Big Five weights reported in van der Linden et al.’s (2010) meta-analysis, rather than estimated as a latent variable, we used multiple regression instead of SEM. Stranger-rated GFP was uncorrelated with socially desirable response bias (r = 0.03, p = 0.57). Regressing stranger-rated GFP on father closeness and neighborhood stress revealed that neither predictor variable was associated with stranger-rated GFP (father closeness: β = 0.08, p = 0.18, 95% CI [−0.15, 0.07]; neighborhood stress: β = 0.01, p = 0.85, 95% CI [−0.01, 0.29]).
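A weighted composite of this kind can be sketched as follows. Everything specific in the block is an assumption made for illustration: the weights are placeholders and are not the values from van der Linden et al. (2010), the ratings are invented toy data, and standardizing each trait before weighting is one plausible choice rather than the procedure documented in the text.

```python
from statistics import mean, pstdev

# HYPOTHETICAL weights for illustration only; NOT the meta-analytic values.
PLACEHOLDER_WEIGHTS = {
    "openness": 0.4,
    "conscientiousness": 0.6,
    "extraversion": 0.5,
    "agreeableness": 0.6,
    "emotional_stability": 0.7,
}


def zscores(values):
    """Standardize a list of scores to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def composite_gfp(ratings, weights):
    """Weighted sum of standardized trait scores, one value per target.

    ratings maps each trait name to a list of scores (same order of targets).
    """
    z = {trait: zscores(ratings[trait]) for trait in weights}
    n_targets = len(next(iter(z.values())))
    return [sum(weights[t] * z[t][i] for t in weights) for i in range(n_targets)]


# Invented toy ratings for three targets (not data from the study).
toy_ratings = {
    "openness": [3.0, 4.0, 5.0],
    "conscientiousness": [2.5, 4.5, 3.5],
    "extraversion": [4.0, 2.0, 3.0],
    "agreeableness": [3.5, 3.0, 4.5],
    "emotional_stability": [2.0, 3.0, 4.0],
}

scores = composite_gfp(toy_ratings, PLACEHOLDER_WEIGHTS)
print([round(s, 2) for s in scores])
```

Because the composite is an observed score rather than a latent variable, it can be entered directly as the outcome in an ordinary multiple regression.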

Tests of Models Linking Childhood Conditions to Three-Factor Higher-Order Factors

We used both the self-report and stranger-rated Big Five dimensions to test H2: that only emotional stability, agreeableness, and conscientiousness cohere as an LH factor, metatrait alpha or Stability (Digman, 1997), that is affected by environmental harshness during early development. We ran SEM models for both self-reported and stranger-rated metatrait alpha. Self-reported metatrait alpha had good model fit (χ2(4) = 1.28, p = 0.86; CFI = 1.00; TLI = 1.25; RMSEA = 0.00). Only emotional stability and agreeableness loaded significantly on self-reported metatrait alpha, which was significantly associated in the predicted directions with both father closeness (path coefficient = −0.27, p = 0.02, 95% CI [−0.50, −0.05]) and neighborhood stress (path coefficient = 0.24, p = 0.03, 95% CI [0.03, 0.44]). Stranger-rated metatrait alpha had poor model fit and was not considered further. We were unable to carry out the invariance test because a Heywood case emerged when running the constrained model. For visual purposes, we stratified the model by sex (see Supplementary Materials).

Discussion

In this study, we addressed three questions. First, is the General Factor of Personality robust to multi-rater assessment? Second, is the GFP related to childhood environmental harshness, as predicted by the hypothesis that it is an LH indicator? Third, is the Big Two metatrait alpha, with positive loadings of agreeableness, conscientiousness, and emotional stability, a better LHS indicator than the GFP?

Multi-rater Measurement and the GFP

Although others have found that a self-rated GFP correlates with expected stranger-rated criterion variables (e.g., Do & Minbashian, 2020; Pelt et al., 2017; van der Linden et al., 2010), research has produced conflicting results regarding whether the GFP is robust to multi-rater assessment (Chang et al., 2012; Danay & Ziegler, 2011; Gnambs, 2013; Riemann & Kandler, 2010; Rushton et al., 2009; van der Linden et al., 2018). Our initial analytic strategy was to isolate the GFP from the two method factors (i.e., self-report and “thin slice” stranger-rating) by fitting a bi-factor model. However, this model failed to converge. Running separate models for the two methods, we found that self-report data generated a well-fitting GFP, whereas stranger-rating data did not.
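One common way to write a bi-factor measurement model of this kind is sketched below. The notation is an assumption introduced here for illustration and is not the authors' exact specification: x_{jm} denotes Big Five trait j as measured by method m, G is the GFP, M_m is the factor for method m, and the general and method factors are assumed to be uncorrelated.

```latex
x_{jm} = \lambda_{jm}\, G + \gamma_{jm}\, M_m + \varepsilon_{jm},
\qquad j = 1,\dots,5, \quad m \in \{\text{self}, \text{stranger}\},
\qquad \operatorname{Cov}(G, M_m) = 0 .
```

In this formulation, variance shared across both methods is attributed to G, whereas variance shared only within a method is absorbed by the corresponding M_m.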

Including social desirability bias as a predictor of self-reported GFP drastically reduced model fit. One likely reason for this result is that our measure of social desirability bias is problematic, both methodologically and theoretically. Researchers have raised serious questions about the MCSD’s dimensionality and overall psychometric properties (Leite & Beretvas, 2005). Furthermore, the MCSD has better reliability for women than for men (Beretvas et al., 2002; Leite & Beretvas, 2005). Theoretically, the challenges involved in isolating social desirability bias from substantive personality variation are probably more complex than originally thought, implying that such analyses should be undertaken only with caution (Leite & Beretvas, 2005).

Why did our stranger-rated personality ratings fail to yield a well-fitting GFP? First, stranger-rated emotional stability showed both low inter-rater reliability and no correlation with self-rated emotional stability. These findings are consistent with previous research (Carney et al., 2007; Vazire, 2010). Valid cues of anxiety, depression, and negative affect more generally are rarely detectable from people’s self-presentations to strangers or casual acquaintances. However, even after excluding emotional stability from the stranger-rated GFP, our data failed to reveal a well-fitting single latent variable. Our results contrast with the meta-analytic results reported by Gnambs (2013), who found that the amount of variation in stranger-rated Big Five traits that was explained by the GFP was inversely related to the length of acquaintance between rater and target. He attributed this pattern to short-term acquaintance ratings being more influenced than long-term acquaintance ratings by normative ratings of an average individual, rather than valid ratings of any particular individual. Interestingly, our results from thin-slice assessments, which can arguably be characterized as short-term acquaintance, may have arisen because raters lacked the contextual information needed to accurately assess participants’ personality traits. These results call into question how short-term acquaintance is defined, as well as the validity of stranger-rated measures of personality. Future studies of personality could adopt a similar method (Arslan et al., 2020), in which participants explain the rationale behind their answers, giving raters the contextual information needed to make more accurate judgements.

Furthermore, latent factor models assume “local independence,” that is, the observed indicators of a latent factor are assumed to be statistically independent of one another on a pairwise basis once the latent factor is taken into account, such that all their shared variance is attributable to the latent factor itself (Cramer et al., 2010). Because the stranger-rated Big Five factors were substantially correlated with each other, more so than the self-reported Big Five factors, it is plausible that inter-factor correlations in the stranger-rated data may have contributed to the inability to produce a GFP structure.
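For a single-factor model, the local independence assumption can be stated compactly as follows. The symbols are introduced here for illustration: x_i is an observed indicator, η the latent factor, λ_i its loading, and ε_i the residual.

```latex
x_i = \lambda_i\, \eta + \varepsilon_i, \qquad
\operatorname{Cov}(\eta, \varepsilon_i) = 0, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j)
\;\Longrightarrow\;
\operatorname{Cov}(x_i, x_j) = \lambda_i \lambda_j \operatorname{Var}(\eta) .
```

Any covariance between two indicators beyond the product of their loadings and the factor variance therefore violates the assumption and shows up as model misfit.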

Our inability to reproduce a GFP structure from stranger-ratings is consistent with published literature (Danay & Ziegler, 2011; Riemann & Kandler, 2010) that suggests skepticism regarding the validity of the GFP construct specifically, and higher-order personality dimensions generally (Ashton et al., 2009). Even the more commonly used five- and six-factor models of human personality variation lack cross-cultural universality (Gurven et al., 2013), and there are both theoretical and empirical reasons to suspect that covariation patterns among narrow behavioral propensities vary across societies as a function of social and occupational niche diversity (Lukaszewski et al., 2017; Smaldino et al., 2019). If there is no universally human multi-dimensional personality structure, then it is even less likely that there is a universally human apical personality factor (but see van der Linden et al., 2018). However, a potentially fruitful line of theorizing is to view the GFP, not as a trait per se, but as a general ability to navigate the social world, with its specific components varying across societies and over time.

The GFP as a Life History Strategy Indicator

Our results were mixed with regard to whether the GFP in young adults was related to childhood environmental harshness, as predicted by the hypothesis that the GFP is an LHS indicator (Dunkel & Decker, 2010; Figueredo et al., 2004, 2007; Rushton et al., 2008). Neither father closeness nor neighborhood stress was associated with a stranger-rated GFP (leaving aside the problems with the measurement model discussed above). One interpretation of this finding is that, although our “thin slice”-based judgments were tapping valid personality variation (except for the emotional stability dimension), this variation did not overlap with the portion of personality variation that is affected by father closeness or childhood neighborhood stress. Self-reported GFP was associated with father closeness, but not with neighborhood stress. This result provides limited support for the hypothesis that the GFP is an LHS indicator. Furthermore, we found that this model varied by sex, which is consistent with previous evidence demonstrating sex differences in LHS and the Big Five (Chua et al., 2020; Schmitt et al., 2008).

Metatrait Alpha (Stability) as a Life History Strategy Indicator

Contrary to our hypothesis H2, metatrait alpha or Stability (Digman, 1997) did not emerge as a clear LHS indicator. As a stranger-rated dimension, metatrait alpha showed poor model fit. As a self-reported dimension, only emotional stability and agreeableness loaded significantly on the latent factor; conscientiousness loaded positively but did not reach statistical significance (p = 0.09). Self-reported metatrait alpha was significantly associated, in the predicted directions, with father closeness and childhood neighborhood stress. Again, these results provide limited support for the GFP- and LHS-based evolutionary account of personality variation.

Limitations

This study had several limitations. We sampled from the undergraduate population of just one U.S. university, limiting the generalizability of our findings. Our personality instrument does not cover the complete range of facets (Costa & McCrae, 1995) within each Big Five trait. Therefore, we are unable to test whether different facets of extraversion and openness are linked to fast vs. slow LHS, as proposed by del Giudice (2014). Unlike the recent work of Dunkel et al. (2018a, 2018b) and Dunkel, van der Linden, et al. (2018), our study included no genetic controls, so we are unable to test the hypothesis that gene-environment correlations explain the association between childhood environmental harshness and metatrait alpha (see Barbaro et al., 2017). Parents with low levels of metatrait alpha might both transmit genetic predispositions toward this trait and create, or be unable to escape, harsh home or neighborhood environments.

Conclusion

Evolutionarily-informed research on individual differences, including work using LH Theory, has mostly built on the inductively derived Big Five or HEXACO personality structures (Dunkel & Decker, 2010; Figueredo et al., 2004, 2007; Rushton et al., 2008), or on higher-order factor structures (i.e., Big Two, GFP) based on them (Figueredo et al., 2004, 2007; Rushton et al., 2008). Here, we found (at most) weak support for predictions deduced from the hypothesis that the GFP or metatrait alpha is an LHS indicator. Future research on the adaptive importance of human individual differences may profit from shifting the analytic level down from broad, cross-culturally variable factors to narrow, mechanistically defined, situationally responsive propensities (Lukaszewski, 2019; Lukaszewski et al., 2020).