Compounds are special—morphologically complex—words (Arcara et al., 2014) composed of different linguistic parts (referred to as "lexemes" or "morphological components" or "constituents"). These parts are used to communicate one specific meaning (e.g., hot dog meaning a kind of fast food) while the different constituents also individually relate to that meaning to some degree (hot and dog are differentially related to the meaning of hot dog). Compounds differ from derived and inflected words in that they are formed from two free lexemes (Juhasz et al., 2003). An interesting definition of compound words is that provided by Desrochers et al. (2010): “A word is said to be derived by compounding when two lexemes (rarely more) are joined (e.g., flycatcher). Thus, compound words arise from the concatenation of distinct lexemes, which have their own frequency of occurrence in print and/or speech” (p. 109).

Word compounding is a very productive way to construct novel words (Fiorentino et al., 2014), and it is prevalent in a number of languages (Libben, 2014). To give a few examples: in English, such words are fairly common (about 4800 (token) occurrences of nominal compounds as reported by Janssen et al., 2008), and the German language also makes extensive use of compounds (Lüttmann et al., 2011). The same is true for Basque, which contains a large number of compounds and where compounding is a frequently used morphological mechanism (Duñabeitia, Laka, et al., 2009a). In non-alphabetic languages, compounding is also very often employed, for example in Chinese (Cui et al., 2017). There are also compounds in languages that are less well known to Westerners, such as the Bangla language (Dasgupta et al., 2016). Finally, as far as French is concerned (the language for which norms of compounds were collected in the present study), we are not aware of any published studies that have provided detailed statistics about the number of compounds. As in other languages, compounds are also expressions used in French to communicate meanings. In sum, compounding can be thought of as a universal way to form morphologically complex words across languages (Dressler, 2006).

At the surface level, compounds can take different forms, and the variations seen in forms mostly depend upon the language in question. In German, compounds are consistently written as a single string/single word (Günther et al., 2020; Zwitserlood et al., 2002). In English, compounds are made of free lexemes, and typically a combination of two or more constituent words (Dressler, 2006), which results in single lexicalized expressions (Juhasz et al., 2003)—compare for example tree house in English with Baumhaus in German. Most English compounds are written as unified expressions whereas others are written with a space or hyphenated (Inhoff et al., 2008). In a similar way, in Korean, compounds are built by combining two or more stem morphemes (Ko et al., 2011), with the noun-noun combination being a productive type of compounding (Sohn, 1999 cited in Ko et al., 2011). Compound words in Chinese are often formed by the combination of two or more characters, as in 雪人 [snowman] (Tse et al., 2017). As mentioned by Tse et al. (2017), about 73.6% of modern Chinese words are two-character compound words. In French, compounds can take different forms. For instance, there are noun-noun compounds (e.g., radio-réveil [alarm clock]), adjective-noun compounds (e.g., chauve-souris [bat]), verb-noun compounds (e.g., tire-bouchon [corkscrew]). There are also “noun-preposition-noun” compounds (e.g., chef de gare meaning station master, sac à main meaning a handbag) which, Nicoladis and Krott (2007) point out, are more frequent than the noun-noun compound type, with “de” and “à” being the most frequent prepositions found in noun-preposition-noun compounds (e.g., pommes de terre meaning potatoes, see Nicoladis, 2001).

Different languages also vary as a function of whether their compounds are left- or right-headed. In English, German or Dutch, compound words are right-headed (Koester & Schiller, 2008), that is to say that the second constituent (rightmost) determines the semantic category and the morphosyntactic features of the whole compound—it serves as the head (Juhasz, 2018). The first constituent serves as a modifier which further specifies the meaning as, for instance, in airport, which is a kind of port dedicated to travel by air (Günther et al., 2020). In Bulgarian, compounds are always right-headed (Jarema et al., 1999). In both Italian and French, it is possible to have both left- and right-headed compounds (Semenza & Luzzatti, 2014), even though in French, noun–noun compounds are generally left-headed (Jarema et al., 1999). In French noun–adjective compounds, the adjective can be located at the initial or at the final position; thus there are both left-headed and right-headed noun–adjective compounds.

Different views on the processing of compounds

A key issue in psycholinguistics is to determine the degree to which the different parts are involved when accessing the meaning of compounds. For example, the French compound chauve-souris (meaning bat) comprises a left-part adjective, chauve, meaning bald, and a right-part noun, souris, meaning mouse. When reading chauve-souris, do readers first access the meaning of chauve and then the meaning of souris, and then, finally, the meaning of the compound? Or are the meanings of the two lexemes simply bypassed?

The processing of compounds has given rise to a large number of studies. Most of them have been conducted in the field of comprehension, and fewer in word production (Gagné & Spalding, 2016a, 2016b; Zwitserlood et al., 2002). Any word reading (or any word production) model must account for the representations and the processes involved in the processing of all types of words, i.e., not only monomorphemic words but also polymorphemic words as well as compounds (e.g., hot dog). The findings on compounds are far from being consistent, but they have helped researchers build and constrain different views on compound processing in production or in word reading tasks. Indeed, different views have been put forward to account for how compounds are processed (for a brief overview, see Fiorentino & Poeppel, 2007; Kuperman, 2013). Briefly, one view, embodied in full-decomposition (or full-parsing) models (e.g., Taft & Forster, 1976), holds that compound words are processed based on their constituent parts. In contrast, according to a radically opposite view, that of full-listing models (e.g., Butterworth, 1983; Stockall & Marantz, 2006), compounds are processed based on their “full form”, that is to say holistically. Mixed views have also been proposed (e.g., Bertram & Hyönä, 2003; Kuperman et al., 2009; Schreuder & Baayen, 1995) and assume that both types of processing may be involved depending on the characteristics of the compound. For instance, in the “late decomposition model” (e.g., Giraudo & Grainger, 2000), processing starts with whole-word representations and access to the constituents occurs afterward, for instance when the relation between the whole word and the compound’s constituents is semantically transparent.

Influence of different psycholinguistic variables on compound processing

One way to disentangle different views on compound processing has been to examine the (independent or combined) influence of several characteristics of compounds pertaining to either their constituents (hot and dog in hot dog) or their full form (hot dog). As a result, having norms on compounds is very important in order to address issues pertaining to compounds in psycholinguistics, such as how compounds are represented in the brain, as well as details of their processing in both comprehension and production tasks. In the present paper, we aim to provide psycholinguistic norms based on a set of 506 French compound words. To our knowledge, psycholinguistic norms are not available in French for this type of word. We now describe different psycholinguistic characteristics that have been used in certain studies to investigate compound processing and the rationale behind that use.

When describing the different psycholinguistic norms, we will refer to certain studies that have used each variable in question either in order to control it methodologically or statistically, or to investigate its potential effect in language processing. For the sake of concision, and because the aim of our study was to provide norms on compounds in French, we decided to indicate in a table (Table 7 in the Appendix) certain studies which made use of each of the described variables and not to describe exactly how the variable was explored (e.g., experimental tasks employed, direction of the effects observed). In this way, readers can easily use this table to refer to the specific studies in order to know what the findings are for the variable in question and in what experimental setting they were obtained. We should make it clear that in doing so we did not intend to provide a full list of the studies in this vast field of research. The studies listed in Table 7 should therefore be taken as illustrations, even though it must be stressed that we were careful to include the most recent studies together with older ones. The evidence broadly suggests that there are reliable effects of the constituents’ characteristics on the processing of compounds. These effects have been observed in a large number of lexical processing tasks (e.g., lexical decision, word reading, spoken and written picture naming), often using different experimental techniques (e.g., implicit priming paradigm, picture-word interference paradigm, EEG and fMRI techniques). Thus, in the course of compound comprehension or production, the compound’s constituents are activated to some degree and they influence the processes underpinning these language abilities.

Objective word frequency is a well-known variable in psycholinguistics and one that has proven to be very influential in lexical processing (see Brysbaert et al., 2018 for a recent review). It corresponds to the number of times a word is found in written (or spoken) corpora. It is possible to evaluate the overall frequency of compounds (e.g., the frequency of the word hot dog), but also the frequency of their constituents (e.g., the frequency of hot and dog). For example, in French, the compound word chauve-souris has an overall frequency (subtitle frequency) of 5.43 per million, whereas chauve and souris have frequency values of 5.25 and 21.94 per million, respectively. The rationale has been that, whenever the frequencies of the constituents have an impact on the processing of compounds, this is then evidence for the hypothesis that they are decomposed in some way. One research strategy has been to manipulate the frequency of the constituents of compounds. This strategy was initially undertaken by Taft and Forster (1976) and it has been employed very often since (Table 7).

Length, like lexical frequency, is an objective variable. It has often been examined in visual word recognition but its effects have proven to be non-intuitive (e.g., Ferrand et al., 2011; New et al., 2006). The influence of length has been examined in connection with compounds on the basis of eye movements in reading and it has been found that longer compounds yielded longer gaze durations than shorter compounds (e.g., Juhasz, 2008). Length has been found to interact with frequency in various ways such as, for example, in Bertram and Hyönä's (2003) study, in which constituent frequency effects were found for long but not for short compound words.

Subjective frequency and age of acquisition (AoA) are also lexical variables that have been investigated in compound processing (even though certain researchers think that AoA also indexes semantic aspects, see for instance Ghyselinck et al., 2004). Subjective frequency is a measure of how often a word is encountered in everyday life (read or heard). To assess subjective frequency for words, Likert-like scales are used, for example a 7-point scale where the response 1 is given for words that are perceived as being rare and the response 7 for words that are perceived as being commonly used (Desrochers et al., 2010; Desrochers & Thompson, 2009). Subjective and objective frequency scores are correlated (e.g., Bonin, Méot, et al., 2003a; Desrochers et al., 2010; Gonthier et al., 2009), which makes it possible to use subjective ratings to complement objective frequency measures (Desrochers & Thompson, 2009), especially for objectively low-frequency words that occur in very specific contexts (e.g., syringe is a low-frequency word according to objective frequency counts but is subjectively frequent for people working in hospitals). Subjective frequency ratings can also be used as a proxy when objective frequency measures are lacking for some words. Interestingly, Balota et al. (2001) have shown that subjective frequency ratings are a slightly better predictor of lexical decision times than objective word frequency measures. An influence of familiarity/subjective frequency on compound processing has been found in word reading (Table 7), but this variable has been less well investigated than objective lexical frequency.

AoA corresponds to the age at which a word was acquired. AoA of words is often measured by asking adults to rate the age at which they think they learned different words. The AoA effect (words acquired early in life are processed faster and more accurately than words acquired later in life) is a strong empirical finding that surfaces in a large number of lexical tasks (for reviews, see Juhasz, 2005; Johnston & Barry, 2006). This phenomenon is still debated at both the methodological level, i.e., how best to evaluate the AoA of words, and the theoretical level, i.e., what mechanisms underpin AoA effects. Another related question is the locus of AoA effects (e.g., Bonin et al., 2004; Brysbaert, 2017; Cortese & Khanna, 2007; Zevin & Seidenberg, 2002, 2004; see Catling & Elsherif, 2020 for a brief overview). Up to now, as can be seen from Table 7, the impact of AoA has not often been investigated in compound processing.

Semantic transparency corresponds to the relationship between the meaning of the full form of the compound (e.g., hot dog is a piece of food) and the meaning of its constituents (e.g., hot refers to temperature and dog refers to an animal that barks). As claimed by Günther et al. (2020), research on compounds has made widespread use of the notion of semantic transparency. Compounds vary as a function of their semantic transparency. Fully transparent compounds comprise two lexemes that both contribute to the overall meaning (e.g., birdhouse). In contrast, in fully opaque compounds (e.g., shindig), neither constituent contributes to the meaning, whereas in partially opaque words (e.g., hot dog), only one lexeme is linked to the meaning of the compounds. Thus, there can be varying degrees of semantic transparency in compounds, making it possible to assess the extent to which the meaning of compounds is predictable from the meaning of the constituents. As listed in Table 7, the influence of semantic transparency has been explored in morphological processing in different languages and in several experimental tasks related to either language comprehension or production. Norms of semantic transparency have been collected in English (e.g., Gagné et al., 2019; Kim et al., 2019) and, very recently, semantic transparency measures have been provided for a large set of German compounds (Günther et al., 2020).

Another psycholinguistic characteristic that is specific to compounds and for which it is worth collecting norms is lexeme (or lexical) dominance. As stated earlier, languages vary as a function of whether their compounds are left- or right-headed. The meaning can thus be carried either by the left or by the right lexeme or by both lexemes. Lexeme dominance can be assessed by asking participants to use Likert scales to evaluate the degree to which each lexeme contributes to the whole meaning of the compounds. The influence of lexeme dominance in compound processing has been examined in several languages as reported in Table 7.

Apart from the above characteristics, there are other variables that have been found to influence the processing of compounds. Conceptual familiarity, imageability and sensory experience ratings (SER) are all semantic variables which presumably index different semantic dimensions (Kuperman, 2013). Conceptual familiarity measures the degree of physical or mental contact with an object (e.g., a rake is very familiar for a gardener). This variable has often been used to investigate the determinants of object naming speed (Perret & Bonin, 2019). However, this variable has not been taken into account very often when investigating visual word recognition (but see Chedid et al., 2019, for a recent study showing a reliable influence of this factor on lexical decision times). We are not aware of any study on compounds which has examined the influence of this variable. Imageability is a variable thought to index the richness of words (Yap & Pexman, 2016) and it corresponds to the ease with which a mental image can be formed in response to words (Bonin et al., 2011). Imageability is a reliable predictor in a large number of word recognition tasks, such as lexical decision, word reading or conditional word naming (e.g., Cortese et al., 2018; Strain & Herdman, 1999), but it has not been investigated to any great extent in compound processing (see Table 7 for a study examining the influence of imageability in lexical decision times in response to compounds).

SERs are thought to reflect the extent to which a word evokes a sensory and/or perceptual experience (Juhasz et al., 2011; Juhasz & Yap, 2013). To provide SER for words, participants use Likert scales to assess the extent to which the words evoke an actual sensation (e.g., sight, touch, smell). Bonin et al. (2015) found that SER reliably predicted response times for lexical decision on individual words, but not in word naming or progressive demasking. SER effects have also been reported for compound processing (Table 7).

Given that no norms on compounds are available in French, the aim of our study was to provide norms for a set of 510 compounds, and these should prove to be very helpful to researchers who wish to investigate the processing of such words.

Collection of the norms

Method

Participants

A total of 216 students enrolled in bachelor’s to master’s degree programs at the University of Bourgogne Franche-Comté (mean age: 20.41 years; 44 male) completed the different questionnaires. All were native French speakers. Thirty-one participants completed each questionnaire, except in the case of the lexeme meaning dominance questionnaire, for which there were only thirty participants. This study involving human participants was reviewed and approved by the Statutory Ethics Committee of the University Clermont Auvergne.

Material

The number of French words that have been rated for certain psycholinguistic norms is much smaller than is the case for other languages such as English, even though there are now a growing number of studies that provide psycholinguistic norms in French (see http://www.lexique.org/?page_id=378). We started by checking published French normative studies for their inclusion of compounds. It turned out that compounds were rarely included in the normed items. For instance, there are 45 compounds corresponding to object names in the normative study by Alario and Ferrand (1999) for 400 line drawings taken from Cycowicz et al. (1997), including the 260 Snodgrass and Vanderwart (1980) line drawings. In the Bonin, Peereman, et al. (2003b) normative study of 299 line drawings, only five compounds corresponding to object names can be found. In the Ferrand et al. (2008) study, monosyllabic words (N = 1,493) were rated on AoA and subjective frequency, and thus, no compounds were included. Finally, in the MEGALEX study (Ferrand et al., 2018), no compounds were included.

Juhasz et al. (2015) selected their stimuli from a larger set of English words (Juhasz & Berkowitz, 2011) and had a list of 629 compound words. Given that compound words are different in English and French, it was not possible to translate the list of compounds provided by Juhasz et al. (2015) in a way similar to how pictures normed in one language can be used directly to collect norms in another. We therefore had to create our own list of compounds. Our objective was to gather a list of compounds that university students (who are often used as participants in psycholinguistic studies) might know, and which could therefore be used in word reading or language production experiments involving undergraduate students. Also, in the same way as Juhasz et al. (2015), we were careful to select compound words that would vary in familiarity. Because, to our knowledge, there is no specific database of compounds in French, we gathered an extensive list of French compounds (both with and without prepositions) by searching through several online dictionaries and the Lexique.org database (New et al., 2004). More precisely, about 400 words from the Lexique database were selected, taking into account that each compound had to be familiar to undergraduates in order to be included in the list. In the end, 397 compounds were selected. Using the same criterion, the remaining compounds (N = 113) were chosen by consulting French dictionaries. In total, we selected 510 French compound words of different lengths. Length is defined as the number of characters (spaces, hyphens and apostrophes included).
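The length measure defined above can be illustrated with a minimal sketch. It assumes that compounds are stored as Unicode strings and that an accented letter should count as a single character, which is why the string is normalized to NFC first. The function name compound_length is hypothetical and is not part of the study's materials.

```python
import unicodedata


def compound_length(compound: str) -> int:
    """Number of characters, counting spaces, hyphens and apostrophes."""
    # Assumption: NFC normalization, so that "à" counts as one character
    # even if it was typed as "a" followed by a combining accent.
    return len(unicodedata.normalize("NFC", compound.strip()))


examples = ["chauve-souris", "sac à main", "chef de gare"]
lengths = {word: compound_length(word) for word in examples}
print(lengths)  # {'chauve-souris': 13, 'sac à main': 10, 'chef de gare': 12}
```

With this definition, a hyphenated compound and a noun-preposition-noun compound are measured on the same scale, since separators are counted like letters.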

Procedure

Seven different types of norms were collected using the questionnaires: lexeme meaning dominance, semantic transparency, sensory experience rating (SER), conceptual familiarity, imageability, AoA and subjective frequency. All the questionnaires were created using LimeSurvey (www.limesurvey.org). The participants were tested collectively in small groups (no more than eight individuals) in a large, quiet room. Each participant had a personal computer running the LimeSurvey questionnaire. Participants were randomly assigned to only one of the seven different rating tasks. The participants first gave their consent. Before each type of rating, full instructions with examples were given. The 510 compounds were randomly presented across participants. Small breaks were allowed during the rating task.

To collect the ratings for lexeme meaning dominance, we closely followed the instructions provided by Juhasz et al. (2015). The participants were asked to assess the extent to which the meaning of the whole compound word was related more closely to the first lexeme or, in contrast, more closely to the second lexeme. A 10-point Likert scale was used, with 0 corresponding to the response “the meaning of the entire compound word is strictly related to the first lexeme” and 10 to “the meaning is strictly related to the second lexeme”.

Semantic transparency, imageability, conceptual familiarity and sensory experience ratings were collected on the basis of the instructions used by Juhasz et al. (2015), except that we used 5-point Likert scales for all characteristics. As far as semantic transparency is concerned, the participants had to rate how related the two lexemes were to the overall meaning of the whole compound, with 1 corresponding to the response “the two words are not related at all to the meaning of the entire compound word”, and 5 to “the words are both completely related to the meaning of the entire compound word”. To collect SER, the participants were told to indicate the degree of sensory experience that each compound word evoked for them, with 1 = “the word evokes no sensory experience” and 5 = “the word evokes a strong sensory experience”. Imageability was rated by asking the participants to indicate how easily the concept denoted by the compound word made it possible to generate a mental image, from 1 = “the concept hardly evokes a mental image” to 5 = “the concept easily evokes a mental image”. For the conceptual familiarity task, the participants were instructed to indicate how familiar the concept denoted by each compound word was for them (1 = “the concept is not familiar at all”; 5 = “the concept is very familiar”). We followed the procedure described in Alario and Ferrand (1999) for collecting AoA norms. Five-point Likert scales with 3-year age bands were used, with “1” corresponding to “word acquired at 1–3 years old”, and “5” corresponding to “word acquired at 13 years old or later”. Finally, to evaluate the subjective frequency of compounds, the participants were told to indicate the degree to which they thought they had read, heard or used them (1 = “I have never heard, read or used the word”; 5 = “I have very frequently heard, read or used the word”).
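As an illustration of the AoA scale, the sketch below maps an estimated age of acquisition onto the five response categories. Only the first band (1–3 years) and the last one (13 years or later) are stated in the text; the three intermediate bands (4–6, 7–9 and 10–12 years) are an assumption derived from the 3-year band width, and the function name aoa_band is hypothetical.

```python
def aoa_band(age_in_years: int) -> int:
    """Map an age in whole years (>= 1) onto the 5-point AoA scale."""
    if age_in_years < 1:
        raise ValueError("the scale starts at 1 year old")
    if age_in_years >= 13:
        return 5  # "word acquired at 13 years old or later"
    # Assumed 3-year bands: 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    return (age_in_years - 1) // 3 + 1


print([aoa_band(age) for age in (2, 5, 8, 11, 15)])  # [1, 2, 3, 4, 5]
```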

Results

The Supplemental Material contains the database of 506 French compound words used in the present study (in csv and xls format; see below for the elimination of four compounds).

The compound words are listed alphabetically with their English translations. The database provides the mean ratings and standard deviations for the seven collected variables for each compound: lexeme meaning dominance, semantic transparency, sensory experience, conceptual familiarity, imageability, AoA and subjective frequency.
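The by-item summaries in the database can be sketched as follows. This is a minimal illustration with hypothetical ratings; the data layout, the function name summarize and the use of the sample standard deviation (n - 1 in the denominator) are assumptions, since the text does not specify how the standard deviations were computed.

```python
from statistics import mean, stdev

# Hypothetical raw ratings: compound -> one rating per participant (1-5 scale)
raw_ratings = {
    "tire-bouchon": [4, 4, 3, 5, 4],
    "chauve-souris": [5, 4, 5, 4, 5],
}


def summarize(ratings_by_item):
    """Return {compound: (mean, sd)}, with compounds in alphabetical order."""
    return {
        item: (round(mean(values), 2), round(stdev(values), 2))
        for item, values in sorted(ratings_by_item.items())
    }


norms = summarize(raw_ratings)
for compound, (m, sd) in norms.items():
    print(f"{compound}: M = {m}, SD = {sd}")
```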

Data analyses

Several analyses were performed on the data and are reported in the following order. First, we describe the screening procedure that was used to identify potential outliers among the participants and the items. Second, we describe the reliabilities that were computed for the different collected norms. Third, descriptive statistics are reported together with the distributions of the norms. Fourth, we present the bivariate correlations and comment on a factor analysis that was conducted to analyze and summarize the correlational structure of the norms. Finally, we report multiple regression analyses that were performed in order to study the influence of the characteristics of the lexemes on compound ratings.

Screening of the data

Three participants were set apart: one in the imageability rating task, one in the conceptual familiarity rating task and, finally, one in the semantic transparency rating task. These participants were eliminated because of the very low means of the correlations that were found between the estimations provided by them and the ratings obtained from the other participants in the respective rating tasks. Each compound word was evaluated by at least 28 participants. Four words were discarded (“appel d’offre” [invitation to tender], “appui-tête” [headrest], “belle-mère” [mother-in-law], “gratte-ciel” [skyscraper]) because they were inadvertently omitted or incorrectly spelled in certain rating tasks.
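The participant screening described above can be sketched as follows. For each participant, the code computes the mean of the Pearson correlations between his or her ratings and those of every other participant in the same task. The data are hypothetical, complete data (no missing ratings) are assumed, and no cut-off is applied because the text does not report one; the function names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_inter_rater_correlations(ratings):
    """ratings: {participant: [rating per item]}, same item order for all.

    Returns {participant: mean correlation with every other participant}.
    """
    result = {}
    for p, own in ratings.items():
        others = [pearson(own, r) for q, r in ratings.items() if q != p]
        result[p] = sum(others) / len(others)
    return result


# Hypothetical ratings of six items by four participants; "p4" responds
# in a way that runs against the other three.
ratings = {
    "p1": [1, 2, 3, 4, 5, 5],
    "p2": [1, 3, 3, 4, 4, 5],
    "p3": [2, 2, 3, 5, 5, 5],
    "p4": [5, 4, 4, 2, 1, 1],
}
mean_r = mean_inter_rater_correlations(ratings)
candidate = min(mean_r, key=mean_r.get)  # participant with the lowest mean r
print(candidate, round(mean_r[candidate], 2))
```

A participant whose mean correlation is far below that of the others is a candidate for exclusion; the decision rule itself remains a matter of judgment in this sketch.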

Reliability

The correlations between the by-item means obtained from the even and odd participants and the intraclass correlation coefficients (random effects of both participants and items—ICC(2, k) in Shrout and Fleiss’s [1979] terminology) are reported in Table 1. With no values below .80, these coefficients suggest a high level of reliability for all the collected variables.

Table 1 Correlations between the by-item means obtained from the even and odd participants and the intraclass correlation coefficients (random effects of both participants and items—ICC(2, k) in Shrout and Fleiss’s (1979) terminology)
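The two reliability indices can be sketched as follows. The code assumes a complete items-by-participants matrix with no missing ratings, which is a simplification relative to the actual data. ICC(2, k) is computed from the two-way ANOVA mean squares as (MS_items - MS_error) / (MS_items + (MS_participants - MS_error) / n), where n is the number of items. The small matrix is illustrative and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_correlation(matrix):
    """matrix[i][j] = rating of item i by participant j (no missing data)."""
    odd = [sum(row[0::2]) / len(row[0::2]) for row in matrix]   # 1st, 3rd, ...
    even = [sum(row[1::2]) / len(row[1::2]) for row in matrix]  # 2nd, 4th, ...
    return pearson(odd, even)


def icc_2k(matrix):
    """ICC(2, k): two-way random effects, reliability of the mean rating."""
    n, k = len(matrix), len(matrix[0])
    grand = sum(map(sum, matrix)) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ms_rows = ss_rows / (n - 1)  # between items
    ms_cols = ss_cols / (k - 1)  # between participants
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


# Illustrative ratings: 6 items (rows) rated by 4 participants (columns)
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(split_half_correlation(data), 2), round(icc_2k(data), 2))
```

In this toy matrix the participants differ markedly in their overall level of rating, which lowers ICC(2, k) relative to the split-half correlation, because ICC(2, k) treats differences between participants as a source of disagreement.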

Descriptive statistics

Table 2 reports descriptive statistics for the different psycholinguistic variables and Fig. 1 shows the distributions of the different norms. As is generally observed for single words (e.g., Baayen, 2001; Muller, 1977), the objective frequency measures were strongly positively skewed. In line with Juhasz et al.’s (2015) study, most of the words had low lexical frequencies: among the 394 words for which objective frequencies were available, 381 (96.7%) had objective frequencies below 10 per million and only one (0.3%) had a frequency over 100 per million. A positive skew of subjective frequency can also be observed, but it is much smaller (Fig. 1).

Table 2 Descriptive statistics for the subjective norms and other objective variables

Fig. 1 Distributions of compounds as a function of the different variables (vertical lines = means of the variables)

With a relatively large negative skew (Fig. 1) and with both mean and median above 4, the conceptual familiarity ratings turned out to be relatively high. They are indeed higher than those reported for French, Spanish and American English (the means were respectively 3.29, 3.12 and 3.06 on a 5-point scale) by Alario and Ferrand (1999, see their Table 2, p. 534) for the Snodgrass and Vanderwart (1980) set of pictures of familiar objects. This difference suggests that the ratings given for the familiarity of the concepts are dependent on the format of presentation (words versus pictures).

Lexeme meaning dominance and number of letters had nearly symmetrical distributions (Fig. 1). The other norms were not strongly skewed and their main characteristic resides in a negative excess kurtosis, with thinned tails and relatively flat distributions between the tails. In contrast to conceptual familiarity, the characteristics of the AoA and imageability variables were consistent with what has been found for single words in previous databases (Alario & Ferrand, 1999; Bonin, Méot, et al., 2003a; Bonin, Peereman, et al., 2003b). Indeed, compound words were estimated to be learned later in life, and to be less imageable, than single words. The differences were tenuous when compared to subjective frequency as reported by Bonin, Méot, et al. (2003a).
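The shape indices discussed above (skewness and excess kurtosis) can be sketched as follows. The code uses the simple moment-based definitions, with skewness equal to m3 / m2^1.5 and excess kurtosis equal to m4 / m2^2 - 3, where m2, m3 and m4 are the central moments. Whether the reported values used these definitions or bias-corrected variants is not stated in the text, so this choice is an assumption, as are the toy data and the function name.

```python
def shape_indices(values):
    """Return (skewness, excess kurtosis) based on population moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical mean ratings on a 1-5 scale
flat = [1, 2, 3, 4, 5]         # symmetrical and flat: negative excess kurtosis
high_heavy = [5, 5, 5, 4, 1]   # most values high: negative skew
low_heavy = [1, 1, 1, 2, 10]   # most values low: positive skew

for name, data in (("flat", flat), ("high", high_heavy), ("low", low_heavy)):
    skew, kurt = shape_indices(data)
    print(name, round(skew, 2), round(kurt, 2))
```

A distribution such as that of the conceptual familiarity ratings, with most values near the top of the scale, corresponds to the second pattern, whereas the objective frequency distribution corresponds to the third.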

Bivariate correlations and multiple regressions with behavioral variables as dependent variables

Pearson correlations (see Table 3) were computed between the mean ratings for each item on all of the rating measures as well as with objective measures.

Table 3 Bivariate correlations between the measured variables

Table 4 shows the loadings obtained from a varimax rotation computed on the three components associated with eigenvalues greater than 1 in the corresponding principal component analysis (PCA; it is worth noting that the pattern of findings was nearly the same without the inclusion of objective frequency). Subjective frequency, conceptual familiarity, imageability and SER were essentially expressed on the first component and they were strongly positively correlated, i.e., compounds that were rated highly on one variable also tended to be rated highly on the other variables. There were also strong relationships between this set of variables and AoA, with later-acquired compounds being judged as less frequent, less conceptually familiar and less imageable, and as being associated with fewer sensory experiences, than early-acquired compounds. With noticeable correlations with conceptual familiarity, subjective frequency and AoA, objective frequency and semantic transparency were also partially expressed on this first component, with more frequent and transparent words rated as being somewhat more familiar, subjectively frequent and earlier acquired (and also more imageable in the case of semantic transparency only) than less frequent and more opaque words. Length was weakly correlated with all the other variables, and this variable was expressed on the second component, in part linked with objective frequency and semantic transparency by means of weak negative correlations, i.e., longer words tended to be associated with lower frequency values and higher transparency ratings. Finally, the third component essentially expressed lexeme meaning dominance, which was nearly uncorrelated with all the other variables. The only significant correlation was with imageability, but with an opposite sign to that found by Juhasz et al. (2015).

Table 4 Loadings obtained in the varimax rotation
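The extraction and rotation steps described above can be sketched as follows. This is a minimal illustration on simulated data, assuming a PCA on the correlation matrix, retention of the components with eigenvalues greater than 1, and the standard SVD-based varimax iteration. The function names and the simulated variables are hypothetical, and the software actually used for the analyses is not specified here.

```python
import numpy as np

def pca_loadings(X):
    """Loadings of the components whose eigenvalue exceeds 1 (correlation-matrix PCA)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # eigenvalue-greater-than-1 criterion
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                        # product of orthogonal matrices
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Simulated data: six variables driven by two latent dimensions (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
noise = rng.normal(scale=0.5, size=(300, 6))
X = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

loadings = pca_loadings(X)
rotated = varimax(loadings)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each variable (the sum of its squared loadings) is unchanged; only the distribution of the loadings across components is simplified.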

Taken overall, the pattern of correlations was very similar to that obtained by Juhasz et al. (2015). The only noticeable difference concerns the correlation between semantic transparency and SER which, despite being of the same sign, was clearly lower in the current study. It is important to note that the correlations between objective lexical frequency and the other variables reported by Juhasz et al. (2015) were always lower than the correlations reported here for French compounds. However, it seems that the authors did not log-transform objective frequency before performing the computations, which is why we also report the correlations with raw objective frequency in parentheses in Table 3. Indeed, these latter correlations are similar to those obtained by Juhasz et al. (2015).
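The following toy example illustrates why correlations with raw frequency can be lower than correlations with log-transformed frequency when frequency counts are strongly right-skewed. The frequency values and ratings are invented for the illustration and the helper `pearson_r` is hypothetical; the example does not reproduce any value from Table 3.

```python
from math import log10, sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Invented values: frequency grows by a factor of ten, ratings by one scale point.
raw_freq = [1, 10, 100, 1000, 10000]
rating = [1, 2, 3, 4, 5]

r_raw = pearson_r(raw_freq, rating)
r_log = pearson_r([log10(f) for f in raw_freq], rating)
print(f"raw frequency: r = {r_raw:.2f}; log frequency: r = {r_log:.2f}")
```

With raw counts, the single very frequent item dominates the computation, whereas the logarithm spaces the items evenly and recovers the linear relationship with the ratings.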

In order to examine whether the relationships described above persisted when other dimensions were controlled for, we included each psycholinguistic variable in turn as the dependent variable in a multiple regression analysis and used the other variables as independent variables. (It is worth mentioning that insofar as a PCA, with or without varimax rotation, depends solely on bivariate correlations, it was not suitable for achieving this aim. The same objection could have been made if a clustering method had been used.) Given the high correlation between subjective frequency and conceptual familiarity, only subjective frequency was retained in the analyses. To simplify the comparisons, the variables were all transformed into z-scores. Nonlinearities were introduced by including restricted cubic splines with a maximum of six knots for some independent variables (see Footnote 5). In order to reduce potential overfitting (see Footnote 6), we used a bootstrap approach suggested by Baayen (2008, p. 212).

In a first step, 1000 bootstrap samples were selected. In each sample, each subjective norm was taken in turn as a dependent variable (DV) whereas the other norms were included as independent variables (IVs). For each DV, the model with no nonlinearities was enriched by including, in turn, splines with three to six knots for each IV. The percentage of results significant at .05 over all bootstrap samples was then computed for each IV both overall and for its nonlinear part. Nonlinearities were retained for the IV whenever (1) these percentages were the highest and (2) both percentages were higher than 95% for at least one number of knots. Given that, beyond the first number of knots that satisfied these criteria, subsequent numbers of knots led to very similar percentages of significant results, we then computed, over all bootstrap samples, the mean R² using three to six knots for the retained IV. The retained number of knots was the one after which adding more knots did not add more than 1% of additional explained variance (as a mean over all bootstrap samples). In a second step that was limited to the DVs for which one IV including nonlinearities was retained, the same procedure was repeated in order to determine whether such terms should be included for any of the other IVs (the baseline model included nonlinear terms for the IV retained in the first step and only linear terms for the other IVs). However, no IV satisfied the criteria defined above. As a result, no more nonlinearities were included.
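The rule used to fix the number of knots can be sketched as follows. This is only one possible reading of the criterion, namely that the retained number of knots is the smallest one beyond which no larger number of knots adds more than 1% of explained variance. The function name `select_knots` and the mean R² values are hypothetical and are not results from the study.

```python
def select_knots(mean_r2_by_knots, threshold=0.01):
    """Smallest number of knots after which more knots add at most `threshold` to mean R².

    `mean_r2_by_knots` maps a number of knots to the mean R² over bootstrap samples.
    """
    knots = sorted(mean_r2_by_knots)
    for k in knots:
        gains = [mean_r2_by_knots[m] - mean_r2_by_knots[k] for m in knots if m > k]
        if all(gain <= threshold for gain in gains):
            return k  # the largest candidate is always returned if nothing smaller qualifies

# Hypothetical mean R² values for three to six knots (illustrative only).
mean_r2 = {3: 0.62, 4: 0.66, 5: 0.665, 6: 0.668}
print(select_knots(mean_r2))
```

In this invented example, moving from three to four knots adds 4% of explained variance, whereas five or six knots add less than 1% beyond four, so four knots would be retained.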

The results of these analyses are reported from two perspectives: (1) Table 5 shows the squared semi-partial correlations, i.e., estimates of the additional percentage of explained variance obtained when an independent variable is added to a model that already includes the other independent variables. (2) Significant partial effects of the independent variables, that is to say the effects of the independent variables measured at constant levels of the other independent variables, are depicted in Fig. 2. The aim of this figure is to allow the relationships that exist between pairs of subjective norms when the other dimensions are controlled for to be compared with the associations appearing in the form of bivariate correlations, and also to illustrate the patterns of nonlinear relationships revealed by the inclusion of nonlinear terms.
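A squared semi-partial correlation can be obtained as the difference between the R² of the full model and the R² of the model from which the variable of interest has been removed. The sketch below illustrates this computation with ordinary least squares on simulated, z-scored data; the function names and the simulated variables are hypothetical, and the example covers only linear terms (not the spline terms used for some variables).

```python
import numpy as np

def r_squared(X, y):
    """R² of an ordinary least-squares regression of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def squared_semipartials(X, y):
    """Gain in R² when each IV is added to a model already containing the other IVs."""
    full = r_squared(X, y)
    return [full - r_squared(np.delete(X, j, axis=1), y) for j in range(X.shape[1])]

# Simulated data: y depends on x1 only, but x2 is correlated with x1 (illustrative only).
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)
y = 0.6 * x1 + rng.normal(scale=0.8, size=n)

X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-scores, as in the analyses described above
y = (y - y.mean()) / y.std()

sr2 = squared_semipartials(X, y)
print([round(v, 3) for v in sr2])
```

In such a case, x2 can show a sizeable bivariate correlation with y while adding almost nothing once x1 is in the model, which is the kind of discrepancy between bivariate correlations and partial effects discussed in this section.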

Table 5 Multiple regressions including one norm as DV and the other norms as IVs

For each IV, the added percentage of explained variance when it was included while other IVs were already in the model is shown in the first line. If nonlinear terms were included, a second line gives the same information for nonlinear terms alone together with, in brackets, the number of knots used. The analyses were run using the reduced set of (394) compounds for which subtitle frequencies were available in Lexique (New et al., 2004). ***p < .001; **p < .01; *p < .05

Table 6 Regressions including left (L) and right (R) word ratings

Fig. 2 Partial effects in the regression analyses including one variable as the DV and the other variables as independent variables (AoA = age of acquisition; Trans = semantic transparency; Imag = imageability; S-Freq = subjective frequency; O-Freq = objective frequency; LMD = lexeme meaning dominance; SER = sensory experience ratings)

With approximately 70% of explained variance, imageability was predicted to a large extent by the other dimensions, in particular SER, with compounds rated as easily experienced by the senses being more imageable, and AoA, with later-acquired words being less imageable. The increase in imageability ratings with higher SER values was, however, somewhat attenuated above the SER mean.

More transparent and longer compounds were also more imageable and the reverse was true for objectively high-frequency compounds. There were some discrepancies with the bivariate correlations, particularly for subjective frequency. Indeed, in spite of a large positive correlation, there was no reliable effect of this variable when the other dimensions were controlled for. In addition, the effect of objective lexical frequency was reversed, with more frequent words being less imageable when other variables were entered in the equation. Given that these discrepancies were still observed when one of the two frequency measures was excluded from the equation, they cannot be accounted for by the relationship between the two frequency measures. They might have to do with the relationship between subjective frequency and AoA. Indeed, excluding AoA from the regression equation led to a large positive effect of subjective frequency. The results of the multiple regression analysis for SER mimic in part those that were found for imageability (even though SER is less well explained by the other variables than imageability): The more imageable the compounds were rated as being, the more they were rated as easily experienced by the senses. Late-acquired and more objectively frequent compounds were rated as arousing less sensory experience.

Contrary to what occurred for imageability, the positive effect of subjective frequency found in the bivariate correlation analysis was still observed when the other independent variables were controlled for. Finally, SER values were lower for more transparent words, in contrast to what was observed in the bivariate correlation analysis.

The variance in the AoA ratings was explained by the other variables at a level similar to that of imageability. The effects were in the same directions as those obtained in the bivariate correlations: Higher imageability and SER ratings, as well as higher (objective and subjective) frequencies, were associated with earlier-acquired compounds. Imageability was the most explanatory variable, accounting for about 12% of explained variance. There were, however, two exceptions to the pattern found in the bivariate correlations. First of all, semantic transparency failed to reach significance whereas its bivariate correlation was relatively large, and second, a significant effect of length was found, with longer compounds being acquired later.

As far as subjective frequency is concerned, a large proportion of the variance was accounted for by the other variables, with the effects having directions similar to those shown in the bivariate correlation analysis. Objective frequency was the most explanatory variable, with more objectively frequent compounds being rated as more frequently encountered. AoA and semantic transparency also accounted for a large portion of the variance: More transparent compounds were rated as being encountered more often, and the reverse was true for late-acquired words. With a weaker effect, compounds with high sensory experience ratings were judged to be encountered more often. This was also the case for imageability, but its effect was not monotonic, with subjective frequency ratings increasing roughly below −1 SD and above +1 SD of the imageability ratings and decreasing for intermediate ratings.

Compared to the other psycholinguistic variables, differences in semantic transparency ratings were far less well explained by the other dimensions. These ratings increased with higher subjective frequency and imageability ratings and, to a lesser extent, with longer compounds (and the opposite was true for objective lexical frequency and SER). Finally, the link with AoA was more complex, with transparency values increasing up to the mean AoA and decreasing beyond it.

Finally, lexeme meaning dominance was very poorly explained by the other variables, with no predictors reaching significance. Conversely, lexeme meaning dominance was never significant when it was introduced as an IV in the analyses.

Influence of lexeme characteristics on compound ratings

In order to compare our results with those obtained by Juhasz et al. (2015) concerning the influence of the characteristics of the constituents of the compounds, each compound rating was used as a dependent variable in a linear regression which included the length of the compound, its subtitle frequency, the subtitle frequencies of its constituents (subtitle frequencies were all taken from Lexique: New et al., 2004) and the ratings of the left and right constituents, hereafter L-words and R-words, respectively. Because French has fewer databases of psycholinguistic norms than are available for English, and because all the French databases have—numerically speaking—substantially fewer rated words than in English, the ratings for L-words and R-words were available for only a subset of the compound words. As a result, we decided to combine different French databases in order to maximize the numbers of ratings. Before doing this, we standardized the ratings within each database and checked whether the results obtained using this transformation were approximately the same as those resulting from the use of only the database containing the greatest number of words.
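The pooling of ratings across databases described above can be sketched as follows. Ratings are standardized within each database before the databases are merged, so that differences in rating scales or in rater samples do not distort the pooled values. The database contents and function names are hypothetical. The use of the sample standard deviation and the rule giving priority to the first (largest) database for words rated in several databases are assumptions made for the illustration, not details reported in the study.

```python
from statistics import mean, stdev

def zscore_within(database):
    """Standardize the ratings of one database (word -> mean rating)."""
    values = list(database.values())
    m, s = mean(values), stdev(values)   # sample SD: an assumption of this sketch
    return {word: (value - m) / s for word, value in database.items()}

def pool(databases):
    """Merge z-scored databases; earlier databases take priority for duplicate words."""
    pooled = {}
    for database in databases:
        for word, z in zscore_within(database).items():
            pooled.setdefault(word, z)
    return pooled

# Hypothetical constituent ratings on two different scales (illustrative only).
db_large = {"chou": 5.8, "fleur": 6.5, "porte": 6.1}      # e.g., a 1-7 scale
db_small = {"porte": 4.2, "monnaie": 3.1, "clef": 4.6}    # e.g., a 1-5 scale

pooled = pool([db_large, db_small])
for word in sorted(pooled):
    print(f"{word}: {pooled[word]:+.2f}")
```

A simple robustness check, in the spirit of the one reported above, is to rerun the analyses using only the largest database and to verify that the results remain approximately the same.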

The AoA ratings for the constituent words were obtained from Ferrand et al.’s (2008) study (N = 114), from Alario and Ferrand’s (1999) study (N = 12) and from Bonin, Peereman, et al.’s (2003) study (N = 8). Imageability ratings were taken from Bonin et al.’s (2011) study (N = 114) and from Bonin, Méot, et al.’s (2003) study (N = 23). SER ratings were obtained from Bonin et al. (2015). We also took into account the subjective frequency ratings of the constituents. These were obtained from Ferrand et al.’s (2008) study (N = 114) and from Bonin, Méot, et al.’s (2003) study (N = 23).

The analyses were run in two steps. First of all, the analyses were performed with a block containing the independent variables common to all ratings, with length and objective word frequency included. Second, they were run with a block including the ratings of the constituents.

The percentages of explained variance were roughly comparable to those found by Juhasz et al. (2015) for all dependent variables except subjective frequency, for which the percentage was a little lower in the current study when the ratings of the constituents were not included as they were in Juhasz et al. (2015). Even though they were all significant at .001, the increases in the percentages of explained variance when the lexeme ratings were included in addition to other characteristics were lower in the present study than in the Juhasz et al. (2015) study (AoA: .089 versus .134; imageability: .081 versus .239; SER: .088 versus .186).

The compound frequency effects were also similar to those observed by Juhasz et al. (2015), albeit with a nonsignificant effect in the case of SER, for which the estimated beta was nearly half that reported by the latter authors. For both subjective frequency and AoA, these were the largest effects, with more objectively frequent compounds being rated as more frequently encountered and acquired earlier in life.

The negative effects of the frequency of L-words on imageability and SER were also replicated, but they were stronger in the present study than in Juhasz et al.’s (2015) study. For imageability, the absolute size of this effect was as large as that of compound frequency. By contrast, the effect of the frequency of L-words on AoA was positive but less strong than in Juhasz et al. (2015). Unlike in this latter study, the frequency of L-words also had significant positive effects on both semantic transparency and lexeme meaning dominance. However, these effects turned out to be relatively weak. Finally, no reliable effects of the frequency of R-words were found, except for a small negative effect on lexeme meaning dominance.

As far as the effects of the ratings of the L- and R-words are concerned, our findings were less consistent with those obtained by Juhasz et al. (2015). In their study, both the ratings of L-words and R-words had significant effects on compound ratings, with the effects of the former being larger than those of the latter. In contrast, with the exception of imageability, the effects of the L-word ratings failed to reach significance in the present study. Moreover, with the exception of imageability, the effects of the R-word ratings were also somewhat larger in the present study. It is important to note that for both subjective frequency and AoA, this asymmetry between L-words and R-words was also observed in the bivariate correlations between the L-word and R-word ratings and the compound ratings (L-words: r = -.02 and r = .043; R-words: r = .36 and r = .31), whereas this was less the case for SER (L-words: r = .23 and R-words: r = .42) and not at all the case for imageability (L-words: r = .39 and R-words: r = .39).

Discussion

Our main motivation in the present research was to provide psycholinguistic norms for a set of French compound words. The compounds were normed on seven psycholinguistic variables: lexeme meaning dominance, semantic transparency, sensory experience, conceptual familiarity, imageability, age of acquisition (AoA) and subjective frequency.

Let us summarize the main findings. First of all, and crucially, the reliabilities were high for all the collected variables. Second, and interestingly, the pattern of bivariate correlations among the norms was very similar to that reported by Juhasz et al. (2015), which suggests that the underlying structure of the norms is similar in both English and French. The important aspects of note among the correlations between the variables include the following: (1) Compounds that were given high ratings on subjective frequency, familiarity, imageability or SER, or that were estimated to be acquired early in life, were given higher ratings on the other dimensions. The relations of these variables with objective frequency were in the same direction but weaker. (2) Semantic transparency was also judged to be higher for higher ratings on the variables listed above (e.g., subjective frequency, familiarity, imageability). The same was true for early-acquired words, but the relationships were more tenuous. Semantic transparency was also the variable which was the most strongly related to orthographic length, although the correlation was small (.134). (3) Lexeme meaning dominance ratings were only very weakly related to other psycholinguistic variables.

Taken overall, the relationships described above were also observed when other dimensions were controlled for using multiple regression analyses. First of all, except for AoA for which the opposite was found, imageability was positively related to all the other psycholinguistic norms, and more strongly to SER and AoA. In spite of the fact that SER was generally less explanatory than imageability, the relations were in the same direction as those observed with imageability. One exception was semantic transparency, for which the more the compounds were judged as being easily experienced by the senses, the less transparent they were rated to be. Second, compared to early-acquired compounds (e.g., petit déjeuner [breakfast], pique-nique [picnic]), late-acquired compounds (e.g., trop-perçu [overpayment], arc-boutant [flying buttress]) were rated as less imageable, less easily experienced by the senses and less frequent. Also, both early- and late-acquired compounds were judged to be less transparent than compounds having intermediate age of acquisition ratings. Third, objectively high-frequency compounds (e.g., aujourd’hui [today], grand-père [grandfather], rendez-vous [appointment]) were estimated to be more frequently encountered and acquired earlier than compounds that are objectively of low frequency (e.g., abaisse-langue [tongue depressor], trop-perçu [overpayment]). Objectively high-frequency compounds were also judged to be less imageable, less easily experienced by the senses and less semantically transparent than objectively low-frequency compounds. Finally, more semantically transparent compounds were rated as being more frequently encountered and more imageable, but less easily experienced by the senses, whereas longer compounds were judged to be more imageable and transparent, and as having been acquired later, than shorter compounds.

The multiple regression analyses that were performed to examine the influence of lexeme characteristics on compound ratings revealed that, except for imageability, there were no reliable effects of the L-word ratings, which contrasts with Juhasz et al.’s (2015) results, in which reliable and large effects of L-word ratings were found for AoA and SER. As far as R-word ratings are concerned, except for imageability, their impact was also larger in the present study than in Juhasz et al.’s (2015). However, including lexeme ratings added a relatively high proportion of explained variance for all norms. Taken as a whole, this pattern of findings accords with the idea that compound words are decomposed when they are rated.

Different characteristics of compounds have been found to modulate the speed and accuracy of the processing of compounds in several word reading or word production tasks, thus suggesting that the constituents of compounds are activated to some degree and that they influence the underlying processes. As reviewed in the Introduction, certain psycholinguistic characteristics of compounds have been especially well investigated (e.g., lexical frequency, semantic transparency), whereas other characteristics have given rise to a limited number of studies (AoA, conceptual familiarity). Thus, the latter variables should be examined in future studies in order to determine their true influence. Therefore, we hope that the availability of these norms for the French language will stimulate research on how compound words are processed in word reading and in both spoken and written word production. For instance, it would be interesting to design a megastudy including a large number of French compounds to identify the determinants of recognition performance (in lexical decision, in word reading or in perceptual identification). Indeed, the megastudy approach is relatively recent for investigating lexical processing and it is very useful in that it permits the investigation of a large number of variables (and sometimes novel variables) and their relationships (including complex relationships such as nonlinear ones) on a large number of items (Balota et al., 2012; Yap & Balota, 2015). In the future, having large databases on compound word processing speed and accuracy in French should also make it possible to investigate individual differences in compound processing (e.g., processing differences when compounds are read by high- versus low-skilled readers; see Yap et al., 2012, and Lim et al., 2020, for examples of this approach in visual word recognition).
As far as compounds are concerned, megastudies have been conducted in English (Kim et al., 2019) and in Chinese (Tse & Yap, 2018). These have helped identify the characteristics that play a significant role in compound word reading and those that appear to be of less importance. For instance, in Chinese, Tse and Yap (2018) found that orthographic variables accounted for the largest part of variance in lexical decision, followed by semantic variables, and finally by phonological variables. We plan to use our norms to design such a study in the future.

Compared to certain recent studies that have collected ratings for impressive numbers of compounds (e.g., Gagné et al., 2019, collected meaning predictability judgments for 8,304 compounds, and Kim et al., 2019, collected semantic transparency scores for 2,861 compound words), our database is somewhat smaller in terms of the number of compounds that are rated. However, our compounds have been normed on seven psycholinguistic variables, and not only on one or two variables (e.g., imageability, semantic transparency), a fact which will be very useful for many experimental designs. A potential limitation of the current work is that the norms were collected from French-speaking adults who live in France. We cannot exclude the possibility that the norms may not be entirely suitable for use with adults speaking French in Belgium, Switzerland, or parts of Africa. It would be interesting to collect norms on our compounds with such French speakers and to compare them with the norms reported here. It is worth mentioning in passing that norms for compound words are less common in languages other than English and, as has been the case for AoA or imageability norms, it is possible that norms for compounds will become available for other alphabetic languages as well as for a larger number of non-alphabetic languages. Finally, as has been the case for other pools of words (e.g., Ferrand et al., 2008), the current norms will be complemented in the future by other types of norms, such as association norms (see Schulte im Walde & Borgwaldt, 2015, for a study of this kind in German), valence and arousal (Kuperman, 2013) or “word prevalence” norms, which have recently been collected for single words (Brysbaert et al., 2019).

To conclude, we hope to have convinced readers that norms on compounds such as those provided here are indispensable for achieving a better understanding of how compounds are comprehended or verbally produced.