Introduction

Timed picture naming is an established method for studying lexical retrieval and word production in both healthy and impaired speakers. Several models of timed picture naming have been proposed. All agree that picture naming consists of three stages: (1) visual recognition and conceptual identification, where features of the depicted object or event are extracted and matched with semantic knowledge; (2) lexical selection, in which suitable lexical representations of words along with their syntactic properties (lemmas) are accessed and selected; and (3) articulation of the word (Bonin, Meot, Lagarrigue, & Roux, 2015; Glaser, 1992; Johnson, Paivio, & Clark, 1996; Levelt, 1999; Levelt, Roelofs, & Meyer, 1999; Perret & Bonin, 2018; Rapp & Goldrick, 2000). These stages are assumed to be universal across both languages and speakers.

Multiple studies in many languages confirm a number of key variables that predict timed picture naming. In a recent Bayesian meta-analysis, Perret and Bonin (2018) confirmed the effects of these key variables, which include rated age of acquisition (AoA), familiarity, imageability, and name agreement (NA), in monolingual speakers. These variables are assumed to exert their influence on at least one stage of timed picture naming (Alario et al., 2004; Humphreys, Riddoch, & Quinlan, 1988), but may also have multiple loci. For example, the effects of NA, imageability, and visual complexity (VC) are assumed to arise during the visual recognition of objects and actions (Alario et al., 2004; Humphreys et al., 1988), but NA is also assumed to have an impact on spoken word retrieval. By contrast, rated familiarity and imageability are assumed to influence semantic and conceptual identification only (Barry, Morrison, & Ellis, 1997; Ellis & Morrison, 1998; Weekes, Shu, Hao, Liu, & Tan, 2007), and word frequency is assumed to reflect processes at the lexical selection and encoding stage of speech production (Alario et al., 2004), as does AoA, although AoA may also have an impact on semantic and conceptual identification. Other studied variables have less robust effects on timed picture naming, including word length, morphological complexity, argument structure, and verb instrumentality (Barbieri, Basso, Frustaci, & Luzzatti, 2010; Crepaldi, Che, Su, & Luzzatti, 2012; Cuetos, Ellis, & Alvarez, 1999; Jonkers & Bastiaanse, 2007; Kambanaros, 2009; Parris & Weekes, 2001; Thompson, 2003). Most timed picture-naming studies use objects. However, studies of action naming reveal similar effects across languages in both typical and atypical monolingual speakers (Alyahya & Druks, 2016; Bird, Franklin, & Howard, 2001; Druks et al., 2006; Edmonds & Donovan, 2012; Khwaileh, Mustafawi, Herbert, & Howard, 2018; Morrison, Hirsh, & Duggan, 2003; Nilipour, Bakhtiar, Momenian, & Weekes, 2017; Szekely et al., 2005). These studies suggest that the item-level characteristics of actions and objects constrain typical and atypical picture naming across languages for monolingual speakers. Recently, however, the study of bilingual language processing has come into focus.

One question is whether the established effects of psycholinguistic variables on timed picture naming in monolingual speakers are observed for bilingual speakers (Edmonds & Donovan, 2014; Ramanujan & Weekes, 2019). A related question is whether the item-level characteristics taken from the ratings of monolingual speakers are valid for testing bilingual speakers. Bilingual speakers differ from one another in several linguistic and non-linguistic factors. Unlike monolingual speakers, they are highly diverse in their language proficiency, age of exposure to each language, amount of exposure on a daily basis, and relative linguistic distance between the languages spoken (Abutalebi et al., 2013; Chee, Soon, Lee, & Pallier, 2004; Degani, Prior, & Hajajra, 2018; Golestani et al., 2006; Kuzmina, Goral, Norvik, & Weekes, 2019; Lutz, Lee, & Weekes, 2018; Momenian, Nilipour, Samar, Cappa, & Golestani, 2018; Newman, Tremblay, Nichols, Neville, & Ullman, 2012; Ramanujan, 2019; Sörman, Hansson, & Ljungberg, 2019; Suh et al., 2007; Yan, Zhang, Xu, Chen, & Wang, 2016). Overall, it can be assumed that bilingual speakers are a more heterogeneous group than monolinguals; i.e. they show a wider range of participant-level variability in language experience and language processing. This offers a unique way to test models of timed picture naming, given that participant-level variability in monolingual speakers is typically highly controlled under the assumption that speakers do not vary in processing.

In this study, we test whether the effects of psycholinguistic variables that are assumed to impact picture naming in monolingual speakers differ in bilingual speakers. We first adapted Druks and Masterson’s object and action naming battery (OANB) (Druks & Masterson, 2000) for bilingual speakers. To make a direct comparison between bilingual and monolingual speakers, we then asked Cantonese-English (C-E) and Mandarin-Cantonese (M-C) bilingual speakers to name pictures in ‘monolingual mode’, a term borrowed from Grosjean (2001) that refers to sustained maintenance of the first language while the second language is inhibited, and compared the data to monolingual Chinese speakers as a baseline. We also measured the effects of individual differences, such as age, education level, and amount of exposure to spoken and written language in the first (L1) and second (L2) languages for bilinguals, on timed picture naming in both groups.

Two contrasting hypotheses will be tested. If the effects of item-level and participant-level variability are similar in bilingual and monolingual speakers, then the variables which have an effect on picture naming in monolinguals should impact naming of bilingual speakers. Under this hypothesis, both monolingual and bilingual speakers should respond similarly to the key psycholinguistic variables. The alternative hypothesis is that key factors will have different effects in bilingual and monolingual speakers. We predict that we will see more variability in the bilingual speakers’ responses based on the argument that bilingual speakers’ individual differences and diverse language learning experiences could modulate the effects of the item-level variables (Edmonds & Donovan, 2012). Based on this hypothesis, we expect to see differences between monolingual and bilingual speakers in the effects of psycholinguistic variables such as AoA and frequency which are modulated more by bilingual experiences, whereas the effects of variables that are related to the conceptual level of processing such as imageability might be more consistent across both populations of monolingual and bilingual speakers. This hypothesis is based on the weaker links hypothesis (WLH; Gollan, Montoya, Fennema-Notestine, & Morris, 2005; Gollan, Montoya, Cera, & Sandoval, 2008; Kroll & Gollan, 2013). The WLH predicts different effects for frequency due to weaker links between semantics and phonology in the bilingual lexicon in comparison with the monolingual lexicon.

Methods

Preparatory study

Forty early C-E and 24 early M-C bilingual speakers were presented with 162 pictures of objects and 100 pictures of actions from the OANB (Druks & Masterson, 2000). They were asked to write down the names of all pictures via an online self-paced Google docs form. If they did not know the name of the picture, they could select an option which read ‘not familiar/do not recognize the item’. Name agreement (NA) was derived for each picture by selecting the most dominant Mandarin or Cantonese name (the name with the highest token frequency count among competitors). The cut-off score was 40% for objects and 30% for actions. Items below the cut-off were removed. Linguistically unfamiliar and culturally inappropriate items, e.g. baseball bat, were also removed. For items that had the same modal name in Cantonese or Mandarin (e.g. 服務員 = waiter/waitress), one was excluded from the study. Hence, the number of items for rating was 144 objects and 86 actions in Cantonese, and 145 objects and 84 actions in Mandarin.
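The name agreement screening described above can be illustrated with a minimal sketch. The function names, the use of None for a 'not familiar' answer, and the decision to keep such answers in the denominator are assumptions made for illustration; only the 40% and 30% cut-offs come from the text.

```python
from collections import Counter

# Cut-offs taken from the text: 40% for objects, 30% for actions.
NA_CUTOFF = {"object": 0.40, "action": 0.30}


def name_agreement(responses):
    """Return (modal_name, proportion) for one picture.

    `responses` is a list of written names; None marks a
    'not familiar/do not recognize the item' answer.
    Assumption: unfamiliar answers stay in the denominator.
    """
    if not responses:
        raise ValueError("no responses for this picture")
    counts = Counter(r for r in responses if r is not None)
    if not counts:
        return None, 0.0
    modal_name, modal_count = counts.most_common(1)[0]
    return modal_name, modal_count / len(responses)


def keep_item(responses, picture_type):
    """True if the modal name reaches the cut-off for its picture type."""
    _, agreement = name_agreement(responses)
    return agreement >= NA_CUTOFF[picture_type]


# Hypothetical responses from ten raters for one object picture.
demo = ["cup"] * 6 + ["mug"] * 3 + [None]
print(name_agreement(demo))       # ('cup', 0.6)
print(keep_item(demo, "object"))  # True
```

Items for which keep_item returns False would correspond to those removed for falling below the cut-off.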

A total of 54 early C-E bilingual speakers and 56 early M-C bilingual speakers participated in the rating studies. Rating data were collected in two sessions, either individually or in small groups, with a 1-week interval between the sessions. In the first session, ratings were produced for NA and VC. In the second session, ratings were produced for AoA, imageability, and familiarity of the names with the highest agreement produced by the whole group in the previous week. The order in which the variables were rated and the order of items were both randomized across participants in each session. Recruiting age- and education-matched monolingual speakers of Mandarin and Cantonese in Hong Kong was not feasible because virtually everyone there speaks at least two languages. We therefore collected ratings from 37 monolingual Mandarin speakers for 145 objects and 84 actions in Mainland China. The monolingual and bilingual participants were all age- and education-matched.

We followed Druks and Masterson’s (2000) rating instructions for all variables across all groups. For NA, participants were asked to write down the name of the target line drawing in their native language. They were instructed to produce only one name per picture. For VC, participants were asked to rate each drawing in terms of the amount of visual detail and the number of lines it contained. A value of 1 indicated the lowest VC, and 7 indicated the highest VC. In addition to subjective VC, we established objective VC measures using computational algorithms (Donderi, 2006; Machado et al., 2015; Székely & Bates, 2000). Objective VC measures correlate with subjective VC and are not confounded with variables such as familiarity and frequency (Bates et al., 2003). We used the objective measure when modelling the data in this study.

For the AoA rating, the participants were asked to judge the age at which they thought they had acquired each word, using the scale of 1 = 0–2 years old; 2 = 3–4 years old; 3 = 5–6 years old; 4 = 7–8 years old; 5 = 9–10 years old; 6 = 11–12 years old; 7 = 13 years old or above. We also reviewed AoA of words from Mandarin and Cantonese parent-report norms such as the MacArthur-Bates Communicative Development Inventories (CDIs) (Frank, Braginsky, Yurovsky, & Marchman, 2017). However, many of the object and action names we used in this study were missing from the parent-report norms, which made it difficult to examine the correlation between rated AoA and parent-report AoA. For the imageability rating, participants were asked to rate how readily each word could arouse a mental image. A value of 1 indicated that the word evoked its mental image with the greatest difficulty, and 7 indicated that it evoked the image most readily. For the familiarity rating, the participants were asked to rate the extent to which they came into contact with or thought about the object or action represented by a word. A value of 1 indicated the target was very unfamiliar, and 7 indicated very familiar. Cantonese words were printed in traditional characters, whereas Mandarin words were printed in simplified characters, reflecting the writing reform of the PRC in the mid-20th century.

We calculated the reliability of ratings for familiarity, subjective VC, imageability, and AoA using the intraclass correlation (ICC). The ICC is an index for assessing the reliability of human ratings (Shrout & Fleiss, 1979). Koo and Li (2016) suggest the following ICC rubric for assessing the reliability of a measurement: below 0.50: poor reliability; between 0.50 and 0.75: moderate reliability; between 0.75 and 0.90: good reliability; and above 0.90: excellent reliability. We calculated the corrected correlation following the procedure suggested by Nicewander (2018) (see Tables 1, 2, and 3).
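As a small illustration, the Koo and Li (2016) rubric quoted above can be written as a lookup. The function name is hypothetical, and the treatment of values falling exactly on a boundary (0.50, 0.75, 0.90) is an assumption, since the rubric as quoted does not specify it.

```python
def icc_reliability_label(icc):
    """Map an ICC estimate to the verbal label in the Koo and Li (2016) rubric.

    Assumption: boundary values are assigned to the band that starts at
    0.50 or 0.75, and exactly 0.90 is still labelled 'good'.
    """
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical ICC values, one per rated variable.
for variable, icc in {"familiarity": 0.93, "AoA": 0.82, "VC": 0.61}.items():
    print(variable, icc_reliability_label(icc))
```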

Table 1 The corrected correlation matrix for the key variables in the C-E group
Table 2 The corrected correlation matrix for the key variables in the M-C group
Table 3 The corrected correlation matrix for the key variables in the monolingual Mandarin group

The values for word frequency in Cantonese were obtained from the Sketch Engine database, computed from a corpus named Cantonese Web Corpus (CantoneseWaC) of over a million tokens. The Sketch Engine database is made up of texts collected from the internet for several languages (Kilgarriff, Reddy, Pomikálek, & Avinesh, 2010). For the values in Mandarin text, the SUBTLEX-CH corpus (Cai & Brysbaert, 2010) was used to establish the value for each item. This corpus is a spoken one consisting of film subtitles.

The derived ratings are available online as psycholinguistic norms for both Cantonese and Mandarin objects and actions at the following link (DOI 10.17605/OSF.IO/8HVWR).

Picture-naming experiments

Participants

Sixty C-E speakers (28 male, average age = 22.4, ranging from 18 to 34 years), 50 M-C bilingual speakers (18 male, average age = 21.8, ranging from 18 to 33 years), and 30 monolingual Mandarin speakers (12 male, average age = 20.6, ranging from 19 to 25 years) participated in the timed picture-naming task (see Tables 4 and 5). None of them had been tested in the preparatory study. They all had normal or corrected-to-normal vision. All participants completed a language background questionnaire and a consent form prior to the experiment. The language background questionnaire included questions about the AoA of each language, reported proficiency for each language, amount of exposure to the written and spoken language, demographic information, and other questions about how they learned each of the languages (e.g. through TV, interaction, reading books, etc.). All of the bilinguals were early bilinguals based on the data from the questionnaire. They all reported that they could communicate fluently in both of their languages. All of them had received formal education in their first language for at least 15 years. The monolingual Mandarin speakers had been taught English in school, but only basic written English. They all reported that they could not communicate in English. They had exposure to written and spoken Mandarin on a daily basis 85% and 87% of the time, respectively. Native language (L1) was defined as the first acquired and most commonly spoken language on a daily basis, based on answers from the questionnaire. Native language is denoted first, and non-native language is denoted second hereafter. All the procedures in this study were approved by the Ethical Committee at the University of Hong Kong.

Table 4 C-E participants’ language profile in the timed picture-naming experiment
Table 5 M-C participants’ language profile in the timed picture-naming experiment

Procedure

Timed picture naming was recorded in sound-proof rooms. Two blocks of stimuli (objects and actions) were designed using DMDX (Forster & Forster, 2003). Items were randomized within each block. The block order was counterbalanced across participants. Responses were captured by a microphone, with calibration completed before each session to ensure background noise was not recorded as a response. The input threshold level for recording was adjusted to match the natural speaking volume of each participant. Participants were familiarized with the experiment format through practice trials before testing commenced. They were instructed to name pictures as quickly and accurately as possible in their first language. They were instructed not to cough, breathe loudly, move their heads, or produce starters or fillers, e.g. ‘um’, during or before each response. Each trial began with the presentation of a cross or fixation point at the centre of the monitor for 500 ms. After that, each picture was presented in the middle of the screen and remained until a response was detected, with 2000 ms as the time-out period. An error was recorded by DMDX if the participant was unable to respond within the time period. Participants’ errors, including production of wrong names, nontarget sounds, hesitations, and voice-key failures, were all recorded for off-line analysis.

Analysis plan of the latency data

We used linear mixed effects modeling (LMEM) with the lme4 package in R to analyze the data (see Baayen, Davidson, & Bates, 2008, for a good introduction to LMEM). The practice suggested by Barr, Levy, Scheepers, and Tily (2013) and Bates, Kliegl, Vasishth, and Baayen (2015) was followed to fit the models. Our dependent variable was naming latency, recorded as reaction time (RT) and transformed using the common (base-10) logarithm. The missing and incorrect responses (C-E group: 9.8%; M-C group: 11.2%; monolingual group: 14.2%) were removed from further analysis. Every response which was different from the dominant name was considered incorrect. Before modelling, we standardized the continuous independent variables. To check the collinearity among the variables, we used the variance inflation factor (VIF). Following the recommendation of Craney and Surles (2002), variables with a VIF value above 5 were removed from the analysis.
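The preprocessing steps described above (common log transformation, standardization, and the VIF screen) can be sketched as follows. This is an illustrative standard-library sketch rather than the R code used in the study; the function names are hypothetical, and the VIF is obtained here from the diagonal of the inverse predictor correlation matrix, which is one standard way of computing it.

```python
import math
from statistics import mean, stdev

VIF_CUTOFF = 5.0  # threshold named in the text (Craney & Surles, 2002)


def log_rt(rts_ms):
    """Common (base-10) log transform of reaction times in milliseconds."""
    return [math.log10(rt) for rt in rts_ms]


def zscore(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """Pearson correlation of two equally long sequences."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors):
    """Return {name: VIF} for a dict of {name: list of values}."""
    names = list(predictors)
    corr = [[correlation(predictors[a], predictors[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical participant-level data: age and education move together.
demo = {
    "age": [19, 21, 22, 24, 27, 30],
    "education_months": [170, 190, 205, 228, 262, 300],
    "l1_spoken_use": [80, 60, 75, 90, 65, 70],
}
flagged = [name for name, value in vif(demo).items() if value > VIF_CUTOFF]
print(flagged)
```

With two predictors the VIF reduces to 1 / (1 - r²), which gives a quick sanity check on the matrix-based computation.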

As in previous RT studies, we began with a maximal model (Barr et al., 2013), entering all the fixed variables of interest including objective VC, imageability, AoA, log frequency, NA, familiarity, and grammatical class (GC). We added variables such as participants’ age, education level (in months), reported L1 spoken and written use, and reported L2 spoken and written use as control fixed variables only for the bilingual analyses. The interaction between GC and other fixed effects (except for age and education) was also included in the model. For the random-effects structure of the model, random intercepts of items and subjects together with by-subject random slopes for all fixed effects of interest including VC, imageability, AoA, log frequency, NA, familiarity, and GC were added to the model.
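As a schematic illustration only, the maximal model described above can be written in the following general form. The notation is an assumption introduced here and is not taken from the original analysis scripts: s indexes subjects, i indexes items, x_k ranges over the standardized fixed predictors of interest (including GC), and the γ terms stand for the GC interactions.

```latex
\log_{10}(\mathrm{RT}_{si}) =
  \beta_0 + S_{0s} + I_{0i}
  + \sum_{k} \left(\beta_k + S_{ks}\right) x_{k,si}
  + \sum_{k \neq \mathrm{GC}} \gamma_k \left(\mathrm{GC}_{i} \times x_{k,si}\right)
  + \varepsilon_{si}
```

Here S_{0s} and I_{0i} are the by-subject and by-item random intercepts, S_{ks} are the by-subject random slopes, and ε_{si} is the residual error. The participant-level control variables added for the bilingual analyses would enter as further fixed terms without random slopes.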

In order to find the most parsimonious random effects structure for our data, we performed a singular value decomposition (SVD) on the covariance matrix of the maximal model using principal component analysis (PCA) (Bates et al., 2015). PCA could tell which of the random effects structures were not contributing significantly to the model. This further helped resolve unnecessary complications such as over-specification and convergence problems in the model. PCA was accompanied by likelihood ratio tests (LRT) for both statistical significance and model evaluation. Random effects correlation parameters were not included in the maximal model at first to prevent convergence problems (Bates et al., 2015). However, once the most appropriate random effects structure was determined, correlation parameters were added to the model and compared with the model without correlation parameters using LRT.
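The likelihood ratio tests used throughout this procedure compare the deviances of two nested models against a chi-square distribution. The sketch below shows that computation with the standard library only. The function names and the example deviances are hypothetical; the closed-form expressions for integer degrees of freedom are standard results.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(deviance_reduced, deviance_full, df_diff):
    """Return (chi-square statistic, p value) for two nested models.

    Deviance is -2 * log-likelihood; df_diff is the number of
    parameters dropped from the full model.
    """
    statistic = max(deviance_reduced - deviance_full, 0.0)
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical deviances for a model with and without one random slope.
statistic, p_value = likelihood_ratio_test(1004.6, 1000.0, 1)
print(round(statistic, 2), p_value < 0.05)  # 4.6 True
```

A non-significant result suggests that the dropped random-effects term can be removed without a detectable loss of fit, which is the criterion applied alongside the PCA results.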

Having dealt with the random effects structure of the model, we then turned to the fixed effects part. Determining which variables have a significant effect is controversial in LME models. Many approaches have been proposed, including Wald tests, LRT, Markov-chain Monte Carlo (MCMC), Kenward-Roger, Satterthwaite, and parametric bootstrapping (Baayen et al., 2008; Barr et al., 2013; Luke, 2017; Pinheiro & Bates, 2000). Wald tests and LRT are less computationally demanding (Luke, 2017). In order to judge which fixed effects were significant, we used conditional F-tests because doing LRT on the fixed effects is anti-conservative and could result in misleading findings (Halekoh & Højsgaard, 2014; Luke, 2017; Pinheiro & Bates, 2000). We used Kenward-Roger approximations to calculate denominator degrees of freedom, which have shown more acceptable Type I error rates in comparison with LRT and Wald tests (Kuznetsova, Brockhoff, & Christensen, 2017; Luke, 2017). The analysis pipeline described above was applied to the data from all timed picture-naming experiments, both bilingual and monolingual. To determine whether the experiments had enough power, we consulted Brysbaert and Stevens (2018). They recommend that a properly powered repeated-measures study should have at least 40 participants with 40 stimuli (1600 observations per condition). The number of participants and stimuli in our study exceeded this threshold. The data for both bilingual experiments and the R code are available online at the following link (DOI 10.17605/OSF.IO/8HVWR).

Results

Monolingual group

We first checked for multicollinearity by applying VIF. VIF values were below the cut-off, except for age of participants. We therefore removed age from further models. We standardized all predictor variables before the analysis. The removal of frequency, χ²(1) = 1.05, p = 0.30, familiarity, χ²(1) = 0.0, p = 1, and AoA, χ²(1) = 0.0, p = 1, from the random effects structure did not affect the model fit. However, when the following random effects were removed, the model fit changed significantly: VC, χ²(1) = 4.60, p < 0.05, imageability, χ²(1) = 3.91, p < 0.05, NA, χ²(1) = 16.29, p < 0.001, and GC, χ²(3) = 133.57, p < 0.001. The removal of zero correlation parameters did not change the model fit, χ²(8) = 11.84, p = 0.15. The only significant fixed effects were GC, imageability, and NA. The AoA effect was marginally significant (see Table 6 for a summary of the model).

Table 6 A summary of the significant effects in the monolingual group

Mandarin-Cantonese group

We did two separate analyses for this group. In the first analysis, we used normed ratings for psycholinguistic variables from the bilingual M-C population to predict reaction time. In the second analysis, we used ratings from a monolingual Mandarin population. VIF results suggested age should be removed for both analyses because of its high collinearity with education level. As in the previous analyses, we used PCA accompanied by LRT. In the first analysis, by-subject random slopes of frequency, χ²(1) = 1.96, p = 0.16, and imageability, χ²(1) = 2.02, p = 0.15, were not significant. PCA showed that frequency and imageability accounted for less than 1% of the whole variance in the random effects structure. However, the exclusion of familiarity, χ²(1) = 14.17, p < 0.001, AoA, χ²(1) = 14.33, p < 0.001, NA, χ²(1) = 10.97, p < 0.001, VC, χ²(1) = 10.65, p < 0.01, and GC, χ²(3) = 115.92, p < 0.001, changed the model fit significantly. This means that the effects of AoA, GC, NA, familiarity, and VC varied in size across M-C bilingual speakers. We then removed the zero correlation parameters and performed another LRT. The results showed that removing the correlation parameter had no significant effect, χ²(13) = 17.70, p = 0.16. Therefore, we decided to remove correlation parameters in the random effects model (see Table 7 for a summary of the LMEM). Finally, we tested the fixed effects and their interactions in the model. The results are summarized in Table 8. Figure 1 shows the GC interactions with AoA, VC, and imageability.

Table 7 A summary of the significant effects in the M-C bilingual group with the bilingual ratings
Table 8 A summary of the significant effects in the M-C bilingual group with the monolingual ratings
Fig. 1 GC interactions with AoA, imageability, and VC in the M-C group with a 0.95 confidence interval (CI). RT is on the original scale for display purposes in this figure

In the second analysis, we wanted to see how ratings of psycholinguistic variables collected from monolingual Mandarin speakers would predict picture naming in bilingual M-C speakers. The random effects of frequency, χ²(1) = 1.79, p = 0.18, and imageability, χ²(1) = 0.11, p = 0.73, were not significant. However, the random effects of familiarity, χ²(1) = 6.22, p < 0.05, AoA, χ²(1) = 12.96, p < 0.001, NA, χ²(1) = 31.30, p < 0.001, VC, χ²(1) = 10.38, p < 0.01, and GC, χ²(3) = 124.56, p < 0.001, were all significant. The addition of the correlation parameters to the random effects structure had no significant effect on the model fit, χ²(13) = 17.62, p = 0.17. The significant fixed effects, plus two marginally significant interactions, are summarized in Table 8.

Cantonese-English group

We followed exactly the analysis pipeline outlined above. VIF results suggested that L2 spoken and written use should be removed from the analysis because of high collinearity with L1 spoken and written use. Based on PCA results and the variance-covariance matrix, we started removing random effects with the lowest variance, followed by LRTs. The removal of familiarity, χ²(1) = 0.17, p = 0.67, NA, χ²(1) = 3.42, p = 0.06, frequency, χ²(1) = 1.12, p = 0.28, and VC, χ²(1) = 0.001, p = 0.97, did not have any significant effects on the model since they accounted for less than 3% of the variance in the random effects structure. The exclusion of AoA, χ²(1) = 13.76, p < 0.001, imageability, χ²(1) = 9.32, p < 0.01, and GC, χ²(3) = 108.24, p < 0.001, however, changed the model fit significantly. In other words, the effects of AoA, imageability, and GC were not exactly the same across all participants. Finally, we removed all of the zero correlation parameters and performed an LRT. The results showed that a model with correlation parameters included was not better, χ²(4) = 0.0, p = 1 (see Table 6 for a summary of the LMEM).

Regarding the fixed effects, only familiarity, imageability, and NA significantly predicted RT in C-E bilingual speakers (see Table 9 for further information).

Table 9 A summary of the significant effects in the C-E bilingual group

Discussion and conclusion

In this study, we first established normative data for object and action pictures for both monolingual and bilingual Chinese speakers. We then tested influences of key psycholinguistic variables on timed picture naming across bilingual and monolingual speakers. NA and imageability, as we had predicted, were the only variables to show robust effects across all groups, suggesting that these variables were independent of individual differences and language learning experiences. However, substantial variability was observed in the bilingual groups in terms of interactions with GC. The differences in the results among the groups clearly suggest that bilingual experiences and language-specific properties can modulate the effects.

NA is often found to influence lexical access across many studies (Alario & Ferrand, 1999; Bates et al., 2003; Szekely et al., 2005). Research in several languages shows that the number of alternative names associated with an object or action has an effect on how quickly it is named in monolinguals (Alario et al., 2004; Bates et al., 2003); the fewer alternative names an object or action has, the faster it should be named. According to the codability hypothesis (Ramanujan & Weekes, 2019), the same effect of NA is expected for bilingual speakers, but speaking two languages may give rise to cross-linguistic interference and thus within-language effects. This is because in bilingual speakers, every concept has at least two language name associations in the lexicon, leading to higher amounts of competition and, more critically, interference compared with monolingual speakers according to models of bilingual language processing (Abutalebi & Green, 2008; Green, 1998; though see Gollan, Montoya, Cera, & Sandoval, 2008, for a different explanation but the same effect).

We observed a continuum in the strength of NA effects. NA had the strongest effects for the monolingual speakers, followed by the C-E bilingual speakers, and finally the M-C bilingual speakers. We suggest the largest effect was observed in monolingual speakers because there is no between-language competition. However, bilingual speakers have more alternative names generated from the conceptual system (at least two names for each concept), and this will create interference at the level of speech production even in monolingual mode (see Abutalebi & Green, 2008; Green, 1998). The inhibitory role of higher linguistic similarity was also evident because we only observed a significant by-participant variability in codability in the M-C group. One may contend that when the languages spoken are more similar, e.g. Cantonese and Mandarin, there is a trend toward an interaction with by-participant variability. By-participant variability for codability was also reported in Ramanujan and Weekes (2019), where bilinguals spoke unrelated languages (Hindi and English). Taken together, we contend that this effect cannot be attributed to linguistic similarity only.

We found that NA was a strong predictor of naming latency in all groups of bilingual and monolingual speakers. However, a more important finding was that this effect was not uniform across all bilingual speakers. This could mean that codability is not only a property of the item, but it is also a product of individual differences, i.e. within-participant variability. Modelling this variability is only possible using sophisticated statistical techniques such as LMEM and is ideally tested with groups that are heterogeneous. What is the source of this variability? It cannot be attributed solely to the bilingual lexicon that can be studied in an experimental setting, but rather to the evolving context of learning languages. It is the context and learning history that define the vocabulary of a bilingual speaker as well as naming patterns, degree and patterns of usage, and sociolinguistic prestige earned for each of the languages (see Malt, Sloman, & Gennari, 2003). This may explain the higher degree of variability in picture naming of bilingual speakers.

The effect of AoA was not similar across the groups. It is typically assumed in the monolingual literature that AoA is a reliable and robust predictor of timed picture naming. An interaction between AoA and GC was observed only for the M-C group, where only object naming was affected. AoA was marginally significant in the monolingual group and had no effect in the C-E group. The significant effect of AoA on object naming is consistent with previous studies (Bakhtiar, Nilipour, & Weekes, 2013; Bakhtiar, Su, Lee, & Weekes, 2016; Bonin, Guillemard-Tsaparina, & Méot, 2013; Liu, Hao, Li, & Shu, 2011; Perret & Bonin, 2018). However, the null effect of AoA in the C-E bilingual speakers contrasts with results from other studies (Khwaileh et al., 2018; Schwitter, Boyer, Méot, Bonin, & Laganaro, 2004; Shao, Roelofs, & Meyer, 2014). One reason that AoA did not have a significant effect may be a high correlation between AoA and familiarity. This multicollinearity could diminish the effects. However, this possibility was ruled out, as the VIF analysis showed a safe degree of collinearity between AoA, imageability, and familiarity despite rather high correlation values. In addition, we ran another LMEM on the C-E data without including familiarity, and AoA was still not significant. One other study on Chinese action naming also reported a borderline effect of AoA (Chen & Zhu, 2015).

We believe that individual level differences in bilingual speakers such as the amount of L1 and L2 written and spoken usage and the amount of proficiency in each language could modulate AoA effects. The evidence to support this claim comes from the random effects structure of the models. While AoA was not a significant random effect in the monolingual group, in both bilingual groups, the random effect of AoA contributed significantly to the model. This means that the AoA effect was uniform across monolingual speakers, which makes sense given the lack of inter-individual variability in this group. However, in the bilingual speakers, the effect was not uniform, suggesting that in some participants, it was a significant predictor and in some it was not. This finding is in line with studies which show bilingual speakers’ individual experiences interact with the item-level properties (Edmonds & Donovan, 2012; Kroll & Gollan, 2013; Ramanujan & Weekes, 2019).

In line with several previous studies, rated imageability was a strong predictor of picture naming (Akinina et al., 2015; Bakhtiar, Jafary, & Weekes, 2017; Bird et al., 2001; Ramanujan & Weekes, 2019). It thus seems likely that imageability has a more central role than AoA in picture naming for Chinese speakers. Several linguists have argued that Chinese words carry richer and more fine-grained semantic and sensory features than English words (Ma, Golinkoff, Hirsh-Pasek, McDonough, & Tardif, 2009). For instance, in English, one plays the piano, violin, or flute, but in Chinese (literally speaking), one ‘plucks piano with fingers’, ‘pulls violin’, and ‘blows flute’. Given the additional semantic load carried by verbs (and actions) in Chinese, we contend that the semantic representations of words could play a more concrete role in Chinese processing than in languages such as English. In English, morphosyntax lessens the semantic load on word structure, allowing words to take a more abstract role in naming and thereby diminishing the impact of imageability on picture naming. Our results suggest that imageability plays a more central role than AoA in picture naming across Mandarin and Cantonese because of the importance of conceptual-semantic information.

Word frequency was not significant in any analysis. This finding is in line with Ramanujan and Weekes (2019) and other studies of timed picture naming (Bastiaanse, Wieling, & Wolthuis, 2016; Bonin, Chalard, Méot, & Fayol, 2002; Nishimoto, Miyawaki, Ueda, Une, & Takahashi, 2005). These researchers contend that AoA is a proxy for word frequency, and therefore, once AoA is taken into account, no independent effect of frequency remains (Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004; Zevin & Seidenberg, 2002). However, another explanation for the null effect of frequency could be related to the type of corpora used to extract the frequency values in this study. Available corpora are usually based on monolingual language use, which reflects only the cumulative frequency of words (Zevin & Seidenberg, 2002). This limitation is most relevant for bilingual speakers, whose patterns and degree of L1 and L2 usage differ from those of monolingual speakers (Edmonds & Donovan, 2012; Kroll & Gollan, 2013; Ramanujan & Weekes, 2019). Although we tried to mitigate this problem by using a spoken corpus for the Mandarin groups, we think that even this corpus could not adequately capture the variability of L1 and L2 language use in bilingual speakers.

An important outcome of this study is that key psycholinguistic variables predicted naming responses similarly across all groups of monolingual and bilingual speakers. However, bilingual speakers showed more variability in the effects of variables such as AoA. Moreover, we observed a difference between monolingual and bilingual norms in their capacity to explain GC interactions. This pattern was more evident when the bilinguals’ languages were more similar to each other. We think that collecting normative data from bilingual participants is essential work. Only when such data are available will we be able to properly document the diversity associated with bilingual language processing.

We revealed similarities and differences between bilingual and monolingual lexical access. Variables such as AoA and frequency, which are assumed to affect lexical selection and encoding (Alario et al., 2004) in all models of (monolingual) timed picture naming, did not have significant effects in bilingual speakers. We also found that the effects of psycholinguistic variables are not completely uniform among bilingual speakers, and we observed a substantial amount of individual variability among different groups of bilingual speakers. Specifically, the bilingual speakers who spoke similar languages (Mandarin and Cantonese) showed considerably higher variability in the effects of psycholinguistic variables on timed picture naming. This variability can be attributed to linguistic and non-linguistic factors as outlined above. By using LMEM, we could capture and explain as much of this variability as possible, taking into account individual differences including demographic information such as age and education and patterns of language use. We acknowledge that other participant-level properties could be added to the model. An important new insight from the present study is that by testing the heterogeneity of bilingual language experience at the participant level, it is possible to explain more variability in timed picture naming. Thus, we recommend that future studies include participant-level characteristics in their analyses, even in studies of monolingual speakers. Such variables may include exposure to print, vocabulary size, spelling abilities, and other individual differences that are seen in the general population but are typically controlled.

We did not have data for monolingual Cantonese speakers. To the best of our knowledge, finding suitable monolingual Cantonese speakers is not possible in Hong Kong or in Mainland China. There might still be some monolingual Cantonese speakers available, but they are not suitable for testing predictions about picture naming because they are illiterate or elderly. Moreover, they are not representative of the population. This makes it difficult to obtain the matched monolingual baseline for the C-E bilingual group that would be ideal.

We suggest that there is a great need to develop psycholinguistic norms based on data from multilingual speakers, as they form the larger portion of the global population. Furthermore, replication studies are needed from other multilingual populations living in diverse sociolinguistic contexts with differing amounts of language use and linguistic distance.