Elsevier

Journal of Phonetics

Volume 88, September 2021, 101094

Research Article
The role of L2 experience in L1 and L2 perception and production of voiceless stops by English learners of Spanish

https://doi.org/10.1016/j.wocn.2021.101094

Highlights

  • L2 experience has a greater influence on L2 production than on L2 perception.

  • Perception and production of L1 stops are not affected by amount of L2 experience alone.

  • L2 speakers can produce L1 and L2 stops differently despite maintaining a single perceptual boundary.

  • Perceptual category boundaries and VOT production for stops are correlated neither in the L1 nor in the L2.

Abstract

Some previous studies report that increased experience with a second language (L2) may result in a more target-like perception and production in the L2, as well as in a less native-like performance in the L1. The present paper aimed to (1) assess the role of L2 experience on L2 and L1 production of voiceless stops; (2) investigate the effect of L2 experience on L2 and L1 perception of voiceless stops; and (3) examine the relationship between perception and production. Three groups of English learners of Spanish differing in amount and type of L2 experience, as well as two groups of functional monolinguals, completed a production task and an identification task involving English and Spanish voiceless stops. The results revealed that the L2 speakers were more successful at producing than at perceiving Spanish stops accurately, with L2 experience having a positive effect on production. L2 experience was not found to affect performance in the L1, which could be related to an overall limited amount of L2 use even in an immersion setting. The results also showed a weak relationship between perception and production, which may partly be due to the different nature of perceptual and production measures.

Introduction

The fact that a speaker’s native or first language (L1) exerts an influence on a subsequently learned or second language (L2) is widely attested (e.g., Flege, 2007, Flege et al., 1999). The reverse type of influence, that is, the influence of the L2 on the L1, is also reported (e.g., Flege, 1987, Kartushina et al., 2016), although it has received less attention, particularly regarding the perception of L2 sounds. In addition, the degree of cross-linguistic influence may be modulated by factors such as the amount of L1 and L2 experience and use, and the degree of similarity between the L1 and L2 sounds (Best and Tyler, 2007, Flege and Bohn, 2021). This study aims to explore these issues further by contrasting how L1 English L2 Spanish speakers differing in the type and amount of L2 experience produce and also perceive voiceless stops both in the L1 and the L2, and compared to monolingual Spanish and monolingual English speakers. The following sections review the main issues and findings regarding the direction of crosslinguistic influence (CLI) and the effect of experience on CLI, the main claims of the relevant L2 speech models and the relationship between perception and production. Given that the focus of this paper is the perception and production of stops, the review of the literature will center mostly on these consonants. The characteristics of stops in Spanish and English are presented first.

Voice onset time (henceforth VOT), i.e., the interval between the release of the stop and the onset of voicing, has been established as the main cue to the voiced-voiceless distinction in initial stops in a number of languages, including English and Spanish (Lisker & Abramson, 1964). This study focuses on English and Spanish bilabial and velar voiceless stops, which share manner and place of articulation in both languages but differ in the use of VOT to contrast voicing (Lisker & Abramson, 1964). Coronal stops are alveolar in English and dental in Spanish and will not be analyzed in this study. Table 1 summarizes the mean VOT values for English and Spanish bilabial and velar stops reported in the literature.

English voiceless stops are produced with long-lag VOT. For instance, Lisker and Abramson (1964) reported that American English speakers produced /p/ and /k/ with a mean VOT duration of 58 and 80 ms, respectively. Similarly, the mean VOT values described for British English speakers are 42 ms for /p/ and 62 ms for /k/ (Docherty, 1992). English word-initial voiced stops tend to be produced with short-lag VOT: Lisker and Abramson (1964) report mean VOT values of 1 ms for /b/ and 21 ms for /g/, while Docherty (1992) reports 15 ms and 27 ms, respectively. Still, voice-lead productions are also found for native English voiced stops (Dmitrieva et al., 2015, Lisker and Abramson, 1964). Regarding Spanish, voiceless stops are produced with short-lag VOT and voiced stops present voice-lead VOT. According to Lisker and Abramson (1964), Puerto Rican Spanish speakers produce /p/ with a mean VOT value of 4 ms, /b/ with -138 ms, /k/ with 29 ms and /g/ with -108 ms. Similarly, Castañeda (1986) reported mean VOT values of 6.5 ms, -69.8 ms, 25.7 ms and -58 ms for /p, b, k, g/, respectively, in European Spanish. Importantly, short-lag VOT values signal a voiceless stop in Spanish and a voiced category in English.
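The cross-language mismatch described above can be illustrated with a minimal sketch. It uses only the bilabial means from Lisker and Abramson (1964) quoted in the preceding paragraph. The nearest-mean rule, the function name and the 10 ms example token are assumptions made for illustration; they are not a model of how listeners actually categorize stops.

```python
# Mean VOT (ms) for bilabial stops, as reported by Lisker and Abramson (1964)
# and quoted in the text above.
MEAN_VOT_MS = {
    "English": {"/b/": 1, "/p/": 58},
    "Spanish": {"/b/": -138, "/p/": 4},
}


def nearest_category(vot_ms: float, language: str) -> str:
    """Return the stop whose mean VOT is closest to vot_ms.

    Illustrative assumption: a nearest-mean rule stands in for categorization.
    """
    means = MEAN_VOT_MS[language]
    return min(means, key=lambda category: abs(vot_ms - means[category]))


# Hypothetical short-lag token with 10 ms of VOT.
token_vot_ms = 10
labels = {lang: nearest_category(token_vot_ms, lang) for lang in MEAN_VOT_MS}
print(labels)  # {'English': '/b/', 'Spanish': '/p/'}
```

Under this simplified rule, the same short-lag token falls closest to the voiced category in English and to the voiceless category in Spanish.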

In the speech of bilingual and second language (L2) speakers, cross-linguistic influence, that is, the influence of one language on another, is a common phenomenon. Typically, the first language (L1) exerts an influence on the perception and production of the second language (L2), especially in the case of adult L2 learners (e.g., Antoniou et al., 2010, Caramazza et al., 1973, Carlson et al., 2016, Flege, 1991, Flege et al., 2002). For example, Flege (1991) investigated the production of English and Spanish /t/ by adult Spanish learners of English who had lived in the United States for 14 years. He found that the L2 speakers produced English /t/ with VOT values that were intermediate between the expected L1 and L2 values. Similarly, a group of Canadian French L1 English L2 bilinguals who had started learning English before the age of 7 produced English stops with significantly longer VOT than French stops, but significantly shorter VOT than English monolingual speakers (Caramazza et al., 1973).

Still, L2 speakers may achieve or approximate accuracy in their production and perception of the L2. Previous research has shown that increased experience, particularly a longer length of residence in an L2 setting, may result in a more target-like L2 (Bohn and Flege, 1990, Flege, 1987, Flege et al., 1997, Jun and Cowie, 1994, Levy and Law, 2010, Stevens, 2001, among others). For instance, Stevens (2001) found that L1-English learners of Spanish in Spain improved their production of L2 voiceless stops (i.e., produced shorter VOT) to a greater extent than L2 Spanish learners in the US. Flege (1987) examined L1 and L2 production by different groups of L2 speakers: two groups of American English learners of French residing in the L1 setting but differing in years of learning and amount of time previously spent in an L2 setting, one group of American English speakers residing in France and one group of French speakers living in the US. Regarding the two groups of L2 French speakers in an L1 setting, the less experienced group produced /t/ in English and French similarly, while the more experienced group distinguished English /t/ and French /t/, although the L2 /t/ was not target-like. By contrast, the L1-English L2-French speakers residing in France did not differ significantly from monolingual French speakers in their production of French /t/. Hence, Flege found evidence of a positive effect of L2 experience on L2 learning.

L2 experience, however, is a broad term that is used to refer to a variety of interrelated factors found to affect L2 (and L1) performance in addition to years of learning and learning setting. These include L2 proficiency, amount of L1 and L2 use, language immersion, instruction, motivation and aptitude, among others (Kartushina et al., 2016, Piske et al., 2001, Purcell and Suter, 1980). Piske et al. (2001) found that language use was the second most important factor determining L2 performance, after age at which L2 learning started. Amount of L1 and L2 use is in fact a factor that distinguishes the groups of L2 learners compared in Flege (1987). Recall that the L1-English L2-French speakers in Flege (1987) exhibited a positive effect of experience whereas the L1-Spanish L2-English speakers in Flege (1991) did not. This inconsistency may be explained by the fact that participants in Flege (1987) had native L2 speaking spouses and, consequently, made greater use of the L2, whereas half of the adult learners in Flege (1991) lived near the Mexican border, where there is a strong Spanish-English bilingual community. Similarly, Purcell and Suter (1980) found that a compound variable which combined years of residence in the L2 context and L2 use at home – measured by years living with native speakers of the target language – was one of the main predictors for degree of L2 accentedness. In fact, the revised version of the Speech Learning Model (SLM-r, see section 1.3 for the models’ descriptions), proposes a variable that results from multiplying years of residence in the L2 setting by the proportion of L2 use (referred to as full time equivalent or FTE) as a more appropriate approach to quantifying L2 input. For instance, Aoyama and Flege (2011) investigated the acquisition of L2 liquids by Japanese learners of English and reported a correlation between years of FTE and their ability to discern between English /r/ and /l/. 
Finally, intensive L2 instruction may also enhance the effect of experience. For example, Casillas (2020) found that L1 English learners of Spanish substantially improved their production of L2 stops within a 7-week immersion program with intensive L2 instruction and L2 input (and suppressing L1 use) in an instructional setting. The present study investigates the perception and production of L1 and L2 stops by adult L2 learners differing in L2 experience, understood as the amount of L2 experience (years of L2 learning) and type of L2 experience (in the L1 or in the L2 setting).
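The FTE measure mentioned above (years of residence in the L2 setting multiplied by the proportion of L2 use) can be illustrated with a minimal sketch. The function name and the two example learners are hypothetical and are not taken from any of the studies cited.

```python
def fte_years(years_of_residence: float, proportion_l2_use: float) -> float:
    """Full-time-equivalent years of L2 input: residence x proportion of L2 use."""
    if years_of_residence < 0:
        raise ValueError("years_of_residence must be non-negative")
    if not 0.0 <= proportion_l2_use <= 1.0:
        raise ValueError("proportion_l2_use must lie between 0 and 1")
    return years_of_residence * proportion_l2_use


# Hypothetical learners: a long stay with little L2 use versus
# a shorter stay with predominant L2 use.
long_stay_low_use = fte_years(10, 0.2)
short_stay_high_use = fte_years(4, 0.75)
print(long_stay_low_use, short_stay_high_use)
```

In this made-up comparison, the learner with the shorter stay accumulates more FTE years, which is the kind of difference that length of residence alone would not capture.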

Previous findings illustrate that CLI also affects L2 speakers’ L1, showing that the L1 and the L2 are dynamic systems that interact in the course of the L2 learning process (Chang, 2012, Flege, 1995; see Kartushina, Frauenfelder, et al., 2016, for a review). Kartushina, Frauenfelder, et al. (2016) explain that absence of CLI may be found with simultaneous bilinguals and very early bilinguals (e.g., who started learning the L2 by the time they were two years old). These speakers may have distinct categories for each language and these categories may resemble those of native monolingual speakers of each language. In addition, later L2 learners with limited exposure to the target language may not be able to create separate L2 categories and use unmodified L1 categories for both L1 and similar L2 sounds (Flege, 1995). On the other hand, Kartushina, Frauenfelder, et al. (2016) suggest that backward transfer (effect of L2 learning on L1 categories) occurs under two possible main circumstances. The first one is the case of L2 speakers at an advanced stage of L2 learning and who make predominant use of the L2 usually in an L2 immersion setting (e.g., Bergmann et al., 2016, Flege, 1987, Major, 1992, Harada, 2003). The second case involves learners at an early stage of L2 learning, characterized by an immersion setting, as discussed below (Chang, 2012, Kartushina et al., 2016, Sancier and Fowler, 1997).

Regarding the former, Harada (2003) reported that heritage Japanese speakers in the United States were found to produce L1 voiceless stops with significantly longer VOT values than Japanese monolinguals, whereas their L2 production did not differ from that of English monolinguals. Major (1992) examined the production of /p, t, k/ by L1 English speakers who migrated to Brazil as adults and had spent between 12 and 35 years in the L2 country. Their productions of VOT in Portuguese, and in English, were found to deviate from those of monolingual speakers of these languages. Similarly, Flege (1987) found evidence that the two most experienced groups, that is, the Americans in France and the French speakers in the US, with around 12 years of immersion in the L2 setting, also differed from monolingual speakers in their L1 production. By contrast, Riney and Okamura (1999) found no effect of L2 experience on the production of L1 stops with highly advanced learners: L1-English speakers of Japanese in an immersion setting – as well as L1-Japanese L2-English speakers who had lived in an immersion setting but were back in Japan – produced L1 stops with native-like VOT values. It should be noted, however, that only five speakers were tested for each L1 and that all of them were linguists or language teachers. Thus, a reduced sample size and the participants’ presumable metalinguistic knowledge may explain the absence of phonetic drift of the L1 towards the L2 in this case.

The second type of reported backward CLI involves relatively short periods of residence. For example, some studies investigating the effect of short (though repeated) stays in an L2 setting have found that the L1 VOT production may also drift towards more L2-like values after several months (4.5) in an L2 setting (Sancier & Fowler, 1997). In addition, the learners’ L2 production may become less target-like only a few weeks after returning from a short stay in the L2 setting (Sancier and Fowler, 1997, Tobin et al., 2017). L2 influence on the L1 has even been reported for novice L2 learners (Chang, 2012, Kartushina et al., 2016). Chang (2012) investigated the production of L1 and L2 vowels and stops by American English speakers who were enrolled in a Korean language course in Korea. Phonetic drift of the L1 towards L2 values occurred as early as the second week of L2 classes, underscoring the dynamic nature of the L1 and the L2. Moreover, the influence of the L2 on the L1 was greater for beginner learners than for learners who already had some knowledge of Korean (Chang, 2013), which was attributed to a novelty effect, that is, a greater salience and heightened encoding of L2 stimuli in the context of a novel perceptual experience. Along similar lines, Kartushina, Hervais-Adelman, et al. (2016) report a drift in L1 vowel production toward L2 values after only one hour of vowel training involving articulatory feedback. It is possible that a combination of an immersion setting and L2 instruction or specialized training may have a greater impact on the possibility of modifying L1 and L2 patterns.

The studies reviewed in the previous paragraph show a type of CLI that involves phonetic drift of the L1 towards L2 values, also referred to as category assimilation (Flege, 2003, see section 1.3). CLI can also result in category dissimilation, that is, the L1 categories deflect away (i.e., become more dissimilar) from L2 categories in an attempt to maintain a distinction between the L1 and the L2. This also results in L2 speakers' L1 categories differing from those of L1 monolinguals. For instance, Flege and Eefting examined the production of voiceless stops in L1 Dutch and L2 English (1987a) and in L1 Spanish and L2 English (1987b). Unlike in English, voiceless stops in Spanish and in Dutch are unaspirated (i.e., produced with short-lag instead of long-lag VOT). Flege and Eefting (1987b) found that early Spanish learners of English made a distinction between L1 and L2 stops, but produced Spanish voiceless stops with shorter VOT than Spanish monolinguals did. Their English stops also had shorter VOT than those of English monolinguals. In addition, Flege and Eefting (1987a) reported that only those Dutch L2 English speakers who were judged to sound more nativelike in English showed a deflection in L1 Dutch stops. Thus, deflection of L1 categories may occur as a strategy to distinguish L1 and L2 sounds when L2 categories have been established but are still not target-like.

In brief, L2 speakers may approximate target-language VOT patterns more the greater the amount of experience with the L2, particularly in an L2 setting and with predominant L2 use (Flege, 1987, Flege and Bohn, 2021). In addition, the L2 speakers’ L1 may also be affected as a consequence of L2 learning. This type of CLI may occur in immersion settings with highly proficient L2 speakers with a predominant L2 use and as the result of novelty effects. Kartushina, Frauenfelder, et al. (2016) suggest that, in cases of L1 phonetic drift, early learners and bilinguals may have a tendency to deflect L1 and L2 categories away from each other as a consequence of L2 category creation, while later learners may experience category assimilation. Kartushina, Frauenfelder, et al. (2016) add that the type of impact may be influenced by the degree of similarity between L1 and L2 sounds, the type and amount of L2 experience and the level of proficiency.

Only a few studies have investigated the effect that L2 experience may have on L1 and L2 perception (Cabrelli et al., 2019, Cebrian, 2006, Dmitrieva, 2019, Major, 2010), particularly regarding stops (Flege and Eefting, 1987a, Flege and Eefting, 1987b, Gorba, 2019), but some studies have compared L2 speakers and monolinguals (e.g., Caramazza et al., 1973, Hazan and Boulakia, 1993). Hazan and Boulakia (1993) reported that French-dominant French-English bilinguals perceived the /p/-/b/ contrast with French-like values in both languages, thus showing an influence of the L1 on the L2. Other studies have reported that L2 speakers may use perceptual categories that are intermediate between native and target language values. For instance, Caramazza et al. (1973) evaluated the perception of voiced and voiceless stops by monolingual and bilingual speakers using an identification task involving synthetic stimuli ranging from -150 to 150 ms VOT. The participants were ten Canadian English monolinguals, ten Canadian French monolinguals and twenty L1-French L2-English bilinguals who had started learning the L2 by the age of 7. The results showed that the English monolinguals identified voiceless stops later in the continuum than French monolinguals, as expected. They also showed a sharper and more consistent identification slope, which indicated that English speakers were more sensitive to the VOT cue than French speakers. Regarding the bilingual speakers, the results showed that they performed similarly in both languages, with intermediate VOT values between those of monolingual speakers of English and French, and obtained less consistent slopes than the English monolinguals and sharper identification functions than the French monolinguals. 
Similarly, Flege and Eefting (1987b) reported that Spanish-English early bilinguals and late childhood bilinguals living in Puerto Rico had /t/-/d/ category boundaries that were shorter (i.e., more Spanish-like) than those of English monolinguals – whereas bilinguals since birth did not. Flege and Eefting (1987a) found that Dutch speakers of English of three different levels of proficiency had a significantly later category boundary in English than in Dutch, but the difference was small. These results indicate that L2 speakers and L1-exposed bilinguals tend to have a single perceptual category for their two languages, which is either the L1 category (Hazan & Boulakia, 1993) or an intermediate category (Caramazza et al., 1973, Flege and Eefting, 1987a, Flege and Eefting, 1987b). This inability to develop separate categories may be explained by the fact that the L2 speakers were in their L1 country (Caramazza et al., 1973, Flege and Eefting, 1987a) and exposed to L1-accented English (Flege & Eefting, 1987b). Along similar lines, Ahn, Chang, DeKeyser and Lee-Ellis (2017) found that L1 Korean L2 English speakers started to perceive L1 contrasts less accurately the earlier their L1 input was reduced, underscoring the effect of relative exposure to the L1 and the L2.
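The notion of a perceptual category boundary on a VOT continuum, as used in the identification studies above, can be made concrete with a minimal sketch. It locates the 50% crossover of an identification function by linear interpolation. This is a simplified stand-in: the studies cited may have estimated boundaries differently (for example, by fitting a curve), and the continuum steps and response proportions below are invented for illustration.

```python
def crossover_boundary(vot_ms, prop_voiceless, criterion=0.5):
    """Return the VOT (ms) at which 'voiceless' responses cross `criterion`.

    Linearly interpolates between the two adjacent continuum steps
    that bracket the criterion.
    """
    points = list(zip(vot_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("identification function never crosses the criterion")


# Hypothetical continuum and proportions of 'voiceless' responses.
continuum_ms = [-20, -10, 0, 10, 20, 30, 40, 50]
spanish_like = [0.0, 0.0, 0.25, 0.75, 1.0, 1.0, 1.0, 1.0]
english_like = [0.0, 0.0, 0.0, 0.0, 0.25, 0.75, 1.0, 1.0]

print(crossover_boundary(continuum_ms, spanish_like))  # 5.0
print(crossover_boundary(continuum_ms, english_like))  # 25.0
```

In these invented data, the "English-like" listener identifies voiceless stops later in the continuum than the "Spanish-like" listener. A listener with a single intermediate category would show one boundary between the two, regardless of the language being tested.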

Gorba (2019) contrasted the perception of L1 and L2 stops by three groups of Spanish learners of English differing in amount of L2 experience and learning setting, but with a similar amount of L2 instruction (about 13 years). The least L2 experienced group (INEXP) had never lived in an L2 setting, the group of experienced learners in Spain (EXP1) had spent an average of 9 months in an L2 setting and were back in their home country at the time of testing, and the most experienced group (EXP2) had been living in an L2 setting for a mean of 4 years and were still living in an immersion setting at the time of testing. It was found that all learner groups had numerically earlier – more Spanish-like – perceptual boundaries in English than English monolinguals, but this difference was only significant in the case of INEXP. When it comes to their L1, all groups presented longer VOT values than Spanish monolinguals, but only EXP2 had significantly later /p/-/b/ boundaries than Spanish monolinguals, indicating an influence of English on their L1. That is, instances of L1 influence on the L2 as well as L2 influence on the L1 were found to be related to amount of L2 exposure.

While results for production, despite some exceptions, tend to show an effect of experience on the L2, and also on the L1 for highly advanced learners in an immersion setting (in addition to cases of novelty effects, Chang, 2012), the results for perception are less consistent, probably due to an insufficient number of studies. One of the main contributions of the current paper lies in the fact that it investigates the effect of L2 experience not only on VOT production, like previous studies such as Flege (1987), but also on VOT perception, which has received scarce attention in the literature, particularly regarding L2 influence on the L1. In addition, L2 speakers’ performance in their L2 is contrasted with their own performance in the L1, as well as with data from monolingual speakers of each language. The role of L2 experience is assessed by comparing groups of L2 learners that differ in amount and type of L2 experience, namely number of years learning the L2 and learning setting. Although factors such as age of onset of L2 learning, amount of L2 instruction, L2 use and language dominance are often interrelated with other forms of experience (Kartushina, Frauenfelder, et al., 2016), the current paper will mainly focus on years of learning and learning setting. However, the possible role of additional factors will be addressed in the discussion. The first two sections of this paper have reviewed studies that presented different results for CLI and the possible factors that motivate them. The role of these factors is reflected in the major L2 speech models, which will be introduced in the next section, followed by a section reviewing the relationship between perception and production.

Some current models of cross-linguistic perception and L2 speech, such as the Perceptual Assimilation Model (PAM, Best, 1995; PAM-L2, Best & Tyler, 2007) and the Speech Learning Model (SLM; Flege, 1995, Flege, 2002, Flege, 2007; SLM-r: Flege & Bohn, 2021), posit that CLI may be motivated by the coexistence of the L1 and the L2 in a common phonetic space. PAM claims that non-native phones are perceived or categorized in terms of native categories (i.e., assimilated to L1 phones), or may be uncategorized or even heard as non-speech sounds (Best, 1995). As a consequence, in the initial stages of L2 acquisition, the interaction between the two systems, that is, the categorization of non-native sounds in terms of L1 categories, may block the creation of new categories for L2 phones (Flege, 1995, Best, 1995). However, L1 categories established in childhood may also experience changes as a consequence of L2 acquisition (Flege, 1987). The SLM, as well as its recently revised version SLM-r, describes two ways in which, as a consequence of experience with the L2 (or FTE, in SLM-r terms), L1 and L2 categories may interact, with possible consequences for existing L1 categories. L1 and L2 categories may drift toward one another by a process of category assimilation, that is, the development of an L2 category in the vicinity of an L1 category may result in a merged category that presents characteristics that are intermediate between the two languages (Flege, 1987, Flege, 2002). According to the nature of the linguistic input received, the merged category may present more L1-like or L2-like characteristics. Alternatively, the acquisition of an L2 category may result in a process of category dissimilation, that is, the new L2 and existing L1 category may drift away from one another in order to maintain a difference between them, resulting in L1 – and possibly L2 – categories that differ from monolingual speakers' categories (Flege, 2002). 
Exposure to the L2 may thus help the creation of a new category for the L2 phone, but this new category may affect existing L1 categories. Support for Flege’s hypotheses comes from studies that have found that L2 learners may perceive and produce L1 and L2 stops similarly with values that are intermediate between the two languages (e.g., Caramazza et al., 1973, Flege, 1987, Flege and Eefting, 1987b, Williams, 1977). For instance, the late French-English bilinguals living in an L2 English setting in Flege (1987) had a single category for French /t/ and English /t/ (that is, their L1 and L2 productions did not differ significantly) and this intermediate merged category had VOT values that were significantly different from the mean VOT for monolingual French and monolingual English speakers. Regarding category dissimilation, as discussed above, Flege and Eefting (1986, 1987b) reported that early Spanish-English bilinguals produced Spanish /t/ with shorter VOT values than Spanish native speakers, presumably in order to make their L1 and L2 categories more phonetically distinct.

Conversely, the Second Language Linguistic Perception Model (L2LP, Escudero, 2005, Escudero, 2009) rejects the idea of a common phonetic space and posits that bilingual speakers have separate L1 and L2 systems. Initially, the L2 system is an identical copy of the L1, which develops towards target-like values as phonetic information specific to the L2 becomes available to the learner. As a result, the L2 categories will gradually become distinct from those of the L1 and present more target-like values. Given that the two systems are separate, no direct interaction between the L1 and the L2 categories is expected, although intermediate perceptions – i.e., with values between the two languages – may take place. Escudero explains cases of intermediate perception in terms of Grosjean’s (2001) language mode hypothesis, which claims that the state of language activation ranges on a continuum from a purely monolingual to a purely bilingual mode. Thus, according to the L2LP, CLI is motivated by the simultaneous activation of the two parallel systems. The results of some previous research showing that L2 learners perceive similar L1 and L2 phones differently can be interpreted in L2LP’s terms as support for the existence of separate systems for the two languages (e.g., Casillas and Simonet, 2018, Escudero, 2005, Gonzales and Lotto, 2013). For example, Escudero (2005) tested her own hypothesis on the perception of /ɛ/-/æ/ – a contrast that the author reports exists both in Canadian English and Canadian French but which is implemented differently in terms of cue weighting – by Canadian L1-English learners of Canadian French in each language. It was found that the L2 learners perceived the contrast differently when it was presented in English as opposed to French, as listeners adjusted cue weighting – first formant (F1) and duration – to the language that was being tested. 
That is, the learners relied mostly on F1 in the French condition and on both F1 and duration in the English condition. This result was interpreted as evidence that L2 learners can establish new categories for the L2 that are separate from the L1 categories and that cue weighting can be adjusted according to language. Casillas and Simonet (2018) tested the perception of Spanish and English stops by simultaneous English-Spanish bilinguals in a study that carefully controlled language mode activation and found that bilinguals had separate L1 and L2 category boundaries for the /p/-/b/ contrast. Moreover, results also revealed that, after a short immersive experience – i.e., a seven-week intensive course –, beginning English learners of Spanish started to make a difference between L1 and L2 /p/-/b/ categorization. The data obtained in the present paper will be discussed in light of the SLM’s and the L2LP’s main premises.

Another theoretical issue that is addressed in the present study is the relationship between perception and production. Although it has been widely discussed in the literature, there is still no clear consensus regarding the nature of the relationship between the two modalities, that is, whether they develop in parallel or sequentially and, if so, in what order. In the case of L1 acquisition, links between perception and production characterize the process of L1 development (Kuhl et al., 2008). Some speech theories such as the Motor Theory (Liberman & Mattingly, 1985) and the Direct-Realist approach (Fowler, 1986, Fowler, 1990) posit that there is a link between the two dimensions, given that adults perceive speech and create phonetic categories based on articulatory gestures of their own speech. However, while many studies support a clear link between the two dimensions in the L1 (e.g., Brunner et al., 2011, Fox, 1982, Newman, 2003), a lack of a connection between the two modalities has also been reported (Bailey and Haggard, 1973, Shultz et al., 2012). For example, Newman (2003) reported that speakers that produced /p/ with the longest VOT values also showed a preference towards longer VOT values in a goodness rating task. Conversely, Shultz et al. (2012) found no link between the use of VOT and F0 in production and the weighting of these cues in an identification test. The difference between the outcomes of these two studies may be related to a difference in the perceptual measures used: a goodness measure of an already identified category vs. a measure of whether or not a given stimulus is identified as a given category. It is possible that a closer relationship between perception and production may be obtained with the former.

Regarding the L2, L2 speech theories such as the SLM (Flege, 1995) and the PAM-L2 (Best & Tyler, 2007) assume that there is a link between the two dimensions and that accurate L2 perception tends to precede accurate L2 production. The revised version of the SLM (Flege & Bohn, 2021), however, posits that perception and production co-evolve without precedence. It also claims that, even though there exists a strong bidirectional influence between the two modalities, their correspondence is never perfect, not even in the case of monolingual speakers. Empirical evidence, in fact, yields a variety of results. Some studies support the claim that perception precedes production (e.g., Bohn and Flege, 1990, Flege et al., 1999, Nagle, 2018), while others suggest that L2 learners may produce L2 phones accurately even if they are not able to perceive them in a target-like manner (e.g., Caramazza et al., 1973, Flege and Bohn, 1997, Flege and Eefting, 1987a, Trofimovich and John, 2011). For instance, Bohn and Flege (1990) found that L1-German L2-English speakers succeeded in distinguishing the /ɛ/-/æ/ contrast in an identification task, but they were not able to produce it accurately. Similarly, Nagle (2018) found that American English speakers learning Spanish in an instructional setting improved their identification of Spanish /p/ and /b/ to a greater extent than their production, which developed later in the course. Still, a greater amount of variability was observed in production than in perception, as some learners improved their production more rapidly than others. In other words, even though perception consistently preceded production, accuracy in perception and production did not develop at the same pace for all participants, resulting in different alignments between the two modalities for different learners throughout the process of L2 acquisition. 
Evidence of production development occurring prior to perception development has also been found (e.g., Caramazza et al., 1973, Flege and Eefting, 1987a, Trofimovich and John, 2011). Flege and Eefting (1987a) reported that Dutch learners of English produced the /t/-/d/ contrast similarly to English native speakers but were not able to discriminate it in a native-like manner. Similarly, Trofimovich and John (2011) reported that L1-Canadian French learners of English produced the /θ/-/t/ and /ð/-/d/ contrasts accurately, but showed difficulty in distinguishing them perceptually in an auditory priming task. The authors allude to the relative visual salience of the dentals and differences in word familiarity between the production stimuli and the perceptual stimuli among possible causes for the different results. They also report a notably greater variability in the L2 production of the English sounds than in native English production. In addition, it has been suggested that greater social or external pressure on production accuracy than on perceptual accuracy may explain a precedence of production over perception (Llisterri, 1995). For example, L2 speakers’ difficulty perceiving the English /θ/-/t/ and /ð/-/d/ contrasts may have a smaller impact on native speakers’ perception of L2 speech than an inaccurate production of the same contrasts.

Another finding reported in the L2 literature is the lack of a clear relationship between perception and production, especially regarding consonants (e.g., De Leeuw et al., 2019, Hattori and Iverson, 2010, Peperkamp and Bouchon, 2011, Sheldon and Strange, 1982). For example, Sheldon and Strange (1982) investigated the relationship between the perception and production of the /r/-/l/ contrast by Japanese learners of English. They found that a subset of the participants were more successful at producing the contrast than at perceiving it, whereas the opposite was true for other participants. De Leeuw et al.’s (2019) study on phonotactic constraints also indicated that the two modalities may develop – at least to a certain extent – independently, as no relationship was found between the accuracy with which Spanish learners of English discriminated word-initial /esp-/ and /sp-/ sequences and the accuracy with which they produced them. Moreover, an individual inspection of the data revealed that some participants performed more accurately in the perception task, whereas others were more accurate in production. Further, variables like L2 use and L2 proficiency were associated with a more target-like production, but these variables did not predict discrimination accuracy.

The inconsistent results regarding the development of production and perception may be linked to cross-study differences in the tasks and measures used, the populations and target sounds tested, and the degree to which perception and production measures are comparable (Llisterri, 1995, Mack, 1989). Although cross-study differences make it difficult to pinpoint the effect of variables such as L2 experience, AOL and L2 proficiency, it has been found that experienced learners may present a greater alignment between the two dimensions than inexperienced learners (Bohn & Flege, 1990) and that factors such as L2 use and L2 proficiency may have a clearer effect on production than on perception (De Leeuw et al., 2019). In addition, a stronger perception-production link has been reported for vowels (e.g., Bohn & Flege, 1990) than for consonants, for which often no clear link was established or, contrary to the SLM’s predictions, production appeared to precede perception (e.g., Sheldon and Strange, 1982, Williams, 1977). The degree of similarity between the L1 and the L2 phones may also influence the relationship between perception and production. For instance, Levy and Law (2010) found that /y/-/œ/ – two new phones – presented a clearer relationship between the two modalities than /u/-/y/ – a similar and a new phone, respectively. Another possible explanation for this apparent lack of consistency across studies may be related to the SLM-r’s theoretical claim that the relationship between the two dimensions is never perfect, in either bilingual or monolingual speakers. The present paper attempts to evaluate this relationship further by testing the perception and production of L1 and L2 stops by English learners of Spanish differing mainly in amount of L2 experience.

The present study examines the perception and production of L1 and L2 stops by English learners of Spanish differing in amount and type of L2 experience, that is, years of learning and experience living in the L2 setting. Thus, the goals of the present study are (1) to analyze the effect of amount and type of experience on L2 speakers’ production of L1 and L2 voiceless stops; (2) to investigate the influence of L2 experience on L1 and L2 perception of voiceless stops; and (3) to examine the relationship between perception and production in L1 and L2 stops, also in light of differences in type and amount of L2 experience. The methodological contribution of this study, thus, stems from the fact that (1) both the perception and the production of stops by L2 learners are evaluated, (2) the learners’ performance in both their L1 and their L2 is contrasted, and (3) the performance of the learners is compared to that of monolingual speakers of the L1 and of the L2. To our knowledge, only Gorba (2019) has investigated the effect of experience in an L2 setting on the perception of both L2 and L1 stops.

Considering the findings of previous studies (e.g., Flege, 1987, Sancier and Fowler, 1997), it is hypothesized that participants with more years of experience, particularly in an L2 setting, will present more target-like VOT values in the L2 and will be more likely to show instances of L1 phonetic drift towards the L2. This assumption is in agreement with the SLM/SLM-r’s claim that CLI is bidirectional. Moreover, if the SLM’s claims are supported, we should expect that the learners with less experience in the L2 setting will present a single category for the two languages, given the similarity of the L1 and L2 phones, whereas the most experienced learners may have received enough exposure to the L2 to present separate L1 and L2 categories. By contrast, the L2LP would hypothesize that all learners – regardless of their experience in the L2 – would present separate L1 and L2 categories, although the L2 categories will resemble the L1’s to a greater extent in the case of less experienced learners, as their L2 is still evolving towards L2-like values. Note, however, that the two models may be able to explain similar scenarios, as both of them can account for realizations that are intermediate between L1 and L2 values either in terms of the simultaneous activation of both the L1 and the L2 when language mode is not controlled (L2LP) or due to the process of category assimilation (i.e., merger hypothesis, SLM). Regarding the relationship between the two modalities in the L1, according to the Motor Theory (Liberman & Mattingly, 1985) or the Direct-Realist approach (Fowler, 1986, Fowler, 1990), a straightforward link should exist between the two dimensions, given that speakers base their perception on their own articulatory gestures.
As for the L2, if production difficulty results from inaccurate perception, we may expect greater accuracy in production the more target-like the perception is (Flege, 1995, Flege, 2002), and a more target-like performance in perception than in production. Alternatively, the two modalities may show neither a precedence relationship nor a perfect alignment, as this may not be the case even regarding monolingual speakers (Flege & Bohn, 2021).

The remainder of this paper presents the results of two experiments. Three groups of L1-English L2-Spanish speakers completed a production and a perception experiment under two conditions, namely in the L1 and in the L2. A group of functional monolinguals of each language also completed the experiments. Experiment 1 assessed the production of /p/ and /k/ in each language. Experiment 2 consisted of two forced-choice perception tests – one for the /p/-/b/ contrast and one for the /k/-/g/ contrast – which were presented in the two languages.
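
In forced-choice identification tests of this kind, a listener's perceptual category boundary is often taken to be the point along the VOT continuum at which responses cross 50%. The sketch below illustrates one simple way such a boundary could be estimated, namely linear interpolation between the two continuum steps that straddle the criterion. It is not necessarily the procedure used in this study (probit or logistic fits are common alternatives), and the function name, the VOT step values and the response proportions are all hypothetical, invented for illustration only.

```python
def crossover_boundary(vot_ms, prop_voiceless, criterion=0.5):
    """Return the VOT (ms) at which identification crosses `criterion`.

    Linearly interpolates between the first pair of adjacent continuum
    steps whose voiceless-response proportions straddle the criterion.
    """
    if len(vot_ms) != len(prop_voiceless):
        raise ValueError("vot_ms and prop_voiceless must have equal length")
    pairs = list(zip(vot_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 < criterion <= p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("responses never cross the criterion")


# Hypothetical 17-step continuum; step values and proportions are invented.
vot_steps = [-60 + 10 * i for i in range(17)]            # -60 ... 100 ms
p_responses = [0.0] * 7 + [0.1, 0.3, 0.7, 0.9] + [1.0] * 6
boundary = crossover_boundary(vot_steps, p_responses)
print(f"Estimated boundary: {boundary:.1f} ms VOT")
```

A boundary estimated in this way for each listener and language could then be set against that listener's mean production VOT, which is the kind of individual-level comparison that an analysis of the perception-production relationship requires.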

Section snippets

Participants

A total of 51 participants completed the study, namely 41 young adult English learners of Spanish and 10 Spanish speakers (24 females (F) and 27 males (M), in their twenties and thirties). They included two groups of monolingual speakers (monolingual Spanish and monolingual English speakers) who acted as control groups, and three groups of English learners of Spanish. Two of these groups were living in their home country and differed in amount of L2 experience, namely the number of years

Participants

Participants in this experiment were the same as the ones described in Experiment 1 (see Section 2.1.1).

Stimuli

A /p/-/b/ and a /k/-/g/ VOT continuum with 17 stimuli each were created to test the identification of the two stop contrasts by English learners of Spanish. The vowel /i/ was selected to follow the stops because it has been found to be one of the most perceptually equivalent vowels in the two languages under study (Cebrian, 2019). Hence, first, a highly proficient L1-Spanish L2-English male

Relationship between perception and production

The results of Experiment 1 and Experiment 2 were compared in order to evaluate the relationship between production and perception of stops in the L1 and in the L2. First, an inspection of each individual’s performance in the L1 perception and production tasks was carried out. In other words, an individual’s mean VOT values for /p/ and for /k/ were compared to their perceptual category boundary for /p/-/b/ and /k/-/g/, respectively, both in the L1 and in the L2 (see the scatter plots presented

General discussion

The goal of this paper was to investigate the effect of L2 experience on the production and perception of L1 and L2 bilabial and velar voiceless stops. Thirty-two English learners of Spanish differing in amount and type of L2 experience and two groups of monolingual speakers were evaluated. The L2 Spanish learners included two groups of learners in an instructional setting in the UK (a group of advanced fourth-year undergraduate students of Spanish with some previous experience in an immersion

Conclusions

The present study has investigated the L1 and L2 perception and production of voiceless stops by English learners of Spanish differing in amount and type of L2 experience. Participants completed a perception and a production experiment in each language. The production experiment involved a sentence reading task including the target phones in initial position. As for perception, participants had to complete two identification tasks – one for /p/-/b/ and one for /k/-/g/. The production experiment

CRediT authorship contribution statement

Celia Gorba: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Project administration, Funding acquisition. Juli Cebrian: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration, Funding acquisition.

Acknowledgements

This work was supported by research grants from the Spanish Ministry of Economy and Competitiveness [Grant No. FFI2017-88016-P] and from the Catalan Government [Grant No. 2017SGR34].

References (85)

  • J.E. Flege. The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics (1987)
  • J.E. Flege et al. Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics (1997)
  • J.E. Flege et al. Cross-language switching in stop consonant perception and production by Dutch speakers of English. Speech Communication (1987)
  • J.E. Flege et al. Production and perception of English stops by native Spanish speakers. Journal of Phonetics (1987)
  • C.A. Fowler. An event approach to the study of speech perception from a direct–realist perspective. Journal of Phonetics (1986)
  • C.A. Fowler. Calling a mirage a mirage: Direct perception of speech produced without a tongue. Journal of Phonetics (1990)
  • N. Kartushina et al. Mutual influences between native and non-native vowels in production: Evidence from short-term visual articulatory feedback training. Journal of Phonetics (2016)
  • S. Lev-Ari et al. The influence of inhibitory skill on phonological representations in production and perception. Journal of Phonetics (2014)
  • A.M. Liberman et al. The motor theory of speech perception revised. Cognition (1985)
  • T. Piske et al. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics (2001)
  • M.L. Sancier et al. Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics (1997)
  • S.J. Tobin et al. Phonetic drift in Spanish-English bilinguals: Experiment and a self-organizing model. Journal of Phonetics (2017)
  • S. Ahn et al. Age effects in first language attrition: Speech perception by Korean-English bilinguals. Language Learning (2017)
  • C. Aliaga-García et al. Assessing the effects of phonetic training on L2 sound perception and production
  • K. Aoyama et al. Effects of L2 experience on perception of English /r/ and /l/ by native Japanese speakers. Journal of the Phonetic Society of Japan (2011)
  • P.J. Bailey et al. Perception and production: Some correlations on voicing of an initial stop. Language and Speech (1973)
  • C.T. Best. A direct realist view of cross-language speech perception. Speech perception and linguistic experience
  • C.T. Best et al. Nonnative and second-language speech perception
  • D. Birdsong et al. Bilingual language profile: An easy-to-use instrument to assess bilingualism (2012)
  • Boersma, P., & Weenink, D. (2016). Praat: Doing Phonetics by computer [Computer software]. Version 6.0.19. Retrieved...
  • O.S. Bohn et al. Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics (1990)
  • J. Brunner et al. The influence of auditory acuity on acoustic variability and the use of motor equivalence during adaptation to a perturbation. Journal of Speech, Language, and Hearing Research (2011)
  • A. Caramazza et al. The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America (1973)
  • M.T. Carlson et al. Navigating conflicting phonotactic constraints in bilingual speech perception. Bilingualism (2016)
  • J.V. Casillas. Phonetic category formation is perceptually driven during the early stages of adult L2 development. Language and Speech (2020)
  • M.L. Castañeda. El VOT de las oclusivas sordas y sonoras españolas. Estudios de Fonética Experimental (1986)
  • J. Cebrian. Perceptual assimilation of British English vowels to Spanish monophthongs and diphthongs. The Journal of the Acoustical Society of America (2019)
  • E. De Leeuw et al. Illusory vowels in Spanish-English sequential bilinguals: Evidence that accurate L2 perception is neither necessary nor sufficient for accurate L2 production. Second Language Research (2019)
  • G.J. Docherty. The timing of voicing in British English obstruents (1992)
  • Escudero, P. (2005). Linguistic perception and second language acquisition: Explaining the attainment of optimal...
  • P. Escudero. Linguistic perception of “similar” L2 sounds. Phonology in Perception (2009)
  • J.E. Flege. Age of learning affects the authenticity of voice-onset time (VOT) in stop consonants produced in a second language. Journal of the Acoustical Society of America (1991)