Research Article
Extreme stop allophony in Mixtec spontaneous speech: Data, word prosody, and modelling

https://doi.org/10.1016/j.wocn.2022.101147

Highlights

  • We examine lenition of obstruents in a spontaneous speech corpus of Yoloxóchitl Mixtec (ISO xty), a language with fixed stem-final stress and a complex tonal inventory.

  • Measures of duration, voicing during constriction, and allophone type were examined.

  • Onset obstruents of unstressed syllables are lenited more than onset obstruents of stem-final stressed syllables.

  • Duration is a strong predictor of the degree of lenition, but the functional status of the morpheme is also important.

  • Modeling of the reduced allophones with deep neural networks resulted in high accuracy in the detection of stop closure (>95%), and fairly high (>70%) accuracy in detecting highly frequent reduced allophones.

Abstract

Word-level prosody plays an important role in processes of consonant lenition. Typically, consonants in word-initial position are strengthened while those in word-medial position are lenited (Keating, Cho, Fougeron, & Hsu, 2003). In this paper we examine the relationship between word-prosodic position and obstruent lenition in a spontaneous speech corpus of Yoloxóchitl Mixtec, an endangered Mixtecan language spoken in Mexico. The language exhibits a surprising amount of lenition in the realization of otherwise voiceless unaspirated stops and voiceless fricatives in careful speech. In Experiment 1, we examine the relationships between word position, consonant duration, and passive voicing and find that word-medial pre-tonic position is the locus of both consonant lengthening and less passive voicing. Non-pre-tonic consonants are produced with more voicing and shorter duration. We also find that the functional status of the morpheme plays a role in voicing lenition. In Experiment 2, we examine manner lenition and find a similar pattern – word-medial pre-tonic stops are more often realized with complete closure relative to non-pre-tonic stops, which are more often realized with incomplete closure. In Experiment 3, we model these lenition patterns using a series of deep neural networks and find that, even with limited training data, we can achieve reasonably high accuracy in the automatic categorization of lenition patterns. The results of this research both complement recent work on the phonetics of lenition in the world’s languages (Katz and Fricke, 2018; White et al., 2020) and provide computational tools for modeling and predicting patterns of extreme lenition.

Introduction

Speech requires speakers to carefully control the timing of different articulatory gestures while simultaneously conveying information to listeners at a sufficient rate. In running speech these constraints compete with each other and speakers may lenite certain speech sounds. Distinct from predictable allophonic rules, like voiceless stop aspiration in English (Lisker and Abramson, 1964, Kingston et al., 2008), are the variable phonetic patterns which occur in reduced, running speech. For instance, voiceless stops can be produced with a variable amount of voicing, called voicing bleed, when preceded by a voiced segment, e.g. in English (Davidson, 2018, Westbury and Keating, 1986), in French & Spanish (Torreira & Ernestus, 2011), and in Triqui (DiCanio, 2012). These patterns are variable in each of these languages and ubiquitous in connected speech.

The rates and types of variable lenition vary by speech context (Hualde et al., 2011, Lewis, 2001, Torreira and Ernestus, 2011, Warner and Tucker, 2011) and by language (Shih, Möbius, & Narasimhan, 1999) (see Section 1.1 for a discussion of patterns across languages). This is true even for what is described as the same phonological category across languages, e.g. voiceless unaspirated stops (ibid). This broadly suggests that lenition does not simply result from general human constraints on articulatory-motor control, but rather that it is sensitive to differences between languages. Thus, the primary theoretical question is just which specific linguistic factors contribute to differences in the rate of lenition. One known factor is prosody – segments which occur in stressed syllables are less likely to be lenited than those which occur in unstressed syllables (Bouavichith and Davidson, 2013, Lavoie, 2001). Given that languages show great diversity in terms of prosodic structure (c.f. Gordon (2016)), one anticipates that stress-related differences contribute to differences in consonantal reduction.

The current study examines how word position and word prosody influence patterns of consonantal lenition in a corpus of spontaneous speech in Yoloxóchitl Mixtec, an endangered Otomanguean language spoken in Mexico (García, 2007, Palancar et al., 2016). By “prosody”, we refer to word-level stress differences. Yoloxóchitl Mixtec has a small obstruent inventory (/p, t, k, kʷ, s, ʃh, ʧ/) and does not contrast voicing within obstruents; all obstruents are voiceless in citation or in controlled experimental contexts (DiCanio, Zhang, Whalen, & Castillo García, 2020). However, these obstruents undergo extensive patterns of lenition in running speech, where they may be fully or partially voiced and the stops may also be realized as frictionless continuants.

In Experiment 1, we examine voicing lenition in obstruents in relation to word size and prosodic structure. We find that the duration of voicing bleed during the obstruent correlates with the overall obstruent duration, which itself correlates with stress. Fixed stress occurs in stem-final syllables in Yoloxóchitl Mixtec (DiCanio et al., 2018, DiCanio et al., 2020). Obstruents in the onset of stem-final syllables are both phonetically longer and less likely to be voiced than obstruents that occur in non-final syllables, including word-initial position. In Experiment 2, we categorize each of the stops into a number of discrete allophonic groupings in order to examine the degree of spirantization or lenition (full closure with no voicing, full closure with partial voicing, fricative realization, approximant realization, etc.). Approximantization (c.f. Bouavichith and Davidson, 2013, Ladd and Scobbie, 2003) is pervasive among stops in the language, occurring in 26% of all stem-final syllables and 45% of all stem non-final syllables. We describe the rates for each lenition type. We then trained several deep neural networks to predict allophonic grouping from the acoustic signal. Two-way DNN models trained on the stop/non-stop ([±continuant]) categories showed 95–98% accuracy in detecting spirantized/non-spirantized allophones. Three-way and four-way models showed 74–88% accuracy in detecting additional allophonic realizations (voiceless vs. voiced stop vs. fricative) but accuracy in detecting fricative vs. approximant realizations of the stops remained low (54–57%). The findings here both argue for a structural prosodic motivation for patterns of lenition in an under-resourced and endangered language and provide positive evidence that computational tools can categorize surface acoustic–phonetic variation related to lenition in a non-controlled spontaneous speech corpus.
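The two-way grouping can be thought of as a coarser view of the same allophone labels used in the finer-grained models. The Python sketch below only illustrates that label collapsing together with a plain accuracy score. The label strings, the mapping, and the toy gold/predicted sequences are assumptions made for illustration; they are not the networks, label set, or data of the study.

```python
# Hypothetical fine-grained allophone labels mapped onto the two-way
# stop / non-stop ([±continuant]) grouping. The study's label set may differ.
TO_TWO_WAY = {
    "voiceless_stop": "stop",
    "voiced_stop": "stop",
    "fricative": "non-stop",
    "approximant": "non-stop",
}


def collapse(labels, mapping=TO_TWO_WAY):
    """Map each fine-grained label onto its coarser category."""
    return [mapping[label] for label in labels]


def accuracy(gold, predicted):
    """Proportion of tokens whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy data (invented): the errors are confusions within the stop class
# or within the non-stop class.
gold = ["voiceless_stop", "voiced_stop", "fricative", "approximant", "approximant"]
pred = ["voiceless_stop", "voiceless_stop", "approximant", "approximant", "fricative"]

fine_acc = accuracy(gold, pred)
coarse_acc = accuracy(collapse(gold), collapse(pred))
print(f"four-way accuracy: {fine_acc:.2f}")
print(f"two-way accuracy:  {coarse_acc:.2f}")
```

In this toy case every error is a confusion inside one of the two coarse classes, so the two-way score is higher than the four-way score. This is one way in which a coarser grouping can yield higher accuracy than a finer one.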

Obstruent reduction refers to the process where, relative to carefully-produced speech, an obstruent is produced with reduced spatial excursion of articulators and reduced constriction degree. Obstruents produced with this type of articulatory undershoot are also shorter in duration relative to carefully-produced variants (Lavoie, 2001, Parrell, 2014, Parrell and Narayanan, 2018) and voiceless obstruents may undergo an additional process of passive voicing or voicing bleed (Beckman et al., 2013, Davidson, 2018, DiCanio, 2012, Jansen, 2004, Schwarz et al., 2019, Stevens, 2000, Westbury, 1983, Westbury and Keating, 1986). As a consequence of this reduction, the target obstruent may be realized with rather different acoustic cues than in carefully-produced speech and be less perceptually distinct from adjacent speech sounds.
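One simple way to quantify passive voicing is the share of the constriction interval that is voiced, alongside the duration of the voiced stretch that continues from the preceding segment. The Python sketch below assumes frame-level voicing decisions (for instance from a pitch tracker) taken at a fixed frame step. The function names, the 5 ms step, and the toy frames are hypothetical and are not the measurement procedure of the study.

```python
def voicing_proportion(voiced_frames):
    """Fraction of frames within the constriction that are voiced."""
    if not voiced_frames:
        raise ValueError("constriction interval contains no frames")
    return sum(bool(v) for v in voiced_frames) / len(voiced_frames)


def voicing_bleed_ms(voiced_frames, frame_step_ms=5.0):
    """Duration of the voiced run at the left edge of the constriction,
    i.e. voicing carried over from the preceding voiced segment."""
    run = 0
    for v in voiced_frames:
        if not v:
            break
        run += 1
    return run * frame_step_ms


# Toy constriction of 10 frames (50 ms at a 5 ms step):
# voicing persists through the first 4 frames, then stops.
frames = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
proportion = voicing_proportion(frames)
bleed = voicing_bleed_ms(frames)
print(f"proportion voiced: {proportion:.2f}, bleed: {bleed:.0f} ms")
```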

Reduction (or variable lenition) is ubiquitous in human speech and typical even in carefully-produced speech contexts (Lavoie, 2001, Warner and Tucker, 2011, Warner, 2019). As a result, listeners are regularly exposed to speech with different acoustic cues than one observes in idealized contexts. At the same time, reduction frequently coincides with weak prosodic positions (unstressed syllables; word-internal, phrase-medial, and intervocalic contexts) whereas obstruent fortition frequently coincides with strong prosodic positions (stressed syllables; word-initial, phrase-initial, and phrase-final contexts) (Bouavichith and Davidson, 2013, Cho and Keating, 2001, Fougeron and Keating, 1997, Katz, 2016, Keating et al., 2004, Katz and Fricke, 2018). Though reduction often reduces distinctness of a segment relative to its neighbors, the relationship between reduction and prosodic boundaries nevertheless aids the listener in lexical segmentation. Listeners are able to use the degree of reduction to make decisions about word boundaries (Katz & Fricke, 2018) just as they are able to use more general prosodic cues for the purpose of word segmentation (Saffran et al., 1996, White et al., 2015).

Which prosodic positions are sensitive to patterns of reduction? A general finding throughout the literature is that stops are less likely to be reduced in phrase-initial and word-initial position than in phrase-medial or word-medial position in a variety of languages, such as Bardi, English, French, Italian, Spanish, Korean, Hungarian, and Taiwanese (Cho and Keating, 2001, Davidson, 2018, Fougeron and Keating, 1997, Kakadelis, 2018, Katz, 2016, Katz and Fricke, 2018, Keating et al., 2004, Lavoie, 2001, Lewis, 2001, White et al., 2020, Jun, 1995). Phrase and word-level delimitation are in fact considered to be primary goals in processes of lenition (Cho et al., 2007, Katz, 2016, Katz and Fricke, 2018); word-medial segments are lenited whereas word-initial ones are not. This process would appear to be listener-driven; if listeners rely on word-initial segments for lexical parsing, such segments ought to be hyperarticulated for the listener. Models of spoken word recognition, such as Shortlist B, even encode the relative importance of word-initial segments directly (Norris & McQueen, 2008).

Katz and Fricke (2018) distinguish between two separate phenomena which are often discussed as leniting processes, voicing lenition and spirantization. Though these two processes often co-occur (Gurevich, 2011), research on reduction and lenition often focuses on just one of them. Katz and Fricke find that only spirantization (or, more often, approximantization) aids in word segmentation in an artificial language learning task. In their study, voicing lenition is the discrete, allophonic process whereby a voiceless obstruent is produced with full voicing in intervocalic position. However, this process has its phonetic precursors in gradient phonetic processes of voicing bleed, where the voicing from a preceding segment may spread into an obstruent (Hualde et al., 2011). Similarly, categorical spirantization has its phonetic precursors in processes of articulatory undershoot.

Yet, there are languages whose patterns might challenge the view that fortition/lenition is driven by the necessity to maintain clear word onsets. In languages with patterns of initial consonant mutation and/or processes of prefixation, knowing the initial consonant does not aid the listener in ascertaining lexical identity (Ussishkin et al., 2017). It is an open question as to whether prefix-heavy languages still undergo the same type of word-initial strengthening observed for languages like English, Spanish, and Korean, where suffixation is more common than prefixation. This question is relevant to the current study since the language under investigation, Yoloxóchitl Mixtec, is prefixal with fixed, stem-final stress.

Another context where obstruent reduction is found is in unstressed syllables. In a study analyzing a spontaneous speech corpus of English, Bouavichith and Davidson (2013) find that voiced stops (/b, d, g/) are produced as approximants [β̞̞, ɾ, ɣ̞̞] between 10–21% of the time (depending on the place of articulation) in the onset of stressed syllables but 47–75% of the time in the onset of unstressed syllables. Durational differences also accompanied stress - obstruents in the onset of stressed syllables were longer (39–66 ms) than those in the onset of unstressed syllables (21–43 ms). Yet, the pattern for voiceless obstruents in English is somewhat different; stress does not influence the degree of passive voicing (a type of lenition) in a spontaneous speech corpus (Davidson, 2018), though it is unclear whether voiceless obstruents are spirantized more often in unstressed contexts. In a study of spontaneous speech in Spanish and French, incomplete closure of voiceless stops (/p, t, k/) was found to be more common in unstressed syllables than in stressed syllables (Torreira & Ernestus, 2011). Closure duration was also found to be significantly shorter in stops in unstressed syllables. Similar findings on Central Colombian and Bilbao (Castilian) Spanish are reported in work by Lewis (2001), where voiceless unaspirated stops are shorter and have more closure voicing in the onsets of unstressed syllables than in stressed syllables. Stress induces a similar pattern of weakening/shortening of /s/ in a corpus of Madrileño Spanish speakers (Torreira & Ernestus, 2012).

Despite its close connection to prosodic structure, predicting the context where lenition will occur in a given language from a phonological representation also remains a challenge. For instance, not all voiced obstruents in a given language may lenite in a similar fashion or even in the same way in the same prosodic context (Jun, 1995). Moreover, languages differ substantially in both the magnitude and type of lenition which may occur. When we take this cross-linguistic variability seriously, there are still largely unanswered empirical questions about the relationship between lenition and other linguistic factors.

For instance, reduction also frequently coincides with changes in speech rate and speech style/register. Grosso modo, obstruents produced at a faster speech rate will be lenited relative to a slower speech rate (Cheng and Xu, 2015, Dalby, 1984, DiCanio, 2012). In a study of Itunyoso Triqui lenis (singleton) and fortis (geminate) stops, DiCanio (2012) finds variable lenition among singleton stops and affricates which correlates both with speech rate differences across speakers and inherent duration of the obstruent. The singleton post-alveolar affricate had the shortest closure duration among all obstruents and was most likely to undergo lenition to [ʃ]. Words and segments also have reduced duration in spontaneous speech relative to careful speech (Baker and Bradlow, 2009, DiCanio et al., 2015, DiCanio and Whalen, 2015, Laan, 1997, Moon and Lindblom, 1994, Warner and Tucker, 2011) and reduced duration in high frequency lexical items relative to low frequency items (Gahl, 2008, Gahl et al., 2012). For instance, in a study on voiceless stops in Bilbao (Castilian) and Central Colombian Spanish, Lewis (2001) finds greater rates of reduction (voicing during closure, shortening of closure duration) in conversational speech than in read passages or wordlist recordings. Obstruent reduction is more common in both spontaneous speech and in high frequency lexical items.

One of the principal articulatory/acoustic components tying all of these effects together is duration. Cross-linguistically, duration is one of the strongest correlates of stress (Gordon, 2016), though the prosodic unit onto which stress manifests can vary (foot, onset, vowel, rime). In languages possessing a geminate-singleton contrast, singleton obstruents may undergo processes of variable spirantization or passive voicing whereas geminates will not (DiCanio, 2012, Hualde and Nadeu, 2011, Stevens and Hajek, 2004). The findings showing greater rates of reduction at faster speech rates also suggest that duration is the primary factor predicting obstruent reduction and articulatory undershoot.

The relationship between duration and reduction is a design feature in articulatory phonology (Browman and Goldstein, 1990, Byrd and Saltzman, 2003, Byrd and Tan, 1996, Parrell, 2014, Parrell and Narayanan, 2018). If articulatory gestures have target durations in individual languages, one anticipates that obstruents shortened by prosodic or paralinguistic factors will undergo gestural undershoot in production. In a study examining lenition in English and Spanish using MRI, Parrell and Narayanan (2018) find that the patterns of Spanish coronal spirantization ([d - ð̞]) and English coronal flapping ([d/t - ɾ]) are gradient and predictable by duration. As obstruent duration decreases, speakers are continuously less likely to achieve full tongue tip to palatal contact – even though there may still be gestural evidence for tongue movement. In a study examining non-nasal velar lenition in Iwaidja (Iwaidjan: Australia), Shaw et al. (2020) find a close correlation between degree of constriction for the velar ([ɰ] vs. [a]) as measured via ultrasound and acoustics and the overall duration of the tongue body movement.

Studies examining variation in reduction in spontaneous speech additionally confirm these findings. Madrileño Spanish voiceless stops are shown to have shorter closure duration than Continental French stops and it is the Madrileño Spanish stops which undergo greater spirantization (Torreira & Ernestus, 2011). Speakers of Central Colombian Spanish have longer closure duration for voiceless stops than Bilbao Spanish speakers do and voiceless stops are more often lenited and (partially) voiced in the latter group (Lewis, 2001). In a recent study of Campidanese Sardinian, Katz and Pitzanti (2019) find that duration accounts for most of the patterns of lenition among obstruents. Categorical features related to the prosodic hierarchy are nevertheless still useful for characterizing the lenition patterns – increased duration, drops in intensity, and more abrupt releases occur at successively higher positions in the prosodic hierarchy above the word. In a study of lenition patterns in English using the Buckeye corpus (Pitt, Johnson, Hume, Kiesling, & Raymond, 2005), Priva et al. (2020) find that durational changes were specifically causal in predicting patterns of lenition - more so than other linguistic variables like stress position and the informational content of the target word/phrase.

In her dissertation examining languages lacking a voicing contrast Kakadelis (2018) finds much higher rates of spirantization and passive voicing in languages like Bardi (Western Nyulnyulan) than in languages like Arapaho (Algonquian) and Sierra Norte de Puebla Nahuatl (Uto-Aztecan). In Bardi, stops /p, t, k/ have an average duration of 45–48 ms where 35% of stops are realized without closure and voicing often persists throughout the entire closure (87–95% of closure). In Nahuatl, stops /p, t, k/ have an average duration of 67–84 ms, 10.7% of stops are spirantized, and voicing persists through 63–77% of the closure. In Arapaho, stops /b, t, k/ have an average duration of 79–138 ms, only 10% of stops are spirantized, and (for /t, k/) voicing persists through 36–45% of the closure. There is a close relationship between stop duration and patterns of lenition. If a language has shorter stops (due to inherent duration targets, speech rate, word size, etc.), it will have more lenited stops. Moreover, even within each of the languages in Kakadelis’ study, the stops with the shortest average duration underwent spirantization at a greater rate than stops with longer durations. If a stop tends to have a shorter target duration in a given language, that stop will be more likely to undergo processes of lenition.

The findings in Kakadelis (2018) are additionally relevant because they suggest that patterns of lenition can vary even in languages lacking a phonological contrast in voicing within the oral stop series. Languages possessing an obstruent voicing or aspiration contrast may limit the degree of voicing bleed in either voiceless or aspirated stops. That is, the necessity to maintain a voicing/aspiration contrast in the language may limit the degree of voicing in leniting contexts. This idea finds support in cross-linguistic, typological studies demonstrating that patterns of categorical voicing lenition rarely result in contrast neutralization (Gurevich, 2004, Gurevich, 2011). Another, so far unexamined hypothesis is the idea that in languages possessing a contrast in continuancy in the obstruent series (stop vs. fricative), stops might be less prone to spirantization. Both hypotheses are based on the more general hypothesis that phonological contrast preservation is an active force that influences surface phonetic variation within a given language (c.f. Keyser and Stevens, 2006, Lindblom, 1990). Yet, as noted above, reduction is more common in voiced stops in English than in voiceless stops. At first glance this would seem to be related to contrast, but note that voiced stops have shorter closure duration than voiceless stops in English and listeners can use this cue in perception (Lisker, 1957, Lisker, 1986). Thus, the presence of a contrast might only constrain the distribution of durational values for a given stop.

The current study examines patterns of variable lenition in a spontaneous speech corpus of Yoloxóchitl Mixtec with three scientific questions in mind. First, prosodic boundaries have a strong influence on consonant duration. This predicts that obstruents in word-initial position should be less likely to undergo variable lenition than obstruents in word-medial position. Second, stress also influences consonant duration. This predicts that obstruents in the onset of stressed syllables should be less likely to undergo variable lenition than obstruents in the onset of unstressed syllables. Third, inherent durational differences for obstruents typically correlate with variable patterns of lenition. This predicts that duration will be closely correlated with rates of spirantization and voicing lenition. While experiments 1 and 2 investigate these scientific questions, experiment 3 models the discrete allophonic variants examined in experiment 2 using deep neural networks. The motivation for experiment 3 is to determine whether surface phonetic variants observed in a reasonably small corpus of spontaneous speech can be accurately predicted in a computational model.

Yoloxóchitl Mixtec [joloˈsotʃi͡tl̥ ˈmistεk] is an Oto-Manguean (Mixtecan:Mixtec) language spoken in the towns of Yoloxóchitl, Cuanacaxtitlán, Buena Vista, and Arroyo Cumiapa in Guerrero, Mexico (García, 2007). The name “Mixtec” does not refer to the language itself, but to an ethnolinguistic grouping and language family comprising approximately twelve pan-dialectal regions and between 50–60 language varieties (c.f. DiCanio et al., 2020, Josserand, 1983). For the most part, languages spoken across pan-dialectal regions are not mutually intelligible. For instance, there is reasonably good mutual intelligibility across most of the Guerrero Mixtec languages (García, 2007), but only approximately 30% mutual intelligibility between the Guerrero and Southern Baja pan-dialectal regions (Lewis, Simons, & Fennig, 2015). There are approximately 4,000 speakers remaining, though many younger speakers are more dominant in Spanish than in Yoloxóchitl Mixtec.

Yoloxóchitl Mixtec possesses a complex lexical tone system consisting of nine distinct tones which are moraically-aligned (García, 2007, DiCanio et al., 2014, DiCanio et al., 2018, Palancar et al., 2016). Lexical roots are minimally bimoraic (CVː or CVCV) but longer words are possible with both prefixation (marking negation, tense, and aspect) and with enclitic morphology, which marks person on verbs or possessors on nouns (Palancar et al., 2016). Certain words may consist of a single mora, but these are entirely functional particles, i.e. adverbials like /ka¹/ ‘still’ and /ha¹⁴³/; or pre-nominal classifiers like /ja¹/ ‘that (one).’ Syllables are obligatorily open (CV) and glottalization is contrastive on lexical roots, occurring between moras in disyllabic roots (CVʔCV) and monosyllabic roots (CVʔV), e.g. [nde³e³] ‘flipped’ vs. [nde³ʔe³] ‘ground bean.’

There are five phonemic vowel qualities and vowel nasalization is also contrastive, e.g. /i, e, a, o, u, ı̃, ẽ, ã, õ, ũ/. Previous research examining vowel production in a corpus of elicited and spontaneous speech found significant effects of speech style and duration on vowel production (DiCanio et al., 2015). Vowels are reduced in spontaneous speech relative to elicited speech but these contexts also significantly influenced overall vowel duration. The consonant inventory is relatively small, consisting of just fourteen contrastive consonants /p, t̪, k, kʷ, g, ʧ, m, n, ɾ, s̪, ʃh, β, j, l/. Prenasalized stops (/b, d/) are allophones of nasal consonants which surface before oral vowels (DiCanio et al., 2020). The fricatives [ʃ] and [h] are in free variation, but the distribution of each allophone is rather different. Out of 733 examples of [h] in the corpus (see Section 2.1), 98.6% (723/733) occurred in word-initial position. Out of 768 examples of [ʃ] in the corpus, 38.5% (296/768) occurred in word-initial position and 61.5% (472/768) occurred in word-medial position. The glottal fricative is almost never found in word-medial position and therefore it rarely occurs in the onset of the stressed syllable. This is unexpected given standard assumptions about where we expect debuccalization to occur (in word-medial position).
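The positional percentages for [h] and [ʃ] follow directly from the reported counts. As a minimal arithmetic check in Python (the variable and function names are illustrative and not from the original study):

```python
# Token counts reported in the text for the two fricative variants.
counts = {
    "h": {"initial": 723, "total": 733},
    "ʃ": {"initial": 296, "total": 768},
}


def positional_shares(initial, total):
    """Return (word-initial %, non-initial %) rounded to one decimal place."""
    non_initial = total - initial
    return round(100 * initial / total, 1), round(100 * non_initial / total, 1)


shares = {seg: positional_shares(c["initial"], c["total"]) for seg, c in counts.items()}
for seg, (initial_pct, other_pct) in shares.items():
    print(f"[{seg}] word-initial: {initial_pct}%  non-initial: {other_pct}%")
```

The output reproduces the 98.6% figure for [h] and the 38.5% / 61.5% split for [ʃ]. The ten remaining [h] tokens amount to about 1.4% of the total.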

The four stops /p, t̪, k, kʷ/ are all voiceless unaspirated in careful speech and elicited recordings, with a positive VOT range between 11–32 ms (DiCanio et al., 2020). García (2007) describes a variable process of velar lenition, where velar and labialized velar stops will be realized as frictionless continuants [ɣ̞, ɣ̞ʷ]. This lenition was not observed in the recordings from eight speakers in DiCanio et al. (2020) but these speakers produced carrier sentences with target words; the speech was elicited. In spontaneous speech, patterns of stop lenition are noticeable. Voiceless fricatives are also variably debuccalized. The post-alveolar fricative freely varies with [h], though the post-alveolar articulation is much more common (García, 2007). The dental fricative /s̪/ occasionally also is produced as [h], though impressionistically this seems rarer than post-alveolar fricative debuccalization.

In addition to lexical tone, there is evidence for root-final, fixed stress in Yoloxóchitl Mixtec. Whereas five lexical tones surface on non-final syllables of roots, nine surface on root-final syllables. Root-final syllables are also consistently longer than non-final syllables (DiCanio et al., 2018). One of the interesting manifestations of stress in the language is the asymmetry in onset consonant duration. Onset consonants in final stressed syllables are longer than onset consonants in non-final, unstressed syllables. This has been observed in words in different focus conditions (DiCanio et al., 2018) and in words elicited in carrier sentences with sentential focus (DiCanio et al., 2020). In the former case, the target focused constituents were all utterance-initial and stop consonants were excluded from the analysis since one cannot accurately measure voiceless stop closure duration in this position. In the latter case, stop consonants were included, but the recording conditions involved elicited and more careful speech productions.

Section snippets

Speakers and Materials

Yoloxóchitl Mixtec has been the focus of a major language documentation project which has produced over 200 h of carefully transcribed texts (Amith and García, 2019, Amith and García, 2021). These texts consist almost entirely of spontaneous speech narratives and conversations spoken by native speakers. From this corpus, 85.5 min of spontaneous speech was selected, as produced by three female speakers and three male speakers. Individual sound files were between 318 and 1368 s in duration.

Method

Voicing lenition is just one of the types of lenition which occurred within the Yoloxóchitl Mixtec spontaneous speech corpus. Many stop consonants were also realized with incomplete closure. As mentioned above, García (2007) describes a variable process of velar lenition, where velar and labialized velar stops will be realized as frictionless continuants [ɣ̞, ɣ̞ʷ]. Yet, impressionistically this lenition process does not appear to be limited to velar stops. The second analysis utilized a

Experiment III: Modelling allophonic detail using deep neural networks

While it is possible for trained human speech scientists and phoneticians to categorize surface phonetic variants as in Experiment 2, can these categories be detected by a computational model? A current bottleneck in corpus phonetics is the ability to provide phonetic detail below the level of the transcribed phone. Speech corpus segmentation typically reflects phonemic categories after forced alignment has been applied (Babinski et al., 2019, DiCanio et al., 2013, Tang and Bennett, 2019, Yuan

Patterns of lenition

The results from experiment 1 show that voicing lenition is more common in non-final syllables in Yoloxóchitl Mixtec than in consonants in the onsets of stem-final, stressed syllables. Stem-final syllables are longer than non-final syllables and much of the durational difference between non-final and final syllables occurs within the consonant onset, not the vowel. The durational findings here replicate results from three previous studies on Yoloxóchitl Mixtec speech (DiCanio et al., 2018,

Conclusions

An examination of speech reduction in a corpus of Yoloxóchitl Mixtec spontaneous speech revealed that both stress and functional status influenced obstruent duration, a finding in line with previous work on speech reduction (Gahl et al., 2012, Kakadelis, 2018, Lavoie, 2001, Parrell and Narayanan, 2018). However, the language is unique in relation to past research on prosodic strengthening because it demonstrates a pattern of stress-induced strengthening and a pattern of relative weakening of

CRediT authorship contribution statement

Christian DiCanio: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Wei-Rong Chen: Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Joshua Benn: Investigation, Data curation. Jonathan Amith: Resources, Data curation. Rey Castillo García: Resources, Data curation, Funding acquisition.

Acknowledgments

This work was supported by NSF Grant 1603323 (DiCanio, PI) at the University at Buffalo and NIH Grant DC-002717 (Whalen, PI) at Haskins Laboratories. The corpus data from Yoloxóchitl Mixtec was supported by NSF Awards 0966462, 2123578, and 1761421 (Amith, PI) as well as ELDP Projects PPG0048 and MDP0201 (Amith, PI) at Gettysburg College.

References (94)

  • J. Saffran et al.

    Word segmentation: The role of distributional cues

    Journal of Memory and Language

    (1996)
  • M. Schwarz et al.

    Realization and representation of Nepali laryngeal contrasts: Voiced aspirates and laryngeal realism

    Journal of Phonetics

    (2019)
  • J. Amith et al.

    Audio corpus of Yoloxóchitl Mixtec with accompanying time-coded transcriptions in ELAN

  • J.D. Amith et al.

    Documentation of Yoloxóchitl Mixtec (Glottocode: Yolo1241; ISO 639–3: xty): Corpus, Lexicon, and Grammar

    (2019)
  • S. Babinski et al.

    A Robin Hood approach to Forced Alignment: English-trained algorithms and their use on Australian languages

  • R.E. Baker et al.

    Variability in word duration as a function of probability, speech style, and prosody

    Language and Speech

    (2009)
  • R. Beam de Azcona

    A Coatlán-Loxicha Zapotec Grammar

    (2004)
  • J. Beckman et al.

    Empirical evidence for laryngeal features: Aspirating vs. true voice languages

    Journal of Linguistics

    (2013)
  • Boersma, P. and Weenink, D. (2016). Praat: doing phonetics by computer [computer program]....
  • D. Bouavichith et al.

    Segmental and prosodic effects on intervocalic voiced stop reduction in connected speech

    Phonetica

    (2013)
  • Bowern, C.L. (2012). A Grammar of Bardi, volume 57 of Mouton Grammar Library. Walter de Gruyter GmbH & Co. KG,...
  • C. Browman et al.

    Tiers in articulatory phonology, with some implications for casual speech

  • P.-C. Buerkner

    brms: An R package for Bayesian multilevel models using Stan

    Journal of Statistical Software

    (2016)
  • E.W. Campbell

    Aspects of the Phonology and Morphology of Zenzontepec Chatino, a Zapotecan Language of Oaxaca, Mexico

    (2014)
  • B. Carpenter et al.

    Stan: A probabilistic programming language

    Journal of Statistical Software

    (2017)
  • Castillo García, R. (2007). Descripción fonológica, segmental, y tonal del Mixteco de Yoloxóchitl, Guerrero. Master’s...
  • C. Cheng et al.

    Mechanism of Disyllabic Tonal Reduction in Taiwan Mandarin

    Language and Speech

    (2015)
  • U. Cohen Priva et al.

    The causal structure of lenition: a case for the causal precedence of durational shortening

    Language

    (2020)
  • J. Dalby

    Phonetic structure of fast speech in American English

    (1984)
  • L. Davidson

    Phonation and laryngeal specification in American English voiceless obstruents

    Journal of the International Phonetic Association

    (2018)
  • B.L. Davis et al.

    Developmental apraxia of speech: Determiners of differential diagnosis

    Clinical Linguistics & Phonetics

    (1998)
  • C. DiCanio et al.
  • C. DiCanio et al.

    Disentangling the effects of position and utterance-level declination on the production of complex tones in Yoloxóchitl Mixtec

    Language and Speech

    (2021)
  • C. DiCanio et al.

    Using automatic alignment to analyze endangered language data: Testing the viability of untrained alignment

    Journal of the Acoustical Society of America

    (2013)
  • C. DiCanio et al.

    The interaction of vowel length and speech style in an Arapaho speech corpus

  • C. DiCanio et al.

    Phonetic structure in Yoloxóchitl Mixtec consonants

    Journal of the International Phonetic Association

    (2020)
  • C.T. DiCanio

    The Phonetics of Fortis and Lenis Consonants in Itunyoso Trique

    International Journal of American Linguistics

    (2012)
  • C.T. DiCanio

    Abstract and concrete tonal classes in Itunyoso Trique person morphology

  • DiCanio, C.T. (2020). Aspecto verbal en triqui de Itunyoso. In Swanton, M., San Giacomo Trinidad, M., and Hernández...
  • C. Fougeron et al.

    Articulatory strengthening at edges of prosodic domains

    Journal of the Acoustical Society of America

    (1997)
  • S. Gahl

    “Time” and “thyme” are not homophones: Word durations in spontaneous speech

    Language

    (2008)
  • M.K. Gordon

    Phonological Typology

    (2016)
  • Gurevich, N. (2004). Lenition and contrast: the functional consequences of certain phonetically conditioned sound...
  • N. Gurevich

    Lenition

  • H. He et al.

    Adasyn: Adaptive synthetic sampling approach for imbalanced learning

  • J. Hualde et al.

    Consonant lenition and phonological recategorization

    Laboratory Phonology

    (2011)
  • J.I. Hualde et al.

    Lenition and phonemic overlap in Rome Italian

    Phonetica

    (2011)