Psycholinguists should resist the allure of linguistic units as perceptual units

https://doi.org/10.1016/j.jml.2019.104070

Highlights

  • Linguistic units were developed to describe language, not to be processing units.

  • Psycholinguists should investigate what structures listeners use to decode speech.

  • Perceptual units are developed through repeated exposure to speech patterns.

  • Selective adaptation of stop consonants is position-specific, consistent with position-specific processing.

  • The renewed use of selective adaptation is welcome, but researchers must acquaint themselves with the older literature.

Abstract

The current study has empirical, methodological, and theoretical components. It draws heavily on two recent papers: Bowers et al. (2016) (JML, 87, 71–83) used results from selective adaptation experiments to argue that phonemes play a critical role in speech perception. Mitterer et al. (2018) (JML, 98, 77–92) responded with their own adaptation experiments to advocate instead for allophones. These studies are part of a renewed use of the selective adaptation paradigm. Empirically, the current study reports results that demonstrate that the Bowers et al. findings were artifactual. Methodologically, the renewed use of adaptation in the field is a positive development, but many recent studies suffer from a lack of knowledge of prior adaptation findings. As the use of selective adaptation grows, it will be important to draw on the considerable existing knowledge base (this literature is also relevant to the currently popular research on phonetic recalibration). Theoretically, for a half century there has been a recurring effort to demonstrate the psychological reality of various linguistic units, such as the phoneme or the allophone. The evidence is that listeners will use essentially any pattern that has been experienced often enough, not just the units that are well-suited to linguistic descriptions of language. Thus, rather than trying to identify any special perceptual status for linguistic units, psycholinguists should focus their efforts on more productive issues.

Introduction

In his thoughtful and intriguing article, Elman (2009) began by saying: “I begin with a warning to the reader. I propose to do away with one of the objects most cherished by language researchers: the mental lexicon. I do not call into question the existence of words, nor the many things language users know about them. Rather, I suggest the possibility of lexical knowledge without a lexicon.” (Elman, 2009, p. 548). Following this precedent, I also begin with a warning: I propose to do away with a cherished endeavor of psycholinguists. Despite the clear utility of linguistic units in describing language (the core purpose of linguistic analysis), attempts by psycholinguists to demonstrate “the psychological reality of X”, where “X” is some linguistically well-motivated unit, have repeatedly been fruitless. It is not that linguistic units cannot be used by listeners; rather, it is that almost any often-encountered pattern can be, whether that pattern corresponds to a linguistic unit or not. As such, demonstrating that (some) linguistic units can (sometimes) be used by listeners does not significantly advance our understanding of speech perception.

My call to abandon the search for linguistically-defined perceptual units is not new. Almost 20 years ago, Goldinger and Azuma (2003), in a paper entitled “Puzzle-solving science: The quixotic quest for units in speech perception”, made essentially the same point. Moreover, they noted that 30 years before their own paper, researchers (e.g., Foss and Swinney, 1973, McNeill and Lindig, 1973, Savin and Bever, 1970) had already begun to raise related concerns. As Goldinger and Azuma put it, “Considered collectively, 30 years of speech-unit research has generated little apparent progress. If the goal was to decide a ‘‘winner’’, the enterprise has clearly failed: Despite dozens of studies, the candidate list has actually grown… [T]he classic question of speech units seems ill conceived.” (p. 307).

Despite this insightful analysis, the effort to reify linguistic units is alive and well. In fact, the current study was stimulated by two recent papers in which the goal was to provide evidence that perceptual processing of language relies on particular linguistic units. The first paper, by Bowers, Kazanina, and Andermane (2016) (hereafter, BKA16), made an emphatic claim for the phoneme as an important perceptual unit during spoken word recognition (a claim that was then even more strongly asserted in a follow-up paper by Kazanina, Bowers, & Idsardi, 2018). The second paper, by Mitterer, Reinisch, and McQueen (2018) (hereafter, MRM18), argued forcefully for the allophone’s primacy over the phoneme as a perceptual unit, in a rebuttal to BKA16. These papers are clear examples of the approach that I am highlighting, as the core research goal in each paper was to make a very strong argument in favor of a particular linguistic unit – the phoneme for BKA16, and the allophone for MRM18. As I will expand on below, this type of research goal has consistently proven to be fruitless.

In addition to making this theoretical point, there are two other goals of the current study. The first is empirical: The experiments in the current study test whether the key result reported by BKA16 is an artifact of the stimuli that they used. If the result is artifactual, then of course the conclusions drawn from it are baseless. The other goal of the current study is more methodological: As I will discuss shortly, the technique used by BKA16 and by MRM18 is a variation of a methodology that was widely used 40 years ago, and that is enjoying renewed popularity. One goal of the current study is to urge new users of the technique to thoroughly acquaint themselves with the methodological and empirical findings in the original literature. As noted below, familiarity with this literature is also important for researchers examining phonetic recalibration (also called “retuning” or “perceptual learning”) because many of the studies in this burgeoning area have substantial overlap methodologically with selective adaptation procedures.

The research by BKA16 and MRM18 was conducted using a modified version of the selective adaptation procedure. In order to fully understand their work, some knowledge of that task is necessary because the empirical questions examined in those papers were based on earlier adaptation results. Therefore, I will present a very brief overview of the adaptation literature. A more extensive description of much of the relevant work is available in Samuel (1986).

The seminal selective adaptation paper was done by Eimas and Corbit (1973). Research on visual psychophysics had shown that repeatedly experiencing a stimulus reduced sensitivity to a visual property. Eimas and Corbit took this approach into the field of speech perception by first creating a continuum of syllables that ranged from /ba/ to /pa/, and then testing whether perception of those syllables would be changed by repeated exposure to an endpoint sound. They observed a “selective adaptation” effect akin to what had been found in visual psychophysics: They reported that after hearing /ba/ many times, listeners became less sensitive to /ba/ and heard fewer /ba/ sounds than before exposure; after hearing /pa/ many times, listeners became less sensitive to /pa/ and heard fewer /pa/ sounds than before. Borrowing from theories in visual psychophysics, Eimas and Corbit interpreted the effect as being a consequence of fatiguing “linguistic feature detectors” through repeated stimulation.

The Eimas and Corbit (1973) findings, together with other similar papers that appeared soon after, generated enormous interest and research in the field of speech perception. Adaptation rapidly became a favorite technique, and it was used to look at a range of questions. However, within a few years, a number of authors questioned the idea that there were linguistic feature detectors being fatigued (e.g., Diehl, 1981; Diehl et al., 1978). The general thrust of the alternative was that the observed shifts were not based on fatigue, but were instead a manifestation of a more general contrast effect that had to do with decision-making rather than perception. Even though one could make the argument that this issue does not actually undercut the utility of the paradigm (see Samuel, 1986 for such an argument), by the late 1980s the technique had fallen out of favor.

Fig. 1 illustrates the boom-bust-boom pattern of speech research using the selective adaptation paradigm. The numbers here are approximations, based on the number of empirical papers in a given year that cited the original Eimas and Corbit (1973) paper. As such, they slightly underestimate the number of papers because not all papers cite the seminal work, an undercount that presumably increases in more recent years. The pattern is very clear: After a big boom during the five years following Eimas and Corbit, the task started to be used less and less in each of the successive five-year windows. Starting in the late 1980s, for about 20 years, there was very little use of the technique.

During the last 10–15 years there has been a clear rebound, with growing use of adaptation again. What underlies this resurgence? It appears that much of adaptation’s renewed appeal stems from its potential relationship to a different phenomenon that has generated great interest in the field – perceptual recalibration. The two seminal papers on perceptual recalibration were by Norris, McQueen, and Cutler (2003), and by Bertelson, Vroomen, and de Gelder (2003), both appearing just before the rebound in selective adaptation research. Norris et al. showed that when an ambiguous segment (e.g., midway between /s/ and /f/) is presented in a number of lexical contexts that disambiguate it, listeners show evidence of expanding the phonemic category to include the ambiguous sound. Bertelson et al. showed a corresponding phoneme category expansion when the disambiguation comes from lipread information. Although the recalibration effect goes in the opposite direction from adaptation (i.e., exposure expands a phonemic category, whereas adaptation contracts it), both involve a shift in category boundaries through exposure to speech input. In fact, in many of the recalibration experiments that Vroomen and his colleagues have run (e.g., Vroomen et al., 2007; Vroomen et al., 2004), there is an explicit comparison of adaptation and recalibration. The apparent similarities have been captured in a Bayesian model that uses the same formal procedures to model both effects (Kleinschmidt & Jaeger, 2015).

Although adaptation can be an extremely useful tool (see Samuel, 1986), many of the new selective adaptation and recalibration papers are not well-informed about much of the work done during the original “boom”. There are unmotivated and unexplained changes in procedures, and in many cases, a lack of knowledge about what has already been established. There are about a hundred empirical adaptation papers in the literature (see Fig. 1), in many cases with multiple experiments, meaning that there are hundreds of prior adaptation experiments. New studies using adaptation are rarely well-informed by this literature, and this problem is even worse in recalibration studies that use procedures that effectively, and often unknowingly, create adaptation situations. Sticking to adaptation itself, the two papers that are the focus here are typical: BKA16 cited only three adaptation papers among their approximately 50 citations. MRM18, with nearly 80 citations, mentioned five adaptation papers; of these, one was the seminal Eimas and Corbit (1973) paper, and two of the other four were actually papers that question the utility of adaptation. As I will discuss in the General Discussion, there are many results in the adaptation literature that are relevant to the research in these two papers.

I will initially focus on the BKA16 paper because the empirical part of the current study is directly based on that study. Recall that the core theoretical claim by BKA16 was that the phoneme is a critical perceptual unit for listeners. They framed their effort as follows: “Traditional linguistic theory postulates a small set of phonemes that can be sequenced in various ways in order to represent thousands of words in a language …The common rejection of position invariant phonemes in psychological theories and models of word perception is a fundamental claim, and we explore this issue here… [W]e describe two experiments that provide strong evidence that phonemes do indeed play a role in word perception.” (pp. 71–72).

For the two experiments in their study, the key phrase in their framing was “position invariant phonemes”. Phonemes are the vowels and consonants of a language, and in linguistic analyses, their positional invariance is an important property. This property can be illustrated with the phoneme /p/ in English. Although most speakers are not aware of it, there is systematic variation in the way that /p/ is produced in English words. Specifically, when the /p/ is an onset (e.g., in “park”), the /p/ is aspirated – there is a puff of air after the lips open; in contrast, when the /p/ is part of a cluster (e.g., in “spark”), the /p/ is unaspirated – there is no puff of air. The two variants (aspirated and unaspirated /p/) are called allophones because they both are members of the broader /p/ phonemic category. Essentially, phonemes are abstractions across the relevant allophones. Because BKA16 were arguing that phonemes are critical perceptual units, perceptual tests should show that listeners treat a phoneme as the same unit regardless of its position.

From this perspective, one of the early papers in the adaptation literature posed a significant problem: Ades (1974) demonstrated that syllable-initial phonemes produced adaptation on syllable-initial test items (e.g., /bæ/ shifted identification of members of a /bæ/-/dæ/ continuum), and syllable-final phonemes produced adaptation on syllable-final test items (e.g., /æb/ shifted identification of members of an /æb/-/æd/ continuum), but there was no adaptation when adaptors and test syllables differed in position (e.g., /bæ/ did not affect identification of /æb/-/æd/ test items). Samuel (1989) reported the same positional specificity for adaptation. BKA16 mentioned these two papers in their review of research that they saw as potentially problematic for their argument that phonemes are essential perceptual units, and the empirical portion of their paper was designed to demonstrate that adaptation actually does occur despite positional mismatching of the adaptors and test items.

Although there are of course variations across the many studies in the adaptation literature, there are certain procedures that are most common. In a typical study, simple consonant-vowel or vowel-consonant stimuli serve as both the adaptors and the test items. Often there will be 6–8 test items that form a continuum (e.g., with a good “ba” at one end, and a good “da” at the other), and the adaptors are usually the continuum endpoints or stimuli chosen to have a particular relationship to them. For example, with a /ba/-/da/ test series, in addition to the endpoint /ba/ and /da/ sounds, adaptors could be /pa/ and /ta/, chosen to share the place of articulation difference of the endpoints, but to differ from them in voicing. Typically, an adaptation study includes 10–20 cycles, with each cycle including about 30–60 s of hearing a repeating adaptor followed by listeners identifying one randomization of the test continuum items. This procedure produces a psychometric function (the probability of identifying each continuum item as one of the two categories) for each adaptation condition. Often there is also a psychometric baseline function, based on identifying the test items before any adaptation occurs. Adaptation manifests as a shift of one psychometric function relative to another, usually with the largest shift near the middle of the continuum.
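The way an adaptation effect is read off a pair of psychometric functions can be sketched in a few lines of Python. This is purely illustrative: the logistic shape, the slope value, and the two boundary locations are assumptions made for the sketch, not data from any study discussed here.

```python
from math import exp

def p_da(step, boundary, slope=1.5):
    """Logistic psychometric function: probability of a "da" response."""
    return 1 / (1 + exp(-slope * (step - boundary)))

def crossover(probs):
    """Locate the 50% point by linear interpolation between adjacent steps."""
    for i in range(len(probs) - 1):
        if probs[i] < 0.5 <= probs[i + 1]:
            frac = (0.5 - probs[i]) / (probs[i + 1] - probs[i])
            return (i + 1) + frac  # continuum steps numbered from 1
    return None

steps = range(1, 9)  # an 8-step /ba/-/da/ continuum
baseline = [p_da(s, boundary=4.5) for s in steps]  # pre-adaptation labeling
adapted = [p_da(s, boundary=5.3) for s in steps]   # after /da/ adaptation: fewer "da" reports

shift = crossover(adapted) - crossover(baseline)
print(f"boundary shift: {shift:.2f} steps")  # about 0.81 steps
```

In an actual experiment, the two functions are simply the observed identification percentages for each continuum item under the two conditions; as noted above, the difference between them is largest near the middle of the continuum, where responses are most malleable.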

The task used by BKA16 maintained the core properties of repeating a sound (the adapting sequence) and identification of test items that are midway between two good endpoints. However, the implementation was quite different. For simplicity, I will focus on their test involving a contrast between /b/ and /d/; there was also a test involving /s/ and /f/, but that part of their study is not particularly relevant to the core issues, or to the experiments in the current study. Rather than having listeners identify items that spanned a continuum, BKA16 had them identify a single token that was selected to be midway between “bump” and “dump”. Instead of repeating an endpoint item as the adaptor, BKA16 played listeners sets of words that had the adapting sound in a particular position. For example, there were 25 words that all started with /b/ (e.g., “bail”, “bank”, “berry”, “bother”…), and 25 words that all started with /d/ (e.g., “dice”, “draft”, “donkey”, “driver”…). For a within-position adaptation test, these word-initial items were presented (a number of times), and listeners identified the ambiguous “bump-dump” stimulus a number of times. For a between-position test, 25-item word sets with final-position critical sounds (e.g., “curb”, “glib”, “cherub”, “reverb” for /b/, and “gold”, “need”, “lucid”, “salad” for /d/) were used as the repeating items (there were also medial-position items, which are not relevant to the current study). All stimuli were based on recordings made by a native speaker of British English.

These procedures did produce significant adaptation effects, measured by differences in how people identified the ambiguous “bump-dump” token as a function of whether the repeated words included /b/ or /d/. Critically for BKA16’s argument, there was a significant shift for the final-position adapting words on the initial-position test item. The cross-position effect was smaller than the matched-position effect (about one third as large looking at all subjects, or about one half as large for subjects without ceiling/floor effects), but it was significant. BKA16 took this cross-position adaptation as evidence for position-invariant phonemes.

Given this claim, they said “[W]e would note that there is one set of findings that does seem at odds with our results; namely, the previous adaptation studies that failed to obtain adaptation across syllable positions in non-lexical targets (Ades, 1974, Samuel, 1989). Why the difference?” (p. 79). They answered this question by offering three “speculative explanation[s]”. One suggestion was that using a single ambiguous test token might be more sensitive than using a full continuum of test sounds. This does not make sense, as a full continuum includes an ambiguous region. Moreover, BKA16 had to drop over half of their subjects to get their cleaner measure, precisely because subjects differ in exactly where the ambiguous region will be (by testing the full range, this problem is mitigated). The individual differences in phonetic boundaries are usually within the ambiguous region of a continuum, allowing most listeners to be included in the data analyses. Their second speculation was that their lexical adaptors might be producing additional adaptation at a lexical level. However, Samuel (1997, experiments 3A and 3B) demonstrated that there is no contribution to adaptation at the lexical level itself. Finally, BKA16 said “In addition, whether or not our procedure is better suited for accessing abstract phoneme representations, the important point to emphasize is that the previous authors relied on null results in their adaptation studies to reject phonemes.” (p. 80).

In general, of course, caution is called for in accepting a null effect. However, null effects can indeed exist, and when multiple tests yield null effects, at some point accepting the null is the correct decision. If in fact there were only two tests that yielded null effects, accepting the null might well be premature. However, the evidence against cross-position adaptation is much more substantial. In addition to the Ades (1974) and Samuel (1989) papers cited by BKA16, the positional-specificity issue was tested by Sawusch (1977b), Wolf (1978), and Samuel, Kat, and Tartter (1984). Table 1 summarizes the results from the five studies. Across these studies, there were 18 within-position tests, and all 18 produced significant adaptation effects. There were 18 across-position tests, and 14 of these failed to find adaptation (for 14 out of 18, p < .05 by a sign test). Thus, there is extremely substantial evidence for positional specificity for adaptation.

The four significant cases of across-position adaptation in the prior literature are themselves informative. Two of these came from Wolf’s (1978) study, in which identical noise bursts were included in the initial- and final-position stimuli. The other two came from Samuel’s (1989) test of liquids; to make convincing liquids, Samuel included identical 70-ms steady-state formants in initial and final position. Thus, for the few cases of (weak) cross-position adaptation (versus the large majority of null effects), the adaptation was almost certainly due to the stimuli including strong acoustic matches across position, rather than to the positions sharing phonemic identity.

The handful of small but significant acoustically-driven cross-position adaptation effects raises the question of whether there might be a similar source for the small but significant effects reported by BKA16. Recall that the adaptor words were items that were recorded by a native speaker of British English. In British English, especially for the citation-form speech recorded for research, a native speaker is likely to produce a “released” final stop consonant. In a released final stop, rather than simply ending the word with the stop closure, the speaker releases the closure to produce a more clearly articulated sound. Critically, such a release is acoustically largely the same as the normal articulation of that stop consonant in initial position. If the final-position adapting words had many released stops, then listeners would be receiving acoustic input that matches the onset of the “bump-dump” test item. In fact, in Footnote 3 (p. 75), BKA16 report that 11 of the 25 /b/ adaptors had released final stops, and all 25 of the final /d/ adaptors did.

The presence of released final stops in most of the adaptors prompts an obvious question: Were the observed shifts due to the resulting acoustic matching across position, rather than to shared phonemic representations? There is a straightforward way to answer this question: Adaptation can be conducted using the original (released) adaptors, and with versions of those adaptors in which the releases have been spliced off the ends of the words. Those two tests are reported in Experiment 1; Experiment 2 reports the results of two control conditions.


Experiment 1

As noted above, the procedures used by BKA16 differed in several ways from what is typically done in selective adaptation experiments. In the current study, we use procedures that are more in line with standard practice. The most important change is that rather than having subjects repeatedly judge the identity of a single token (a token taken from a continuum between “bump” and “dump”), listeners in the current study identified members of an 8-step continuum ranging between /ba/ and /da/.

Participants

A total of 53 participants took part in Experiment 1, 27 with the Original stimuli, and 26 with the No-Release adaptors. All were native speakers of American English, with no self-identified hearing problems. They received credit toward a course requirement for their participation.

Test syllables

An 8-step consonant-vowel (CV) /ba/-/da/ test continuum was used. The stimuli came from the same 10-step test series that provided the stimuli used by Samuel (1989); the subset of 8 items was shifted (i.e., items 2

Results and discussion

As in previous studies (e.g., Samuel, 1989, Samuel, 2016), participants who were unwilling or unable to do the task were identified on the basis of their labeling of the /ba/-/da/ syllables. If for either of the adaptation functions the percentage of “D” report for the most /d/-like token was not at least 60% greater than the percentage for the most /b/-like item, the listener was classified as not having done the required task. Two participants in the Original condition and two participants in

Experiment 2

Although the results of Experiment 1 are clear, there are two additional tests that can help put the results in perspective. Both of these situations involve the matched-position cases that have consistently produced adaptation. One situation is a test of initial-position adaptors on the initial-position /ba/-/da/ test series used in Experiment 1. Together with Experiment 1, this test provides a comparison of the results using the more-standard adaptation methods here to the results using

Participants

A total of 60 participants took part in Experiment 2, 30 in the Initial-Position-Matched situation, and 30 in the Final-Position-Matched case. All were native speakers of American English, with no self-identified hearing problems; none had participated in Experiment 1. They received credit toward a course requirement for their participation.

Test syllables

For the Initial-Matched-Position test, the 8-step /ba/-/da/ test continuum from Experiment 1 was used. For the Final-Matched-Position test, each member of

Results and discussion

The same criteria were used to identify participants who did not do the task as instructed. Five participants in the Initial-Matched-Position condition and seven participants in the Final-Matched-Position condition were eliminated on this basis, leaving 25 usable participants in the first case, and 23 in the other.

Fig. 4 shows the adaptation results for the Initial-Matched-Position test. Consistent with the prior adaptation literature (see Table 1), adaptors that match test syllables in

General discussion

Recall that the current study is intended to address three issues: (1) Empirically, do the data reported by BKA16 support their advocacy of position-invariant phonemes as perceptual units? (2) Methodologically, is the resurgence of the selective adaptation paradigm well-informed? and (3) Theoretically, should psycholinguistic investigations of perception rely on units derived through linguistic analysis? I will consider each of these in turn.

The results of the two experiments here provide a

Acknowledgements

Support provided by Economic and Social Research Council (UK) Grant #ES/R006288/1, Ministerio de Ciencia e Innovación (Spain) Grant #PSI2017-82563-P, and by Ayuda Centro de Excelencia Severo Ochoa (Spain) SEV-2015-0490.

The data for all experiments can be found at https://osf.io/s6kdj/?view_only=ab3b91a352224a10831c6b92cd383020.

Jeff Bowers provided me with the stimuli used in Bowers, Kazanina, and Andermane (2016). I sincerely appreciate his providing the stimuli used in this project. Similarly,

References (49)

  • Ades, A. (1974). How phonetic is selective adaptation? Experiments on syllable position and vowel environment. Perception & Psychophysics.

  • Bertelson, P., et al. (2003). Visual recalibration of auditory speech identification: A McGurk aftereffect. Psychological Science.

  • Chomsky, N. (1957). Syntactic Structures.

  • Chomsky, N. (1965). Aspects of the Theory of Syntax.

  • Diehl, R. (1976). Feature analyzers for the phonetic dimension stop vs. continuant. Perception & Psychophysics.

  • Norris, D., et al. (2003). Perceptual learning in speech. Cognitive Psychology.

  • Reinisch, E., et al. (2014). Phonetic category recalibration: What are the categories? Journal of Phonetics.

  • Samuel, A. G. (1986). Red herring detectors and speech perception: In defense of selective adaptation. Cognitive Psychology.

  • Samuel, A. G. (1997). Lexical activation produces potent phonemic percepts. Cognitive Psychology.

  • Samuel, A. G. (2016). Lexical representations are malleable for about one second: Evidence for the non-automaticity of perceptual recalibration. Cognitive Psychology.

  • Savin, H. B., et al. (1970). The nonperceptual reality of the phoneme. Journal of Verbal Learning and Verbal Behavior.

  • Sproat, R., et al. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics.

  • Vroomen, J., et al. (2004). Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication.

  • Vroomen, J., et al. (2007). Visual recalibration and selective adaptation in auditory–visual speech perception: Contrasting build-up courses. Neuropsychologia.