Elsevier

Speech Communication

Volume 55, Issues 7–8, September 2013, Pages 815-824
Objective speech intelligibility measurement for cochlear implant users in complex listening environments

https://doi.org/10.1016/j.specom.2013.04.001

Highlights

  • Five objective speech intelligibility measures are assessed for cochlear implant users.

  • A CI-inspired non-intrusive measure is proposed based on extending a previously-proposed measure.

  • The modified measure uses an acoustic filterbank to emulate cochlear implant hearing percepts.

  • Accurate results are obtained, with the advantage of not requiring a clean reference signal.

  • A modulation filterbank ranging from 4 to 64 Hz is shown to better simulate cochlear implant hearing.

Abstract

Objective intelligibility measurement allows for reliable, low-cost, and repeatable assessment of innovative speech processing technologies, thus dispensing with costly and time-consuming subjective tests. To date, existing objective measures have focused on normal-hearing models, and limited use has been found for restorative hearing instruments such as cochlear implants (CIs). In this paper, we evaluate the performance of five existing objective measures and propose two refinements to one particular measure to better emulate CI hearing, under complex listening conditions involving noise-only, reverberation-only, and noise-plus-reverberation. Performance is assessed against subjectively rated data. Experimental results show that the proposed CI-inspired objective measures outperformed all existing measures; gains of as much as 22% could be achieved in rank correlation.
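Rank correlation is the only performance criterion this abstract names, and the excerpt does not say how it was computed. As a minimal sketch, assuming the usual Spearman definition (Pearson correlation of the ranks, with tied values sharing an average rank), the comparison between objective scores and subjective ratings could look like the following. The function names and the numbers are hypothetical, chosen only for illustration; they are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(objective, subjective):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(objective), average_ranks(subjective))


# Hypothetical per-condition values, for illustration only.
objective_scores = [1.2, 2.5, 3.1, 4.8, 5.0]     # output of some objective metric
percent_correct = [20.0, 35.0, 30.0, 70.0, 85.0]  # subjective intelligibility scores
rho = spearman(objective_scores, percent_correct)
```

Because only the ordering matters, a metric can reach a high rank correlation without being linearly related to the subjective scores.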

Introduction

With technological advances witnessed in cochlear implant (CI) devices, most CI users can now achieve reliable speech intelligibility in controlled quiet scenarios, particularly in predictable conversations (Wilson and Dorman, 2008). Environmental distortions, such as reverberation and additive noise (and their combined effects), on the other hand, are known to significantly degrade speech intelligibility (Hazrati and Loizou, 2012, Neuman et al., 2010, Kokkinakis et al., 2011, Poissant et al., 2006). Reverberation and noise, for example, (a) distort important speech envelope modulation information, making it extremely challenging for CI users to perceive e.g., pitch modulations, formant transitions, timbre, and word/syllable boundaries (Drgas and Blaszak, 2010, Kokkinakis et al., 2011, Watkins and Holt, 2000), (b) introduce unwanted masking effects (Nabelek et al., 1993, Nabelek et al., 1989, Poissant et al., 2006), and (c) cause poor sound localization (Zheng et al., 2011). To overcome these limitations and to improve speech intelligibility in everyday environments, recent research has focused on the development of speech enhancement algorithms, such as noise suppression, channel selection, and dereverberation (e.g., Kokkinakis et al., 2011, Loizou et al., 2005, Yang and Fu, 2005).

In order to assess the effects of environmental conditions on the speech intelligibility of CI users, as well as the recognition gains post speech enhancement, two subjective testing approaches are commonly taken. The first makes use of vocoded speech to simulate CI processing and presents vocoded speech to normal hearing (NH) listeners for identification (e.g., Dorman et al., 1997, Drgas and Blaszak, 2010, Poissant et al., 2006, Qin and Oxenham, 2003). The second approach is more direct and presents noise-degraded (and/or enhanced) speech stimuli to CI users (Hazrati and Loizou, 2012, Kokkinakis and Loizou, 2011). Subjective testing, however, is very expensive and time consuming. Objective speech intelligibility measurement, on the other hand, replaces the listeners with a computational algorithm, thus allowing for automated, repeatable, fast, and cost-effective intelligibility monitoring (Moller et al., 2011). Moreover, for speech enhancement, objective metrics can play an important role, as “on-the-spot” intelligibility assessment can be used for fine-tuning of algorithm parameters (e.g., CI filterbank settings). Lastly, objective metrics allow for repetitive and low-cost quantitative comparison between multiple CI devices.

Objective intelligibility (or quality) metrics can be broadly classified as intrusive (also known as double-ended or full-reference) or non-intrusive (single-ended or no-reference), depending on whether or not they require a clean reference signal (Moller et al., 2011). Intrusive metrics have the advantage of being able to assess directly the amount and type of distortion in a corrupted signal. While both classes can be used during the development of an enhancement algorithm or for evaluation/comparison of different CI devices, intrusive metrics cannot be used in practical real-time applications, where a clean reference signal is not available. Since non-intrusive metrics do not require a clean reference signal, they can be applied directly on the device to quantitatively characterize the intelligibility gains achieved with a blind speech enhancement algorithm (e.g., dereverberation). They also enable the development of intelligibility-aware enhancement algorithms, which could adjust CI device parameters in real time, taking into consideration the current intelligibility conditions imposed by environmental effects (such as background noise and reverberation levels).

Commonly, objective metrics are developed and evaluated with normal-hearing listeners as the target, with a few studies using vocoded speech to simulate CI hearing (e.g., Chen, 2012). Recently, several objective metrics were evaluated against vocoded speech degraded by reverberation (Cosentino et al., 2012), as well as speech degraded by noise and reverberation and presented directly to CI users (Santos et al., 2012). In these studies, it was observed that existing intrusive metrics did not correlate highly with CI user intelligibility across three environmental conditions, namely noise alone, reverberation alone, and noise-plus-reverberation (Santos et al., 2012). In the reverberation-alone case, a recently proposed non-intrusive metric termed speech-to-reverberation modulation energy ratio (SRMR) (Falk et al., 2010) showed promising results (Cosentino et al., 2012). In this paper, we investigate the performance of five existing objective metrics (two intrusive and three non-intrusive) and compare their performance with the intelligibility scores of CI users. We also propose two new measures (one non-intrusive and one intrusive) by refining the so-called SRMR metric to emulate CI hearing percepts. We show that (i) the investigated intrusive metrics achieve reliable performance under the three tested conditions, and that the proposed CI-inspired non-intrusive metric (ii) outperforms all other non-intrusive benchmarks and (iii) achieves results in line with the intrusive metrics, but with the advantage of not requiring a clean reference signal.
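The excerpt names SRMR as a modulation energy ratio, and the highlights mention a modulation filterbank spanning 4 to 64 Hz, but neither gives the formula. The sketch below illustrates only the general idea under stated assumptions: modulation bands with logarithmically spaced centre frequencies, and a ratio of the energy in the lower bands to the energy in the upper bands. The number of bands (8), the split point (4), the function names, and the energy values are all hypothetical and are not taken from the paper.

```python
def log_spaced_centers(f_min, f_max, n_bands):
    """Centre frequencies spaced by a constant ratio from f_min to f_max (Hz)."""
    ratio = (f_max / f_min) ** (1.0 / (n_bands - 1))
    return [f_min * ratio ** k for k in range(n_bands)]


def modulation_energy_ratio(band_energies, n_low):
    """Ratio of summed energy in the first n_low bands to the remaining bands.

    Assumption: slow modulations (lower bands) are treated as speech-dominated
    and faster ones (upper bands) as dominated by reverberation or noise.
    """
    low = sum(band_energies[:n_low])
    high = sum(band_energies[n_low:])
    return low / high


# 4-64 Hz range as stated in the highlights; 8 bands is an assumed value.
centers = log_spaced_centers(4.0, 64.0, 8)

# Hypothetical per-band modulation energies, already summed over acoustic channels.
energies = [4.0, 3.0, 2.0, 1.0, 0.5, 0.5, 0.5, 0.5]
ratio = modulation_energy_ratio(energies, n_low=4)
```

In this toy setting a larger ratio means relatively more energy at slow modulation rates. Because the ratio is computed from the degraded signal alone, it illustrates why a measure of this kind needs no clean reference.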

The remainder of this paper is organized as follows. Section 2 describes the subjective intelligibility experiments and speech material database, as well as the evaluated objective intelligibility metrics and the performance criteria considered. Sections 3 and 4 present the experimental results and discussion, respectively. Lastly, Section 5 concludes the paper.

Section snippets

Participants

Eleven adult CI users were recruited to participate in the subjective intelligibility experiments. The participants were all native speakers of American English with post-lingual deafness and had an average age of 64 years (±8.9). Participants consented and were paid for their participation. The interested reader is referred to reference (Hazrati and Loizou, 2012) for specific demographic details of the participants. All participants had a minimum one-year experience using their device

Results

Table 2 presents the performance criteria obtained by the seven investigated objective metrics. As can be seen, all metrics showed high correlations with subjective ratings. The SRMR-CI measure showed significant improvements in all performance criteria relative to the original SRMR measure (p<0.05, t-test), thus suggesting that emulating CI processing can be beneficial in objective intelligibility monitoring for CI listeners. Moreover, the normalized SRMR-CInorm measure further improved

Objective intelligibility measurement: importance of temporal envelope cues for CI users

Preservation of temporal envelope cues has long been regarded as an important factor in speech perception (Dudley, 1939, Lorenzi and Moore, 2008). This is particularly true for hearing-impaired listeners who have reduced ability to process fine temporal structure and spectral cues (Moore, 2008, Xu and Pfingst, 2008). To this end, it was observed that the NCM intrusive measure, which itself is based on temporal envelope cues, outperformed the CSII measure, based on fine spectral cues (see Table 2

Conclusions

This paper has evaluated several objective speech intelligibility measures for CI users in noisy and reverberant everyday environments. It was shown that existing non-intrusive metrics are outperformed by intrusive ones. Notwithstanding, an extension to the so-called SRMR non-intrusive measure was proposed to better simulate CI hearing. Experimental results showed improvements over its predecessor and the obtained performance levels were in line with intrusive ones, but with the advantage of

Acknowledgements

THF and JFS thank the Natural Sciences and Engineering Research Council of Canada for their financial support. SC acknowledges funding from UCL and Neurelec. PCL and OH were supported by a National Institute of Deafness and Other Communication Disorders Grant (R01 DC 010494).

References (46)

  • H. Dudley

    Remaking speech

    J. Acoust. Soc. Am.

    (1939)
  • S.D. Ewert et al.

    Characterizing frequency selectivity for envelope fluctuations

    J. Acoust. Soc. Am.

    (2000)
  • T.H. Falk et al.

    Modulation spectral features for robust far-field speaker identification

    IEEE Trans. Audio Speech Lang. Process.

    (2010)
  • T.H. Falk et al.

    A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech

    IEEE Trans. Audio Speech Lang. Process.

    (2010)
  • R.L. Goldsworthy et al.

    Analysis of speech-based speech transmission index methods with implications for nonlinear operations

    J. Acoust. Soc. Am.

    (2004)
  • Hazrati, O., Loizou, P.C., 2012. The combined effects of reverberation and noise on speech intelligibility by cochlear...
  • I. Holube et al.

    Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model

    J. Acoust. Soc. Am.

    (1996)
  • ITU-T P.563, 2004. Single ended method for objective speech quality assessment in narrow-band telephony applications,...
  • ITU-T P.862, 2001. Perceptual evaluation of speech quality: An objective method for end-to-end speech quality...
  • J. Kates et al.

    A model of speech intelligibility and quality in hearing aids

  • K. Kokkinakis et al.

    The impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners (L)

    J. Acoust. Soc. Am.

    (2011)
  • K. Kokkinakis et al.

    Evaluation of objective measures for quality assessment of reverberant speech

  • K. Kokkinakis et al.

    A channel-selection criterion for suppressing reverberation in cochlear implants

    J. Acoust. Soc. Am.

    (2011)
    1

    Passed away on July 22nd, 2012.