Objective speech intelligibility measurement for cochlear implant users in complex listening environments
Introduction
Owing to technological advances in cochlear implant (CI) devices, most CI users can now achieve reliable speech intelligibility in controlled quiet scenarios, particularly in predictable conversations (Wilson and Dorman, 2008). Environmental distortions such as reverberation and additive noise (and their combined effects), on the other hand, are known to significantly degrade speech intelligibility (Hazrati and Loizou, 2012, Neuman et al., 2010, Kokkinakis et al., 2011, Poissant et al., 2006). Reverberation and noise, for example, (a) distort important speech envelope modulation information, making it extremely challenging for CI users to perceive cues such as pitch modulations, formant transitions, timbre, and word/syllable boundaries (Drgas and Blaszak, 2010, Kokkinakis et al., 2011, Watkins and Holt, 2000), (b) introduce unwanted masking effects (Nabelek et al., 1993, Nabelek et al., 1989, Poissant et al., 2006), and (c) cause poor sound localization (Zheng et al., 2011). To overcome these limitations and to improve speech intelligibility in everyday environments, recent research has focused on the development of speech enhancement algorithms, such as noise suppression, channel selection, and dereverberation (e.g., Kokkinakis et al., 2011, Loizou et al., 2005, Yang and Fu, 2005).
To assess the effects of environmental conditions on the speech intelligibility of CI users, as well as the recognition gains obtained after speech enhancement, two subjective testing approaches are commonly taken. The first uses vocoded speech to simulate CI processing and presents it to normal hearing (NH) listeners for identification (e.g., Dorman et al., 1997, Drgas and Blaszak, 2010, Poissant et al., 2006, Qin and Oxenham, 2003). The second approach is more direct and presents noise-degraded (and/or enhanced) speech stimuli to CI users (Hazrati and Loizou, 2012, Kokkinakis and Loizou, 2011). Subjective testing, however, is very expensive and time-consuming. Objective speech intelligibility measurement, on the other hand, replaces the listeners with a computational algorithm, thus allowing for automated, repeatable, fast, and cost-effective intelligibility monitoring (Moller et al., 2011). Moreover, objective metrics can play an important role in speech enhancement, as “on-the-spot” intelligibility assessment can be used to fine-tune algorithm parameters (e.g., CI filterbank settings). Lastly, objective metrics allow for repeatable, low-cost quantitative comparison between multiple CI devices.
Objective intelligibility (or quality) metrics can be broadly classified as intrusive (also known as double-ended or full-reference) or non-intrusive (single-ended or no-reference), depending on whether or not they require a clean reference signal (Moller et al., 2011). Intrusive metrics have the advantage of being able to assess directly the amount and type of distortion in a corrupted signal. While both classes can be used during the development of an enhancement algorithm or for evaluation/comparison of different CI devices, intrusive metrics cannot be used in practical real-time applications, where a clean reference signal is not available. Since non-intrusive metrics do not require a clean reference signal, they can be applied directly on the device to quantitatively characterize the intelligibility gains achieved with a blind speech enhancement algorithm (e.g., dereverberation). They also enable the development of intelligibility-aware enhancement algorithms, which could adjust CI device parameters in real time, taking into consideration the current intelligibility levels imposed by environmental effects (such as background noise and reverberation levels).
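The practical difference between the two classes shows up in the inputs each one needs. The following minimal Python sketch illustrates this with two toy scores; they are not metrics evaluated in this paper. The function names, the 160-sample frame length, and the choice of envelope correlation and envelope modulation depth are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def frame_envelope(signal: Sequence[float], frame_len: int = 160) -> List[float]:
    """RMS value of each non-overlapping frame: a crude temporal envelope."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(
            sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        )
        for i in range(n_frames)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def intrusive_score(clean: Sequence[float], degraded: Sequence[float]) -> float:
    """Toy full-reference score: needs BOTH the clean and the degraded signal."""
    return pearson(frame_envelope(clean), frame_envelope(degraded))


def non_intrusive_score(degraded: Sequence[float]) -> float:
    """Toy no-reference score: envelope modulation depth (std / mean) of the
    degraded signal alone, so it could in principle run on the device."""
    env = frame_envelope(degraded)
    mean = sum(env) / len(env)
    if mean == 0.0:
        return 0.0
    variance = sum((e - mean) ** 2 for e in env) / len(env)
    return math.sqrt(variance) / mean
```

Only `non_intrusive_score` can be called when the clean signal is unavailable. For a slowly amplitude-modulated tone, adding a steady masker flattens the envelope and lowers the toy no-reference score, while the toy full-reference score drops below 1.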
Objective metrics are commonly developed and evaluated with normal hearing listeners as the target, with a few studies using vocoded speech to simulate CI hearing (e.g., Chen, 2012). Recently, several objective metrics were evaluated against vocoded speech degraded by reverberation (Cosentino et al., 2012), as well as speech degraded by noise and reverberation and presented directly to CI users (Santos et al., 2012). In these studies, it was observed that existing intrusive metrics did not correlate highly with CI user intelligibility across three environmental conditions, namely noise alone, reverberation alone, and noise-plus-reverberation (Santos et al., 2012). In the reverberation alone case, a recently proposed non-intrusive metric termed speech-to-reverberation modulation energy ratio (SRMR) (Falk et al., 2010) showed promising results (Cosentino et al., 2012). In this paper, we investigate the performance of five existing objective metrics (two intrusive and three non-intrusive) and compare their performance with the intelligibility scores of CI users. We also propose two new measures (one non-intrusive and one intrusive) by refining the SRMR metric to emulate CI hearing percepts. We show that (i) the investigated intrusive metrics achieve reliable performance under the three tested conditions, and that the proposed CI-inspired non-intrusive metric (ii) outperforms all other non-intrusive benchmarks and (iii) achieves results in line with the intrusive metrics, with the advantage of not requiring a clean reference signal.
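As its name suggests, SRMR is a ratio of modulation energies computed from the degraded signal alone. The sketch below is a heavily simplified, single-band illustration of that idea, not the published SRMR algorithm nor the CI-inspired variant proposed here. It assumes a temporal envelope has already been extracted, uses a plain DFT rather than the filterbanks of the published metric, and the 20 Hz split and 128 Hz upper limit are illustrative assumptions.

```python
import cmath
import math
from typing import Sequence


def modulation_energy_ratio(
    envelope: Sequence[float],
    env_rate_hz: float,
    split_hz: float = 20.0,   # assumed boundary between "low" and "high" modulations
    max_hz: float = 128.0,    # assumed upper limit of the analysed modulation range
) -> float:
    """Energy of envelope modulations below split_hz divided by the energy
    of modulations in [split_hz, max_hz]. Returns inf if the latter is zero."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]  # drop the DC component

    low_energy = 0.0
    high_energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * env_rate_hz / n  # modulation frequency of DFT bin k
        if freq > max_hz:
            break
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered)
        )
        energy = abs(coeff) ** 2
        if freq < split_hz:
            low_energy += energy
        else:
            high_energy += energy

    return low_energy / high_energy if high_energy > 0.0 else math.inf
```

In this toy setting, an envelope dominated by slow, speech-like modulations yields a large ratio, and the ratio falls as energy appears at higher modulation frequencies.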
The remainder of this paper is organized as follows. Section 2 describes the subjective intelligibility experiments and speech material database, as well as the evaluated objective intelligibility metrics and the performance criteria considered. Sections 3 and 4 present the experimental results and discussion, respectively. Lastly, Section 5 presents the conclusions.
Participants
Eleven adult CI users were recruited to participate in the subjective intelligibility experiments. The participants were all native speakers of American English with post-lingual deafness and had an average age of 64 years (±8.9). Participants consented and were paid for their participation. The interested reader is referred to Hazrati and Loizou (2012) for specific demographic details of the participants. All participants had a minimum of one year of experience using their device
Results
Table 2 presents the performance criteria obtained by the seven investigated objective metrics. As can be seen, all metrics showed high correlations with subjective ratings. The SRMR-CI measure showed significant improvements in all performance criteria relative to the original SRMR measure (t-test), thus suggesting that emulating CI processing can be beneficial in objective intelligibility monitoring for CI listeners. Moreover, the normalized measure further improved
Objective intelligibility measurement: importance of temporal envelope cues for CI users
Preservation of temporal envelope cues has long been regarded as an important factor in speech perception (Dudley, 1939, Lorenzi and Moore, 2008). This is particularly true for hearing-impaired listeners who have reduced ability to process fine temporal structure and spectral cues (Moore, 2008, Xu and Pfingst, 2008). To this end, it was observed that the NCM intrusive measure, which itself is based on temporal envelope cues, outperformed the CSII measure, based on fine spectral cues (see Table 2
Conclusions
This paper has evaluated several objective speech intelligibility measures for CI users in noisy and reverberant everyday environments. It was shown that existing non-intrusive metrics are outperformed by intrusive ones. Notwithstanding, an extension to the so-called SRMR non-intrusive measure was proposed to better simulate CI hearing. Experimental results showed improvements over its predecessor and the obtained performance levels were in line with intrusive ones, but with the advantage of
Acknowledgements
THF and JFS thank the Natural Sciences and Engineering Research Council of Canada for their financial support. SC acknowledges funding from UCL and Neurelec. PCL and OH were supported by a National Institute on Deafness and Other Communication Disorders Grant (R01 DC 010494).
References (46)
- et al. Perception of speech in reverberant conditions using AM–FM cochlear implant simulation. Hear. Res. (2010)
- et al. Derivation of auditory filter shapes from notched-noise data. Hear. Res. (1990)
- et al. Cochlear implants: a remarkable past and a brilliant future. Hear. Res. (2008)
- et al. Spectral and temporal cues for speech recognition: implications for auditory prostheses. Hear. Res. (2008)
- Arai, T., Pavel, M., Hermansky, H., Avendano, C. 1996. Intelligibility of speech with filtered time trajectories of...
- Chen, F., in press. Predicting the intelligibility of cochlear-implant vocoded speech from objective quality measure,...
- et al. Predicting the intelligibility of vocoded speech. Ear Hear. (2011)
- Chen, F., Hazrati, O., Loizou, P.C., Predicting the intelligibility of reverberant speech for cochlear implant...
- Cosentino, S., Marquardt, T., McAlpine, D., Falk, T.H., 2012. Towards objective measures of speech intelligibility for...
- et al. Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs. J. Acoust. Soc. Am. (1997)
- Remaking speech. J. Acoust. Soc. Am.
- Characterizing frequency selectivity for envelope fluctuations. J. Acoust. Soc. Am.
- Modulation spectral features for robust far-field speaker identification. IEEE Trans. Audio Speech Lang. Process.
- A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Trans. Audio Speech Lang. Process.
- Analysis of speech-based speech transmission index methods with implications for nonlinear operations. J. Acoust. Soc. Am.
- Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. J. Acoust. Soc. Am.
- A model of speech intelligibility and quality in hearing aids
- The impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners (L). J. Acoust. Soc. Am.
- Evaluation of objective measures for quality assessment of reverberant speech
- A channel-selection criterion for suppressing reverberation in cochlear implants. J. Acoust. Soc. Am.
1 Passed away on July 22nd, 2012.