
Speech Communication

Volume 96, February 2018, Pages 102-115

SEDA: A tunable Q-factor wavelet-based noise reduction algorithm for multi-talker babble

https://doi.org/10.1016/j.specom.2017.11.004

Abstract

We introduce a new wavelet-based algorithm to enhance the quality of speech corrupted by multi-talker babble noise. The algorithm comprises three stages: The first stage classifies short frames of the noisy speech as speech-dominated or noise-dominated. We design this classifier specifically for multi-talker babble noise. The second stage performs preliminary denoising of noisy speech frames using oversampled wavelet transforms and parallel group thresholding. The final stage performs further denoising by attenuating residual high-frequency components in the signal produced by the second stage. A significant improvement in intelligibility and quality was observed in evaluation tests of the algorithm with cochlear implant users.
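To make the overall structure concrete, the sketch below outlines a generic three-stage, frame-by-frame pipeline of the kind described above. It is a structural sketch only: the frame length, the energy-based classifier stub and the crude high-frequency attenuation are illustrative assumptions, not the published SEDA implementation.

```python
# Structural sketch of a three-stage, frame-by-frame denoising pipeline
# (classify, preliminary wavelet denoising, residual high-frequency
# attenuation). All stages are simplified placeholders.
import numpy as np

def classify_frame(frame, energy_threshold=1e-3):
    """Stage 1 stub: label a short frame as speech- or noise-dominated."""
    return "speech" if np.mean(frame ** 2) > energy_threshold else "noise"

def preliminary_denoise(frame):
    """Stage 2 stub: wavelet-domain thresholding would go here
    (see the wavelet soft-thresholding sketch later in the text)."""
    return frame

def attenuate_residual_hf(frame, gain=0.5):
    """Stage 3 stub: attenuate residual content in the upper half of the band."""
    spectrum = np.fft.rfft(frame)
    spectrum[len(spectrum) // 2:] *= gain
    return np.fft.irfft(spectrum, n=len(frame))

def process(noisy, frame_len=512):
    out = np.copy(noisy)
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        frame = noisy[start:start + frame_len]
        label = classify_frame(frame)   # the label can steer the later stages
        frame = attenuate_residual_hf(preliminary_denoise(frame))
        out[start:start + frame_len] = frame
    return out
```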

Introduction

Although cochlear implants (CIs) have been highly successful at providing speech understanding in optimal listening situations to the profoundly deaf (e.g. Friedland et al., 2010), the performance of CI users is severely impacted by the presence of background noise (e.g. Fetterman and Domico, 2002, Muller-Deile et al., 1995). Therefore, signal processing to remove background noise can be highly beneficial for CI users (e.g. Dawson et al., 2011). One type of noise that has a particularly significant effect on CI users' speech understanding is “multi-talker babble”, which consists of many people talking simultaneously in the background (e.g. Sperry et al., 1997). Moreover, multi-talker babble is one of the noises that CI users encounter most frequently. Hence, attenuating the speech from competing talkers is expected to provide speech perception benefits for CI users.

Multi-talker babble is an example of a non-stationary noise. Unlike a stationary signal (e.g., white noise), a non-stationary signal has statistical parameters, such as the mean, variance and autocovariance, that change over time. Hence it is generally more challenging to predict or model the behavior of a non-stationary signal. Although many real-time single-channel noise removal methods have been proposed for CI devices, few of these methods have provided benefits in non-stationary noises such as multi-talker babble. Spectral similarities between multi-talker babble and the target speech (both the noise and the target consist of speech signals), together with the non-stationary nature of the babble, make it difficult to differentiate and separate multi-talker babble from the target speech.
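To make this distinction concrete, the toy example below (not taken from the paper) compares frame-wise variance for a stationary white-noise signal and for an amplitude-modulated noise signal that loosely stands in for babble; the 25 ms frame length and 3 Hz modulation rate are arbitrary choices.

```python
# Illustration: frame-wise statistics of a stationary signal stay roughly
# constant, while those of a non-stationary signal drift over time.
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 16000, 2.0
t = np.arange(int(fs * dur)) / fs

stationary = rng.normal(0, 1, t.size)                 # white noise
envelope = 1 + 0.8 * np.sin(2 * np.pi * 3 * t)        # slowly varying level
nonstationary = envelope * rng.normal(0, 1, t.size)   # crude babble-like proxy

def frame_variance(x, frame_len=400):                 # 25 ms frames at 16 kHz
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    return frames.var(axis=1)

print("spread of frame variances, stationary:    ", frame_variance(stationary).std())
print("spread of frame variances, non-stationary:", frame_variance(nonstationary).std())
```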

Yang and Fu (2005) proposed using pause detection and spectral subtraction for noise reduction and tested the algorithm with seven post-lingually deafened CI users. While a significant effect of the algorithm was detected with speech-shaped noise, no significant effect was detected with 6-talker babble. Another noise reduction method for CI users is to reduce the gain of the envelope of noise-dominated frequency channels (Bentler and Chiou, 2006). This method has been commercially implemented (e.g. ClearVoice), but Holden et al. (2013) were unable to detect a significant benefit from ClearVoice with multi-talker babble.

Mauger et al. (2012) introduced an optimized noise reduction method that increases the temporal smoothing of the signal-to-noise ratio estimate and uses a more aggressive gain function. The method was tested in real time with 12 CI users, and significant improvement was found in 4- and 20-talker babble.

Goehring et al. (2016) used auditory features extracted from the noisy speech and a neural network classifier to retain the frequency channels with higher signal-to-noise ratios and attenuate the channels with lower signal-to-noise ratios. Two versions of the algorithm (speaker-dependent and speaker-independent) were tested with 14 cochlear implant users for three different noise types, including 20-talker babble. Significant improvement was achieved in multi-talker babble with the speaker-dependent algorithm, but no significant improvement was observed in multi-talker babble with the speaker-independent algorithm (see Table 1).

Sigmoidal-shaped compression functions, which attenuate channels with a low signal-to-noise ratio (SNR), have been shown to improve speech understanding against a background of multi-talker babble with 20 background talkers (Hu et al., 2007, Kasturi and Loizou, 2007). However, the perceptual and statistical properties of multi-talker babble depend on the number of talkers (Krishnamurthy and Hansen, 2009): the more talkers present in the background, the more closely the noise resembles stationary noise. The performance of sigmoidal-shaped compression functions for multi-talker babble with a smaller number of talkers is therefore not clear.
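As a rough illustration of this family of methods, the sketch below maps an estimated per-channel SNR to a gain through a logistic (S-shaped) function; the slope and midpoint are arbitrary values chosen for illustration, not the parameters used by Hu et al. (2007) or Kasturi and Loizou (2007).

```python
# Hedged illustration of a sigmoidal (S-shaped) per-channel gain as a
# function of estimated channel SNR in dB.
import numpy as np

def sigmoidal_gain(snr_db, midpoint_db=0.0, slope=0.5):
    """Map an estimated per-channel SNR (dB) to a gain in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint_db)))

snr_estimates_db = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])  # one value per channel
channel_envelopes = np.ones_like(snr_estimates_db)          # dummy channel envelopes
attenuated = channel_envelopes * sigmoidal_gain(snr_estimates_db)
print(np.round(attenuated, 3))   # low-SNR channels receive much smaller gains
```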

Toledo et al. (2003) observed speech intelligibility improvement for multi-talker babble in four cochlear implant users. Their method is based on envelope subtraction and estimates the noise envelope using a minimum-tracking technique (see Table 1).

Wavelet-based denoising algorithms have also been introduced for cochlear implant devices. Ye et al. (2013) proposed shrinkage and thresholding in conjunction with a critically-sampled dual-tree complex wavelet transform. While significant improvement was observed in speech-weighted noise, no significant benefit was observed for multi-talker babble. This is expected, because the algorithm was designed for and trained with speech-weighted noise.
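For readers unfamiliar with wavelet shrinkage, the sketch below shows generic soft thresholding of wavelet coefficients with PyWavelets. It uses a critically sampled DWT and the universal threshold purely as stand-ins; it is neither the dual-tree complex transform of Ye et al. (2013) nor the oversampled tunable-Q transform used in SEDA.

```python
# Generic wavelet soft-thresholding sketch (illustrative only).
import numpy as np
import pywt

def wavelet_soft_denoise(x, wavelet="db8", level=7):
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise level estimated from the finest-scale coefficients (robust MAD),
    # then the universal threshold is applied to all detail sub-bands.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(x)]

# Toy usage: a low-frequency tone stands in for the clean signal.
rng = np.random.default_rng(1)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 30 * t)
noisy = clean + 0.3 * rng.normal(size=t.size)
print("RMS error before:", np.std(noisy - clean))
print("RMS error after: ", np.std(wavelet_soft_denoise(noisy) - clean))
```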

Many other single-channel denoising methods have been proposed for cochlear implant devices (e.g. Loizou et al., 2005, Healy et al., 2013, and Chung et al., 2004). However, only a subset of these methods has been evaluated with multi-talker babble noise. For the algorithms that have been evaluated with multi-talker babble, the testing conditions, sentence corpora, languages and types of babble noise vary across studies, which makes it difficult to compare effectiveness across algorithms. It is worth noting that most algorithms provided statistically significant improvements only at high SNRs. For reference, the results and testing conditions of some of these denoising algorithms are summarized in Table 1.

In this paper, we propose and evaluate a front-end babble noise reduction algorithm. Although the algorithm is not necessarily specific to CI users, we evaluate its performance with CI users because they stand to benefit greatly from noise reduction at positive SNRs, where we expect the algorithm to perform best.

Section snippets

Algorithm

The babble noise reduction problem can be summarized as $Y = S + \sum_{i=1}^{n} S_i$, where $Y$ is the noisy signal, $S$ is the target speech and $S_1$ to $S_n$ are the individual background talkers which collectively form the multi-talker babble. For developing the algorithm, we made the following assumptions (a minimal numerical sketch of this signal model follows the list):

1. Target speech and background babble both consist of human speech. This makes it difficult to distinguish the target speech from the background babble.

2. Babble, which comprises multiple independent speech signals, is …
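The minimal numerical sketch below instantiates the signal model $Y = S + \sum_i S_i$; the synthetic "talkers" are amplitude-modulated noise placeholders rather than real speech recordings, and the number of talkers is arbitrary.

```python
# Sketch of the signal model: the noisy signal is the target speech plus
# n independent background talkers.
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 16000, 1.0
t = np.arange(int(fs * dur)) / fs

target = np.sin(2 * np.pi * 220 * t)                  # stand-in for target speech S
n_talkers = 6
talkers = [0.2 * rng.normal(size=t.size) *            # stand-ins for S_1 .. S_n
           (1 + np.sin(2 * np.pi * rng.uniform(2, 6) * t))
           for _ in range(n_talkers)]
babble = np.sum(talkers, axis=0)
noisy = target + babble                               # Y = S + sum_i S_i

snr_db = 10 * np.log10(np.mean(target ** 2) / np.mean(babble ** 2))
print(f"SNR of the mixture: {snr_db:.1f} dB")
```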

Methods

The effect of SEDA on speech intelligibility and sound quality was evaluated for cochlear implant users. IEEE sentences were presented against randomly generated multi-talker background noise at SNRs between 0 and 9 dB with and without SEDA processing. In the first experiment, subjects were asked to repeat as much of each sentence as they could understand. In the second experiment, subjects were asked to rate the sound quality of each sentence.
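The sketch below illustrates one common way of scaling a babble masker so that each sentence is presented at a target SNR between 0 and 9 dB; the exact level-setting procedure used in the experiments is not described in this snippet, so the scaling rule shown here is an assumption.

```python
# Hedged sketch: scale a babble masker to achieve a target speech-to-babble SNR.
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so that the speech-to-babble power ratio equals `snr_db`."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    gain = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    return speech + gain * babble

rng = np.random.default_rng(3)
speech = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)  # placeholder sentence
babble = rng.normal(size=speech.size)                        # placeholder babble
for snr in (0, 3, 6, 9):
    mixed = mix_at_snr(speech, babble, snr)
    achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((mixed - speech) ** 2))
    print(f"target {snr} dB, achieved {achieved:.2f} dB")
```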

Speech in noise intelligibility

The percentage of words correct for each condition is presented for each subject in Fig. 12. As the test results show, performance generally increases as a function of SNR.

Furthermore, performance with SEDA noise reduction was higher for all subjects at all SNRs, except for C101 at 0 dB SNR, where word recognition was 0 for both processed and unprocessed samples. There was, however, a great deal of variability in the magnitude of the improvement from SEDA noise reduction across subjects and SNRs.

Discussion

Considering the particularly difficult nature of babble noise reduction for CI devices and the limited number of previous works in this field, babble noise reduction is a worthwhile area for CI research. SEDA is an effort to address the babble problem for cochlear implant users. It provides intelligibility and sound quality benefits for CI users in babble noise environments by employing a new approach. SEDA uses a classifier which is specifically tuned for multi-talker babble. It also employs a …

Acknowledgments

We appreciate the efforts of all of the subjects who provided their valuable time. The authors also thank Natalia Stupak for coordinating the tests. Support for this research was provided by the National Institutes of Health/National Institute on Deafness and Other Communication Disorders (R01 DC012152; PI: Landsberger) as well as an NYU School of Medicine Applied Research Support Fund internal grant.

References (42)

  • B.L. Fetterman et al., Speech recognition in background noise of cochlear implant patients, Otolaryngol. Head Neck Surg. (2002)
  • D.A. Reynolds et al., Speaker verification using adapted Gaussian mixture models, Digit. Signal Process. (2000)
  • J.P. Bello, Spring 2016. EL9173 Selected Topics in Signal Processing: Audio Content Analysis [Online]...
  • R. Bentler et al., Digital noise reduction: an overview, Trends Amplification (2006)
  • R.C. Bilger et al., Standardization of a test of speech perception in noise, J. Speech Hear. Res. (1984)
  • C.M. Bishop, Pattern Recognition and Machine Learning (2007)
  • K. Chung et al., Utilizing advanced hearing aid technologies as pre-processors to enhance cochlear implant performance, Cochlear Implants Int. (2004)
  • P.W. Dawson et al., Clinical evaluation of signal-to-noise ratio-based noise reduction in Nucleus(R) cochlear implant recipients, Ear Hear. (2011)
  • R.O. Duda et al., Pattern Classification (2001)
  • D.R. Friedland et al., Case-control analysis of cochlear implant performance in elderly patients, Arch. Otolaryngol. Head Neck Surg. (2010)
  • T. Goehring et al., Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users, Hearing Res. (2016)
  • Q. Gu, Z. Li, J. Han, 2012. Generalized Fisher score for feature selection. arXiv preprint...
  • O. Hazrati et al., Blind binary masking for reverberation suppression in cochlear implants, J. Acoust. Soc. Am. (2013)
  • E.W. Healy et al., An algorithm to improve speech recognition in noise for hearing-impaired listeners, J. Acoust. Soc. Am. (2013)
  • L.K. Holden et al., Postlingual adult performance in noise with HiRes 120 and ClearVoice Low, Medium, and High, Cochlear Implants Int. (2013)
  • Y. Hu et al., Use of a sigmoidal-shaped function for noise attenuation in cochlear implants, J. Acoust. Soc. Am. (2007)
  • IEEE Recommended Practice for Speech Quality Measurements, IEEE No. 297-1969, pp. 1–24, June 11, 1969...
  • K. Kasturi et al., Use of S-shaped input-output functions for noise suppression in cochlear implants, Ear Hear. (2007)
  • R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection
  • N. Krishnamurthy et al., Babble noise: modeling, analysis, and applications, IEEE Trans. Audio Speech Lang. Process. (2009)
  • P.C. Loizou et al., Subspace algorithms for noise reduction in cochlear implants, J. Acoust. Soc. Am. (2005)