Elsevier

Speech Communication

Volume 49, Issues 7–8, July–August 2007, Pages 588-601

Subjective comparison and evaluation of speech enhancement algorithms

https://doi.org/10.1016/j.specom.2006.12.006

Abstract

Meaningful comparisons of the performance of the various speech enhancement algorithms proposed over the years have been elusive due to the lack of a common speech database, differences in the types of noise used and differences in the testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for the evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc., using the ITU-T P.835 methodology, which is designed to evaluate speech quality along three dimensions: signal distortion, noise distortion and overall quality. This paper reports the results of the subjective tests.

Introduction

Over the past three decades, various speech enhancement algorithms have been proposed to improve the performance of modern communication devices in noisy environments. Yet it remains unclear which speech enhancement algorithm performs well in real-world listening situations, where the background noise level and characteristics are constantly changing. Reliable and fair comparison between algorithms has been elusive for several reasons, including the lack of a common speech database for evaluating new algorithms, differences in the types of noise used and differences in the testing methodology. Without access to a common speech database, it is nearly impossible for researchers to compare even the objective performance of their algorithms with that of others. Subjective evaluation of speech enhancement algorithms is further complicated by the fact that the quality of enhanced speech has both signal and noise distortion components, and it is not clear whether listeners base their quality judgments on the signal distortion, the noise distortion or both. This concern was recently addressed by a new ITU-T standard (P.835), which was designed to lead listeners to integrate the effects of both signal and background distortion when rating overall quality.

In this paper, we report on the subjective comparison and evaluation of 13 speech enhancement algorithms using the ITU-T P.835 methodology. The speech enhancement algorithms were chosen to encompass four different classes of noise reduction methods: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms. These algorithms were evaluated using a newly developed noisy speech corpus (NOIZEUS), which is suitable for evaluating speech enhancement algorithms and is available from our website. The enhanced speech files were sent to Dynastat, Inc. (Austin, TX) for subjective evaluation using the recently standardized methodology for evaluating noise suppression algorithms, ITU-T P.835 (2003). This paper presents the results of the comparative analysis of the subjective tests.
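As a rough illustration of what a P.835 test produces: each listener rates every processed file on three 5-point scales, for signal distortion (SIG), background noise (BAK) and overall quality (OVRL), and scores are then averaged per test condition. The sketch below does that averaging in Python. The `Rating` record, the condition labels and the `mean_scores` helper are hypothetical names chosen for this example, not part of the standard or of the study's tooling; the scale anchors in the comments follow ITU-T P.835.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    """One listener's P.835 ratings for one processed condition (hypothetical record)."""
    listener: int
    condition: str  # hypothetical label, e.g. "algorithm/noise/SNR"
    sig: int   # signal distortion: 5 = not distorted ... 1 = very distorted
    bak: int   # background noise: 5 = not noticeable ... 1 = very intrusive
    ovrl: int  # overall quality: 5 = excellent ... 1 = bad

    def __post_init__(self):
        # Each P.835 scale is a 5-point category scale, so reject anything else.
        for name in ("sig", "bak", "ovrl"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} rating {value} is outside 1-5")


def mean_scores(ratings):
    """Average SIG, BAK and OVRL over listeners, separately for each condition."""
    by_condition = {}
    for r in ratings:
        by_condition.setdefault(r.condition, []).append(r)
    return {
        cond: {
            "SIG": mean(r.sig for r in rs),
            "BAK": mean(r.bak for r in rs),
            "OVRL": mean(r.ovrl for r in rs),
        }
        for cond, rs in by_condition.items()
    }
```

Keeping the three scores separate, rather than collapsing them into one number, is what lets an analysis ask whether an algorithm trades signal distortion for noise suppression.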

Section snippets

NOIZEUS: a noisy speech corpus for evaluation of speech enhancement algorithms

NOIZEUS is a noisy speech corpus recorded in our lab to facilitate comparison of speech enhancement algorithms among research groups. The database contains 30 IEEE sentences (IEEE Subcommittee, 1969) produced by three male and three female speakers and corrupted by eight different real-world noises at different SNRs. The noise was taken from the AURORA database (Hirsch et al., 2000) and includes suburban train noise,
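Corrupting a clean sentence with noise at a prescribed SNR amounts to scaling the noise before adding it. The following Python sketch assumes the SNR is defined from mean power over the whole utterance; the procedure actually used to build NOIZEUS may measure the speech level differently (for example, over active speech only), and the function name `add_noise_at_snr` is hypothetical.

```python
import math


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so that
    10*log10(P_clean / P_scaled_noise) == snr_db.

    Assumption: power is the mean squared sample value over the whole signal.
    """
    if len(noise) < len(clean):
        raise ValueError("noise segment is shorter than the speech")
    noise = noise[: len(clean)]  # use a noise segment of equal length
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Solve p_clean / (g**2 * p_noise) = 10**(snr_db / 10) for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]
```

Calling it twice with the same speech and noise but `snr_db=5` and `snr_db=10` yields a pair of conditions like the two SNR levels used in the formal tests.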

Algorithms evaluated

A total of 13 different speech enhancement methods were evaluated, all using our own implementations (see the list in Table 3). Representative algorithms from four different classes of enhancement algorithms were chosen: three spectral subtractive algorithms, two subspace algorithms, three Wiener-type algorithms and five statistical-model based algorithms. The Wiener-type algorithms were grouped separately since these algorithms estimate the complex spectrum in the mean square sense while the
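To make the difference between two of these classes concrete, the sketch below compares two textbook per-frequency-bin gain rules: power spectral subtraction with an over-subtraction factor `alpha` and spectral floor `beta` (a form of the rule of Berouti et al., 1979), and a Wiener gain using a crude maximum-likelihood estimate of the a priori SNR. These are generic illustrations, not the implementations evaluated in the study; the function names and default parameter values are assumptions.

```python
import math


def spectral_subtraction_gain(noisy_power, noise_power, alpha=1.0, beta=0.0):
    """Magnitude gain sqrt(|X|^2 / |Y|^2) for power spectral subtraction,
    where |X|^2 = max(|Y|^2 - alpha*|D|^2, beta*|D|^2) in each bin."""
    gains = []
    for py, pd in zip(noisy_power, noise_power):
        clean_power = max(py - alpha * pd, beta * pd)  # subtract, then floor
        gains.append(math.sqrt(clean_power / py) if py > 0 else 0.0)
    return gains


def wiener_gain(noisy_power, noise_power):
    """Wiener gain xi / (1 + xi), with the a priori SNR xi estimated as
    max(gamma - 1, 0) and gamma = |Y|^2 / |D|^2 (noise power must be > 0)."""
    gains = []
    for py, pd in zip(noisy_power, noise_power):
        xi = max(py / pd - 1.0, 0.0)
        gains.append(xi / (1.0 + xi))
    return gains
```

With `alpha=1` and `beta=0`, the spectral subtraction gain equals the square root of this Wiener gain in every bin, so the Wiener rule attenuates low-SNR bins more strongly.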

Subjective evaluation

To reduce the length and cost of the subjective evaluations, only a subset of the NOIZEUS corpus was processed by the 13 algorithms and submitted to Dynastat, Inc., for formal subjective evaluation. A total of 16 sentences (see Tables 1 and 2), corrupted in four background noise environments (car, street, babble and train) at two SNR levels (5 dB and 10 dB), were processed. These sentences were produced by two male and two female speakers.
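The number of processed conditions follows from the factorial design described above: 13 algorithms, four noises and two SNRs. A minimal Python sketch that enumerates them is shown here; the algorithm labels are hypothetical placeholders for the methods listed in Table 3, and any unprocessed or reference conditions are not counted.

```python
from itertools import product

NOISES = ("car", "street", "babble", "train")
SNRS_DB = (5, 10)
# Hypothetical placeholder labels standing in for the 13 methods of Table 3.
ALGORITHMS = tuple(f"alg{i:02d}" for i in range(1, 14))


def test_conditions(algorithms=ALGORITHMS, noises=NOISES, snrs=SNRS_DB):
    """Every (algorithm, noise, SNR) combination, one per processed condition."""
    return list(product(algorithms, noises, snrs))


conds = test_conditions()
print(len(conds))  # 13 * 4 * 2 = 104
```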

Statistical analysis and discussion

We present a comparative analysis at three levels. At the first level, we compare the performance of the algorithms within each of the four classes (subspace, statistical-model, subtractive and Wiener-type) to examine whether there were significant differences between algorithms within each class. At the second level, we compare the performance of the various algorithms across all classes, aiming to find the algorithm(s) that performed the best across all noise
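The snippet does not say which statistical test was used for the within-class comparisons. As one illustration of testing whether mean ratings differ among the algorithms of a class, the sketch below computes a one-way ANOVA F statistic in plain Python. Because the same listeners rated every algorithm, a repeated-measures analysis would likely be more appropriate in practice, and a p-value would still require the F distribution (for example from a statistics library); the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA.

    Each group is a list of ratings (e.g. OVRL scores) for one algorithm.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if df_within <= 0 or ss_within == 0:
        raise ValueError("within-group variance is undefined or zero")
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

A large F relative to the critical value for `(df_between, df_within)` indicates that at least one algorithm in the class differs; post hoc pairwise tests are then needed to say which.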

Conclusions

The present study reported on the subjective evaluation of 13 different speech enhancement algorithms using the ITU-T P.835 methodology, which is designed to evaluate speech quality along three dimensions: signal distortion, noise distortion and overall quality. A total of 32 listeners participated in the listening tests. Based on the statistical analysis of the listeners' ratings of the enhanced speech in terms of overall quality, speech distortion and noise distortion, we can draw the following conclusions:

  • (1)

Acknowledgements

The authors would like to thank Dr. Alan Sharpley of Dynastat, Inc., for all his help and advice throughout the duration of this project. Research was supported in part by Grant No. R01 DC07527 from NIDCD/NIH.

References (23)

  • Rangachari, S., et al., 2006. A noise estimation algorithm for highly non-stationary environments. Speech Commun.
  • Berouti, M., Schwartz, R., Makhoul, J., 1979. Enhancement of speech corrupted by acoustic noise. In: Proc. IEEE Int....
  • Cohen, I., 2002. Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Process. Lett.
  • Cohen, I., 2003. Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Trans. Speech Audio Proc.
  • Ephraim, Y., et al., 1984. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process.
  • Ephraim, Y., et al., 1985. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process.
  • Gustafsson, H., et al., 2001. Spectral subtraction using reduced delay convolution and adaptive averaging. IEEE Trans. Speech Audio Proc.
  • Hirsch, H., Pearce, D., 2000. The aurora experimental framework for the performance evaluation of speech recognition...
  • Hu, Y., et al., 2003. A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Trans. Speech Audio Proc.
  • Hu, Y., et al., 2004. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans. Speech Audio Proc.
  • IEEE Subcommittee, 1969. IEEE recommended practice for speech quality measurements. IEEE Trans. Audio Electroacoust.