Abstract
This paper presents a Gaussian-based distributed speech presence probability (DSPP) estimator for fully connected wireless acoustic sensor networks (WASNs). In WASNs, the primary interest is to make optimal use of all available information in the recorded signals. In this work, under a Gaussian statistical assumption on the signals, each node computes the DSPP using its own local signals along with the compressed signals received from the other nodes. We evaluate the effect of DSPP estimation on noise reduction using both simulated and real recorded signals. The performance of the proposed DSPP estimator is compared to that of local SPP estimation, where each node uses only its own noisy signals, and to that of centralized SPP estimation, where each node uses all recorded noisy signals of the whole network. It is shown that the proposed method exhibits good performance, while the computational complexity is considerably reduced.
Notes
The ratio of clean signal variance to noise signal variance.
A priori knowledge indicating whether speech segments or silence segments are more probable.
The ratio of noisy signal variance to noise signal variance.
The central limit theorem states that when independent random variables are added, their sum tends toward a Gaussian distribution regardless of the distribution of the original variables.
Souden et al. [32] show that when the noise is a mixture of coherent point-source interference (e.g., non-Gaussian babble, pink, or factory noise) and non-coherent additive white noise, the SPP estimator is theoretically able to achieve an estimate close to one when speech is present. Interested readers are referred to Souden et al. [32] for the theoretical proof.
In the local case, there is no cooperation between nodes and consequently no signals are transmitted; each node uses only the signals recorded by its own microphones. In this case, instead of \( {\mathbf {y}}, {\varvec{\Phi }}_{{\mathbf {v}}},\) and \({\varvec{\Phi }}_{{\mathbf {x}}} \) in (14), the information of each node, i.e., \( {\mathbf {y}}_{k}, {\varvec{\Phi }}_{{\mathbf {v}}_{k}},\) and \( {\varvec{\Phi }}_{{\mathbf {x}}_{k}} \), is used to compute the SPP. Since the procedure is the same as that of CSPP and only the parameters need to be replaced, we explain this case briefly.
Since the first microphone of the first node was chosen as the reference microphone, the input full-band SNR is computed for this microphone.
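To make the quantities defined in the notes above concrete, the following is a minimal single-channel sketch, not the estimator proposed in the paper. It assumes the classic Gaussian likelihood-ratio form of the SPP, in which the a priori SNR, the a posteriori SNR, and an a priori speech presence probability are combined per time-frequency bin. The function names and the default prior of 0.5 are illustrative assumptions, and the a posteriori SNR is taken here as the instantaneous noisy power divided by the noise variance.

```python
import math


def a_priori_snr(clean_var, noise_var):
    """Ratio of clean signal variance to noise signal variance."""
    return clean_var / noise_var


def a_posteriori_snr(noisy_power, noise_var):
    """Ratio of noisy signal power to noise signal variance."""
    return noisy_power / noise_var


def gaussian_spp(xi, gamma, prior_presence=0.5):
    """Single-channel SPP under a complex Gaussian model (illustrative).

    xi             : a priori SNR (>= 0)
    gamma          : a posteriori SNR (>= 0)
    prior_presence : a priori speech presence probability, strictly in (0, 1)
                     (hypothetical default, not a value taken from the paper)
    """
    q = 1.0 - prior_presence  # a priori speech absence probability
    # Inverse of the generalized likelihood ratio between the two hypotheses.
    inv_ratio = (q / (1.0 - q)) * (1.0 + xi) * math.exp(-gamma * xi / (1.0 + xi))
    return 1.0 / (1.0 + inv_ratio)


if __name__ == "__main__":
    xi = a_priori_snr(clean_var=10.0, noise_var=1.0)
    for noisy_power in (0.5, 5.0, 20.0):
        gamma = a_posteriori_snr(noisy_power, noise_var=1.0)
        print(f"gamma={gamma:5.1f}  SPP={gaussian_spp(xi, gamma):.3f}")
```

For a fixed a priori SNR, the resulting probability increases with the a posteriori SNR, and for a zero a priori SNR it falls back to the prior. The multichannel estimators discussed in the paper replace these scalar SNRs with quantities built from \( {\mathbf {y}} \), \( {\varvec{\Phi }}_{{\mathbf {v}}} \), and \( {\varvec{\Phi }}_{{\mathbf {x}}} \), which this sketch does not attempt to reproduce.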
References
J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65, 943–950 (1979). https://doi.org/10.1121/1.382599
A. Bertrand, M. Moonen, Distributed adaptive node-specific signal estimation in fully connected sensor networks—part I: sequential node updating. IEEE Trans. Signal Process. 58(10), 5277–5291 (2010). https://doi.org/10.1109/TSP.2010.2052612
A. Bertrand, M. Moonen, Distributed adaptive node-specific signal estimation in fully connected sensor networks—part II: simultaneous and asynchronous node updating. IEEE Trans. Signal Process. 58(10), 5292–5306 (2010). https://doi.org/10.1109/TSP.2010.2052613
I. Cohen, Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Trans. Speech Audio Process. 11(5), 466–475 (2003). https://doi.org/10.1109/TSA.2003.811544
I. Cohen, B. Berdugo, Speech enhancement for non-stationary noise environments. Signal Process. 81(11), 2403–2418 (2001)
S. Doclo, M. Moonen, T. Van den Bogaert, J. Wouters, Reduced-bandwidth and distributed MWF-based noise reduction algorithms for binaural hearing aids. IEEE Trans. Audio Speech Lang. Process. 17(1), 38–51 (2009). https://doi.org/10.1109/TASL.2008.2004291
Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985). https://doi.org/10.1109/TASSP.1985.1164550
D. Fischer, S. Doclo, E.A.P. Habets, T. Gerkmann, Combined single-microphone Wiener and MVDR filtering based on speech interframe correlations and speech presence probability. in Proceedings of Speech Communication; 12. ITG Symposium, pp. 1–5 (2016)
B. Fodor, T. Fingscheidt, MMSE speech enhancement under speech presence uncertainty assuming (generalized) Gamma speech priors throughout. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4033–4036 (2012). https://doi.org/10.1109/ICASSP.2012.6288803
B. Fodor, T. Gerkmann, A posteriori speech presence probability estimation based on averaged observations and a super-Gaussian speech model. in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 11–15 (2014). https://doi.org/10.1109/IWAENC.2014.6953309
B. Fodor, T. Gerkmann, A speech presence probability estimator based on fixed priors and a heavy-tailed speech model. in Proceedings of European Signal Processing Conference (EUSIPCO), pp. 2305–2309 (2014)
J.S. Garofolo, Getting started with the DARPA TIMIT CD-ROM: an acoustic phonetic continuous speech database. Tech. rep., National Institute of Standards and Technology (NIST), Gaithersburg, MD (1988)
T. Gerkmann, C. Breithaupt, R. Martin, Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Trans. Audio Speech Lang. Process. 16(5), 910–919 (2008). https://doi.org/10.1109/TASL.2008.921764
T. Gerkmann, R.C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012). https://doi.org/10.1109/TASL.2011.2180896
T. Gerkmann, M. Krawczyk, R. Martin, Speech presence probability estimation based on temporal Cepstrum smoothing. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4254–4257 (2010). https://doi.org/10.1109/ICASSP.2010.5495677
E.A.P. Habets, J. Benesty, I. Cohen, S. Gannot, J. Dmochowski, New insights into the MVDR beamformer in room acoustics. IEEE Trans. Audio Speech Lang. Process. 18(1), 158–170 (2010). https://doi.org/10.1109/TASL.2009.2024731
A. Hassani, A. Bertrand, M. Moonen, GEVD-based low-rank approximation for distributed adaptive node-specific signal estimation in wireless sensor networks. IEEE Trans. Signal Process. 64(10), 2557–2572 (2016). https://doi.org/10.1109/TSP.2015.2510973
A.I. Koutrouvelis, T.W. Sherson, R. Heusdens, R.C. Hendriks, A low-cost robust distributed linearly constrained beamformer for wireless acoustic sensor networks with arbitrary topology. IEEE/ACM Trans. Audio Speech Lang. Process. 26(8), 1434–1448 (2018). https://doi.org/10.1109/TASLP.2018.2829405
M. Krawczyk-Becker, D. Fischer, T. Gerkmann, Utilizing spectro-temporal correlations for an improved speech presence probability based noise power estimation. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 365–369 (2015). https://doi.org/10.1109/ICASSP.2015.7177992
T.C. Lawin-Ore, S. Doclo, Analysis of the average performance of the multi-channel Wiener filter for distributed microphone arrays using statistical room acoustics. Signal Process. 107(C), 96–108 (2015)
T.C. Lawin-Ore, S. Stenzel, J. Freudenberger, S. Doclo, Alternative formulation and robustness analysis of the multichannel Wiener filter for spatially distributed microphones. in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC). Juan les Pins, France (2014). https://doi.org/10.1109/IWAENC.2014.6954008
D. Malah, R.V. Cox, A.J. Accardi, Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 789–792 (1999). https://doi.org/10.1109/ICASSP.1999.759789
S. Markovich-Golan, A. Bertrand, M. Moonen, S. Gannot, Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks. Signal Process. 107, 4–20 (2015). https://doi.org/10.1016/j.sigpro.2014.07.014
S. Markovich-Golan, S. Gannot, I. Cohen, Distributed multiple constraints generalized sidelobe canceler for fully connected wireless acoustic sensor networks. IEEE Trans. Audio Speech Lang. Process. 21(2), 343–356 (2013). https://doi.org/10.1109/TASL.2012.2224454
R. Martin, Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I–253–I–256 (2002). https://doi.org/10.1109/ICASSP.2002.5743702
R. Martin, Speech enhancement based on minimum mean-square error estimation and super-Gaussian priors. IEEE Trans. Speech Audio Process. 13(5), 845–856 (2005). https://doi.org/10.1109/TSA.2005.851927
R. Martin, C. Breithaupt, Speech enhancement in the DFT domain using Laplacian speech priors. in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC) (2003)
R. McAulay, M. Malpass, Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Process. 28(2), 137–145 (1980). https://doi.org/10.1109/TASSP.1980.1163394
H. Momeni, H.R. Abutalebi, E.A.P. Habets, Conditional MMSE-based single-channel speech enhancement using inter-frame and inter-band correlations. in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5215–5219 (2016). https://doi.org/10.1109/ICASSP.2016.7472672
K. Ngo, A. Spriet, M. Moonen, J. Wouters, S.H. Jensen, Incorporating the conditional speech presence probability in multi-channel Wiener filter based noise reduction in hearing aids. EURASIP J. Adv. Signal Process. (2009). https://doi.org/10.1155/2009/930625
R. Ranjbaryan, S. Doclo, H.R. Abutalebi, Distributed MAP estimators for noise reduction in fully connected wireless acoustic sensor networks. in Proceedings of Speech Communication; 13th ITG-Symposium, pp. 1–5 (2018)
M. Souden, J. Chen, J. Benesty, S. Affes, Gaussian model-based multichannel speech presence probability. IEEE Trans. Audio Speech Lang. Process. 18(5), 1072–1077 (2010). https://doi.org/10.1109/TASL.2009.2035150
M. Souden, J. Chen, J. Benesty, S. Affes, An integrated solution for online multichannel noise tracking and reduction. IEEE Trans. Audio Speech Lang. Process. 19(7), 2159–2169 (2011). https://doi.org/10.1109/TASL.2011.2118205
M. Taseska, E.A.P. Habets, Informed spatial filtering for sound extraction using distributed microphone arrays. IEEE/ACM Trans. Audio Speech Lang. Process. 22(7), 1195–1207 (2014). https://doi.org/10.1109/TASLP.2014.2327294
V.M. Tavakoli, J.R. Jensen, M.G. Christensen, J. Benesty, A framework for speech enhancement with ad hoc microphone arrays. IEEE/ACM Trans. Audio Speech Lang. Process. 24(6), 1038–1051 (2016). https://doi.org/10.1109/TASLP.2016.2537202
Acknowledgements
We would like to express our appreciation to the Iran National Science Foundation (INSF) for supporting this work under Grant number 96000455. We are also grateful to the Department of Medical Physics and Acoustics, University of Oldenburg, for allowing access to their recorded data.
About this article
Cite this article
Ranjbaryan, R., Abutalebi, H.R. Distributed Speech Presence Probability Estimator in Fully Connected Wireless Acoustic Sensor Networks. Circuits Syst Signal Process 39, 6121–6141 (2020). https://doi.org/10.1007/s00034-020-01452-4
DOI: https://doi.org/10.1007/s00034-020-01452-4