Abstract
Sparse component analysis techniques have been successfully applied to the separation of speech sources. This paper presents an efficient algorithm, based on the matching pursuit approach, for handling multichannel recordings. The proposed algorithm explicitly exploits spatial constraints among the channels to express the mixed signals as linear combinations of delayed components selected from an overcomplete dictionary. We present a new procedure for estimating the mixing system parameters (attenuations and delays), which can be applied to more than two mixtures and is not restricted to non-negative attenuation coefficients. The proposed mixing system estimation method can also accommodate delays of greater magnitude than traditional approaches. In addition, when excerpts of the sources (exogenous to the mixtures) are available, learned dictionaries can be used to improve the identification step. Simulation results show that such semi-blind dictionaries perform better than the dictionaries used in blind configurations.
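As a rough illustration of the idea of recovering attenuations and delays from dictionary components, the sketch below takes a single known atom, locates its strongest occurrence in each of two channels by correlation, and reads the relative attenuation (including its sign) and the relative delay from the two matches. It is a toy example under stated assumptions, not the algorithm proposed in the paper: the function names `strongest_shift` and `relative_mixing`, the Hann-windowed cosine atom, the integer delays, and the noise-free single-source setting are all hypothetical choices made for illustration.

```python
import numpy as np


def strongest_shift(channel, atom):
    """Return (shift, coefficient) of the best match of `atom` in `channel`.

    With a unit-norm atom, the correlation at shift k is the projection
    coefficient of the channel onto the atom delayed by k samples.
    """
    corr = np.correlate(channel, atom, mode="valid")
    k = int(np.argmax(np.abs(corr)))
    return k, float(corr[k])


def relative_mixing(x_ref, x_other, atom):
    """Estimate attenuation and delay of `x_other` relative to `x_ref`.

    Illustrative only: assumes one dominant atom and integer delays.
    The sign of the attenuation is preserved, so negative values are allowed.
    """
    k_ref, c_ref = strongest_shift(x_ref, atom)
    k_oth, c_oth = strongest_shift(x_other, atom)
    return c_oth / c_ref, k_oth - k_ref


# Hypothetical unit-norm atom: a Hann-windowed cosine.
n = np.arange(64)
atom = np.hanning(64) * np.cos(2 * np.pi * 0.1 * n)
atom /= np.linalg.norm(atom)

# Toy anechoic mixtures: channel 2 is channel 1 scaled by -0.5 and delayed by 7.
x1 = np.zeros(512)
x2 = np.zeros(512)
x1[100:164] += 2.0 * atom
x2[107:171] += -0.5 * 2.0 * atom

att, delay = relative_mixing(x1, x2, atom)
print(att, delay)
```

In a realistic setting, many atoms from an overcomplete dictionary would be selected iteratively, and the per-atom estimates would then have to be aggregated to identify each source's mixing parameters; that aggregation is outside the scope of this sketch.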
Notes
The first row of \(\tilde{\varvec{H}}^{(1)}_{\mathcal {R}}\) consists only of ones.
The subscript “ideal” indicates that this is the matrix expected to be returned by the system identification algorithm.
Acknowledgements
Funding was provided by CNPq (Grant No. 431215/2016-2).
Additional information
This work was supported in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico and in part by Fundação Carlos Chagas Filho de Amparo a Pesquisa do Estado do Rio de Janeiro.
About this article
Cite this article
Haddad, D.B., Lovisolo, L., Petraglia, M.R. et al. Blind and Semi-blind Anechoic Mixing System Identification Using Multichannel Matching Pursuit. Circuits Syst Signal Process 40, 4546–4575 (2021). https://doi.org/10.1007/s00034-021-01681-1
DOI: https://doi.org/10.1007/s00034-021-01681-1