Elsevier

Speech Communication

Volume 120, June 2020, Pages 42-52
Adaptive and hybrid Kronecker product beamforming for far-field speech signals

https://doi.org/10.1016/j.specom.2020.04.001

Highlights

  • The ULA can be represented in terms of two smaller virtual ULAs (VULAs) using the Kronecker product.

  • MVDR adaptive beamformers are obtained separately for the VULAs, and combined using the Kronecker product to obtain the full-length KP-MVDR beamformer.

  • Implementing the fixed DS beamformer on one VULA and the MVDR adaptive beamformer on the other results in the KP-DS-MVDR and KP-MVDR-DS hybrid beamformers.

  • The Kronecker product beamformers are more robust to errors in the statistical estimates of the data, and outperform the conventional DS and MVDR beamformers.

  • The Kronecker product beamformers require less data (faster convergence) to reach their steady-state performance than the conventional MVDR beamformer.
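
The Kronecker product representation named in the first highlight can be checked numerically. The following is a minimal sketch, assuming the common far-field ULA steering vector whose m-th entry has phase -2πf(m-1)δcos(θ)/c. The helper name `ula_steering`, the sign convention, and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ula_steering(num_sensors, spacing, freq, theta, c=340.0):
    """Far-field steering vector of a ULA (assumed convention).

    theta is measured from the array axis, spacing is in metres, freq in Hz.
    """
    m = np.arange(num_sensors)
    return np.exp(-2j * np.pi * freq * m * spacing * np.cos(theta) / c)

# Hypothetical example values, chosen only for illustration.
M1, M2 = 4, 5                 # sizes of the two virtual ULAs, M = M1 * M2
delta = 0.01                  # inter-sensor distance of the physical ULA
freq = 2000.0
theta = np.deg2rad(60.0)

d_full = ula_steering(M1 * M2, delta, freq, theta)

# Sensor index m = i1 * M2 + i2 splits into a coarse index i1 and a fine index i2.
d1 = ula_steering(M1, M2 * delta, freq, theta)   # coarse VULA, spacing M2 * delta
d2 = ula_steering(M2, delta, freq, theta)        # fine VULA, spacing delta

factorization_error = np.max(np.abs(d_full - np.kron(d1, d2)))
print(factorization_error)    # expected to be at the level of rounding error
```

The factorization holds because the phase of each sensor is linear in its index, so the phase of sensor i1·M2 + i2 is the sum of a coarse and a fine contribution.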

Abstract

This work presents a Kronecker-product-based methodology for frequency-domain beamforming with large sensor arrays for far-field broadband speech signals. The principal idea is to split a given uniform linear array (ULA) into two smaller virtual ULAs (VULAs) using the Kronecker product. The linear system of the original ULA is thereby split into two smaller linear systems, one for each VULA. Consequently, traditional adaptive beamformers such as the minimum-variance-distortionless-response (MVDR) beamformer may be obtained for each of the VULAs, using less data to estimate the statistics. The short-length beamformers obtained from the VULAs are finally combined by the Kronecker product to derive the full-length Kronecker product beamformer. Additionally, the VULAs allow fixed and adaptive beamforming to be implemented separately on each of them. As fixed beamformers do not employ statistical information, the Kronecker product hybrid beamformers reduce the original linear system to a single small linear system involving one VULA. Accordingly, hybrid beamformers may be implemented using traditional fixed beamformers, such as the delay-and-sum (DS) beamformer, on one VULA, and traditional adaptive beamformers, such as the MVDR, on the other. The proposed Kronecker product beamformers are observed to provide faster convergence and superior robustness compared with the traditional beamformers.
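
The combination of short filters described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the full filter has the form h = h1 ⊗ h2, that each short filter is found by a standard MVDR solve on the covariance projected through the other filter, and that the adaptive variant alternates these solves a fixed number of times. The function names, the initialisation with a DS filter, the known covariance matrix, and the scenario values are all hypothetical.

```python
import numpy as np

def steering(n, spacing, freq, theta, c=340.0):
    """Far-field ULA steering vector (assumed convention, theta from the array axis)."""
    return np.exp(-2j * np.pi * freq * np.arange(n) * spacing * np.cos(theta) / c)

def mvdr(phi, d):
    """Standard MVDR filter: Phi^{-1} d / (d^H Phi^{-1} d)."""
    x = np.linalg.solve(phi, d)
    return x / (d.conj() @ x)

def kp_hybrid(phi, d1, d2):
    """Fixed DS filter on the second VULA, MVDR on the first (one small solve)."""
    h2 = d2 / len(d2)
    A = np.kron(np.eye(len(d1)), h2[:, None])    # M x M1, so A @ h1 == kron(h1, h2)
    h1 = mvdr(A.conj().T @ phi @ A, d1)
    return np.kron(h1, h2)

def kp_mvdr(phi, d1, d2, num_iter=5):
    """Alternate small MVDR solves on the two VULAs, starting from a DS filter."""
    M1, M2 = len(d1), len(d2)
    h2 = d2 / M2
    for _ in range(num_iter):
        A = np.kron(np.eye(M1), h2[:, None])     # M x M1
        h1 = mvdr(A.conj().T @ phi @ A, d1)
        B = np.kron(h1[:, None], np.eye(M2))     # M x M2, so B @ h2 == kron(h1, h2)
        h2 = mvdr(B.conj().T @ phi @ B, d2)
    return np.kron(h1, h2)

def output_power(h, phi):
    """Residual interference-plus-noise power h^H Phi h."""
    return float(np.real(np.vdot(h, phi @ h)))

# Hypothetical scenario: two interferers plus white sensor noise, known covariance.
M1, M2, delta, freq = 3, 4, 0.02, 1500.0
M = M1 * M2
d1 = steering(M1, M2 * delta, freq, np.deg2rad(70.0))
d2 = steering(M2, delta, freq, np.deg2rad(70.0))
d = np.kron(d1, d2)                              # steering vector of the full ULA

phi = 0.1 * np.eye(M, dtype=complex)
for power, angle_deg in [(10.0, 20.0), (5.0, 130.0)]:
    u = steering(M, delta, freq, np.deg2rad(angle_deg))
    phi += power * np.outer(u, u.conj())

h_ds = d / M
h_full = mvdr(phi, d)
h_hyb = kp_hybrid(phi, d1, d2)
h_kp = kp_mvdr(phi, d1, d2)

for name, h in [("DS", h_ds), ("hybrid", h_hyb), ("KP-MVDR", h_kp), ("MVDR", h_full)]:
    print(name, abs(np.vdot(h, d)), output_power(h, phi))
```

Each solve in this sketch involves only an M1×M1 or M2×M2 matrix, which is where the saving over the full M×M system comes from. With a perfectly known covariance matrix the conventional MVDR filter attains the lowest residual power by construction; the advantage claimed for the Kronecker product beamformers concerns estimated statistics, which this sketch does not model.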

Introduction

Beamforming is the task of preserving a signal received by an array of sensors from a particular direction and source while attenuating the interferences and noise signals impinging on it from other directions and sources (Van Veen, Buckley, 1988, Wang, 2009, Benesty, Cohen, Chen, 2017). It involves applying a filter to the data received by the sensor array, resulting in a signal which is an accurate estimate of the signal-of-interest (SOI) impinging from the particular direction (Van Veen, Buckley, 1988, Wang, 2009, Benesty, Cohen, Chen, 2017). One way of doing so is to design a filter based solely on the knowledge of the direction-of-arrival (DOA) of the SOI, and sometimes also the DOAs of the interferences; this is called fixed beamforming (Benesty et al., 2017). A more robust approach additionally uses the knowledge of the statistics of the data. Such a method is called adaptive beamforming (Wang, 2009, Benesty, Cohen, Chen, 2017, Benesty, Huang, 2013, Chandran, 2013). When the DOA of the SOI is known, and the effect of interference is limited, a fixed beamformer is a very useful and efficient solution. As such situations seldom exist in reality, adaptive beamformers are a more sensible option. Over the years, a plethora of fixed and adaptive beamformers have been developed, of which the delay-and-sum (DS) and minimum-variance-distortionless-response (MVDR) beamformers are well established, and are utilized in this work (Benesty et al., 2017).
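
The contrast between fixed and adaptive beamforming can be made concrete with the two filters named above. The sketch below uses the textbook forms h_DS = d/M and h_MVDR = Φ⁻¹d / (dᴴΦ⁻¹d), where d is the steering vector of the SOI and Φ is the interference-plus-noise covariance matrix. The steering convention, the function names, and the single-interferer scenario are illustrative assumptions, not the experimental setup of the paper.

```python
import numpy as np

def ula_steering(num_sensors, spacing, freq, theta, c=340.0):
    """Far-field ULA steering vector (assumed convention, theta from the array axis)."""
    m = np.arange(num_sensors)
    return np.exp(-2j * np.pi * freq * m * spacing * np.cos(theta) / c)

def ds_beamformer(d):
    """Fixed delay-and-sum filter: uses only the DOA through d."""
    return d / len(d)

def mvdr_beamformer(phi, d):
    """Adaptive MVDR filter: uses the DOA and the second-order statistics phi."""
    phi_inv_d = np.linalg.solve(phi, d)
    return phi_inv_d / (d.conj() @ phi_inv_d)

# Hypothetical scenario: 8 sensors, one strong interferer, white sensor noise.
M, delta, freq = 8, 0.03, 1000.0
d = ula_steering(M, delta, freq, np.deg2rad(90.0))       # SOI direction
u = ula_steering(M, delta, freq, np.deg2rad(40.0))       # interferer direction
phi = 10.0 * np.outer(u, u.conj()) + 0.1 * np.eye(M)     # interference + noise

h_ds = ds_beamformer(d)
h_mvdr = mvdr_beamformer(phi, d)

# Both filters pass the SOI undistorted (h^H d = 1) ...
gain_ds = np.vdot(h_ds, d)
gain_mvdr = np.vdot(h_mvdr, d)

# ... but differ in the residual interference-plus-noise power h^H phi h.
power_ds = float(np.real(np.vdot(h_ds, phi @ h_ds)))
power_mvdr = float(np.real(np.vdot(h_mvdr, phi @ h_mvdr)))
print(abs(gain_ds), abs(gain_mvdr), power_ds, power_mvdr)
```

Because the MVDR filter minimizes the residual power subject to the distortionless constraint, its residual power cannot exceed that of the DS filter when Φ is known exactly.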

As is apparent, the performance of both fixed and adaptive beamformers depends on the accuracy of the DOA estimate. An adaptive beamformer also depends on the accuracy of the estimated data statistics. Therefore, considerable effort has been devoted to improving the performance of beamformers through better estimation of the steering vector and/or quicker and more accurate tracking of the second-order statistics of the data (Reed, Mallett, Brennan, 1974, Asl, Mahloojifar, 2012, Zaharis, Yioultsis, 2011, Asl, Mahloojifar, 2010, Zhang, Liu, Leng, Wang, Shi, 2016, Feng, Liao, Xu, Zhu, Zeng, 2018, Landau, de Lamare, Haardt, 2014, Jia, Jin, Zhou, Yao, 2013, Gu, Leshem, 2012, Gu, Goodman, Hong, Li, 2014, Khabbazibasmenj, Vorobyov, Hassanien, 2012, Yang, Liao, Li, Lei, Wang, 2017, Ke, Zheng, Peng, Li, 2017, Yuan, Gan, 2017, Huang, Zhang, Xu, Ye, 2015, Liao, Guo, Huang, Li, So, 2017). Concurrently, recent technological advancements are driving the ever-evolving design of high-density sensor arrays, consisting of a large number of sensors, to obtain better performance (Weinstein et al., 2007). Such developments have brought the challenge of accurately estimating the second-order statistics from limited data, and of processing such information efficiently. These challenges have led to innovative refinements of conventional adaptive algorithms used for efficient implementation of beamforming filters, the popular ones being the multi-stage Wiener (MSW), reduced-rank linearly constrained minimum variance (RRLCMV), and their widely-linear variants (Wang, 2009, Honig, Goldstein, 2002, Burykh, Abed-Meraim, 2002, Santos, Zoltowski, 2004, De Lamare, Sampaio-Neto, 2007, de Lamare, Wang, Fa, 2010, Chevalier, Blin, 2007, Chevalier, Delmas, Oukaci, 2009, Song, Alokozai, de Lamare, Haardt, 2014, Zheng, Deleforge, Li, Kellermann, 2018).
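
The difficulty of estimating second-order statistics from limited data can be illustrated with a sample covariance matrix. The sketch below assumes synthetic complex Gaussian snapshots and a placeholder fixed filter on one sub-array; the variable names and sizes are hypothetical and only meant to show why a lower-dimensional covariance is easier to estimate from few frames.

```python
import numpy as np

rng = np.random.default_rng(0)

M1, M2 = 4, 5
M = M1 * M2                   # number of sensors in the full array
num_snapshots = 8             # fewer snapshots than sensors

# Synthetic complex Gaussian snapshots, one column per frame (assumption).
Y = (rng.standard_normal((M, num_snapshots))
     + 1j * rng.standard_normal((M, num_snapshots))) / np.sqrt(2)

# Full M x M sample covariance: its rank cannot exceed the number of snapshots,
# so it is singular here and cannot be inverted for an MVDR solve.
phi_hat = Y @ Y.conj().T / num_snapshots
rank_full = np.linalg.matrix_rank(phi_hat)

# Reduced M1 x M1 covariance after applying a fixed length-M2 filter h2
# to each group of M2 adjacent sensors (placeholder uniform weights).
h2 = np.ones(M2) / M2
A = np.kron(np.eye(M1), h2[:, None])          # M x M1 projection
phi1_hat = A.conj().T @ phi_hat @ A
rank_small = np.linalg.matrix_rank(phi1_hat)

print(M, rank_full, M1, rank_small)
```

In practice, diagonal loading or longer averaging is commonly used to make the full sample covariance invertible; reducing the dimension of the problem is a complementary route.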

It is worth noting that this work is not another refinement of adaptive filtering algorithms. Rather, this work provides a theoretical framework to tackle the above-mentioned challenges at a higher level of abstraction, i.e., at the beamformer/filter design level. In this work, we propose a methodology to mathematically separate a large array of sensors into two smaller arrays using the Kronecker product (Van Loan, 2000, Schäcke). We then propose a methodology to obtain new beamformers for the original array by combining the traditional beamformers of the smaller sub-arrays. The practical implementation of the proposed beamformers may be carried out using suitable existing adaptive algorithms, or new algorithms (based on RLS, LMS, etc.) may be developed that are more suitable for our proposed framework of beamforming. That is beyond the scope of this work, and may be dealt with separately. As such, this work must not be confused or compared with adaptive filtering algorithms such as the MSW, RRLCMV, etc.

Using the traditional MVDR and DS beamformers as examples, we show how to utilize the proposed theoretical framework to make beamforming for large arrays more efficient and effective. As will be illustrated in this work, the proposed variants of traditional frequency-domain beamformers are more robust to unstable/erroneous estimates of the data statistics and to various levels of interference. It is assumed that the DOA of the SOI is known. The interferences are broadband noise signals taken from the NOISEX-92 database (Varga and Steeneken, 1993), and the sensor noise is modeled as Gaussian white noise. Inspired by our recent works (Cohen, Benesty, Chen, 2019, Yang, Huang, Benesty, Cohen, Chen, 2019) using the Kronecker product in differential (fixed) beamforming, this work explores the utility of the same in adaptive beamforming. The framework also allows the combination of fixed and adaptive beamforming to generate hybrid beamformers. Uniform linear arrays (ULAs) (Wang, 2009, Benesty, Cohen, Chen, 2017, Benesty, Huang, 2013, Chandran, 2013) are used in our study.

At this juncture, we must also note that adaptive beamforming has generally been implemented on narrowband modulated signals used in communication technologies (Wang, 2009, Honig, Goldstein, 2002, Burykh, Abed-Meraim, 2002, Santos, Zoltowski, 2004, De Lamare, Sampaio-Neto, 2007, de Lamare, Wang, Fa, 2010, Chevalier, Blin, 2007, Chevalier, Delmas, Oukaci, 2009, Song, Alokozai, de Lamare, Haardt, 2014, Zheng, Deleforge, Li, Kellermann, 2018). Such signals are near-sinusoids and quasi-stationary in nature, and hence time-domain beamforming is suitable. However, in the case of non-stationary broadband signals like speech, frequency domain beamforming may be more relevant (Benesty, Cohen, Chen, 2018, Benesty, Chen, Huang, 2008, Ming Zhang, Er, 1995). Moreover, while traditionally microphone array beamforming has been confined to near-field applications (Brandstein, Ward, 2013, Ryan, Goubran, 1997, Thomas, Verburgh, Catrysse, Botteldooren, 2017), the developments in sensor technology now enable us to employ large microphone arrays in far-field applications, such as surveillance in crowded environments, trading in large open halls, and other distant speech and speaker recognition tasks (Kumatani, McDonough, Raj, 2012, Taghizadeh, Garner, Bourlard, 2012, AlShehhi, Hammadih, Zitouni, AlKindi, Ali, Weruaga). This work is devoted to beamforming for such evolving and futuristic applications.

The rest of this work is organized as follows: Section 2 formulates the beamforming problem, the various statistical and non-statistical metrics, and presents the traditional DS and MVDR beamformers. Section 3 describes the Kronecker product based beamforming methodology. Section 4 presents the experimental results, and compares the proposed beamformers with their traditional counterparts. Section 5 summarizes and concludes this work.

Section snippets

Signal model and conventional filters

We consider an arbitrary ULA of M sensors, as shown in Fig. 1, with the sensors located at arbitrary positions denoted by $\{\delta_m = (m-1)\delta : m = 1, 2, \ldots, M\}$, where $\delta$ is the inter-sensor distance. A discrete-time SOI, x(t), where t denotes the discrete-time index, impinges on the ULA as a plane-wave in the far-field, traveling at the velocity of sound, c, through the medium. Similarly, K independent interferences, $\{u_k(t) : k = 1, 2, \ldots, K\}$, impinge on the ULA. We consider the DOA of the SOI as $\theta_d$, and the DOAs of

Kronecker product beamforming

It is obvious from (20) and (21) that the performance of the MVDR beamformer depends significantly on the accuracy of the M-dimensional square covariance matrix, $\Phi_v(f, r)$ or $\Phi_y(f, r)$. As the number of sensors, M, increases, more data is required to reliably estimate such

Experimental performance

In this section, we evaluate the performances of the proposed beamformers in comparison with the conventional MVDR and DS beamformers. It has been observed that the KP-MVDR beamformer saturates rapidly with respect to the number of iterations, N. Hence, N=5 will be used in this work for the KP-MVDR beamformer. For our experiments, we utilize a speech signal as the SOI, and four noise signals, taken from the NOISEX-92 database (Varga and Steeneken, 1993), as interferences (of different variances

Conclusions

We have introduced a new approach to frequency-domain adaptive beamforming for large sensor arrays, with the purpose of achieving enhanced robustness to interference and statistical instability. Firstly, the original ULA is represented by two smaller VULAs, which are connected by the Kronecker product. As the VULAs are smaller than the original ULA, adaptive beamformers can be derived from them using less data for statistical computations. Their smaller size also makes them robust to errors

CRediT authorship contribution statement

Rajib Sharma: Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft. Israel Cohen: Conceptualization, Methodology, Supervision, Writing - review & editing, Funding acquisition. Jacob Benesty: Conceptualization, Methodology, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors thank Dr. Gongping Huang for his valuable inputs and suggestions.

References (52)

  • J. Benesty et al., Fundamentals of Signal Enhancement and Array Signal Processing (2017)
  • J. Benesty et al., Fundamentals of Signal Enhancement and Array Signal Processing (2018)
  • J. Benesty et al., Adaptive Signal Processing: Applications to Real-World Problems (2013)
  • J. Benesty et al., Springer Handbook of Speech Processing (2008)
  • M. Brandstein et al., Microphone Arrays: Signal Processing Techniques and Applications (2013)
  • S. Burykh et al., Reduced-rank adaptive filtering using Krylov subspace, EURASIP J. Appl. Signal Process. (2002)
  • S. Chandran, Adaptive Antenna Arrays: Trends and Applications (2013)
  • P. Chevalier et al., Widely linear MVDR beamformers for the reception of an unknown signal corrupted by noncircular interferences, IEEE Trans. Signal Process. (2007)
  • P. Chevalier et al., Optimal widely linear MVDR beamforming for noncircular signals, Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on (2009)
  • I. Cohen et al., Differential Kronecker product beamforming, IEEE/ACM Trans. Audio Speech Lang. Process. (2019)
  • R.C. De Lamare et al., Reduced-rank adaptive filtering based on joint iterative optimization of adaptive filters, IEEE Signal Process. Lett. (2007)
  • Y. Feng et al., Robust adaptive beamforming against large steering vector mismatch using multiple uncertainty sets, Signal Process. (2018)
  • Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., 1993. DARPA TIMIT...
  • Y. Gu et al., Robust adaptive beamforming based on interference covariance matrix reconstruction and steering vector estimation, IEEE Trans. Signal Process. (2012)
  • M.L. Honig et al., Adaptive reduced-rank interference suppression based on the multistage Wiener filter, IEEE Trans. Commun. (2002)
  • L. Huang et al., Robust adaptive beamforming with a novel interference-plus-noise covariance matrix reconstruction method, IEEE Trans. Signal Process. (2015)
This research was supported by the Israel Science Foundation (grant no. 576/16) and the ISF-NSFC joint research program (grant no. 2514/17).
