
Asynchronous microphone arrays calibration and sound source tracking


Abstract

In this paper, we propose an optimisation method to solve the problem of sound source localisation and calibration of an asynchronous microphone array. The method is based on the graph-based formulation of the simultaneous localisation and mapping (SLAM) problem, in which a moving sound source is observed from a static microphone array. Traditional approaches to sound source localisation rely on accurately known array geometry and synchronous readings of the audio signals. Recent work relaxed these two requirements by estimating the temporal offset between pairs of microphones, under the assumption that the clock timing of every microphone is exactly the same. This assumption requires the sound cards to be identically manufactured, which is not achievable in practice. Here, an approach is proposed to jointly estimate the array geometry, the starting time offset and the clock difference/drift rate of each microphone, together with the location of a moving sound source. In addition, an observability analysis of the system is performed to investigate the most suitable configurations for sound source localisation. Simulation and experimental results are presented, which demonstrate the effectiveness of the proposed methodology.


Notes

  1. Note that the main difference from a standard landmark-pose SLAM system is that here all the microphones are “observed” at all times, whereas in a standard SLAM system only a subset of the landmarks is observed at any given time. This allows the microphone array to be treated as a single landmark with a large state containing the locations of all microphones; however, the same solution can be achieved if the microphones are considered independently.

  2. Note that other motion models, such as a constant-velocity model, can be applied as long as they describe the motion of the sound source properly (a minimal sketch of a constant-velocity model is given after these notes).

  3. Note that the uncertainties of the microphone locations are not directly related to the minimum eigenvalues of the sub-FIMs corresponding to individual microphones, for two reasons: (1) the minimum eigenvalues of the sub-FIMs relate not only to the microphone locations but also to the starting time offsets and clock differences of the microphones; (2) the uncertainties of the microphone locations depend on the relative distance to microphone 1, which is fixed at the origin of the coordinate frame as a reference, so microphones close to microphone 1 have smaller uncertainties in their estimated locations, whereas the minimum eigenvalues of the sub-FIMs do not depend on the distance to microphone 1. A sketch of how such a sub-FIM eigenvalue can be computed is given after these notes.
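
Regarding note 2, the following is a minimal sketch of a constant-velocity motion model, written with our own variable names and a standard white-acceleration noise form; it is generic and not necessarily the paper's parameterisation.

```python
import numpy as np

def constant_velocity_step(x, dt, q=0.1):
    """One prediction step of a 2D constant-velocity motion model.

    x  : state [px, py, vx, vy]
    dt : time step between sound source observations
    q  : process-noise intensity (white-acceleration model)
    Returns the predicted state and the process-noise covariance Q.
    """
    F = np.array([[1.0, 0.0, dt,  0.0],
                  [0.0, 1.0, 0.0, dt ],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # Noise enters through accelerations integrated over one time step.
    G = np.array([[0.5 * dt**2, 0.0],
                  [0.0, 0.5 * dt**2],
                  [dt,  0.0],
                  [0.0, dt ]])
    return F @ x, q * G @ G.T
```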
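
Regarding note 3, the comparison can be made concrete with a small sketch of our own, assuming the columns of the stacked measurement Jacobian that belong to one microphone are known; it shows how the minimum eigenvalue of that microphone's sub-FIM would be extracted.

```python
import numpy as np

def min_eig_sub_fim(J, R_inv, mic_cols):
    """Minimum eigenvalue of the sub-FIM of one microphone.

    J        : stacked measurement Jacobian of the whole system
    R_inv    : inverse of the measurement noise covariance
    mic_cols : slice selecting the columns of J that belong to this
               microphone (location, starting time offset, clock drift)
    """
    fim = J.T @ R_inv @ J                 # full Fisher information matrix
    sub = fim[mic_cols, mic_cols]         # block corresponding to this microphone
    return np.linalg.eigvalsh(sub).min()
```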


Author information


Corresponding author

Correspondence to Daobilige Su.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Jacobian of the observation model of a 2D microphone array

In this appendix, the Jacobian matrices \(J_{mic\_n}^{p-l}\), \(J_{k\_x}^{p-l}\) and \(J_{k\_y}^{p-l}\) of the observation model of a 2D microphone array are formulated.
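
The measurement equation itself is defined in the main body of the paper and is not repeated here; for readability, the Jacobians below are consistent with an asynchronous time-of-arrival observation of the following assumed form (our reconstruction, with microphone 1 fixed at the origin as the reference and with shorthand symbols that may differ from the main text),

$$\begin{aligned} z_{n,k} = \dfrac{d_{n,k} - d_k}{c} + \tau _{n} + \delta _{n} k{\varDelta }t, \quad n=2,\ldots ,N, \end{aligned}$$

where \(d_{n,k}\) is the distance between microphone n and the sound source at the kth time instance, \(d_k\) is the distance from the sound source to the origin, \(\tau _{n}\) and \(\delta _{n}\) denote the starting time offset and the clock difference/drift rate of microphone n, and c is the speed of sound.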

The Jacobian matrix \(J_{mic\_n}^{p-l}\) in Eq. (19) is nonzero only at row n, and this row is computed as

$$\begin{aligned} \begin{aligned}&J_{mic\_n}^{p-l}(n,:)\\&=\begin{bmatrix} \dfrac{x_{mic\_n}^x - x_{src\_k}^x}{c\sqrt{(x_{mic\_n}^x - x_{src\_k}^x)^2 + (x_{mic\_n}^y - x_{src\_k}^y)^2}}\\ \dfrac{x_{mic\_n}^y - x_{src\_k}^y}{c\sqrt{(x_{mic\_n}^x - x_{src\_k}^x)^2 + (x_{mic\_n}^y - x_{src\_k}^y)^2}}\\ 1\\ k{\varDelta }t \end{bmatrix}^{T}. \end{aligned} \end{aligned}$$
(32)

The Jacobian matrices \(J_{k\_x}^{p-l}\) and \(J_{k\_y}^{p-l}\) are computed as

$$\begin{aligned} J_{k\_x}^{p-l}= & {} \begin{bmatrix} \dfrac{x_{src\_k}^x - x_{mic\_2}^x}{c\sqrt{(x_{mic\_2}^x - x_{src\_k}^x)^2 + (x_{mic\_2}^y - x_{src\_k}^y)^2}} \\ \vdots \\ \dfrac{x_{src\_k}^x - x_{mic\_N}^x}{c\sqrt{(x_{mic\_N}^x - x_{src\_k}^x)^2 + (x_{mic\_N}^y - x_{src\_k}^y)^2}} \end{bmatrix}\nonumber \\&-\begin{bmatrix} \dfrac{x_{src\_k}^x}{c\sqrt{{x_{src\_k}^x}^2 + {x_{src\_k}^y}^2}}\\ \vdots \\ \dfrac{x_{src\_k}^x}{c\sqrt{{x_{src\_k}^x}^2 + {x_{src\_k}^y}^2}}\\ \end{bmatrix}, \end{aligned}$$
(33)
$$\begin{aligned} J_{k\_y}^{p-l}= & {} \begin{bmatrix} \dfrac{x_{src\_k}^y - x_{mic\_2}^y}{c\sqrt{(x_{mic\_2}^x - x_{src\_k}^x)^2 + (x_{mic\_2}^y - x_{src\_k}^y)^2}} \\ \vdots \\ \dfrac{x_{src\_k}^y - x_{mic\_N}^y}{c\sqrt{(x_{mic\_N}^x - x_{src\_k}^x)^2 + (x_{mic\_N}^y - x_{src\_k}^y)^2}} \end{bmatrix}\nonumber \\&-\begin{bmatrix} \dfrac{x_{src\_k}^y}{c\sqrt{{x_{src\_k}^x}^2 + {x_{src\_k}^y}^2}}\\ \vdots \\ \dfrac{x_{src\_k}^y}{c\sqrt{{x_{src\_k}^x}^2 + {x_{src\_k}^y}^2}}\\ \end{bmatrix}. \end{aligned}$$
(34)
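
Purely as an illustrative check (a sketch under the assumed observation form stated above, with variable names and a nominal speed of sound of our own choosing, not the paper's code), the nonzero row of Eq. (32) can be verified against finite differences:

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s; the paper's value may differ

def obs(mic, src, tau, delta, k, dt):
    """Assumed measurement of microphone n at step k: TDOA with respect to the
    reference microphone at the origin, plus start offset and clock-drift terms."""
    return (np.linalg.norm(mic - src) - np.linalg.norm(src)) / C + tau + delta * k * dt

def jac_mic_row(mic, src, k, dt):
    """Nonzero row of J_mic_n in Eq. (32): partials w.r.t. [x_mic, y_mic, tau, delta]."""
    d_nk = np.linalg.norm(mic - src)
    return np.array([(mic[0] - src[0]) / (C * d_nk),
                     (mic[1] - src[1]) / (C * d_nk),
                     1.0,
                     k * dt])

# Finite-difference check of the analytic row.
mic, src = np.array([1.0, 2.0]), np.array([0.5, -0.3])
tau, delta, k, dt, eps = 1e-3, 1e-5, 7, 0.1, 1e-7
f0 = obs(mic, src, tau, delta, k, dt)
numeric = np.array([
    (obs(mic + [eps, 0.0], src, tau, delta, k, dt) - f0) / eps,
    (obs(mic + [0.0, eps], src, tau, delta, k, dt) - f0) / eps,
    (obs(mic, src, tau + eps, delta, k, dt) - f0) / eps,
    (obs(mic, src, tau, delta + eps, k, dt) - f0) / eps,
])
print(np.allclose(numeric, jac_mic_row(mic, src, k, dt), atol=1e-5))
```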

Jacobian of the observation model of a 3D microphone array

In this appendix, the Jacobian matrices \(J_{mic\_n}^{p-l}\), \(J_{k\_x}^{p-l}\), \(J_{k\_y}^{p-l}\) and \(J_{k\_z}^{p-l}\) of the observation model of a 3D microphone array are formulated.

For a 3D microphone array, \(J_{mic\_n}^{p-l}\) in Eq. (32) is rewritten as

$$\begin{aligned} \begin{aligned} J_{mic\_n}^{p-l}(n,:)= \begin{bmatrix} \dfrac{x_{mic\_n}^x - x_{src\_k}^x}{c{d}_{n,k}}\\ \dfrac{x_{mic\_n}^y - x_{src\_k}^y}{c{d}_{n,k}}\\ \dfrac{x_{mic\_n}^z - x_{src\_k}^z}{c{d}_{n,k}}\\ 1\\ k{\varDelta }t \end{bmatrix}^{T}. \end{aligned} \end{aligned}$$
(35)

\(J_{k\_x}^{p-l}\), \(J_{k\_y}^{p-l}\) and \(J_{k\_z}^{p-l}\) in Eq. (20) can be formulated as follows,

$$\begin{aligned}&\begin{aligned} J_{k\_x}^{p-l}=\begin{bmatrix} \dfrac{x_{src\_k}^x - x_{mic\_2}^x}{c{d}_{2,k}} \\ \vdots \\ \dfrac{x_{src\_k}^x - x_{mic\_N}^x}{c{d}_{N,k}} \end{bmatrix} -\begin{bmatrix} \dfrac{x_{src\_k}^x}{c{d_k}}\\ \vdots \\ \dfrac{x_{src\_k}^x}{c{d_k}}\\ \end{bmatrix}, \end{aligned} \end{aligned}$$
(36)
$$\begin{aligned}&\begin{aligned} J_{k\_y}^{p-l}=\begin{bmatrix} \dfrac{x_{src\_k}^y - x_{mic\_2}^y}{c{d}_{2,k}} \\ \vdots \\ \dfrac{x_{src\_k}^y - x_{mic\_N}^y}{c{d}_{N,k}} \end{bmatrix} -\begin{bmatrix} \dfrac{x_{src\_k}^y}{c{d_k}}\\ \vdots \\ \dfrac{x_{src\_k}^y}{c{d_k}}\\ \end{bmatrix}, \end{aligned} \end{aligned}$$
(37)
$$\begin{aligned}&\begin{aligned} J_{k\_z}^{p-l}=\begin{bmatrix} \dfrac{x_{src\_k}^z - x_{mic\_2}^z}{c{d}_{2,k}} \\ \vdots \\ \dfrac{x_{src\_k}^z - x_{mic\_N}^z}{c{d}_{N,k}} \end{bmatrix} -\begin{bmatrix} \dfrac{x_{src\_k}^z}{c{d_k}}\\ \vdots \\ \dfrac{x_{src\_k}^z}{c{d_k}}\\ \end{bmatrix}, \end{aligned} \end{aligned}$$
(38)

where \({d}_{n,k}\) is the distance between microphone n and the sound source at the kth time instance, and \({d_k}\) is the distance from the sound source position at the kth time instance to the origin of the global coordinate frame, which is formulated as follows,

$$\begin{aligned} d_k = \sqrt{{x_{src\_k}^x}^2 + {x_{src\_k}^y}^2 + {x_{src\_k}^z}^2}. \end{aligned}$$
(39)
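
Again purely as an illustration (a sketch with our own variable names and an assumed speed of sound, not the paper's code), Eqs. (36)–(38) can be evaluated for all microphones at once:

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def source_jacobian_3d(mics, src):
    """Columns J_k_x, J_k_y, J_k_z of Eqs. (36)-(38) stacked into one
    (N-1) x 3 matrix. `mics` holds the 3D locations of microphones 2..N
    (one per row); microphone 1 is fixed at the origin, so d_k = ||src||."""
    d_nk = np.linalg.norm(mics - src, axis=1, keepdims=True)  # distances d_{n,k}
    d_k = np.linalg.norm(src)                                 # distance d_k to the origin
    return (src - mics) / (C * d_nk) - src / (C * d_k)

# Example usage with three microphones and one source position.
mics = np.array([[1.0, 0.0, 0.2], [0.0, 1.5, -0.1], [2.0, 2.0, 1.0]])
src = np.array([0.8, -0.4, 0.3])
print(source_jacobian_3d(mics, src))  # shape (3, 3): rows = mics 2..4, cols = x, y, z
```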


About this article


Cite this article

Su, D., Vidal-Calleja, T. & Miro, J.V. Asynchronous microphone arrays calibration and sound source tracking. Auton Robot 44, 183–204 (2020). https://doi.org/10.1007/s10514-019-09885-w

