Spatial extrapolation of early room impulse responses in local area using sparse equivalent sources and image source method

doi:10.1016/j.apacoust.2021.108027

Applied Acoustics

Volume 179, August 2021, 108027

Abstract

The room impulse response (RIR) is important in most acoustic applications, such as the design of concert halls and sound field control, because it characterizes the sound propagation. Measuring RIRs at multiple points is challenging, as it requires either a huge microphone array or repeating the experiment while relocating the microphone. Several RIR interpolation and extrapolation methods have been developed for obtaining RIRs at multiple measurement points efficiently. Extrapolation methods offer more efficient RIR measurement than interpolation. However, previous studies focused on extrapolation at frequencies below 1 kHz, and extrapolation at higher frequencies was difficult. In this study, we propose an extrapolation method for RIRs of direct sound and early reflections in a local area using a small number of measurement points. The proposed method represents the RIRs around the microphones using superpositions of sparse equivalent sources located around the loudspeaker and image sources. We conducted a measurement experiment in an anechoic chamber with sound-reflecting boards to estimate RIRs around the microphones. In the experimental results under 2.5-dimensional conditions, the proposed method achieved a signal-to-noise ratio (SNR) above approximately 10 dB near the microphone array from 0.5–8.5 kHz. For the extrapolation accuracy over the entire evaluation area (0.6 × 0.54 m²), the proposed method improved the SNR by about 5–6 dB compared to results using the plane wave decomposition.

Introduction

Measurement of the room impulse response (RIR) is essential to understand sound propagation in a room. For example, the RIRs at multiple points are useful for sound field control and synthesis [4], [23], [6] and for visualization of sound fields [10], [29]. Sound propagation is considered a linear time-invariant system with the loudspeaker as the input and the microphone as the output. The RIR depends on the positions of the loudspeaker and the microphone because the sound propagation path changes with those positions. Thus, to obtain the RIRs from the loudspeaker to other points, we must repeat measurements after relocating the microphone. Another approach is to use a microphone array that has microphones located at all measurement points. However, as the number of microphones increases, the sound reflections from the microphones become larger and the calibration of the microphones becomes more complicated.
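The linear time-invariant view can be illustrated with a short sketch: the microphone signal is the convolution of the loudspeaker signal with the RIR. The toy RIR below (one direct sound and one weaker reflection) and all of its delays and gains are illustrative assumptions, not values from this study.

```python
import numpy as np

def apply_rir(source_signal, rir):
    """Return the microphone signal as the linear convolution of source and RIR."""
    return np.convolve(source_signal, rir)

# Toy RIR (assumed): direct sound at sample 10, one weaker reflection at sample 25.
rir = np.zeros(64)
rir[10] = 1.0
rir[25] = 0.5

# A unit impulse played by the loudspeaker reproduces the RIR at the microphone.
impulse = np.zeros(8)
impulse[0] = 1.0
mic_signal = apply_rir(impulse, rir)
```

Moving the microphone changes the delays and gains in `rir`, which is why one RIR is needed for each loudspeaker and microphone position pair.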

In recent studies, several RIR interpolation methods have been proposed that can measure multiple RIRs efficiently. In [19], an RIR interpolation method for an entire room was proposed that uses the sparsity of the early reflections in the time domain.

In [12], [13], the RIRs at grid points were recovered by solving the inverse problem of interpolating the signal of the microphone on a moving path from RIRs at the grid points.

Alternatively, the equivalent source method (ESM) is well known for the reconstruction of radiated and scattered sound fields [14], [11], [27]. Based on the Kirchhoff–Helmholtz integral equation [30], the sound field can be represented by the superposition of point sources surrounding the domain of interest. Because the number of point sources exceeds the number of measurement points, the estimation of the source weights is treated as an underdetermined problem and is solved by the least squares (LS) method.
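As a minimal sketch of this idea, the example below fits the weights of point sources on a ring to simulated transfer functions at a few microphones. The geometry, the frequency, the speed of sound, and the free-field Green's function convention exp(-jkr)/(4πr) are all assumptions made for illustration, and the minimum-norm solution returned by `numpy.linalg.lstsq` is one common reading of "LS" when there are more sources than measurements.

```python
import numpy as np

def green_matrix(receivers, sources, k):
    """Free-field Green's functions exp(-jkr)/(4*pi*r), receivers in rows, sources in columns."""
    r = np.linalg.norm(receivers[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

rng = np.random.default_rng(0)
k = 2.0 * np.pi * 1000.0 / 343.0              # wavenumber at 1 kHz, c = 343 m/s (assumed)

mics = rng.uniform(-0.3, 0.3, size=(8, 3))    # M = 8 hypothetical microphone positions
angles = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
eq_sources = np.stack(
    [2.0 * np.cos(angles), 2.0 * np.sin(angles), np.zeros(32)], axis=1
)                                             # N = 32 equivalent sources on a 2 m ring

true_source = np.array([[1.5, 0.5, 0.0]])
y = green_matrix(mics, true_source, k)[:, 0]  # simulated measured transfer functions

D = green_matrix(mics, eq_sources, k)         # M x N dictionary with N > M
w, *_ = np.linalg.lstsq(D, y, rcond=None)     # minimum-norm least-squares weights
residual = np.linalg.norm(D @ w - y) / np.linalg.norm(y)
```

The fit reproduces the measurements closely, but the energy is spread over many sources, which says little about how well the field is predicted away from the microphones.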

With the recent developments in compressed sensing [3], a sparse representation of the acoustic field has been shown to be effective for some applications, such as multi-sound field control [21], [22], sound field decomposition [15], and RIR interpolation [19], [2]. In ESM, the sparse expression is also more effective than the LS method to represent the near field [5].
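A sparse solve can be sketched as follows. This excerpt does not state which solver was used, so orthogonal matching pursuit (OMP) is used here as a simple greedy stand-in for a sparsity-promoting optimization; the function name `omp` and the toy dictionary are hypothetical. In a sparse ESM setting, the columns of `D` would hold transfer functions from candidate equivalent sources to the microphones.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse solve of y ~ D @ w with at most n_nonzero (>= 1) active columns."""
    col_norms = np.linalg.norm(D, axis=0)
    w = np.zeros(D.shape[1], dtype=complex)
    support = []
    residual = y.astype(complex)
    for _ in range(n_nonzero):
        # Pick the unused column most correlated with the current residual.
        corr = np.abs(D.conj().T @ residual) / col_norms
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Refit all active weights by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) <= 1e-12 * np.linalg.norm(y):
            break
    w[support] = coef
    return w

# Toy dictionary (illustrative only): four unit vectors plus one dense column.
D = np.hstack([np.eye(4), 0.5 * np.ones((4, 1))]).astype(complex)
y = np.array([2.0, 0.0, 3.0j, 0.0])
w = omp(D, y, n_nonzero=2)
```

Here the two active columns are found exactly, whereas a minimum-norm LS fit would generally spread weight over all five columns.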

In addition, the extrapolation method helps us to obtain the RIRs more efficiently compared to the interpolation, since it is simple and easy to place the microphones. In [9], the room transfer function is modeled by the common acoustical poles and their residues corresponding to the eigenfrequencies of the room. In [17], neural networks are applied for the sound field reconstruction, and sound field variations are predicted based on the observations by a small number of microphones. Furthermore, in the work presented in [28], the RIRs at arbitrary points were extrapolated by a limited number of plane waves using compressed sensing. These previous studies have shown the effectiveness of RIR extrapolation at frequencies below 1 kHz. However, extrapolation of RIRs at higher frequencies is required for various applications such as sound field control and concert hall design. We previously proposed extrapolation methods for direct sound and primary reflections based on sparse ESM and image source method [24], [25], and evaluated the effectiveness of our proposed method by simulation experiments.
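The plane-wave idea can be illustrated with a deliberately simplified sketch: a single plane wave is selected from a grid of candidate directions by correlation, and the field is then predicted at points outside the array. This is not the method of [28]; the two-dimensional geometry, the frequency, the speed of sound, and the one-wave model are assumptions chosen to keep the example short.

```python
import numpy as np

def plane_wave_matrix(points, directions, k):
    """Plane waves exp(-j k d.x): points in rows, unit directions d in columns."""
    return np.exp(-1j * k * (points @ directions.T))

rng = np.random.default_rng(1)
k = 2.0 * np.pi * 1000.0 / 343.0                 # 1 kHz, c = 343 m/s (assumed)
angles = np.deg2rad(np.arange(0, 360, 10))
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # 36 candidates

mics = rng.uniform(-0.2, 0.2, size=(6, 2))       # hypothetical microphone positions
targets = rng.uniform(-0.5, 0.5, size=(4, 2))    # hypothetical extrapolation points

# Simulated measurement: one plane wave from candidate direction 5.
true_index, true_amp = 5, 0.8
y = true_amp * plane_wave_matrix(mics, directions[[true_index]], k)[:, 0]

# Select the best-matching candidate and estimate its complex amplitude.
A = plane_wave_matrix(mics, directions, k)
best = int(np.argmax(np.abs(A.conj().T @ y)))
amp = (A[:, best].conj() @ y) / len(mics)

# Extrapolate to the target points with the selected plane wave.
extrapolated = amp * plane_wave_matrix(targets, directions[[best]], k)[:, 0]
reference = true_amp * plane_wave_matrix(targets, directions[[true_index]], k)[:, 0]
```

A plane-wave model carries no information about source distance, which is one reason a point-source model may be preferable when the loudspeaker is close to the microphones.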

In this study, to estimate the sound reflections around the microphones by decomposing the reflection components for each wall, we propose an extrapolation method for RIRs that includes not only the primary reflections but also other early reflections. In a measurement experiment in an anechoic chamber, we evaluate the estimation accuracy at frequencies up to 8.5 kHz.

Our proposed method can determine the reflection components for each wall using the image source method [1]. This decomposition can help in various applications such as sound visualization and room acoustics design in architectural acoustics, based on the relationship between each reflection component and the reflecting object. Furthermore, since the early part of RIRs is known to affect timbre and sound localization [7], it is necessary to design and control the early part of RIRs in most acoustic applications. Thus, in this study, we focus on extrapolation of the early part of RIRs in the frequency band 0.5–8.5 kHz. We conduct an experimental evaluation with sound-reflecting boards in an anechoic chamber. We evaluate the extrapolation accuracy of RIRs including primary and secondary sound reflections with two different configurations of the microphone array.
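The geometric core of the image source method [1] is mirroring the source across each reflecting plane. A minimal sketch follows, in which the wall positions and the source position are hypothetical, and visibility checks and reflection coefficients are omitted.

```python
import numpy as np

def mirror_across_wall(point, wall_point, wall_normal):
    """Mirror a point across the plane through wall_point with normal wall_normal."""
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    p = np.asarray(point, dtype=float)
    distance = np.dot(p - np.asarray(wall_point, dtype=float), n)
    return p - 2.0 * distance * n

source = np.array([1.0, 2.0, 0.0])

# Hypothetical walls: the planes x = 0 and y = 3.
first_order = mirror_across_wall(source, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
second_order = mirror_across_wall(first_order, [0.0, 3.0, 0.0], [0.0, 1.0, 0.0])
```

Each image position corresponds to one wall (first order) or one ordered pair of walls (second order), which is what allows a reflection component to be attributed to a specific reflecting surface.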

The outline of this paper is as follows: In Section 2, we present the estimation method of RIRs with sparse ESM and the image source method. In Section 3, experimental results are reported to evaluate the proposed method. Finally, we conclude this paper in Section 4.

Method

Fig. 1 shows the concept of the proposed method with sparse equivalent sources and the image source method. We consider RIRs comprising the direct sound from a loudspeaker and its early reflections from the walls. First, based on the superposition principle for sound waves, the transfer function $y_{m} \in \mathbb{C}$ from a loudspeaker at $x_{src}$ to the m-th microphone at $x'_{m}$ $(m = 1, \dots, M)$ is divided into a direct sound and early reflections as $y_{m} = y_{m, 0}^{(0)} + y_{m, 1}^{(1)} + \dots + y_{m, I_{1}}^{(1)} + y_{m, 1}^{(2)} + \dots + y_{m, I_{2}}^{(2)}$
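Read compactly, and assuming that the superscript denotes the reflection order and that $I_{1}$ and $I_{2}$ count the primary and secondary reflection components (this excerpt ends before these symbols are defined), the same decomposition can be written with sums:

```latex
y_{m} = y_{m, 0}^{(0)} + \sum_{i = 1}^{I_{1}} y_{m, i}^{(1)} + \sum_{i = 1}^{I_{2}} y_{m, i}^{(2)},
\qquad m = 1, \dots, M
```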

Experimental conditions

We conducted the evaluation experiments in an anechoic chamber with sound-reflecting boards. In these experiments, we evaluate the extrapolation accuracy of the transfer functions that comprise direct sound and primary/secondary reflections in the horizontal plane. We compared the proposed method with the plane wave decomposition (PWD) [28] for extrapolation. In PWD, the optimization problem in Eq. (8) was solved with transfer functions of plane waves, and the error tolerance was $\epsilon = 2$. The number of

Conclusion

We proposed a spatial extrapolation method for early room impulse responses that uses a small number of microphones to efficiently obtain RIRs in a local area. We estimated the components of the direct sound and the reflections from walls using sparse equivalent sources and the image source method.

The experimental results indicate that RIRs, including primary and secondary reflections, can be estimated over 0.5–8.5 kHz in an area of 0.54 × 0.6 m² using 13 or 16 microphones. For both microphone arrays,

CRediT authorship contribution statement

Izumi Tsunokuni: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization, Project administration. Kakeru Kurokawa: Validation, Investigation. Haruka Matsuhashi: Software, Validation, Investigation. Yusuke Ikeda: Conceptualization, Methodology, Software, Validation, Resources, Writing - review & editing, Supervision, Funding acquisition. Naotoshi Osaka: Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by Research Institute for Science and Technology of Tokyo Denki University Grant No. Q20J-04/ Japan. The authors would like to thank Prof. Y. Kaneda (Tokyo Denki University) for support regarding the measurement environment and equipment.

References (30)

[1] J.B. Allen et al., Image method for efficiently simulating small-room acoustics, J Acoust Soc Amer (1979)
[2] N. Antonello et al., Room impulse response interpolation using a sparse spatio-temporal representation of the sound field, IEEE/ACM Trans Audio Speech Lang Process (2017)
[3] E.J. Candes et al., An introduction to compressive sampling, IEEE Signal Process Mag (2008)
[4] S.J. Elliott et al., Multiple-point equalization in a room using adaptive digital filters, J Audio Eng Soc (1989)
[5] E. Fernandez-Grande et al., A sparse equivalent source method for near-field acoustic holography, J Acoust Soc Amer (2017)
[6] P.A. Gauthier et al., Adaptive wave field synthesis with independent radiation mode control for active sound field reproduction: Theory, J Acoust Soc Amer (2006)
[7] T. Gotoh et al., A consideration of distance perception in binaural hearing, J Acoust Soc Jpn (1977)
[8] Grant M, Boyd S. CVX: Matlab software for disciplined convex programming, version 2.1; 2020. URL:...
[9] Y. Haneda et al., Common-acoustical-pole and residue model and its application to spatial interpolation and extrapolation of a room transfer function, IEEE Trans Speech Audio Process (1999)
[10] A. Inoue et al., Visualization system for sound field using see-through head-mounted display, Acoust Sci Technol (2019)
[11] M.E. Johnson et al., An equivalent source technique for calculating the sound field inside an enclosure containing scattering objects, J Acoust Soc Amer (1998)
[12] F. Katzberg et al., Sound-field measurement with moving microphones, J Acoust Soc Amer (2017)
[13] F. Katzberg et al., A compressed sensing framework for dynamic sound-field measurements, IEEE/ACM Trans Audio Speech Lang Process (2018)
[14] G.H. Koopmann et al., A method for computing acoustic fields based on the principle of wave superposition, J Acoust Soc Amer (1989)
[15] S. Koyama et al., Sparse representation of a spatial sound field in a reverberant environment, IEEE J Select Top Signal Process (2019)
