Efficient binaural rendering of spherical microphone array data by linear filtering
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2021-11-06, DOI: 10.1186/s13636-021-00224-5
Johannes M. Arend 1, 2 , Tim Lübeck 1, 2 , Christoph Pörschmann 1

High-quality rendering of spatial sound fields in real-time is becoming increasingly important with the steadily growing interest in virtual and augmented reality technologies. Typically, a spherical microphone array (SMA) is used to capture a spatial sound field. The captured sound field can be reproduced over headphones in real-time using binaural rendering, virtually placing a single listener in the sound field. Common methods for binaural rendering first spatially encode the sound field by transforming it to the spherical harmonics domain and then decode the sound field binaurally by combining it with head-related transfer functions (HRTFs). However, these rendering methods are computationally demanding, especially for high-order SMAs, and require implementing quite sophisticated real-time signal processing. This paper presents a computationally more efficient method for real-time binaural rendering of SMA signals by linear filtering. The proposed method allows representing any common rendering chain as a set of precomputed finite impulse response filters, which are then applied to the SMA signals in real-time using fast convolution to produce the binaural signals. Results of the technical evaluation show that the presented approach is equivalent to conventional rendering methods while being computationally less demanding and easier to implement using any real-time convolution system. However, the lower computational complexity goes along with lower flexibility. On the one hand, encoding and decoding are no longer decoupled, and on the other hand, sound field transformations in the SH domain can no longer be performed. Consequently, in the proposed method, a filter set must be precomputed and stored for each possible head orientation of the listener, leading to higher memory requirements than the conventional methods. 
As such, the approach is particularly well suited for efficient real-time binaural rendering of SMA signals in a fixed setup where usually a limited range of head orientations is sufficient, such as live concert streaming or VR teleconferencing.
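
The core idea of the abstract, rendering by applying one precomputed FIR filter per microphone and ear and summing the results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `render_binaural` and `select_filters`, the array layouts, and the one-dimensional yaw grid are all assumptions made for illustration, and the filters here are random placeholders rather than a real rendering chain. The sketch processes a whole signal with a single FFT, whereas a real-time system would typically use block-wise (partitioned) fast convolution.

```python
import numpy as np


def render_binaural(sma_signals, filters):
    """Filter-and-sum rendering via FFT-based fast convolution.

    sma_signals: (num_mics, num_samples) microphone signals.
    filters:     (num_mics, 2, num_taps) precomputed FIR filters,
                 one per microphone and ear (hypothetical layout).
    Returns the binaural signal, shape (2, num_samples + num_taps - 1).
    """
    num_mics, num_samples = sma_signals.shape
    if filters.shape[0] != num_mics or filters.shape[1] != 2:
        raise ValueError("filters must have shape (num_mics, 2, num_taps)")
    out_len = num_samples + filters.shape[2] - 1
    # Zero-pad to a power of two that holds the full linear convolution.
    n_fft = 1 << (out_len - 1).bit_length()
    spec_x = np.fft.rfft(sma_signals, n_fft, axis=-1)  # (mics, bins)
    spec_h = np.fft.rfft(filters, n_fft, axis=-1)      # (mics, ears, bins)
    # Multiply per microphone, then sum over microphones for each ear.
    spec_y = np.einsum("mf,mef->ef", spec_x, spec_h)
    return np.fft.irfft(spec_y, n_fft, axis=-1)[:, :out_len]


def select_filters(filter_bank, yaw_grid_deg, yaw_deg):
    """Pick the stored filter set whose yaw is closest to the listener's.

    filter_bank:  (num_orientations, num_mics, 2, num_taps).
    yaw_grid_deg: (num_orientations,) yaw angles of the stored sets.
    """
    # Wrap angular differences into [-180, 180) before comparing.
    diff = (yaw_grid_deg - yaw_deg + 180.0) % 360.0 - 180.0
    return filter_bank[np.argmin(np.abs(diff))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_mics, num_taps = 8, 128            # arbitrary illustrative sizes
    yaw_grid = np.arange(0.0, 360.0, 30.0)
    bank = rng.standard_normal((yaw_grid.size, num_mics, 2, num_taps))
    signals = rng.standard_normal((num_mics, 1024))

    ears = render_binaural(signals, select_filters(bank, yaw_grid, 47.0))
    print("binaural output shape:", ears.shape)
    print("filter bank size in bytes:", bank.nbytes)
```

The last line makes the memory trade-off mentioned in the abstract concrete: the stored filter bank grows linearly with the number of head orientations, microphones, and filter taps.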

Updated: 2021-11-07