An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2022-05-16, DOI: 10.1186/s13636-022-00242-x
Maximo Cobos, Jens Ahrens, Konrad Kowalczyk, Archontis Politis

The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial information carried by audio signals. This is in contrast to model-based methods, which impose spatial information from, for example, metadata such as the intended position of a source onto signals that are otherwise free of spatial information. Signal processing has traditionally been at the core of spatial audio systems, and it continues to play a very important role. The emergence of deep learning in many closely related fields has put the focus on the potential of learning-based approaches for the development of data-based spatial audio applications. This article reviews the most important application domains of data-based spatial audio, including well-established methods that employ conventional signal processing, while paying special attention to the most recent achievements that make use of machine learning. Our review is organized according to the topology of the spatial audio pipeline, which consists of capture, processing/manipulation, and reproduction. We discuss the literature on the three stages of the pipeline, as well as on the spatial audio representations that are used to transmit the content between them, highlighting the key references and elaborating on the underlying concepts. We reflect on the literature by juxtaposing the prerequisites that made machine learning successful in domains other than spatial audio with those found in the domain of spatial audio today. Based on this, we identify routes that may facilitate future advancement.

Updated: 2022-05-16
