Point Cloud Audio Processing,arXiv - CS - Sound


arXiv - CS - Sound, Pub Date: 2021-05-06, DOI: arxiv-2105.02469
Krishna Subramani, Paris Smaragdis

Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prevents learned models from being reused on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of input representation or sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as the DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
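The idea of treating a spectrum as a set of points rather than a fixed-length vector can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`frame_to_points`, `subsample_points`, `make_tone`), the choice of (frequency in Hz, normalized magnitude) as the point features, and the Hann window are all assumptions made for illustration. The sketch shows that the same 1 kHz tone, analyzed at two sampling rates and two DFT sizes, yields point sets of different sizes whose points nonetheless sit at the same physical location in feature space.

```python
import numpy as np

def frame_to_points(frame, sample_rate, n_fft):
    """Hypothetical helper: map one audio frame to (frequency_hz, magnitude) points.

    Assumes n_fft >= len(frame), so the frame is zero-padded rather than cropped.
    """
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window, n=n_fft)
    # Express each bin by its physical frequency, not by its bin index.
    freqs_hz = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Normalize by the window sum so magnitudes do not scale with frame length.
    mags = 2.0 * np.abs(spectrum) / window.sum()
    return np.stack([freqs_hz, mags], axis=1)

def subsample_points(points, n_keep, seed=0):
    """Hypothetical helper: keep a random subset of the points (order is irrelevant)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def make_tone(freq_hz, sample_rate, duration_s=0.032):
    """Generate a short sine tone used as a stand-in audio frame."""
    t = np.arange(int(round(sample_rate * duration_s))) / sample_rate
    return np.sin(2.0 * np.pi * freq_hz * t)

# Same tone, two different sampling rates and DFT sizes.
points_a = frame_to_points(make_tone(1000.0, 16000), 16000, n_fft=512)
points_b = frame_to_points(make_tone(1000.0, 8000), 8000, n_fft=1024)

# The point sets differ in size, but the strongest point lies near 1000 Hz in both.
peak_a = points_a[np.argmax(points_a[:, 1])]
peak_b = points_b[np.argmax(points_b[:, 1])]

# A subsampled set is still a valid input for a model that consumes sets of points.
sparse_b = subsample_points(points_b, n_keep=64)
```

A fixed-size model would see inputs of length 257 and 513 here and could not share weights between them, whereas a model that operates on unordered sets of (frequency, magnitude) points can in principle accept either set, or a subsampled version such as `sparse_b`.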


Last updated: 2021-05-07
