Audio-Visual Based Online Multi-Source Separation,IEEE/ACM Transactions on Audio, Speech, and Language Processing



Audio-Visual Based Online Multi-Source Separation
IEEE/ACM Transactions on Audio, Speech, and Language Processing ( IF 4.1 ) Pub Date : 2022-03-14 , DOI: 10.1109/taslp.2022.3156758
Jonah Ong ₁ , Ba Tuong Vo ₁ , Sven Nordholm ₁ , Ba-Ngu Vo ₁ , Diluka Moratuwage ₁ , Changbeom Shim ₁


Meeting or conference assistance is a popular application that typically requires compact configurations of co-located audio and visual sensors. This paper proposes a novel solution for online separation of an unknown and time-varying number of moving sources using only a single microphone array co-located with a single visual device. The approach exploits the complementary nature of simultaneous audio and visual measurements through a model-centric three-stage process of detection, tracking, and (spatial) filtering, which performs separation in a block-wise or recursive fashion. Fusing the measurements requires solving the multi-modal space-time permutation problem, because the audio and visual measurements not only reside in different observation spaces but are also unidentified or unlabeled (with respect to the unknown and time-varying number of sources) and subject to noise, extraneous measurements, and missing measurements. A labeled random finite set tracking filter is applied to resolve the permutation problem and recursively estimate the source identities and trajectories. A time-varying set of generalized side-lobe cancellers is constructed from the tracking estimates to perform online separation. Evaluations are undertaken with live human speakers.
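The final filtering stage can be illustrated with a minimal sketch of a bank of generalized side-lobe cancellers, one per tracked source. This is not the authors' implementation: the abstract gives no design details, so the sketch assumes a narrowband, far-field uniform linear array, takes the tracker output as a plain mapping from source label to direction of arrival, and solves for the adaptive weights in closed form from a given spatial covariance matrix. All function and parameter names (`steering_vector`, `blocking_matrix`, `gsc_weights`, `build_gsc_bank`, `separate`, `loading`) are hypothetical.

```python
import numpy as np

def steering_vector(theta_rad, num_mics, spacing_over_wavelength=0.5):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    m = np.arange(num_mics)
    return np.exp(-2j * np.pi * spacing_over_wavelength * m * np.sin(theta_rad))

def blocking_matrix(d):
    """Orthonormal basis of the subspace orthogonal to d, so that B^H d = 0."""
    # Full SVD of d as a column: the first left singular vector is parallel
    # to d, the remaining ones span its orthogonal complement.
    u, _, _ = np.linalg.svd(d.reshape(-1, 1), full_matrices=True)
    return u[:, 1:]

def gsc_weights(d, R, loading=1e-6):
    """Closed-form GSC weights for look direction d and covariance R."""
    num_mics = d.size
    w_q = d / num_mics                      # fixed beamformer, w_q^H d = 1
    B = blocking_matrix(d)                  # blocks the look direction
    # Adaptive branch: minimise output power, B^H R B w_a = B^H R w_q.
    A = B.conj().T @ R @ B + loading * np.eye(num_mics - 1)
    w_a = np.linalg.solve(A, B.conj().T @ R @ w_q)
    return w_q - B @ w_a

def build_gsc_bank(track_angles, R, num_mics):
    """One canceller per tracked label; rebuild whenever the track set changes."""
    return {label: gsc_weights(steering_vector(theta, num_mics), R)
            for label, theta in track_angles.items()}

def separate(bank, x):
    """Apply every canceller to one array snapshot x (one complex value per mic)."""
    return {label: np.vdot(w, x) for label, w in bank.items()}
```

Keying the bank by track label is what makes the set of cancellers time-varying in this sketch: when the tracker adds, drops, or moves a source, calling `build_gsc_bank` again with the updated angles yields a new set of filters while each output stays attached to the same source identity.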


Updated: 2022-03-14
