Multimodal fusion for indoor sound source localization
Pattern Recognition (IF 8), Pub Date: 2021-02-23, DOI: 10.1016/j.patcog.2021.107906
Jinhui Chen, Ryoichi Takashima, Xingchen Guo, Zhihong Zhang, Xuexin Xu, Tetsuya Takiguchi, Edwin R. Hancock

Localizing an indoor sound source, especially with only a single microphone, is a challenging problem for machine learning. To address it, this paper presents a novel solution based on fusing visual and acoustic models, built on two approaches. First, to estimate the orientation of the vocal object in a stable manner, we use a visual approach as the estimation model, for which we develop a robust image feature representation method that adopts Fourier analysis to efficiently extract polar descriptors. Second, the distance information is estimated by calculating the signal difference between the transmitting and receiving ends. To implement this, we use phoneme-level hidden Markov models (HMMs) extracted from clean speech, which capture the speech signal as a network of phoneme HMMs, to estimate the acoustic transfer function (ATF). Using the separated frame sequences of the ATF, we can indicate the signal difference between two positions, which can be used to estimate the distance of the sound source. Experimental results show that the proposed method can simultaneously extract the direction and distance parameters of the sound source, and thus improves the verification task of sound source localization.
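
The "signal difference between the transmitting and receiving ends" can be illustrated with a small sketch. It is not the paper's method: it assumes the clean signal is known, whereas the abstract describes estimating the ATF with phoneme HMMs. It only shows the underlying idea that a convolutive room response becomes an additive offset in the log-magnitude spectrum. The function name log_spectral_difference, the impulse response h, and the white-noise stand-in for speech are all hypothetical.

```python
import numpy as np

def log_spectral_difference(clean, observed, eps=1e-12):
    """Return log|O(f)| - log|S(f)| on a shared FFT grid.

    Assumption: `observed` is `clean` convolved with a room response,
    so the difference approximates the log-magnitude of that response.
    """
    n = max(len(clean), len(observed))   # zero-pad both to the same length
    S = np.fft.rfft(clean, n=n)
    O = np.fft.rfft(observed, n=n)
    return np.log(np.abs(O) + eps) - np.log(np.abs(S) + eps)

# Hypothetical data: white noise stands in for clean speech.
rng = np.random.default_rng(0)
clean = rng.standard_normal(2048)

# Hypothetical impulse response: direct path plus two weaker reflections.
h = np.array([1.0, 0.0, 0.0, 0.5, 0.0, 0.25])
observed = np.convolve(clean, h)

estimate = log_spectral_difference(clean, observed)

# Reference: log-magnitude response of h on the same frequency grid.
reference = np.log(np.abs(np.fft.rfft(h, n=len(observed))))
```

In this idealized setting estimate matches reference up to numerical error. In the setting the abstract describes, the clean signal is not observed directly, which is why a statistical model of clean speech is used instead.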




Updated: 2021-02-28