DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization, EURASIP Journal on Audio, Speech, and Music Processing



EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2020-11-05, DOI: 10.1186/s13636-020-00184-2
Alif Bin Abdul Qayyum, K. M. Naimul Hassan, Adrita Anika, Md. Farhan Shadiq, Md Mushfiqur Rahman, Md. Tariqul Islam, Sheikh Asif Imran, Shahruk Hossain, Mohammad Ariful Haque

Drone-embedded sound source localization (SSL) is a promising tool for search and rescue scenarios made challenging by poor lighting or occlusions. The problem is complicated by severe drone ego-noise, which can result in negative signal-to-noise ratios in the recorded microphone signals. In this paper, we present our work on drone-embedded SSL using recordings from an 8-channel cube-shaped microphone array embedded in an unmanned aerial vehicle (UAV). As baselines, we use angular spectrum-based time difference of arrival (TDOA) estimation methods such as generalized cross-correlation with phase transform (GCC-PHAT) and minimum variance distortionless response (MVDR), which are state-of-the-art techniques for SSL. Although we improve the baseline methods by reducing ego-noise with the speed correlated harmonics cancellation (SCHC) technique, our main focus is on applying deep learning to this challenging problem. We propose an end-to-end deep learning model for SSL, called DOANet. DOANet is based on a one-dimensional dilated convolutional neural network that computes the azimuth and elevation angles of the target sound source from the raw audio signal. Its advantage is that it requires neither hand-crafted audio features nor ego-noise reduction for direction of arrival (DOA) estimation. We evaluate SSL performance using the proposed and baseline methods and find that DOANet shows promising results compared to the angular spectrum methods both with and without SCHC. To compare the methods, we also introduce a well-known quantity, the area under the curve (AUC) of cumulative histogram plots of angular deviations, as a performance indicator; to our knowledge, it has not previously been used for this kind of problem.
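
The AUC indicator mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition, so everything specific here is an assumption made for illustration: the angular deviation is taken as the great-circle angle between the estimated and reference (azimuth, elevation) directions, the cumulative histogram is sampled at 1-degree steps up to 180 degrees, and the area is normalized to the range 0 to 1. The function names are hypothetical and are not taken from the paper.

```python
import math


def angular_deviation_deg(az_est, el_est, az_ref, el_ref):
    """Great-circle angle in degrees between two directions.

    Each direction is given as azimuth and elevation in degrees.
    Assumption: this is the deviation measure; the paper may define it differently.
    """
    a1, e1, a2, e2 = (math.radians(v) for v in (az_est, el_est, az_ref, el_ref))
    cos_d = (math.sin(e1) * math.sin(e2)
             + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_d = max(-1.0, min(1.0, cos_d))
    return math.degrees(math.acos(cos_d))


def cumulative_histogram(deviations, max_deg=180.0, step=1.0):
    """Fraction of estimates whose deviation is <= each threshold.

    Assumption: thresholds run from 0 to max_deg in increments of step.
    """
    if not deviations:
        raise ValueError("deviations must not be empty")
    n_steps = int(round(max_deg / step))
    thresholds = [i * step for i in range(n_steps + 1)]
    total = len(deviations)
    fractions = [sum(d <= t for d in deviations) / total for t in thresholds]
    return thresholds, fractions


def normalized_auc(thresholds, fractions):
    """Trapezoidal area under the cumulative curve, divided by the threshold span.

    A value of 1.0 means every estimate has zero deviation.
    """
    area = 0.0
    for i in range(1, len(thresholds)):
        width = thresholds[i] - thresholds[i - 1]
        area += 0.5 * (fractions[i] + fractions[i - 1]) * width
    return area / (thresholds[-1] - thresholds[0])


if __name__ == "__main__":
    # Made-up (estimated, reference) direction pairs, for illustration only.
    pairs = [((10.0, 5.0), (12.0, 5.0)), ((90.0, 0.0), (60.0, 10.0))]
    devs = [angular_deviation_deg(*est, *ref) for est, ref in pairs]
    t, f = cumulative_histogram(devs)
    print(normalized_auc(t, f))
```

Under these assumptions, a method whose cumulative curve rises faster (more estimates within small angular deviations) gets a larger AUC, so a single number summarizes the whole error distribution rather than one accuracy threshold.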


Updated: 2020-11-05
