VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

arXiv - CS - Sound. Pub Date: 2020-09-09, DOI: arxiv-2009.04323
Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein

We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: it should improve performance when the input signal consists of overlapped speech, and it must not hurt speech recognition performance under any other acoustic condition. In addition, the model must be tiny, fast, and able to perform inference in a streaming fashion, so that it has minimal impact on CPU, memory, battery, and latency. We propose novel techniques to meet these multi-faceted requirements, including a new asymmetric loss and an adaptive runtime suppression strength. We also show that such a model can be quantized as an 8-bit integer model and run in real time.
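
The abstract names an asymmetric loss but does not define it. As a rough illustration only, the sketch below assumes one plausible form: an L2 loss on spectrogram magnitudes in which over-suppression errors (the enhanced value falls below the clean value) are scaled by a factor `alpha` greater than 1 before squaring. The function name, the `alpha` parameter, and its default value are hypothetical and are not taken from this page.

```python
def asymmetric_l2_loss(clean, enhanced, alpha=10.0):
    """Asymmetric L2 loss over flattened spectrogram magnitudes.

    Assumed form, for illustration: d = clean - enhanced is positive where
    the model removed too much energy (over-suppression). Those entries are
    scaled by alpha before squaring, so with alpha > 1 they cost more than
    under-suppression errors of the same size.
    """
    if len(clean) != len(enhanced):
        raise ValueError("clean and enhanced must have the same length")
    total = 0.0
    for c, e in zip(clean, enhanced):
        d = c - e
        if d > 0:  # over-suppression: penalize more heavily
            d *= alpha
        total += d * d
    return total


if __name__ == "__main__":
    clean = [1.0, 1.0]
    enhanced = [0.5, 1.5]  # first bin over-suppressed, second under-suppressed
    print(asymmetric_l2_loss(clean, enhanced, alpha=1.0))  # symmetric L2
    print(asymmetric_l2_loss(clean, enhanced, alpha=2.0))  # asymmetric
```

With `alpha=1.0` the function reduces to an ordinary squared-error loss. Raising `alpha` biases training toward leaving residual noise rather than removing target speech, which fits the stated requirement that the model must not hurt recognition on non-overlapped audio.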

Updated: 2020-09-10
