Discriminatory and Orthogonal Feature Learning for Noise Robust Keyword Spotting
IEEE Signal Processing Letters (IF 3.2), Pub Date: 9-2-2022, DOI: 10.1109/lsp.2022.3203911
Donghyeon Kim 1 , Kyungdeuk Ko 1 , David K. Han 2 , Hanseok Ko 1

Keyword Spotting (KWS) is an essential component in a smart device for alerting the system when a user prompts it with a command. As these devices are typically constrained by computational and energy resources, the KWS model should be designed with a small footprint. In our previous work, we developed lightweight dynamic filters which extract a robust feature map within a noisy environment. The learning variables of the dynamic filter are jointly optimized with KWS weights by using Cross-Entropy (CE) loss. CE loss alone, however, is not sufficient for high performance when the SNR is low. In order to train the network for more robust performance in noisy environments, we introduce the LOw Variant Orthogonal (LOVO) loss. The LOVO loss is composed of a triplet loss applied on the output of the dynamic filter, a spectral norm-based orthogonal loss, and an inner class distance loss applied in the KWS model. These losses are particularly useful in encouraging the network to extract discriminatory features in unseen noise environments.
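
The abstract names the three components of the LOVO loss but does not give their formulas. As a rough illustration only, the sketch below uses common textbook forms of each term: a margin-based triplet loss, the spectral norm of W Wᵀ − I as the orthogonality penalty, and the mean squared distance to the class centroid as the inner class distance term. The function names, the margin, the weighting factors, and the exact form of every term are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on batches of embeddings of shape (B, D)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def spectral_orthogonal_loss(weight):
    """Spectral norm of (W W^T - I); zero when the rows of W are orthonormal."""
    gram = weight @ weight.T
    residual = gram - np.eye(gram.shape[0])
    # For a 2-D array, ord=2 returns the largest singular value.
    return float(np.linalg.norm(residual, ord=2))


def intra_class_distance_loss(features, labels):
    """Mean squared distance between each feature and its class centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = features[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return float(total / len(features))


def lovo_like_loss(anchor, positive, negative, weight, features, labels,
                   w_tri=1.0, w_orth=1.0, w_intra=1.0):
    """Weighted sum of the three terms (the weights are hypothetical)."""
    return (w_tri * triplet_loss(anchor, positive, negative)
            + w_orth * spectral_orthogonal_loss(weight)
            + w_intra * intra_class_distance_loss(features, labels))
```

In a training setup like the one the abstract describes, a combined term of this kind would presumably be added to the CE loss, with the triplet term computed on the dynamic filter output and the other two terms computed inside the KWS model. How the terms are actually weighted is not stated in the abstract.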

Updated: 2024-08-28