Masked multi-center angular margin loss for language recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2022-07-07, DOI: 10.1186/s13636-022-00249-4
Minghang Ju, Yanyan Xu, Dengfeng Ke, Kaile Su

Embedding-based language recognition aims to maximize inter-class variance and minimize intra-class variance. Previous research has been limited to a training constraint with a single centroid per class, which cannot accurately describe the overall geometric characteristics of the embedding space. In this paper, we propose a novel masked multi-center angular margin (MMAM) loss that models each class with multiple centroids, resulting in better overall performance. Specifically, numerous global centers jointly approximate the entities of each class. To capture local neighbor relationships effectively, a small number of centers are selected to construct similarity relationships between these centers and each entity. Furthermore, we use a new reverse label propagation algorithm to adjust neighbor relations according to the ground-truth labels, so that a discriminative metric space is learned during classification. Finally, an additive angular margin is applied, which yields more discriminative language embeddings by simultaneously enhancing intra-class compactness and inter-class discrepancy. Experiments are conducted on the APSIPA 2017 Oriental Language Recognition (AP17-OLR) corpus. We compare the proposed MMAM method with seven state-of-the-art baselines and show that our method achieves 26.2% and 31.3% relative improvements in the equal error rate (EER) and Cavg, respectively, in the full-length test ("full-length" means the average utterance duration is longer than 5 s). The relative improvements are 31.2% and 29.3% in the 3-s test, and 14% and 14.8% in the 1-s test.
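To make the idea concrete, here is a minimal numpy sketch of a multi-center additive angular margin loss. This is an illustrative reconstruction, not the authors' implementation: the number of centers per class, the max-over-centers class scoring, and the default scale s and margin m are all assumptions for the sake of the example; the paper's masking and reverse label propagation steps are omitted.

```python
import numpy as np

def mc_aam_loss(emb, centers, label, s=30.0, m=0.2):
    """Illustrative multi-center additive angular margin loss.

    emb:     (D,) embedding vector
    centers: (C, K, D) array: K centers per class, C classes (assumed layout)
    label:   ground-truth class index
    s, m:    softmax scale and additive angular margin (hypothetical defaults)
    """
    # L2-normalize the embedding and all centers so dot products are cosines
    e = emb / np.linalg.norm(emb)
    c = centers / np.linalg.norm(centers, axis=-1, keepdims=True)
    cos = c @ e                        # (C, K): cosine to every center
    logits = cos.max(axis=1)           # score each class by its nearest center
    # Additive angular margin on the true class: cos(theta) -> cos(theta + m)
    theta = np.arccos(np.clip(logits[label], -1.0, 1.0))
    logits = logits.copy()
    logits[label] = np.cos(theta + m)
    # Scaled, numerically stable softmax cross-entropy
    z = s * logits
    z = z - z.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])
```

Because cos(theta + m) <= cos(theta) for angles in the usual range, the margin lowers the true-class logit during training, forcing embeddings closer to one of their class centers than they would need to be without the margin.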

Updated: 2022-07-07