Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss


Applied Sciences (IF 2.5), Pub Date: 2020-10-26, DOI: 10.3390/app10217522
Labib Chowdhury, Hasib Zunair, Nabeel Mohammed

Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For this task, approaches based on deep convolutional networks, such as SincNet, are used as an alternative to i-vectors. In SincNet, convolution performed by parameterized sinc functions has demonstrated superior results in this area. The system optimizes a softmax loss, which is integrated into the classification layer responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this issue, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. They are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges. Performance improvements are demonstrated over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
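
The abstract does not spell out the loss formulation, so the following is only a minimal sketch of the general idea behind an angular margin loss, assuming an additive margin on the target-class angle. The function name `angular_margin_loss` and the `scale` and `margin` values are illustrative assumptions, not taken from the paper. Setting `margin=0.0` reduces it to a plain softmax loss over normalized cosine logits, which makes the contrast with the softmax objective described above easy to see.

```python
import math


def _normalize(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def angular_margin_loss(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Cross-entropy over scaled cosine logits with an additive angular margin.

    Hypothetical helper for illustration only. `scale` and `margin` are
    assumed hyperparameters; margin=0.0 gives a normalized softmax loss.
    """
    e = _normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and the class weight vector.
        cos = sum(a * b for a, b in zip(e, _normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        if k == target:
            # Penalize the target class by widening its angle by `margin`.
            # Clamping at pi is a simplification made for this sketch.
            theta = math.acos(cos)
            cos = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos)
    # Numerically stable log-sum-exp, then negative log-probability of target.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    emb = [1.0, 0.2]
    weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per speaker class
    print(angular_margin_loss(emb, weights, target=0, margin=0.0))
    print(angular_margin_loss(emb, weights, target=0, margin=0.5))
```

For the same embedding, the margin version yields a larger loss than the plain version, so training has to pull an utterance closer in angle to its own speaker's weight vector before the loss becomes small.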


Updated: 2020-10-28
