Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network
Complex & Intelligent Systems (IF 5.0) Pub Date: 2020-06-29, DOI: 10.1007/s40747-020-00172-1
Yanjie Jia, Xi Chen, Jieqiong Yu, Lianming Wang, Yuanzhe Xu, Shaojin Liu, Yonghui Wang

To capture a speaker's pronunciation characteristics, a bionics-inspired method is proposed that uses spectrogram statistics to build a characteristic spectrogram: a linear superposition of short-time spectrograms that gives a stable representation of the speaker's pronunciation. To address slow network training and recognition in speaker recognition systems on resource-constrained devices, an adaptive clustering self-organizing feature map (AC-SOM) algorithm is proposed, based on the traditional SOM neural network. The algorithm automatically adjusts the number of neurons in the competition layer according to the number of speakers to be recognized, until the number of clusters matches the number of speakers. A 100-speaker database of characteristic spectrogram samples was built and applied to the proposed AC-SOM model, yielding a maximum training time of only 304 s and a maximum sample recognition time of less than 28 ms. Compared with other approaches, the proposed method offers greatly improved training and recognition speed without sacrificing too much recognition accuracy. These results suggest that the proposed method satisfies the real-time data processing and execution requirements of edge intelligence systems better than other speaker recognition methods.
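The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact statistics or the AC-SOM update rule, so the sketch assumes that the "linear superposition" is an element-wise mean of equally sized short-time spectrograms, and it replaces the SOM with a plain winner-take-all competitive layer (no neighborhood function) that only grows. The function names, the parameters `epochs`, `lr0`, and `max_neurons`, and the toy data are all hypothetical.

```python
import math


def characteristic_spectrogram(spectrograms):
    """Element-wise mean of equally sized short-time spectrograms.

    Assumption: the superposition is an unweighted average.
    """
    n = len(spectrograms)
    rows, cols = len(spectrograms[0]), len(spectrograms[0][0])
    return [[sum(s[r][c] for s in spectrograms) / n for c in range(cols)]
            for r in range(rows)]


def nearest(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_competitive_layer(samples, n_neurons, epochs=20, lr0=0.5):
    """Winner-take-all training; a simplified stand-in for SOM training."""
    # Deterministic initialization: evenly spaced picks from the samples.
    weights = [list(samples[i * len(samples) // n_neurons])
               for i in range(n_neurons)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)  # linearly decaying learning rate
        for x in samples:
            j = nearest(weights, x)
            # Move only the winning neuron toward the sample.
            weights[j] = [w + lr * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def count_clusters(weights, samples):
    """Number of neurons that win at least one sample."""
    return len({nearest(weights, x) for x in samples})


def fit_adaptive(samples, n_speakers, max_neurons=None):
    """Grow the competitive layer until clusters == speakers.

    This sketch only adds neurons; it stops at max_neurons if no match
    is found.
    """
    max_neurons = max_neurons or len(samples)
    n_neurons = 1
    while True:
        weights = train_competitive_layer(samples, n_neurons)
        n_clusters = count_clusters(weights, samples)
        if n_clusters == n_speakers or n_neurons >= max_neurons:
            return weights, n_clusters
        n_neurons += 1


# Toy feature vectors for three hypothetical speakers (not data from the paper).
samples = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],      # speaker A
    [10.0, 0.0], [10.2, 0.1], [10.1, 0.3], [10.3, 0.2],  # speaker B
    [0.0, 10.0], [0.2, 10.1], [0.1, 10.3], [0.3, 10.2],  # speaker C
]
weights, n_clusters = fit_adaptive(samples, n_speakers=3)
print(len(weights), n_clusters)
```

In a real pipeline, each sample would be a flattened characteristic spectrogram rather than a two-dimensional toy vector. A closer SOM would also update the winner's neighbors on a neuron grid.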




Updated: 2020-06-29