An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means, Mathematical Problems in Engineering


An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K-Means
Mathematical Problems in Engineering Pub Date : 2020-09-12 , DOI: 10.1155/2020/3608286
Nan Jiang, Ting Liu


This paper studies the segmentation and clustering of speaker speech. To improve the accuracy of speech endpoint detection, the short-time average zero-crossing rate used in the traditional double-threshold method is replaced by the spectral centroid feature, and the thresholds are selected from the local maxima of the histogram of the feature sequence, yielding a new speech endpoint detection algorithm. Compared with the traditional double-threshold algorithm, it improves detection accuracy and noise robustness at low SNR. The conventional k-means clustering algorithm requires the number of clusters to be specified in advance and is strongly affected by the choice of initial cluster centers, while the self-organizing map (SOM) neural network converges slowly and cannot provide accurate clustering information. To address both issues, an improved k-means speaker clustering algorithm based on the SOM is proposed: the number of clusters is predicted from which competitive neurons win in the trained network, and the weights of the neurons are used as the initial cluster centers for the k-means algorithm. Experiments on segmenting mixed speech from multiple speakers show that the proposed algorithm improves the accuracy of speech clustering and compensates for the shortcomings of both the k-means and SOM algorithms.
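The SOM-then-k-means idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D neuron chain, the learning-rate and neighborhood schedules, the `min_share` win threshold, and the synthetic 2-D points (standing in for per-segment speaker features) are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, units):
    """Index of the unit (neuron weight or cluster center) closest to point."""
    return min(range(len(units)), key=lambda i: sq_dist(point, units[i]))

def train_som(data, n_units=6, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D chain of competitive neurons (a very small SOM)."""
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(data, n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        lr = lr0 * frac
        radius = int((n_units // 2) * frac)    # neighborhood shrinks over time
        for p in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = nearest(p, weights)          # best-matching (winning) neuron
            for j in range(n_units):
                d = abs(j - bmu)
                if d > radius:
                    continue
                h = 1.0 if d == 0 else 0.5     # neighbors move half as far
                weights[j] = [w + lr * h * (x - w)
                              for w, x in zip(weights[j], p)]
    return weights

def select_winning_units(data, weights, min_share=0.05):
    """Keep neurons that win at least min_share of the samples (assumed rule)."""
    wins = [0] * len(weights)
    for p in data:
        wins[nearest(p, weights)] += 1
    # Cap the threshold so the top-winning neuron is always kept.
    threshold = min(min_share * len(data), max(wins))
    return [w for w, c in zip(weights, wins) if c >= threshold]

def kmeans(data, init_centers, iters=20):
    """Lloyd's k-means starting from the given initial centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in data]
        for k in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centers) for p in data]
    return centers, labels

def make_demo_data(seed=1, per_cluster=40):
    """Synthetic 2-D points around three centers (stand-in for real features)."""
    rng = random.Random(seed)
    true_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in true_centers for _ in range(per_cluster)]

data = make_demo_data()
som_weights = train_som(data)
init_centers = select_winning_units(data, som_weights)  # k = len(init_centers)
centers, labels = kmeans(data, init_centers)
print("estimated number of clusters:", len(centers))
```

Note that with more neurons than true clusters, several neurons may win inside the same cluster, so the estimated count depends on the map size and on the win threshold; the abstract does not say how the paper resolves this.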


Updated: 2020-09-12
