Speaker clustering quality estimation with logistic regression
Computer Speech & Language (IF 3.1), Pub Date: 2020-08-07, DOI: 10.1016/j.csl.2020.101139
Yishai Cohen, Itshak Lapidot

This paper focuses on estimating the quality of a clustering process. The task is to cluster short speech segments that belong to different speakers. A variety of statistical parameters are estimated from the output of the clustering process. These parameters are used to train a logistic regression to serve as a clustering quality estimation system. In this paper, mean-shift clustering with either a cosine distance or probabilistic linear discriminant analysis (PLDA) score as the similarity measure, as well as stochastic vector quantization (VQ) with cosine distance, are applied in order to cluster the short speaker segments, which are represented by i-vectors. The quality of the clustering is measured using the average cluster purity (ACP), average speaker purity (ASP) and K, which is the geometric mean of ASP and ACP. We show that these measures can be estimated fairly well by applying logistic regression. Moreover, clustering quality may be well estimated even if the logistic regression was trained using parameters derived from a different clustering algorithm. This is very important, as it allows the use of a single quality estimation system, without the need for retraining when the clustering method is changed.
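As an illustration of the quality measures named above, the following is a minimal sketch of how ACP, ASP and K might be computed when the true speaker labels are available. The abstract does not state the formulas, so this assumes the definition commonly used in the speaker clustering literature, where n_ij counts the segments of speaker j assigned to cluster i; the paper's exact definition may differ. The function name `purity_scores` is hypothetical.

```python
from collections import Counter
from math import sqrt


def purity_scores(cluster_labels, speaker_labels):
    """Return (ACP, ASP, K) for paired cluster and speaker labels.

    Assumed definition (not taken from the abstract):
      ACP = (1/N) * sum_ij n_ij^2 / n_i.   (n_i. = size of cluster i)
      ASP = (1/N) * sum_ij n_ij^2 / n_.j   (n_.j = segments of speaker j)
      K   = sqrt(ACP * ASP)
    """
    if len(cluster_labels) != len(speaker_labels) or not cluster_labels:
        raise ValueError("label lists must be non-empty and of equal length")

    n_total = len(cluster_labels)
    joint = Counter(zip(cluster_labels, speaker_labels))  # n_ij
    per_cluster = Counter(cluster_labels)                 # n_i.
    per_speaker = Counter(speaker_labels)                 # n_.j

    acp = sum(n * n / per_cluster[c] for (c, _), n in joint.items()) / n_total
    asp = sum(n * n / per_speaker[s] for (_, s), n in joint.items()) / n_total
    return acp, asp, sqrt(acp * asp)


# Two speakers merged into a single cluster: each speaker is kept whole
# (high ASP) but the cluster is impure (low ACP).
acp, asp, k = purity_scores([0, 0, 0, 0], ["a", "a", "b", "b"])
```

These are the ground-truth values that the logistic regression described in the abstract is trained to estimate from clustering statistics alone, without access to the speaker labels.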

Additionally, we show how the clustering quality estimator can serve as an estimator of the number of clusters. For VQ-based clustering, the number of clusters has to be predefined. We perform the clustering with different numbers of clusters, and the best number of clusters is taken to be the one whose clustering achieves the highest estimated K value. We show that this approach estimates the best number of clusters accurately.
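The selection procedure described in this paragraph can be sketched as a simple search over candidate cluster counts. This is only an illustration under stated assumptions: `cluster_fn` and `estimate_k` are hypothetical callables standing in for the VQ clustering and the trained quality estimator, neither of which is specified in the abstract.

```python
def select_num_clusters(candidates, cluster_fn, estimate_k):
    """Pick the cluster count whose clustering has the highest estimated K.

    candidates -- iterable of candidate numbers of clusters
    cluster_fn -- hypothetical callable: number of clusters -> cluster labels
    estimate_k -- hypothetical callable: cluster labels -> estimated K
    Returns (best_count, best_estimated_k, best_labels).
    """
    best = None
    for n_clusters in candidates:
        labels = cluster_fn(n_clusters)
        score = estimate_k(labels)
        # Keep the first candidate that reaches the highest estimate.
        if best is None or score > best[1]:
            best = (n_clusters, score, labels)
    if best is None:
        raise ValueError("candidates must not be empty")
    return best


# Toy stand-ins, purely to show the calling convention: the "estimator"
# below peaks at three clusters by construction.
toy_cluster = lambda n: [i % n for i in range(6)]
toy_estimate = lambda labels: 1.0 - abs(len(set(labels)) - 3) / 10
best_n, best_score, _ = select_num_clusters([2, 3, 4], toy_cluster, toy_estimate)
```

In practice the estimate would come from the trained logistic regression applied to statistics of each clustering result, so no ground-truth speaker labels are needed at selection time.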




Updated: 2020-08-14