A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification
Expert Systems (IF 3.0), Pub Date: 2020-01-30, DOI: 10.1111/exsy.12496
Mohammad Azharuddin Laskar 1 , Rabul Hussain Laskar 1

In the i‐vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i‐vectors. PLDA defines an i‐vector subspace that compensates for unwanted variability and helps to discriminate among speaker‐phrase pairs. The channel or session variability manifested in i‐vectors is known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, which causes a loss of important discriminating information. Moreover, i‐vector estimation itself is known to be poor for short utterances. This paper attempts to address these issues using a simple hierarchy‐based system. A modified fuzzy‐clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. A separate i‐vector/PLDA model is then trained for each subspace. The sparser alignment owing to the subspace‐specific universal background model, together with the relatively reduced dimensions of variability in individual subspaces, helps to train more effective i‐vector/PLDA models. In addition, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i‐vectors using a mixture‐model technique. As a consequence, vocal source features and i‐vectors tend to carry complementary information. Using vocal source features for classification in a hierarchy tree may therefore help to differentiate some speaker‐phrase classes that are not easily discriminable on the basis of i‐vectors alone. The proposed technique has been validated on Part 1 of the RSR2015 database, where it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i‐vector/PLDA system.
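
The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' modified fuzzy‐clustering technique, so the code below substitutes standard fuzzy c‐means as a stand‐in. All function names, the fuzzifier value m = 2 and the toy 2‐D "vocal source features" are assumptions made for illustration only. The subspace‐specific i‐vector/PLDA models are not implemented; the sketch only shows how a soft clustering of source features could select which subspace model scores a trial, and how a relative equal error rate (EER) reduction is computed.

```python
import math
import random


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means memberships (not the paper's modified variant).

    Returns one row per point; each row sums to 1 across clusters.
    """
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point sits exactly on a centre: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
            continue
        inv = [d ** -power for d in dists]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each centre as the membership^m weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Alternate membership and centre updates for a fixed number of steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_update_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)


def route_to_subspace(source_feature, centers, m=2.0):
    """Hypothetical routing rule: pick the subspace with the highest membership.

    In a hierarchical system, the returned index would select which
    subspace-specific i-vector/PLDA model handles the utterance.
    """
    row = fcm_memberships([source_feature], centers, m)[0]
    return max(range(len(row)), key=row.__getitem__)


def relative_eer_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer
```

With this definition, a figure such as "up to 37.41%" is a reduction relative to the baseline EER, not a difference in absolute percentage points. The hard arg‐max routing is only one possible design; a soft variant could instead weight the scores of several subspace models by their memberships.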

Updated: 2020-01-30