Improved Speech-Signal Based Frequency Warping Scale for Cepstral Feature in Robust Speaker Verification System
Journal of Signal Processing Systems (IF 1.8), Pub Date: 2020-03-11, DOI: 10.1007/s11265-020-01517-2
Susanta Kumar Sarangi, Goutam Saha

Developing an automatic speaker verification (ASV) system for real-world applications remains a major challenge. In this paper, we propose an improved speech-signal-based frequency warping scale for extracting cepstral features from the speech signal for ASV. The proposed scale is a modified version of the speech-signal-based scale that has been used successfully in speech recognition, an allied domain. It uses the spectral-entropy-weighted power spectral density to extract speaker-specific attributes. Because it places different emphasis on spectral regions, the resulting feature is complementary to the fixed-scale mel-frequency cepstral coefficient (MFCC). The work uses a fusion-based approach to exploit the complementarity of the static MFCC and the proposed feature. The performance of ASV systems that use MFCC and the proposed technique is evaluated in clean and various noisy conditions on publicly available NIST SRE databases. A noise database (NOISEX-92) is used to simulate the noisy environment. The ASV system built on the proposed feature extraction method performs slightly better than the baseline MFCC and SFCC (speech-signal-based frequency cepstral coefficient) techniques in clean conditions, and improves on them by up to 38.15% and 17.15%, respectively, in noisy conditions. The fusion-based approach further improves ASV performance, with up to 53.85% and 36.22% relative improvement over the baseline MFCC and SFCC feature extraction methods, respectively.
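To make the notion of spectral entropy concrete, the following is a minimal Python sketch of the standard Shannon entropy of a single frame's normalized power spectrum. It is not the authors' weighting scheme or warping scale, which the abstract does not specify. The function name `spectral_entropy`, the Hamming window, and the FFT size of 512 are assumptions chosen for illustration.

```python
import numpy as np


def spectral_entropy(frame, n_fft=512):
    """Shannon entropy (in bits) of one frame's normalized power spectrum.

    Illustrative only: the window type and n_fft are assumptions,
    not parameters taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # Taper the frame to reduce spectral leakage before the FFT.
    windowed = frame * np.hamming(len(frame))
    # One-sided power spectrum of the frame.
    psd = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    # Normalize so the bins sum to (approximately) 1, like a probability mass.
    p = psd / (psd.sum() + 1e-12)
    # Drop empty bins so that log2 is well defined.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

A frame whose energy is concentrated in a few bins (for example, a pure tone) yields a low entropy, while a noise-like frame with a flat spectrum yields a value close to the maximum, log2 of the number of bins. This contrast is what makes entropy a plausible weight for emphasizing informative spectral regions.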




Updated: 2020-04-18