Glottal features for classification of phonation type from speech and neck surface accelerometer signals
Computer Speech & Language (IF 4.3), Pub Date: 2021-04-27, DOI: 10.1016/j.csl.2021.101232
Sudarsana Reddy Kadiri, Paavo Alku

Glottal source characteristics vary between phonation types owing to the tension of the laryngeal muscles together with the respiratory effort. Previous studies on the classification of phonation type have mainly used speech signals recorded by microphone. Recently, two studies were published on the classification of phonation type using neck surface accelerometer (NSA) signals. However, no previous study has compared the acoustic speech signal with the NSA signal as input for classifying phonation type. The current study therefore investigates simultaneously recorded speech and NSA signals in the classification of three phonation types (breathy, modal, pressed). The general goal is to understand which of the two signals (speech vs. NSA) is more effective in the classification task. We hypothesize that, when the same feature set is used for both signals, classification accuracy is higher for the NSA signal, which is more closely related to the physical vibration of the vocal folds and less affected by the vocal tract than the acoustic speech signal. Glottal source waveforms were computed using two signal processing methods, quasi-closed phase (QCP) glottal inverse filtering and zero frequency filtering (ZFF), and a group of time-domain and frequency-domain scalar features was computed from the obtained waveforms. In addition, the study investigated the use of mel-frequency cepstral coefficients (MFCCs) derived from the glottal source waveforms computed by QCP and ZFF. Classification experiments with support vector machine classifiers revealed that the NSA signal discriminated the phonation types better than the speech signal when the same feature set was used. Furthermore, the glottal features were observed to carry information complementary to the conventional MFCC features, and combining the two gave the best classification accuracy for both the NSA signal (86.9%) and the speech signal (80.6%).
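
For readers unfamiliar with zero frequency filtering, the following is a minimal sketch of the textbook ZFF procedure: difference the signal, pass it through two cascaded zero-frequency resonators, subtract a local mean to remove the trend, and read glottal epochs off the zero crossings. It is not the authors' implementation. The function names (`remove_trend`, `zff_epochs`), the 1.5-pitch-period window, the three trend-removal passes, and the synthetic negative-impulse excitation are all assumptions made for illustration, and the crossing polarity depends on the polarity of the input signal.

```python
from itertools import accumulate


def remove_trend(signal, half_window):
    """Subtract a centred moving average of 2*half_window + 1 samples.

    Samples beyond the signal edges are treated as zero, so the output is
    unreliable within half_window samples of either end.
    """
    n = len(signal)
    width = 2 * half_window + 1
    prefix = [0.0, *accumulate(signal)]  # prefix[i] == sum(signal[:i])
    detrended = []
    for i, value in enumerate(signal):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        detrended.append(value - (prefix[hi] - prefix[lo]) / width)
    return detrended


def zff_epochs(signal, half_window, passes=3):
    """Return indices of negative-to-positive zero crossings of the ZFF signal."""
    # 1. First-order difference (assumes the sample before the start is zero).
    out = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]
    # 2. Two cascaded resonators y[n] = 2y[n-1] - y[n-2] + x[n] with zero
    #    initial state are equivalent to four running sums.
    for _ in range(4):
        out = list(accumulate(out))
    # 3. Remove the polynomial trend by repeated local-mean subtraction.
    for _ in range(passes):
        out = remove_trend(out, half_window)
    # 4. Epoch candidates: samples where the signal crosses zero upwards.
    return [i for i in range(1, len(out)) if out[i - 1] < 0 <= out[i]]


# Hypothetical test signal: one second of negative unit impulses at 100 Hz,
# sampled at 8 kHz, standing in for the excitation at glottal closure.
fs, f0 = 8000, 100
period = fs // f0
excitation = [-1.0 if n % period == 0 else 0.0 for n in range(fs)]
epochs = zff_epochs(excitation, half_window=int(0.75 * period))
```

Away from the signal edges, the detected epochs in this synthetic case fall one pitch period apart and within a couple of samples of the impulses. On recorded speech or NSA signals the window length would normally be set from an estimate of the average pitch period.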




Updated: 2021-05-09