Analysis and classification of phonation types in speech and singing voice, Speech Communication



Analysis and classification of phonation types in speech and singing voice
Speech Communication (IF 3.2), Pub Date: 2020-02-25, DOI: 10.1016/j.specom.2020.02.004
Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana

In both speech and singing, humans can produce sounds of different phonation types (e.g., breathy, modal and pressed). Previous studies on the analysis and classification of phonation types have mainly used voice source features derived with glottal inverse filtering (GIF). Although glottal source features are useful for discriminating phonation types in speech, their performance deteriorates in singing voice because the high fundamental frequency of sung sounds reduces the accuracy of source-filter separation in GIF. In the present study, features describing the glottal source were computed using three signal processing methods that do not perform source-filter separation: zero frequency filtering (ZFF), zero time windowing (ZTW) and single frequency filtering (SFF). From each method, a group of scalar features was extracted. In addition, cepstral coefficients were derived from the spectra computed using ZTW and SFF. Experiments were conducted with the proposed features to analyse and classify three phonation types (breathy, modal and pressed) in speech and singing voice. Statistical pair-wise comparisons between the phonation types showed that most of the features separated the phonation types significantly for both speech and singing voice. Classification with support vector machine classifiers indicated that the proposed features and their combinations gave improved accuracy compared to the commonly used glottal source features and mel-frequency cepstral coefficients (MFCCs).
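The abstract names zero frequency filtering (ZFF) without describing it. As a rough illustration only, the sketch below follows a formulation of ZFF that is commonly described in the literature: difference the signal, pass it through two cascaded zero-frequency resonators, then remove the resulting polynomial trend by repeatedly subtracting a local mean. It is not the authors' implementation. The function names, the window of about 1.5 pitch periods, the three trend-removal passes and the synthetic impulse-train input are all assumptions made for this example.

```python
def _subtract_local_mean(x, half):
    """Subtract the mean over a centred window of up to 2*half+1 samples."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half)          # window is truncated at the edges
        hi = min(n, i + half + 1)
        out.append(x[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, avg_pitch_period_s=0.01, passes=3):
    """Return a ZFF-style signal (hypothetical helper, sketch only)."""
    # Step 1: first-order difference to remove any constant offset.
    diffed, prev = [], 0.0
    for s in signal:
        diffed.append(s - prev)
        prev = s

    # Step 2: two cascaded resonators at 0 Hz, each y[n] = x[n] + 2y[n-1] - y[n-2].
    y = diffed
    for _ in range(2):
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            cur = v + 2.0 * y1 - y2
            out.append(cur)
            y2, y1 = y1, cur
        y = out

    # Step 3: remove the polynomial trend with a window of ~1.5 pitch periods
    # (assumed value), applied several times.
    half = int(round(0.75 * avg_pitch_period_s * fs))
    for _ in range(passes):
        y = _subtract_local_mean(y, half)
    return y


def positive_zero_crossings(z):
    """Indices where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(z)) if z[n - 1] < 0.0 <= z[n]]


# Synthetic excitation: negative impulses every 80 samples (100 Hz at 8 kHz).
fs = 8000
period = 80
excitation = [-1.0 if n % period == 40 else 0.0 for n in range(1600)]
zff = zero_frequency_filter(excitation, fs, avg_pitch_period_s=period / fs)
epochs = positive_zero_crossings(zff)
```

Away from the signal edges, the positive-going zero crossings in `epochs` recur once per excitation period and fall close to the impulse locations. The result depends on signal polarity: with positive impulses, the negative-going crossings would mark the impulses instead. Real speech or singing would also need an estimate of the average pitch period to set the window length.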


Updated: 2020-02-25
