Speaker discrimination: Citation tones vs. coarticulated tones
Speech Communication (IF 2.4), Pub Date: 2020-02-21, DOI: 10.1016/j.specom.2019.06.006
Ricky KW Chan

The task of forensic voice comparison (FVC) often involves comparing a voice in an offender recording with that in a suspect recording, with the aim of assisting the investigating authority or the court in determining the identity of the speaker. One of the main goals of FVC research is to identify speech variables that are useful for differentiating speakers. While French and Stevens (2013) stated that connected speech processes (CSPs) vary across speakers and may therefore be included in the ‘toolbox’ for forensic voice comparison casework, little empirical research has tested how effective various CSPs are in speaker discrimination. This paper reports an exploratory study comparing the speaker-discriminatory power of lexical tones in their citation forms with that of coarticulated tones. Twenty Cantonese and twenty Mandarin speakers were instructed to produce tones at different speech rates and in different tonal contexts. Results based on discriminant analysis show that the combination of normal speech rate and a compatible tonal context appears to have yielded the best speaker discrimination. On the other hand, the combination of fast speech and a conflicting tonal context, which in principle led to the greatest tonal coarticulatory effects, yielded the worst speaker discrimination. Adding duration to tonal f0 significantly improved the classification rates in both languages. Furthermore, for the same tone categories, the Mandarin tones generally discriminate speakers better than their Cantonese counterparts, suggesting that tone inventory density affects the speaker-discriminatory power of tones. Implications of the findings for forensic speaker comparison are discussed.




Updated: 2020-02-21