An Acoustic Measure for Word Prominence in Spontaneous Speech
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 5.4), Pub Date: 2007-02-01, DOI: 10.1109/tasl.2006.881703
Dagen Wang, Shrikanth Narayanan

An algorithm for automatic speech prominence detection is reported in this paper. We present a comparative analysis of various acoustic features for word prominence detection and report results on a spoken dialog corpus with manually assigned prominence labels. The focus is on features, such as spectral intensity and speech rate, that are extracted directly from the speech signal using a correlation-based approach, without requiring explicit linguistic or phonetic knowledge. In addition, various pitch-based measures are examined for their ability to discriminate prominence. A parametric scheme for modeling pitch plateaus is proposed, and this feature alone is found to outperform traditional local pitch statistics. Two sets of experiments explore the usefulness of the acoustic score generated from these features. The first set follows the more traditional approach of detecting word prominence on a manually tagged corpus; a classification accuracy of 76.8% was achieved on a corpus of role-playing spoken dialogs. Because manually tagging speech prominence into discrete levels (categories) is difficult, the second set of experiments evaluates the score indirectly. Specifically, experiments on the Switchboard corpus show that the proposed acoustic score discriminates between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be marked automatically from part-of-speech (POS) information derived from text, the proposed acoustic score can be indirectly cross-validated through POS information.
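The indirect evaluation rests on one check: do per-word acoustic scores differ between content words and function words, with the split taken from POS tags? The abstract does not name the statistical test used, so the sketch below assumes Welch's two-sample t statistic. The content-tag set (Penn Treebank prefixes NN, VB, JJ, RB) and the helper names `is_content`, `welch_t`, and `compare_scores` are hypothetical, and the per-word scores are assumed to come from some prominence scorer that is not reproduced here. This is an illustration of the evaluation idea, not the authors' procedure.

```python
import math
import statistics

# Hypothetical content-word tag prefixes (Penn Treebank style); an assumption
# for illustration, not the content/function split used in the paper.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_content(pos_tag: str) -> bool:
    """Return True if the POS tag is treated as a content-word tag."""
    return pos_tag.startswith(CONTENT_PREFIXES)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Both samples need at least two values (sample variance is used).
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def compare_scores(words):
    """Split (score, pos_tag) pairs into content/function groups and compare them."""
    content = [score for score, tag in words if is_content(tag)]
    function = [score for score, tag in words if not is_content(tag)]
    return welch_t(content, function)


if __name__ == "__main__":
    # Invented toy scores, not data from the paper.
    toy = [(0.82, "NN"), (0.75, "VBD"), (0.91, "JJ"), (0.68, "NNS"),
           (0.21, "DT"), (0.30, "IN"), (0.25, "PRP"), (0.18, "CC")]
    t, df = compare_scores(toy)
    print(f"t = {t:.2f}, df = {df:.2f}")
```

A large positive t means content words receive higher scores on average; a p-value can then be read from the t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(content, function, equal_var=False)`. Welch's variant is chosen here because the two word classes need not share the same score variance.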

Last updated: 2019-11-01