Prominence Detection Using Auditory Attention Cues and Task-Dependent High Level Information.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 5.4), Pub Date: 2009-07-01, DOI: 10.1109/tasl.2009.2014795
Ozlem Kalinli, Shrikanth Narayanan

Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.
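
The abstract says only that lexical and syntactic information is incorporated "using a probabilistic approach"; it does not give the fusion rule. As a minimal sketch of one way such a combination could look, the snippet below fuses per-source posteriors for the class "prominent" under a naive conditional-independence assumption. The function name `combine_posteriors`, the independence assumption, and all numeric values are illustrative assumptions, not the authors' method or results.

```python
import math


def combine_posteriors(posteriors, prior, eps=1e-9):
    """Fuse P(prominent | source_i) values into a single posterior.

    ASSUMPTION (not from the paper): the sources (e.g. acoustic, lexical,
    syntactic) are conditionally independent given the class, so that
    odds(c | all) = prod(odds(c | source_i)) / odds(c) ** (n - 1).
    """
    def logit(p):
        # Clip to avoid log(0) or division by zero at p = 0 or p = 1.
        p = min(max(p, eps), 1.0 - eps)
        return math.log(p / (1.0 - p))

    n = len(posteriors)
    # Sum the per-source log-odds, then remove the prior counted n - 1 extra times.
    z = sum(logit(p) for p in posteriors) - (n - 1) * logit(prior)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical per-syllable scores, invented for illustration only.
p_acoustic, p_lexical, p_syntactic = 0.8, 0.7, 0.6
p_combined = combine_posteriors([p_acoustic, p_lexical, p_syntactic], prior=0.5)
is_prominent = p_combined > 0.5
```

Working in log-odds keeps the product of several probabilities numerically stable, and subtracting the prior term prevents it from being counted once per source.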

Updated: 2019-11-01