Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 5.4), Pub Date: 2008-01-01, DOI: 10.1109/tasl.2008.917071
Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan

In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supertags and a quantized acoustic-prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The proposed model is trained discriminatively and is robust in the selection of appropriate features for the task of prosody detection. The proposed maximum entropy acoustic-syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and 79.8% and 90.3% on the Boston Directions corpus. Phrase structure detection through prosodic break index labeling achieves accuracies of 84% and 87% on the two corpora, respectively. The reported results are significantly better than previously reported results and demonstrate the strength of the maximum entropy model in jointly modeling simple lexical, syntactic, and acoustic features for automatic prosody labeling.
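To make the modeling idea concrete, the following is a minimal sketch of a maximum entropy (multinomial logistic) classifier that labels each word as pitch-accented or not from indicator features. It is an illustration under stated assumptions, not the authors' implementation: the feature names, the supertag labels, the four equal-width bins for the acoustic value, the toy data, and the plain gradient-ascent training loop are all hypothetical choices made here for brevity.

```python
import math
from collections import defaultdict

LABELS = ["accent", "none"]

def quantize(value, lo, hi, bins):
    """Map a continuous acoustic value to one of `bins` equal-width bin indices."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)

def featurize(token):
    """Turn one word into indicator-feature names (all names are hypothetical)."""
    return [
        "bias",
        "word=" + token["word"],
        "pos=" + token["pos"],
        "supertag=" + token["supertag"],
        "f0_bin=%d" % quantize(token["f0_range"], 0.0, 100.0, 4),
    ]

def predict_proba(weights, feats):
    """Maxent conditional distribution: p(y | x) is proportional to exp(sum of weights)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in LABELS]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(data, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            probs = predict_proba(weights, feats)
            for y, p in zip(LABELS, probs):
                # Observed feature count minus expected count under the model.
                delta = (1.0 if y == gold else 0.0) - p
                for f in feats:
                    grad[(f, y)] += delta
        for key, g in grad.items():
            weights[key] += lr * (g / len(data) - l2 * weights[key])
    return weights

def predict(weights, token):
    """Return the most probable label for one token."""
    probs = predict_proba(weights, featurize(token))
    return LABELS[probs.index(max(probs))]

# Toy, made-up training examples: (token attributes, gold label).
TOY = [
    ({"word": "boston", "pos": "NNP", "supertag": "tag_noun", "f0_range": 80.0}, "accent"),
    ({"word": "the", "pos": "DT", "supertag": "tag_det", "f0_range": 10.0}, "none"),
    ({"word": "news", "pos": "NN", "supertag": "tag_noun", "f0_range": 65.0}, "accent"),
    ({"word": "of", "pos": "IN", "supertag": "tag_prep", "f0_range": 15.0}, "none"),
]

weights = train([(featurize(tok), gold) for tok, gold in TOY])
predictions = [predict(weights, tok) for tok, _ in TOY]
```

Because lexical, syntactic, and quantized acoustic cues all enter as indicator features of the same log-linear model, adding or removing a feature class only changes `featurize`; the training and prediction code stays the same.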

Updated: 2019-11-01