Integrating lexical and prosodic features for automatic paragraph segmentation, Speech Communication



Speech Communication (IF 3.2), Pub Date: 2020-05-11, DOI: 10.1016/j.specom.2020.04.007
Catherine Lai, Mireia Farrús, Johanna D. Moore

Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step to understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. To examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams rather than treating them as independent information sources. Application to ASR outputs shows that adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to transcription errors.
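The late fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the architecture, so the layer sizes, the tanh hidden layer, the random initialisation, and all function names below (`init_params`, `late_fusion_score`, `independent_score`) are assumptions made for illustration only. The sketch contrasts a joint layer over concatenated lexical and prosodic representations, which lets the two streams interact, with a baseline that simply averages the two streams' boundary probabilities as if they were independent.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical random initialisation of the fusion layer (untrained)."""
    rng = random.Random(seed)
    return {
        "W_h": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)],
        "b_h": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def late_fusion_score(lex_repr, pros_repr, params):
    """Probability of a paragraph boundary from two stream representations.

    lex_repr and pros_repr stand in for vectors produced by separate
    lexical and prosodic models (assumed here, e.g. final hidden states).
    """
    fused = list(lex_repr) + list(pros_repr)  # concatenate the two streams
    # The shared hidden layer sees both streams at once, so its units can
    # model interactions between lexical and prosodic evidence.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, fused)) + b)
        for row, b in zip(params["W_h"], params["b_h"])
    ]
    logit = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return sigmoid(logit)


def independent_score(p_lex, p_pros):
    """Baseline: average per-stream probabilities, with no interaction."""
    return 0.5 * (p_lex + p_pros)


if __name__ == "__main__":
    params = init_params(n_in=4, n_hidden=3, seed=0)
    print(late_fusion_score([0.1, -0.2], [0.3, 0.4], params))
    print(independent_score(0.25, 0.75))
```

In a real system the fusion weights would be trained jointly on labelled paragraph boundaries; here they are random, so the printed score only shows the data flow.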


Updated: 2020-05-11
