A Spanish multispeaker database of esophageal speech
Computer Speech & Language (IF 4.3), Pub Date: 2020-11-05, DOI: 10.1016/j.csl.2020.101168
Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez, Ibon Saratxaga

A laryngectomee is a person whose larynx has been surgically removed, usually because of laryngeal cancer. After surgery, most laryngectomees are able to speak again using techniques learned with the help of a speech therapist. The resulting speech is termed alaryngeal speech, and esophageal speech (ES) is one of several alaryngeal speech production modes. A considerable amount of research has been dedicated to alaryngeal speech, with aims ranging from helping speech therapists with evaluation and diagnosis to improving its quality and intelligibility using digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, designed to support the development of new assistive technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. It also includes parallel recordings of the same sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, and the labeling process. The main acoustic characteristics of the voices, such as speaking rate and the durations of recordings, phones and silences, are compared with those of a reduced set of healthy voices. In addition, we describe an experiment that uses the database to improve the performance of an automatic speech recognition (ASR) system for ES speakers. This new resource will be made available to the scientific community in the hope that it will be used to improve the quality of life of laryngectomees.




Updated: 2020-11-13