NHSS: A speech and singing parallel database

Speech Communication (IF 3.2), Pub Date: 2021-07-12, DOI: 10.1016/j.specom.2021.07.002
Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li

We present a database of parallel speech and singing recordings, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak–Sing (NHSS) database. We release this database to the public to support research activities including, but not limited to, comparative studies of the acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. The database consists of recordings of sung vocals of English pop songs, the spoken counterparts of the lyrics read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, each singing and reading the lyrics of 10 songs. In this paper, we discuss the design methodology of the database, analyze the similarities and dissimilarities between the characteristics of speech and singing voices, and provide some strategies for relating these characteristics when converting one to the other. We develop benchmark systems, which can serve as references for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.

Updated: 2021-07-24
