Unified Mandarin TTS Front-end Based on Distilled BERT Model
arXiv - CS - Sound. Pub Date: 2020-12-31, DOI: arxiv-2012.15404
Yang Zhang, Liqun Deng, Yasheng Wang

The front-end module in a typical Mandarin text-to-speech (TTS) system is composed of a long pipeline of text processing components, which requires extensive effort to build and is prone to a large accumulated model size and cascading errors. In this paper, we propose a model based on a pre-trained language model (PLM) that simultaneously tackles the two most important tasks in the TTS front-end, i.e., prosodic structure prediction (PSP) and grapheme-to-phoneme (G2P) conversion. We use a pre-trained Chinese BERT [1] as the text encoder and employ a multi-task learning technique to adapt it to the two TTS front-end tasks. The BERT encoder is then distilled into a smaller model with a knowledge distillation technique called TinyBERT [2], reducing the whole model to 25% of the size of the benchmark pipeline models while maintaining competitive performance on both tasks. With the proposed methods, we are able to run the whole TTS front-end module in a light and unified manner, which makes it more suitable for deployment on mobile devices.
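The sketch below illustrates the kind of multi-task architecture the abstract describes: a shared pre-trained Chinese BERT encoder with two token-level heads, one for prosodic structure prediction and one for G2P (polyphone) classification, trained with a summed cross-entropy loss. This is not the authors' code; the checkpoint name, label set sizes, head design, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class UnifiedTTSFrontend(nn.Module):
    """Shared BERT encoder with two task heads (PSP and G2P) - illustrative sketch."""
    def __init__(self, encoder_name="bert-base-chinese",
                 num_psp_labels=4, num_pinyin_labels=1500):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)   # shared text encoder
        hidden = self.encoder.config.hidden_size
        self.psp_head = nn.Linear(hidden, num_psp_labels)        # prosodic boundary tag per token
        self.g2p_head = nn.Linear(hidden, num_pinyin_labels)     # pinyin class per character

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
        return self.psp_head(hidden_states), self.g2p_head(hidden_states)

def training_step(model, batch, psp_weight=1.0, g2p_weight=1.0):
    """Multi-task step: sum token-level cross-entropy losses for the two tasks."""
    psp_logits, g2p_logits = model(batch["input_ids"], batch["attention_mask"])
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding / non-polyphonic positions
    loss_psp = ce(psp_logits.transpose(1, 2), batch["psp_labels"])
    loss_g2p = ce(g2p_logits.transpose(1, 2), batch["g2p_labels"])
    return psp_weight * loss_psp + g2p_weight * loss_g2p
```

In the paper's setting, the trained encoder would subsequently be compressed with TinyBERT-style layer-wise distillation into a smaller student encoder, while the task heads and multi-task setup stay the same.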

Updated: 2021-01-01