Unified Mandarin TTS Front-end Based on Distilled BERT Model
arXiv - CS - Sound. Pub Date: 2020-12-31, DOI: arxiv-2012.15404
Yang Zhang, Liqun Deng, Yasheng Wang

The front-end module in a typical Mandarin text-to-speech (TTS) system is composed of a long pipeline of text processing components, which requires extensive effort to build and is prone to a large accumulated model size and cascading errors. In this paper, we propose a model based on a pre-trained language model (PLM) that simultaneously tackles the two most important tasks in the TTS front-end, i.e., prosodic structure prediction (PSP) and grapheme-to-phoneme (G2P) conversion. We use a pre-trained Chinese BERT [1] as the text encoder and employ a multi-task learning technique to adapt it to the two TTS front-end tasks. The BERT encoder is then distilled into a smaller model with a knowledge distillation technique called TinyBERT [2], reducing the whole model to 25% of the size of the benchmark pipeline models while maintaining competitive performance on both tasks. With the proposed methods, we are able to run the whole TTS front-end module in a light and unified manner, which makes it more suitable for deployment on mobile devices.
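The sketch below illustrates the kind of multi-task architecture the abstract describes: a shared pre-trained Chinese BERT encoder with two token-level heads, one for prosodic structure prediction and one for G2P (polyphone) classification, trained with a summed cross-entropy loss. This is not the authors' code; the checkpoint name, label set sizes, head design, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class UnifiedTTSFrontend(nn.Module):
    """Shared BERT encoder with two task heads (PSP and G2P) - illustrative sketch."""
    def __init__(self, encoder_name="bert-base-chinese",
                 num_psp_labels=4, num_pinyin_labels=1500):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)   # shared text encoder
        hidden = self.encoder.config.hidden_size
        self.psp_head = nn.Linear(hidden, num_psp_labels)        # prosodic boundary tag per token
        self.g2p_head = nn.Linear(hidden, num_pinyin_labels)     # pinyin class per character

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
        return self.psp_head(hidden_states), self.g2p_head(hidden_states)

def training_step(model, batch, psp_weight=1.0, g2p_weight=1.0):
    """Multi-task step: sum token-level cross-entropy losses for the two tasks."""
    psp_logits, g2p_logits = model(batch["input_ids"], batch["attention_mask"])
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding / non-polyphonic positions
    loss_psp = ce(psp_logits.transpose(1, 2), batch["psp_labels"])
    loss_g2p = ce(g2p_logits.transpose(1, 2), batch["g2p_labels"])
    return psp_weight * loss_psp + g2p_weight * loss_g2p
```

In the paper's setting, the trained encoder would subsequently be compressed with TinyBERT-style layer-wise distillation into a smaller student encoder, while the task heads and multi-task setup stay the same.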

Updated: 2021-01-01