Efficient Weight factorization for Multilingual Speech Recognition
arXiv - CS - Sound. Pub Date: 2021-05-07, DOI: arxiv-2105.03010
Ngoc-Quan Pham, Tuan-Nam Nguyen, Sebastian Stueker, Alexander Waibel

End-to-end multilingual speech recognition trains a single model on a composite speech corpus covering many languages, yielding a single neural network that transcribes all of them. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all of them simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: the linear transformation. The key idea is to assign fast weight matrices to each language by decomposing each weight matrix into a shared component and a language-dependent component. The latter is then factorized into vectors under a rank-1 assumption to reduce the number of parameters per language. This efficient factorization scheme proves effective in two multilingual settings with $7$ and $27$ languages, reducing word error rates by $26\%$ and $27\%$ relative for the LSTM and Transformer architectures, respectively.
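The decomposition described above can be sketched as a linear layer whose per-language weight is the shared matrix plus a rank-1 update formed from two language-specific vectors. This is a minimal illustrative sketch, not the authors' implementation; the class and parameter names (`FactorizedMultilingualLinear`, `u`, `v`) are assumptions for illustration, and the additive form `W_lang = W_shared + u_lang v_lang^T` is one natural reading of the abstract.

```python
import numpy as np

class FactorizedMultilingualLinear:
    """Hypothetical sketch of a language-factorized linear layer.

    Each language's weight is W_shared + outer(u[lang], v[lang]),
    i.e. a shared matrix plus a rank-1 language-dependent update.
    """

    def __init__(self, d_in, d_out, n_langs, seed=0):
        rng = np.random.default_rng(seed)
        # Shared component, learned jointly across all languages.
        self.W_shared = rng.normal(scale=0.02, size=(d_out, d_in))
        # One pair of rank-1 factor vectors per language: only
        # d_in + d_out extra parameters per language, instead of
        # a full d_out x d_in matrix.
        self.u = rng.normal(scale=0.02, size=(n_langs, d_out))
        self.v = rng.normal(scale=0.02, size=(n_langs, d_in))
        self.b = np.zeros(d_out)

    def weight(self, lang):
        # Language-dependent weight: shared matrix plus rank-1 update.
        return self.W_shared + np.outer(self.u[lang], self.v[lang])

    def __call__(self, x, lang):
        # x: (batch, d_in) -> (batch, d_out), using the weight
        # selected for the given language id.
        return x @ self.weight(lang).T + self.b
```

Under this sketch, adding a language costs only `d_in + d_out` parameters per linear layer rather than `d_in * d_out`, which is what makes the scheme efficient as the number of languages grows.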

Updated: 2021-05-10