End-to-End Speech Recognition from Federated Acoustic Models,arXiv - CS - Sound



End-to-End Speech Recognition from Federated Acoustic Models
arXiv - CS - Sound. Pub Date: 2021-04-29, DOI: arxiv-2104.14297
Yan Gao, Titouan Parcollet, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has recently attracted considerable attention. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic federated ASR experimental setup in which clients have heterogeneous data distributions. It is built from the French Common Voice dataset, a large heterogeneous dataset containing over 10k speakers. We present the first empirical study of an attention-based sequence-to-sequence end-to-end (E2E) ASR model trained with three aggregation weighting strategies (standard FedAvg, loss-based aggregation, and a novel word error rate (WER)-based aggregation) in two realistic FL scenarios: cross-silo with 10 clients and cross-device with 2k clients. In particular, the WER-based weighting method is proposed to better adapt FL to ASR by integrating the error-rate metric into the aggregation process. Our analysis of E2E ASR from heterogeneous and realistic federated acoustic models provides the foundations for future research and development of realistic FL-based ASR applications.
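All three strategies share one server-side step: the global model is a weighted average of the client models, and only the choice of weights differs. The Python/NumPy sketch below illustrates that step. Sample-count weighting is the standard FedAvg rule. The abstract does not give the loss-based or WER-based formulas, so `error_based_weights` is a hypothetical stand-in: it applies a softmax over negative per-client error with an illustrative `temperature` parameter. It is not the authors' method. The sketch also assumes that a model is a plain dict of NumPy arrays.

```python
from typing import Dict, List

import numpy as np

# Assumption: a model is a dict mapping parameter names to NumPy arrays.
Params = Dict[str, np.ndarray]


def weighted_average(models: List[Params], weights: List[float]) -> Params:
    """Server-side aggregation: normalized weighted sum of client parameters."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    coeffs = [w / total for w in weights]
    return {
        name: sum(c * model[name] for c, model in zip(coeffs, models))
        for name in models[0]
    }


def fedavg_weights(num_samples: List[int]) -> List[float]:
    """Standard FedAvg: each client is weighted by its number of training samples."""
    return [float(n) for n in num_samples]


def error_based_weights(errors: List[float], temperature: float = 1.0) -> List[float]:
    """HYPOTHETICAL rule: clients with lower error (loss or WER) get larger weights.

    Softmax over negative errors; `temperature` is an illustrative knob, and this
    is not necessarily the formula used in the paper.
    """
    e = np.asarray(errors, dtype=float)
    scores = np.exp(-(e - e.min()) / temperature)  # shift by min for numerical stability
    return (scores / scores.sum()).tolist()


if __name__ == "__main__":
    clients = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]

    # FedAvg: 100 vs 300 utterances -> coefficients 0.25 and 0.75.
    global_fedavg = weighted_average(clients, fedavg_weights([100, 300]))
    print(global_fedavg["w"])  # [2.5 3.5]

    # Error-based: the client with WER 0.30 outweighs the one with WER 0.50.
    global_wer = weighted_average(clients, error_based_weights([0.30, 0.50]))
    print(global_wer["w"])
```

Keeping aggregation separate from the weighting rule means that swapping FedAvg for a loss- or WER-driven scheme changes only the list of weights passed to `weighted_average`.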


Updated: 2021-04-30
