Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
arXiv - CS - Computation and Language. Pub Date: 2021-02-22, DOI: arxiv-2102.11114
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read because of disfluencies, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and the ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform incorrect and noisy ASR output into readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin: 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.
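
For readers unfamiliar with the metric family mentioned above, the following is a minimal sketch of plain word error rate (WER): the word-level edit distance between a reference and a hypothesis, divided by the reference length. It is not the paper's RA-WER, whose exact definition is not given in this abstract; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level Levenshtein distance / number of reference words.

    Illustrative only. Tokenization by whitespace is an assumption, and this
    is standard WER, not the readability-aware variant (RA-WER) from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A disfluent transcript scored against a cleaned-up, readable reference.
    print(word_error_rate("i want to go home", "i uh i want to go home"))
```

In this usage example the hypothesis contains two extra words relative to the five-word reference, so a transcript that matches the audio exactly can still be penalized when the reference is the readable version.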

Updated: 2021-02-23