WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm, arXiv - CS - Sound


arXiv - CS - Sound, Pub Date: 2021-01-14, DOI: arxiv-2101.05478
Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

Automatic Speech Recognition (ASR) systems are evaluated using Word Error Rate (WER), which is calculated by counting the errors in the ASR system's transcription relative to the ground truth. This calculation, however, requires manual transcription of the speech signal to obtain the ground truth. Since transcribing audio signals is a costly process, Automatic WER Evaluation (e-WER) methods have been developed, which attempt to predict the WER of a speech system by relying only on the transcription and the speech signal features. While WER is a continuous variable, previous works have shown that posing e-WER as a classification problem is more effective than regression. However, when converted to a classification setting, these approaches suffer from heavy class imbalance. In this paper, we propose a new balanced paradigm for e-WER in a classification setting. Within this paradigm, we also propose WER-BERT, a BERT-based architecture with speech features for e-WER. Furthermore, we introduce a distance loss function to tackle the ordinal nature of e-WER classification. The proposed approach and paradigm are evaluated on the LibriSpeech dataset and a commercial (black box) ASR system, Google Cloud's Speech-to-Text API. The results and experiments demonstrate that WER-BERT establishes a new state-of-the-art in automatic WER estimation.
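
For readers unfamiliar with the metric that e-WER methods try to predict, the following is a minimal sketch of the standard reference-based WER computation: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the ground truth. It is not taken from the paper. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, which is an assumption and not the paper's preprocessing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.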


Updated: 2021-01-15
