Sequential routing framework: Fully capsule network-based speech recognition
Computer Speech & Language (IF 4.3), Pub Date: 2021-04-18, DOI: 10.1016/j.csl.2021.101228
Kyungmin Lee, Hyunwhan Joe, Hyeontaek Lim, Kwangyoun Kim, Sungsoo Kim, Chang Woo Han, Hong-Gee Kim

Capsule networks (CapsNets) have recently attracted attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced by a window size. Each slice is classified into a label at the corresponding time step through an iterative routing mechanism. Afterwards, losses are computed by connectionist temporal classification (CTC). During routing, the number of required parameters is controlled by the window size, regardless of sequence length, because learnable weights are shared across the slices. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique minimizes the decoding slowdown caused by routing iterations, since it can operate in a non-iterative manner without a loss in accuracy. On the Wall Street Journal corpus, the method achieves a word error rate of 16.9%, 1.1% lower than bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a phone error rate of 17.5%, 0.7% lower than convolutional neural network-based CTC networks (Zhang et al., 2016).
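For orientation, below is a minimal PyTorch sketch of the pipeline the abstract describes: capsulize the input, slice it by a window size, route each slice with transformation weights shared across slices, and train the per-slice label posteriors with CTC. All names, shapes, and hyperparameters (window size, capsule counts, label set) are illustrative assumptions, and the routing shown is conventional iterative dynamic routing rather than the paper's sequential dynamic routing algorithm.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)


class SlicedCapsuleCTC(nn.Module):
    def __init__(self, feat_dim=40, window=8, in_caps=32, caps_dim=8,
                 num_labels=62, out_dim=16, iters=3):
        super().__init__()
        self.window, self.in_caps, self.caps_dim = window, in_caps, caps_dim
        # Primary capsules: project each window slice into `in_caps` capsules.
        self.primary = nn.Linear(window * feat_dim, in_caps * caps_dim)
        # Transformation matrices are shared across all slices, so the
        # parameter count depends on the window size, not the sequence length.
        self.W = nn.Parameter(0.01 * torch.randn(in_caps, num_labels, out_dim, caps_dim))
        self.iters = iters

    def route(self, u_hat):
        """Dynamic routing for one slice; u_hat: (B, in_caps, labels, out_dim)."""
        b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
        for _ in range(self.iters):
            c = F.softmax(b, dim=2)                              # coupling coefficients
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))     # (B, labels, out_dim)
            b = b + (u_hat * v.unsqueeze(1)).sum(-1)             # agreement update
        return v

    def forward(self, x):
        """x: (B, T, feat_dim); returns per-slice log-probs (T', B, labels) for CTC."""
        B, T, D = x.shape
        T = (T // self.window) * self.window                     # drop the ragged tail
        slices = x[:, :T].reshape(B, -1, self.window * D)        # (B, T', window*feat)
        u = squash(self.primary(slices).reshape(B, -1, self.in_caps, self.caps_dim))
        logps = []
        for t in range(u.size(1)):                               # route each slice in time order
            u_hat = torch.einsum('bik,iljk->bilj', u[:, t], self.W)
            v = self.route(u_hat)                                # (B, labels, out_dim)
            logps.append(F.log_softmax(v.norm(dim=-1), dim=-1))  # capsule length -> class score
        return torch.stack(logps)                                # (T', B, labels)


# Toy usage: CTC loss over the per-slice posteriors.
model = SlicedCapsuleCTC()
x = torch.randn(2, 160, 40)                                      # (batch, frames, features)
log_probs = model(x)                                             # (T'=20, B=2, labels)
targets = torch.randint(1, 62, (2, 5))
loss = F.ctc_loss(log_probs, targets,
                  input_lengths=torch.full((2,), log_probs.size(0), dtype=torch.long),
                  target_lengths=torch.full((2,), 5, dtype=torch.long), blank=0)
loss.backward()
```

In this sketch, slicing the capsulized sequence and reusing the same transformation matrices for every slice is what keeps the parameter count tied to the window size rather than to the utterance length.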




Updated: 2021-04-29