Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
arXiv - CS - Multimedia, Pub Date: 2020-07-04, DOI: arxiv-2007.02126
Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
Lying at the core of human intelligence, relational thinking is characterized
by initially relying on innumerable unconscious percepts pertaining to
relations between new sensory signals and prior knowledge, consequently
becoming a recognizable concept or object through coupling and transformation
of these percepts. Such mental processes are difficult to model in real-world
problems such as in conversational automatic speech recognition (ASR), as the
percepts (if they are modelled as graphs indicating relationships among
utterances) are supposed to be innumerable and not directly observable. In this
paper, we present a Bayesian nonparametric deep learning method called deep
graph random process (DGP) that can generate an infinite number of
probabilistic graphs representing percepts. We further provide a closed-form
solution for coupling and transformation of these percept graphs for acoustic
modeling. Our approach is able to successfully infer relations among utterances
without using any relational data during training. Experimental evaluations on
ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and
benefits of our method.
Updated: 2020-07-07