Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
arXiv - CS - Sound, Pub Date: 2020-07-04, DOI: arxiv-2007.02126
Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in conversational automatic speech recognition (ASR), as the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach is able to successfully infer relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.

Updated: 2020-07-09