Deep Graph Random Process for Relational-Thinking-Based Speech Recognition


arXiv - CS - Multimedia. Pub Date: 2020-07-04. DOI: arxiv-2007.02126
Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, which then become a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as conversational automatic speech recognition (ASR), because the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach infers relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.
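The idea of many probabilistic graphs over utterances being coupled into one summary can be pictured with a toy sketch. This is not the DGP method or its closed-form solution, which the abstract does not spell out. It assumes, purely for illustration, that each edge between two utterances is an independent Bernoulli variable, that a large finite number of sampled graphs stands in for the infinite set, and that "coupling" is an elementwise sum. The names `sample_percept_graph`, `couple_graphs`, and `edge_probs` are hypothetical.

```python
import random


def sample_percept_graph(edge_probs, rng):
    """Draw one random graph: edge (i, j) is present with probability edge_probs[i][j].

    Assumption for illustration only: edges are independent Bernoulli variables.
    """
    n = len(edge_probs)
    return [
        [1 if rng.random() < edge_probs[i][j] else 0 for j in range(n)]
        for i in range(n)
    ]


def couple_graphs(graphs):
    """Sum adjacency matrices elementwise.

    Each entry counts how many of the sampled graphs contain that edge.
    """
    n = len(graphs[0])
    return [[sum(g[i][j] for g in graphs) for j in range(n)] for i in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical edge probabilities among three utterances.
edge_probs = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]

num_graphs = 1000  # finite stand-in for the "infinite number" of graphs
graphs = [sample_percept_graph(edge_probs, rng) for _ in range(num_graphs)]
counts = couple_graphs(graphs)

# Empirical edge frequencies approach edge_probs as num_graphs grows.
edge_freq = [[c / num_graphs for c in row] for row in counts]
```

Under these assumptions each summed entry follows a binomial distribution, so the coupled summary stays tractable however many graphs are drawn. The sketch is meant only to convey that intuition.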


Updated: 2020-07-07
