Human EEG and recurrent neural networks exhibit common temporal dynamics during speech recognition
Frontiers in Systems Neuroscience (IF 3) Pub Date: 2021-06-10, DOI: 10.3389/fnsys.2021.617605
Saeedeh Hashemnia, Lukas Grasse, Shweta Soni, Matthew S. Tata

Recent deep-learning artificial neural networks have shown remarkable success in recognizing natural human speech; however, the reasons for their success are not entirely understood \cite{karpathy2015visualizing}. The success of these methods might be because state-of-the-art networks use recurrent layers or dilated convolutional layers that enable the network to use a time-dependent feature space. The importance of time-dependent features in human cortical mechanisms of speech perception, measured by electroencephalography (EEG) and magnetoencephalography (MEG), has also been of particular recent interest. It is possible that recurrent neural networks (RNNs) achieve their success by emulating aspects of cortical dynamics, albeit through very different computational mechanisms. In that case, we should observe commonalities between the temporal dynamics of deep-learning models, particularly in recurrent layers, and brain electrical activity (EEG) during speech perception. We explored this prediction by presenting the same sentences to both human listeners and the Deep Speech RNN (Hannun et al., 2014) and considered the temporal dynamics of the EEG and RNN units for identical sentences. We tested whether the recently discovered phenomenon of envelope phase tracking in the human EEG is also evident in RNN hidden layers. We furthermore predicted that the clustering of dissimilarity between model representations of pairs of stimuli would be similar in both RNN and EEG dynamics. We found that the dynamics of both the recurrent layer of the network and human EEG signals exhibit envelope phase tracking with similar time lags. We also computed the representational distance matrices (RDMs) of brain and network responses to speech stimuli. The model RDMs became more similar to the brain RDM when going from early network layers to later ones, eventually peaking at the recurrent layer.
These results suggest that the Deep Speech RNN captures a representation of temporal features of speech in a manner similar to the human brain.
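The two analyses described above — envelope phase tracking and RDM comparison — can be sketched in code. The following is a minimal illustration using NumPy/SciPy, not the authors' implementation; the function names, the moving-average smoothing of the Hilbert envelope, and the Pearson-distance definition of the RDM are our own simplifying assumptions:

```python
import numpy as np
from scipy.signal import hilbert, correlate


def speech_envelope(audio, fs, cutoff=8.0):
    """Amplitude envelope of a speech signal via the Hilbert transform,
    smoothed with a moving average (a stand-in for a proper low-pass filter)."""
    env = np.abs(hilbert(audio))
    win = max(1, int(fs / cutoff))
    return np.convolve(env, np.ones(win) / win, mode="same")


def best_lag(envelope, response, fs, max_lag_s=0.5):
    """Lag (in seconds) at which a neural/unit response best tracks the
    speech envelope, found as the peak of the cross-correlation."""
    env = (envelope - envelope.mean()) / envelope.std()
    resp = (response - response.mean()) / response.std()
    xc = correlate(resp, env, mode="full")
    lags = np.arange(-(len(env) - 1), len(resp))  # sample lags of `xc`
    keep = np.abs(lags) <= int(max_lag_s * fs)    # restrict to plausible lags
    return lags[keep][np.argmax(xc[keep])] / fs


def rdm(responses):
    """Representational distance matrix: 1 - Pearson correlation between the
    response patterns (rows) evoked by each pair of stimuli."""
    return 1.0 - np.corrcoef(responses)
```

With this in place, one could compute an RDM from EEG responses and from each network layer's unit activations for the same sentences, then correlate the upper triangles of the brain RDM and each layer RDM to trace where brain–model similarity peaks.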

Updated: 2021-06-10