A neural speech decoding framework leveraging deep learning and speech synthesis
Nature Machine Intelligence (IF 23.8) Pub Date: 2024-04-08, DOI: 10.1038/s42256-024-00824-8
Xupeng Chen, Ran Wang, Amirhossein Khalilian-Gourtani, Leyao Yu, Patricia Dugan, Daniel Friedman, Werner Doyle, Orrin Devinsky, Yao Wang, Adeen Flinker

Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural recordings paired with speech, and by the complexity and high dimensionality of the data. Here we present a novel deep learning-based neural speech decoding framework comprising an ECoG decoder, which translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer, which maps those speech parameters to spectrograms. We also developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, to generate reference speech parameters that facilitate training of the ECoG decoder. The framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation even when limited to causal operations, a requirement for adoption in real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left hemisphere damage.
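The two-stage pipeline described above (ECoG signals → interpretable speech parameters → spectrogram) and the causal-operation constraint can be sketched schematically. This is a minimal illustration, not the paper's architecture: the linear maps stand in for the deep ECoG decoder and the differentiable synthesizer, the tensor shapes and parameter counts are hypothetical, and a simple past-only moving average stands in for the causal temporal processing.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_smooth(x, k=5):
    # Average each frame with the k-1 preceding frames only (no lookahead),
    # mirroring the causal-operation constraint needed for real-time prostheses.
    out = np.zeros_like(x)
    for t in range(len(x)):
        out[t] = x[max(0, t - k + 1):t + 1].mean(axis=0)
    return out

def ecog_decoder(ecog, w):
    # Map ECoG features (time x electrodes) to interpretable speech
    # parameters (time x n_params), e.g. pitch, loudness, formants.
    # A single linear layer stands in for the paper's deep decoder.
    return ecog @ w

def speech_synthesizer(params, basis):
    # Map speech parameters to a spectrogram (time x n_freq_bins).
    # A linear combination of spectral basis vectors keeps this stage
    # differentiable, the key property of the paper's synthesizer.
    return params @ basis

# Hypothetical sizes: 100 frames, 64 electrodes, 18 speech parameters,
# 128 spectrogram frequency bins.
T, n_elec, n_params, n_freq = 100, 64, 18, 128
ecog = rng.standard_normal((T, n_elec))
w = rng.standard_normal((n_elec, n_params)) * 0.1
basis = rng.standard_normal((n_params, n_freq)) * 0.1

spec = speech_synthesizer(ecog_decoder(causal_smooth(ecog), w), basis)
print(spec.shape)  # (100, 128)
```

Because every stage here uses only past and present frames, perturbing the input at frame t cannot change any output frame earlier than t, which is the property that makes a decoder deployable in a streaming neural prosthesis.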




Updated: 2024-04-08