Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus
Journal of Neural Engineering (IF 4), Pub Date: 2020-11-25, DOI: 10.1088/1741-2552/abbfef
Guy H Wilson 1 , Sergey D Stavisky 2, 3, 4 , Francis R Willett 2, 4, 5 , Donald T Avansino 2 , Jessica N Kelemen 6 , Leigh R Hochberg 6, 7, 8, 9 , Jaimie M Henderson 2, 3 , Shaul Druckmann 3, 10 , Krishna V Shenoy 3, 4, 5, 10, 11

Objective. To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the ‘hand knob’ area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak.

Approach. Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode’s binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the ‘Brain-to-Speech’ pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes’ onset times.

Main results. A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio.

Significance. The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
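
The two headline metrics in the abstract, classification accuracy against a chance level and a correlation coefficient r between true and reconstructed audio, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names and the toy data are hypothetical, and the abstract does not say how its 6% chance level was computed. Uniform guessing over 39 phonemes would give 1/39 ≈ 2.6%, so the sketch also shows a most-frequent-class baseline, which is one common convention and is only an assumption here. Pearson correlation is likewise assumed as the meaning of r.

```python
import math
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of phoneme labels that were predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def uniform_chance(num_classes):
    """Expected accuracy when guessing uniformly among num_classes labels."""
    return 1.0 / num_classes


def majority_class_chance(true_labels):
    """Accuracy obtained by always predicting the most frequent label.

    Assumption: one common definition of chance for unbalanced classes;
    the abstract does not state which definition it uses.
    """
    counts = Counter(true_labels)
    return max(counts.values()) / len(true_labels)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical, not from the study).
true_phonemes = ["AH", "AH", "AH", "T", "S", "N"]
decoded_phonemes = ["AH", "AH", "T", "T", "S", "AH"]

print("accuracy:", accuracy(true_phonemes, decoded_phonemes))
print("uniform chance, 39 classes:", uniform_chance(39))
print("majority-class chance:", majority_class_chance(true_phonemes))

# Stand-ins for true and reconstructed audio feature values.
true_audio = [1.0, 2.0, 3.0, 4.0]
reconstructed_audio = [2.0, 4.0, 6.0, 8.0]
print("pearson r:", pearson_r(true_audio, reconstructed_audio))
```

With unbalanced phoneme frequencies the majority-class baseline exceeds the uniform one, which is why a reported chance level can be higher than one divided by the number of classes.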




Updated: 2020-11-25