A neural decoding algorithm that generates language from visual activity evoked by natural images
Neural Networks (IF 6.0), Pub Date: 2021-08-12, DOI: 10.1016/j.neunet.2021.08.006
Wei Huang 1, Hongmei Yan 1, Kaiwen Cheng 2, Chong Wang 1, Jiyi Li 1, Yuting Wang 1, Chen Li 3, Chaorong Li 1, Yunhan Li 1, Zhentao Zuo 4, Huafu Chen 1
Transforming neural activity into language would be revolutionary for human–computer interaction as well as for the functional restoration of aphasia. The rapid development of artificial intelligence has made it feasible to decode the neural signals underlying human visual activity. In this paper, a novel Progressive Transfer Language Decoding Model (PT-LDM) is proposed to decode visual fMRI signals into phrases or sentences while natural images are being viewed. The PT-LDM consists of an image encoder, an fMRI encoder, and a language decoder. The results showed that phrases and sentences were successfully generated from visual activity. Similarity analysis showed that three commonly used evaluation metrics, BLEU, ROUGE, and CIDEr, averaged 0.182, 0.197, and 0.680 respectively between the generated texts and the corresponding annotated texts in the test set, significantly higher than the baseline. Moreover, we found that higher visual areas usually performed better than lower visual areas, and that the contribution of visual response patterns to language decoding varied across successive time points. Our findings demonstrate that the neural representations elicited in visual cortices when scenes are viewed already contain semantic information that can be used to generate human language. Our study shows the potential of language-based brain–machine interfaces, especially for helping aphasics communicate more efficiently via fMRI signals.
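The text-similarity scores reported above compare generated captions against annotated reference captions. As a rough illustration of how such overlap metrics work, the sketch below computes a modified unigram precision, the building block of BLEU. This is a simplified, hypothetical example for intuition only: the paper's actual evaluation uses the full BLEU, ROUGE, and CIDEr metrics (higher-order n-grams, brevity penalties, multiple references), and the example sentences are invented, not drawn from the study's data.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Modified unigram precision (BLEU-1 component): each candidate
    token's count is clipped by its count in the reference, then the
    clipped total is divided by the candidate length."""
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand_tokens)
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / len(cand_tokens)

# Invented example: a generated caption vs. an annotated caption.
generated = "a man rides a horse on the beach"
annotated = "a man is riding a horse along the beach"
print(unigram_precision(generated, annotated))  # → 0.75
```

Six of the eight generated tokens ("a" twice, "man", "horse", "the", "beach") are matched in the reference, giving 6/8 = 0.75; clipping prevents a caption from inflating its score by repeating a common word more often than the reference contains it.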




Updated: 2021-08-31