A dual-channel language decoding from brain activity with progressive transfer training
Human Brain Mapping (IF 3.5), Pub Date: 2021-07-27, DOI: 10.1002/hbm.25603
Wei Huang, Hongmei Yan, Kaiwen Cheng, Yuting Wang, Chong Wang, Jiyi Li, Chen Li, Chaorong Li, Zhentao Zuo, Huafu Chen

When we view a scene, the visual cortex extracts and processes the visual information in the scene through various kinds of neural activity. Previous studies have decoded this neural activity into single or multiple semantic category tags, which can caption the scene to some extent. However, such tags are isolated words with no grammatical structure and convey only part of what the scene contains. It is well known that textual language (sentences or phrases) is superior to single words both in disclosing the meaning of images and in reflecting people's real understanding of them. Here, based on artificial intelligence technologies, we attempted to build a dual-channel language decoding model (DC-LDM) to decode the neural activity evoked by images into language (phrases or short sentences). The DC-LDM consists of five modules: Image-Extractor, Image-Encoder, Nerve-Extractor, Nerve-Encoder, and Language-Decoder. In addition, we employed a progressive transfer strategy to train the DC-LDM and improve its language-decoding performance. The results showed that the texts decoded by the DC-LDM described the natural image stimuli accurately and vividly. We adopted six indices to quantitatively evaluate the difference between the decoded texts and the annotated texts of the corresponding visual images, and found that Word2vec-Cosine similarity (WCS) was the best indicator of the similarity between the decoded and annotated texts. Moreover, among the different visual cortices, text decoded from the higher visual cortex was more consistent with the descriptions of the natural images than text decoded from the lower visual cortex. Our decoding model may offer insight for language-based brain-computer interface explorations.
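The abstract names the five modules of the DC-LDM but not their internal design. The following minimal PyTorch sketch shows one plausible wiring of such a dual-channel model, where both channels map into a shared latent space that seeds a recurrent language decoder; the backbone choice, layer sizes, voxel count, and the reuse of the decoder across channels are assumptions for illustration, not the paper's actual implementation. Under this reading, progressive transfer would mean first training the image channel together with the decoder on image-caption pairs, then fitting the nerve channel to fMRI responses while reusing the trained decoder.

import torch
import torch.nn as nn

class DCLDMSketch(nn.Module):
    # Hypothetical sizes; the abstract does not report them.
    def __init__(self, n_voxels=4096, feat_dim=512, hidden=512, vocab=10000):
        super().__init__()
        # Image channel: Image-Extractor (a stand-in for a pretrained CNN)
        # followed by Image-Encoder into the shared latent space.
        self.image_extractor = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim))
        self.image_encoder = nn.Linear(feat_dim, hidden)
        # Nerve channel: Nerve-Extractor over fMRI voxel responses followed
        # by Nerve-Encoder into the same latent space (an assumption).
        self.nerve_extractor = nn.Linear(n_voxels, feat_dim)
        self.nerve_encoder = nn.Linear(feat_dim, hidden)
        # Language-Decoder: an LSTM that emits one token per step (assumed).
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, image=None, voxels=None):
        # Either channel produces the latent state that initializes decoding.
        if image is not None:
            state = self.image_encoder(self.image_extractor(image))
        else:
            state = self.nerve_encoder(self.nerve_extractor(voxels))
        h0 = state.unsqueeze(0)          # (1, batch, hidden) LSTM hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(tokens), (h0, c0))
        return self.out(out)             # per-step vocabulary logits

# Toy usage: decode from the image channel (shapes are illustrative).
model = DCLDMSketch()
logits = model(torch.randint(0, 10000, (2, 7)), image=torch.randn(2, 3, 64, 64))
print(logits.shape)                      # torch.Size([2, 7, 10000])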
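Of the six evaluation indices, the abstract singles out Word2vec-Cosine similarity (WCS). A common way to compute such a score, sketched below under the assumption that it matches the paper's usage, is to embed each text as the average of its word2vec vectors and take the cosine between the two averages; the tiny embedding table and example sentences are placeholders, since the abstract does not name the pretrained word2vec model used.

import numpy as np

# Hypothetical word vectors; in practice these come from a pretrained
# word2vec model with a few hundred dimensions.
EMBEDDINGS = {
    "a":     np.array([0.1, 0.3, 0.2]),
    "dog":   np.array([0.8, 0.1, 0.4]),
    "runs":  np.array([0.2, 0.7, 0.1]),
    "on":    np.array([0.1, 0.2, 0.6]),
    "grass": np.array([0.5, 0.4, 0.3]),
    "puppy": np.array([0.7, 0.2, 0.5]),
    "plays": np.array([0.3, 0.6, 0.2]),
}

def sentence_vector(text):
    # Average the vectors of all in-vocabulary tokens.
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def wcs(decoded, annotated):
    # Cosine similarity between the two averaged sentence vectors.
    u, v = sentence_vector(decoded), sentence_vector(annotated)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(wcs("a puppy plays on grass", "a dog runs on grass"))  # close to 1.0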

Updated: 2021-09-19