Diverse Deep Neural Networks All Predict Human Inferior Temporal Cortex Well, After Training and Fitting
Journal of Cognitive Neuroscience (IF 3.2), Pub Date: 2021-09-01, DOI: 10.1162/jocn_a_01755
Katherine R Storrs, Tim C Kietzmann, Alexander Walther, Johannes Mehrer, Nikolaus Kriegeskorte

Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on their ability to explain the representational geometry of 62 object images in human inferior temporal cortex (hIT), as measured with fMRI. We compare untrained networks to their task-trained counterparts and assess the effect of cross-validated fitting to hIT, by taking a weighted combination of the principal components of features within each layer and, subsequently, a weighted combination of layers. For each combination of training and fitting, we test all models for their correlation with the hIT representational dissimilarity matrix, using independent images and subjects. Trained models outperform untrained models (accounting for 57% more of the explainable variance), suggesting that structured visual features are important for explaining hIT. Model fitting further improves the alignment of DNN and hIT representations (by 124%), suggesting that the relative prevalence of different features in hIT does not readily emerge from the Imagenet object-recognition task used to train the networks. The same models can also explain the disparate representations in primary visual cortex (V1), where stronger weights are given to earlier layers. In each region, all architectures achieved equivalently high performance once trained and fitted. The models' shared properties—deep feedforward hierarchies of spatially restricted nonlinear filters—seem more important than their differences, when modeling human visual representations.
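The fitting procedure described above, reweighting the principal components of a layer's features so that the resulting model RDM best matches the hIT RDM, with weights estimated by cross-validation over images, can be sketched in a few lines of Python. The sketch below uses simulated placeholder activations and a simulated hIT RDM, non-negative least squares for the component weights, and a single train/test split of images; these are illustrative assumptions, not the authors' implementation (which additionally combines layers with fitted weights and cross-validates over both images and subjects).

```python
"""Minimal sketch (not the authors' code) of weighted representational-
similarity fitting: principal components of one DNN layer's features are
reweighted, with non-negative weights fitted on training images, to predict
a human-IT representational dissimilarity matrix (RDM). All data here are
simulated placeholders."""
import numpy as np
from scipy.optimize import nnls
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_features, n_components = 62, 512, 20

# Placeholder stand-ins for real data: layer activations (images x features)
# and a measured hIT RDM (here simulated from hidden "true" responses).
layer_feats = rng.standard_normal((n_images, n_features))
hidden_brain = (layer_feats[:, :50] * rng.uniform(0, 2, 50)
                + 0.5 * rng.standard_normal((n_images, 50)))
hit_rdm = squareform(pdist(hidden_brain, metric="correlation"))

# 1. Principal components of the layer's features.
feats_c = layer_feats - layer_feats.mean(axis=0)
_, _, vt = np.linalg.svd(feats_c, full_matrices=False)
pcs = feats_c @ vt[:n_components].T                  # images x components

# 2. Each component contributes one squared-difference matrix; a weighted
#    squared-Euclidean RDM is a non-negative combination of these.
comp_rdms = np.stack([(pcs[:, k, None] - pcs[None, :, k]) ** 2
                      for k in range(n_components)])  # comps x imgs x imgs

# 3. Split images into train/test halves (cross-validation over images).
train = rng.permutation(n_images)[: n_images // 2]
test = np.setdiff1d(np.arange(n_images), train)

def lower_tri(rdm, idx):
    """Lower-triangular dissimilarities for a subset of images."""
    sub = rdm[np.ix_(idx, idx)]
    return sub[np.tril_indices(len(idx), k=-1)]

# 4. Fit non-negative component weights on the training images only.
X_train = np.stack([lower_tri(c, train) for c in comp_rdms], axis=1)
weights, _ = nnls(X_train, lower_tri(hit_rdm, train))

# 5. Evaluate on held-out images: Spearman correlation between the
#    reweighted model RDM and the hIT RDM.
X_test = np.stack([lower_tri(c, test) for c in comp_rdms], axis=1)
rho, _ = spearmanr(X_test @ weights, lower_tri(hit_rdm, test))
print(f"Held-out model-hIT RDM correlation (Spearman): {rho:.3f}")
```

Extending this to the second stage described in the abstract would treat each layer's reweighted RDM as a further regressor in the same kind of non-negative fit, yielding a weighted combination of layers.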



Updated: 2021-09-12