Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models
Journal of Mathematical Psychology (IF 2.2), Pub Date: 2017-02-01, DOI: 10.1016/j.jmp.2016.10.007
Seyed-Mahdi Khaligh-Razavi, Linda Henriksson, Kendrick Kay, Nikolaus Kriegeskorte

Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area’s representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network and mixing of its feature set was essential for this model to explain the representation. 
We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model’s original feature space and the hypothesis space generated by linear transformations of that feature space.
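
The contrast between fixed and mixed RSA described above can be illustrated with a small synthetic sketch. This is not the authors' pipeline: the data are simulated, and the specific choices here (correlation-distance RDMs, Pearson correlation between RDMs, ridge regression for the linear mixing, and the names `rdm`, `fixed_rsa`, `mixed_rsa`, `demo`) are assumptions made only for illustration. The simulated "voxels" are a reweighted mixture of the model features, so the unmixed feature space should predict the brain dissimilarities poorly while the fitted mixture should predict them well.

```python
import numpy as np


def rdm(patterns):
    """Correlation-distance RDM: 1 - Pearson r between rows (one row per image)."""
    return 1.0 - np.corrcoef(patterns)


def rdm_similarity(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    rows, cols = np.triu_indices(rdm_a.shape[0], k=1)
    return np.corrcoef(rdm_a[rows, cols], rdm_b[rows, cols])[0, 1]


def fixed_rsa(model_test, brain_test):
    """Fixed RSA: compare the model's feature space to the brain data as is."""
    return rdm_similarity(rdm(model_test), rdm(brain_test))


def mixed_rsa(model_train, brain_train, model_test, brain_test, ridge=1.0):
    """Mixed RSA: fit one weight per (feature, voxel) pair on training images,
    predict voxel responses to held-out images, then compare RDMs.
    Ridge regression is an assumed choice of regularised linear fit."""
    n_features = model_train.shape[1]
    gram = model_train.T @ model_train + ridge * np.eye(n_features)
    weights = np.linalg.solve(gram, model_train.T @ brain_train)
    predicted_test = model_test @ weights
    return rdm_similarity(rdm(predicted_test), rdm(brain_test))


def demo(seed=0):
    """Simulate voxels that depend strongly on only a few model features."""
    rng = np.random.default_rng(seed)
    n_train, n_test, n_features, n_voxels = 200, 50, 20, 30

    model_train = rng.standard_normal((n_train, n_features))
    model_test = rng.standard_normal((n_test, n_features))

    # Hypothetical ground-truth mixing: 3 features dominate, 17 barely matter.
    feature_gain = np.array([10.0] * 3 + [0.1] * 17)
    mixing = rng.standard_normal((n_features, n_voxels)) * feature_gain[:, None]

    brain_train = model_train @ mixing + 0.1 * rng.standard_normal((n_train, n_voxels))
    brain_test = model_test @ mixing + 0.1 * rng.standard_normal((n_test, n_voxels))

    fixed = fixed_rsa(model_test, brain_test)
    mixed = mixed_rsa(model_train, brain_train, model_test, brain_test)
    return fixed, mixed


if __name__ == "__main__":
    fixed_score, mixed_score = demo()
    print(f"fixed RSA: {fixed_score:.3f}  mixed RSA: {mixed_score:.3f}")
```

Note that the weights are fitted on one set of images and the RDMs are compared on a different set, mirroring the abstract's point that mixed RSA uses a separate training set of stimuli. In this toy setting the gap between the two scores reflects only the simulated reweighting of features, not any property of real fMRI data.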

Updated: 2017-02-01