Training for object recognition with increasing spatial frequency: A comparison of deep learning with human vision.
Journal of Vision (IF 2.0), Pub Date: 2021-09-18, DOI: 10.1167/jov.21.10.14
Lev Kiar Avberšek¹ ², Astrid Zeman¹, Hans Op de Beeck¹

The ontogenetic development of human vision and the real-time neural processing of visual input exhibit a striking similarity: sensitivity to spatial frequencies progresses in a coarse-to-fine manner. During early human development, sensitivity to higher spatial frequencies increases with age. In adulthood, when humans receive new visual input, low spatial frequencies are typically processed first, followed by higher spatial frequencies. We investigated to what extent this coarse-to-fine progression might impact visual representations in artificial vision and compared these representations to those of adult humans. We simulated the coarse-to-fine progression of image processing in deep convolutional neural networks (CNNs) by gradually increasing the spatial frequency information available during training. We compared CNN performance after standard and coarse-to-fine training using a wide range of datasets from behavioral and neuroimaging experiments. In contrast to humans, CNNs trained with the standard protocol are very insensitive to low spatial frequency information and perform very poorly when classifying object images that contain only low spatial frequencies. Training CNNs with our coarse-to-fine method improved their classification accuracy on low-pass-filtered images from the ImageNet dataset from 0% to 32%. The coarse-to-fine training also made the CNNs more sensitive to low spatial frequencies in hybrid images with conflicting information in different frequency bands. When we compared the differently trained networks on images containing full spatial frequency information, we found no representational differences. Overall, this integration of computational, neural, and behavioral findings shows that exposure to, and processing of, inputs with varying spatial frequency content is relevant to some aspects of high-level object representations.
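The abstract relies on two image manipulations: low-pass filtering whose strength decreases over training (coarse-to-fine), and hybrid images that combine conflicting information from different frequency bands. The sketch below illustrates both under stated assumptions. The abstract does not specify the authors' filter or schedule, so the Gaussian low-pass filter, the linear blur schedule, the parameter values, and the function names (`gaussian_lowpass`, `coarse_to_fine_sigma`, `hybrid_image`) are all hypothetical choices made for illustration, not the method used in the paper.

```python
# Hypothetical sketch: a Gaussian low-pass filter, a linear coarse-to-fine blur
# schedule, and a simple hybrid-image construction. None of these choices are
# taken from the paper; they only illustrate the general idea.
import numpy as np


def gaussian_lowpass(image: np.ndarray, sigma: float) -> np.ndarray:
    """Low-pass filter an (H, W) or (H, W, C) image with a Gaussian of std `sigma` pixels.

    The filter is applied in the frequency domain, so image borders wrap around
    (circular boundary). sigma <= 0 means "no filtering".
    """
    image = np.asarray(image, dtype=np.float64)
    if sigma <= 0:
        return image.copy()
    h, w = image.shape[:2]
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    # Fourier transform of a unit-area Gaussian with std sigma: exp(-2 pi^2 sigma^2 f^2).
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx**2 + fy**2))
    if image.ndim == 3:
        transfer = transfer[:, :, None]  # same filter for every colour channel
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.real(np.fft.ifft2(spectrum * transfer, axes=(0, 1)))


def coarse_to_fine_sigma(epoch: int, num_blur_epochs: int, sigma_max: float) -> float:
    """Hypothetical linear schedule: blur shrinks from sigma_max to 0 over
    num_blur_epochs epochs; later epochs see full-spectrum images."""
    if epoch >= num_blur_epochs:
        return 0.0
    return sigma_max * (1.0 - epoch / num_blur_epochs)


def hybrid_image(low_source: np.ndarray, high_source: np.ndarray, sigma: float) -> np.ndarray:
    """Combine the low frequencies of one image with the high frequencies of another."""
    low = gaussian_lowpass(low_source, sigma)
    high = np.asarray(high_source, dtype=np.float64) - gaussian_lowpass(high_source, sigma)
    return low + high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_image = rng.random((32, 32, 3))  # random stand-in for one training image
    schedule = [coarse_to_fine_sigma(e, num_blur_epochs=4, sigma_max=4.0) for e in range(5)]
    print(schedule)  # [4.0, 3.0, 2.0, 1.0, 0.0]
    for epoch, sigma in enumerate(schedule):
        blurred = gaussian_lowpass(train_image, sigma)
        # Stronger blur removes more high-frequency energy, so pixel spread drops.
        print(epoch, sigma, round(float(blurred.std()), 4))
```

In a training loop, one would call `coarse_to_fine_sigma` once per epoch and apply `gaussian_lowpass` to each input before the forward pass. A linear schedule is only one option; a stepwise schedule that holds each blur level for several epochs would fit the same idea.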

Updated: 2021-09-18