Gloss perception: Searching for a deep neural network that behaves like humans.
Journal of Vision (IF 1.8), Pub Date: 2021-11-25, DOI: 10.1167/jov.21.12.14
Konrad Eugen Prokott 1, Hideki Tamura 2,3, Roland W Fleming 1,4

The visual computations underlying human gloss perception remain poorly understood, and to date there is no image-computable model that reproduces human gloss judgments independent of shape and viewing conditions. Such a model could provide a powerful platform for testing hypotheses about the detailed workings of surface perception. Here, we made use of recent developments in artificial neural networks to test how well we could recreate human responses in a high-gloss versus low-gloss discrimination task. We rendered >70,000 scenes depicting familiar objects made of either mirror-like or near-matte textured materials. We trained numerous classifiers to distinguish the two materials in our images, ranging from linear classifiers using simple pixel statistics to convolutional neural networks (CNNs) with up to 12 layers, and compared their classifications with human judgments. To determine which classifiers made the same kinds of errors as humans, we painstakingly identified a set of 60 images in which human judgments are consistently decoupled from ground truth. We then conducted a Bayesian hyperparameter search to identify which out of several thousand CNNs most resembled humans. We found that, although architecture has only a relatively weak effect, high correlations with humans are somewhat more typical in networks of shallower to intermediate depths (three to five layers). We also trained deep convolutional generative adversarial networks (DCGANs) of different depths to recreate images based on our high- and low-gloss database. Responses from human observers show that two layers in a DCGAN can recreate gloss recognizably for human observers. Together, our results indicate that human gloss classification can best be explained by computations resembling early to mid-level vision.
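
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which pixel statistics or which correlation measure were used, so the luminance mean, standard deviation, and skewness, the Pearson correlation, and every function name below (`pixel_statistics`, `pearson`, `most_humanlike`) are assumptions chosen for illustration. The sketch computes simple image statistics of the kind a linear classifier might take as input, and ranks hypothetical candidate models by how closely their per-image gloss scores track human responses on a set of diagnostic images.

```python
import math
from typing import Dict, Sequence, Tuple


def pixel_statistics(luminance: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, standard deviation, skewness) of luminance values.

    Assumption: these three statistics stand in for the "simple pixel
    statistics" mentioned in the abstract, which does not list them.
    """
    n = len(luminance)
    mean = sum(luminance) / n
    variance = sum((v - mean) ** 2 for v in luminance) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # A uniform image has no spread, so skewness is defined as 0 here.
        return mean, 0.0, 0.0
    skew = sum(((v - mean) / std) ** 3 for v in luminance) / n
    return mean, std, skew


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def most_humanlike(model_scores: Dict[str, Sequence[float]],
                   human_scores: Sequence[float]) -> str:
    """Name of the model whose scores correlate best with human responses.

    `model_scores` maps a (hypothetical) model name to its per-image
    probability of "high gloss"; `human_scores` holds the proportion of
    observers who judged each of the same images as high gloss.
    """
    return max(model_scores,
               key=lambda name: pearson(model_scores[name], human_scores))
```

Under these assumptions, a call such as `most_humanlike({"shallow": [...], "deep": [...]}, human_scores)` would return the name of the candidate whose error pattern on the diagnostic images is closest to the human one, which is the selection criterion the abstract describes, as opposed to accuracy against ground truth.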

Updated: 2021-11-25