Visual complexity analysis using deep intermediate-layer features
Computer Vision and Image Understanding (IF 4.5), Pub Date: 2020-04-02, DOI: 10.1016/j.cviu.2020.102949
Elham Saraee, Mona Jalal, Margrit Betke

In this paper, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image. We explore unsupervised information extraction from intermediate convolutional layers of deep neural networks to measure visual complexity. We derive an activation energy metric that combines convolutional layer activations to quantify visual complexity. To show the effectiveness of our proposed metric for various applications, we introduce Savoias, a visual complexity dataset that comprises more than 1,400 images from seven diverse image categories (e.g., advertisement and interior design). We demonstrate high correlations of our deep neural network-based measure of visual complexity with human-curated ground-truth (GT) scores on various widely used network architectures (e.g., VGG16, ResNet-v2-152, and EfficientNet) and in networks trained on two classification tasks (object and scene classification). This result reveals that intermediate convolutional layers of deep neural networks carry information about the complexity of images that is meaningful to people. Furthermore, we show that our method of measuring visual complexity outperforms traditional methods on Savoias and two other state-of-the-art benchmark datasets. Moreover, we perform an extensive analysis of the performance difference between our unsupervised method and a supervised method trained on the feature map, and show that supervision improves the prediction. Finally, we demonstrate that, within the context of a category, visually more complex images are also more memorable to human observers.
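The abstract does not give the formula for the activation energy metric, so the following is only a minimal sketch of the general idea under stated assumptions: the energy of one layer is taken to be the mean of its activations, the per-layer energies are combined by a plain average, and agreement with human scores is measured with Pearson correlation. The function names (`activation_energy`, `combined_energy`, `pearson`) are hypothetical, and the feature maps are expected to come from whatever network the reader uses.

```python
import numpy as np


def activation_energy(feature_map):
    """Mean activation of one layer's feature map (assumed definition).

    feature_map: array of shape (height, width, channels).
    """
    fmap = np.asarray(feature_map, dtype=np.float64)
    return float(fmap.mean())


def combined_energy(feature_maps):
    """Average the per-layer energies (assumed combination rule)."""
    return float(np.mean([activation_energy(f) for f in feature_maps]))


def pearson(x, y):
    """Pearson correlation between predicted scores and ground-truth scores."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

In use, one would compute `combined_energy` for each image from its intermediate-layer activations and then pass the resulting scores, together with the human ground-truth scores, to `pearson`. A rank correlation could be substituted if only the ordering of images matters.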

Updated: 2020-04-03
