Structural Similarity Loss for Learning to Fuse Multi-Focus Images
Sensors ( IF 3.4 ) Pub Date : 2020-11-20 , DOI: 10.3390/s20226647
Xiang Yan , Syed Zulqarnain Gilani , Hanlin Qin , Ajmal Mian

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as 'focused' or 'defocused' and use the classification results to construct fusion weight maps, which necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The proposed approach uses a CNN architecture trained to perform fusion without the need for ground-truth fused images. The CNN computes its loss from the image structural similarity (SSIM), a metric that is widely accepted for fused-image quality evaluation. Moreover, when designing the loss function, we use the standard deviation of a local window of the image to automatically estimate the importance of each source image in the final fused image. Our network can accept images of variable sizes and hence we are able to train it on real benchmark datasets instead of simulated ones. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes at test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
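The loss described above can be illustrated with a minimal NumPy sketch: per-window SSIM between the fused output and each source image, weighted by each source's local standard deviation (a proxy for sharpness). This is an assumption-laden illustration, not the authors' implementation; the window size, stability constants, and weighting formula are chosen for clarity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_std(img, win=7):
    # Local standard deviation over all valid win x win windows.
    w = sliding_window_view(img, (win, win))
    return w.std(axis=(-2, -1))

def ssim_map(x, y, win=7, C1=0.01**2, C2=0.03**2):
    # Per-window SSIM between two images with intensities in [0, 1].
    wx = sliding_window_view(x, (win, win))
    wy = sliding_window_view(y, (win, win))
    mx, my = wx.mean(axis=(-2, -1)), wy.mean(axis=(-2, -1))
    vx, vy = wx.var(axis=(-2, -1)), wy.var(axis=(-2, -1))
    cov = (wx * wy).mean(axis=(-2, -1)) - mx * my
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx**2 + my**2 + C1) * (vx + vy + C2)
    return num / den

def fusion_loss(fused, src_a, src_b, win=7):
    # Weight each source per window by its relative local std:
    # a higher std suggests the region is in focus in that source.
    sa, sb = local_std(src_a, win), local_std(src_b, win)
    w = sa / (sa + sb + 1e-12)  # importance of src_a per window
    s_a = ssim_map(fused, src_a, win)
    s_b = ssim_map(fused, src_b, win)
    # Perfect agreement with the locally sharper source gives loss 0.
    return float(1.0 - (w * s_a + (1.0 - w) * s_b).mean())
```

In a training setting this computation would be expressed in a differentiable framework so its gradient can drive the network; the NumPy version only demonstrates the structure of the objective.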

Updated: 2020-11-21