Combining max-pooling and wavelet pooling strategies for semantic image segmentation
Expert Systems with Applications (IF 7.5), Pub Date: 2021-06-12, DOI: 10.1016/j.eswa.2021.115403
André de Souza Brito, Marcelo Bernardes Vieira, Mauren Louise Sguario Coelho de Andrade, Raul Queiroz Feitosa, Gilson Antonio Giraldi

This paper presents a novel multi-pooling architecture for semantic segmentation that combines the advantages of wavelet pooling and max-pooling in convolutional neural networks (CNNs). CNNs often use pooling to reduce the number of parameters, improve invariance to certain distortions, and enlarge the receptive field. However, pooling can cause information loss, which is detrimental to subsequent operations such as feature extraction and analysis. This problem is particularly critical for semantic segmentation, where each pixel of an image is assigned to a specific class to divide the image into disjoint regions of interest. To address it, pooling strategies based on wavelet operations have been proposed with the promise of a better trade-off between receptive field size and computational efficiency. Previous works have confirmed the superiority of wavelet pooling over traditional pooling in semantic segmentation tasks. However, our computational experiments showed that the gains reported for wavelet pooling in other segmentation tasks did not carry over to aerial imagery, owing to imprecise segmentation of image details. Combining wavelet pooling and max-pooling, a solution not yet reported in the literature, can address this issue. This gap in pooling strategies motivated the two main contributions of this paper: (a) a new multi-pooling strategy that combines wavelet and traditional pooling in a new network structure suited to aerial image segmentation; (b) two-stream architectures that use traditional max-pooling and wavelet pooling as separate streams. These proposals were implemented using Segnet, a well-known architecture for semantic segmentation.
The computational experiments, based on the IRRG images from the Potsdam and Vaihingen data sets, demonstrated that the proposed architectures outperformed the original Segnet architecture, with results comparable to state-of-the-art approaches.
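To make the contrast between the two pooling operators concrete, here is a minimal, hypothetical sketch in pure Python; it is not the authors' implementation. It assumes a single-level orthonormal 2D Haar transform as the wavelet, with the LL (approximation) subband used as the pooled output and the three detail subbands kept for unpooling. This is one common formulation of wavelet pooling, not necessarily the one used in the paper. Max-pooling records argmax positions, which is how Segnet's decoder upsamples. `multi_pool` is a hypothetical stand-in for the multi-pooling idea that simply stacks the two pooled maps as channels; the paper's actual network structure is not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_pool_2x2(x: Matrix) -> Tuple[Matrix, List[List[Tuple[int, int]]]]:
    """2x2/stride-2 max-pooling; also returns argmax positions for unpooling."""
    out, idx = [], []
    for i in range(0, len(x), 2):
        row, irow = [], []
        for j in range(0, len(x[0]), 2):
            cands = [(x[i + di][j + dj], (i + di, j + dj)) for di in (0, 1) for dj in (0, 1)]
            value, pos = max(cands, key=lambda t: t[0])
            row.append(value)
            irow.append(pos)
        out.append(row)
        idx.append(irow)
    return out, idx


def max_unpool_2x2(pooled: Matrix, idx, shape: Tuple[int, int]) -> Matrix:
    """Put each max back at its recorded position; the other 3 cells of each block become 0."""
    h, w = shape
    out = [[0.0] * w for _ in range(h)]
    for r, row in enumerate(pooled):
        for c, value in enumerate(row):
            i, j = idx[r][c]
            out[i][j] = value
    return out


def haar_pool_2x2(x: Matrix):
    """One level of an orthonormal 2D Haar DWT: LL is the pooled map, details are kept."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low-pass in both directions
            r_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            r_hl.append((a - b + c - d) / 2)  # left column minus right column
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        for band, r in zip((ll, lh, hl, hh), (r_ll, r_lh, r_hl, r_hh)):
            band.append(r)
    return ll, (lh, hl, hh)


def haar_unpool_2x2(ll: Matrix, details) -> Matrix:
    """Inverse Haar transform: rebuilds every 2x2 block exactly from LL plus details."""
    lh, hl, hh = details
    out = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for r in range(len(ll)):
        for c in range(len(ll[0])):
            s, v, h, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            out[2 * r][2 * c] = (s + v + h + d) / 2
            out[2 * r][2 * c + 1] = (s + v - h - d) / 2
            out[2 * r + 1][2 * c] = (s - v + h - d) / 2
            out[2 * r + 1][2 * c + 1] = (s - v - h + d) / 2
    return out


def multi_pool(x: Matrix) -> List[Matrix]:
    """Hypothetical multi-pooling: stack max-pooled and Haar-LL maps as two channels."""
    mp, _ = max_pool_2x2(x)
    ll, _ = haar_pool_2x2(x)
    return [mp, ll]


x = [[1.0, 3.0, 2.0, 0.0],
     [5.0, 7.0, 4.0, 6.0],
     [0.0, 2.0, 1.0, 1.0],
     [4.0, 2.0, 3.0, 5.0]]
print(multi_pool(x))  # [[[7.0, 6.0], [4.0, 5.0]], [[8.0, 6.0], [4.0, 5.0]]]
print(haar_unpool_2x2(*haar_pool_2x2(x)) == x)  # True: detail bands make unpooling lossless
print(max_unpool_2x2(*max_pool_2x2(x), (4, 4)) == x)  # False: 3 of every 4 values are zeroed
```

In this toy example, max-pooling keeps only the strongest response in each block, while the Haar LL channel keeps a scaled block average and the detail bands allow exact reconstruction. A two-stream variant would instead run each pooling type in its own encoder branch and fuse the results later; how the paper performs that fusion is not specified here.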




Updated: 2021-06-18