Gabor capsule network with preprocessing blocks for the recognition of complex images
Machine Vision and Applications (IF 3.3) Pub Date: 2021-06-09, DOI: 10.1007/s00138-021-01221-6
Mighty Abra Ayidzoe, Yongbin Yu, Patrick Kwabena Mensah, Jingye Cai, Kwabena Adu, Yifan Tang

Capsule networks (CapsNets) demonstrate the importance of learning spatial hierarchical relationships between features for effective image recognition. However, the baseline capsule network performs poorly on complex images. This limitation can be partially attributed to the inability of CapsNets to extract important features from the input images, as well as to their attempt to account for every object in the image, including background objects. To address these problems, we propose a variant of the capsule network that is less complex yet robust, with strong feature extraction capabilities. The model combines a Gabor filter with a custom preprocessing block to learn the structural and semantic information in the image. This focuses extraction on the important features, resulting in improved activation diagrams that enable meaningful hierarchical information to be learned. Experimental results show that the proposed model achieves test accuracies of 85.24%, 68.17%, 94.78% and 91.50% on the CIFAR 10, CIFAR 100, fashion-MNIST and kvasir-dataset-v2 datasets, respectively. The performance of the proposed model is comparable to that of state-of-the-art models on the five datasets, with a relatively small number of parameters.
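
For readers unfamiliar with the Gabor filter mentioned in the abstract, the following is a minimal sketch of the standard real-valued 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier). It is not the authors' implementation: the function names `gabor_kernel` and `gabor_bank`, the parameter defaults, and the choice of four orientations are all assumptions made for illustration, and the abstract does not say how the filter is parameterised or where it sits in the network.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    size   : odd kernel width/height in pixels (illustrative choice)
    sigma  : standard deviation of the Gaussian envelope
    theta  : orientation of the filter in radians
    lambd  : wavelength of the cosine carrier in pixels
    gamma  : spatial aspect ratio of the envelope
    psi    : phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times cosine carrier.
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, sigma, lambd, n_orientations=4):
    """Build kernels at evenly spaced orientations in [0, pi) (hypothetical helper)."""
    return [
        gabor_kernel(size, sigma, k * math.pi / n_orientations, lambd)
        for k in range(n_orientations)
    ]
```

Convolving an image with each kernel in such a bank yields one response map per orientation, which is the usual way Gabor filters expose edge and texture structure to later layers. Whether the proposed model uses fixed or learned Gabor parameters is not stated in the abstract.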




Updated: 2021-06-10