EnsemV3X: a novel ensembled deep learning architecture for multi-label scene classification, PeerJ Computer Science



PeerJ Computer Science (IF 3.5), Pub Date: 2021-05-25, DOI: 10.7717/peerj-cs.557
Priyal Sobti₁, Anand Nayyar₂, Niharika₁, Preeti Nagrath₁


Convolutional neural networks are widely used for image classification, typically by pretraining on ImageNet and then fine-tuning, whereby the learned features are adapted to the target task. ImageNet is a large database consisting of 15 million images belonging to 22,000 categories. Images collected from the Web are labeled by human labelers using the Amazon Mechanical Turk crowd-sourcing tool. ImageNet is useful for transfer learning because of its sheer size and the number of object classes available. Transfer learning with pretrained models is useful because it helps to build computer vision models accurately and inexpensively: models that have been pretrained on substantial datasets are repurposed for new requirements. Scene recognition is a widely used application of computer vision in many communities and industries, such as tourism. This study demonstrates multi-label scene classification using five architectures, namely VGG16, VGG19, ResNet50, InceptionV3, and Xception, with the ImageNet weights available in the Keras library. The performance of the different architectures is comprehensively compared. Finally, EnsemV3X is presented. The proposed model, which has a reduced number of parameters, outperforms the state-of-the-art models Inception and Xception, achieving an accuracy of 91%.
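The abstract does not say how EnsemV3X combines its member networks. As an illustration only, the sketch below assumes the simplest common scheme, soft voting: each model emits one independent probability per scene label, the probabilities are averaged across models, and every label at or above a threshold is kept. The label names, the probability values, the 0.5 threshold, and the function names `average_ensemble` and `predict_labels` are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-label probabilities across models (soft voting)."""
    n_models = len(per_model_probs)
    n_labels = len(per_model_probs[0])
    if any(len(p) != n_labels for p in per_model_probs):
        raise ValueError("every model must score the same label set")
    return [sum(p[i] for p in per_model_probs) / n_models for i in range(n_labels)]


def predict_labels(
    probs: Sequence[float], labels: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Multi-label decision: keep every label whose probability meets the threshold."""
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Hypothetical label set and made-up per-label sigmoid outputs for one image.
LABELS = ["beach", "mountain", "forest", "city"]
inception_probs = [0.9, 0.2, 0.6, 0.1]  # stand-in for an InceptionV3-based model
xception_probs = [0.7, 0.4, 0.5, 0.1]   # stand-in for an Xception-based model

ensemble_probs = average_ensemble([inception_probs, xception_probs])
predicted = predict_labels(ensemble_probs, LABELS)
print(predicted)
```

Because each label is thresholded independently, an image can receive several labels at once, which is what distinguishes multi-label from single-label (softmax) classification.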


Updated: 2021-05-25
