Comparison of state-of-the-art deep learning APIs for image multi-label classification using semantic metrics
Expert Systems with Applications (IF 7.5), Pub Date: 2020-07-02, DOI: 10.1016/j.eswa.2020.113656
Adam Kubany, Shimon Ben Ishay, Ruben Sacha Ohayon, Armin Shmilovici, Lior Rokach, Tomer Doitshman

Image understanding relies heavily on accurate multi-label classification. In recent years, deep learning algorithms have become very successful at such tasks, and various commercial and open-source APIs have been released for public use. However, these APIs are often trained on different datasets, which, besides affecting their performance, can make that performance hard to evaluate. The difficulty lies in the mismatch between the object-class dictionaries of the APIs' training datasets and that of the benchmark dataset: a predicted label may be semantically similar to a benchmark label yet be counted as wrong simply because the two dictionaries word it differently. To address this challenge, we propose semantic similarity metrics that give a richer understanding of the APIs' predicted labels and thus of their performance. In this study, we evaluate and compare the performance of 13 of the most prominent commercial and open-source APIs in a best-of-breed challenge on the Visual Genome and Open Images benchmark datasets. Our findings demonstrate that, under traditional metrics, the Microsoft Computer Vision, Imagga, and IBM APIs perform better than the others. Applying semantic metrics, however, also reveals the InceptionResNet-v2, Inception-v3, and ResNet50 APIs, which are trained only on the simple ImageNet dataset, as challengers for the top semantic performers.
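
The dictionary-mismatch problem described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' metric: the abstract does not say how semantic similarity is computed, so the toy similarity table, the function names, and the best-match scoring rule below are all assumptions made purely for illustration. A real evaluation would replace the toy table with a proper similarity measure, such as one based on word embeddings or a lexical database.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical, hand-written similarity scores for illustration only.
TOY_SIMILARITY = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["dog", "puppy"]): 0.8,
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical labels, a table score if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset([a, b]), 0.0)


def exact_precision_recall(
    predicted: Iterable[str], truth: Iterable[str]
) -> Tuple[float, float]:
    """Traditional metrics: a label counts only if the wording matches exactly."""
    pred_set, truth_set = set(predicted), set(truth)
    hits = len(pred_set & truth_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    return precision, recall


def semantic_precision_recall(
    predicted: Iterable[str],
    truth: Iterable[str],
    sim: Callable[[str, str], float] = toy_similarity,
) -> Tuple[float, float]:
    """Soft metrics (assumed scoring rule): each label earns partial credit
    equal to its best similarity against the labels on the other side."""
    pred_list, truth_list = list(predicted), list(truth)
    precision = (
        sum(max((sim(p, t) for t in truth_list), default=0.0) for p in pred_list)
        / len(pred_list)
        if pred_list
        else 0.0
    )
    recall = (
        sum(max((sim(t, p) for p in pred_list), default=0.0) for t in truth_list)
        / len(truth_list)
        if truth_list
        else 0.0
    )
    return precision, recall


if __name__ == "__main__":
    api_labels = ["automobile", "puppy", "tree"]   # hypothetical API output
    benchmark_labels = ["car", "dog", "tree"]      # hypothetical ground truth
    print("exact:   ", exact_precision_recall(api_labels, benchmark_labels))
    print("semantic:", semantic_precision_recall(api_labels, benchmark_labels))
```

With these hypothetical labels, exact matching credits only "tree", so precision and recall are both about 0.33, while the soft variant gives partial credit to "automobile" and "puppy" and both scores rise to about 0.9. That gap is the kind of effect that could change an API ranking when semantic metrics are applied.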




Updated: 2020-07-02