Visual question answering in the medical domain based on deep learning approaches: A comprehensive study

Pattern Recognition Letters (IF 5.1), Pub Date: 2021-07-20, DOI: 10.1016/j.patrec.2021.07.002
Aisha Al-Sadi, Mahmoud Al-Ayyoub, Yaser Jararweh, Fumie Costen

Visual Question Answering (VQA) in the medical domain has attracted growing attention from the research community in the last few years because of its many applications. This paper investigates several deep learning approaches to building a medical VQA system on ImageCLEF's VQA-Med dataset, which consists of about 4K images with about 15K question-answer pairs. Because the images and questions in this dataset vary widely, the proposed model is hierarchical: it consists of several sub-models, each tailored to a particular type of question. A dedicated model first classifies each question into one of four categories, and each category is handled by a separate sub-model. At their core, all of these models are built on pre-trained Convolutional Neural Networks (CNNs). To get the best results, extensive experiments are performed and various techniques are employed, including Data Augmentation (DA), Multi-Task Learning (MTL), Global Average Pooling (GAP), ensembling, and Sequence to Sequence (Seq2Seq) models. Overall, the final model achieves an accuracy of 60.8 and a BLEU score of 63.4, which are competitive with state-of-the-art results despite the use of simpler and less demanding sub-models.
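
The hierarchical routing described in the abstract (classify the question, then hand it to the matching sub-model) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the category names (category_1 to category_4), the keyword rules that stand in for the trained question classifier, and the stub sub-models that stand in for the CNN-based ones are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# A sub-model takes an image and a question and returns an answer string.
SubModel = Callable[[object, str], str]

# Hypothetical keyword rules standing in for the trained question classifier.
# The abstract only states that there are four categories; it does not name them.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "category_1": ("modality",),
    "category_2": ("plane",),
    "category_3": ("organ",),
}
DEFAULT_CATEGORY = "category_4"  # catch-all for every other question


def classify_question(question: str) -> str:
    """Return the first category whose keyword occurs in the question."""
    text = question.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return DEFAULT_CATEGORY


def make_stub(label: str) -> SubModel:
    """Build a placeholder sub-model; a real one would run a CNN on the image."""
    def sub_model(image: object, question: str) -> str:
        return f"<answer from {label}>"
    return sub_model


# One sub-model per category, keyed by category name.
SUB_MODELS: Dict[str, SubModel] = {
    name: make_stub(name) for name in (*KEYWORDS, DEFAULT_CATEGORY)
}


def answer(image: object, question: str) -> str:
    """Route the question to the sub-model responsible for its category."""
    category = classify_question(question)
    return SUB_MODELS[category](image, question)
```

Keeping the router separate from the sub-models means each sub-model can be trained, replaced, or ensembled independently, which is the main appeal of a hierarchical design like the one the abstract describes.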


Updated: 2021-07-27
