CGMVQA: A new Classification and Generative Model for Medical Visual Question Answering
IEEE Access (IF 3.4) Pub Date: 2020-01-01, DOI: 10.1109/access.2020.2980024
Fuji Ren, Yangyang Zhou

Medical images play an important role in the medical domain. A mature medical visual question answering (VQA) system could aid diagnosis, but no satisfactory method for this comprehensive problem exists so far. Considering that the questions fall into many different types, in this paper we propose a model called CGMVQA, combining classification and answer-generation capabilities, to break this complex problem into multiple simpler ones. We apply data augmentation to images and tokenization to texts. We use a pre-trained ResNet152 to extract image features, and we sum three kinds of embeddings to represent texts. We reduce the parameters of the multi-head self-attention transformer to cut computational cost, and we adjust the masking and output layers to switch the model between its functions. The model establishes new state-of-the-art results on the ImageCLEF 2019 VQA-Med data set: a classification accuracy of 0.640, a word-matching score of 0.659, and a semantic similarity of 0.678. This suggests that CGMVQA is effective for medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
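The text pipeline described in the abstract (three embeddings summed, BERT-style, followed by multi-head self-attention) can be sketched conceptually. This is a minimal NumPy illustration, not the paper's implementation: the vocabulary size, sequence length, model width, head count, and the use of identity slices in place of learned Q/K/V projections are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's hyperparameters)
vocab_size, max_len, n_segments, d_model, n_heads = 100, 16, 2, 32, 4

# Three embedding tables: token, segment (e.g. question vs. answer), position
tok_emb = rng.normal(size=(vocab_size, d_model))
seg_emb = rng.normal(size=(n_segments, d_model))
pos_emb = rng.normal(size=(max_len, d_model))

def embed(token_ids, segment_ids):
    """Sum the three embeddings element-wise, BERT-style."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + seg_emb[segment_ids] + pos_emb[positions]

def multi_head_self_attention(x, n_heads):
    """Scaled dot-product self-attention with heads split from d_model."""
    seq_len, d = x.shape
    d_head = d // n_heads
    outputs = []
    for h in range(n_heads):
        # A real model uses learned Q, K, V projections per head;
        # identity slices keep this sketch short.
        q = k = v = x[:, h * d_head:(h + 1) * d_head]
        scores = q @ k.T / np.sqrt(d_head)           # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        outputs.append(weights @ v)                  # (seq_len, d_head)
    return np.concatenate(outputs, axis=-1)          # (seq_len, d_model)

token_ids = np.array([1, 5, 7, 2])
segment_ids = np.array([0, 0, 1, 1])
x = embed(token_ids, segment_ids)
y = multi_head_self_attention(x, n_heads)
print(x.shape, y.shape)  # (4, 32) (4, 32)
```

Shrinking `d_model` or `n_heads` in such a transformer is one straightforward way to reduce parameter count, in the spirit of the computational-cost reduction the abstract mentions.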

Updated: 2020-01-01