Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2023-02-19, DOI: arxiv-2302.09636
Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong Lu, Chang Su, Tatsuya Harada, Yingying Zhu

Medical visual question answering (VQA) aims to answer clinically relevant questions about input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on public health systems, particularly in resource-poor countries. Existing medical VQA methods tend to encode medical images and learn the correspondence between visual features and questions without exploiting the spatial, semantic, or medical knowledge behind them. This is partly because current medical VQA datasets are small and often contain only simple questions. We therefore first collected a comprehensive, large-scale medical VQA dataset focused on chest X-ray images. The questions in our dataset involve detailed relationships, such as disease names, locations, levels, and types. Based on this dataset, we also propose a novel baseline method that constructs three different relationship graphs over image regions, questions, and semantic labels: a spatial relationship graph, a semantic relationship graph, and an implicit relationship graph. Answers and graph reasoning paths are learned for different questions.
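Although the paper's implementation is not reproduced here, the graph-based reasoning the abstract describes can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example, not the authors' code: `spatial_adjacency`, `GraphAttentionLayer`, the IoU threshold, and all dimensions are assumptions made for illustration. It builds a spatial relationship graph by connecting overlapping image regions and performs one round of masked attention (message passing) over that graph; the semantic and implicit graphs would reuse the same layer with different adjacency matrices.

```python
# Illustrative sketch only; all names and hyperparameters are assumptions,
# not identifiers from the paper.
import torch
import torch.nn as nn

def spatial_adjacency(boxes: torch.Tensor, iou_thresh: float = 0.1) -> torch.Tensor:
    """Connect image regions whose bounding boxes overlap.

    boxes: (N, 4) tensor of (x1, y1, x2, y2) region coordinates.
    Returns an (N, N) binary adjacency matrix (self-loops included,
    since a box always overlaps itself).
    """
    x1 = torch.max(boxes[:, None, 0], boxes[None, :, 0])
    y1 = torch.max(boxes[:, None, 1], boxes[None, :, 1])
    x2 = torch.min(boxes[:, None, 2], boxes[None, :, 2])
    y2 = torch.min(boxes[:, None, 3], boxes[None, :, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    iou = inter / union.clamp(min=1e-6)
    return (iou > iou_thresh).float()

class GraphAttentionLayer(nn.Module):
    """One round of attention-based message passing restricted to graph edges."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (N, dim) region/label/token features; adj: (N, N) edge mask.
        scores = self.q_proj(nodes) @ self.k_proj(nodes).T / nodes.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)      # attend only along graph edges
        return nodes + attn @ self.v_proj(nodes)  # residual update of node features

# Example usage with fake data:
regions = torch.randn(36, 256)                  # features of 36 detected regions
boxes = torch.rand(36, 4).sort(dim=-1).values   # synthetic (x1, y1, x2, y2) boxes
adj = spatial_adjacency(boxes)
out = GraphAttentionLayer(256)(regions, adj)    # (36, 256) updated region features
```

In this sketch, the implicit graph could simply be fully connected and the semantic graph derived from label co-occurrence; the outputs of the three graph layers would then be fused before answer prediction.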

Updated: 2023-02-19