Contextualized Keyword Representations for Multi-modal Retinal Image Captioning, arXiv - CS - Multimedia



arXiv - CS - Multimedia, Pub Date: 2021-04-26, DOI: arxiv-2104.12471
Jia-Hong Huang, Ting-Wei Wu, Marcel Worring

Medical image captioning automatically generates a medical description of the content of a given medical image. A traditional medical image captioning model creates that description from a single medical image input alone, which makes abstract medical descriptions or concepts hard to generate and limits the effectiveness of the approach. Multi-modal medical image captioning is one way to address this problem. In multi-modal medical image captioning, textual input, e.g., expert-defined keywords, is considered one of the main drivers of medical description generation, so encoding both the textual input and the medical image effectively is important for the task. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed. The approach is built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Evaluated on the existing multi-modal medical image captioning dataset, the proposed model improves on the state-of-the-art method by +53.2% in BLEU-avg and +18.6% in CIDEr.


Updated: 2021-04-27
