ICGA-GPT: report generation and question answering for indocyanine green angiography images
British Journal of Ophthalmology (IF 4.1), Pub Date: 2024-03-26, DOI: 10.1136/bjo-2023-324446
Xiaolan Chen, Weiyi Zhang, Ziwei Zhao, Pusheng Xu, Yingfeng Zheng, Danli Shi, Mingguang He

Background Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but interpreting the images and communicating the findings to patients require extensive expertise and considerable time. We aimed to develop a bilingual ICGA report generation and question-answering (QA) system.
Methods Our dataset comprised 213 129 ICGA images from 2919 participants. The system consisted of two stages: image–text alignment for report generation using a multimodal transformer architecture, and large language model (LLM)-based QA that takes ICGA text reports and human-input questions as input. Performance was assessed using both quantitative metrics (Bilingual Evaluation Understudy (BLEU), Consensus-based Image Description Evaluation (CIDEr), Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence (ROUGE-L), Semantic Propositional Image Caption Evaluation (SPICE), accuracy, sensitivity, specificity, precision and F1 score) and subjective evaluation by three experienced ophthalmologists on 5-point scales (5 indicating high quality).
Results After bilingual translation, we produced 8757 ICGA reports covering 39 disease-related conditions (66.7% English, 33.3% Chinese). For report generation, ICGA-GPT achieved BLEU-1 to BLEU-4 scores of 0.48, 0.44, 0.40 and 0.37, a CIDEr of 0.82, a ROUGE-L of 0.41 and a SPICE of 0.18. For the disease-based metrics, the average specificity, accuracy, precision, sensitivity and F1 score were 0.98, 0.94, 0.70, 0.68 and 0.64, respectively. In a quality assessment of 50 images (100 reports), the three ophthalmologists showed substantial agreement (kappa=0.723 for completeness, kappa=0.738 for accuracy), with scores ranging from 3.20 to 3.55. In an interactive QA scenario involving 100 generated answers, the ophthalmologists gave scores of 4.24, 4.22 and 4.10, with good consistency (kappa=0.779).
Conclusion This pioneering study introduces, for the first time, the ICGA-GPT model for ICGA report generation and interactive QA, underscoring the potential of LLMs to assist with automated ICGA image interpretation. Data are available upon reasonable request. The authors do not have authorisation to distribute the dataset.
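
BLEU-1 to BLEU-4 combine clipped ("modified") n-gram precisions up to order N into a geometric mean and multiply it by a brevity penalty that punishes candidates shorter than the reference. The minimal sketch below computes a sentence-level BLEU-N for one tokenised candidate report against one or more reference reports, assuming uniform weights and no smoothing. Caption scores like those in the Results are usually computed at corpus level by an evaluation toolkit, so the helper names (`ngrams`, `modified_precision`, `bleu`) are hypothetical and this is an illustration of the formula, not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    often as it appears in the reference that contains it most often."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-N, uniform weights, no smoothing (illustrative sketch)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the geometric mean
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # effective reference length: the reference closest in length (shorter wins ties)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)
```

For example, `bleu("a b c d".split(), ["a b x d".split()], max_n=1)` gives 0.75 (three of four unigrams match and the lengths are equal, so there is no penalty), whereas a three-token candidate contains no 4-grams and therefore scores 0 on unsmoothed BLEU-4. This is one reason BLEU falls as the order rises from 1 to 4.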
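
ROUGE-L scores a generated report by the longest common subsequence (LCS) it shares with the reference. LCS length divided by the candidate length gives precision, LCS length divided by the reference length gives recall, and the two are combined into an F-measure. A minimal sketch for one candidate and one reference follows; the function names are hypothetical, and the default β = 1.2 is an assumption (it is the value commonly used in caption-evaluation code), with β = 1 giving the plain F1.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of dropping x or y
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """LCS-based F-measure; beta > 1 weights recall more heavily than precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

Comparing `"a b c d".split()` with `"a c d e".split()` gives an LCS of 3 ("a c d"), precision and recall of 0.75, and therefore ROUGE-L = 0.75 for any β. Because a subsequence need not be contiguous, ROUGE-L rewards findings reported in the right order even when the wording between them differs.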
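
The disease-based results pair a high average specificity (0.98) and accuracy (0.94) with a lower average F1 score (0.64) that sits below both the average precision (0.70) and the average sensitivity (0.68). The minimal sketch below shows how this pattern can arise. It assumes (the abstract does not say) that each of the 39 conditions is scored one-vs-rest from its own confusion counts and that the metrics are then macro-averaged across conditions. The `Counts` type, the helper names and the example counts are hypothetical illustrations, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One-vs-rest confusion counts for a single condition (hypothetical)."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num, den):
    """Return 0.0 instead of raising when a metric is undefined."""
    return num / den if den else 0.0


def condition_metrics(c):
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall on positive cases
    specificity = safe_div(c.tn, c.tn + c.fp)  # recall on negative cases
    precision = safe_div(c.tp, c.tp + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


def macro_average(per_condition):
    """Unweighted mean of each metric over conditions."""
    n = len(per_condition)
    return {key: sum(m[key] for m in per_condition) / n for key in per_condition[0]}


rare_conditions = [
    Counts(tp=1, fp=0, tn=95, fn=4),  # conservative: no false alarms, misses 4 of 5 cases
    Counts(tp=1, fp=4, tn=95, fn=0),  # over-called: finds every case, 4 false alarms
]
summary = macro_average([condition_metrics(c) for c in rare_conditions])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these two rare hypothetical conditions, macro-averaged precision and sensitivity are both 0.6, yet macro-averaged F1 is about 0.33. F1 is averaged per condition rather than computed from the averaged precision and sensitivity, so conditions whose precision and sensitivity are unbalanced drag it down. The many true negatives of rare conditions likewise keep accuracy (0.96) and specificity (about 0.98) high.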
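
The abstract reports a single kappa for the three ophthalmologists but does not name the statistic. One common choice for more than two raters is Fleiss' kappa, sketched below as an assumption. It compares the observed share of agreeing rater pairs per item with the agreement expected by chance from the overall category frequencies. It treats the 5-point scores as unordered categories, so a weighted kappa would be the alternative if near-misses (for example 4 vs 5) should count as partial agreement. The function name and the ratings in the usage note are hypothetical.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa, where ratings[i] lists the category each rater gave item i."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items = len(ratings)
    category_totals = Counter()
    item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # sum of c*(c-1) = number of ordered rater pairs that agree on this item
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_observed = sum(item_agreement) / n_items
    total = n_items * n_raters
    # chance agreement from the overall share of each category
    p_chance = sum((c / total) ** 2 for c in category_totals.values())
    if p_chance == 1.0:
        return math.nan  # undefined: every rating falls in a single category
    return (p_observed - p_chance) / (1 - p_chance)
```

For hypothetical scores `[[4, 4, 4], [3, 3, 4], [5, 5, 5], [3, 4, 5]]` (four items, three raters), the observed agreement is 7/12, the chance agreement is 25/72, and kappa = 17/47, roughly 0.36. This is well below the 0.72 to 0.78 range reported in the abstract, even though two of the four items show perfect agreement.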

Updated: 2024-03-27
