Overcoming barriers to data sharing with medical image generation: a comprehensive evaluation, npj Digital Medicine



npj Digital Medicine (IF 12.4), Pub Date: 2021-09-24, DOI: 10.1038/s41746-021-00507-3
August DuMont Schütte (1, 2), Jürgen Hetzel (3, 4), Sergios Gatidis (5), Tobias Hepp (2, 5), Benedikt Dietz (1), Stefan Bauer (2, 6, 7), Patrick Schwab (7)


Privacy concerns around sharing personally identifiable information are a major barrier to data sharing in medical research. In many cases, researchers have no interest in a particular individual’s information but rather aim to derive insights at the level of cohorts. Here, we utilise generative adversarial networks (GANs) to create medical imaging datasets consisting entirely of synthetic patient data. The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information. We assess the quality of synthetic data generated by two GAN models for chest radiographs with 14 radiology findings and brain computed tomography (CT) scans with six types of intracranial haemorrhages. We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset. We find that synthetic data performance disproportionately benefits from a reduced number of classes. Our benchmark also indicates that at low numbers of samples per class, label overfitting effects start to dominate GAN training. We conducted a reader study in which trained radiologists discriminate between synthetic and real images. In accordance with our benchmark results, the classification accuracy of radiologists improves with increasing resolution. Our study offers valuable guidelines and outlines practical conditions under which insights derived from synthetic images are similar to those that would have been derived from real data. Our results indicate that synthetic data sharing may be an attractive alternative to sharing real patient-level data in the right setting.
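The quality measure described in the abstract (the performance difference between predictive models trained on synthetic versus real data) can be illustrated with a minimal sketch. The abstract does not name the performance metric or the evaluation split, so the choices here are assumptions: ROC AUC as the metric, a shared held-out test set of real images, and the hypothetical helper names `roc_auc` and `performance_gap`. This is not the authors' implementation.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U); ties count as 0.5.

    Assumption: AUC is used only for illustration; the abstract does not
    state which performance metric the study reports.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs where the positive is ranked higher.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def performance_gap(labels, scores_real_model, scores_synth_model):
    """Metric of the real-trained model minus that of the synthetic-trained
    model, both scored on the same test labels (hypothetical setup)."""
    return roc_auc(labels, scores_real_model) - roc_auc(labels, scores_synth_model)


# Toy, made-up scores for one finding on four test images.
test_labels = [1, 1, 0, 0]
real_model_scores = [0.9, 0.8, 0.2, 0.1]   # model trained on real images
synth_model_scores = [0.9, 0.3, 0.4, 0.1]  # model trained on synthetic images

gap = performance_gap(test_labels, real_model_scores, synth_model_scores)
print(f"AUC gap (real minus synthetic): {gap:.2f}")
```

A gap near zero would suggest that, for this finding, a model trained on synthetic images performs about as well as one trained on real images; with multiple findings, the same comparison would be repeated per class.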


Updated: 2021-09-24
