Generating and evaluating cross-sectional synthetic electronic healthcare data: Preserving data utility and patient privacy, Computational Intelligence

Computational Intelligence (IF 2.8), Pub Date: 2021-01-03, DOI: 10.1111/coin.12427
Zhenchen Wang, Puja Myles, Allan Tucker

Electronic healthcare record data have been used to study risk factors of disease, treatment effectiveness and safety, and to inform healthcare service planning. There has been increasing interest in utilizing these data for new purposes such as for machine learning to develop predictive algorithms to aid diagnostic and treatment decisions. Synthetic data could potentially be an alternative to real-world data for these purposes as well as reveal any biases in the data used for algorithm development. This article discusses the key requirements of synthetic data for multiple purposes and proposes an approach to generate and evaluate synthetic data focused on, but not limited to, cross-sectional healthcare data. To our knowledge, this is the first article to propose a framework to generate and evaluate synthetic healthcare data with the aim of simultaneously preserving the complexities of ground truth data in the synthetic data while also ensuring privacy. We include findings and new insights from synthetic datasets modeled on both the Indian liver patient dataset and UK primary care dataset to demonstrate the application of this framework under different scenarios.

Updated: 2021-01-03
