N-Sanitization: A semantic privacy-preserving framework for unstructured medical datasets
Computer Communications (IF 4.5), Pub Date: 2020-07-27, DOI: 10.1016/j.comcom.2020.07.032
Celestine Iwendi, Syed Atif Moqurrab, Adeel Anjum, Sangeen Khan, Senthilkumar Mohan, Gautam Srivastava

The introduction and rapid growth of the Internet of Medical Things (IoMT), a subset of the Internet of Things (IoT) in medical and healthcare systems, has brought numerous changes and challenges to those systems. Healthcare organizations share patient data with research organizations to support medical discoveries. Releasing such information is difficult because it puts patient privacy at risk: textual health documents about an individual contain specific sensitive terms that need to be sanitized before the documents can be released. Recent approaches improved the utility of protected output by substituting sensitive terms with appropriate “generalizations” retrieved from several medical and general-purpose knowledge bases (KBs). However, these approaches perform unnecessary sanitization by anonymizing negated assertions, e.g., AIDS-negative. This paper proposes a semantic privacy framework that effectively sanitizes the sensitive and semantically related terms in healthcare documents. The proposed model identifies negated assertions (e.g., AIDS-negative) before the sanitization process in IoMT, which further improves the utility of sanitized documents. Moreover, besides considering sensitive medical findings, we also incorporate state-of-the-art metrics, i.e., Protected Health Information (PHI), as defined in privacy rules such as the Health Insurance Portability and Accountability Act (HIPAA), Informatics for Integrating Biology & the Bedside (i2b2), and Materialize Interactive Medical Image Control System (MIMICS). The proposed approach is evaluated on real clinical data provided by i2b2. On average, detection accuracy (for both PHI and medical findings) is improved by 21%, 51%, and 54% in precision, recall, and F-measure, respectively. Overall, the proposed model improves data utility by 8% compared with C-sanitized and by 25% compared with a simple reduction approach. Experimental results show that our approach manages the privacy-utility trade-off more effectively than its counterparts.
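The core idea in the abstract, generalizing sensitive terms while leaving negated assertions untouched, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `GENERALIZATIONS` map, the negation cue lists, and the function names are hypothetical stand-ins for the knowledge bases and the negation detection the paper refers to.

```python
import re

# Hypothetical toy knowledge base: sensitive term -> generalization.
# A real system would query medical and general-purpose KBs instead.
GENERALIZATIONS = {
    "aids": "an immune system disorder",
    "hepatitis": "a liver condition",
}

# Assumed, simplified negation cues. A cue counts only if it appears
# at most two words before the term, with no punctuation in between.
PRE_NEGATION = re.compile(
    r"\b(?:no|not|denies|without|negative for)\s+(?:\w+\s+){0,2}$",
    re.IGNORECASE,
)
# Suffix form such as "AIDS-negative" or "AIDS negative".
POST_NEGATION = re.compile(r"^\s*-?\s*negative\b", re.IGNORECASE)

TERM_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GENERALIZATIONS)) + r")\b",
    re.IGNORECASE,
)


def is_negated(text, start, end):
    """Return True if the term at text[start:end] is a negated assertion."""
    return bool(
        PRE_NEGATION.search(text[:start]) or POST_NEGATION.match(text[end:])
    )


def sanitize(text):
    """Generalize sensitive terms, leaving negated mentions unchanged."""

    def replace(match):
        if is_negated(text, match.start(), match.end()):
            return match.group(0)  # negated: nothing sensitive is asserted
        return GENERALIZATIONS[match.group(0).lower()]

    return TERM_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(sanitize("Patient has AIDS."))
    print(sanitize("Patient is AIDS-negative."))
```

In this sketch a positive mention such as "Patient has AIDS." is generalized, while "Patient is AIDS-negative." passes through unchanged, which is how skipping negated assertions preserves utility. The matching is deliberately naive (case-insensitive whole words, a fixed cue list), so it only illustrates the control flow.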

Updated: 2020-07-31
