Computer Communications (IF 2.816). Pub Date: 2020-07-27. DOI: 10.1016/j.comcom.2020.07.032. Celestine Iwendi, Syed Atif Moqurrab, Adeel Anjum, Sangeen Khan, Senthilkumar Mohan, Gautam Srivastava
The introduction and rapid growth of the Internet of Medical Things (IoMT), a subset of the Internet of Things (IoT) in medical and healthcare systems, has brought numerous changes and challenges to current medical and healthcare systems. Healthcare organizations share patient data with research organizations for various medical discoveries. Releasing such information is a delicate task, since it puts patient privacy at risk: textual health documents about an individual contain specific sensitive terms that must be sanitized before the documents can be released. Recent approaches improve the utility of the protected output by substituting sensitive terms with appropriate “generalizations” retrieved from several medical and general-purpose knowledge bases (KBs). However, these approaches perform unnecessary sanitization by anonymizing negated assertions, e.g., “AIDS-negative”. This paper proposes a semantic privacy framework that effectively sanitizes sensitive and semantically related terms in healthcare documents. The proposed model identifies negated assertions (e.g., “AIDS-negative”) before the sanitization process in IoMT, which further improves the utility of the sanitized documents. Moreover, besides sensitive medical findings, we also incorporate Protected Health Information (PHI) categories as defined in privacy rules and resources such as the Health Insurance Portability and Accountability Act (HIPAA), Informatics for Integrating Biology & the Bedside (i2b2), and the Materialize Interactive Medical Image Control System (MIMICS). The proposed approach is evaluated on real clinical data provided by i2b2. On average, detection accuracy (for both PHIs and medical findings) improves, with Precision, Recall, and F-measure gains of 21%, 51%, and 54%, respectively.
Our proposed model improves overall data utility by 8% compared to C-sanitized and by 25% compared to a simple reduction approach. Experimental results show that our approach manages the privacy-utility trade-off more effectively than its counterparts.
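The key idea of checking for negated assertions before sanitizing a term can be illustrated with a minimal NegEx-style sketch. This is not the authors' implementation: the trigger patterns, function names, and placeholder token below are hypothetical, and a real system would use a full negation lexicon and clinical NLP pipeline.

```python
import re

# Illustrative negation triggers (hypothetical; not the paper's actual lexicon).
PRE_NEGATION = re.compile(r"\b(no|denies|without|negative for)\b")
POST_NEGATION = re.compile(r"^-negative\b|\b(ruled out|absent)\b")

def is_negated(term: str, sentence: str) -> bool:
    """Crude check: is `term` asserted in a negated context in `sentence`?"""
    s, t = sentence.lower(), term.lower()
    idx = s.find(t)
    if idx == -1:
        return False
    before = s[:idx]            # text preceding the term
    after = s[idx + len(t):]    # text following the term
    return bool(PRE_NEGATION.search(before) or POST_NEGATION.search(after))

def sanitize(sentence: str, sensitive_terms: list[str]) -> str:
    """Mask sensitive terms, but leave negated assertions untouched
    (e.g., 'AIDS-negative'), preserving utility of the released text."""
    out = sentence
    for term in sensitive_terms:
        if term.lower() in sentence.lower() and not is_negated(term, sentence):
            out = re.sub(re.escape(term), "[SANITIZED]", out, flags=re.IGNORECASE)
    return out
```

For example, `sanitize("Patient is AIDS-negative.", ["AIDS"])` leaves the sentence intact because the assertion is negated, while `sanitize("Patient diagnosed with AIDS.", ["AIDS"])` masks the sensitive term.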