Abstract
Medical data is shared widely for various research purposes, and the data-privacy community has developed an extensive body of anonymization research. Unfortunately, traditional data anonymization techniques do not provide strong privacy guarantees, and synthetic data generation has emerged as an alternative. Deep learning has recently gained recognition for its high accuracy and its relevance to privacy concerns, and it is now extensively applied in the medical field for classification, segmentation, and privacy preservation. Using deep learning, synthetic data can be generated that improves the privacy of the original medical data and prevents attacks, since deep models capture the relationships among multiple features in medical data. In this research, the Healthcare Cramér Generative Adversarial Network (HCGAN) is proposed, in which (i) the Quasi-Identifiers (QI) in the medical data are identified and separated as QI attributes, with the remaining attributes treated as Sensitive Attributes (SA); (ii) an f-differential-privacy anonymization technique is applied only to the identified QI, and the result is recombined with the SA attributes; (iii) the anonymized medical data is used as real data for training a Cramér Generative Adversarial Network (GAN), where the Cramér distance improves the efficiency of the model; and (iv) privacy is evaluated by testing resistance to attacks. The results show that HCGAN prevents attacks more effectively during the training and testing phases than the Wasserstein GAN, and that the synthetic data it generates provides high privacy while withstanding various attacks.
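Step (iii) trains the GAN against the Cramér distance rather than the Wasserstein distance. As a minimal illustration of the underlying quantity, the sketch below estimates the sample-based energy distance between two one-dimensional samples; for one-dimensional distributions the energy distance equals twice the squared Cramér distance, so minimizing one minimizes the other. This is an illustrative sketch of the distance only, not the paper's full critic architecture, and the function name and NumPy-based estimator are assumptions of this sketch.

```python
import numpy as np

def energy_distance(x, y):
    """Sample-based energy distance between two 1-D samples.

    E(P, Q) = 2 E|X - Y| - E|X - X'| - E|Y - Y'|,
    which for 1-D distributions equals twice the squared Cramér
    distance, the quantity the HCGAN critic is trained to reduce.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # E|X - Y|: mean pairwise distance between the two samples
    d_xy = np.abs(x[:, None] - y[None, :]).mean()
    # E|X - X'| and E|Y - Y'|: mean pairwise distances within each sample
    d_xx = np.abs(x[:, None] - x[None, :]).mean()
    d_yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * d_xy - d_xx - d_yy
```

The estimator is zero when both samples coincide and grows as the generator distribution drifts from the real-data distribution, which is what gives the Cramér critic its unbiased sample gradients.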
Abbreviations
- D: Randomized algorithm
- G: Generator model
- h: Transformation
- l: Cramér distance
- M: Noise prior
- P: Generator distribution
- p: Joint probability
- Q: Target distribution
- R: Data instances
- S, T, S′, T′: Random variables
- T: Set of labels
- X, Y: Distribution function
- y: Adjacent database
- ε: Privacy budget
- δ: Failure rate
- θ: Sensitivity
- σ: Common variance
- ∇: Stochastic gradient
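Several of the symbols above (ε for the privacy budget, δ for the failure rate, θ for the sensitivity, σ for the noise variance) interact in the differential-privacy step applied to the quasi-identifiers. As a hedged illustration of that interaction, the sketch below uses the classical Gaussian mechanism, whose noise scale σ = θ·sqrt(2·ln(1.25/δ))/ε satisfies (ε, δ)-DP; the paper's actual f-differential-privacy mechanism may differ, and the function name and interface here are assumptions of this sketch.

```python
import math
import numpy as np

def gaussian_mechanism(values, sensitivity, epsilon, delta, rng=None):
    """Perturb numeric quasi-identifier values with Gaussian noise.

    The noise scale follows the classical analytic bound for
    (epsilon, delta)-differential privacy:
        sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return np.asarray(values, dtype=float) + rng.normal(0.0, sigma, size=len(values))
```

Only the quasi-identifier columns would pass through such a mechanism; the sensitive attributes are recombined unperturbed before GAN training, as in step (ii) of the abstract.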
Funding
This research received no external funding.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Indhumathi, R., Devi, S.S. Healthcare Cramér Generative Adversarial Network (HCGAN). Distrib Parallel Databases 40, 657–673 (2022). https://doi.org/10.1007/s10619-021-07346-x