Healthcare Cramér Generative Adversarial Network (HCGAN)

Distributed and Parallel Databases

Abstract

Medical data is shared widely for research purposes, and the data privacy community has produced an extensive body of work on anonymization. Unfortunately, data anonymization techniques do not provide data privacy guarantees, and synthetic data generation is an alternative approach to data anonymization. Deep learning has recently gained a reputation for high accuracy and for its use in addressing privacy concerns. It is now applied extensively in the medical field for classification, segmentation and privacy preservation. Using deep learning, synthetic data can be generated to improve the privacy of the original medical data and to prevent attacks. Deep learning models capture the relationships between multiple features in medical data. In this research, the Healthcare Cramér Generative Adversarial Network (HCGAN) is proposed, in which (i) the Quasi Identifiers (QI) in the medical data are identified and separated as QI attributes, and the remaining attributes are treated as Sensitive Attributes (SA); (ii) the f-differential privacy anonymization technique is applied only to the identified QI, and the result is combined with the SA; (iii) the anonymized medical data is used as real data for training a Cramér Generative Adversarial Network (GAN), where the Cramér distance is used to improve the efficiency of the model; and (iv) finally, privacy is checked by testing the model against attacks. The results show that HCGAN prevents attacks during the training and testing phases more effectively than Wasserstein GAN. The results also demonstrate that HCGAN generates synthetic data that provides high privacy and withstands various attacks.
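
Steps (i) and (ii) can be illustrated with a minimal sketch. The abstract does not state which noise mechanism is used, so the Gaussian mechanism below, with noise scale sensitivity/μ (the calibration used in Gaussian differential privacy, one instance of f-differential privacy), is an assumption. The function names, the column names and the parameter `mu` are hypothetical and are not taken from the article. Whether this yields a formal guarantee depends on how sensitivity is defined for the release, which the sketch does not address.

```python
import random

def split_qi_sa(rows, qi_columns):
    """Step (i): separate quasi-identifier columns from the remaining (sensitive) ones."""
    qi_rows = [{k: r[k] for k in qi_columns} for r in rows]
    sa_rows = [{k: v for k, v in r.items() if k not in qi_columns} for r in rows]
    return qi_rows, sa_rows

def perturb_qi(qi_rows, sensitivity, mu, rng):
    """Step (ii), assumed mechanism: add Gaussian noise with std = sensitivity / mu."""
    return [
        {k: v + rng.gauss(0.0, sensitivity[k] / mu) for k, v in row.items()}
        for row in qi_rows
    ]

def anonymize(rows, qi_columns, sensitivity, mu, seed=0):
    """Perturb only the QI columns, then recombine each row with its untouched SA values."""
    rng = random.Random(seed)
    qi_rows, sa_rows = split_qi_sa(rows, qi_columns)
    noisy_qi = perturb_qi(qi_rows, sensitivity, mu, rng)
    return [{**q, **s} for q, s in zip(noisy_qi, sa_rows)]

# Hypothetical toy records: numeric QI columns and one sensitive attribute.
rows = [
    {"age": 30.0, "zip": 1000.0, "diagnosis": "a"},
    {"age": 45.0, "zip": 2000.0, "diagnosis": "b"},
]
anonymized = anonymize(rows, ["age", "zip"], {"age": 1.0, "zip": 1.0}, mu=1.0)
```

In the pipeline described above, a table like `anonymized` would then play the role of the real data in step (iii). Categorical quasi-identifiers would need encoding first, which is outside this sketch.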

Abbreviations

D: Randomized algorithm
G: Generator model
h: Transformation
\(l\): Cramér distance
M: Noise prior
P: Generator distribution
p: Joint probability
Q: Target distribution
R: Data instances
S, T, S′, T′: Random variables
\(T\): Set of labels
X, Y: Distribution function
y: Adjacent database
ε: Privacy budget
δ: Failure rate
θ: Sensitivity
\(\sigma\): Common variance
\(\nabla\): Stochastic gradient
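
To make the symbol \(l\) concrete, the following sketch computes the squared Cramér distance between two one-dimensional empirical samples, standing in for the generator distribution P and the target distribution Q. It uses the standard definition, the integral of the squared difference between the two cumulative distribution functions. The function name is hypothetical, and the critic-based loss used to train a Cramér GAN is not reproduced here.

```python
import bisect

def cramer_distance_squared(xs, ys):
    """Integral of (F_P(x) - F_Q(x))^2 dx for the empirical CDFs of samples xs and ys."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Both empirical CDFs are constant between consecutive pooled sample points.
    grid = sorted(set(xs_sorted) | set(ys_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f_p = bisect.bisect_right(xs_sorted, left) / len(xs_sorted)
        f_q = bisect.bisect_right(ys_sorted, left) / len(ys_sorted)
        total += (f_p - f_q) ** 2 * (right - left)
    return total

# Two point masses one unit apart: the CDFs differ by 1 on an interval of length 1.
print(cramer_distance_squared([0.0], [1.0]))  # 1.0
```

The Cramér distance itself is the square root of this quantity. For empirical samples the sum is exact, because the CDF difference only changes at sample points.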

Funding

This research received no external funding.

Author information

Corresponding author

Correspondence to R. Indhumathi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Indhumathi, R., Devi, S.S. Healthcare Cramér Generative Adversarial Network (HCGAN). Distrib Parallel Databases 40, 657–673 (2022). https://doi.org/10.1007/s10619-021-07346-x

  • DOI: https://doi.org/10.1007/s10619-021-07346-x