NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data

  • Original Article
Neuroinformatics

Abstract

The field of neuroimaging can greatly benefit from building machine learning models to detect and predict diseases and to discover novel biomarkers, but much of the data collected at various organizations and research centers cannot be shared due to privacy or regulatory concerns (especially for clinical data or rare disorders). In addition, aggregating data across multiple large studies results in a huge amount of duplicated technical debt, and the resources required can be challenging or impossible for an individual site to build. Training on data distributed across organizations can result in models that generalize much better than models trained on data from any single organization alone. While there are approaches for decentralized sharing, these often do not provide the highest possible guarantees of sample privacy, which only cryptography can provide. In addition, such approaches are often focused on probabilistic solutions. In this paper, we propose an approach that leverages the potential of datasets spread among a number of data-collecting organizations by performing joint analyses in a secure and deterministic manner, where only encrypted data is shared and manipulated. The approach is based on secure multiparty computation, which refers to cryptographic protocols that enable distributed computation of a function over distributed inputs without revealing additional information about the inputs. It enables multiple organizations to train machine learning models on their joint data and apply the trained models to encrypted data without revealing their sensitive data to the other parties. In our proposed approach, organizations (or sites) securely collaborate to build a machine learning model as if it had been trained on the aggregated data of all the organizations combined. Importantly, the approach does not require a trusted party (i.e., an aggregator): each contributing site plays an equal role in the process, and no site can learn the individual data of any other site. We demonstrate the effectiveness of the proposed approach in a range of empirical evaluations using different machine learning algorithms, including logistic regression and convolutional neural network models, on human structural and functional magnetic resonance imaging datasets.
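
As a rough illustration of what secure multiparty computation means in practice, the sketch below uses additive secret sharing to let several sites compute the sum of their private values without a trusted aggregator. It is not the protocol used in the paper: the modulus `PRIME`, the function names `share`, `reconstruct`, and `secure_sum`, and the example values are all assumptions chosen for illustration, and the communication between sites is simulated inside one process.

```python
import secrets

# Field modulus; an assumption for this sketch, not a value from the paper.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate sites jointly computing the sum of their private inputs."""
    n = len(private_values)
    # Each site i splits its value and would send share j to site j.
    distributed = [share(v, n) for v in private_values]
    # Each site j adds up the shares it received; any single share it
    # holds from another site is uniformly random on its own.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Publishing the partial sums discloses the total, not the inputs.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    site_values = [12, 30, 7]  # hypothetical per-site private values
    print(secure_sum(site_values))  # prints 49
```

Every site performs the same steps, which mirrors the equal-role property described in the abstract. Training a model this way would additionally require secure multiplication and fixed-point encoding of real numbers, which this sketch leaves out.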

Notes

  1. https://github.com/OpenMined/PySyft

  2. https://pytorch.org/

  3. https://scikit-learn.org/

  4. https://trendscenter.org/

  5. https://www.tensorflow.org/api_docs/python/tf/keras

  6. https://colab.research.google.com/

  7. https://keras.io/api/applications/vgg/

  8. The data was downloaded from the Function BIRN Data Repository, Project AccessionNumber 2007-BDR-6UHZ1.

  9. http://www.fil.ion.ucl.ac.uk/spm/

  10. Thus, it cannot be directly compared with our model.

Acknowledgements

This work was supported by the National Institutes of Health (Grant Numbers 1R01DA040487 and 2RF1MH121885). We are also thankful to Mr. Eric Verner for his support with TReNDS data.

Author information

Corresponding author

Correspondence to Nipuna Senanayake.

About this article

Cite this article

Senanayake, N., Podschwadt, R., Takabi, D. et al. NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data. Neuroinform 20, 91–108 (2022). https://doi.org/10.1007/s12021-021-09525-8

  • DOI: https://doi.org/10.1007/s12021-021-09525-8
