Abstract
The field of neuroimaging can greatly benefit from machine learning models that detect and predict diseases and help discover novel biomarkers. However, much of the data collected at organizations and research centers cannot be shared because of privacy or regulatory concerns, especially for clinical data or rare disorders. In addition, aggregating data across multiple large studies results in a large amount of duplicated technical debt, and the resources required can be challenging or impossible for an individual site to build. Training on data distributed across organizations can yield models that generalize much better than models trained on the data of any single organization. While approaches for decentralized sharing exist, they often do not provide the strongest guarantees of sample privacy, which only cryptography can provide, and they are often focused on probabilistic solutions. In this paper, we propose an approach that leverages datasets spread among a number of data-collecting organizations by performing joint analyses in a secure and deterministic manner, where only encrypted data is shared and manipulated. The approach is based on secure multiparty computation, which refers to cryptographic protocols that enable the distributed computation of a function over distributed inputs without revealing additional information about those inputs. It enables multiple organizations to train machine learning models on their joint data and to apply the trained models to encrypted data without revealing their sensitive data to the other parties. In our proposed approach, organizations (or sites) securely collaborate to build a machine learning model as if it had been trained on the combined data of all the organizations. Importantly, the approach does not require a trusted party (i.e., an aggregator): each contributing site plays an equal role in the process, and no site can learn the individual data of any other site. We demonstrate the effectiveness of the proposed approach in a range of empirical evaluations using different machine learning algorithms, including logistic regression and convolutional neural network models, on human structural and functional magnetic resonance imaging datasets.
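The idea of computing over distributed inputs without a trusted aggregator can be illustrated with additive secret sharing, one standard building block of secure multiparty computation. The sketch below is a minimal illustration and not the NeuroCrypt protocol. It assumes a prime modulus `Q`, a fixed-point scale `SCALE`, and hypothetical helper names (`encode`, `decode`, `share`, `secure_sum`, `local_gradient`). Each site splits its local logistic-regression gradient into random shares, one per site, so every site plays the same role and any set of fewer than all shares of a value looks uniformly random. Because the logistic-loss gradient is a sum over samples, the securely summed gradient matches the gradient on the pooled data up to fixed-point rounding.

```python
import math
import secrets

# Illustrative parameters (assumptions, not values from the paper).
Q = 2**61 - 1      # prime modulus for share arithmetic
SCALE = 2**20      # fixed-point scale for real-valued inputs


def encode(x: float) -> int:
    """Map a real number to a field element with fixed-point encoding."""
    return round(x * SCALE) % Q


def decode(v: int) -> float:
    """Invert encode; elements above Q // 2 represent negative numbers."""
    if v > Q // 2:
        v -= Q
    return v / SCALE


def share(secret: int, n_parties: int) -> list:
    """Split a field element into n additive shares that sum to it mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def secure_sum(site_vectors: list) -> list:
    """Sum one vector per site with no aggregator.

    Site i splits every coordinate into n shares and sends share j to site j.
    Each site adds up the shares it received and publishes only these partial
    sums, which together reveal the total but not any individual vector.
    """
    n, dim = len(site_vectors), len(site_vectors[0])
    held = [[0] * dim for _ in range(n)]  # held[j][k]: shares of coordinate k accumulated by site j
    for vec in site_vectors:
        for k, x in enumerate(vec):
            for j, s in enumerate(share(encode(x), n)):
                held[j][k] = (held[j][k] + s) % Q
    return [decode(sum(held[j][k] for j in range(n)) % Q) for k in range(dim)]


def local_gradient(w: list, X: list, y: list) -> list:
    """Sum over samples of the logistic-loss gradient (sigmoid(w.x) - y) * x."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wk * xk for wk, xk in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for k, xk in enumerate(xi):
            grad[k] += (p - yi) * xk
    return grad


if __name__ == "__main__":
    # Toy data for three hypothetical sites: (features with bias term, labels).
    sites = [
        ([[1.0, 0.5], [1.0, -1.2]], [1, 0]),
        ([[1.0, 2.0]], [1]),
        ([[1.0, -0.3], [1.0, 0.8], [1.0, -2.0]], [0, 1, 0]),
    ]
    w = [0.1, -0.2]
    secure = secure_sum([local_gradient(w, X, y) for X, y in sites])
    pooled = local_gradient(w, [x for X, _ in sites for x in X],
                            [t for _, Y in sites for t in Y])
    print("securely aggregated gradient:", secure)
    print("pooled-data gradient:        ", pooled)
```

Repeating `secure_sum` on each step's local gradients gives gradient descent that follows the pooled-data trajectory. This simplification, however, reveals the summed gradient at every step and assumes that sites follow the protocol honestly. Protocols that keep the data and all intermediate values secret-shared throughout training give stronger guarantees.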
Notes
The data were downloaded from the Function BIRN Data Repository, Project Accession Number 2007-BDR-6UHZ1.
Thus, it cannot be directly compared with our model.
Acknowledgements
This work was supported by the National Institutes of Health (Grant Numbers 1R01DA040487 and 2RF1MH121885). We also thank Mr. Eric Verner for his support with the TReNDS data.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Senanayake, N., Podschwadt, R., Takabi, D. et al. NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data. Neuroinform 20, 91–108 (2022). https://doi.org/10.1007/s12021-021-09525-8
DOI: https://doi.org/10.1007/s12021-021-09525-8