Abstract
A new approach to organizing data pipelines in cryo-electron microscopy (Cryo-EM) and X-ray free-electron laser (XFEL) experiments is presented. The approach builds on advances in information technology (IT) brought about by containerization techniques and separates the user's work at the application level from the work of IT experts at the system and middleware levels. A user has to perform only two simple operations: pack the application packages into containers and write a workflow that encodes the data processing logic in a standard format. Examples of containerized workflows are discussed for Cryo-EM and XFEL experiments that study the spatial structure of single biological nano-objects (viruses, macromolecules, etc.). Example code for installing application packages in Docker containers and example workflows written in the high-level Common Workflow Language (CWL) are available on the project website. The examples are commented to help researchers with little IT experience understand how to build Docker containers and compose CWL workflows for Cryo-EM and XFEL data pipelines.
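The two user-side operations named in the abstract can be illustrated with a minimal sketch. It is not taken from the project's examples: it only assumes that each processing step is a command-line program packed in a Docker image, and it uses Python to assemble a CWL v1.0 workflow description and print it as JSON (CWL documents may be written in JSON as well as YAML). The step names, commands (`correct_motion`, `estimate_ctf`), and image names (`example/...`) are hypothetical placeholders, not real packages.

```python
import json


def make_tool(base_command, docker_image, output_glob):
    """Describe one containerized command-line step as a CWL CommandLineTool.

    The tool is meant to be embedded in a workflow, so cwlVersion is set
    only once, on the enclosing workflow document.
    """
    return {
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # Ask the workflow runner to execute the command inside this image.
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        "inputs": {
            "input_file": {"type": "File", "inputBinding": {"position": 1}},
        },
        "outputs": {
            "output_file": {"type": "File", "outputBinding": {"glob": output_glob}},
        },
    }


def make_workflow(steps):
    """Chain tools linearly: each step reads the previous step's output_file."""
    wf_steps = {}
    source = "raw_data"  # workflow-level input feeds the first step
    for name, tool in steps:
        wf_steps[name] = {
            "run": tool,
            "in": {"input_file": source},
            "out": ["output_file"],
        }
        source = f"{name}/output_file"
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"raw_data": "File"},
        "outputs": {"result": {"type": "File", "outputSource": source}},
        "steps": wf_steps,
    }


# Hypothetical two-step Cryo-EM preprocessing chain (names are assumptions).
workflow = make_workflow([
    ("motion_correction",
     make_tool(["correct_motion"], "example/motion-correction:latest", "*.mrc")),
    ("ctf_estimation",
     make_tool(["estimate_ctf"], "example/ctf-estimation:latest", "*.txt")),
])

print(json.dumps(workflow, indent=2))
```

The printed document could be saved to a file and passed to a CWL runner together with a job file that sets `raw_data`. The point of the design is that the workflow only names images and data links, so system-level concerns stay outside the user's description.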
ACKNOWLEDGMENTS
The study was performed using the computational resources supplied within project no. 1571 “Development of a digital platform for distributed storage, processing, and analyzing scientific data based on supercomputer and grid technologies” of the National Research Centre “Kurchatov Institute.”
Funding
The development of a containerized platform for organizing data pipelining in SPA/SPI Cryo-EM and XFEL experiments was supported by the Russian Science Foundation (grant no. 18-41-06001) and the Helmholtz Association's Initiative and Networking Fund (grant no. HRSF-0002).
The development and deployment of the infrastructure and program services at the system level for platform operation and data storage were performed according to the research and development plan of the National Research Centre “Kurchatov Institute.”
Additional information
Translated by Yu. Sin’kov
About this article
Cite this article
Bobkov, S.A., Teslyuk, A.B., Baymukhametov, T.N. et al. Advances in Modern Information Technologies for Data Analysis in CRYO-EM and XFEL Experiments. Crystallogr. Rep. 65, 1081–1092 (2020). https://doi.org/10.1134/S1063774520060085
DOI: https://doi.org/10.1134/S1063774520060085