Abstract
Virtual clusters are widely used computing platforms that can be deployed on multiple cloud platforms. The ability to dynamically grow and shrink the number of nodes has paved the way for customised elastic computing for both High Performance Computing and High Throughput Computing workloads. However, elasticity is typically restricted to a single cloud site, which hinders the provisioning of computational resources from multiple geographically distributed cloud sites. To address this limitation, this paper introduces an architecture of open-source components that coherently deploy a virtual elastic cluster across multiple cloud sites to perform large-scale computing. These hybrid virtual elastic clusters are automatically deployed and configured using an Infrastructure as Code (IaC) approach on a distributed hybrid testbed that spans different organizations, including on-premises and public clouds, supporting automated tunneling of communications across the cluster nodes with advanced VPN topologies. The results indicate that cluster-based computing of embarrassingly parallel jobs can benefit from hybrid virtual clusters that aggregate computing resources from multiple cloud back-ends and bring them together into a dedicated, albeit virtual, network.
Acknowledgements
The work presented in this article has been partially funded by project DEEP Hybrid-DataCloud (grant agreement No 777435). GM and MC would also like to thank the Spanish “Ministerio de Economía, Industria y Competitividad” for the project “BigCLOE” with reference number TIN2016-79951-R. Computational resources at CESNET, used in the real-world use case, were supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Caballer, M., Antonacci, M., Šustr, Z. et al. Deployment of Elastic Virtual Hybrid Clusters Across Cloud Sites. J Grid Computing 19, 4 (2021). https://doi.org/10.1007/s10723-021-09543-5
DOI: https://doi.org/10.1007/s10723-021-09543-5