Big data deployment in containerized infrastructures through the interconnection of network namespaces
Software: Practice and Experience (IF 3.5), Pub Date: 2020-01-27, DOI: 10.1002/spe.2793
Carla Sauvanaud ₁, Ajay Dholakia ₂, Jordi Guitart ₁,₃, Chulho Kim ₂, Peter Mayes ₂

Big Data applications must handle large streams of data quickly. Their performance depends not only on the implementation of the data frameworks and the underlying hardware but also on the deployment scheme and its potential for fast scaling. Consequently, several efforts have focused on easing the deployment of Big Data applications, notably through containerization. This technology was introduced to enable multitenancy and multiprocessing on clusters, providing high deployment flexibility through lightweight container images. Recent studies have focused mostly on Docker containers. This article instead considers the more recent Singularity containers, which provide more security and support high‐performance computing (HPC) environments, and can therefore let Big Data applications benefit from the specialized hardware of HPC. Singularity 2.x, however, does not isolate network resources as required by most Big Data components. Singularity 3.x can allocate isolated network resources to each container, but interconnecting them requires a nontrivial amount of configuration effort. In this context, this article makes a functional contribution in the form of a deployment scheme based on the interconnection of network namespaces, through underlay and overlay networking approaches, that makes Big Data applications easily deployable inside Singularity containers. We provide a detailed account of our deployment scheme with both interconnection approaches in the form of a “how‐to‐do‐it” report, and we evaluate it by comparing three Hadoop‐based Big Data applications running on a bare‐metal infrastructure and in scenarios involving Singularity and Docker instances.
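
To make the idea of interconnecting network namespaces more concrete, the following is a minimal sketch of the generic Linux pattern (one veth pair per namespace, attached to a shared bridge on a single host). It is not the deployment scheme from the article, which is not reproduced in this abstract. The function name `bridge_commands`, the bridge name `br-demo`, the subnet `10.22.0.0/24`, and the namespace names are all hypothetical. The script only builds the `ip` (iproute2) command strings and prints them; it does not execute anything, since running them would require root privileges.

```python
import ipaddress


def bridge_commands(namespaces, subnet="10.22.0.0/24", bridge="br-demo"):
    """Return iproute2 commands that attach each namespace to one bridge.

    Assumption: all names and the subnet are illustrative placeholders,
    not values taken from the article.
    """
    net = ipaddress.ip_network(subnet)
    hosts = iter(net.hosts())  # usable addresses, in ascending order
    cmds = [
        f"ip link add {bridge} type bridge",
        f"ip link set {bridge} up",
    ]
    for index, ns in enumerate(namespaces):
        host_if = f"veth-h{index}"  # end that stays in the host namespace
        ns_if = f"veth-n{index}"    # end that is moved into the namespace
        addr = next(hosts)
        cmds += [
            f"ip netns add {ns}",
            f"ip link add {host_if} type veth peer name {ns_if}",
            f"ip link set {ns_if} netns {ns}",
            f"ip link set {host_if} master {bridge}",
            f"ip link set {host_if} up",
            f"ip netns exec {ns} ip addr add {addr}/{net.prefixlen} dev {ns_if}",
            f"ip netns exec {ns} ip link set {ns_if} up",
            f"ip netns exec {ns} ip link set lo up",
        ]
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in bridge_commands(["hadoop-master", "hadoop-worker"]):
        print(line)
```

Generating the commands rather than running them keeps the sketch safe to try on any machine. Reaching namespaces on other hosts would need further steps, for example attaching a physical interface to the bridge or adding a tunnel device, which this sketch does not cover.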

Updated: 2020-01-27