Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines, arXiv - CS - Performance

arXiv - CS - Performance, Pub Date: 2020-09-01, DOI: arxiv-2009.00304
Sören Henning, Wilhelm Hasselbring

Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. The core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect the scalability of a use case. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both to evaluate the Theodolite method and to benchmark Kafka Streams' scalability for different deployment options.
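The abstract describes the framework's execution strategy only at a high level. As a rough illustration, the sketch below assumes that each benchmark run can be reduced to a yes/no check of whether a given number of processing instances keeps up with a given workload. The names `sustains` and `resource_demand` and the toy throughput figure are hypothetical, not part of Theodolite's actual implementation, and the real success criterion used by the framework is not specified here. The sketch sweeps the workloads of one dimension against a range of instance counts and records, for each workload, the smallest instance count that passes, which approximates how resource demand evolves with increasing workload.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical check: True if `instances` processing instances can keep up
# with a workload of size `load` (in practice: deploy, generate load, measure).
SustainsCheck = Callable[[int, int], bool]


def resource_demand(
    loads: Sequence[int],
    instance_counts: Sequence[int],
    sustains: SustainsCheck,
) -> Dict[int, Optional[int]]:
    """For each load, return the fewest tested instances that sustain it (None if none do)."""
    demand: Dict[int, Optional[int]] = {}
    for load in sorted(loads):
        demand[load] = None
        for instances in sorted(instance_counts):
            # One benchmark execution: run the use case with this load and instance count.
            if sustains(load, instances):
                demand[load] = instances
                break  # counts are ascending, so this is the minimum that passed
    return demand


if __name__ == "__main__":
    # Toy stand-in for real benchmark runs: assume each instance processes
    # 10,000 messages per second (a made-up figure, not a measured result).
    def toy_sustains(load: int, instances: int) -> bool:
        return instances * 10_000 >= load

    print(resource_demand([10_000, 50_000, 100_000], range(1, 13), toy_sustains))
    # → {10000: 1, 50000: 5, 100000: 10}
```

Plotting the resulting mapping from workload to required instances gives a simple picture of scalability: a slowly growing curve indicates that the engine scales well along that workload dimension. A real framework would likely avoid the full sweep, for example by starting each search at the previous workload's demand.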

Updated: 2020-09-02
