Aggregating data center measurements for availability analysis
Software: Practice and Experience ( IF 3.5 ) Pub Date : 2020-11-18 , DOI: 10.1002/spe.2934
Élisson da Silva Rocha 1 , Leylane G. F. da Silva 1 , Guto L. Santos 1 , Diego Bezerra 1 , André Moreira 1 , Glauco Gonçalves 2 , Maria Valéria Marquezini 3 , Amardeep Mehta 4 , Mattias Wildeman 4 , Judith Kelner 1 , Djamel Sadok 1 , Patricia T. Endo 1

A data center infrastructure is composed of heterogeneous resources divided into three main subsystems: IT (processor, memory, disk, network, etc.), power (generators, power transformers, uninterruptible power supplies, distribution units, among others), and cooling (water chillers, pipes, and cooling towers). This heterogeneity makes it challenging to collect data from the many devices in the infrastructure, and extracting relevant information from that data is a further challenge for data center managers. Improving cloud availability requires monitoring the entire infrastructure with a variety of open source and/or commercial monitoring tools, such as Zabbix, Nagios, Prometheus, CloudWatch, and AzureWatch. It is common to use several monitoring systems to collect real-time data for data center components from different subsystems. Such an environment brings an inherent challenge: all of the collected infrastructure data and measurements must be aggregated and organized. This first step is necessary before any valuable insight for decision-making can be obtained. In this paper, we present the Data Center Availability (DCA) System, a software system that aggregates and analyzes data center measurements to support the study of data center availability. We also discuss the DCA implementation and illustrate its operation by monitoring a small university research laboratory data center. The DCA System can monitor different types of devices, such as servers, switches, and power devices, using the Zabbix tool. It can also automatically identify the failure-time seasonality and trend present in the data collected from different devices of the data center.
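To make the idea of aggregating measurements for availability analysis concrete, the following is a minimal sketch, not the DCA System's actual implementation. It assumes each monitoring tool has already been reduced to a list of up/down status readings per device; the `Sample` schema, the function names, and the device names are all hypothetical, and each reading is assumed to hold until the next one arrives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One polled status reading (hypothetical schema, not from the paper)."""
    timestamp: float  # seconds since an arbitrary epoch
    up: bool          # True if the device answered the check


def availability(samples: Iterable[Sample]) -> float:
    """Fraction of the observed time span during which the device was up.

    Assumption: a reading's status holds until the next reading.
    """
    ordered: List[Sample] = sorted(samples, key=lambda s: s.timestamp)
    if len(ordered) < 2:
        raise ValueError("need at least two samples to measure an interval")
    total = ordered[-1].timestamp - ordered[0].timestamp
    if total <= 0:
        raise ValueError("samples must span a positive amount of time")
    up_time = sum(
        nxt.timestamp - cur.timestamp
        for cur, nxt in zip(ordered, ordered[1:])
        if cur.up
    )
    return up_time / total


def count_failures(samples: Iterable[Sample]) -> int:
    """Number of up-to-down transitions between consecutive readings."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    return sum(
        1 for cur, nxt in zip(ordered, ordered[1:]) if cur.up and not nxt.up
    )


def availability_by_device(readings: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Aggregate per-device readings into one availability figure each."""
    return {device: availability(s) for device, s in readings.items()}


# Illustrative data only: one reading per minute for two made-up devices.
readings = {
    "server-01": [
        Sample(0, True), Sample(60, True), Sample(120, False),
        Sample(180, True), Sample(240, True),
    ],
    "switch-01": [Sample(0, True), Sample(240, True)],
}
```

Calling `availability_by_device(readings)` on this illustrative data gives 0.75 for `server-01` (up for 180 of 240 seconds) and 1.0 for `switch-01`. Seasonality and trend analysis of failure times, which the abstract attributes to the DCA System, would need a time-series method on top of such aggregates and is not attempted here.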

Updated: 2020-11-18