Data centers’ services restoration based on the decision-making of distributed agents

Lima, Príscila Alves; Neto, Antônio Sá Barreto; Maciel, Paulo

doi:10.1007/s11235-020-00660-2

Published: 14 March 2020

Telecommunication Systems, Volume 74, pages 367–378 (2020)

Príscila Alves Lima¹,
Antônio Sá Barreto Neto² &
Paulo Maciel¹

Abstract

The growing number of companies migrating their IT infrastructure to cloud environments has motivated many studies on distributed backup strategies that improve the availability of these companies’ systems. In this scenario, it is essential to study mechanisms that evaluate network conditions in order to minimize transmission time and thereby improve system availability. The goal of this study is to build models that evaluate the availability of services running on cloud data center infrastructure, emphasizing the impact of throughput variation on data redundancy and, consequently, on service availability. On this basis, this research proposes smart models that can be deployed in each data center of a distributed arrangement of data centers and help the system administrator choose the best data center to restore the services of a faulty one. To analyze the impact of network throughput on service availability, we gathered the mean time to failure (MTTF) and mean time to repair (MTTR) of the data center’s components and services, generated a reliability block diagram to obtain the MTTF of the system as a whole, and developed a formalism to model the network component. Based on the results, we built a stochastic Petri net (SPN) model to represent the system and obtain its availability under many network conditions. We then analyzed the system’s availability to discuss how network conditions affect it. With the models built and the availability obtained under many network conditions, a plot of the annual downtime shows the enormous impact of network conditions on the system’s availability. Using the models developed to study system availability, we developed smart agents capable of predicting the transfer time of a bulk of data and, with that prediction, choosing the data center with the best network conditions to restore the services of a faulty one.
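The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it uses the standard steady-state formula A = MTTF / (MTTF + MTTR), assumes exponentially distributed failure times so that a series reliability block diagram has MTTF equal to the reciprocal of the summed failure rates, and replaces the paper's learned transfer-time predictor with a naive size-over-throughput estimate. All function names, the `Candidate` type, and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

HOURS_PER_YEAR = 8760.0


def availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability from MTTF and MTTR (both in hours)."""
    return mttf_h / (mttf_h + mttr_h)


def series_mttf(component_mttfs_h: Iterable[float]) -> float:
    """MTTF of a series block diagram, assuming exponential failure times.

    In series, any component failure brings the system down, so the
    component failure rates (1 / MTTF) add up.
    """
    return 1.0 / sum(1.0 / m for m in component_mttfs_h)


def annual_downtime_hours(a: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - a) * HOURS_PER_YEAR


def transfer_time_s(size_gb: float, throughput_mbps: float) -> float:
    """Naive transfer-time estimate (assumption: constant throughput)."""
    size_megabits = size_gb * 8000.0  # 1 GB = 8000 megabits (decimal units)
    return size_megabits / throughput_mbps


def availability_with_transfer(mttf_h: float, base_mttr_h: float,
                               size_gb: float, throughput_mbps: float) -> float:
    """Availability when restoration also waits for the backup transfer.

    Simplifying assumption: the transfer time is simply added to the MTTR.
    """
    transfer_h = transfer_time_s(size_gb, throughput_mbps) / 3600.0
    return availability(mttf_h, base_mttr_h + transfer_h)


@dataclass(frozen=True)
class Candidate:
    """A data center that could restore the faulty one's services."""
    name: str
    throughput_mbps: float


def pick_restore_site(candidates: Sequence[Candidate], size_gb: float) -> Candidate:
    """Choose the candidate with the shortest estimated transfer time."""
    return min(candidates,
               key=lambda c: transfer_time_s(size_gb, c.throughput_mbps))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    system_mttf = series_mttf([8760.0, 17520.0, 26280.0])
    sites = [Candidate("dc-a", 120.0), Candidate("dc-b", 480.0)]
    best = pick_restore_site(sites, size_gb=500.0)
    for site in sites:
        a = availability_with_transfer(system_mttf, 2.0, 500.0,
                                       site.throughput_mbps)
        print(site.name, round(a, 6), round(annual_downtime_hours(a), 2))
    print("restore from:", best.name)
```

Running the script prints, for each hypothetical site, the availability and annual downtime that result from its throughput, which mirrors the abstract's point that lower throughput lengthens restoration and raises downtime.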

Author information

Authors and Affiliations

Federal University of Pernambuco, Av. Jornalista Aníbal Fernandes, s/n - Cidade Universitária, Recife, PE, Brazil
Príscila Alves Lima & Paulo Maciel
Federal Institute of Education, Science, and Technology of Pernambuco (IFPE), Av. Prof. Luis Freire 500, Recife, PE, Brazil
Antônio Sá Barreto Neto

Corresponding author

Correspondence to Antônio Sá Barreto Neto.

About this article

Cite this article

Lima, P.A., Neto, A.S.B. & Maciel, P. Data centers’ services restoration based on the decision-making of distributed agents. Telecommun Syst 74, 367–378 (2020). https://doi.org/10.1007/s11235-020-00660-2

Issue Date: July 2020
DOI: https://doi.org/10.1007/s11235-020-00660-2
