Multi-agent architecture for fault recovery in self-healing systems

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Self-healing, a prominent property of self-adaptiveness, provides reliability, availability, maintainability, and survivability to a software system. These quality attributes are especially important in modern distributed systems, in which components and their collaborations often vary. The survivability of such systems is best addressed from an architectural viewpoint. For maintainability and reliability, however, architecture-level adaptation is often not supported during the design phase. Incorporating fault-tolerance adaptation into the design phase of the system development process can increase software availability and thereby help attain self-healing. In distributed systems, most existing architectures treat communication and correspondence as their primary criteria. A multi-agent mechanism, on the other hand, helps in the schematic control of functionality and communication while emphasizing scalability. This paper proposes a novel architecture that supports agent-based distributed systems in addressing fault recovery to achieve self-adaptiveness. Unlike traditional multi-agent architectures, it incorporates task-oriented functional multi-agent communication for the design-phase activities designated to perform self-healing. An adaptation of the agent communication control flow is proposed using three novel mechanisms (planning, functioning, and enacting) as the agents’ critical responsibilities. The paper also validates the proposed architecture for resource- and availability-based faults related to crashes and resource unavailability, using performance-based evaluation metrics. A case-based application with single-thread connectivity is used to reflect the architecture during the application design phase and is tested for success using mean response time as the evaluation metric.
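To make the abstract's vocabulary concrete, the following is a minimal illustrative sketch of how planning, functioning, and enacting responsibilities might be split across agents, and how mean response time could be computed over requests with injected crash and resource-unavailability faults. It is not the architecture from the paper, whose details are not visible in this preview. All class names, recovery actions, and timing values are hypothetical assumptions, and time is simulated rather than measured.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Optional


class Fault(Enum):
    """The two fault classes named in the abstract."""
    CRASH = "crash"
    RESOURCE_UNAVAILABLE = "resource_unavailable"


@dataclass
class Request:
    name: str
    fault: Optional[Fault] = None  # fault injected for this request, if any
    service_ms: float = 10.0       # assumed fault-free service time


# Hypothetical recovery actions with assumed costs in milliseconds.
RECOVERY_PLAN = {
    Fault.CRASH: ("restart_component", 50.0),
    Fault.RESOURCE_UNAVAILABLE: ("switch_to_replica", 20.0),
}


class PlanningAgent:
    """Chooses a recovery action for a detected fault."""

    def plan(self, fault: Fault):
        return RECOVERY_PLAN[fault]


class EnactingAgent:
    """Carries out the chosen action; here it only accounts for its cost."""

    def enact(self, action: str, cost_ms: float) -> float:
        return cost_ms


class FunctioningAgent:
    """Serves a request and delegates recovery when a fault is present."""

    def __init__(self, planner: PlanningAgent, enactor: EnactingAgent):
        self.planner = planner
        self.enactor = enactor

    def handle(self, request: Request) -> float:
        elapsed = request.service_ms
        if request.fault is not None:
            action, cost_ms = self.planner.plan(request.fault)
            elapsed += self.enactor.enact(action, cost_ms)
        return elapsed  # simulated response time in milliseconds


def mean_response_time(requests) -> float:
    """Average simulated response time over a batch of requests."""
    agent = FunctioningAgent(PlanningAgent(), EnactingAgent())
    return mean(agent.handle(r) for r in requests)


if __name__ == "__main__":
    demo = [
        Request("r1"),
        Request("r2", Fault.CRASH),
        Request("r3", Fault.RESOURCE_UNAVAILABLE),
        Request("r4"),
    ]
    print(mean_response_time(demo))  # (10 + 60 + 30 + 10) / 4 = 27.5
```

Simulated costs keep the sketch deterministic. A real evaluation would measure wall-clock response times under fault injection instead.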

Acknowledgements

The authors would like to thank Dr. Krian Kumar Ravulakollu, Senior Member of the International Neural Network Society, for his direction and suggestions as an advisor on the experimental strategy and validation.

Author information

Corresponding author

Correspondence to Pushpendra Kumar Rajput.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rajput, P.K., Sikka, G. Multi-agent architecture for fault recovery in self-healing systems. J Ambient Intell Human Comput 12, 2849–2866 (2021). https://doi.org/10.1007/s12652-020-02443-8

  • DOI: https://doi.org/10.1007/s12652-020-02443-8
