Preprint Article, Version 1. This version is not peer-reviewed.

AI-Driven Self-Healing Cloud Systems: Enhancing Reliability and Reducing Downtime through Event-Driven Automation

Version 1 : Received: 23 August 2024 / Approved: 26 August 2024 / Online: 27 August 2024 (11:43:55 CEST)

How to cite: Arora, R.; Kumar, A.; Soni, A.; Tiwari, A. AI-Driven Self-Healing Cloud Systems: Enhancing Reliability and Reducing Downtime through Event-Driven Automation. Preprints 2024, 2024081860. https://doi.org/10.20944/preprints202408.1860.v1

Abstract

The goal of this study is to design and implement a self-healing cloud system built on an event-driven automation framework that follows the if-this-then-that principle for managing incidents and recovery. The study presents a recovery engine that uses Artificial Intelligence (AI)-based decision-making to choose the best remedial actions from a pre-established catalogue in order to maximise system reliability and minimise downtime. The system is tested on an OpenStack-based video-on-demand service, where multiple issues are replicated in order to assess the effectiveness of various recovery actions and workflows. The decision-making module of the recovery engine examines data from these experiments to determine the most effective remedial actions, taking into account their impact on the quality of service and other factors. The recovery engine is designed to require human input only for parameterising and optimising the decision models at particular points in time. The study presents and assesses the results of these AI-driven decision-making techniques to show how they can improve mean time to repair and overall service quality in cloud environments. This novel strategy represents a shift towards cloud systems that are more robust, autonomous, and able to manage anomalies effectively and recover from failure.
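
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event names, the catalogue entries, the numeric estimates, and the weighted cost function are all hypothetical, and a simple cost minimisation stands in for the AI-based decision models, which the abstract does not specify. The sketch only shows the if-this-then-that shape: an event type maps to candidate actions, and the engine picks the one with the lowest combined repair time and quality-of-service penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryAction:
    """One entry in a hypothetical recovery catalogue."""
    name: str
    expected_repair_seconds: float  # illustrative estimate, not measured data
    qos_penalty: float              # 0.0 (no impact) to 1.0 (service unusable)


# "If this event, then consider these actions". All names and numbers are
# invented for illustration and do not come from the paper.
CATALOGUE = {
    "vm_unresponsive": [
        RecoveryAction("restart_service", 20.0, 0.2),
        RecoveryAction("reboot_vm", 90.0, 0.6),
        RecoveryAction("migrate_vm", 150.0, 0.3),
    ],
    "disk_full": [
        RecoveryAction("purge_cache", 10.0, 0.1),
        RecoveryAction("extend_volume", 60.0, 0.0),
    ],
}


def cost(action, qos_weight=100.0):
    """Assumed cost model: repair time plus a weighted QoS penalty."""
    return action.expected_repair_seconds + qos_weight * action.qos_penalty


def choose_action(event_type, qos_weight=100.0):
    """Return the lowest-cost catalogue action for an event, or None."""
    candidates = CATALOGUE.get(event_type)
    if not candidates:
        return None  # no rule matches this event type
    return min(candidates, key=lambda a: cost(a, qos_weight))


if __name__ == "__main__":
    for event in ("vm_unresponsive", "disk_full", "unknown_event"):
        chosen = choose_action(event)
        print(event, "->", chosen.name if chosen else None)
```

In this sketch `qos_weight` plays the role of a parameter that a human operator might tune: raising it makes the engine prefer slower actions that disturb the service less.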

Keywords

Artificial Intelligence, Cloud System, Engine, OpenStack

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
