Preprint Article Version 1 This version is not peer-reviewed

Log Based Fault Localization with Unsupervised Log Segmentation

Version 1 : Received: 14 August 2024 / Approved: 14 August 2024 / Online: 15 August 2024 (07:30:48 CEST)

How to cite: Dobrowolski, W.; Iwach-Kowalski, K.; Nikodem, M.; Unold, O. Log Based Fault Localization with Unsupervised Log Segmentation. Preprints 2024, 2024081096. https://doi.org/10.20944/preprints202408.1096.v1

Abstract

Localizing faults in software is a tedious process. The manual approach is becoming impractical because of the size and complexity of contemporary computer systems and of their logs, which are often the primary source of information about a fault. Log-based Fault Localization (LBFL) is a popular method for this purpose. In real-world scenarios, however, it is vulnerable to large numbers of previously unseen log lines. In this paper, we propose a novel method that guides programmers to the location of a fault by building a hierarchy of the highest-ranked log lines selected by the traditional LBFL method. We rely on the intuition that symptoms of faults appear in the context of normal behavior, whereas suspicious log lines grouped together come from new or additional functionality turned on during the faulty execution. To obtain this context, we use unsupervised log sequence segmentation, a technique previously used to divide log sequences into meaningful segments. Experiments on real-life examples show that our method reduces the effort needed to find the most crucial logs by up to 64% compared to the traditional timestamp-based approach. The results demonstrate that context is highly useful for fault localization and indicate that the process can be sped up further.

Keywords

Automated Log Analysis; Log-based Fault Localization; Log Sequence; Unsupervised Log Sequence Segmentation; Software Reliability

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
