Advancing Cyber Incident Timeline Analysis Through Retrieval-Augmented Generation and Large Language Models

Preprint

Article

This version is not peer-reviewed.

Fatma Yasmine Loumachi, Mohamed Chahine Ghanem*, Mohamed Amine Ferrag

Submitted: 28 December 2024

Posted: 30 December 2024

Abstract

Cyber timeline analysis, also known as forensic timeline analysis, is critical in Digital Forensics and Incident Response (DFIR) investigations. It involves examining artefacts and events, particularly their timestamps and associated metadata, to detect anomalies, establish correlations, and reconstruct a detailed sequence of the incident. Traditional approaches rely on processing structured artefacts, such as logs and filesystem metadata, using multiple specialised tools for evidence identification, feature extraction, and timeline reconstruction. This paper introduces GenDFIR, a context-specific framework powered by the capabilities of large language models (LLMs). Specifically, it proposes using Llama 3.1 8B in a zero-shot setting, selected for its ability to understand cyber threat nuances, integrated with a Retrieval-Augmented Generation (RAG) agent. Our approach comprises two main stages: (1) Data Preprocessing and Structuring: incident events, represented as textual data, are transformed into a well-structured document that forms a comprehensive knowledge base of the incident. (2) Context Retrieval and Semantic Enrichment: a RAG agent retrieves relevant incident events from the knowledge base based on user prompts, and the LLM processes the retrieved context, enabling detailed interpretation and semantic enhancement. The proposed framework was tested on synthetic cyber incident events in a controlled environment. Results were assessed using DFIR-tailored, context-specific metrics designed to evaluate the framework’s performance, reliability, and robustness, and were supported by human evaluation to validate the accuracy and reliability of the outcomes. Our findings demonstrate the potential of LLMs in DFIR and in automating the timeline analysis process. This approach highlights the power of Generative AI, particularly LLMs, and opens new possibilities for advanced threat detection and incident reconstruction.
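The two stages described in the abstract can be illustrated with a minimal sketch. This is not the GenDFIR implementation: the event records, field names, and function names (`to_document`, `retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retrieval method the framework actually uses. The final prompt would be passed to an LLM such as Llama 3.1 8B; that call is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical incident events; fields and values are illustrative only.
EVENTS = [
    {"timestamp": "2024-01-10T09:02:11Z", "source": "auth.log",
     "message": "Failed SSH login for user admin from 203.0.113.7"},
    {"timestamp": "2024-01-10T09:05:42Z", "source": "auth.log",
     "message": "Successful SSH login for user admin from 203.0.113.7"},
    {"timestamp": "2024-01-10T09:20:03Z", "source": "proxy.log",
     "message": "Large outbound HTTPS upload to unknown external host"},
]


def to_document(event):
    """Stage 1: turn one raw event into a structured text chunk."""
    return f"[{event['timestamp']}] ({event['source']}) {event['message']}"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Stage 2: return up to k documents most similar to the query.

    Assumption: bag-of-words similarity replaces a real embedding model.
    Documents with zero similarity are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [doc for score, doc in scored[:k] if score > 0]
    # Each chunk starts with an ISO 8601 timestamp in the same format,
    # so a plain string sort restores chronological order.
    return sorted(top)


def build_prompt(query, context):
    """Assemble the retrieved context and the user question for the LLM."""
    lines = "\n".join(context)
    return (
        "You are assisting a DFIR timeline analysis.\n"
        f"Incident events:\n{lines}\n\n"
        f"Question: {query}\n"
    )


knowledge_base = [to_document(e) for e in EVENTS]
question = "SSH login for user admin"
context = retrieve(question, knowledge_base)
prompt = build_prompt(question, context)
print(prompt)
```

Keeping retrieval separate from prompt construction means the similarity function can be swapped for an embedding-based retriever without touching the rest of the sketch.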

Subject: Computer Science and Mathematics - Security Systems

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
