Preprint Article, Version 1. This version is not peer-reviewed.

Enhancing SQL Injection Detection with Trustworthy Ensemble Learning and Boosting Models Using Local Explanation Techniques

Version 1: Received: 23 October 2024 / Approved: 24 October 2024 / Online: 24 October 2024 (10:27:16 CEST)

How to cite: Le, T.-T.-H.; Hwang, Y.; Choi, C.; Wardhani, R. W.; Putranto, D. S. C.; Kim, H. Enhancing SQL Injection Detection with Trustworthy Ensemble Learning and Boosting Models Using Local Explanation Techniques. Preprints 2024, 2024101878. https://doi.org/10.20944/preprints202410.1878.v1

Abstract

This paper presents a comparative analysis of several decision models for detecting Structured Query Language (SQL) injection attacks, which remain among the most prevalent and serious security threats to web applications. SQL injection enables attackers to exploit databases, gaining unauthorized access to stored data and manipulating it. Traditional detection methods often struggle because these attacks constantly evolve, modern web applications are increasingly complex, and the decision-making processes of machine learning models lack transparency. To address these challenges, we evaluated the performance of various models, including Decision Tree, Random Forest, XGBoost, AdaBoost, Gradient Boosting Decision Tree (GBDT), and Histogram Gradient Boosting Decision Tree (HGBDT), on a comprehensive SQL injection dataset. Our primary motivation is to leverage the strengths of ensemble learning and boosting techniques to improve detection accuracy and robustness against SQL injection attacks. By systematically comparing these models, we aim to identify the most effective algorithms for SQL injection detection systems. Our experiments show that Decision Tree, Random Forest, and AdaBoost achieved the highest performance, with an accuracy of 99.50% and an F1 score of 99.33%. Additionally, we applied SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) for local explainability, illustrating how each model classifies individual normal and attack instances. This transparency enhances the trustworthiness of our approach to detecting SQL injection attacks. These findings highlight the potential of ensemble methods to provide reliable and efficient solutions for detecting SQL injection attacks, thereby improving the security of web applications.
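SHAP-style local explanations attribute a single prediction to its input features using Shapley values. The following minimal sketch computes exact Shapley values for one query by enumerating all feature subsets. It is not the authors' pipeline and does not use the `shap` or `lime` libraries. The feature extractor (`extract_features`), the four binary features, and the scoring function (`toy_model`) are hypothetical stand-ins for a trained classifier. The sketch also assumes a single all-zero "benign" baseline, whereas practical SHAP explainers typically average over a background dataset and rely on approximations or tree-specific algorithms instead of this exponential enumeration.

```python
import re
from itertools import combinations
from math import factorial

# Hypothetical binary features; illustrative only, not the paper's feature set.
FEATURES = ["has_quote", "has_comment", "has_union", "has_tautology"]
# Assumed reference input: a query with none of the suspicious features.
BASELINE = {f: 0 for f in FEATURES}


def extract_features(query):
    """Hypothetical feature extractor mapping a query string to binary flags."""
    q = query.lower()
    return {
        "has_quote": int("'" in q),
        "has_comment": int("--" in q or "#" in q),
        "has_union": int(re.search(r"\bunion\b", q) is not None),
        "has_tautology": int(re.search(r"\bor\s+1\s*=\s*1\b", q) is not None),
    }


def toy_model(x):
    """Hypothetical attack score standing in for a trained classifier."""
    score = 0.1 * x["has_quote"] + 0.2 * x["has_comment"] + 0.3 * x["has_union"]
    # Interaction term: a quote together with a tautology (e.g. ' OR 1=1).
    score += 0.4 * x["has_quote"] * x["has_tautology"]
    return score


def shapley_values(model, x, baseline, features):
    """Exact Shapley values: features outside a coalition take baseline values."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g] for g in features}
                without_f = {g: x[g] if g in subset else baseline[g] for g in features}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


if __name__ == "__main__":
    x = extract_features("admin' OR 1=1 --")
    phi = shapley_values(toy_model, x, BASELINE, FEATURES)
    print({f: round(v, 3) for f, v in phi.items()})
    # {'has_quote': 0.3, 'has_comment': 0.2, 'has_union': 0.0, 'has_tautology': 0.2}
```

The attributions sum to the difference between the model's score for the query and its score for the baseline. This is the local-accuracy property that SHAP explanations preserve. The interaction between the quote and the tautology is split equally between those two features.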

Keywords

Explainable AI; SQL Injection Detection; Decision Tree; Random Forest; XGBoost; AdaBoost; Gradient Boosting Decision Tree; Histogram Gradient Boosting Decision Tree; Local Explanation; SHAP; LIME

Subject

Computer Science and Mathematics, Security Systems
