Preprint Article Version 1 This version is not peer-reviewed

Comparative Investigation of Traditional Machine Learning Models and Transformer Models for Phishing Email Detection

Version 1 : Received: 18 October 2024 / Approved: 18 October 2024 / Online: 18 October 2024 (11:18:03 CEST)

How to cite: Meléndez, R.; Ptaszynski, M.; Fumito, M. Comparative Investigation of Traditional Machine Learning Models and Transformer Models for Phishing Email Detection. Preprints 2024, 2024101467. https://doi.org/10.20944/preprints202410.1467.v1

Abstract

Phishing emails pose a significant threat to cybersecurity worldwide. Existing tools mitigate their impact by filtering them, but these tools are only as reliable as their ability to detect new formats and techniques for creating phishing emails. In this paper, we investigated how traditional machine learning models and transformer models perform on the task of classifying an email as phishing or not. We found that transformer models, in particular DistilBERT, BERT, and RoBERTa, achieved higher performance than traditional models such as Logistic Regression, Random Forest, Support Vector Machine (SVM), and Naive Bayes. The process consisted of using a large and robust dataset of emails and applying preprocessing and optimization techniques to obtain the best possible results. RoBERTa showed an outstanding capacity to identify phishing emails, achieving the highest accuracy of 0.9943. Although still successful, traditional models performed marginally worse; SVM performed the best among them, with an accuracy of 0.9854. The results emphasize the value of sophisticated text processing methods and the potential of transformer models to improve email security by thwarting phishing attempts.
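To make the classification task concrete, the following is a minimal sketch of one of the traditional models named in the abstract, a multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. It is not the authors' pipeline: the tokenizer, the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`), and the six-email toy training set are all hypothetical and chosen only for illustration. The accuracy it computes says nothing about the results reported in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative only)."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(samples):
    """Count documents per class and word occurrences per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training emails carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace-smoothed log likelihood of the token given the label.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)


# Hypothetical toy data, NOT the dataset used in the paper.
TRAIN = [
    ("Verify your account now to avoid suspension", "phishing"),
    ("Urgent: confirm your password and bank details", "phishing"),
    ("You won a prize, click the link to claim", "phishing"),
    ("Meeting moved to Thursday at ten", "legitimate"),
    ("Please review the attached project report", "legitimate"),
    ("Lunch on Friday with the team", "legitimate"),
]
TEST = [
    ("Click the link to verify your bank account", "phishing"),
    ("Project meeting on Friday", "legitimate"),
]

model = train_naive_bayes(TRAIN)
test_accuracy = accuracy(model, TEST)
print(f"Toy test accuracy: {test_accuracy:.4f}")
```

Accuracy here is the same metric the abstract reports (correct predictions divided by total predictions), but a model of this kind relies only on per-class word counts, whereas transformer models such as RoBERTa build contextual representations of the whole message.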

Keywords

Phishing detection; Phishing emails; Machine Learning; Transformer Models; Traditional Models; Supervised Learning; Text Classification; Cyber threat Mitigation; Cybersecurity

Subject

Computer Science and Mathematics, Information Systems
