Preprint Article Version 1 This version is not peer-reviewed

Credit Card Fraud Detection with Machine Learning and Big Data Analytics: A PySpark Framework Implementation

Version 1 : Received: 27 June 2024 / Approved: 28 June 2024 / Online: 1 July 2024 (09:24:46 CEST)

How to cite: Theodorakopoulos, L.; Theodoropoulou, A.; Zakka, F.; Halkiopoulos, C. Credit Card Fraud Detection with Machine Learning and Big Data Analytics: A PySpark Framework Implementation. Preprints 2024, 2024070022. https://doi.org/10.20944/preprints202407.0022.v1

Abstract

This paper presents a comprehensive study on applying machine learning (ML) techniques to real-time credit card fraud detection. As credit card fraud becomes more prevalent, financial institutions face the challenge of detecting fraudulent transactions promptly and accurately. This research evaluates several ML models, including Logistic Regression, Decision Trees, Random Forests, XGBoost, and Deep Convolutional Neural Networks, for their effectiveness in identifying fraudulent activity. Using PySpark to process large-scale transaction data, the study highlights the importance of real-time analysis, adaptive learning, and the handling of imbalanced datasets. Key findings reveal that XGBoost, with its balance of accuracy and complexity, was the most promising model, achieving an accuracy of 98% in detecting fraudulent transactions. The paper further discusses the potential of ensemble methods, graph-powered systems, and intelligent sampling to enhance fraud detection capabilities. It also addresses challenges such as overfitting, data access, and the need for real-time analysis. Future research directions include implementing models in live transaction environments, expanding model applicability across financial platforms, and developing advanced anomaly detection methodologies. This study underscores the pivotal role of machine learning in safeguarding financial transactions against fraud, with significant implications for consumers, financial institutions, and the broader financial ecosystem.
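
The abstract names the handling of imbalanced datasets as a key concern but does not say which technique was used. As a minimal sketch only, and not the authors' method, the plain-Python example below illustrates two general ideas: inverse-frequency class weights (the common "balanced" heuristic, n_samples / (n_classes * class_count)) and why accuracy is usually reported alongside precision and recall on skewed data. The function names and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels (illustration only, not the study's data):
# 990 legitimate transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10

# A trivial baseline that labels every transaction as legitimate.
y_pred_all_legit = [0] * 1000

weights = balanced_class_weights(y_true)
metrics = accuracy_precision_recall(y_true, y_pred_all_legit)

print("class weights:", weights)
print("accuracy, precision, recall:", metrics)
```

On this toy data the rare class receives a much larger weight than the common one, and the trivial baseline scores high accuracy while catching no fraud at all, which is why recall and precision on the fraud class are worth checking next to any accuracy figure.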

Keywords

Fraud Detection; Credit Cards; Machine Learning; Data Security; XGBoost; PySpark; Financial Security

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
