Version 1
Received: 27 June 2024 / Approved: 28 June 2024 / Online: 1 July 2024 (09:24:46 CEST)
How to cite:
Theodorakopoulos, L.; Theodoropoulou, A.; Zakka, F.; Halkiopoulos, C. Credit Card Fraud Detection with Machine Learning and Big Data Analytics: A PySpark Framework Implementation. Preprints 2024, 2024070022. https://doi.org/10.20944/preprints202407.0022.v1
APA Style
Theodorakopoulos, L., Theodoropoulou, A., Zakka, F., & Halkiopoulos, C. (2024). Credit Card Fraud Detection with Machine Learning and Big Data Analytics: A PySpark Framework Implementation. Preprints. https://doi.org/10.20944/preprints202407.0022.v1
Chicago/Turabian Style
Theodorakopoulos, L., A. Theodoropoulou, Fotini Zakka, and Constantinos Halkiopoulos. 2024. "Credit Card Fraud Detection with Machine Learning and Big Data Analytics: A PySpark Framework Implementation." Preprints. https://doi.org/10.20944/preprints202407.0022.v1
Abstract
This paper presents a comprehensive study on applying machine learning (ML) techniques for real-time credit card fraud detection. With the increasing prevalence of credit card fraud, financial institutions face the challenge of detecting fraudulent transactions promptly and accurately. This research evaluates various ML models, including Logistic Regression, Decision Trees, Random Forests, XGBoost, and Deep Convolutional Neural Networks, for their effectiveness in identifying fraudulent activities. Utilizing PySpark for processing large-scale transaction data, the study highlights the importance of real-time analysis, adaptive learning, and handling imbalanced datasets. Key findings indicate that XGBoost, with its balance of accuracy and complexity, emerged as the most promising model, achieving an accuracy rate of 98% in detecting fraudulent transactions. The paper further discusses the potential of ensemble methods, graph-powered systems, and intelligent sampling in enhancing fraud detection capabilities. Challenges such as overfitting, data access, and the need for real-time analysis are addressed. Future research directions include implementing models in live transaction environments, expanding model applicability across financial platforms, and developing advanced anomaly detection methodologies. This study underscores the pivotal role of machine learning in safeguarding financial transactions against fraud, offering significant implications for consumers, financial institutions, and the broader financial ecosystem.
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.