Preprint Article. Version 1. This version is not peer-reviewed.

Evolving Transparent Credit Risk Models: A Symbolic Regression Approach Using Genetic Programming

Version 1: Received: 4 October 2024 / Approved: 7 October 2024 / Online: 8 October 2024 (17:39:31 CEST)

How to cite: Sotiropoulos, D.; Koronakos, G.; Solanakis, S. V. Evolving Transparent Credit Risk Models: A Symbolic Regression Approach Using Genetic Programming. Preprints 2024, 2024100527. https://doi.org/10.20944/preprints202410.0527.v1

Abstract

Credit scoring is a cornerstone of financial risk management, enabling financial institutions to assess the likelihood of loan default. However, widely recognized contemporary credit risk metrics, such as FICO or Vantage scores, remain proprietary and inaccessible to the public. This study aims to devise an alternative credit scoring metric that mirrors the FICO score, using an extensive dataset from Lending Club. The challenge lies in the limited information available on both the precise analytical formula and the full set of credit-specific attributes that enter the FICO score's calculation. Our proposed metric relies on basic information provided by potential borrowers, eliminating the need for extensive historical credit data. We aim to express this credit risk metric in a closed analytical form of variable complexity. To achieve this, we employ a symbolic regression method based on Genetic Programming (GP), in which the principle of Occam's razor biases the evolutionary search towards simpler, more interpretable models. To assess our method's efficacy, we compare the approximation capabilities of GP-based symbolic regression with those of established machine learning regression models, namely Gaussian Support Vector Machines (GSVMs), Multi-Layer Perceptrons (MLPs), Regression Trees, and Radial Basis Function Networks (RBFNs). Our experiments indicate that GP-based symbolic regression achieves accuracy comparable to that of these benchmark methods. Moreover, the resulting analytical model offers insight into credit risk evaluation mechanisms, enabling stakeholders to make informed credit risk assessments. This study contributes to the growing demand for transparent machine learning models by demonstrating the value of interpretable, data-driven credit scoring models.

Keywords

Credit Risk Assessment; Neural Networks; Support Vector Machines; Genetic Programming; Radial Basis Function Networks

Subject

Computer Science and Mathematics, Computer Science
