1. Introduction
The progress in on-line banking systems has motivated the use of credit cards to ease the payment for products and services. Credit cards are payment cards issued to cardholders that allow them to purchase products and services on credit, accumulating a debt that is repaid later. In most cases, before a credit card payment is approved, some vital information is requested from the cardholder, such as the personal identification number (PIN), card verification value (CVV), expiration date and so on. This information helps to validate the authenticity of the cardholder and to prevent fraud; yet cases of credit card fraud are recorded daily. Nonetheless, credit cards are becoming increasingly popular and widely used for on-line banking payments, and the number of credit card transactions has grown exponentially despite the associated risk of theft.
This growth in the acceptance of credit cards has made it difficult to differentiate between fraudulent and non-fraudulent credit card transactions. Accordingly, credit card theft is on the increase irrespective of the security information (such as the PIN and CVV) required before a transaction is approved. Although security measures such as tokenization and data encryption are used to prevent credit card theft [13], they cannot completely prevent fraudulent credit card transactions. In particular, credit card fraud often occurs remotely, where only basic card details are required; such transactions do not call for a PIN, card imprint or handwritten signature. Most times, the victims are unaware that the perpetrators have access to their credit card information, especially when the cards have been used for payment on phishing websites [5, 26]. The quickest way to spot credit card fraud is to examine each card's spending patterns and look for deviations from its regular pattern. However, this is difficult to achieve because the daily number of credit card transactions is huge, resulting in large volumes of transactional data. As a result, there has been considerable recent research into the accurate, rapid and efficient prediction of credit card fraud.
In recent times, machine learning (ML) tools have been deployed in the literature to predict credit card fraud efficiently. ML tools enable computers to improve their forecasting abilities by learning from previous datasets. Different ML tools such as hidden Markov models (HMMs), decision trees, K-nearest neighbours, logistic regression and so on have been deployed for the prediction of credit card fraud [11, 12, 14, 25, 32]. However, work is ongoing to improve the predictive ability of these ML tools.
The HMM, in particular, is a popular and flexible ML tool that can easily model randomly changing data. HMMs can predict fraudulent transactions from a sequence of defined observations. Nonetheless, the ability of the HMM to predict credit card fraud effectively depends on the adopted feature extraction technique: the more reliable the output of the feature extraction technique, the better the prediction performance of the HMM [3, 18, 20]. Likewise, the length of the output feature vector, which also sets the dimension of the HMM, determines the computational time complexity imposed on the HMM: the longer the feature vector, the greater the computational time complexity [17, 19]. Therefore, it is paramount to carefully select the feature extraction technique that is combined with the HMM, so as to balance the trade-off between performance gain and computational time complexity.
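As a rough guide to this trade-off, the standard cost of one Viterbi decoding pass (or one Baum-Welch iteration) can be written out. The expression assumes T observations, N hidden states, K Gaussian mixture components per state and d-dimensional feature vectors with full covariance matrices; these symbols are introduced here for illustration and are not the article's notation.

```latex
\underbrace{\mathcal{O}\!\left(T N^{2}\right)}_{\text{state recursion}}
\;+\;
\underbrace{\mathcal{O}\!\left(T N K d^{2}\right)}_{\text{emission densities}}
\;+\;
\underbrace{\mathcal{O}\!\left(N K d^{3}\right)}_{\text{one-off covariance factorisation}}
```

Under these assumptions the emission term grows quadratically in d (linearly if diagonal covariances are used), so shortening the feature vector directly reduces the cost of every training and decoding pass.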
Accordingly, this article analyses two feature extraction techniques that can be combined with the HMM for the prediction of credit card fraud. First, the principal component analysis (PCA) technique [1, 27] is combined with the HMM. In addition, this article computationally determines the "optimal" feature vector length that balances the trade-off between the performance gain of the PCA and the computational time complexity it imposes on the HMM. It was found that this "optimal" feature vector length is not computationally efficient when the PCA is combined with the HMM. Therefore, the features derived using the PCA are converted to statistical features to reduce the computational time complexity imposed on the HMM. These proposed robust yet simple statistical features, denoted MRE (Mean, Relative Amplitude and Entropy), are merged to form a feature vector that can be combined with the HMM to predict credit card fraud effectively.
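A minimal NumPy sketch of PCA feature extraction is given below. It centres the data, obtains the principal directions from the singular value decomposition and keeps the k leading components. Selecting k with a cumulative explained-variance threshold is an assumption made for illustration (the article determines its "optimal" length computationally), and the function name `pca_features` is hypothetical. The MRE statistics are not sketched because their exact definitions are given in Section 4.

```python
import numpy as np


def pca_features(X, var_threshold=0.95):
    """Project rows of X onto the k leading principal components.

    k is the smallest number of components whose cumulative explained
    variance reaches var_threshold (a hypothetical selection rule).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre each feature
    # Rows of Vt are principal directions; s**2 is proportional to variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))                              # guard against rounding
    components = Vt[:k]                             # (k, d) projection matrix
    return Xc @ components.T, components, mean, k
```

New transactions would be projected with the training statistics, e.g. `(X_new - mean) @ components.T`, so that training and test features share the same basis.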
Furthermore, as highlighted in [19, 21], the Gaussian emission distribution parameters of the HMM are sensitive to a flat start or random initial values, which impedes prediction performance. Hence, the K-means clustering (K-MC) technique [9, 15] and the Gaussian mixture model (GMM) [7, 23] are used sequentially to initialise the HMM process. Since the K-MC and GMM techniques are embedded in the Gaussian emission process of the HMM, the resulting model is referred to as an ensemble hidden Markov model (EHMM) in this article. This article evaluates the performance of the PCA-EHMM and MRE-EHMM using the credit card transactions dataset of European cardholders¹, gathered over two days in September 2013. The results are reported using different performance metrics such as recall/sensitivity, specificity, precision and F1-score.
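The four metrics named above follow the usual confusion-matrix definitions, with fraud (label '1' in the dataset) as the positive class. A minimal Python sketch, in which the function name and the convention of returning 0.0 when a denominator is zero are assumptions:

```python
def fraud_metrics(y_true, y_pred):
    """Recall (sensitivity), specificity, precision and F1-score.

    Labels: 1 = fraudulent (positive class), 0 = non-fraudulent.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # frauds missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}
```

Because fraudulent transactions are rare, recall and precision on the fraud class are usually more informative than overall accuracy.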
The contributions and importance of this article are as follows. This article develops two techniques based on the EHMM that can be used to predict fraudulent credit card transactions effectively. This EHMM approach differs from the regular HMM approach to credit card fraud prediction in the literature, which makes it novel. The results obtained from these two techniques are encouraging and can easily be reproduced for real-time credit card transactions. Furthermore, it is hoped that this article will save cardholders and financial institutions money on a daily basis, as well as increase their confidence in on-line banking that involves credit card information.
The remainder of this article is organised as follows. Section 2 briefly reviews some recent works on the prediction of credit card fraud using the HMM. In Section 3, the dataset used for result verification is discussed. The PCA and MRE feature extraction techniques are explained in Section 4. Section 5 discusses the EHMM in detail and explains its training process. In Section 6, the results obtained from the PCA-EHMM and MRE-EHMM are presented and discussed; this section also explains the performance metrics used in analysing the results. Section 7 concludes the article with final remarks.
2. Related Work
The widespread use of credit cards for on-line purchases has also increased the possibility of fraud. For instance, according to the 2018 annual report of the Nigerian Deposit Insurance Corporation (NDIC), between 2016 and 2018 the number of credit card fraud incidents in Nigeria increased by 33%, while the actual amount lost to credit card fraud climbed by 84% [22]. Likewise, the Federal Trade Commission (FTC) affirmed that there were around 1579 data breaches totalling 179 million data points, with credit card fraud being the most widespread [6, 12]. Therefore, a wide range of ML and data mining techniques have been deployed in the literature to address this menace. With respect to ML techniques, this section reviews some recent and related credit card fraud prediction techniques based on the principle of the HMM.
In [10], the sequence of operations in credit card transactions is modelled using the HMM, and the paper provides information on how HMMs can be used to detect fraudulent credit card transactions. Yet, the paper does not document any clear results to support the performance of the developed HMM. More importantly, the paper does not provide information on the feature extraction technique used with the developed HMM. As mentioned earlier, the feature extraction technique used with any ML technique, including the HMM, determines the performance of that technique; hence, it is the focus of this article.
The authors in [8] modelled a fraud detection system that attempts to detect credit card fraud as accurately as possible by producing clusters from the dataset and analysing them for anomalies. Their work examined the detection accuracy of two hybrid techniques: the K-MC technique with a multilayer perceptron (MLP) and the K-MC technique with the HMM. The authors show that the detection accuracies of the two models are fairly similar. Nonetheless, the paper was silent on the adopted feature extraction technique, which is a major factor in analysing the performance of their proposed models.
The process of fraud detection using the HMM was described in [31]. In that work, the HMM process was combined with the K-MC technique to form what this article describes as an EHMM; the K-MC is used to initialise the HMM process to improve its performance. The model's performance was presented in terms of recall and precision. Although their model training flowchart displays the importance of the feature extraction step, there was no discussion of the adopted feature extraction technique. This article combines the K-MC technique and GMM sequentially with the HMM to form an EHMM and focuses on the adopted feature extraction technique, which determines the performance of the EHMM and any other ML tool.
A credit card fraud detection model is developed using a multiple-perspective HMM-based approach in [14]. The study develops eight HMMs to model sequences of credit card transactions, employing history-based features with the HMMs based on three perspectives, each aided by different assumptions. Their results are reported in terms of recall and precision. In contrast, this article derives the feature vector using PCA and a simple statistical method termed MRE. Moreover, an EHMM is used in this article, whose performance is expected to surpass that of the traditional HMM method.
5. Ensemble Hidden Markov Model (EHMM)
The HMM is a probabilistic sequence classifier that assigns a tag to each observational segment in a sequence. To do so, it determines the probability distribution over the set of observations and produces the most probable sequence of hidden states (tags) [29]. Because of this, it can readily model and categorise the set of observations derived from credit card transactions. The HMM operates in two consecutive stages: the training stage and the detection stage. During training, the HMM estimates three major sets of parameters: (i) the start probabilities, (ii) the transition matrix and (iii) the Gaussian emission distribution parameters. The transition probability matrix represents the extracted feature vector as a series of states across time; it comprises probability values that govern switching between states. The transition matrix can be computed using maximum-likelihood estimation (MLE) techniques such as the Baum-Welch algorithm [4]. The Gaussian emission distribution parameters, namely M and ∑ (as defined above) and the mixture weights, are also estimated in the MLE process. These Gaussian parameters take random or flat values at the start of the MLE process. However, HMMs are sensitive to such flat or random starting values of M, ∑ and the mixture weights, which limits the prediction performance of the model in general. Accordingly, the K-MC technique [9, 15] and the Gaussian mixture model (GMM) [7, 23] are used sequentially to initialise the Gaussian emission parameters. As a result, this article refers to the HMM as an ensemble hidden Markov model (EHMM), because the K-MC and GMM techniques are embedded in the Gaussian emission process as shown in Figure 1.
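The section does not spell out how the K-MC output is handed to the GMM, so the sketch below assumes one common recipe: K-means hard assignments provide initial means, diagonal variances and weights, and a few EM iterations of a diagonal-covariance GMM refine them before they seed the HMM's Baum-Welch training. The function names, the diagonal-covariance choice and the regulariser `reg` are assumptions, not the article's implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns centroids and hard labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                  # nearest centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def init_gaussian_emissions(X, n_mix, n_em=20, reg=1e-6, seed=0):
    """K-means seeds a diagonal-covariance GMM, which EM then refines.

    Returns (M, Sigma, w): means (n_mix, d), variances (n_mix, d) and
    mixture weights (n_mix,), usable as starting emission parameters.
    """
    X = np.asarray(X, dtype=float)
    M, labels = kmeans(X, n_mix, seed=seed)
    # Step 1: per-cluster K-means statistics become the initial GMM.
    Sigma = np.array([X[labels == j].var(axis=0) + reg
                      if np.sum(labels == j) > 1 else X.var(axis=0) + reg
                      for j in range(n_mix)])
    w = np.bincount(labels, minlength=n_mix) / len(X)
    for _ in range(n_em):
        # E-step: log N(x | M_j, diag(Sigma_j)) + log w_j per point/component.
        log_p = (-0.5 * (((X[:, None, :] - M) ** 2) / Sigma
                         + np.log(2 * np.pi * Sigma)).sum(axis=2)
                 + np.log(w + 1e-300))
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(X)
        M = (resp.T @ X) / nk[:, None]
        Sigma = np.maximum((resp.T @ X**2) / nk[:, None] - M**2, 0.0) + reg
    return M, Sigma, w
```

Seeding EM from K-means rather than random values is what removes the flat-start sensitivity described above; the same idea is exposed in some libraries as a "kmeans" initialisation option.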
To carry out detection, the extracted feature vector f of the unknown transaction is matched with the Gaussian emission parameters (M, ∑ and the mixture weights) to output a modified feature vector. Subsequently, the start probabilities, the transition matrix and the modified feature vector are used in the Viterbi algorithm [30] to predict the class of the transaction. The Viterbi algorithm outputs the hidden-state path with the highest probability, accounting for all possible hidden paths through dynamic programming rather than enumerating them one by one. This straightforward HMM approach, using MLE for training and the Viterbi algorithm for detection, has been employed to predict credit card fraud in the literature. Nevertheless, the reliability of the extracted feature vector determines the performance of the EHMM: the more reliable the feature vector, the better the performance of the EHMM. Hence, this article focuses on the feature extraction technique.
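The decoding step can be illustrated with the textbook log-domain Viterbi recursion. This is a generic sketch rather than the article's implementation: it assumes the emission term arrives as a (T, N) array of per-state log-likelihoods (for the EHMM these would come from the Gaussian mixtures), and all names are hypothetical. Keeping only the best score per state at each step is what lets the algorithm account for every possible path in O(TN²) time.

```python
import numpy as np


def viterbi(log_start, log_trans, log_emit):
    """Most probable hidden-state path, computed in the log domain.

    log_start: (N,)   log start probabilities
    log_trans: (N, N) log transition matrix, rows = from-state
    log_emit:  (T, N) log emission likelihood of each observation per state
    """
    T, N = log_emit.shape
    delta = log_start + log_emit[0]           # best score ending in each state
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (from, to) candidate scores
        backptr[t] = scores.argmax(axis=0)    # best predecessor per state
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):             # trace the best path backwards
        path[t - 1] = backptr[t, path[t]]
    return path, delta.max()
```

Working in logarithms avoids the numerical underflow that multiplying many small probabilities would cause over long transaction sequences.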
5.1. EHMM Training
The credit card dataset, which includes 284,807 transactions, is divided into two portions: the model is tested on a small subset of the dataset, while the majority is used for training. The training portion is further divided into two groups: fraudulent and non-fraudulent transactions. In the dataset, fraudulent transactions are labelled '1' while non-fraudulent transactions are labelled '0'. As a result, two HMMs, each with its own start probabilities, transition matrix and Gaussian emission parameters, are created to represent fraudulent and non-fraudulent transactions respectively. In each case, a four-state ergodic HMM with two mixture weights is used. The dimension of the HMM is dictated by the dimension of the extracted feature vector: the MRE-EHMM assumes 5 dimensions, whereas the PCA-EHMM assumes k dimensions.
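The data preparation described above can be sketched as follows. The sketch assumes a feature matrix `X` and label vector `y` (1 = fraudulent, 0 = non-fraudulent), holds out a test fraction from each class separately so that the rare fraud class also appears in the test set (a stratification choice made here, not stated in the article), and uses a 20% hold-out only as a placeholder. `ergodic_init` gives a uniform, fully connected starting point for a four-state model that Baum-Welch training would then refine; all names are hypothetical.

```python
import numpy as np


def split_by_class(X, y, test_fraction=0.2, seed=0):
    """Stratified hold-out, then split the training rows by label.

    Returns (X_fraud_train, X_legit_train, X_test, y_test).
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    train, test_idx = {}, []
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.append(idx[:n_test])       # held-out rows of this class
        train[label] = X[idx[n_test:]]      # training rows of this class
    test_idx = np.concatenate(test_idx)
    return train[1], train[0], X[test_idx], y[test_idx]


def ergodic_init(n_states=4):
    """Uniform start vector and fully connected (ergodic) transition matrix."""
    start = np.full(n_states, 1.0 / n_states)
    trans = np.full((n_states, n_states), 1.0 / n_states)
    return start, trans
```

Each class-specific training set would then be used to fit its own four-state model, starting from `ergodic_init()` and the K-MC/GMM emission initialisation.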
During testing, the two HMMs are merged and passed to the Viterbi algorithm. This combined HMM is an eight-state model with four mixture weights: states 1–4 represent fraudulent transactions, while states 5–8 represent non-fraudulent transactions. The combined Gaussian emission parameters are used to adjust the feature vector of the unknown card transaction to be predicted. The Viterbi algorithm then uses the combined model parameters and this modified feature vector to forecast the sequence of states, thereby predicting whether the unknown card transaction is fraudulent. The combined model also switches between the two groups of states (states 1–4 and 5–8) with equal probability, as defined in its transition matrix.
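One way to realise the merged eight-state model is sketched below, using 0-based indices (states 0–3 for fraud and 4–7 for non-fraud, matching states 1–4 and 5–8 in the text). Splitting the start probability equally between the two sub-models, the symmetric cross-model `switch` probability and the majority-vote decision rule in `label_from_path` are assumptions made for illustration; the emission parameters of the two sub-models could be stacked state by state in the same order.

```python
import numpy as np


def merge_hmms(start_f, trans_f, start_n, trans_n, switch=0.0):
    """Combine a fraud HMM (states 0-3) and a non-fraud HMM (states 4-7).

    Start mass is split equally between the two sub-models; `switch` is a
    hypothetical probability of jumping from one sub-model to the other.
    """
    nf, nn = len(start_f), len(start_n)
    start = np.concatenate([0.5 * np.asarray(start_f),
                            0.5 * np.asarray(start_n)])
    trans = np.zeros((nf + nn, nf + nn))
    trans[:nf, :nf] = (1 - switch) * np.asarray(trans_f)  # fraud block
    trans[nf:, nf:] = (1 - switch) * np.asarray(trans_n)  # non-fraud block
    # Spread the switching mass uniformly over the other sub-model's states,
    # so every row still sums to one.
    trans[:nf, nf:] = switch / nn
    trans[nf:, :nf] = switch / nf
    return start, trans


def label_from_path(path, n_fraud_states=4):
    """1 (fraud) if most decoded states belong to the fraud sub-model."""
    path = np.asarray(path)
    return int(np.mean(path < n_fraud_states) > 0.5)
```

The decoded path would come from a Viterbi pass over the merged model; with the strict `> 0.5` rule, a tie is labelled non-fraudulent.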