1. Introduction
Atrial fibrillation (AF) is a common sustained arrhythmia in clinical practice; it can significantly impair quality of life and increase the risk of serious medical conditions, including stroke and heart attack [1]. The prevalence of AF increases with age [2]; therefore, as the population ages, the threat that AF poses to human health becomes increasingly severe. When AF occurs, the disorganized fibrillation of the atria reduces cardiac output and accelerates thrombus formation, which may block blood vessels and lead to life-threatening diseases such as ischemic stroke [3] and myocardial infarction [4].
To evaluate the risk of AF in its different phases and to intervene and treat it in time, AF is commonly divided into types. Classification according to the presentation, duration, and spontaneous termination of AF episodes has become a consensus in authoritative guidelines [5]; the corresponding five AF types are first-diagnosed AF, paroxysmal AF (PAF), persistent AF, long-standing persistent AF, and permanent AF. AF usually manifests first as PAF, which is defined as AF that terminates spontaneously or with intervention within 7 days of onset. There are many ways to manage and treat AF, including drug therapy, implanted medical devices, and radio-frequency ablation, but all of these methods carry potential risks. For example, drug therapy has been shown to be effective in patients with newly diagnosed AF, with a treatment success rate of about 50% [6,7], but in patients with persistent AF it may not only be less effective but also cause other arrhythmias and even fatal complications. Therefore, to prevent irreversible atrial lesions and the further deterioration of AF, early diagnosis of AF is particularly important.
The electrocardiogram (ECG) is a commonly used tool that assists cardiologists in diagnosing AF in clinical practice, and the development of accurate ECG-based predictors is important for designing high-performance models [5]. ECG episodes from normal and PAF subjects can be divided into three types: (A) an ECG episode of a normal subject at rest, (B) an ECG episode of a PAF subject when AF does not occur, and (C) an ECG episode of a PAF subject when AF occurs. Compared with (A), episodes of type (B) show occasional premature beats and subtle changes in the R-R interval, which could be used as predictors of the onset of PAF. Developing a PAF onset prediction model would therefore be significant for several reasons. First, when AF does not occur, it is difficult to distinguish the ECG of PAF subjects from that of normal subjects, so an automatic PAF onset prediction model could assist clinicians in the risk assessment of patients with PAF. Second, a positive prediction could prompt patients to receive timely interventions, such as drug therapy, which could effectively avoid the deterioration of AF in PAF subjects. Third, during postoperative follow-up after radio-frequency ablation, the model could also help assess the surgical effect [7].
In past decades, many PAF onset prediction algorithms based on machine learning methods have been proposed, most of them using extracted heart rate variability (HRV) features, including time-domain, frequency-domain, nonlinear, and time-frequency-domain features. The PhysioNet Computing in Cardiology Challenge 2001 was held in March 2001 [8], during which researchers proposed various methods to predict the onset of PAF, such as methods based on HRV features [9,10,11,12], the number of atrial premature contractions [13,14], rhythm-based heartbeat duration normalization [15], and P-wave morphology [16]. The publicly accessible PAF prediction challenge database (AFPDB) was also provided in this competition and can be used to train and test classification models. More recently, Mohebbi [17] extracted spectrum, bispectrum, and nonlinear features from 30-minute HRV signals and used a support vector machine (SVM)-based classifier to predict the onset of PAF, achieving a sensitivity of 96.3%. Boon [18] used a genetic algorithm to optimize the features extracted from 15-minute HRV signals and also used an SVM classifier, achieving an accuracy of 79.3%. In another study, they used shorter 5-minute HRV signals and achieved an accuracy of 87.7% [19]. Narin [20] also used 5-minute HRV signals for linear and nonlinear feature extraction with a k-nearest neighbors (KNN) classifier and further discussed the performance of the model for data segments in different time windows. Wang [21] improved the speed of the SVM algorithm and obtained 92.5% accuracy on a test set drawn from different databases, but the required signal length was 5 minutes and the generalization ability on clinical tests (87.0% accuracy) was unsatisfactory. Sutton [22] proposed PhysOnline, an open-source streaming physiological signal analysis platform, and demonstrated effective online prediction of PAF. Although the HRV analysis commonly used in PAF prediction studies is compatible with feature selection methods and machine learning classifiers, the extraction and selection of hand-crafted features are inevitably subjective, time-consuming, and labor-intensive. Most recently, several studies have reported AF detection algorithms using deep learning methods [23,24,25,26,27], showing better performance than feature extraction [28,29,30] and machine learning methods [31,32,33]. However, few studies on PAF prediction based on deep learning have been presented so far, and a prominent limitation of the machine learning based methods is poor real-time performance: the ECG signal used for HRV analysis usually lasts at least several minutes, which does not meet the requirements of real-time monitoring scenarios [34].
To address these challenges, this paper proposes a novel deep learning based method for real-time prediction of PAF onset, named PAFNet. This automated algorithm integrates a sliding window over the raw R-R intervals of ECG segments with an end-to-end convolutional neural network (CNN). This integration enables the CNN model to accommodate different sliding window sizes by altering only the input layer, and to make a new prediction with each new heartbeat. The algorithm aims to mitigate the limitations of traditional PAF prediction methods: subjectivity, poor real-time performance, and limited contextual understanding. Our experiments on several publicly accessible ECG databases show that the algorithm improves the accuracy and real-time performance of PAF prediction. The contributions of this paper are:
(1) We propose a novel automated algorithm for real-time prediction of PAF onset, which integrates a sliding window over the raw R-R intervals of ECG segments. This mechanism allows the sliding step to be adjusted easily for different application scenarios. We set the sliding step to 1 in this study to meet real-time monitoring requirements.
(2) We introduce a CNN model for end-to-end PAF prediction and classification with only raw R-R interval segments as input samples, which allows the whole system to automatically emphasize important information in the input data and avoids the subjectivity inherent in the hand-crafted features used by machine learning methods.
(3) By comparing the results obtained with different model input sizes, we found that an input of 100 R-R intervals gave the best overall prediction performance, whereas inputs of 50 and 200 R-R intervals were less efficient in terms of testing time.
(4) We carried out comprehensive comparative experiments using public datasets to validate the effectiveness of our model. The results demonstrate that our approach performs exceptionally well in PAF prediction tasks and holds promise for real-time applications.
The rest of the article is structured as follows. Section 2 presents the databases and the detailed methods. The methods are evaluated in Section 3. Section 4 discusses the analysis and results. Finally, Section 5 concludes this article by summarizing the achievements and stating possible future applications.
2. Materials and Methods
2.1. Databases
As listed in Table 1, we used AFPDB for training and validating PAFNet, while the MIT-BIH Atrial Fibrillation Database (AFDB) and the MIT-BIH Normal Sinus Rhythm Database (NSRDB) were used to test the model's performance and generalization ability. These publicly accessible databases are available from PhysioNet [34] and contain two ECG channels each. As both channels were recorded simultaneously and carry the same RR interval information, we used only a single lead to derive the RR interval sequence.
Figure 1 (A) shows the learning set of AFPDB, which contains three types of labeled ECG records: the PAF normal (PAFN) type, which is at least 45 minutes away from any AF episode; the PAF onset (PAFO) type, which immediately precedes the onset of AF; and the normal (N) type. Each record lasts 30 minutes. To predict the onset of PAF at least 45 minutes in advance, we used 25 PAFN-type ECG records and 25 N-type ECG records.
For the test databases, AFDB includes 25 long-term ECG records from subjects with AF (mostly PAF), and NSRDB includes 18 long-term ECG records from subjects with no significant arrhythmia. We extracted PAFN-type records from AFDB using the same protocol as for AFPDB, excluding AF segments shorter than 5 minutes, atrial flutter segments, and atrioventricular junctional rhythm segments. As a result, we extracted 12 PAFN-type ECG records and 18 N-type ECG records from these two databases, with each record lasting 30 minutes.
2.2. R-R Intervals of ECG Segments
Pre-processing of ECG records through filtering is crucial for improving signal quality and the accuracy of R-wave location. To reduce different types of noise interference, we applied a series of digital filters. First, we used a band-pass filter with a pass band of 0.1 Hz to 100 Hz to remove noise outside the useful frequency range. Next, we removed the baseline drift using a median filter with a window size of 0.85 times the sampling frequency. Finally, we used a fourth-order low-pass filter to further eliminate high-frequency noise.
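The baseline-removal step can be illustrated with a minimal sketch. It assumes that the drift is estimated by a running median and then subtracted from the signal, which is a common way of using a median filter for this purpose but is not stated explicitly in the text. The function name `remove_baseline`, the odd-length window, and the shrinking of the window at the record edges are illustrative assumptions; only the ratio of 0.85 times the sampling frequency comes from the description above.

```python
from statistics import median


def remove_baseline(signal, fs, window_ratio=0.85):
    """Subtract a running-median baseline estimate from an ECG signal.

    window_ratio=0.85 follows the text (window = 0.85 x sampling frequency).
    Forcing an odd window and shrinking it at the edges are assumptions.
    """
    window = max(1, int(round(window_ratio * fs)))
    if window % 2 == 0:
        window += 1  # odd length so the window is centred on each sample
    half = window // 2
    cleaned = []
    for i, value in enumerate(signal):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        baseline = median(signal[lo:hi])  # local drift estimate
        cleaned.append(value - baseline)
    return cleaned
```

A narrow peak such as an R-wave occupies only a few samples of the window and therefore barely affects the median, whereas a slow offset is captured by the median and removed. For long records, an optimized routine from a signal-processing library would be preferable to this quadratic-time loop.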
Figures 1 (B) and (C) demonstrate the data segmentation procedure. The R-R interval sequence of PAFN subjects fluctuates more than that of normal subjects. After pre-processing, we located the R-waves of each ECG record using the difference threshold algorithm and derived the R-R interval sequence as
$$RR_i = R_{i+1} - R_i, \quad i = 1, 2, \ldots, M,$$
where $RR_i$ represents the value of the i-th RR interval, $R_i$ represents the time index of the i-th R-wave, and the index i ranges from 1 to M when the ECG record contains (M+1) R-waves.
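As a minimal sketch of this derivation, the function below turns a list of R-wave positions into the R-R interval sequence by differencing consecutive positions. The name `rr_intervals` is hypothetical, and dividing by the sampling frequency `fs` to obtain seconds is an assumption made here because the model input is described as interval durations in seconds.

```python
def rr_intervals(r_peak_samples, fs):
    """Return R-R intervals in seconds from R-wave sample indices.

    M + 1 R-waves give M intervals: each interval is the difference
    between consecutive R-wave positions, divided by fs (assumption)
    to convert samples to seconds.
    """
    return [(later - earlier) / fs
            for earlier, later in zip(r_peak_samples, r_peak_samples[1:])]


# Four R-waves sampled at 128 Hz give three intervals.
example = rr_intervals([0, 128, 256, 400], fs=128)
```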
We then applied a sliding window of size N to each R-R interval sequence. The window moved along the sequence from start to end and yielded a segment of N R-R intervals at each position. The sliding step can be adjusted according to the application scenario. In this study, we set the sliding step to 1, both to meet real-time processing requirements and to generate the large amount of data needed for training deep learning models. With a sliding step of 1, we derived (M-N+1) RR interval segments from a sequence of M RR intervals.
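The segmentation can be sketched in a few lines. The function name `sliding_segments` and its argument names are illustrative; the behaviour follows the description above, in which a window of N intervals advanced by a step of 1 over M intervals yields M-N+1 segments.

```python
def sliding_segments(rr_sequence, window_size, step=1):
    """Cut an R-R interval sequence into overlapping fixed-length segments."""
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive integers")
    last_start = len(rr_sequence) - window_size
    # range() is empty when the sequence is shorter than the window.
    return [rr_sequence[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# M = 10 intervals and N = 4 give M - N + 1 = 7 segments with step 1.
segments = sliding_segments([0.8, 0.82, 0.79, 0.81, 0.8,
                             0.78, 0.83, 0.8, 0.79, 0.81], window_size=4)
```

With a step of 1, every new heartbeat completes one new segment, which is what allows a fresh prediction per beat; a larger step trades update rate for fewer model evaluations.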
2.3. Architecture of the PAFNet Model
In this study, we explored a real-time and accurate method for predicting the onset of PAF at least 45 minutes in advance by developing a 1D CNN model. Unlike methods that rely on manually extracted HRV features and traditional machine learning classifiers, end-to-end deep learning techniques avoid the need for hand-crafted feature extraction, thus reducing the loss of ECG information and the limitations of prior knowledge. Among these techniques, CNN is well-suited for image processing and automatic feature extraction, making it ideal for image classification and identification [35]. Similarly, the ECG signal and RR interval sequence contain abundant overall and partial information that can be automatically extracted using CNN to identify specific diseases. The CNN model can extract high-level feature maps from 1D signals, enabling accurate identification of specific patterns related to the onset of PAF.
As shown in Figure 2, PAFNet consists of 26 layers, including 5 convolutional layers. The input size is the same as the sliding window size, with each sample represented by a matrix of one row and N columns, where the column number is the index of the RR interval and the value is the corresponding RR interval duration in seconds. The 1D convolutional layer, batch normalization layer, activation layer, and 1D maximum pooling layer are grouped into one block, referred to as a CBAP layer. The convolutional layer automatically extracts feature maps using convolution kernels, while the batch normalization layer accelerates training and improves accuracy. The activation layer increases the non-linearity of the model, and the pooling layer reduces the scale of the feature map. The flatten layer converts all feature maps into one row as input to the dense layer for the final prediction. The output represents the binary prediction result of PAFNet.
Table 2 displays the details of the PAFNet's architecture, including hyperparameters and activation functions used. The size of the input layer depends on the size of the sliding window, and the size of the output layer is set to 1, which represents the probability that the corresponding sample is PAFN. For the CBAP layer, a convolutional kernel size of 100 and a stride of 16 were selected, and the padding method was set to 'valid'. The ReLU function was used as the activation function. A pooling kernel size of 2 and a stride of 2 were selected to halve the scale of each feature map. The number of output feature maps was set to 16, 32, 64, and 128 for the four CBAP layers, respectively. The size of the flatten layer also depends on the size of the sliding window, and the node number of the first dense layer is set to 2,048. A dropout ratio of 0.5 was used to randomly deactivate half of the nodes during each iteration. The training epoch was set to 9, and the batch size was set to 512.
2.4. Training and Optimization of the PAFNet Model
After determining the model's structure, the next step was to optimize the size of the sliding window. Three types of evaluation metrics were used to assess the model's performance with different input sizes, including testing time per batch. A small input size may result in decreased performance due to insufficient ECG information captured by the sliding window, while a larger input size includes more details but may require more testing time per batch.
Because the total number of samples in the databases is limited, a stratified ten-fold cross-validation strategy was used to optimize and evaluate the performance of the PAFNet model during the training and testing procedures. With a sliding window size of 100, the training dataset yielded 113,281 RR interval segments, consisting of 56,381 PAFN-type and 56,900 N-type R-R interval segments. To train PAFNet, all segments were randomly divided into ten parts, with nine parts used for training and one part used for validation. This resulted in ten CNN models being trained and saved; during the testing procedure, the prediction result of PAFNet was obtained by averaging the prediction results of these ten models. The model's performance was evaluated using the receiver operating characteristic (ROC) curve, which compared the prediction results obtained when using samples of different spans before the onset of PAF as input.
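The cross-validation and averaging procedure can be sketched as follows. This is an illustration under stated assumptions rather than the training code used in the study: the helper names `stratified_folds`, `train_validation_splits`, and `ensemble_probability` are hypothetical, stratification is implemented by shuffling each class separately and dealing its samples round-robin into the folds, and the ensemble output is taken as the plain arithmetic mean of the per-model probabilities.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal round-robin
    return folds


def train_validation_splits(labels, k=10, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        training = [i for f, fold in enumerate(folds)
                    if f != held_out for i in fold]
        yield training, folds[held_out]


def ensemble_probability(fold_probabilities):
    """Average the probabilities that the k fold models give one sample."""
    return sum(fold_probabilities) / len(fold_probabilities)
```

Each of the ten splits would train one model on nine parts and validate it on the remaining part. At test time, a sample is passed through all ten saved models and the averaged probability is thresholded to obtain the binary prediction.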
2.5. Evaluation Protocols
The ability of PAFNet to predict the onset of PAF was evaluated quantitatively using sensitivity (Sen), specificity (Spe), and accuracy (Acc). The numbers of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP) were counted with the PAFN type as positive and the N type as negative. Based on these counts, Sen, Spe, and Acc were calculated as follows:
$$\mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{Acc} = \frac{TP + TN}{TP + FN + TN + FP}.$$
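A short sketch of these metrics is given below, using the standard definitions of sensitivity, specificity, and accuracy. The function names are illustrative, and the encoding of PAFN as label 1 and N as label 0 is an assumption that mirrors the positive and negative classes named above.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP with 1 = PAFN (positive) and 0 = N (negative)."""
    tp = fn = tn = fp = 0
    for truth, prediction in zip(y_true, y_pred):
        if truth == 1 and prediction == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif prediction == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sen_spe_acc(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy as fractions."""
    sen = tp / (tp + fn)   # share of PAFN samples correctly detected
    spe = tn / (tn + fp)   # share of N samples correctly rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sen, spe, acc


counts = confusion_counts([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
sen, spe, acc = sen_spe_acc(*counts)
```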
3. Results
In this study, the training and testing of the PAFNet model were conducted using the TensorFlow 2.3.0 deep learning framework on a desktop computer equipped with an Intel(R) Core(TM) i9-10900KF CPU @ 3.70 GHz and 64 GB of memory. To accelerate processing and reduce training and testing time, an NVIDIA GeForce RTX 3080 GPU with 10 GB of memory was also used.
Table 3 summarizes the results of the model input size optimization, where three models were trained using input sizes of 50, 100, and 200 R-R intervals, denoted as M1, M2, and M3, respectively. The total parameter count of each model is in the range of millions. The testing results of these models showed that M2 achieved the highest Sen, Spe, and Acc, with values of 89.92%, 93.24%, and 91.96%, respectively. Notably, M2 exhibited a 4% increase in Sen and a nearly 1% increase in Spe compared to M1 and M3. Accordingly, the Acc of M2 increased by nearly 2%, indicating an overall improvement in model performance. In terms of testing time, M2 was the most efficient, taking only 9.3 milliseconds to process a batch of data (i.e., 512 samples), whereas M1 and M3 took 13.8 milliseconds and nearly 30 milliseconds, respectively. Based on these results, the input size of 100 was selected, and M2 was identified as the optimized model.
Table 4 presents the results of ten-fold cross-validation of the PAFNet model using 100 R-R intervals as input (M2 in Table 3). The sliding window size was set to 100, with a sliding step of 1, as described in Section 2. During the ten-fold cross-validation, the 113,281 training and validation samples were randomly shuffled and divided into 10 parts, with each part used once for validation and nine times for training. The fifth and seventh folds show the highest accuracy of 100.00%, while the first fold shows the lowest accuracy of 87.16%, indicating considerable variability across folds. Notably, the average validation Sen, Spe, and Acc, at 97.12%, 97.77%, and 97.45%, respectively, are substantially higher than the corresponding testing results for M2 in Table 3.
Figure 3 and Figure 4 present the results of database-level testing to evaluate the generalization ability of the PAFNet model.
Figure 3 shows the prediction accuracy of PAFNet, trained on AFPDB and tested on AFDB and NSRDB. The evaluation also includes input samples taken at different spans before the onset of PAF. The horizontal axis represents the span, starting 75 minutes before PAF onset, while the vertical axis represents the prediction accuracy. The resulting curve indicates that the accuracy fluctuates around 85% and does not change significantly as the span of the sample varies.
Figure 4 depicts the receiver operating characteristic (ROC) curves of the ten models obtained from the stratified ten-fold cross-validation. The bold blue curve denotes the ROC curve of the average prediction results. The proposed PAFNet achieved high performance for both positive and negative samples, and the mean area under the curve (AUC) was about 0.93, with AUC values ranging from 0.91 to 0.97 across folds.
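For reference, the AUC reported for each fold can be computed directly from predicted probabilities without drawing the curve. The sketch below relies on the fact that the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the label encoding (1 = PAFN, 0 = N) are assumptions made for illustration.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def mean_auc(per_fold_auc):
    """Average the AUC values obtained in the individual folds."""
    return sum(per_fold_auc) / len(per_fold_auc)
```

The pairwise loop is quadratic in the number of samples, so a rank-based implementation from a statistics library would be the practical choice for a test set of this size.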