1. Introduction
Cardiovascular disorders (CVD) and valvular heart diseases (VHD) are the leading cause of mortality worldwide [1]. Recent studies have shown that AI has potential for health monitoring; automating these tasks could free up clinicians for more complex work and improve access to health care [2]. Heart sound classification is an essential task for diagnosing and monitoring heart conditions.
VHD is a growing public health problem that should be addressed with appropriate resources to improve diagnosis and treatment [3]. Aortic stenosis (AS) is the most common valvular disorder in developed countries, affecting 9 million people worldwide [4], and its prevalence is rising with the aging of the population and the increasing prevalence of atherosclerosis. It is characterized by a harsh systolic ejection murmur that is best heard at the right upper sternal border [5]. Mitral regurgitation (MI) is one of the most common heart valve disorders worldwide, with an estimated prevalence of 1.7% [6]; it is characterized by a holosystolic murmur that is best heard at the apex and radiates to the axilla. Mitral stenosis (MS) is a common disease that accounts for a large burden of illness throughout the world. It is more frequent in developing countries, but atypical forms are becoming more common in developed countries [7]. It is characterized by a low-pitched diastolic rumble that is best heard at the apex with the patient in the left lateral decubitus position. Mitral valve prolapse (MVP) is a frequent condition that affects 2-3% of the general population; it is characterized by a mid-to-late systolic click followed by a late systolic murmur that is best heard at the apex and radiates to the axilla [3,8]. These common heart valve disorders differ in their characteristics and worldwide prevalence, and each has a distinctive auscultation finding that aids diagnosis.
Phonocardiography (PCG) is a diagnostic technique that analyzes heart sounds acquired at the chest wall to determine whether the heart is functioning normally or whether further diagnosis is required. Skilled cardiologists typically analyze these sounds, which result from muscle contractions and heart valve closure. However, this process can be affected by factors such as environmental noise, the limited audible frequency range, and the medical examiner’s expertise [9,10]. Given these problems, technology-aided solutions for analysis and classification have been developed over the last few years to support a better diagnosis. Based on previous work, the classification of PCG signals can be divided into three stages: preprocessing, feature extraction, and classification. Cochleagrams, spectrograms, mel-spectrograms, and scalograms are time-frequency representations (TFRs) that provide a way to visualize the frequency content of PCG signals over time. A cochleagram is a TFR that simulates the cochlea in the inner ear, which is responsible for frequency analysis in the auditory system [11,12]. A spectrogram is a TFR that applies a Fourier transform to successive short windows of the signal, converting it from the time domain to the frequency domain [13]. A mel-spectrogram is a TFR that uses a mel-scale filterbank to approximate the nonlinear human auditory system [14]. A scalogram is a TFR that uses a wavelet transform to provide a high resolution in both time and frequency [15]. The mathematical differences between these representations motivate a benchmark.
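To make the relationship between two of these representations concrete, the following is a minimal NumPy sketch, not the pipeline used in this work. It computes a power spectrogram with a Hann-windowed short-time Fourier transform and then applies a triangular mel-scale filterbank to obtain a mel-spectrogram. The sampling rate, window length, hop size, number of mel bands, and the synthetic 100 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def stft_power(x, n_fft=256, hop=64):
    """Power spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2  # shape: (n_fft // 2 + 1, n_frames)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=32):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, centre, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

# Hypothetical stand-in for a PCG recording: 2 s of a 100 Hz tone sampled at 2 kHz.
sr, n_fft, hop = 2000, 256, 64
t = np.arange(2 * sr) / sr
signal = np.sin(2 * np.pi * 100.0 * t)

spec = stft_power(signal, n_fft, hop)            # linear-frequency TFR
mel_spec = mel_filterbank(sr, n_fft) @ spec      # mel-frequency TFR
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
peak_hz = freqs[spec.mean(axis=1).argmax()]      # frequency bin with the most energy
print(spec.shape, mel_spec.shape, peak_hz)
```

The mel-spectrogram is a linear re-binning of the spectrogram's frequency axis, which is why the two differ only in frequency resolution and not in time resolution.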
The use of bicubic and Lanczos interpolation methods for resizing the time-frequency representations is a significant contribution of this work [5,16]. These methods help preserve the quality of the original signal while providing uniform input dimensions for the classification models. Employing these high-quality resizing techniques is intended to enhance the performance and robustness of the classification models, facilitating more accurate diagnosis and monitoring of valvular heart diseases.
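The two interpolation methods differ only in the kernel used to weight neighbouring pixels. The sketch below is a simplified NumPy illustration of separable resampling with a cubic convolution kernel and a Lanczos kernel; it is not the implementation used in this work, and in practice an image library would normally be used. The kernel parameters (a = -0.5 for the cubic kernel, a = 3 for Lanczos), the 224 × 224 target size, and the random input array are assumptions.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (support 2); a = -0.5 is one common choice."""
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    near, far = x <= 1, (x > 1) & (x < 2)
    out[near] = (a + 2) * x[near] ** 3 - (a + 3) * x[near] ** 2 + 1
    out[far] = a * (x[far] ** 3 - 5 * x[far] ** 2 + 8 * x[far] - 4)
    return out

def lanczos_kernel(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def resampling_matrix(n_in, n_out, kernel):
    """Weights mapping n_in input samples to n_out output samples along one axis."""
    scale = n_in / n_out
    stretch = max(scale, 1.0)  # widen the kernel when shrinking, as a low-pass filter
    centres = (np.arange(n_out) + 0.5) * scale - 0.5  # output centres in input coordinates
    offsets = (np.arange(n_in)[None, :] - centres[:, None]) / stretch
    weights = kernel(offsets)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1

def resize(image, out_shape, kernel):
    """Separable resize: interpolate along rows, then along columns."""
    rows = resampling_matrix(image.shape[0], out_shape[0], kernel)
    cols = resampling_matrix(image.shape[1], out_shape[1], kernel)
    return rows @ image @ cols.T

# Hypothetical TFR image (frequency bins x time frames) resized to an assumed 224 x 224 input.
rng = np.random.default_rng(0)
tfr = rng.random((129, 59))
bicubic = resize(tfr, (224, 224), cubic_kernel)
lanczos = resize(tfr, (224, 224), lanczos_kernel)
print(bicubic.shape, lanczos.shape)
```

The Lanczos kernel has a wider support than the cubic kernel, so each output pixel draws on more neighbours; this tends to give sharper results at the cost of more pronounced ringing near strong edges.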
Regarding preprocessing, studies have been carried out on segmented signals; in [17], segmentation is performed synchronously and asynchronously with different segment sizes. An algorithm based on the Shannon energy envelope and zero crossings has also been proposed for segmentation [18]. In this work we do not segment the signals, since segmentation may discard information.
Conventional methods of heart sound feature extraction include time-domain features, as in [19], where the authors extract 12 features and then apply feature reduction techniques; Matching Pursuit time-frequency decomposition using Gabor dictionaries [20]; short-time Fourier transform (STFT)-based spectrograms to represent patterns of normal and abnormal PCG signals [21]; the discrete wavelet transform (DWT) [22]; continuous wavelet transform-based spectrograms (CWTS) [23]; the Hilbert-Huang transform (HHT) [24]; and Mel-frequency cepstral coefficients (MFCC) [14,25]. Since manual feature extraction may be less efficient, pretrained neural networks such as AlexNet, VGG16, and VGG19 have been proposed to extract deep features [10,25].
In the last stage, many machine learning methods have been proposed for classifying PCG signals to detect different types of CVD. A one-dimensional deep neural network (1-D DNN) with few parameters was used to detect cardiovascular abnormalities [26]. The authors of [27] propose a combination of a CNN and bidirectional long short-term memory (CNN-BiLSTM). In [28], five types of artificial neural network (ANN) were used: narrow, wide, tri-layered, bi-layered, and medium. In [29], a residual neural network (ResNet) was used to avoid losing information from the previous layers.
Table 1 shows the performance of the different feature extraction techniques with the classifiers used by the authors, covering both two-class and five-class classification of PCG signals.
In this study, we present a benchmark of heart sound classification systems based on time-frequency multi-representations. The goal of this benchmark is to provide a comprehensive comparison of state-of-the-art heart sound classification systems and to identify the most effective TFR for detecting valvular heart diseases. The other contributions of this work are summarized in the following points:
Applying resizing techniques to the TFRs to improve classification performance.
Using the Boruta feature selector to narrow the features down to the most important ones.
Performing nested cross-validation (nCV), which combines cross-validation with an additional model selection process.
Comparing the classification performance of machine learning algorithms such as Decision Trees (DTs), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Machine (SVM).
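The nested cross-validation mentioned in the contributions can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the configuration used in this work: the synthetic five-class data, the fold counts, the feature scaling step, and the SVM parameter grid are all hypothetical. The inner loop selects hyperparameters and the outer loop estimates performance on data that played no part in that selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for selected deep features with five classes.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10, n_classes=5,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Assumed fold counts: 3 inner folds for model selection, 5 outer folds for evaluation.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}  # assumed grid

# Inner loop: grid search picks the best hyperparameters on each outer training split.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")

# Outer loop: each score comes from a fold never seen during hyperparameter selection.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(outer_scores, outer_scores.mean())
```

Because tuning happens inside each outer training split, the outer scores are not biased by the hyperparameter search, which is the main reason to prefer this scheme over a single cross-validated grid search.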
This paper is part of the research carried out in the area of biomedical engineering at the Universidad Nacional de San Agustin de Arequipa [30,31,32,33] for the Think Health project, which seeks to improve medical assistance in the auscultation process.
3. Results
The diagnosis of valvular heart diseases is complex because the signals are not always acquired cleanly, which is why, as explained in Section 2.2.1, preprocessing was applied to each type of signal used in this work. In the next stage, the TFRs obtained from the PCG signals of VHDs were given as input to the pretrained VGG16 model. Given the large number of deep features obtained, feature selection was applied: Table 3 shows the features confirmed, tentative, and rejected for each type of TFR. The number of iterations performed for the selection was 10. The last part consisted of an nCV to obtain the best parameters of each classifier for each type of input and to obtain a correct classification. For this research, a machine with 16 GB of RAM, a Ryzen 5 3600 processor, and an NVIDIA GeForce RTX 3060 graphics card was used. Feature extraction and model training were carried out in Spyder 4.1.4 with the libraries TensorFlow 2.12.0 and TensorFlow-GPU 2.10.0.
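The confirmed, tentative, and rejected labels in Table 3 come from comparing real features against shuffled "shadow" copies. The sketch below illustrates that idea in a deliberately simplified form and should not be read as the Boruta algorithm itself: Boruta uses a statistical test on the hit counts and removes rejected features between iterations, whereas this version simply counts hits over 10 iterations and applies fixed thresholds. The synthetic data, the forest size, and the thresholds (8 and 2) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: with shuffle=False the first 5 columns are the informative ones.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0,
    class_sep=2.0, shuffle=False, random_state=0,
)

rng = np.random.default_rng(0)
n_iter = 10
hits = np.zeros(X.shape[1], dtype=int)

for it in range(n_iter):
    # Shadow features: each column shuffled independently, destroying any link to y.
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    forest = RandomForestClassifier(n_estimators=200, random_state=it)
    forest.fit(np.hstack([X, shadows]), y)
    importances = forest.feature_importances_
    real, shadow = importances[:X.shape[1]], importances[X.shape[1]:]
    # A real feature scores a hit when it beats the best shadow feature.
    hits += real > shadow.max()

# Simplified decision rule (assumed thresholds, not Boruta's statistical test).
confirmed = np.where(hits >= 8)[0]
rejected = np.where(hits <= 2)[0]
tentative = np.where((hits > 2) & (hits < 8))[0]
print(len(confirmed), len(tentative), len(rejected))
```

The shadow features provide a data-driven importance baseline, so a feature is kept only if it is consistently more useful than pure noise with the same marginal distribution.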
As previously indicated, this work seeks alternative ways of extracting features in order to improve classification performance. First, as shown in Table 4, the different TFRs were used without applying resize techniques, initially yielding 4096 deep features. After applying Boruta feature selection, 1028 confirmed deep features were obtained for the spectrogram, 1007 for the mel-spectrogram, and 1130 for the cochleagram, as shown in Table 3. The highest performance values were obtained using the cochleagram with SVM as the classifier: 99.2% precision, 99.2% recall, 99.19% F1 score, 99% Matthews correlation coefficient, and 99.2% accuracy.
Another objective of this work was to apply resize techniques to the TFRs. In this case, as shown in Table 5, the bicubic resize technique was applied, and the resized TFRs served as input to the VGG16 model, yielding a row vector of 4096 deep features. Using Boruta (Table 3), 1031 confirmed deep features were obtained for the spectrogram, 936 for the mel-spectrogram, and 1142 for the cochleagram. The highest performance values were obtained using the mel-spectrogram with SVM as the classifier: 99.4% precision, 99.4% recall, 99.39% F1 score, 99.25% Matthews correlation coefficient, and 99.4% accuracy.
Finally, the Lanczos resize technique was applied to each TFR obtained from the PCG signals, again yielding 4096 deep features. After Boruta feature selection (Table 3), 959 confirmed features were obtained for the spectrogram, 980 for the mel-spectrogram, and 1124 for the cochleagram. The performance values shown in Table 6 were obtained using the mel-spectrogram with SVM as the classifier: 99.2% precision, 99.2% recall, 99.19% F1 score, 99% Matthews correlation coefficient, and 99.2% accuracy.
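For reference, the five reported metrics can all be derived from a multiclass confusion matrix. The following is a minimal sketch, assuming macro averaging over classes (the averaging scheme is not stated here) and assuming every class is predicted at least once so that no division by zero occurs. The confusion matrix values are hypothetical and are not results from this work.

```python
import numpy as np

def classification_metrics(cm):
    """Macro-averaged metrics from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted, actual, total = cm.sum(axis=0), cm.sum(axis=1), cm.sum()
    precision = tp / predicted          # per class: correct / predicted as that class
    recall = tp / actual                # per class: correct / truly that class
    f1 = 2 * precision * recall / (precision + recall)
    # Multiclass Matthews correlation coefficient.
    numerator = tp.sum() * total - predicted @ actual
    denominator = np.sqrt((total ** 2 - predicted @ predicted) * (total ** 2 - actual @ actual))
    return {
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
        "mcc": float(numerator / denominator),
        "accuracy": float(tp.sum() / total),
    }

# Hypothetical five-class confusion matrix with two misclassified recordings.
example_cm = np.array([
    [39, 1, 0, 0, 0],
    [0, 40, 0, 0, 0],
    [0, 0, 40, 0, 0],
    [1, 0, 0, 39, 0],
    [0, 0, 0, 0, 40],
])
for name, value in classification_metrics(example_cm).items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses the full confusion matrix, so it remains informative when classes are imbalanced.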
In our study, we observed that the cochleagram consistently outperformed the other time-frequency representations in terms of accuracy, F1 score, recall, and precision. Seven of its folds reached a score of 1.0, more than for the other TFRs, and its results were maintained under interpolation, demonstrating its robustness and stability in diagnosing valvular heart diseases (VHD). The mel-spectrogram representation also performed well, further highlighting the effectiveness of leveraging modern digital processing techniques. Spectrograms, in comparison, were less stable and less effective than the other two representations. These findings underscore the potential of time-frequency representations combined with advanced signal processing and machine learning algorithms for enhancing the accuracy of VHD diagnosis, as shown in Figure 6.
Our findings show how well Support Vector Machine (SVM) classifiers identify valvular heart diseases (VHD) from phonocardiogram (PCG) signals. Among the classification algorithms examined, namely Decision Trees (DT), k-Nearest Neighbors (KNN), Random Forests (RF), and SVM, the SVM stood out with an accuracy of 98.76%. This high accuracy highlights the ability of SVM to distinguish between the heart sound patterns linked to VHD, making it a useful tool for clinical diagnosis.
The strong performance of Support Vector Machines (SVM) can be ascribed to their capacity to build optimal hyperplanes that maximize the margin of separation between distinct classes, permitting accurate PCG signal categorization. A kernel function implicitly maps the input data into a higher-dimensional feature space, where such a separating hyperplane is easier to find.
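As a reminder of what margin maximization and the kernel function mean formally, the standard soft-margin SVM for two classes can be written as below. This is the textbook formulation, given here only for orientation; the kernel and the value of the regularization parameter C used in this work are not stated in this section, and the symbols (w, b, ξ, φ, α, K) are introduced here for illustration. Multiclass problems are typically handled by combining several such binary classifiers.

```latex
\begin{align}
  \min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i} \\
  \text{subject to}\quad & y_{i}\bigl(w^{\top}\phi(x_{i}) + b\bigr) \ge 1 - \xi_{i},
  \qquad \xi_{i} \ge 0,\quad i = 1,\dots,n \\
  f(x) &= \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_{i}\,y_{i}\,K(x_{i},x) + b\Bigr),
  \qquad K(x,x') = \phi(x)^{\top}\phi(x')
\end{align}
```

The margin between the two classes is 2/‖w‖, so minimizing ‖w‖² maximizes it, while the kernel K lets the classifier operate in the feature space defined by φ without computing φ explicitly.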
In this study, we evaluated the performance of Support Vector Machine (SVM) classifiers (Figure 7) using different time-frequency representations (TFRs) of phonocardiogram (PCG) signals. Specifically, we analyzed the effectiveness of three TFRs (spectrogram, mel-spectrogram, and cochleagram), each processed with two interpolation techniques, bicubic and Lanczos. The results show how the diagnostic accuracy of SVM classifiers varies across TFRs and interpolation methods.
Starting with the spectrogram representation, SVM achieved an accuracy of 97.50%, with minor variations observed when applying Bicubic and Lanczos interpolation techniques. Despite slight fluctuations, SVM consistently demonstrated robust performance, highlighting its resilience in distinguishing between different heart sound classes.
Moving to the mel-spectrogram representation, SVM maintained a high accuracy of 99.00% across all interpolation methods. This indicates the effectiveness of mel-spectrogram features in capturing relevant information for VHD diagnosis, with SVM exhibiting exceptional discriminative capabilities.
The cochleagram representation yielded the highest accuracy, with SVM achieving 99.20%. This representation outperformed the spectrogram and the mel-spectrogram, emphasizing the importance of cochleagram features in accurately characterizing heart sound patterns associated with VHD.
Furthermore, when considering the interpolation techniques, both Bicubic and Lanczos methods resulted in comparable accuracies across all TFRs, underscoring their utility in enhancing image quality without compromising diagnostic performance.
Figure 8 shows the confusion matrices for each TFR using the SVM classifier, which was chosen because of its classification performance shown above.
Our study examined the efficacy of time-frequency representations (TFRs) in accurately classifying valvular heart diseases (VHD) based on phonocardiogram (PCG) signals. Among the TFRs evaluated (spectrogram, mel-spectrogram, and cochleagram), the mel-spectrogram emerged as the most promising, with an accuracy of 96.07%. This accuracy underscores the effectiveness of the mel-spectrogram in capturing and representing key features of PCG signals associated with different types of VHD.
The performance of Mel-Spectrogram can be attributed to its ability to provide a detailed and informative representation of PCG signals in both the time and frequency domains. By leveraging the Mel-scale to perceptually weight frequency bands, Mel-Spectrogram effectively highlights important spectral characteristics that are indicative of underlying heart abnormalities.
Furthermore, the high accuracy achieved by the cochleagram highlights its robustness and reliability in discriminating between different VHD classes, even in the presence of variations in signal quality or recording conditions. This robustness makes the cochleagram a valuable tool for automated VHD diagnosis, offering clinicians a dependable means of detecting and classifying heart abnormalities with high accuracy and confidence.
Overall, our findings underscore the potential of Cochleagram as a valuable tool for PCG-based VHD diagnosis. Its ability to consistently achieve high accuracy rates reaffirms its utility in clinical practice, providing clinicians with a reliable and efficient method for identifying and classifying VHD with precision and accuracy.
Figure 9.
Comparison of Time Frequency Representations accuracy considering previously shown classifiers.
Our exploration of the effects of image resizing techniques on valvular heart disease (VHD) classification accuracy based on phonocardiogram (PCG) signals yielded noteworthy findings. Among the assessed techniques (no resize, bicubic, and Lanczos), bicubic interpolation emerged as the best performer, achieving an accuracy of 96.51%.
The superior performance of Bicubic interpolation can be attributed to its capability to produce smooth and visually appealing resized images from the original PCG spectrograms. By employing sophisticated interpolation algorithms, Bicubic interpolation effectively preserves crucial features of the PCG signals while enhancing overall image quality.
Moreover, the high accuracy achieved with Bicubic interpolation underscores its effectiveness in enhancing the discriminative capability of PCG spectrograms for VHD classification. The refined image representations generated by Bicubic interpolation facilitate more precise feature extraction and classification, enabling accurate identification of various VHD classes.
These findings underscore the substantial impact of image resizing techniques on the accuracy of VHD classification from PCG signals. Bicubic interpolation, in particular, emerges as a valuable enhancement, providing clinicians with a reliable and efficient method of leveraging PCG spectrograms for accurate VHD diagnosis.
Figure 10.
Comparison of Resize Techniques accuracy considering previously shown TFRs.
4. Discussion
Our findings offer insight into the use of different time-frequency representations (TFRs) and image resizing techniques for valvular heart disease (VHD) classification based on phonocardiogram (PCG) signals. The evaluation of the spectrogram, mel-spectrogram, and cochleagram reveals differences in how effectively each captures the features that distinguish VHD classes. The mel-spectrogram emerges as the most promising TFR, with an accuracy of 96.07%. This underscores the importance of considering the spectral characteristics of PCG signals in VHD classification, with the mel-spectrogram offering a rich representation conducive to accurate classification.
Furthermore, our investigation into image resizing techniques highlights the importance of preserving signal fidelity and enhancing image quality for improved classification outcomes. Among the evaluated techniques (no resize, bicubic, and Lanczos), bicubic interpolation emerges as the best choice, achieving an accuracy of 96.51%. This underscores the role of image preprocessing in increasing the discriminative power of PCG spectrograms, thereby facilitating more accurate VHD classification.
The observed superiority of Bicubic interpolation can be attributed to its ability to generate visually appealing resized images while preserving essential signal features, enabling more robust feature extraction and classification. This highlights the importance of leveraging advanced image processing techniques to enhance the diagnostic capabilities of PCG-based VHD classification systems.
Moreover, our study underscores the potential of machine learning algorithms, particularly Support Vector Machine (SVM), in effectively leveraging TFRs and resized images for accurate VHD classification. The consistently high accuracy rates achieved by SVM across different TFRs and resizing techniques underscore its robustness and suitability for VHD diagnosis.