3.1. Implementation
We implemented this work in Python and used cross-validation to split the database into training, validation, and testing sets. We mainly used the Keras library, which provides high-level API access to the functionality of the TensorFlow library. We also implemented in Python an algorithm to obtain a Support Vector Machine (SVM) model, in order to compare its results with those achieved by the proposed convolutional neural network model. For the SVM, we used the scikit-learn library, which builds on other scientific libraries such as NumPy, SciPy, and Matplotlib; the kernel was based on Radial Basis Functions (RBF) with parameter γ = 0.00000001. Both models were implemented on an Intel® Quad Core™ i7-4720HQ 2.6 GHz (6M Cache, Turbo max. ...) with 16 GB of RAM.
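The SVM setup described above can be sketched with scikit-learn as follows. The variable names and the random placeholder data are illustrative only; the paper's actual CT images and labels are not reproduced here.

```python
# Minimal sketch of the SVM configuration described in the text:
# RBF kernel with gamma = 1e-8, trained on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 100 * 100))      # placeholder: flattened 100x100 images
y = rng.integers(0, 2, 200)           # placeholder binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", gamma=1e-8)   # gamma value as stated in the text
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)     # fraction of correct test predictions
```

In practice the gamma value would be tuned (e.g. by grid search over the cross-validation folds) rather than fixed a priori.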
We used a learning rate of 0.001. In all cases, 100×100 input images and a batch size of 50 were used. These parameters made the training process efficient, yielding a good balance between training and validation accuracy [22]. In this work, we followed a procedure very similar to the one shown in [12].
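The hyperparameters above (learning rate 0.001, 100×100 inputs, batch size 50) can be wired into a Keras training configuration as sketched below. The layer layout shown is illustrative, not the paper's exact architecture.

```python
# Hedged sketch of a Keras training setup with the stated hyperparameters.
# The convolutional stack is a placeholder, not the proposed model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(100, 100, 1)),        # 100x100 input images
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),    # binary COVID-19 / non-COVID-19
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),  # lr from the text
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# Training would then use the stated batch size:
# model.fit(x_train, y_train, batch_size=50, validation_data=(x_val, y_val))
```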
Figure 3 shows the training and validation accuracy. Some fluctuations (random spikes) can be observed in the validation accuracy as the epochs advance, which is indicative of some over-fitting and of neuron weights not being uniformly adjusted during validation.
However, despite this lack of uniformity in learning (peaks at certain epochs), the peaks decreased in magnitude as the epochs progressed, indicating an adequate learning process.
In many cases, the performance of a model is evaluated only by graphical analysis, which often does not provide accurate evidence because it relies on a single metric. For this reason, it is necessary to use additional evaluation metrics (accuracy, recall, F1 score, etc.) to perform a more in-depth comparison of models.
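The metrics mentioned above can be computed directly with scikit-learn. The label vectors below are placeholders chosen for illustration, not the paper's predictions.

```python
# Illustrative computation of the evaluation metrics from a set of
# predictions (y_true / y_pred are made-up placeholder labels).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # one FN (index 2), one FP (index 6)

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6/8 = 0.75
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
```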
3.2. Comparison of the Obtained Results with CNNs and Support Vector Machine
Our goal in this paper is not to give an exhaustive explanation of SVM, which is a well-known machine learning technique. In this case, our database was not as unbalanced as in [12], but we proceeded in the same way. We selected the SVM method for the comparison because it has proven its effectiveness, and it was necessary to compare the results obtained with the CNN against a classical machine learning method.
In Table I, we show the results of the evaluation metrics for the proposed CNN model. Table II shows the results of the evaluation metrics using the SVM model.
Table I.
Results of the evaluation metrics for the proposed CNN model.
Table II.
Results of the evaluation metrics for the SVM model.
From Tables I and II, we can analyze the obtained results in more depth, taking into consideration the size of the database for patients with and without COVID-19. For example, it is evident (as has been pointed out in the literature) that when the database is small, machine learning models do not learn well, which is reflected in the number of false positives and false negatives classified by the models (see the confusion matrix). It should be kept in mind that the correctly classified samples are those that appear on the diagonal.
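The point about the diagonal can be illustrated with a small confusion matrix; the labels below are placeholders, not taken from the tables.

```python
# The correctly classified samples lie on the diagonal of the confusion
# matrix: rows are true classes, columns are predicted classes.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # [[TN, FP], [FN, TP]] = [[3, 1], [1, 3]]
correct = np.trace(cm)                 # diagonal sum = correctly classified
total = cm.sum()                       # all classified samples
```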
The interesting point about these results is that when the database is small or very unbalanced, the DL model learns less than the SVM model. We obtained similar results in another paper published in [12]. Note that the false positives (FP) and false negatives (FN) produced by the DL model were slightly higher. It is important to note that our objective here is not to criticize DL models: when adequate databases are available, the results they obtain in many applications are unquestionable. Our interest is to point out that one often wants to apply the most current state-of-the-art technique without first carrying out an analysis of the data, and not infrequently the established machine learning models (as is the case of SVM) are underestimated. The larger the database, the more the network learns, but the longer the training takes.
On the other hand, accuracy tends to hide classification errors for classes with fewer elements, since these classes carry little weight compared to larger classes. For this reason, the analysis should also be directed towards other metrics, in order to validate the performance of a model more accurately. For example, taking the F1 score, which is the harmonic mean of precision and recall, it can be observed in Tables I and II that the SVM model maintains a higher value than the DL model.
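The harmonic-mean property explains why F1 is more informative than accuracy on unbalanced data: a model cannot score well on F1 by doing well on only one of precision or recall. A small worked sketch, with illustrative numbers not taken from the tables:

```python
# F1 as the harmonic mean of precision and recall. Unlike an arithmetic
# mean, it is pulled strongly toward the weaker of the two values.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

balanced = f1(0.90, 0.90)   # both strong  -> 0.90
skewed = f1(0.95, 0.50)     # high precision, poor recall -> about 0.655
```

The arithmetic mean of 0.95 and 0.50 would be 0.725, so the harmonic mean visibly penalizes the imbalance.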
In Figure 5 and Figure 6, we represent two graphical examples of the classification of FP and FN.
Figure 4.
The three images represent chest CT scans of COVID-19 positive individuals. The classification of the DL model is for (a) negative, for (b) negative and for (c) positive. The classification by the SVM model is for (a) positive, for (b) positive and for (c) negative.
This experimental study is not intended to reach definitive conclusions; it is only the beginning of deeper research that requires us to keep growing the database, so that we can verify whether both models follow the same trend in the evaluation metrics, or whether there is an inflection point beyond which the growth of the database allows the DL models to start outperforming the SVM models. However, it is a reality that established machine learning methods cannot be completely discarded. For example, as noted above, the training time of the SVM model was much lower, and its evaluation metrics were slightly higher.
The important issue in this analysis is to keep the cost/benefit principle in mind. In any research process or application, the time to obtain the expected result is very important. In the case of COVID-19 disease, the response time of testing a patient was of utmost importance because of the implications that waiting could entail. Hence the need to refine the effectiveness of the predictions of machine learning models, and hence the importance and contribution of this experimental study.
In recent years it has become evident that machine learning models, and deep learning models in particular, require real and large databases. It is true that there are many numerical methods and transformations that can be used to augment a database [4], which can be effective for the purposes of a given work, but in real medical imaging problems it is best to have large real databases.
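One common family of such augmentation transformations is simple geometric transforms. A minimal NumPy sketch, shown only to illustrate the idea; the augmentation methods actually used in [4] are not reproduced here:

```python
# Geometric augmentation sketch: produce flipped and rotated copies of an
# image array to enlarge a small training set.
import numpy as np

def augment(img):
    """Return horizontally flipped, vertically flipped, and 90-degree
    rotated copies of img (a 2-D image array)."""
    return [np.fliplr(img), np.flipud(img), np.rot90(img)]

img = np.arange(9).reshape(3, 3)   # toy 3x3 "image"
copies = augment(img)              # three additional training samples
```

For medical images, augmentations must be chosen carefully so that clinically meaningful structure (e.g. left/right anatomy) is not distorted.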