1. Introduction
White blood cells (WBCs), also known as leukocytes, play a vital role in the body's immune response [1]. They are produced in the bone marrow and are an essential component of the body's defense system against infection and disease. WBCs are classified as either granulocytes, which possess granules in their cytoplasm, or agranulocytes, which lack granules [2]. Granulocytes include neutrophils, eosinophils, and basophils. Agranulocytes include lymphocytes and monocytes. Neutrophils, the most common type of WBCs, are the first to arrive at the site of an infection and are responsible for engulfing and destroying bacteria and other foreign particles [3]. Lymphocytes include T and B cells, which are responsible for cell-mediated and antibody-mediated immunity, respectively [4]. T cells help to identify and attack infected or cancerous cells, while B cells produce antibodies that can neutralize pathogens. Monocytes mature into macrophages, which consume and destroy microorganisms and debris [5]. Eosinophils play a role in the body's response to parasitic infections and allergies [6]. Basophils release histamine and other inflammatory chemicals in response to allergens and other stimuli [7].
The number of WBCs can increase in response to infection, inflammation, or other stimuli. An abnormal increase in the WBC count is called leukocytosis, while a decrease is called leukopenia [8]. Abnormalities in WBC counts can indicate a variety of medical conditions, including infections, cancers, and immune system disorders. A complete blood count (CBC) test, which isolates WBCs from a blood sample and studies their number and appearance under a microscope, is commonly used as part of a routine medical check-up [9].
Using artificial intelligence (AI)-based systems to classify WBCs automatically in a CBC test can provide several benefits. Firstly, it can enhance the accuracy and consistency of the results by removing the subjective nature of manual classification. Manual classification of WBCs is a complex and time-consuming task that requires a high level of expertise and experience [10]. With AI-based systems, the process can be automated and the results can be more consistent, because the system is not subject to fatigue or to the lapses that affect human examiners. Secondly, it can increase the efficiency of the process by reducing the time required for classification. This can be especially beneficial in high-volume settings, such as hospital laboratories, where a large number of CBC tests are performed daily. Automated classification can also reduce the workload of laboratory staff, allowing them to focus on other tasks. Furthermore, AI-based systems can surface information that may be missed by the human eye, such as rare or abnormal cells, which can assist in the diagnosis of certain blood disorders.
In recent years, there has been a growing interest in using machine learning and artificial intelligence to automate the analysis of WBCs. Deep learning algorithms have been employed to develop automated systems that can identify and segment WBCs in digital images of blood samples, providing a faster and more accurate alternative to manual analysis [11]. To perform WBCs classification using deep learning, a dataset of labeled images is first used to train a neural network model. The trained model can then make predictions on new images, identifying and classifying the various types of WBCs. This approach has demonstrated promising results, with some studies reporting high levels of accuracy and precision in WBCs classification [12,13,14].
A number of studies have explored the use of deep learning for WBCs classification, employing techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to classify various types of WBCs. Cheuque et al. [15] proposed a two-stage hybrid multi-level scheme for efficiently classifying four groups of WBCs (lymphocytes, monocytes, segmented neutrophils, and eosinophils) using a combination of a Faster R-CNN network and parallel CNNs with the MobileNet structure. The proposed model achieved approximately 98.4% in terms of accuracy, recall, precision, and F1 score. Sharma et al. [16] proposed a deep learning model, specifically DenseNet121, for classifying various types of WBCs in blood cell images. They used preprocessing techniques, such as normalization and data augmentation, to optimize the model. The model was evaluated on a dataset from Kaggle containing 12,444 images of various types of WBCs. The results indicated that the model achieved an accuracy of 98.84%, precision of 99.33%, sensitivity of 98.85%, and specificity of 99.61%. Jung et al. [17] proposed a CNN-based method, referred to as W-Net, for WBCs classification, which was evaluated on a large-scale dataset of real images of the five types of WBCs. W-Net achieved an average accuracy of 97% and demonstrated superior performance compared to other CNN- and RNN-based model architectures. The authors also proposed the use of Generative Adversarial Networks (GANs) to generate synthetic WBCs images for educational and research purposes. Rustam et al. [18] proposed a hybrid feature set that combines texture and RGB features from microscopic images for classifying various types of WBCs in blood cell images. They used resampling based on the synthetic minority oversampling technique (SMOTE) to mitigate the influence of imbalanced datasets, which is a common problem in existing studies. The authors also compared machine and deep learning models on the original, augmented, and oversampled datasets. The results suggest that a hybrid feature set of both texture and RGB features from microscopic images yields a high accuracy rate of 97.00% with random forest. Chola et al. [19] proposed a deep learning framework, referred to as BCNet, for the identification of various types of blood cells in an eight-class identification scenario. The proposed BCNet framework is based on transfer learning with a CNN. The dependability and viability of BCNet were established through exhaustive experiments consisting of five-fold cross-validation tests. The performance of BCNet was compared with state-of-the-art deep learning models such as DenseNet, ResNet, Inception, and MobileNet. The BCNet framework achieved its highest performance with the RMSprop optimizer, with 98.51% accuracy and 96.24% F1 score. CNN-based architectures have been used in most studies published in the literature. Nonetheless, CNNs are limited in handling variable-length sequences and in capturing long-range dependencies, because convolution and pooling operations can result in information loss. Consequently, researchers are searching for alternatives to CNN architectures.
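The figures quoted in the studies above (accuracy, precision, sensitivity, specificity, F1 score) can all be derived from a confusion matrix. The following is a minimal sketch of that arithmetic, not code from any of the cited works; the function names `accuracy` and `per_class_metrics` and the example matrix are hypothetical.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_metrics(cm, k):
    """One-vs-rest precision, sensitivity, specificity and F1 for class k."""
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class k predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
    tn = sum(sum(row) for row in cm) - tp - fn - fp

    def ratio(num, den):
        # Return 0.0 instead of failing when a class has no samples.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)              # also called recall
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up two-class example (rows: true class, columns: predicted class).
example_cm = [[8, 2],
              [1, 9]]
print(accuracy(example_cm))
print(per_class_metrics(example_cm, 0))
```

For a multi-class problem such as five WBC types, the per-class values are usually averaged across classes; whether a given study reports macro or weighted averages has to be checked in that study.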
Recently, image transformer architectures have been applied to image classification [20]. These architectures, designed to process sequential data, have demonstrated exceptional results in image classification tasks. In contrast to traditional CNNs, which utilize spatial convolutions to extract features from images, image transformers employ self-attention mechanisms to capture relationships between different regions of an image [21]. This allows them to learn global, contextual features that are valuable for classification tasks. Recent studies have shown that image transformers can achieve state-of-the-art performance on various image classification benchmarks [22].
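To make the self-attention idea concrete, the sketch below computes single-head scaled dot-product attention over a handful of token vectors standing in for image regions. It is an illustrative simplification in plain Python: the function names, the toy vectors, and the identity projection matrices are assumptions, and real image transformers add learned projections, multiple heads, positional embeddings, and a class token.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: n vectors of size d; w_q, w_k, w_v: d x d_k projection matrices.
    Returns the attended tokens and the n x n attention weights.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    d_k = len(q[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    # Each row says how strongly one token attends to every other token.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three toy "patch" tokens and identity projections (illustrative only).
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended, attn = self_attention(toy_tokens, identity, identity, identity)
print(attn[0])  # attention of the first token over all three tokens
```

Because every token attends to every other token in one step, relationships between distant image regions are available from the first layer, which is the property the paragraph above contrasts with local convolutions.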
This paper proposes an explainable Vision Transformer (ViT) model for computer-assisted automatic WBCs classification. The ViT model, pre-trained for a distinct task, was fine-tuned to classify WBCs. The model was trained on a public set consisting of 16,633 samples, and its performance was evaluated. The results showed that the model achieved high accuracy rates in both multi-class and binary classification of WBCs. To ensure the proposed method can be confidently applied in clinical settings, the pixel areas upon which the model focuses its predictions have been visualized.
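A ViT does not apply convolutions to the image; it typically cuts the input into fixed-size patches and flattens each patch into a token. The sketch below illustrates that step on a toy single-channel grid. The function name `split_into_patches` and the sizes used are assumptions for illustration only, since the patch size and input resolution of the fine-tuned model are not stated in this section.

```python
def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, one flat list per patch.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# Toy 4 x 4 image with pixel values 0..15, split into 2 x 2 patches.
toy_image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = split_into_patches(toy_image, 2)
print(len(patches), patches[0])
```

As an example of scale, a 224 × 224 input cut into 16 × 16 patches would give 14 × 14 = 196 tokens per image.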
The main contributions of this study can be summarized as follows:
- WBCs classification is achieved without the requirement for any preprocessing or convolutional processes.
- The effectiveness of ViT for WBC classification is demonstrated; the model can potentially outperform traditional CNN architectures.
- The proposed ViT-based method achieves high accuracy rates in both multi-class and binary classification of WBCs, which is crucial for accurate disease diagnosis and treatment.
- An explainable method for WBCs classification is provided, which increases the transparency and trustworthiness of the model's decision-making process.
- The pixel areas upon which the model focuses its predictions are visualized, which can facilitate the adoption of the proposed method in clinical settings.
4. Discussion
WBCs classification plays a crucial role in the diagnosis of many diseases, including infections and blood-related disorders. However, manual classification of WBCs can be time-consuming and prone to errors due to human subjectivity. Therefore, the use of machine learning algorithms has the potential to improve the accuracy and efficiency of WBCs classification.
Table 2 provides details for a hand-curated selection of research studies on this topic. Tavakoli et al. [33] introduced a novel approach for the classification of white blood cells using image processing and machine learning techniques. The proposed method comprises three main stages, namely nucleus and cytoplasm detection, feature extraction, and classification with an SVM model. The method achieved an accuracy of 94.65% in categorizing WBCs in the Raabin-WBC dataset. Katar and Kilincer [34] proposed an approach for the automatic classification of WBCs using pre-trained deep learning models, including ResNet-50, VGG-19, and MobileNet-V3-Small. The proposed approach achieved high accuracy rates, with the MobileNet-V3-Small model reaching the highest accuracy of 98.86%. Akalin and Yumusak [35] presented a study on the real-time detection and classification of WBCs in peripheral blood smears using object recognition frameworks. YOLOv5s, YOLOv5x, and Detectron2 R50-FPN pre-trained models were used, and two original contributions were made to improve the model's performance. The maximum accuracy rate achieved on the test dataset for detection and classification of WBCs was 98%. Leng et al. [36] presented a study on developing a deep learning-based object detection network for leukocyte detection using the detection transformer (DETR) model. The study findings indicate that the improved DETR model outperforms the original DETR and CNN, with a mean average precision (mAP) of up to 96.10%.
In this study, the pre-trained ViT model was utilized for automatic classification of white blood cells. The model trained for five distinct types of white blood cells attained an accuracy rate of 99.75%, while the model fine-tuned to classify cells based on their granule content achieved an accuracy rate of 99.40%. In comparison to the studies listed in Table 3, we achieved higher accuracy and made the ViT model explainable using the Score-CAM algorithm. The ViT model's superior performance can be attributed to its architecture, which enables it to capture long-range dependencies between different parts of the image, resulting in better image recognition performance.
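The binary task groups the five cell types by granule content. As a minimal sketch of that grouping, the lookup below maps a five-class label to its binary class, following the granulocyte and agranulocyte definitions given in the Introduction. The label spellings and the function name `to_binary_label` are assumptions; the dataset's actual label strings and the way the binary labels were prepared are not described here.

```python
# Cell types grouped by granule content, as defined in the Introduction.
GRANULOCYTES = {"neutrophil", "eosinophil", "basophil"}
AGRANULOCYTES = {"lymphocyte", "monocyte"}


def to_binary_label(cell_type):
    """Map a five-class WBC label to 'granulocyte' or 'agranulocyte'."""
    name = cell_type.strip().lower()
    if name in GRANULOCYTES:
        return "granulocyte"
    if name in AGRANULOCYTES:
        return "agranulocyte"
    raise ValueError(f"unknown WBC type: {cell_type!r}")


for label in ["Neutrophil", "Lymphocyte", "Basophil"]:
    print(label, "->", to_binary_label(label))
```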
The advantages of our explainable model can be summarized as follows:
- The proposed model is based on vision transformers, which have become a popular research area. This study therefore serves as an example of vision transformer performance in biomedical image classification.
- The model classifies WBCs images with an end-to-end transformer structure, with no need for feature engineering.
- Owing to its explainable structure, the proposed model shows the regions it focuses on during classification, so experts can check the model's decisions against these regions.
- Owing to its high classification accuracy, the model has the potential to be used in clinical applications.
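The focused regions mentioned in the list above are produced with Score-CAM. The general idea is to use each activation map as a soft mask on the input, score the masked input with the model, and combine the maps weighted by those scores. The following is a simplified sketch under stated assumptions rather than the implementation used in this study: images are plain 2D lists, the activation maps are assumed to be already resized to the image resolution, and `score_fn` is a hypothetical stand-in for the model's target-class score.

```python
import math


def normalize(grid):
    """Min-max scale a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in grid for v in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - low) / (high - low) for v in row] for row in grid]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_cam(image, activation_maps, score_fn):
    """Simplified Score-CAM saliency map for one target class.

    image: 2D list of pixel values.
    activation_maps: list of 2D lists with the same shape as the image.
    score_fn: callable returning the target-class score for an image.
    """
    scores = []
    for act in activation_maps:
        mask = normalize(act)
        masked = [[pixel * m for pixel, m in zip(img_row, mask_row)]
                  for img_row, mask_row in zip(image, mask)]
        scores.append(score_fn(masked))
    # Softmax is shift-invariant, so subtracting a baseline score
    # would not change these weights.
    weights = softmax(scores)
    height, width = len(image), len(image[0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, act in zip(weights, activation_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * act[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]  # ReLU
    return normalize(cam)


# Toy example: the stand-in "model" only looks at the top-left pixel.
toy_image = [[1.0, 1.0], [1.0, 1.0]]
toy_maps = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 1.0]]]
heatmap = score_cam(toy_image, toy_maps, lambda img: 5.0 * img[0][0])
print(heatmap)
```

In the toy example the map that covers the pixel the scoring function depends on receives almost all of the weight, so the resulting heatmap highlights that pixel. With a ViT, the activation maps would typically be obtained by reshaping patch-token activations into a grid and upsampling them, which is outside the scope of this sketch.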
The limitations of our study are as follows. Although the proposed method achieves a high success rate in classifying white blood cells, its response time was not assessed in a real-time study. Additionally, the model's resilience to image variations caused by factors such as illumination and noise was not verified. To address these limitations, future research will involve generating synthetic images using data augmentation techniques and training new models with these images. Furthermore, we will investigate the effectiveness of transformer-based models in detecting various diseases and symptoms. The proposed method could be integrated into clinical software to assist specialists in WBCs classification.
Author Contributions
Conceptualization, O.K. and O.Y.; methodology, O.K.; software, O.K.; validation, O.Y.; formal analysis, O.K.; investigation, O.K. and O.Y.; writing—original draft preparation, O.K. and O.Y.; writing—review and editing, O.K. and O.Y.; visualization, O.K.; supervision, O.K. All authors have read and agreed to the published version of the manuscript.