
The Classification of VOCs Based on Sensor Images Using a Lightweight Neural Network for Lung Cancer Diagnosis

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted: 15 March 2024
Posted: 15 March 2024

Abstract
The application of artificial intelligence to point-of-care testing (POCT) for disease detection has become a hot research field, in which breath detection, analyzing a patient's exhaled VOCs with a sensor array combined with Convolutional Neural Network (CNN) algorithms, is attracting growing attention as a new approach to lung cancer detection. However, the low accuracy, high computational complexity, and large number of parameters of CNN algorithms make them difficult to port to the embedded systems of POCT devices. In this work, a lightweight neural network (LTNet) is proposed to address this problem while achieving high-precision classification of acetone and ethanol gases, which are respiratory markers of lung cancer patients. Compared with currently popular lightweight CNN models such as EfficientNet, LTNet has fewer parameters (32 K), and its training weight size is only 0.155 MB. LTNet achieved overall classification accuracies of 99.06% on our own mixed gas dataset and 99.14% on the University of California (UCI) dataset, both higher than those of the five existing models compared, and it also offers the shortest training times (844.38 s and 584.67 s) and inference times (23 s and 14 s) on the same validation sets. Compared with existing CNN models, LTNet is better suited to resource-limited POCT devices.
Keywords: 
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Due to its high incidence and lethality, lung cancer (LC) imposes a significant burden on the healthcare system [1,2]. It was forecast that 609,820 cancer-related deaths would occur in the United States in 2023, with lung cancer remaining the leading cause of cancer mortality [3]. Currently, clinical detection of lung cancer primarily relies on cellular or histopathological examinations, radiographic imaging such as X-rays and CT scans, and tumor marker assays in bodily fluids. However, these existing detection techniques have obvious drawbacks, including high cost and significant harm to the human body.
Volatile organic compounds (VOCs) contained in exhaled human breath are closely associated with various diseases. Breath analysis has been put forward as one solution: a non-invasive early screening method that can be employed to screen for various diseases, including lung cancer, diabetes, and breast cancer [4,5,6]. Research has indicated that lung cancer results in elevated levels of acetone and ethanol in exhaled breath, and both gases can serve as reliable exhalation biomarkers for early lung cancer [7,8]. In current clinical practice, spectroscopic methods, mass spectrometry, chromatography-related techniques, and electronic noses (e-noses) are considered relatively viable and efficient standard technologies for detecting VOCs in human exhaled breath [9,10,11]. However, mass spectrometry, chromatography, and spectroscopic methods are constrained by their high equipment costs and the technical demands they place on operators. In contrast, e-nose technology based on gas sensor arrays holds greater cost advantages and development potential for application in breath analysis [12,13,14].
By analyzing the data collected by its sensor array with a pattern recognition algorithm, an electronic nose can effectively identify complex gases. In pattern recognition tasks, feature selection and feature extraction directly affect the detection performance of the electronic nose. Marzorati et al. [15] extracted nine features from the response of each gas sensor to exhaled gas. Liu et al. [16] used 19 sensors, selected 13 composite features from each sensor, and combined them with classical classifiers to verify the feasibility of identifying LC patients through VOCs. Previous studies have found the feature extraction process to be particularly complex, requiring repeated trials of extraction methods to obtain good results. The Gramian Angular Field (GAF) is a data visualization method proposed by Wang et al. that converts time series into two-dimensional color images [17]. GAF encodes a one-dimensional time series into a two-dimensional color image in which key features are more prominent, displaying the important information hidden in the sensor signal more clearly. With GAF, the original data need no further processing and are directly converted into two-dimensional color images, which not only retains the deep features of the signal but also avoids complex feature engineering [18].
Currently, pattern recognition algorithms employed in electronic noses fall into classical gas identification algorithms (machine learning), artificial neural networks (ANNs), and biologically inspired spiking neural networks (SNNs) [19,20,21]. For complex gas recognition tasks, ANNs are currently the popular choice [22,23]. Compared with classical machine learning and SNNs, ANNs exhibit strong adaptability and do not require model redesign for different training tasks. In addition, they can automatically learn complex features from data without manual feature extraction, and they encompass various architectures such as Backpropagation Neural Networks (BPNNs) and Convolutional Neural Networks (CNNs). Avian et al. [24] proposed a CNN architecture and built two models for analyzing VOCs in exhaled gas: the first receives signals processed by different feature extraction methods as input, while the second directly processes the raw signal. Their results indicate that classifiers perform differently depending on the feature extraction method employed, with Kernel PCA (KPCA) showing a positive impact on performance. Guo et al. [25] introduced a deep learning framework that combines an electronic nose to predict odor descriptor ratings, the first application of Convolutional Long Short-Term Memory (ConvLSTM) to an electronic nose for olfaction prediction.
Although convolutional neural networks have found extensive application in gas classification, they possess a substantial number of trainable parameters, high computational complexity, and slow inference speed, and they are not hardware-friendly, which hinders porting pattern recognition algorithms to embedded systems. To address this problem, convolutional neural networks have progressively developed in a lightweight direction [26,27].
The contributions of this article are summarized as follows:
(1)
A hardware-friendly lightweight neural network model (LTNet) using a depthwise separable convolution structure is constructed for gas classification.
(2)
To counteract the decrease in classification accuracy caused by depthwise separable convolutions, we add Squeeze-and-Excitation (SE) attention mechanisms and residual connections to the model.
(3)
The convolutional and batch normalization (BN) layers are fused to reduce the model parameters, speed up inference, and improve the stability of the model.

2. Experimental Section

Data collection was performed using the CGS-8 intelligent gas sensor analysis system provided by Beijing Ailite Technology Co., Ltd. The collected data were transformed into images using the Gramian Angular Field (GAF), which served as the classification input for LTNet. Two different datasets were employed to assess the performance of the model; the overall process is illustrated in Figure 1.

2.1. Data Sources I: Gas Mixture Dataset

Exhaled breath from lung cancer patients contains numerous biomarker species, such as acetone, ethanol, and isoprene. To validate LTNet's classification capability, this work used acetone and ethanol as target gases. The experiment employed a sensor array composed of 16 commercial semiconductor metal oxide gas sensors manufactured by FIGARO, matching the sensor models in Data Sources II (the UCI database).
The experiment employed a static gas volumetric method using 98% analytical-reagent acetone and 98% anhydrous ethanol, with a 10 μL microsyringe for liquid extraction. As Equation (1) indicates, conducting experiments directly with high-concentration test liquids would require extracting a very small liquid volume, making it difficult to inject into the gas chamber. To address this issue, the test liquids were diluted to a concentration of 10%.
$$Q = \frac{V \times C \times M}{22.4 \times d \times r} \times 10^{-9} \times \frac{273 + T_R}{273 + T_B} \tag{1}$$
where $Q$ is the volume of the test liquid (mL), $V$ is the volume of the gas chamber (mL), $C$ is the desired gas concentration (ppm), $M$ is the molecular weight of the substance, $d$ is the concentration of the test liquid, $r$ is the liquid density (g/cm³), $T_R$ is the laboratory ambient temperature (°C), and $T_B$ is the gas chamber temperature (°C).
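For illustration, a minimal Python sketch of Equation (1) follows; the example quantities (a 1 L chamber, 5 ppm acetone, room temperature) are hypothetical values for demonstration, not the exact settings used in this work.

```python
# A minimal sketch of Equation (1) for the static gas volumetric method.
# Variable names mirror the text; example values below are illustrative.

def injection_volume(V, C, M, d, r, T_R, T_B):
    """Volume of test liquid Q (mL) to inject into the gas chamber.

    V   : gas chamber volume (mL)
    C   : target gas concentration (ppm)
    M   : molecular weight of the substance (g/mol)
    d   : concentration of the test liquid (fraction, e.g. 0.10)
    r   : liquid density (g/cm^3)
    T_R : laboratory ambient temperature (deg C)
    T_B : gas chamber temperature (deg C)
    """
    return (V * C * M) / (22.4 * d * r * 1e9) * (273 + T_R) / (273 + T_B)

# Hypothetical example: 5 ppm acetone (M = 58.08, r = 0.79 g/cm^3) in a
# 1000 mL chamber with a 10% test liquid at 25 deg C.
Q = injection_volume(V=1000, C=5, M=58.08, d=0.10, r=0.79, T_R=25, T_B=25)
print(f"Inject {Q * 1000:.3f} uL of test liquid")  # convert mL -> uL
```

Note how the dilution $d$ appears in the denominator: reducing the test liquid concentration from 98% to 10% increases the required injection volume about tenfold, which is exactly why dilution makes the liquid easier to extract and inject.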
Following the specifications in the sensor manual, the working voltage of all 16 sensors was set to 5 V. The sensor operating currents were adjusted on the CGS-8 smart gas sensing analysis system through multiple experiments to identify the optimal operating current for each sensor. The sensor models and their optimal operating currents are presented in Table 1.
After setting the optimal operating current, the sensors were preheated for two hours until the baseline stabilized. Then the evaporation and heating functions of the experimental apparatus were activated. A microsyringe was used to extract the amount of test liquid determined by Equation (1) and drop it vertically into the evaporation dish. The sensor array was exposed to acetone, ethanol, or a binary mixture of the two VOCs; the concentration settings for the gas mixtures are detailed in Table 2. The response time was set to approximately 120 seconds, and the recovery time was also about 120 seconds.

2.2. Data Sources II: UCI Database

This work also used a public database from the University of California Irvine (UCI) Machine Learning Repository to complement and validate Data Sources I. The dataset is a collection of gas sensor drift data at different concentrations, collected by the Chemical Signals Research Laboratory at the BioCircuits Institute, University of California San Diego [28,29]. Acetone concentrations in the dataset range from 12 to 500 ppm, and ethanol concentrations range from 10 to 500 ppm. A total of 4650 samples from the UCI dataset were used for classification with LTNet.

2.3. Experimental Environment and Hardware Configuration

The algorithms were implemented in Python 3.10 on a computer with an RTX 3060 graphics card. LTNet and the comparison networks use the Adam optimizer with a cross-entropy loss function. The models usually converge after about ten epochs of training, so the number of epochs was set to 30. Because conventional convolutional neural networks consume a large amount of GPU memory during training, the batch size was set to 16 for all models for comparability. The learning rate was set to 0.0006 for our own mixed gas dataset and 0.0004 for the UCI database.
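A minimal PyTorch sketch of this training configuration is shown below; `LTNet` and `train_set` are placeholders for the model of Section 3.2 and the GASF image dataset, and the loop omits logging, validation, and checkpointing.

```python
# A minimal training setup matching the configuration described above:
# Adam optimizer, cross-entropy loss, 30 epochs, batch size 16.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

model = LTNet(num_classes=3).cuda()    # acetone / ethanol / mixture
criterion = nn.CrossEntropyLoss()
# lr = 0.0006 for the mixed gas dataset, 0.0004 for the UCI database
optimizer = torch.optim.Adam(model.parameters(), lr=0.0006)

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

for epoch in range(30):
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```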

3. Data Processing

3.1. Image Conversion Methods

CNNs are typically used to process two-dimensional image data, but the raw data collected from each channel of the sensor array are one-dimensional and not directly suitable for CNN processing. To address this, an image transformation model based on the Gramian Angular Field (GAF) was utilized [30], transforming the one-dimensional data into two-dimensional images that serve as effective input for LTNet. The GAF coding scheme is shown in Figure 2. The responses of the 16 sensors at a given sampling point are denoted as $X = \{x_1, \dots, x_{16}\}$; the response data $X$ are then normalized to the range $[-1, 1]$ using min-max normalization as in Equation (2).
$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}, \quad i = 1, 2, \dots, 16 \tag{2}$$
where $X$ represents the response data at the sampling point and $x_i$ is the response of the $i$th sensor. This step constrains the angular range to between 0 and π, facilitating the acquisition of more detailed GAF information.
The data selected for this study consist of the responses of the 16 sensors at a specific sampling point and do not form a time series, so there is no need to encode timestamps as radii. Equation (3) computes the arccosine of each sensor's normalized response at the sampling point.
$$\phi_i = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \ \tilde{x}_i \in \tilde{X} \tag{3}$$
After transforming the scaled $\tilde{x}_i$ into polar coordinates, the correlation between the response data of different sensors is captured through the trigonometric sum or difference between each pair of points. GADF and GASF are therefore defined by Equations (4) and (5), respectively.
$$GADF = \begin{pmatrix} \sin(\phi_1 - \phi_1) & \cdots & \sin(\phi_1 - \phi_n) \\ \vdots & \ddots & \vdots \\ \sin(\phi_n - \phi_1) & \cdots & \sin(\phi_n - \phi_n) \end{pmatrix} \tag{4}$$
$$GASF = \begin{pmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n) \end{pmatrix} \tag{5}$$
where $\phi_1$ and $\phi_n$ are the normalized, arccosine-transformed responses of the first and $n$th sensors in the array, respectively.
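As a sketch, the encoding of Equations (2)-(5) can be written in a few lines of NumPy; the random `readings` vector stands in for the real 16-sensor responses, and rendering the resulting matrices as color images (e.g., via a colormap) is omitted.

```python
# Sketch of the GASF/GADF encoding in Equations (2)-(5). x is the
# 16-element response vector of the sensor array at one sampling point;
# the output is a 16x16 matrix per encoding.
import numpy as np

def gaf_images(x):
    # Min-max scale to [-1, 1] (Equation (2))
    x_t = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    # Polar encoding: angle only, no time radius (Equation (3))
    phi = np.arccos(np.clip(x_t, -1.0, 1.0))
    # Pairwise trigonometric differences/sums (Equations (4) and (5))
    gadf = np.sin(phi[:, None] - phi[None, :])
    gasf = np.cos(phi[:, None] + phi[None, :])
    return gasf, gadf

readings = np.random.rand(16)      # placeholder sensor responses
gasf, gadf = gaf_images(readings)  # two 16x16 matrices
```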

3.2. Lightweight Neural Network Model

The high complexity, large number of parameters, and relatively slow inference speed of neural networks can impede porting pattern recognition algorithms to embedded systems. Therefore, a lightweight neural network model (LTNet) is proposed to address this problem. It is built around a backbone network based on depthwise separable convolutions; the complete architecture is illustrated in Figure 3.
As shown in Figure 3, the LTNet architecture is mainly composed of ConvBN layers and LTBlock modules. Each convolutional layer is fused with its BN layer into a new ConvBN layer, and the weights and biases of the ConvBN layer are reinitialized. This design reduces the parameter size of the network and improves inference speed during validation.
To extract features from the input image, the network applies a 3×3 depthwise convolution in the ConvBN layer to learn per-channel feature maps while preserving the correlation between channels. The feature maps produced by the depthwise convolution are then mapped by a 1×1 pointwise convolutional layer to improve the model's ability to capture local information. To avoid increasing the depth of the network, the LTBlock module applies the ConvBN layer for feature extraction several times and introduces the SE attention mechanism and residual connections to enhance feature interactions between channels while maintaining the integrity of the original input. The LTBlock module employs the Hard Swish (Hswish) activation function to introduce nonlinearity into the neuron outputs, and the classification result is finally obtained through a fully connected (FC) layer.
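A schematic PyTorch version of the LTBlock is given below. The channel width and SE reduction ratio are assumptions for illustration, since the exact layer sizes appear in Figure 3 rather than in the text; the BN layers shown separately here are the ones fused into ConvBN layers for inference (Section 3.3).

```python
# Schematic LTBlock: depthwise separable ConvBN layers, SE attention,
# a residual connection, and Hard Swish activation, as described above.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention."""
    def __init__(self, channels, reduction=4):  # reduction ratio assumed
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = x.mean(dim=(2, 3))            # squeeze: global average pool
        w = self.fc(w)[:, :, None, None]  # excitation: channel weights
        return x * w

class LTBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            # depthwise 3x3 + BN (fused into a ConvBN layer at inference)
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels), nn.Hardswish(),
            # pointwise 1x1 + BN
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.Hardswish())
        self.se = SEBlock(channels)

    def forward(self, x):
        return x + self.se(self.conv(x))  # residual connection
```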

3.3. Calculation of Depthwise Separable Convolutions Parameters

To improve on the efficiency of standard convolutions while maintaining network performance and generalization capability, LTNet introduces depthwise separable convolutions. A depthwise separable convolution decomposes a standard convolution into two steps: a depthwise convolution with a K×K kernel, followed by a pointwise convolution with a 1×1 kernel. In the depthwise stage, an independent convolution filter is applied to each input channel, making the operation independent across the channel dimension and effectively capturing spatial features within each channel [31]. The pointwise convolution then constructs new features by computing a linear combination of the input channels. The parameters and floating-point operations (FLOPs) of standard convolution and depthwise separable convolution are as follows:
$$C_P = K \times K \times C_{in} \times C_{out} \tag{6}$$
$$C_F = H \times W \times K \times K \times C_{in} \times C_{out} \tag{7}$$
$$D_P = K \times K \times C_{in} + C_{in} \times C_{out} \tag{8}$$
$$D_F = H \times W \times (K \times K \times C_{in} + C_{in} \times C_{out}) \tag{9}$$
where $C_P$ and $C_F$ are the parameters and floating-point operations (FLOPs) of standard convolution, and $D_P$ and $D_F$ are those of depthwise separable convolution. The convolution kernel size is $K \times K$, $H$ and $W$ are the dimensions of the output feature map, and $C_{in}$ and $C_{out}$ are the numbers of input and output feature map channels. The ratios of parameters and FLOPs between depthwise separable convolution and standard convolution are as follows:
$$\frac{D_P}{C_P} = \frac{K \times K \times C_{in} + C_{in} \times C_{out}}{K \times K \times C_{in} \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{K \times K} \tag{10}$$
$$\frac{D_F}{C_F} = \frac{H \times W \times (K \times K \times C_{in} + C_{in} \times C_{out})}{H \times W \times K \times K \times C_{in} \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{K \times K} \tag{11}$$
From equations (10) and (11), it can be observed that depthwise separable convolution involves fewer parameters and floating-point operations, making the model more lightweight.
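As a quick numerical check of Equations (6)-(11), the following snippet compares the parameter counts of a standard and a depthwise separable 3×3 convolution; the channel sizes are illustrative, not LTNet's actual layer widths.

```python
# Parameter counts for a standard vs. depthwise separable 3x3
# convolution with 64 input and 64 output channels (illustrative sizes).
K, C_in, C_out = 3, 64, 64

standard = K * K * C_in * C_out          # Eq. (6):  36,864
separable = K * K * C_in + C_in * C_out  # Eq. (8):   4,672
ratio = separable / standard             # ~0.127

print(standard, separable, f"{ratio:.3f}")
# Matches Eq. (10): 1/C_out + 1/(K*K) = 1/64 + 1/9 = 0.127
```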
This study fuses the depthwise convolution with the BN layer, further reducing the number of parameters in LTNet and accelerating inference. The parameter counts before and after fusion are given in Equations (12) and (13).
$$U_p = C_{out} \times C_{in} \times K \times K + 4 \times C_{out} \tag{12}$$
$$F_p = C_{out} \times C_{in} \times K \times K + C_{out} \tag{13}$$
where $U_p$ is the number of parameters before fusion and $F_p$ the number after fusion. The BN layer contributes four parameters per channel: the scale parameter, the offset parameter, and the running mean and variance. Since the BN layer normalizes the data on each channel, its parameter count is proportional to the number of output channels $C_{out}$. Comparing Equations (12) and (13), fusing the depthwise separable convolution with the BN layer reduces the parameter count by only $3C_{out}$; however, the fused weights and biases can be used directly in the testing and validation phases of LTNet, which speeds up inference and reduces computation and memory consumption. The merged ConvBN layer generates new weights and biases from the parameters of the convolutional and BN layers, as described by the following formula:
$$Y_f = \frac{\gamma (X - \mu)}{\sqrt{\sigma^2 + \varepsilon}} + \beta = \frac{\gamma \left( (X_c W_c + b_c) - \mu \right)}{\sqrt{\sigma^2 + \varepsilon}} + \beta = \frac{\gamma W_c X_c}{\sqrt{\sigma^2 + \varepsilon}} + \frac{\gamma (b_c - \mu)}{\sqrt{\sigma^2 + \varepsilon}} + \beta \tag{14}$$
where $Y_f$ is the output of the merged ConvBN layer, $X$ is the output of the pre-merged convolutional layer, $\gamma$ and $\beta$ are the weight and bias of the BN layer, $\mu$ and $\sigma^2$ are its mean and variance, and $\varepsilon$ is a small constant that prevents division by zero. $X_c$ is the input of the pre-merged convolutional layer, and $W_c$ and $b_c$ are its weights and biases. Equation (14) can be simplified as follows:
$$Y_f = W_f X_c + b_f \tag{15}$$
where the weights $W_f$ and biases $b_f$ of the ConvBN layer are calculated as follows:
$$W_f = \frac{\gamma W_c}{\sqrt{\sigma^2 + \varepsilon}} \tag{16}$$
$$b_f = \frac{\gamma (b_c - \mu)}{\sqrt{\sigma^2 + \varepsilon}} + \beta \tag{17}$$

4. Results and Discussion

The dataset was divided into training, testing, and validation sets in a 6:3:1 ratio. For our own mixed gas dataset, the training, testing, and validation sets contain 4,474, 2,234, and 744 images, respectively; for the UCI database, they contain 2,791, 1,395, and 464 images. Validation set accuracy and six evaluation metrics were adopted to assess both the lightweight nature of the model and its classification performance: the model's total accuracy on the validation set (Accuracy, over ethanol, acetone, and the mixture), the time required to complete thirty training epochs (Training time), the GPU memory usage when each model is trained under identical conditions with background processes cleared (GPU RAM), the inference time on the validation set (Inference time), the model's parameter count (Params), and the size of the best weights saved on the test set (Weight size).

4.1. Data Conversion Comparison Test and Results Discussion

In the data preprocessing stage, GASF is compared with the Gramian Angular Difference Field (GADF), the Short-Time Fourier Transform (STFT), and the Markov Transition Field (MTF). Figure 4 illustrates the images produced by these four data transformation methods.
LTNet was employed to evaluate the four methods for transforming one-dimensional time series into two-dimensional images. The confusion matrices, which display the classification of each sample intuitively and are an important measure of classification performance, are depicted in Figure 5 (a), (b), (c), and (d), while Table 3 presents a comparison of the results obtained with the four methods.
From Table 3 and the confusion matrices, STFT exhibits the highest total classification accuracy, reaching 100%. The errors for GADF and GASF are concentrated mainly in the mixed gas class, with total accuracies of 98.79% and 99.06%, respectively. MTF suffers more classification errors in the mixed gas and ethanol classes, yielding a total accuracy of only 92.34%. However, training with STFT-transformed images takes the longest, 1630.92 seconds, and requires the highest GPU RAM usage. In contrast, training with GASF-transformed images for the same thirty epochs takes only 844.38 seconds and occupies a mere 2.1 GB of GPU RAM while achieving accuracy similar to STFT's. Therefore, GASF was chosen as the data transformation method in this work.

4.2. Model Evaluation and Comparison Experiment

LTNet is compared with five different networks: three traditional convolutional neural networks (AlexNet, ResNet50, VGG16) and two lightweight convolutional neural networks (EfficientNet, MobileNetV3_large). AlexNet is a relatively deep neural network, which helps the model learn more complex features. ResNet50 implements classification using skip-connected residual blocks; VGG16 employs deep convolutional layers for feature extraction from raw samples, followed by classification through fully connected layers. EfficientNet achieves high performance in resource-constrained environments through strategies such as compound scaling and width multipliers. MobileNetV3_large classifies raw samples using depthwise separable convolutions.
Before conducting the classification task, we compared the parameter counts and best-saved training weight sizes of LTNet and the other five networks, as shown in Table 4. Among these six models, LTNet has only 32,614 parameters, equivalent to just 0.139% of the traditional convolutional neural network ResNet50, less than 0.1% of AlexNet and VGG16, and less than 1% of the popular lightweight convolutional neural networks EfficientNet and MobileNetV3_large. The optimal training weight size of LTNet is 0.155 MB, demonstrating more efficient memory utilization than MobileNetV3_large, EfficientNet, and AlexNet. This indicates that, relative to existing lightweight and traditional convolutional neural networks, LTNet is better suited to resource-constrained environments.

4.3. Classification Results of Own Mixed Gas Dataset

From (e), (f), (g), (h), (i), and (j) in Figure 5 and Table 5, it can be observed that, on our own mixed gas dataset, LTNet made five errors in the mixed gas category and only two in the ethanol category, achieving the highest classification accuracy of 99.06%. It is also worth noting that LTNet's GPU RAM usage during training was significantly lower than that of traditional convolutional networks such as VGG16 and lightweight networks such as EfficientNet. It completed thirty epochs of training in only 844.38 seconds, the fastest among all compared networks, far surpassing ResNet50, VGG16, and EfficientNet. LTNet takes only 23 seconds to complete inference on the 744 validation images, again the fastest of the six networks; it significantly outperforms the traditional convolutional neural networks, and its inference time is only a quarter of that of the lightweight network MobileNetV3_large.

4.4. UCI Database Classification Results

From Figure 5 (k), (l), (m), (n), (o), (p), and the Table 6, it can be observed that for the UCI database, LTNet achieved similar results as on the own mixed gas dataset. LTNet still maintains the highest classification accuracy while significantly outperforming the traditional convolutional neural network models and lightweight convolutional neural network models in terms of GPU RAM, training time, and inference time.
Results from the own mixed gas dataset and the UCI database demonstrate that LTNet can achieve high accuracy in gas classification tasks while maintaining low computational resource requirements, further validating the lightweight nature of LTNet.

5. Conclusions

In this study, we proposed a lightweight and efficient network model, LTNet, combined with GASF to convert one-dimensional time series into two-dimensional images for high-precision classification of acetone and ethanol gases, which are respiratory markers of lung cancer patients. The six evaluation metrics verified that LTNet outperforms classical convolutional neural network models such as VGG16 and ResNet50, as well as lightweight models such as MobileNetV3_large. Validation on our own mixed gas dataset and the UCI database shows that, compared with the other five models, LTNet achieves higher classification accuracy, superior generalization performance, and fewer parameters. Moreover, by fusing the convolutional layers with the BN layers, LTNet's inference on the validation set is much faster than that of ResNet50, MobileNetV3_large, and the other models. It also required fewer graphics card resources during training, and its model weights occupy less memory. This indicates that LTNet demands fewer computational resources and is suitable for modestly configured hardware, laying the foundation for subsequent porting of the algorithm. In the future, artificial intelligence and novel biomarkers will play a key role in the entire lung cancer screening process, promising to transform lung cancer screening [32].

Author Contributions

Conceptualization, C.Z., L.L., F.Z. and Y.Z.; methodology, C.Z. and L.L.; software, C.Z.; validation, C.Z.; formal analysis, L.L.; investigation, C.Z., L.L., F.Z. and Y.Z.; resources, C.Z. and L.L.; data curation, C.Z. and L.L.; writing—original draft preparation, C.Z.; writing—review and editing, L.L.; visualization, C.Z.; supervision, L.L. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No.21605006), Science and Technology Development Program of Jilin Province (20170520160JH) and the Education Department of Scientific Research Project of Jilin Province, China (No. JJKH20230758KJ, JJKH20191291KJ, JJKH20170567KJ, and 201592). (Corresponding author: Lei Li).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, Z. P.; Cheng, H. L.; Loh, S. Y.; Cheng, K. K. F., Functional Status, Supportive Care Needs, and Health-Related Quality of Life in Advanced Lung Cancer Patients Aged 50 and Older. Asia-Pacific Journal of Oncology Nursing 2020, 7, (2), 151-160. [CrossRef]
  2. Vranas, K. C.; Lapidus, J. A.; Ganzini, L.; Slatore, C. G.; Sullivan, D. R., Association of Palliative Care Use and Setting With Health-care Utilization and Quality of Care at the End of Life Among Patients With Advanced Lung Cancer. Chest 2020, 158, (6), 2667-2674. [CrossRef]
  3. Siegel, R. L.; Miller, K. D.; Wagle, N. S.; Jemal, A., Cancer statistics, 2023. CA: A Cancer Journal for Clinicians 2023, 73, (1), 17-48.
  4. Marzorati, D.; Mainardi, L.; Sedda, G.; Gasparri, R.; Spaggiari, L.; Cerveri, P., A review of exhaled breath: a key role in lung cancer diagnosis. J. Breath Res. 2019, 13, (3), 20. [CrossRef]
  5. Sun, X. H.; Shao, K.; Wang, T., Detection of volatile organic compounds (VOCs) from exhaled breath as noninvasive methods for cancer diagnosis. Anal. Bioanal. Chem. 2016, 408, (11), 2759-2780. [CrossRef]
  6. Zhu, H. Y.; Liu, C.; Zheng, Y.; Zhao, J.; Li, L., A Hybrid Machine Learning Algorithm for Detection of Simulated Expiratory Markers of Diabetic Patients Based on Gas Sensor Array. IEEE Sensors Journal 2023, 23, (3), 2940-2947. [CrossRef]
  7. Buszewski, B.; Ulanowska, A.; Kowalkowski, T.; Cieslinski, K., Investigation of lung cancer biomarkers by hyphenated separation techniques and chemometrics. Clinical Chemistry and Laboratory Medicine 2012, 50, (3), 573-581. [CrossRef]
  8. Ulanowska, A.; Kowalkowski, T.; Trawinska, E.; Buszewski, B., The application of statistical methods using VOCs to identify patients with lung cancer. J. Breath Res. 2011, 5, (4). [CrossRef]
  9. Apolonski, A.; Maiti, K. S., Towards a standard operating procedure for revealing hidden volatile organic compounds in breath: the Fourier-transform IR spectroscopy case. Appl. Optics 2021, 60, (14), 4217-4224. [CrossRef]
  10. Schulz, E.; Woollam, M.; Grocki, P.; Davis, M. D.; Agarwal, M., Methods to Detect Volatile Organic Compounds for Breath Biopsy Using Solid-Phase Microextraction and Gas Chromatography-Mass Spectrometry. Molecules 2023, 28, (11). [CrossRef]
  11. Ye, Z. Y.; Wang, J.; Hua, H.; Zhou, X. D.; Li, Q. L., Precise Detection and Quantitative Prediction of Blood Glucose Level With an Electronic Nose System. IEEE Sensors Journal 2022, 22, (13), 12452-12459. [CrossRef]
  12. Baldini, C.; Billeci, L.; Sansone, F.; Conte, R.; Domenici, C.; Tonacci, A., Electronic Nose as a Novel Method for Diagnosing Cancer: A Systematic Review. Biosensors 2020, 10, (8). [CrossRef]
  13. Dragonieri, S.; Annema, J. T.; Schot, R.; van der Schee, M. P. C.; Spanevello, A.; Carratú, P.; Resta, O.; Rabe, K. F.; Sterk, P. J., An electronic nose in the discrimination of patients with non-small cell lung cancer and COPD. Lung Cancer 2009, 64, (2), 166-170. [CrossRef]
  14. Dragonieri, S.; Pennazza, G.; Carratu, P.; Resta, O., Electronic Nose Technology in Respiratory Diseases. Lung 2017, 195, (2), 157-165. [CrossRef]
  15. Marzorati, D.; Mainardi, L.; Sedda, G.; Gasparri, R.; Spaggiari, L.; Cerveri, P., MOS Sensors Array for the Discrimination of Lung Cancer and At-Risk Subjects with Exhaled Breath Analysis. Chemosensors 2021, 9, (8). [CrossRef]
  16. Liu, B.; Yu, H.; Zeng, X.; Zhang, D.; Gong, J.; Tian, L.; Qian, J.; Zhao, L.; Zhang, S.; Liu, R., Lung cancer detection via breath by electronic nose enhanced with a sparse group feature selection approach. Sensors and Actuators B: Chemical 2021, 339. [CrossRef]
  17. Wang, Z.; Oates, T., Imaging Time-Series to Improve Classification and Imputation. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, 2015.
  18. Wang, X.; Qian, C.; Zhao, Z.; Li, J.; Jiao, M., A Novel Gas Recognition Algorithm for Gas Sensor Array Combining Savitzky–Golay Smooth and Image Conversion Route. Chemosensors 2023, 11, (2). [CrossRef]
  19. Binson, V. A.; Subramoniam, M.; Sunny, Y.; Mathew, L., Prediction of Pulmonary Diseases With Electronic Nose Using SVM and XGBoost. IEEE Sensors Journal 2021, 21, (18), 20886-20895. [CrossRef]
  20. Chen, K.; Liu, L.; Nie, B.; Lu, B.; Fu, L.; He, Z.; Li, W.; Pi, X.; Liu, H., Recognizing lung cancer and stages using a self-developed electronic nose system. Computers in Biology and Medicine 2021, 131. [CrossRef]
  21. Peng, C.; Zheng, Y., Robust gas recognition with mixed interference using a spiking neural network. Measurement Science and Technology 2021, 33, (1). [CrossRef]
  22. Lekha, S.; M, S., Real-Time Non-Invasive Detection and Classification of Diabetes Using Modified Convolution Neural Network. IEEE Journal of Biomedical and Health Informatics 2018, 22, (5), 1630-1636. [CrossRef]
  23. Wang, T.; Zhang, H. X.; Wu, Y.; Jiang, W. K.; Chen, X. W.; Zeng, M.; Yang, J. H.; Su, Y. J.; Hu, N. T.; Yang, Z., Target discrimination, concentration prediction, and status judgment of electronic nose system based on large-scale measurement and multi-task deep learning. Sens. Actuator B-Chem. 2022, 351, 12. [CrossRef]
  24. Avian, C.; Mahali, M. I.; Putro, N. A. S.; Prakosa, S. W.; Leu, J.-S., Fx-Net and PureNet: Convolutional Neural Network architecture for discrimination of Chronic Obstructive Pulmonary Disease from smokers and healthy subjects through electronic nose signals. Computers in Biology and Medicine 2022, 148. [CrossRef]
  25. Guo, J.; Cheng, Y.; Luo, D. H.; Wong, K. Y.; Hung, K.; Li, X., ODRP: A Deep Learning Framework for Odor Descriptor Rating Prediction Using Electronic Nose. IEEE Sensors Journal 2021, 21, (13), 15012-15021. [CrossRef]
  26. Li, Z.; Kang, S.; Feng, N.; Yin, C.; Shi, Y., PSCFormer: A lightweight hybrid network for gas identification in electronic nose system. Pattern Recognition 2024, 145. [CrossRef]
  27. Shi, Y.; Wang, B.; Yin, C.; Li, Z.; Yu, Y., Performance improvement: A lightweight gas information classification method combined with an electronic nose system. Sensors and Actuators B: Chemical 2023, 396. [CrossRef]
  28. Rodriguez-Lujan, I.; Fonollosa, J.; Vergara, A.; Homer, M.; Huerta, R., On the calibration of sensor arrays for pattern recognition using the minimal number of experiments. Chemometrics and Intelligent Laboratory Systems 2014, 130, 123-134. [CrossRef]
  29. Vergara, A.; Vembu, S.; Ayhan, T.; Ryan, M. A.; Homer, M. L.; Huerta, R., Chemical gas sensor drift compensation using classifier ensembles. Sens. Actuator B-Chem. 2012, 166, 320-329. [CrossRef]
  30. Xiong, L. J.; He, M.; Hu, C.; Hou, Y. X.; Han, S. Y.; Tang, X. Y., Image presentation and effective classification of odor intensity levels using multi-channel electronic nose technology combined with GASF and CNN. Sens. Actuator B-Chem. 2023, 395. [CrossRef]
  31. Bai, L.; Zhao, Y. M.; Huang, X. M., A CNN Accelerator on FPGA Using Depthwise Separable Convolution. IEEE Transactions on Circuits and Systems II: Express Briefs 2018, 65, (10), 1415-1419.
  32. Adams, S. J.; Stone, E.; Baldwin, D. R.; Vliegenthart, R.; Lee, P.; Fintelmann, F. J., Lung cancer screening. The Lancet 2023, 401, (10374), 390-408.
Figure 1. Schematic diagram of the overall process of this work.
Figure 2. The diagram of GAF coding. X represents the response data of 16 sensors at a certain sampling point. After scaling, X is converted to the polar coordinate system, and finally the GASF/GADF image is generated.
Figure 3. Block diagrams of LTNet network structure. (a) the ConvBN module is presented, which results from the fusion of depthwise separable convolution and Batch Normalization (BN) layers, (b) the backbone network, and (c) the LTBlock module, constructed by integrating the ConvBN module, residual connections, and the Squeeze-and-Excitation (SE) attention mechanism.
Figure 4. Images generated after data conversion. (a), (b), (c) and (d) are images converted by GADF, MTF, STFT and GASF, respectively.
Figure 5. Confusion matrices. (a), (b), (c), and (d) represent the evaluation results of LTNet with the GADF, MTF, STFT, and GASF image transformation methods, respectively. (e), (f), (g), (h), (i), and (j) show the evaluation results of AlexNet, ResNet50, VGG16, EfficientNet, MobileNetV3_large, and LTNet on the mixed gas dataset. (k), (l), (m), (n), (o), and (p) display the evaluation results of the same six models on the UCI database.
Table 1. Sensor models and optimal operating currents.
NO. Models Target gases Detection range (ppm) Optimal operating current (mA)
1 TGS2600 Ethanol/Hydrogen 1-30 45
2 TGS2602 Ammonia/Ethanol 1-30 50
3 TGS2610 Organic compounds 500-10000 55
4 TGS2620 Ethanol/Organic compounds 50-5000 43
Table 2. Details of concentration indicators in the acetone ethanol experimental dataset.
NO. Ethanol (ppm) Acetone (ppm) Mixed gas (ppm)
1 0 1 1
2 0 3 3
3 0 5 5
4 0 7 7
5 0 9 9
6 0 11 11
7 0 13 13
8 0 15 15
9 1 0 1
10 3 0 3
11 5 0 5
12 7 0 7
13 9 0 9
14 11 0 11
15 13 0 13
16 15 0 15
17 1 1 2
18 1 5 6
19 1 10 11
20 1 15 16
21 5 1 6
22 5 5 10
23 5 10 15
24 5 15 20
25 10 1 11
26 10 5 15
27 10 10 20
28 10 15 25
29 15 1 16
30 15 5 20
31 15 10 25
32 15 15 30
Table 3. Comparison of results of four data conversion methods.
Methods Accuracy Training time (s) GPU RAM (GB)
GADF 98.79% 863.49 2.3
MTF 92.34% 929.08 2.6
STFT 100% 1630.92 2.6
GASF 99.06% 844.38 2.1
Table 4. Model parameters and weights.
Models Params. Weight size (MB)
AlexNet 57012034 217
ResNet50 23514179 89.9
VGG16 134268738 512
EfficientNet 4586092 17.8
MobileNetV3_large 4208443 16.2
LTNet (This work) 32614 0.155
Table 5. Classification results of mixed gas data sets.
Models Accuracy GPU RAM (GB) Training time (s) Inference time (s)
AlexNet 97.71% 3.1 853.27 283
ResNet50 98.39% 3.8 1234.34 284
VGG16 97.98% 6.9 2249.56 592
EfficientNet 99.06% 5.4 1373.48 170
MobileNetV3_large 98.79% 3.3 877.53 91
LTNet (This work) 99.06% 2.1 844.38 23
Table 6. Classification accuracy and model parameters of UCI database.
Models Accuracy GPU RAM (GB) Training time (s) Inference time (s)
AlexNet 98.92% 3.2 613.50 187
ResNet50 98.92% 3.7 841.81 178
VGG16 98.71% 7.1 1477.44 377
EfficientNet 98.92% 5.3 933.21 109
MobileNetV3_large 98.92% 3.3 606.20 60
LTNet (This work) 99.14% 2.1 584.67 14
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.