1. Introduction
Lung cancer is an aggressive neoplasm and the leading cause of cancer-related deaths worldwide, with an estimated 1.8 million deaths [1]. The five-year survival rate depends on the stage of the disease (67% for stage I vs 23% for stage III), and mortality is strongly associated with late diagnosis [2]. This scenario is aggravated by the absence of a non-invasive screening test comparable to mammography or the fecal occult blood test, which are currently used for other aggressive neoplasms such as breast cancer and colorectal cancer (survival rates of 60% and 80%, respectively). Although low-dose computed tomography (LDCT) has shown a 20% mortality reduction [3], its application remains limited to the high-risk population (heavy smokers aged 50-80 years). This excludes the growing number of young individuals (<50 years) diagnosed with advanced-stage lung cancer [4, 5]. Furthermore, LDCT produces a high rate of false positives, which lead to unnecessary invasive diagnostic procedures. Together with the high cost of the method, this makes LDCT unsuitable for screening programs in low-income developing countries [6]. Clinical practice therefore urgently needs an alternative that is non-invasive and reliable, uses easily obtainable biological samples, and can be analyzed with cost-effective tools and reagents, so that it is feasible even in less industrialized countries. According to the National Institutes of Health (NIH), a biomarker is defined as “a characteristic used to measure and evaluate objectively normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention” [7]. Over the last decade, many studies have investigated new technologies for identifying biomarkers that are suitable for mass screening and can cope with the biological and histological heterogeneity of lung cancer. Several candidates have been investigated for their predictive value, including proteins, microRNAs (miRNAs), circulating tumor cells (CTCs), circulating tumor DNA (ctDNA), and volatile organic compounds (VOCs). Another key issue in early detection is the choice of sample. Blood (serum and plasma), urine, stool, exhaled breath, sputum, and saliva meet clinical needs because they are simple and non-invasive to collect [8, 9].
This review aims to contribute to the literature on biomarkers for the early diagnosis of lung cancer. We focus on the state of the art and on promising biomarkers. We also discuss challenges and offer tips for discovering biomarkers and translating them into clinical practice.
1.1. Phases for the discovery of lung cancer biomarkers
The main challenge in developing a screening test is to investigate biomarkers and then translate them into clinical practice. In 2001, Pepe et al. proposed four consecutive phases leading to the final validation of a biomarker (Figure 1).
The first phase identifies tumor-specific biomarkers by comparing tumor tissue with healthy tissue. A crucial factor in this phase is the stratification of the study cohort, particularly the control group. This ensures that other factors, such as age, gender, race, and lifestyle-related characteristics (e.g., smoking habits), do not influence biomarker expression, which would result in overlap between cases and controls. In phase II, the biomarkers identified in phase I are sought in samples that can be collected without invasive procedures (e.g., blood, urine, exhaled breath). In phase III, individuals who have cancer (not yet diagnosed at the time of biomarker analysis) are compared with individuals who have not developed the disease. In phase IV, individuals who test positive on the biomarker-only screen are referred for further diagnostic evaluation. This phase also quantifies the number of false positives who undergo further assessment.
Although the research path for a biomarker may seem simple, in practice it is highly complex and expensive. The validation phases require large numbers of samples to ensure the statistical validity of the data, and these samples must reflect the biological variability of the population [10].
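The sketch below illustrates why the validation phases need large cohorts. It applies the usual normal-approximation formula for the precision of a proportion, n = z²·p(1 - p)/d², to sensitivity (estimated among cases) and to specificity (estimated among controls). It then scales the result by an assumed disease prevalence to estimate the size of an unselected screening cohort. The function names and the example targets (80% sensitivity, 90% specificity, 1% prevalence, ±5% precision) are hypothetical and do not come from any study cited here. Real study designs also account for dropout, stratification, and multiple testing.

```python
import math


def subjects_needed(proportion: float, half_width: float, z: float = 1.96) -> int:
    """Subjects needed so that the confidence interval around an expected
    proportion (sensitivity or specificity) has the requested half-width.
    Uses the normal approximation to the binomial distribution."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / half_width ** 2)


def cohort_needed(sensitivity: float, specificity: float, prevalence: float,
                  half_width: float, z: float = 1.96) -> int:
    """Size of an unselected screening cohort that would contain enough
    cases (for sensitivity) AND enough controls (for specificity)."""
    n_cases = subjects_needed(sensitivity, half_width, z)
    n_controls = subjects_needed(specificity, half_width, z)
    # Cases make up only `prevalence` of the cohort, controls the remainder.
    return max(math.ceil(n_cases / prevalence),
               math.ceil(n_controls / (1 - prevalence)))


if __name__ == "__main__":
    # Hypothetical design targets, for illustration only.
    print(subjects_needed(0.80, 0.05))            # 246 cases
    print(cohort_needed(0.80, 0.90, 0.01, 0.05))  # 24600 subjects screened
```

Under these assumptions, a few hundred confirmed cases suffice to estimate sensitivity. At a low screening prevalence, however, a prospective cohort must enroll tens of thousands of people to accrue those cases.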
4. Discussion
Early diagnosis of lung cancer ranks among the most crucial health issues. Five-year survival is strongly correlated with stage (90% for stage I vs 10% for stage IV) [51]. Considerable advances have been made in metastatic lung cancer through the identification of numerous disease subtypes. These subtypes are defined by specific oncogenic driver alterations (EGFR, ALK, ROS1, BRAF, HER2, MET, RET, or KRAS G12C) and by PD-L1 expression. This has led to a range of molecularly targeted therapies that have significantly improved patient survival [52]. By contrast, although numerous studies have searched for biomarkers useful for early diagnosis, none of the investigated molecules has been incorporated into clinical practice. In the clinical setting, serum tumor markers are increasingly used as a supplement to radiological examinations (CT and PET) for therapy monitoring and the detection of disease recurrence. Studies on lung cancer patients have shown that proteins such as CYFRA 21-1, CEA, and NSE can help determine the lung cancer subtype or correlate with disease stage and prognosis. Additionally, adding ctDNA to the protein panel has shown potential for therapy monitoring [11, 13, 14, 15]. However, serum concentrations of these biomarkers also increase in other malignancies, which makes them nonspecific. Thus, despite emerging evidence of their potential in early diagnosis and treatment monitoring, they have not yet been incorporated into clinical guidelines. Further studies will be needed to determine whether proteomic analyses could be cost-effective for lung cancer screening, including in low-income countries [9].
TAAs, on the other hand, can be detected in the blood by enzyme-linked immunosorbent assay (ELISA). ELISA is one of the most specific and straightforward tests for detecting circulating biomolecules and is widely used in research and clinical laboratories worldwide [53]. Furthermore, one study reported that TAA levels in lung cancer patients are 30% higher than in healthy individuals [54]. Du Q et al. reported that TAA levels did not differ across disease stages, so both early and advanced lung cancer could be detected. They suggested that this may be related to immune amplification. This characteristic could be crucial for early diagnosis [18]. However, published studies have reported low test sensitivity (30%-40%), which limits the use of TAAs for identifying high-risk subjects [16, 17, 18, 19]. Additional studies are therefore needed to identify antibody combinations with greater sensitivity and specificity for early-stage lung cancer.
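A simple calculation shows what combining antibodies can and cannot achieve. Suppose a panel is scored positive when any one marker is positive (the "OR", or believe-the-positive, rule). Suppose also that the markers are conditionally independent given disease status. Then panel sensitivity is 1 - ∏(1 - Se_i) and panel specificity is ∏Sp_i. The "AND" rule is the mirror image. The independence assumption is a simplification, because autoantibody responses are often correlated. The per-marker values below are hypothetical and were chosen only to fall within the 30%-40% sensitivity range reported above.

```python
from math import prod


def or_rule_panel(sensitivities, specificities):
    """Panel is positive if ANY marker is positive.
    Assumes markers are conditionally independent given disease status."""
    # A case is missed only if every marker misses it.
    panel_sens = 1 - prod(1 - s for s in sensitivities)
    # A control is called negative only if every marker is negative.
    panel_spec = prod(specificities)
    return panel_sens, panel_spec


def and_rule_panel(sensitivities, specificities):
    """Panel is positive only if ALL markers are positive
    (same conditional-independence assumption)."""
    panel_sens = prod(sensitivities)
    # A control is a false positive only if every marker is falsely positive.
    panel_spec = 1 - prod(1 - p for p in specificities)
    return panel_sens, panel_spec


if __name__ == "__main__":
    # Three hypothetical autoantibodies, each 35% sensitive and 95% specific.
    sens, spec = [0.35] * 3, [0.95] * 3
    print(or_rule_panel(sens, spec))   # roughly (0.725, 0.857)
    print(and_rule_panel(sens, spec))  # roughly (0.043, 0.9999)
```

Under these assumptions, the OR rule raises sensitivity from 35% to about 73% but lowers specificity from 95% to about 86%. This is why panel design has to balance the two measures rather than simply add markers.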
Many studies have explored the potential of microRNAs as biomarkers for the early diagnosis of lung cancer. MicroRNAs have been studied alone or in combination with other biomolecules and have shown high sensitivity and specificity in distinguishing lung cancer patients from healthy controls [24, 26, 29]. As described in the results, the miR-Test and the MSC are the only ongoing studies, and both are pending validation. Moreover, the miR-Test showed accuracy, sensitivity, and specificity below 80%. Its detection sensitivity also differed little between stage I (69%) and stages II/III (71%) [27]. The MSC, in turn, performed better only in combination with CT [29]. All miRNA studies conducted so far have limitations. These include small sample sizes, the selection and inclusion of advanced-stage patients, epidemiological diversity, and poor homogeneity of experimental protocols [8, 19].
CTCs are another promising class of biomarkers. They are already used in thoracic oncology for the molecular characterization of lung cancer, which suggests a potential role in early diagnosis. Indeed, recent data have shown that CTC levels in lung cancer patients are ten times higher than in patients with other types of cancer. Their presence in the circulation is also associated with a worse prognosis [35]. However, CTC isolation and detection remain complex procedures that require new technologies. Furthermore, published studies report high specificity (96%-100%) but low sensitivity (26%-68%) [32, 33]. As with CTCs, ctDNA levels vary with lung cancer burden and progression: ctDNA accounts for 0.01% to 90% of circulating cell-free DNA [35]. Reported data indicate that ctDNA levels are much higher in advanced than in early stages (100% vs. 50%) and are eight times higher than in healthy individuals [11, 12, 38]. These characteristics make ctDNA a promising biomarker for early diagnosis. However, its low plasma concentration and short half-life have limited its clinical applicability. Research on biological samples other than plasma could help assess the future applicability of ctDNA. Overall, further studies are needed to establish the diagnostic power of CTCs and ctDNA for early lung cancer diagnosis and their utility in clinical practice.
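The CTC figures above (specificity 96%-100%, sensitivity 26%-68%) can be translated into expected outcomes for a one-off screen. The sketch below computes true positives, false positives, and positive predictive value (PPV) per 10,000 people screened at the two extremes of those ranges. The 1% prevalence and the function name are hypothetical choices for illustration, not estimates from the cited studies. The calculation also ignores verification bias and repeat screening rounds.

```python
def screening_yield(sensitivity: float, specificity: float, prevalence: float,
                    n_screened: int = 10_000):
    """Expected true positives, false positives and PPV for a one-off screen."""
    cases = n_screened * prevalence
    non_cases = n_screened - cases
    true_pos = sensitivity * cases
    false_pos = (1 - specificity) * non_cases
    positives = true_pos + false_pos
    # PPV is undefined if nobody tests positive.
    ppv = true_pos / positives if positives > 0 else float("nan")
    return true_pos, false_pos, ppv


if __name__ == "__main__":
    PREVALENCE = 0.01  # hypothetical screening prevalence, not from the text
    for sens, spec in [(0.26, 1.00), (0.68, 0.96)]:
        tp, fp, ppv = screening_yield(sens, spec, PREVALENCE)
        print(f"Se={sens:.0%} Sp={spec:.0%}: TP={tp:.0f} FP={fp:.0f} PPV={ppv:.1%}")
```

With these inputs, the most sensitive scenario yields about 68 true positives and 396 false positives per 10,000 people screened (PPV of roughly 15%). Perfect specificity eliminates false positives but detects only 26 of the 100 expected cases. At low prevalence, even small losses of specificity dominate the number of people referred for further work-up.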
Over the last decade, volatolomic profiling has attracted considerable interest for the early diagnosis of lung cancer. Various VOCs can be detected in exhaled breath because they have low solubility in blood and are therefore excreted in the breath within minutes of their formation. Several analytical methods for VOC detection have recently been developed. The most widely used is gas chromatography-mass spectrometry (GC/MS). However, MS-based analysis is expensive and requires skilled staff. Other innovative and cost-effective tools, such as artificial sensor arrays (electronic noses), could find future applications in clinical practice. Studies have shown that the electronic nose (eNose) can distinguish lung cancer patients from healthy subjects, with greater sensitivity for stage I lung cancer than for stages II, III, or IV. Furthermore, combining GC/MS and eNose could improve the specificity, sensitivity, and accuracy of the method. Recent research has also shown the potential of urinary volatile compounds for the early diagnosis of non-small cell lung cancer (NSCLC). In that work, GC/MS detected volatile compounds specific to the lung cancer group and also discriminated compounds associated with the early stages. Further research is needed to build on the results obtained so far.
In conclusion, we consider the studies currently available in the literature to be conceptually valid. However, they still suffer from small numbers of participants and incomplete stratification of the population, with few early-stage patients included [31]. Some studies also lack long-term follow-up or rely on assays with poor reproducibility.
4.1. Future perspectives
In the near future, efforts should focus on establishing a multicenter clinical trial that involves a large number of participants and improves the stratification of the study population. Lung cancer patients should be stratified by the radiological characteristics of the nodule (e.g., ground-glass opacity, solid nodule, spiculated nodule) and by histological factors and TNM stage, with particular emphasis on early stages. Likewise, high-risk healthy subjects should be stratified by family history of cancer, smoking history, and pre-existing lung conditions. The study should include both prospective and retrospective analyses. It should also plan follow-up to monitor biomarker behavior in high-risk healthy subjects, in order to identify possible malignant lung nodules in individuals who have not yet been diagnosed.
The results should be analyzed through a universal database serving a global network of research centers dedicated to early lung cancer diagnosis. Data sharing is paramount for pooling and optimizing resources and for reducing the costs of biomarker research. Ideally, a consortium would be established to discuss and evaluate biomarkers for mass population screening. Each participant could contribute a variety of approaches and data collection methods and make them transparently available in the public domain. Furthermore, technological advances have produced artificial intelligence and machine learning tools that can process, overlay, and integrate molecular biomarker data with clinical and epidemiological data.
The milestone we aspire to reach is a non-invasive and relatively cost-effective diagnostic tool that can significantly impact clinical decision-making. This tool should be robust and readily implementable in the clinical practice of all healthcare facilities, even those in resource-constrained countries.