1. Introduction
Idiopathic granulomatous mastitis (IGM) poses a significant clinical challenge [1,2]. It is characterised by chronic inflammation and granuloma formation in the breast, and the precise molecular mechanisms driving its pathogenesis remain unclear [3,4,5]. Unlike other forms of mastitis with known risk factors such as breastfeeding, infection, or autoimmunity, IGM perplexes clinicians and researchers alike due to its elusive aetiology and diverse clinical presentation [5,6,7,8].
The rarity of IGM complicates both its diagnosis and treatment [2,9]. Specific prevalence and incidence rates are not well established in the literature because the disease is rare and its diagnosis is one of exclusion [10]. Some studies indicate that the incidence may be higher in certain geographic regions and populations, but precise numbers are not widely available [2,11]. Given its low incidence rate and heterogeneous clinical manifestations, establishing standardised diagnostic criteria and therapeutic guidelines has proved challenging [2,11,12]. The lack of consensus regarding management also underscores the pressing need for a deeper understanding of the disease’s molecular underpinnings [10].
Advancements in genomic technologies have heralded a new era in unravelling the genetic basis of complex diseases. Whole exome sequencing (WES) has emerged as a powerful tool for comprehensively interrogating the coding regions of the genome, offering a promising avenue to explore the genetic landscape of rare disorders like IGM [13]. Studying IGM-related somatic mutations can enhance our understanding of the molecular mechanisms underlying the disease’s pathogenesis. Such insights can lead to improved diagnostics through biomarker identification, enabling quicker and more accurate differentiation from other breast diseases [14]. In this first-of-its-kind study, we use WES on matched blood and tissue samples from IGM patients to identify somatic mutations associated with IGM.
3. Discussion
This study presents somatic variants identified through WES in paired blood and breast tissue samples from eight women diagnosed with IGM. WES libraries exhibited typical fragment size distributions, high mapping rates, and sufficient coverage depth across exonic regions. In paired samples, Strelka2 called more somatic variants than Mutect2. This pattern reversed after annotation: more of the variants called by Mutect2 were annotated as non-synonymous mutations and PTVs than of those called by Strelka2. However, none of the variants were pathogenic per ClinVar annotation. Further examination identified 53 altered genes, with CHIT1, CEP170 and CTR9 altered in more than one case. Functional enrichment analysis did not show statistically significant pathways, although terpenoid backbone biosynthesis, protein export, and protein processing in the endoplasmic reticulum were implicated. None of the variants selected for Sanger sequencing were validated.
Differences in sensitivity and specificity between variant calling algorithms have been extensively discussed [15,16,17,18,19]. Differences in the algorithm and focus of each variant caller underlie the discrepancy between Strelka2 and Mutect2 in the number of variants identified, and their limited overlap. Strelka2 uses a probabilistic model leveraging local assembly and realignment to call variants, aiming for greater sensitivity in identifying low-frequency somatic mutations, especially in matched tumour-normal pairs, by using Bayesian methods to model both the tumour and normal samples [20]. In contrast, the haplotype-based approach in Mutect2 employs a sophisticated filtering process that incorporates various sources of evidence to distinguish between true mutations and sequencing artefacts [21]. Designed to balance sensitivity and specificity, Mutect2 minimises false positives by implementing additional artefact filters for oxidative artefacts and strand bias, on top of the standard filtering preprocessing [21]. Furthermore, the use of hard filters based on fixed thresholds by Strelka2 in nf-core/sarek, versus the use of machine learning to filter variants by Mutect2 in GATK4, could provide an additional explanation for the large discrepancy in called variants [20,21,22].
Both algorithms identified more SNVs than indels, which is consistent with typical findings in WES studies [23,24,25]. Although Strelka2 identified more variants, Funcotator annotated fewer of its calls as non-synonymous mutations and PTVs than it did for the Mutect2 calls in matched blood-tissue samples, which emphasises the need for a multiple variant caller approach to capture the full spectrum of genetic alterations [26,27]. The limited overlap across the different categorisations of the variants, in both the matched blood-tissue calls and the blood-only calls, signals that further optimisation of the variant calling pipelines and validation of identified variants through independent methods are necessary.
Among the PTVs from the matched blood-tissue Strelka2 and Mutect2 variant calling, CHIT1, CEP170, and CTR9 were altered in more than one case. CHIT1 has been implicated in both granulomatous and non-granulomatous inflammatory conditions, including multiple sclerosis, sarcoidosis, inflammatory bowel disease, and a few fibrotic interstitial lung diseases (tuberculosis, idiopathic pulmonary fibrosis, scleroderma-associated interstitial lung diseases, chronic obstructive pulmonary diseases) [28,29,30,31,32]. Song and Shao (2024) have also proposed CHIT1 as one of 12 genes in an immune-mediated genetic prognostic risk score model for administering immunotherapy in triple-negative breast cancer [33]. CEP170 and CTR9 are involved in cell cycle processes, whose dysregulation has been described to result in the secretion of inflammatory factors, impaired immune-mediated processes, and increased inflammation [34,35,36,37,38]. Potentially, the alterations in CHIT1, CEP170, and CTR9 may individually or collectively contribute to granuloma formation and chronic inflammation in the breast tissue in IGM.
Unfortunately, there is considerable genetic heterogeneity amongst the eight IGM cases, since the remaining 50 genes are each altered in a single case. This variability complicates efforts to pinpoint common genetic drivers of the disease, and could suggest that IGM may arise from multiple genetic pathways. Such heterogeneity is consistent with the clinical diversity observed in IGM, where patients present with a wide range of symptoms and disease severities [4,10,39]. However, functional enrichment analysis of the 53 genes identified no statistically significant pathways, and every pathway identified was enriched by only one or two genes, which suggests there may not be identifiable genetic drivers for IGM among these eight patients.
Both pipelines rely on rigorous variant calling and annotation processes to maximise the reliability and validity of the identified somatic variants. Unfortunately, none of the selected variants identified from WES were validated with Sanger sequencing. Despite achieving high coverage depth across all exome-targeted regions, WES is prone to inaccuracies due to sequencing artefacts or difficulty in accurately identifying variants in regions of genomic instability [40]. False positives can arise in short-read technology, particularly in regions with high GC content or repetitive sequences [41]. Sanger sequencing, with its ability to provide uniform coverage and longer read lengths, is a valuable orthogonal validation tool [42]. Other studies have also described somatic variants identified from WES that were not found in Sanger sequencing [24,25,43,44]. Discrepancies between WES and Sanger sequencing results can be attributed to inherent differences in their error profiles or to limitations in detecting variants present at low allele frequencies, especially in heterogeneous samples like those from IGM patients [45,46,47].
It must be recognised that the application of WES to the study of IGM presents its own unique set of challenges. WES mainly targets the exonic regions of the genome and is not as effective in identifying large structural variations, including deletions, duplications, inversions, and translocations [48]. Additionally, the rarity of IGM limits access to large patient cohorts for comprehensive genomic analysis with sufficient power to detect smaller effect sizes and lower impact somatic mutations [49]. Given our sample size of eight patients with matched blood and tissue samples, this study has low statistical power (0.230) to detect a significant difference in somatic mutations between the paired samples if such a difference truly exists [50]. Achieving the ideal power of at least 0.80 with this sample size would require an odds ratio of approximately 7.25 between the paired blood and tissue samples [50].
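The relationship between sample size, odds ratio, and power in a paired design can be illustrated with a minimal sketch based on an exact two-sided McNemar test. This is an assumption made for illustration: the test, the discordance probability of 0.9, and the helper names below are not taken from this study or from the method in [50], so the output is not expected to reproduce the reported power of 0.230 or the odds ratio of 7.25.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_mcnemar_p(b: int, d: int) -> float:
    """Two-sided exact p-value when b of d discordant pairs fall in one
    direction, under the null that both directions are equally likely."""
    if d == 0:
        return 1.0
    tail = sum(comb(d, i) for i in range(min(b, d - b) + 1)) / 2**d
    return min(1.0, 2 * tail)


def mcnemar_power(n_pairs: int, p_discordant: float, odds_ratio: float,
                  alpha: float = 0.05) -> float:
    """Exact power: sum over the number of discordant pairs d and over the
    split b of those pairs, keeping outcomes whose p-value is <= alpha."""
    p_dir = odds_ratio / (1 + odds_ratio)  # P(discordant pair favours tissue)
    power = 0.0
    for d in range(n_pairs + 1):
        p_d = binom_pmf(d, n_pairs, p_discordant)
        p_reject = sum(
            binom_pmf(b, d, p_dir)
            for b in range(d + 1)
            if exact_mcnemar_p(b, d) <= alpha
        )
        power += p_d * p_reject
    return power


# Eight pairs as in the text; the discordance probability is hypothetical
for assumed_or in (1.0, 3.0, 7.25, 20.0):
    print(assumed_or, round(mcnemar_power(8, 0.9, assumed_or), 3))
```

With only eight pairs, the exact test can reject only when at least six pairs are discordant and all of them point in the same direction, which is why power stays low unless the odds ratio is large.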
Another difficulty lies in the disease’s inherent heterogeneity [2,4,8,10]. With a broad spectrum of clinical features, ranging from localised breast masses to diffuse inflammatory changes, identifying consistent genetic signatures associated with IGM is challenging [39]. Moreover, the multifactorial and inflammatory nature of IGM adds another layer of complexity to the study of its aetiology [8,51]. While various hypotheses, including immune dysregulation, infectious triggers, and hormonal influences, have been proposed, the precise interplay of genetic and environmental factors remains poorly understood [8,11]. Understanding these complex interactions is key to determining the underlying causes of IGM.
This study pioneers the investigation into somatic variants in IGM patients. While the matched blood-tissue WES variant calls did not identify any ClinVar-annotated pathogenic variants, the detection of variants in multiple genes suggests that IGM may involve a variety of molecular mechanisms. Larger studies with more comprehensive datasets are needed to uncover significant genomic drivers and biological pathways associated with IGM. Furthermore, larger scale studies could also unearth possible associations between different variants and the clinical manifestations and severity of IGM [2,10]. Future studies should also focus on integrating additional omics data, such as transcriptomics and proteomics, to provide a more comprehensive understanding of the disease. The challenges in validating somatic variants underscore the need for improved methodologies and protocols for variant validation. Addressing these challenges requires a multifaceted approach, including refining bioinformatics pipelines to mitigate false positives, and an integrated orthogonal validation approach to ensure the accuracy of variant calls.
Author Contributions
Conceptualisation, S.S.O., P.J.H., J.L. and M.H.; methodology, S.S.O., P.J.H., A.J.K. and J.L.; software, S.S.O., P.J.H. and J.L.; validation, A.J.K.; formal analysis, S.S.O. and J.L.; investigation, S.S.O., A.J.K. and J.L.; resources, A.J.K., J.L. and M.H.; data curation, A.J.K., B.K.T.T., Q.T.T., E.Y.T., S.M.T., T.C.P., S.H.L., E.L.S.T. and M.H.; writing—original draft preparation, S.S.O.; writing—review and editing, S.S.O., P.J.H., A.J.K., B.K.T.T., Q.T.T., E.Y.T., S.M.T., T.C.P., S.H.L., E.L.S.T., J.L. and M.H.; visualisation, S.S.O., P.J.H., A.J.K., and J.L.; supervision, P.J.H., J.L. and M.H.; project administration, S.S.O., A.J.K., J.L. and M.H.; funding acquisition, J.L. and M.H. All authors have read and agreed to the published version of the manuscript.