1. Introduction
Tuberculosis (TB) accounted for 1.4 million of deaths and an estimated 10.6 million of new cases in 2023, and was the leading cause of death from a single infectious agent worldwide until COVID-19 pandemic [
1]. The emergence of drug resistant (DR) TB is a global threat that hinders successful TB treatment. TB multidrug resistant (MDR-TB), defined as simultaneous resistance to rifampicin (RIF) and isoniazid (INH), results in a worse prognosis, prolonged TB treatment with second-line drugs that are more toxic, more expensive and with the possibility of evolving to pre-extensive drug resistant (p-XDR-TB). The latter is defined as strains that fulfil the definition of multidrug resistant or rifampicin resistant (MDR/RR-TB) plus resistance to any fluoroquinolone, and subsequently can evolve to extensive drug resistant (XDR-TB), defined as strains p-XDR-TB plus resistance to bedaquiline and/or linezolid.
Brazil is one of the thirty countries with highest TB burden, and although the incidence rate decreased until 2014, during the period of 2015-2019 a notified increased incidence of 34.3 to 37.4 cases/100.000 inhabitants was observed [
2]. In addition, due to the COVID-19 pandemic, it is estimated that after a decade of decline, TB mortality has increased in Brazil and globally [
3]. An additional concern for TB control in Brazil is the considerable number of DR-TB patients; in 2018 around 2.500 MDR/RR-TB cases were estimated [
4], including an increase of MDR-TB among patients that had not previously been treated in Rio de Janeiro State [
5].
Rapid DR-TB detection and epidemiological surveillance, as well as knowledge about the genetic diversity of isolates of the
Mycobacterium tuberculosis complex (MTBC) in different settings are factors that may contribute to DR-TB elimination. In this scenario, molecular tools have become important and quite recently, next generation sequencing (NGS) made it possible to quickly characterize the whole genome of MTBC strains, enabling both identification of resistance-related genetic variants and lineages identification involved in ongoing transmission [
6,
7]. Culture-based phenotypic drug susceptibility test (DST), although still the current gold standard, has limitations due to the slow growth rate of the MTBC organisms. Thus, molecular methods for drug resistance prediction are being steadily introduced as a routine in low TB incidence countries [
8].
WGS is a promising tool and an approach for DR/MDR-TB detection, since it provides detailed sequence information from different genomic regions, thus enabling drug resistance prediction [
9]. However, the high amount of sequencing data generated by WGS created the challenge to develop bioinformatics tools to translate the data into information of clinical and laboratory interest [
8]. To permit the use of WGS by professionals with little or no bioinformatics skills, user friendly tools for analysis and interpretation of WGS data have been developed and implemented, permitting accessibility and broad implementation of NGS-based approaches [
10,
11,
12,
13]. Nonetheless, due to the complexity of large scale data analyses, some bioinformatics command line skills are still required and sometimes, user a friendly graphical interface is not available [
14,
15].
Due to global variability of prevalence of MTBC lineages and evidence of differential association with drug resistance [
16,
17], it is evident that combined detection of both characteristics in different countries and regions may interfere with performance and management of DR-TB detection. Therefore, genomic information of MTBC strains together with conventional phenotypic drug susceptibility testing (DST) and clinical outcome is urgently needed. In addition, evaluation of the different available tools for extraction of DST profile from WGS became important to evaluate the regional differences in DR-TB surveillance and to understand local TB transmission.
Here we conducted a genetic diversity study on genomes from a large collection of Mtb isolates from several states of Brazil that mostly had phenotypic DST for primary and secondary drugs. WGS data were evaluated by an in-house WGS pipeline and different online available pipelines, including Mykrobe Predictor, KvarQ, TB Profiler version 0.3.4 and the more recent TB Profiler 5.0 to predict drug susceptibility and the in-house WGS pipeline, KvarQ, TB Profiler 5.0 and RD-Analyzer to predict the genotype (TB lineages).
3. Discussion
In this study we performed a genome-based characterization of the genetic diversity of mainly drug resistant isolates of Mtb in different regions in Brazil. The analyzed Mtb strains predominantly belonged to the sub-lineage 4.3 (LAM) (52%). This is in accordance with previous studies performed in various Brazilian regions in both susceptible and DR-TB samples [
18,
19,
20,
21,
22,
23,
24]. L4.3 was predominant in the South, North, Southeast and Midwest region. In the Northeast, this predominance by L4.3 was not observed but the number of samples from this region was low. Three isolates (from São Paulo, Rio de Janeiro and Distrito Federal) belonged to Lineage 1, known to be restricted mostly to Eastern Africa and South and East of Asia [
16], but apparently described with a certain frequency in the Northern Brazil region [
24,
25,
26], probably imported trough slave trade from East Africa [
27]. Although L1 is usually not associated with DR, two of three isolates in this study presented some resistance (one IMR and one Poly-R). This might be due to sampling bias as the CRPHF mainly receives samples from patients suspicious for being DR, either due to treatment failure, treatment abandonment, relapse TB or contact with TB resistant patients.
The four pipelines presented a high agreement level (k = 0.89), with almost complete agreement among SNP-based pipelines, but showing divergences mainly in lineage classifications by RD-Analyzer. The Mtb isolates with high counts of mixed SNP calls may be indicative of mixed infection as described by others [
28,
29,
30]. Analysis among five isolates with a high proportion of mixed SNP calls, showed four mixed classifications by the in-house WGS pipeline only, while TB Profiler and KvarQ detected only one of the two lineages present in these isolates. The RD-Analyzer had some difficulties to classify lineage in comparison with SNP-based tools and presented the most divergences among all tools tested, including 31 isolates with mixed classification and six unidentified despite good genome quality and low mixed SNP call counts.
The majority of clusters were identified in Rio de Janeiro, and although sampling from this state was clearly over represented in our setting, this may represent higher levels of recent infection and drug resistant TB transmission, differing of TB cases due to reactivation of latent infection [
31,
32]. In this Brazilian State, an increase of primary MDR-TB has been described as associated with low TB control performance [
5]. Studies in other Brazilian States also shown important ongoing transmission of MDR-TB, as observed in Sao Paulo [
33,
34] and Santa Catarina [
35] and also, in a genome-based study [
36], in Rio Grande do Sul. In the latter, as well as in our study, a significant association between clustering and genotypic MDR-TB and significant ongoing transmission in p-XDR-TB resistance profiles was observed, with practically half of the patients infected with genotypic p-XDR-TB (44.6%;
n = 25) in clusters, reflecting an alarming scenario of insufficient DR-TB control [
36].
In addition to clusters of isolates derived from residents of particular states, we also observed clusters suggestive of interstate transmission of particular genotypes that seem to have been circulating in the country for a considerable time. Examples are GC8 (three patients from Rio de Janeiro and one from Ceará) and GC15 (one from Goiás and one from Tocantins) presenting genetic distances ≤12 SNPs. Furthermore, we observed among isolates of unidentified patients, that were not included in clustering analysis, one isolate from Distrito Federal (isolated in the year 2007) that had genetic distance of ≤12 SNPs to the GC4 isolates that were from Rio de Janeiro (2008-2012), and also one isolate from São Paulo (2004) and one isolate from Distrito Federal (2007) that had genetic distance of 11 SNPs between them. These could point to other two interstate transmission events, requiring further investigation.
From 15 patients, we had several isolates collected during and after treatment. In most of these cases, the infecting lineage strain during follow up of the patients was the same and only in one patient (P35), reinfection with another strain and lineage was observed. In a retrospective cohort study from China [
37], a MIRU-VNTR-based recurrence definition observed a lower frequency of reinfection (
n = 21; 36.2%) in comparison to relapse (
n = 37; 63.8%). In a population-based study from Malawi [
38] using WGS, lower frequency of reinfection (
n = 20; 14.3%) compared with relapse (
n = 55; 39.5%) was also observed; 64 (46%) of recurrent patients remained unclassified, showing how challenging it remains to classify such cases.
We observed emergence of FQ resistance conferring mutations among one third of the patients that were followed-up during treatment. These bacterial populations had likely been selected due non-compliance of drug treatment [
39,
40]. In addition, emergence of phenotypic resistance to AMK in patient P39, and phenotypic resistance to KAN and CAP in patient P64 was observed, but without detection of known mutations in
rrs and
eis. This could be caused by not yet known resistant conferring mutations to those drugs and/or by efflux pump activity, not investigated here. Such pumps have been described to be induced by sub-inhibitory levels of antibiotics such as FQ, RIF and INH in inappropriate treatment regimens [
41]. Additionally, the presence of a minor undetected population of drug resistant bacteria, so-called heteroresistance, can also explain differences between culture and in silico based drug susceptibility outcome [
42].
Another observation was the emergence of the G406D mutation in
embB in patient P54 (
Table 3), however, a mixed call was detected in this codon and may indicate selection of the mutant population within a single strain, probably due to inappropriate treatment. Patient P35 presented emergence of the Q497R mutation in
embB, and in addition a shift of RIF resistance genotype, initially presenting Q432P mutation in
rpoB and posteriorly the S450L mutation. Besides that, in the first isolate the sub-lineage detected was L4.1.1 (X) and in the second was L4.3 (LAM), and also a genomic difference of 484 SNPs was observed between both, indicating a TB recurrence by reinfection [
38].
We observed a statistically significant lower frequency of resistance related mutations to STR, CAP, ETH and AMK in strains belonging to sub-lineage 4.3 (LAM) in comparison with non-LAM sub-lineages. Lower frequency of mutations related to resistance in some lineages has been described, including lower frequency of S315T mutation in
katG in LAM spoligofamily when compared to Haarlem [
43]. Another study showed a higher mutation rate in Mtb strains of lineage 2 and as such, a faster emergence of resistance conferring mutations in strains of lineage 2, accompanied by development of DR-TB [
44].
Upon comparing among patient treatment outcome, risk factors for TB development, phenotypic and genotypic resistance and Mtb lineage, we observed a significant association of EMB phenotypic (Chi-square
p = 0.006) and genotypic (Fisher’s exact test
p = 0.0007) resistance with TB mortality. EMB resistance has been described as a risk factor for mortality, mainly when associated with MDR pattern [
43]. In our study, genotypic p-XDR-TB was significantly associated with EMB genotypic resistance, while genotypic p-XDR-TB profile was significantly associated with higher morbidity (Chi-square
p = 0.03). In addition, the presence of any of the five risk factors (HIV, diabetes, smoking cigarette, alcohol and/or illicit drug use) was also significantly associated with morbidity but independent of genotypic or phenotypic EMB resistance. The increased risk of mortality in MDR-TB and p-XDR-TB patients has been described [
46,
47] and is related with increasing level of drug resistances, whereas accumulating of even more factors increases the probability of an unfavorable outcome [
48,
49]; EMB resistance could be an important stage of resistance accumulation in Mtb.
We observed a similar proportion of main drug resistance conferring mutations, frequencies and genes affected detected by all pipelines tested and this is in agreement with other studies [
50,
51,
52,
53,
54].
Unprecedented was the detection of 20 novel mutations in 32 strains using TB Profiler 5.0, including a novel frameshift mutation katG_c.2070delC in an isolate that is likely associated with INH resistance, considering the impact this may have on the catalase-peroxidase structure and activity, accompanied by phenotypic resistance against INH. The same isolate also presented the mutation ahpC_c.-81C>T, classified in WHO catalogue as with uncertain significance, but may be acting in synergy with the newly described to confer resistance; this needs to be better investigated. This isolate was misclassified as genetically susceptible to INH by all pipelines but TB Profiler, and the same was observed in other isolates with the mutation ahpC_c.-81C>T, highlighting the importance of using updated pipelines.
Only four of 32 isolates were fully phenotypically characterized for the nine drugs available in this study. Unfortunately, thirteen of the isolates in which we detected a novel mutation were not phenotypically characterized for the drug of interest, making it impossible to establish the genetic and phenotypic correlation. Among the four isolates with the frameshift mutation
pncA_c.193_200dupTCCTCGTC, two had no PZA DST available and two were PZA resistant (these two were from the same patient); the mutation
pncA_c.305dupC was observed in one resistant and one with DST not available; finally, the mutations
pncA_c.452dupT and
pncA_c.75_79delCGCGC were both presents in resistant isolates, all suggesting resistance association. On the other hand, the frameshift mutation
pncA_c.443_444dupGC was detected in both susceptible and resistant isolates, making it difficult to interpret. Other isolates observed with novel frameshift mutations in
pncA and
katG genes and phenotypically susceptible could be explained by an acquired low-level resistance, under the antibiotic concentration threshold used here, or due to some laboratorial error in phenotypic resistance determination. Phenotypic susceptible isolates harboring confident resistance mutations have been reported in other studies [
55,
56].
Among ten other mutations we detected that were absent in the WHO catalogue (
Supplementary Table S5), eight [
57,
58,
59,
60,
61,
62,
63,
64] in twelve isolates had not yet reported in Brazil [
63,
65], four (
katG_p.Leu634Phe,
pncA_p.Asp63His,
pncA_p.Gly23Val,
pncA_c.521_522insT) were present in phenotypically resistant isolates in our study. Another mutation associated with resistance was detected for the first time in Brazil,
rpoB_p.Ile491Phe, a mutation outside of the 81 bp hotspot of the
rpoB gene
, what make it undetectable by commercial assays such as GeneXpert MTB/RIF and MTBDR
plus [
66,
67]. In Eswatini, for example, the presence of this mutation is a significant problem because of its prevalence of >60% in MDR strains [
68]. In our study, this mutation was observed in one isolate genotypically and phenotypically MDR, that also carried the
rpoB_p.Ser493Leu mutation, an uncertain variant according to WHO. However, their combination can be acting in synergy because the former is a borderline resistance mutation [
69].
The comparison among the pipelines evaluating their performance through sensitivity, specificity, accuracy, positive and negative predictive values calculation for drug resistance prediction, using phenotypic resistance results as a reference, showed a better overall performance of resistance prediction by the in-house WGS pipeline. For all pipelines, the sensitivity of prediction of resistance to RIF and INH was higher than 80% but the specificity was lower than 95% and therefore below that of the WHO recommendations [
70]. High positive predictive value-PPV (>80%) was observed in all pipelines used for detection of resistance conferring mutations for RIF, INH, OFL, AMK and similar to that described by others [
71,
72]. Lower PPV was observed for detection of resistance mutations whose action mechanisms are more complex and genetic bases of resistance is less understood, such as resistance to PZA, KAN and CAP. Unreliable PZA resistance prediction has been reported in various studies [
25,
71,
72,
73] and in order to enhance PZA resistance prediction, the following strategies have been described: (i) improving the mutation library of the pipelines mainly by including detection of indels in
pncA, (ii) the “non-wild type sequence” approach, that consist in interpret any non-synonymous mutations or indels in
pncA as genotypic PZA-resistant strains, (iii) besides manually checking of the sequence reads [
74,
75]. Not surprisingly, the best sensitivity performance for PZA resistance prediction was achieved by TB Profiler 5.0, the most up-to-date pipeline for detection of DR-associated mutations, accounting with the largest PZA-mutations catalogue among all pipelines used and able to detect novel mutations, including fourteen novel indels observed in this study.
Another important finding in our study is that, compared to conventional DST, more than twice as many cases were classified as p-XDR-TB using WGS. This difference in proportion could be beyond the accuracy of genomic tests to second-line resistance detection. However, one should take into account that the criteria for performing phenotypic DST tests to second line drugs in Brazil are causing the underestimates of the detection of resistance to such drugs. These results highlight the importance of using WGS for epidemiologic surveillance and control of DR/MDR/p-XDR-TB, as these discrepancies would not be detected and reported without it use.
Without any doubt, a major limitation of this study is that phenotypic tests for second line drugs were not done in all samples. Among the 161 isolates classified as phenotipically MDR-TB only 22 were tested to all second line drugs and only 10 were tested for nine drugs; one of 10 Poly-R-TB and six of the 27 p-XDR-TB were submitted to the full DST. This further emphasizes the importance of using molecular methods for TB diagnosis and comprehensive resistance analyzes. Another limitation is the small sample size from regions outside Rio de Janeiro, Distrito Federal and Sao Paulo States, therefore not representing the national scenario of DR-TB in Brazil.
In conclusion, the evaluated pipelines for prediction of MTBC lineage and drug resistance work well in the Brazilian sampling studied here and our data favor the use of WGS in cases without conventional DST. Although the in-house WGS pipeline performed slightly better in general, all tools performed well in predicting DR for RIF, INH, AMK, KAN and CAP. Importantly they allowed detection of p-XDR-TB strains that otherwise would probably have been unreported. Analysis of isolates of DR-TB patients that were sampled sometimes years apart, demonstrated that several accumulated drug resistance mutations showing that resistance evolution occurs by acquisition of mutations in the same strain, probably due to noncompliance of treatment. Phylogenetic analysis and lineage characterization contribute to better understand the MTBC genotypes spreading in the population, so WGS improves the knowledge of MTBC population structure and evolution and offers rapid and reliable assessment of resistance related mutations allowing faster access to effective treatment. A surprisingly low level of reinfection was observed in an area with high TB incidence and, even in patients with DR and prolonged treatment. We believe that these findings highlight the importance of the need for active surveillance throughout the national territory in order to avoid further aggravation of the TB scenario in Brazil.