1. Introduction
Hepatitis B Virus (HBV) infection is a significant global health concern, linked to severe outcomes such as hepatocellular carcinoma (HCC) and liver cirrhosis. It is particularly prevalent in sub-Saharan Africa, where approximately 6.1% of adults are chronically infected, contributing to a substantial portion of the 296 million global cases [1]. Approximately 1.5 million new chronic HBV (CHB) infections occur annually across Africa, accounting for a quarter of new cases globally. In 2019, the global CHB prevalence was estimated at 4.1%, with the Western Pacific region having the highest prevalence (7.1%) and the European region the lowest (1.1%) [1,2].
HBV, a member of the Hepadnaviridae family, possesses a compact circular genome of approximately 3.2 kilobases (kb) [3]. It comprises four genes, HBx (X), Core (Pre-C/C), Surface (S), and Polymerase (P), encoded in seven overlapping reading frames. The S protein, containing the antigenicity, or "a", determinant region within the major hydrophilic region (MHR), is crucial because it is the primary target of neutralizing HBV antibodies, and substitutions in and around this region are associated with immune escape and vaccine failure [4].
To date, ten HBV genotypes (A-J) with an intergroup nucleotide divergence of at least 8% at the whole genome level have been identified [5]. The most prevalent genotypes globally are C (26%), D (22%), E (18%), A (17%), and B (14%) [2]. In Africa, predominant circulating genotypes include A, D, and E, with sub-genotypes A1 and D3 predominating in southern Africa. Specifically, sub-genotype A1 remains most prevalent in South African populations (accounting for 97% of infections) and has been linked to severe liver disease and rapid progression to HCC [6,7].
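The 8% whole-genome divergence criterion can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to the same length and uses a simple uncorrected proportion of differing sites; the function names are hypothetical, and real genotype assignment relies on phylogenetic comparison against reference panels rather than a single pairwise distance.

```python
GENOTYPE_THRESHOLD = 0.08  # at least 8% divergence at the whole genome level


def pairwise_divergence(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def differ_at_genotype_level(seq_a: str, seq_b: str,
                             threshold: float = GENOTYPE_THRESHOLD) -> bool:
    """True if the uncorrected divergence meets the genotype-level threshold."""
    return pairwise_divergence(seq_a, seq_b) >= threshold
```

For a 3.2 kb genome, the threshold corresponds to roughly 256 or more differing nucleotide positions between members of different genotypes.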
While whole-genome sequencing-based surveillance is becoming a key tool for understanding the distribution, prevalence, and evolution of viral pathogens [8,9,10], it has yet to be fully leveraged in clinical diagnostic settings. In many settings, rapid point-of-care screening tests are used to detect the hepatitis B surface antigen (HBsAg) in serum or plasma as a marker of active infection. However, a major concern when using diagnostic tests is that they must possess a high degree of sensitivity and an acceptable level of specificity to reduce false results [11,12]. For more detailed characterization of HBV in patient samples, Sanger sequencing of complete or partial HBV genomes is considered the gold standard and has been used to classify HBV into its ten genotypes. It is, however, often restricted to analyzing specific genes and is rarely used for the analysis of intra-patient genetic diversity [13]. Partial genome sequences can be misleading when characterizing recombinant HBV genomes, and Sanger sequencing yields little or no information on intra-patient HBV genetic diversity. Such information would be extremely valuable for monitoring the emergence of drug resistance or immune evasion mutations, estimating the durations of chronic infections, understanding the progression of pathogenesis, and tracing transmission patterns [13]. Illumina deep sequencing, while effective for genotyping and characterizing genetic diversity [14], suffers from limitations such as the inability to sequence long DNA stretches, biases introduced during amplification steps, and challenges in generating sufficient overlap between DNA fragments [15].
High-throughput sequencing (HTS) techniques are powerful tools that, in addition to diagnosing CHB infections and genotyping HBV, would both enable the accurate characterization of recombinant HBV genomes (including those with mixed genotypes) and provide detailed data on intra-patient HBV genetic diversity [16]. Among numerous other HBV-focused applications, HTS and downstream analyses have previously been used to sequence complete HBV genomes [8,10,17,18], track the demographics of HBV populations within individual CHB patients [19], and identify the prevalence of drug-resistance mutations in large patient cohorts [13,20,21,22].
Two major challenges associated with HTS workflows employed to generate viral genome sequence data are (1) the efficient and accurate barcoding of samples needed for multiplexed sequencing, where multiple patient samples are simultaneously sequenced in a single run, and (2) the accurate reassembly of sub-genome-length sequence reads into complete genomes. Third-generation HTS platforms from Oxford Nanopore Technologies (ONT), such as the MinION, GridION, and PromethION, have largely overcome these limitations. Barcoding kits allow for cost-effective and efficient sequencing by enabling the pooling and running of multiple libraries on a single flow cell. Different types of barcoding kits are available, including ligation-based, PCR-based, and rapid chemistry-based kits, each with their own advantages and input requirements (https://community.nanoporetech.com/docs). A study by McNaughton et al. describes advancements in a sequencing protocol utilizing isothermal rolling-circle amplification and ligation-based barcoding kits [23]. The ONT rapid barcoding kit does not require individual sample washes and allows samples to be processed uniformly without quantification or normalization [24]. The sample runtime on ONT, per 96 samples, is almost half that of Illumina (14 hours compared to 26 hours), mainly due to real-time data analysis with ONT. The rapid barcoding library preparation utilized by ONT also requires fewer reagents, as everything is contained within the kit, and is thus cheaper than Illumina sequencing [25]. Further, the sequence reads generated using ONT are substantially longer than those generated by Illumina, a factor that vastly simplifies the assembly of whole genome sequences. However, ONT still exhibits a higher error rate than Illumina sequencing, although this has improved with newer chemistry and the use of post-sequencing software such as Nanopolish [26].
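The principle behind barcode-based multiplexing can be sketched with a toy demultiplexer. The barcode sequences, names, and mismatch tolerance below are hypothetical and chosen only for illustration; production demultiplexing is performed by the sequencing vendor's software, handles insertions and deletions, and uses longer barcodes than those shown here.

```python
from collections import defaultdict

# Hypothetical 8-nt barcodes, for illustration only
BARCODES = {"barcode01": "AAGGTTAA", "barcode02": "CCTTGGCC"}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes=BARCODES, max_mismatches=1):
    """Assign each read to the barcode best matching its first bases.

    A read is binned only if its best match is unique and within
    max_mismatches; the barcode is then trimmed from the read.
    All other reads are kept untrimmed in an 'unclassified' bin.
    """
    bins = defaultdict(list)
    for read in reads:
        hits = sorted(
            (hamming(read[:len(tag)], tag), name, len(tag))
            for name, tag in barcodes.items()
            if len(read) >= len(tag)
        )
        unique_best = bool(hits) and hits[0][0] <= max_mismatches and (
            len(hits) == 1 or hits[1][0] > hits[0][0]
        )
        if unique_best:
            _, name, tag_len = hits[0]
            bins[name].append(read[tag_len:])
        else:
            bins["unclassified"].append(read)
    return dict(bins)
```

Pooling is only useful if reads can later be returned to their sample of origin, which is why barcoding accuracy is listed among the major challenges of multiplexed workflows.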
Here we describe an optimized ONT-based HBV whole genome sequencing protocol and associated downstream computational analyses that will be applicable within a clinical HBV diagnostic setting. Using 148 samples from South African CHB patients, we demonstrate that the protocol enables accurate recombination-aware genotyping of patient samples and the detection of drug-resistance mutations.
3. Discussion
In this study, we demonstrate that ONT sequencing, utilizing the Oxford Nanopore Rapid Barcoding Kit, enables the rapid and simple generation of full-length HBV genomes. Additionally, we illustrate the utility of the rich sequencing data generated by this approach in the recombination-aware genotyping of HBV genomes and the detection of mutations associated with drug resistance and vaccine/immunotherapeutic escape.
Our protocol yielded 124 genomes with uniform coverage of greater than 80% and a sequencing depth of approximately 2,344×. We opted for ONT sequencing, which overcomes Sanger and Illumina limitations by providing long-read sequencing, eliminating the need for complex library preparation processes, and reducing the risk of biases associated with amplification steps. As the ONT rapid barcoding library preparation requires fewer reagents than the Illumina sequencing protocol, ONT sequencing was also much cheaper (Illumina cost per sample is ~150–250 USD while the ONT cost per sample is ~10–40 USD) [25].
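Genome coverage and sequencing depth of the kind reported above can be summarized from a per-position depth profile. The sketch below is illustrative only: it assumes that coverage means the fraction of genome positions with at least `min_depth` reads, and the function names and default thresholds are hypothetical rather than taken from the pipeline used in this study.

```python
def coverage_summary(depths, min_depth=1):
    """Breadth of coverage and mean depth from per-position read depths."""
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "breadth": covered / len(depths),        # fraction of positions covered
        "mean_depth": sum(depths) / len(depths)  # average reads per position
    }


def passes_breadth(depths, min_breadth=0.80, min_depth=1):
    """True if more than min_breadth of positions reach min_depth."""
    return coverage_summary(depths, min_depth)["breadth"] > min_breadth
```

In practice the depth profile would come from reads mapped to a reference genome of approximately 3.2 kb, with one depth value per reference position.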
Full-length genome and sub-genomic ONT-based sequencing approaches have previously been used for HBV [23,29,30]. In these studies, ONT sequencing only worked well for samples with high HBV loads, with one study [30] having a raw read error rate of ~12% and another unable to definitively confirm putative minority variants detected in the MinION reads [29,30]. In addition to ONT-based approaches, Dopico et al. (2021) developed an HBV sequencing protocol using the Distance-Based discrimination method (DB Rule) on Illumina MiSeq to sequence specific, relatively short regions of the HBV genome (the preS region and the 5′ end of the HBV X gene), rather than generating complete genomes [31]. This method is valuable for the rapid identification of variants in targeted regions but does not offer the comprehensive genomic insights that whole-genome sequencing approaches enable. An important factor associated with the successful sequencing of viral genomes is the viral load (VL) present in samples: the higher the VL in a sample, the higher the yield of amplified products to be sequenced and the easier the assembly of a complete genome [32]. One of the studies [23] developed a sequencing protocol utilizing ONT ligation-based barcoding kits that improved the accuracy of HBV nanopore sequencing for use in research and clinical applications. Our protocol, based on the rapid chemistry-based barcoding kit, produced complete genomes at low (<2,000 IU/ml), medium (2,000–20,000 IU/ml), and high (>20,000 IU/ml) VL and allowed for the identification of various HBV genotypes, including genotypes A, D, and E.
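The viral load strata used above can be written as a small classification rule. This is a minimal sketch using the thresholds stated in the text; the function name is hypothetical, and the handling of the boundary values (2,000 and 20,000 IU/ml fall into the medium stratum) follows directly from the stated ranges.

```python
def classify_viral_load(vl_iu_per_ml: float) -> str:
    """Assign a viral load (IU/ml) to the low, medium, or high stratum."""
    if vl_iu_per_ml < 0:
        raise ValueError("viral load cannot be negative")
    if vl_iu_per_ml < 2000:
        return "low"       # <2,000 IU/ml
    if vl_iu_per_ml <= 20000:
        return "medium"    # 2,000 to 20,000 IU/ml
    return "high"          # >20,000 IU/ml
```

Stratifying samples in this way makes it straightforward to report genome recovery separately for each viral load category.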
Genome characterization of a virus can be important for clinical diagnostics as, beyond identifying the infecting agent, it can reveal clinically relevant genetic variations [33]. HBV-A was the most prevalent genotype among our study cohort, which is consistent with this genotype's high prevalence in sub-Saharan Africa [34]. A study by Jose-Abrego et al. (2021) explored the possible influence of HIV-HBV co-infection, which is likely to be prevalent in South Africa, on HBV genomes [35]. They observed that co-infection could lead to genotype mixtures, increased viral load, and more severe liver damage, suggesting that HIV co-infection may have important implications for HBV genome diversity and clinical outcomes. Recombination analysis also revealed complex viral replication/recombination dynamics, with 16 identified recombinants, primarily between genotypes A and D (A/D). This was to be expected due to the high prevalence of genotypes A and D in Southern Africa [6,7]. While RDP5.46 analysis confirmed eight of these recombinants, discrepancies were noted, highlighting the challenges in precisely classifying recombinant strains. Although recombination can be easily detected using phylogenetic trees and recombination software, it is much more difficult to determine whether detected recombinants are genuine or whether they are detection artifacts arising from (1) primer jumping during Polymerase Chain Reaction (PCR) amplification of samples that are either from mixed infections or have been accidentally cross-contaminated, (2) “backfilling” of failed amplicons with contaminating sequence reads, or (3) incorrectly assembled genomes where reads from multiple genetically distinct viruses get assembled into a single genome. In general, the only way to confirm the existence of a recombinant is to independently amplify and re-sequence samples or to detect multiple genomes of the same recombinant lineage in different patients (i.e., by identifying circulating recombinants). Of the 16 recombinants detected, only three were identified by RDP5.46 as circulating recombinants. Therefore, until the other 13 are independently amplified and sequenced, it cannot be definitively stated that they are not simply either sequence amplification or sequence assembly artifacts.
Nevertheless, the fact that HBV recombinants have been so widely and frequently detected suggests that the recombinants detected in this study are likely real. Genotype B/E recombinants identified in Nigeria and Eritrea are the most common in Africa [36]. Genotype D/E recombinants are also common, having been identified in eight countries, namely Kenya, Niger, Egypt, Ghana, Libya, Mali, Eritrea, and Uganda [36], followed by genotype A/E, identified in five countries, namely Uganda, Eritrea, Ghana, Niger, and Mozambique [36,37], and genotype A/D, reported in Egypt, Eritrea, and Uganda [36].
Others have also detected recombination hotspots within the C region, pre-C, P, and X genes [38]. These regions were also frequently transferred during recombination events and, as a result, may impact the diagnosis and treatment of HBV since coinfection and viral recombination can trigger greater virulence and result in a worsened patient clinical status [32].
The Geno2pheno-HBV tool for identifying HBsAg vaccine escape and drug-resistance mutations [39,40] identified various potential drug resistance, HBsAg vaccine/immunotherapeutic escape, and diagnostic failure-associated mutations in our study population. Mutations, such as the triple mutation 173L + 180M + 204I/V, and 133L/T, found in the Pol gene, have been identified as major vaccine escape mutations. An HBV study in Bangladesh identified HBsAg mutant 128V as its most common mutant [28], while mutant 100C was the most prevalent in our study. This may be due to the difference in genotype prevalence between the two study populations, as HBV-C was the most prevalent genotype in Bangladesh whilst genotype A is the most prevalent among our study population. One of the most prevalent resistance-associated mutations for HBV is 204V/I for both treatment-experienced and treatment-naive individuals. This mutation can either occur alone or can occur in combination with other mutations such as 80I/V, 173L, 180M, 181S, 184S, 200V, and/or 202S [41]. Our study also notes a high prevalence of 204V/I in combination with other drug-resistance mutations.
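Screening a sample for a combination such as the triple mutation 173L + 180M + 204I/V amounts to checking that every required position carries one of the accepted residues. The sketch below is a simplified illustration and not the Geno2pheno-HBV algorithm: the function names are hypothetical, and it assumes substitutions are supplied as strings in the position-plus-residue form used in the text (for example "204V").

```python
import re

# Each required position maps to the residues that satisfy it (204I/V accepts I or V)
TRIPLE_MUTATION = {173: {"L"}, 180: {"M"}, 204: {"I", "V"}}


def parse_mutation(label: str):
    """Split a label such as '204V' into (204, 'V')."""
    match = re.fullmatch(r"(\d+)([A-Z])", label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return int(match.group(1)), match.group(2)


def has_pattern(observed_labels, pattern=TRIPLE_MUTATION) -> bool:
    """True if every position in the pattern carries an accepted residue."""
    observed = {}
    for label in observed_labels:
        position, residue = parse_mutation(label)
        observed.setdefault(position, set()).add(residue)
    return all(
        observed.get(position, set()) & residues
        for position, residues in pattern.items()
    )
```

The same check extends to other combinations by passing a different pattern dictionary, for example one pairing 204V/I with any of the co-occurring mutations listed above.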
First-line treatment for CHB includes PEGylated interferon and nucleoside/nucleotide analogs such as Tenofovir, Entecavir, and Lamivudine [7]. In South Africa, over 1.9 million people are chronically infected with HBV and the most used first-line treatment is Tenofovir in the form of tenofovir disoproxil fumarate (TDF) [7]. In this study, we note that all sequences suggested susceptibility to Tenofovir. TDF treatment is long-term and is frequently given indefinitely due to the risk of infections reactivating when therapy is terminated, as such treatments do not entirely remove the replication-competent viral genomes [7,42]. A CHB functional cure, characterized by loss of HBsAg and reduced risk of HCC, can be achieved by treatment regimens including TDF [7]. Resistance to TDF has been noted in patients harboring the mutations 80M, 180M, 204V/I, 200V, 221Y, 223A, 184A/L, 153Q, and 191I [43]. The absence of drug-resistance mutations in genotype E genomes and their presence in genotype A and D genomes highlight the importance of monitoring drug resistance, as treatment with TDF may not be successful for these patients.
A major limitation of the current study is that HBV samples were only sequenced using the proposed protocol; the sequencing data were not compared to results generated using Illumina sequencing protocols. Separately, efforts have been invested in optimizing the analysis pipeline to enhance accessibility and enable clinical laboratory staff to execute the entire process from start to end. This initiative seeks to empower staff who are not specialist bioinformaticians and to streamline workflows, making genomic analysis more user-friendly and ensuring that valuable insights can be obtained without a dependency on specialized expertise. The aim is to make these advanced sequencing technologies accessible to all, allowing broader adoption and application in clinical settings.
Short-read sequencing is the most used form of next-generation sequencing (NGS) [27]. However, despite having a slightly higher error rate, ONT appears to generate high-quality data at a very affordable cost. Therefore, ONT-based sequencing is presently the most cost-effective HTS technology and is especially well-suited for countries with limited resources for monitoring shifting viral demographics and tracking the prevalence and spread of drug resistance and vaccine evasion mutations.
The high value of genome surveillance was excellently demonstrated during the COVID-19 pandemic. Although whole genome sequencing was primarily used to monitor virus evolution and was only carried out on ~2.12% of all diagnosed COVID-19 cases (https://data.who.int/dashboards/covid19/cases?n=c, https://gisaid.org/), the prospect of agnostically diagnosing any viral pathogen (whether known or unknown) through HTS-based whole genome sequencing is likely to be realized within the coming decade. Diagnosis-augmenting protocols that are applicable in a clinical virology setting, such as the one we present here, are an important first step toward this objective.
Author Contributions
Conceptualization, TdO, DPM, DT, WC, SEJ, and TM; methodology, TdO, DPM, DT, WC, SEJ, TM, WP, GvZ, JG, SP, UJA, TJS, and YN; software, TdO and DPM; validation, TdO, DPM, DT, WC, SEJ, and TM; formal analysis, TdO, DPM, DT, WC, SEJ, and TM; investigation, TdO, DPM, DT, WC, SEJ, and TM; resources, TdO and DPM; data curation, TdO, DT, WC, SEJ, and TM; writing (original draft preparation), DT, WC, and SEJ; writing (review and editing), DT, WC, SEJ, and CB; visualization, DPM, DT, WC, and SEJ; supervision, TdO and DPM; project administration, TdO; funding acquisition, TdO and CB. All authors have read and agreed to the published version of the manuscript.