1. Introduction
In 1911, Peyton Rous isolated a transmissible agent from a large tumor on the breast of a Plymouth Rock hen [1,2]. His groundbreaking work demonstrated that malignant tumors may have infectious origins. The discovery ushered in the field of tumor virology, which deepened our understanding of carcinogenesis by insertional mutagenesis [1,2]. Rous was eventually awarded the Nobel Prize in 1966, and the famed chicken retrovirus was named Rous sarcoma virus (RSV) after him [1,2]. Today, seven viruses are classified as human carcinogens (Group 1) by the International Agency for Research on Cancer (IARC): Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus type 1 (HIV-1), Human papillomavirus (HPV), Human T-cell lymphotropic virus type I (HTLV-I), and Kaposi’s sarcoma-associated herpesvirus (KSHV) [3]. Interestingly, HIV-1 infection alone does not lead to cell transformation or immortalization. Instead, HIV-1 accelerates the process by interacting with oncoviruses, suppressing the immune system, and producing transferable HIV-1 proteins, thus acting as a co-factor in carcinogenesis [3,4].
Globally, the burden of cancer is staggering, with an estimated 20 million new cases in 2022 and a projected increase to 35 million in 2050 [5]. One out of every eight cases is attributed to chronic infection. HPV and HBV are the two most common viral causes of cancer worldwide [5,6,7,8,9]. In 2020, there were 730,000 cancer cases attributable to HPV and 380,000 attributable to HBV [5]. The most common HPV-associated cancer was cervical cancer (662,301 cases), followed by oropharyngeal and anogenital cancers; for HBV, it was hepatocellular carcinoma (HCCA). In 2022, 39 million people were living with HIV (PLWH) worldwide, and 1.3 million acquired new infections [10]. With antiviral treatment, life expectancy for HIV-infected persons has increased. However, among individuals co-infected with HPV and HBV, non-AIDS-defining cancers have become a concerning cause of mortality [4].
The journey from viral infection to malignant transformation of the host cell is complicated and differs among HPV, HBV, and HIV-1 (Figure 1). Typically, these three viruses display host cell specificity, requiring binding to specific cell surface proteins for entry [4,11,12]. After traversing the cytoplasm, the viral genomes enter the nucleus for replication (Figure 1). Viral DNA integration into the host genome, whether accidental or deliberate, can disrupt or alter the function of host cancer-associated genes and neighboring genes, ultimately leading to malignant transformation. The mechanism of integration for each virus is briefly described here and shown in Figure 1. During host cell division, the circular HPV genome tethers like a “hitchhiker” to sister chromatids [13,14]. The viral genome then unwinds bidirectionally, replicates, and partitions equally into the daughter cells [15,16]. Such intimate “liaisons” between viral and host DNA result in accidental integration at vulnerable sites (e.g., open chromatin and common fragile sites) [15,16]. For HBV, after virion entry into the hepatocyte, the relaxed circular DNA (rcDNA) and double-stranded linear DNA (dslDNA) enter the nucleus for conversion to covalently closed circular DNA (cccDNA) [17]. Only the dslDNA integrates randomly at double-stranded DNA (dsDNA) breaks in the host genome through non-homologous end joining (NHEJ) or micro-homology mediated end-joining (MMEJ) [17,18]. The integrated viral DNA leads to viral persistence, pathogenesis, and carcinogenesis [17,18]. For HIV-1, upon entering the immune cell, the RNA genome of the virion undergoes reverse transcription, resulting in a dsDNA (provirus) [19]. The provirus then enters the nucleus, where the HIV-1 integrase enzyme inserts it into the host genome at random sites [19].
Viral hybrid-capture next-generation sequencing (hyb-cap NGS) is a widely used method for targeted or whole genome sequencing [20]. It also captures virus-host chimeric reads, which are useful for identifying genomic breakpoints in integration site analysis [21]. In 2022, a commercial hyb-cap NGS kit for HPV, HBV, and HIV-1 became available [22]. The anticipated benefits of pre-designed, virus-specific probes for targeted sequencing include eliminating the effort of probe design, ensuring quality assurance, and standardizing protocols. For post-sequencing analysis, our prior study demonstrated the efficiency of the Viral Hybrid-Capture (VHC) and Viral Integration Site (VIS) workflows within CLC Microbial Genomics Module (CLC MGM) for HPV integration site analysis [21]. Our current study builds upon this work. We evaluated a commercial viral hyb-cap NGS kit and the VHC/VIS workflows for HPV, HBV, and HIV-1 integration analysis. Customized genomic databases were constructed to cover HBV and HIV-1. The results confirmed the effectiveness of a comprehensive laboratory pipeline that identifies viral integration sites and their impact on host genes. This process enhances our understanding of cancer development and facilitates the identification of diagnostic, prognostic, and therapeutic markers.
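To illustrate how a virus-host chimeric read can mark a genomic breakpoint, the following is a minimal Python sketch. It is not the algorithm used by the VIS workflow, which is not described here. It assumes a read has already been aligned to the host genome and is summarized by a SAM-style 1-based start position and CIGAR string; the function names, the 20-base clipping threshold, and the coordinates in the usage note are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '60M40S' into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def host_breakpoint(pos, cigar, min_clip=20):
    """Estimate the host breakpoint implied by a soft-clipped alignment.

    pos      -- 1-based leftmost aligned host position (SAM convention)
    cigar    -- CIGAR string of the host alignment
    min_clip -- hypothetical minimum soft-clip length treated as evidence
                that the rest of the read maps elsewhere (e.g., to a virus)

    Returns (position, side) or None if the read is not clipped enough.
    """
    ops = parse_cigar(cigar)
    if not ops:
        return None
    # Bases of the reference spanned by the aligned part of the read.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    left_clip = ops[0][0] if ops[0][1] == "S" else 0
    right_clip = ops[-1][0] if ops[-1][1] == "S" else 0
    if max(left_clip, right_clip) < min_clip:
        return None
    if right_clip >= left_clip:
        # Clipped on the right: the junction is the last aligned host base.
        return (pos + ref_len - 1, "right")
    # Clipped on the left: the junction is the first aligned host base.
    return (pos, "left")
```

For example, `host_breakpoint(1000, "60M40S")` returns `(1059, "right")`, whereas a fully aligned read such as `host_breakpoint(200, "100M")` returns `None`. In practice, the clipped segment would also need to align to the viral reference before the read is counted as chimeric.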
4. Discussion
In this study, we developed and tested an end-to-end workflow from NGS to mapping virus-host integration sites. Eleven samples were tested: nine established cancer cell lines, one synthetic HIV-1 plasmid, and one publicly available HIV-1 dataset. Overall, the pre-designed hybrid capture probes of the QIAseq xHYB Viral STI Panel performed well in terms of sequence quality and quantity, as well as breadth and depth of viral genome coverage.
The creation of curated genomic databases played a vital role in the analytical workflows for both HBV and HIV-1. Unlike HPV, which benefits from an organized and curated genomic database, namely the Papillomavirus Episteme (PaVE), previously customized for use within CLC MGM [79], both HBV and HIV-1 necessitated considerable manual curation to build databases structured similarly to our prototypical HPV database [79]. For HBV, an exhaustive literature search identified only one online database, HBVdb release 59 (https://hbvdb.lyon.inserm.fr/HBVdb/HBVdbIndex), with 106,100 entries and tools for FASTA sequence annotation, genotyping, and drug resistance profiling [80]. Given that HBVdb was not designed for NGS data analysis, we developed our own database sourced from NCBI Virus [29]. The goal was to create a comprehensive and representative database with the following features: (1) sufficient resolution at both the genotype and subtype levels, (2) adequate genomic diversity without becoming computationally burdensome, and (3) compatibility with CLC MGM for NGS analysis. For HIV-1, the Los Alamos National Laboratory (LANL) HIV-1 database served as the foundational resource, providing accession numbers for a comprehensive range of well-characterized HIV-1 reference genomes. However, the GenBank taxonomic information associated with each viral genome needed revision to incorporate genotype and subtype nomenclature. This modification was essential to enable precise subtyping for taxonomic profiling and variant analysis. After constructing the databases and aligning the genomes, we visualized the phylogenetic distances and relationships. The resulting trees allowed us to examine genotype and sub-lineage representation and to check the accuracy of the databases. Viral sub-lineage classification is of utmost importance in epidemiologic research, public health, and outbreak investigations. Recently, the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) held the 2024 Viral Sub-species Classification Workshop to address the complexity, enormity, and challenges of classifying and tracing viral evolution [81]. Establishing a clinically relevant, widely accepted terminology is crucial for clinical virology, especially in the context of diagnostics, vaccine research, and therapeutics [81].
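The taxonomy revision described above can be pictured as attaching a genotype and subtype to each accession. The Python sketch below is only an illustration under stated assumptions: the semicolon-delimited lineage string is a generic convention chosen for this example and is not necessarily the format that CLC MGM expects, and the accessions, genotype labels, and function names are hypothetical placeholders rather than entries from our databases.

```python
def build_lineage(virus, genotype, subtype=None):
    """Join virus, genotype, and optional subtype into one lineage string.

    The '; ' delimiter is an assumption made for this sketch.
    """
    ranks = [virus, genotype]
    if subtype:
        ranks.append(subtype)
    return "; ".join(ranks)


def relabel(accessions, assignments, virus):
    """Attach a lineage string to every accession with a known assignment.

    accessions  -- iterable of accession identifiers
    assignments -- dict mapping accession -> (genotype, subtype or None)
    virus       -- top-level name shared by all records

    Returns (labeled, missing): a dict of accession -> lineage, and a list
    of accessions that still need manual curation.
    """
    labeled, missing = {}, []
    for acc in accessions:
        if acc in assignments:
            genotype, subtype = assignments[acc]
            labeled[acc] = build_lineage(virus, genotype, subtype)
        else:
            missing.append(acc)
    return labeled, missing


# Hypothetical placeholder records, not real database entries.
example_assignments = {
    "ACC0001": ("Genotype A", "A2"),
    "ACC0002": ("Genotype C", None),
}
labeled, missing = relabel(
    ["ACC0001", "ACC0002", "ACC0003"], example_assignments, "Hepatitis B virus"
)
```

Keeping the unassigned accessions in a separate list makes it explicit which genomes still require manual review before they enter the database.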
The utility of all three databases was demonstrated by using our deep-sequenced samples. By integrating curated viral and human genome databases into CLC MGM workflows, we streamlined the processing of hybrid-capture NGS data. This involved inputting FASTQ files, selecting reference genomes, and configuring necessary parameters. Efficient and rapid taxonomic classification and visualization of viral metagenomes allowed us to uncover compositional differences between samples. The VHC mapping, along with its track list and zoomable visualization, facilitated easy inspection of mapped regions, variants, and low-coverage areas at both the nucleotide and amino acid levels. Remarkably, the median processing time for the VHC workflow was only 17 minutes per sample using a laptop computer. Furthermore, the HPV consensus sequences obtained from the VHC workflow proved valuable in revealing HPV sub-lineages and elucidating evolutionary relationships between samples. The VIS workflow efficiently processed NGS data, achieving a median runtime of 79 minutes per sample. The autogenerated VIS outputs featured tracks with viral and host breakpoint annotations, a zoomable and rotatable circular plot, and a summary report highlighting disrupted and surrounding genes. The tabulated report facilitated review and identification of pathogenic genetic alterations.
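Workflow runtime can be related to sequencing depth with a linear-log regression of the kind summarized in Figure 8, in which runtime is modeled as a linear function of log10(merged reads). The Python sketch below shows an ordinary least-squares fit of that form. The function names are hypothetical, and the numbers in the usage note are synthetic values chosen for easy checking; they are not measurements from this study.

```python
import math


def fit_linear_log(read_counts, runtimes_min):
    """Fit runtime = intercept + slope * log10(reads) by least squares.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(n) for n in read_counts]
    ys = list(runtimes_min)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def predict_runtime(n_reads, slope, intercept):
    """Estimate runtime (minutes) for a given number of merged reads."""
    return intercept + slope * math.log10(n_reads)
```

With the synthetic inputs `fit_linear_log([1e5, 1e6, 1e7], [10, 20, 30])`, the fit yields a slope of about 10 minutes per tenfold increase in reads and an intercept of about -40. Real data would scatter around the line, giving an R² below 1.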
The primary locations of viral-host integration identified herein for the SiHa, HeLa, CaSki, SCC154, and ACH-2 cell lines were consistent with prior investigations, although some differences in minor integration sites were noted [24,42,44,45,48,82,83,84,85,86,87,88,89,90,91]. The discrepancies may be attributed to differences in sequencing methods, software platforms, parameters, and cut-off definitions. Additionally, the identification of disrupted and nearby genes depended on user-defined search parameters (e.g., choosing between 100 kb and 500 kb for nearby gene distance), which can either restrict or expand the results. Studies specifically related to viral integration in the DoTc2, 2A3, Hep 3B2, and SNU-182 cell lines were not found in PubMed or in the online databases VISDB and VIS Atlas [88,89,90,91]. Previously, DoTc2 cells were identified as HPV-negative by ATCC [23]. However, a recent study by Vuckovic et al. and subsequent retesting by ATCC using a novel set of primers confirmed HPV-16 integration [23,92]. Our findings corroborated HPV-16 integration in DoTc2, and VHC analysis revealed disrupted segments of the HPV L1 and L2 genes as the cause of the false-negative PCR results. Hence, the findings generated by this study will serve as a valuable reference for future investigations.
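The effect of the nearby-gene distance parameter can be made concrete with a short Python sketch. It is a simplified illustration, not the VIS workflow's implementation: a gene is called disrupted when the breakpoint falls inside it and nearby when it lies within a chosen window. The MYC coordinates are those cited in this article (chr. 8: 127,735,434-127,742,951); the breakpoint positions, class, and function names are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance_to_gene(chrom, pos, gene):
    """Distance in bases from a breakpoint to a gene (0 if inside it).

    Returns None when the gene is on a different chromosome.
    """
    if chrom != gene.chrom:
        return None
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def classify_genes(chrom, pos, genes, window):
    """Split genes into those disrupted by, or within `window` of, a breakpoint."""
    disrupted, nearby = [], []
    for gene in genes:
        d = distance_to_gene(chrom, pos, gene)
        if d is None:
            continue
        if d == 0:
            disrupted.append(gene.name)
        elif d <= window:
            nearby.append(gene.name)
    return disrupted, nearby


# MYC coordinates as cited in the text; the breakpoint below is hypothetical.
MYC = Gene("MYC", "chr8", 127_735_434, 127_742_951)
breakpoint_pos = 127_300_000  # 435,434 bases from the start of MYC
```

With this hypothetical breakpoint, `classify_genes("chr8", breakpoint_pos, [MYC], 100_000)` reports no nearby genes, whereas a 500,000-base window returns MYC as nearby. This shows how the choice between 100 kb and 500 kb can restrict or expand the reported gene list.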
Exploring the functions and interactions of genes adjacent to the viral integration sites provides valuable insights into their roles in carcinogenesis. For instance, in SiHa cells with HPV-16 integration, the gene LINC00393 on chromosome 13q22.1 has been implicated in altering 3D chromatin structure, leading to downregulation of the tumor suppressor gene KLF12 [44]. The HPV-18 DNA fragments in HeLa cells were detected approximately 500 kb upstream of the MYC proto-oncogene (located at chr. 8: 127,735,434-127,742,951). MYC is the human homolog of the oncogene (v-myc) carried by an avian retrovirus that is associated with myelocytomatosis and other neoplasms [93]. A long-range chromatin interaction between HPV-18 fragments, the MYC gene, and the cytoband 8q24.21 has been demonstrated to constitutively activate the MYC gene, leading to cell proliferation and tumorigenesis [45]. Notably, the 8q24.21 region is recognized as a hotspot for genetic mutations associated with various cancer types [46,47]. In a recent review focusing on gastric cancer (GC), genetic alterations within the 8q24 cytoband and its sub-bands (8q24.3, 8q24.11-13, 8q24.21, and 8q24.22) were explored [47]. Among the genes frequently associated with GC within the 8q24 region are NSMCE2, PCAT1, CASC19, CASC8, CCAT2, PRNCR1, POU5F1B, PSCA, JRK, MYC, PVT1, and PTK2 [47]. The presence of similar genetic alterations across different cancer types suggests a common mechanism of oncogenesis. In our study, cytoband 21p11.2 emerged as another frequent and significant site. Bi et al. reported that 21p11.2 was the most frequently integrated region in cervical squamous cell carcinoma [48]. Among the affected downstream genes, RNA5-8SN1 to N3, which encode 45S ribosomal RNA promoters, could potentially be exploited for expressing viral oncoproteins [48]. Additionally, the downstream gene MIR3648-1 (located at chr. 21: 8,208,473-8,208,652) may play a role as a tumor-suppressive miRNA within the MIR-3648/FRAT1-FRAT2/MYC negative feedback loop [50]. In SNU-182 cells, HBV DNA fragments were integrated at HAS2-AS1 downstream of the HAS2 gene on cytoband 8q24.13 (located at chr. 8: 118,300,001–121,500,000). A pan-cancer analysis, including HCCA, revealed that significant downregulation of HAS2 contributes to cancer progression and metastasis [73,74]. In ACH-2 cells, we identified the integration of the HIV-1 provirus at the NT5C3A gene on cytoband 7p14.3 (located at chr. 7: 33,014,113-33,062,776). The NT5C3A-encoded enzyme, pyrimidine 5′ nucleotidase, catalyzes the dephosphorylation of pyrimidine 5′ monophosphates, including the antiretroviral AZT monophosphate (AZT-MP) used in HIV/AIDS treatment [75]. The combined evidence highlights the substantial impact of virally integrated sites and neighboring genes on carcinogenesis or cellular function modification.
This study has several strengths. First, we demonstrated the benefit of using an off-the-shelf, pre-designed hyb-cap NGS kit for detecting HPV, HBV, and HIV-1. Unlike custom probe design, which typically demands expert knowledge of the target virus, molecular biology, and bioinformatics, the pre-designed kit circumvented those exacting, time-consuming requirements. Furthermore, this is the first assessment of result quality achieved using the QIAseq xHYB Viral STI Panel, providing an important benchmark for future comparisons. Second, the VHC and VIS workflows, equipped with embedded customized viral databases, efficiently localized viral-host integration sites and identified disrupted human genes. This end-to-end workflow serves to facilitate translational research and enhance our understanding of viral oncogenesis.
We acknowledge the limitations of our study, which focused on cancer cell lines and a single synthetic plasmid for performance testing. To extend our investigation, we intend to broaden our testing to include clinical samples and datasets. Hybrid-capture NGS technology is versatile and can be applied to diverse starting materials, e.g., genomic RNA or DNA extracted from cells or from fresh or formalin-fixed paraffin-embedded (FFPE) tissues [22]. Additionally, NGS analysis of cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA) holds great promise for mutational profiling [94]. Sastre-Garau et al. demonstrated that hybrid capture NGS of liquid biopsies (using a standard 10 mL blood sample) from patients with carcinoma of the cervix, oropharynx, oral cavity, anus, and vulva enabled molecular characterization of HPV DNA and identification of host insertion sites [95]. Similarly, hybrid capture NGS successfully detected cell-free, virus-host chimeric DNA in liquid biopsies obtained from patients with HCCA [96]. NGS hybrid-capture probes have also been developed to target all HIV-1 groups (M, N, O, and P) and HIV-2 groups (A and B) for monitoring sequence diversity and tracking viral evolution [97]. In the future, we plan to implement our streamlined approach for detecting viral DNA/RNA fragments in liquid biopsies. This promising, non-invasive test has the potential to assess therapeutic response and detect residual or recurrent disease in virally induced cancers.
Abbreviations
AdenoCA, adenocarcinoma; AN, accession number; ANI, Average Nucleotide Identity; AP, Alignment Percentage; BLAST, Basic Local Alignment Search Tool; BV-BRC, Bacterial and Viral Bioinformatics Resource Center; cccDNA, covalently closed circular DNA; CCR5, C-C chemokine receptor type 5; CDS, coding DNA sequence; cfDNA, cell-free DNA; CFS, Common Fragile Site; chr, chromosome; CLC MGM, CLC Microbial Genomics Module; ctDNA, circulating tumor DNA; CXCR4, C-X-C chemokine receptor type 4; DDR, DNA damage repair; dsDNA, double-stranded DNA; dslDNA, double-stranded linear DNA; ECM, extracellular matrix; gDNA, genomic DNA; GFR, growth factor receptor; HBV, Hepatitis B virus; HCCA, hepatocellular carcinoma; HIV-1, Human immunodeficiency virus type 1; HPV, Human papillomavirus; HSPG, heparan sulfate proteoglycan; hyb-cap NGS, hybrid-capture next-generation sequencing; IARC, International Agency for Research on Cancer; ICTV, International Committee on Taxonomy of Viruses; ISCN, International System for Human Cytogenetic Nomenclature; LANL, Los Alamos National Laboratory; MMEJ, micro-homology mediated end-joining; NGS, next-generation sequencing; NHEJ, non-homologous end joining; NJ, Neighbor Joining; NTCP, Na+-taurocholate co-transporting polypeptide; PWC, pairwise comparison; rcDNA, relaxed circular DNA; RSV, Rous sarcoma virus; SCCA, squamous cell carcinoma; SRA, Sequence Read Archive; VHC, Viral Hybrid Capture; VIS, Viral Integration Site; WGA, whole genome alignment
Figure 1.
Viral carcinogenesis, viral entry, and viral-host genome integration. Hallmarks of viral carcinogenesis from integration to malignant transformation for human oncoviruses, i.e., HPV, HBV, and HIV-1* (accelerates oncovirus-mediated carcinogenesis). Viral entry into mammalian cells is host and surface receptor specific [4,11,12]. The modes of viral-host integration differ among HPV, HBV, and HIV-1 and include, respectively, faulty viral partitioning to daughter cells during mitosis, non-homologous end joining (NHEJ) or micro-homology mediated end-joining (MMEJ) at double-strand DNA breaks, and provirus insertion [13,14,15,16,17,18,19]. cccDNA, covalently closed circular DNA; CCR5, C-C chemokine receptor type 5; CXCR4, C-X-C chemokine receptor type 4; DDR, DNA damage repair; dslDNA, double-stranded linear DNA; ECM, extracellular matrix; GFR, growth factor receptor; HSPG, heparan sulfate proteoglycan; NTCP, Na+-taurocholate co-transporting polypeptide; rcDNA, relaxed circular DNA (figure created with BioRender.com).
Figure 2.
Bioinformatics methods. (A) CLC Microbial Genomics Module, databases, and dataset used for Whole Genome Alignment (WGA), Viral Hybrid Capture (VHC) data analysis, and Viral Integration Site (VIS) analysis. Primary workflows and tools used for this study are designated by the virus icon; (B) WGA workflow steps (1-4) with user-defined parameter settings for WGA and annotation, e.g., HBV RefSeq (*) genome; (C) The Create Average Nucleotide Identity Comparison workflow takes the WGA file as input to quantify the similarity between genomes and outputs a pairwise comparison matrix; (D) VHC workflow steps (1-11) with user-defined parameter settings for Taxonomic Profiling (*), e.g., HBV reference index and Host genome index; (E) VIS workflow steps (1-4) with selected HBV (*) and Host reference genome databases and user-defined search parameters entered for this study.
Figure 3.
Circular phylograms of HBV and HIV-1 genomes. (A) Phylogram of aligned HBV whole genomes (n = 268) clustered into 10 genotypes (A-J). The genotypes (clades) and sub-types (nodes) reveal the relatedness of their member samples. Two genomes, NC_003977 (HBV RefSeq) and AB267090, carried the conventional classification by serology (adw, adr, ayw, and ayr) based on HBV surface antigen (HBsAg) reactivity. All genomes clustered according to their assigned genotype except for four accessions found in genotypes B and C (pink/blue discordancy in figure); (B) Phylogram of aligned HIV-1 whole genomes (n = 53) clustered into 4 groups (M-P). All genomes clustered according to their assigned group (clade) and sub-type (nodes). The outermost ring (label) displays the NCBI accession number and release date (year).
Figure 4.
Viral Hybrid Capture (VHC) analysis. (A) Relative abundance of HPV, HBV, and HIV-1 genotypes found in individual samples (n = 11) after taxonomic profiling is shown as stacked bars. For HPV-positive cell lines (S01-S07), three HPV genotypes were identified in the cohort. For S08 through S11, the HBV and HIV-1 genotypes/groups and sub-lineages were determined by taxonomic profiling (see legend); (B) For HPV-16 and -18 positive cervical and oral samples, the sub-lineages (alphanumeric code in the label) were determined by BLAST against the HPV variant reference database. The sub-lineages were genetically distinct, as shown by the divergent branches in the phylogenetic tree. Cx, cervix; Phx, pharynx; Tng, tongue.
Figure 5.
Viral Hybrid Capture (VHC) track list view. (A-C) Representative VHC track lists for HeLa, 3B2 and ACH-2 cell lines containing HPV, HBV, and HIV-1, respectively. The displayed tracks include (top to bottom): best match sequence, read mapping against the viral reference genome, coding sequence (CDS) track, low coverage areas, and annotated variant track. Low coverage regions correspond to viral genomic gaps (breaks) for the individual viruses. The read mapping track was truncated due to its extensive length.
Figure 6.
Viral-host Integration Site (VIS) Analysis of HeLa cells. (A) VIS circular plots in chromosome view (left) and gene view (right) revealed 2 integration sites at chromosomal cytobands 8q24.21 and 21p11.2. The dynamic functions of the VIS circular plot (i.e., genome rotation and zoom) facilitated rapid inspection of the integration sites; (B) Read mappings to HPV-18 and chromosome 8 at viral or host breakpoints (vertical brown bars with genomic coordinates) reveal both forward (green) and reverse (red) viral-host chimeric reads (bolded HPV/Host). The unaligned chimeric segment of a read sequence stands out with its subdued color. Read mappings were truncated due to their extensive length.
Figure 7.
Viral Integration Site (VIS) circular plots. Collage of VIS circular plots for samples (S01 to S11) in chromosome view. Virus-host integration linkages, manifested as chimeric reads, are designated by the bi-directional curvilinear lines. As expected, the C-33A cervical cancer cell line (p53+ and pRB+) and the synthetic HIV-1 plasmid lacking viral incorporation did not show any viral-host integration. In contrast, the HIV-1 infected ACH-2 cell line had a single integration site, while all other virally transformed cell lines exhibited multiple integration sites (anatomical icons created with BioRender.com).
Figure 8.
Correlation between NGS Reads and VHC/VIS Workflow Runtimes. (A) The sequencing file sizes of the 11 samples ranged broadly between 93 and 2,535 MB with a median of 571.6 MB; (B) The file size (log2) correlated strongly with the number of merged sequences (log10) (R² = 0.90); (C, D) The number of merged reads (log10) correlated positively with VHC and VIS workflow runtimes in a linear-log relationship. The correlation was moderate for VHC (R² = 0.53) and stronger for VIS (R² = 0.81). The regression equations are useful for estimating workflow runtimes from the number of merged reads. Log transformation was used to compress the wide range of X- or Y-values, making them suitable for linear modeling.
Table 1.
Cell line and construct information.
Sample No. | Cell line or construct1 | Age | Genome ancestry | Virus-genotype | Tumor site | Histology
S01 | SiHa | 55 | NE Asian (Japanese) | HPV-16 | Cervix | SCCA
S02 | HeLa | 30 | African (American) | HPV-18 | Cervix | AdenoCA
S03 | CaSki | 40 | N European | HPV-16 | Cervix (met)5 | SCCA
S04 | C-33A | 66 | N European | p53+, pRB+ | Cervix | SCCA
S05 | DoTc2 | NS | N European | HPV-16 | Cervix | NS
S06 | 2A3 | 56 | SE Asian (Indian) | HPV-164 | Hypopharynx | SCCA
S07 | SCC154 | 54 | Caucasian | HPV-164 | Tongue | SCCA
S08 | 3B2.1-7 | 8 | African (American) | HBV-A24 | Liver | HCCA
S09 | SNU-182 | 24 | NE Asian (Korean) | HBV-C4 | Liver | HCCA
S10 | Syn HIV-12 | NA | NA | HIV-1 M4 | NA | NA
S11 | ACH-23 | 3 | Caucasian | HIV-1 M4 | Blood | T-cell
Table 2.
Cell lines, viral integration sites, and disrupted host genes.
VIS | SiHa (Cervix) | HeLa (Cervix) | CaSki (Cervix) | DoTc2 (Cervix) | 2A3 (Oropharynx) | SCC154 (Oropharynx) | 3B2 (Liver) | SNU-182 (Liver) | ACH-2 (T-cell)
1 | 13q22.11,2,3 LINC003934 KLF125 | 8q24.212,3 PCAT14 CASC194 MYC5 | 2p11.2 LINC01830 PRR30 | 2p22.3 LINC004864 | 12q24.331 FBRSL1 | 6p12.2 7SK4 | 4q13.36 | 1q25.11 TNR | 7p14.3 NT5C3A FKBP95 RP95
2 | 21p11.22 MIR3648-15 RNA5-8SNx | 21p11.22 MIR3648-15 RNA5-8SNx | 3q236 | 3p25.11 ANKRD284 RN7SL4P4 | 16p13.3 METTL26 | 8q24.31,2 HGH14 | 13q31.35 | 2q34 UNC804 |
3 | | | 6p21.16 | 4p15.311 IGFBP75 | 20q11.231 RPN24 | 14q21.3 RN7SL24 | 16q11.26 | 4q31.35 |
4 | | | 10p146 | 6p21.32 ZBTB22 | 21p11.22 MIR3648-15 RNA5-8SNx | 21p11.22 MIR3648-15 RNA5-8SNx | 21p11.22 MIR3648-15 RNA5-8SNx | 8q24.131,2 HAS2-AS14 |
5 | | | 11p15.46 | 7q21.3 GNGT14 | 22q11.22 PPIL24 | 21q21.17 | Yq126 | 17p11.16 |
6 | | | 14q21.3 MDGA24 | 8p11.21 PLAT4 HGH14 | | | | 21p11.22 MIR3648-15 RNA5-8SNx |
7 | | | 19q13.421 BRSK14 | 9p23 PTPRD4 RN7SL5P RMRP4 | | | | |
8 | | | 20p11.16 | 10q26.111 TUBGCP24 RGS104 | | | | |
9 | | | Xq27.31, 6 | 14q21.3 RPPH14 RN7SL24 RPS294 RN7SL14 | | | | |
10 | | | | 16p13.3 METTL26 | | | | |
11 | | | | 21p11.22 MIR3648-15 RNA5-8SNx | | | | |
12 | | | | 22q11.22 PPIL24 | | | | |
13 | | | | Xq21.2 DACH24 | | | | |