2. Methods
Because HIV uses the host’s NF-κB signaling pathway to activate viral transcription [3], the author designed the following experiment. First, the author prepared several T cells and HIV-1 RNAs. The HIV genome contains at least nine genes, including Gag, Pol and Env [4]. The IGF1R gene, located on human chromosome 15, contains at least 21 exons, such as ENSE00003838363 and ENSE00001316091 [5]. Mathematical models can help explain biological phenomena. In mathematics, a genome can be defined as a set by listing its elements between curly brackets, separated by commas:
where
denotes the set of HIV genomes and
represents the set of
IGF1R genes. The IGF1R gene is one of the known target genes of androgen receptor activation [6]. Hence, the process of transcription can be written in the following form:
where the domain of
is the set of genes that RNA polymerase II will transcribe, and
represents the androgen. The CRISPR‒Cas9 enzyme [7, 8] is then used to copy the enhancer of the IGF1R gene into the promoter-proximal region of HIV-1 RNAs. Subsequently, T cells are infected with the modified virus, and the form can then be rewritten as follows:
Androgens are then added to the T cells. After the IGF1R gene is transcribed by RNA polymerase II [9], HIV will also 'wake up' [10]. Python is one of the most popular programming languages [11] and can be used to write scripts that check the accuracy of mathematical formulas:
The set represents the HIV genome, and the set represents the IGF1R gene. The string is then defined to represent androgen, whereas represents the function of RNA polymerase II, which returns the IGF1R gene and HIV genome applied to the string . The result of is then printed to verify whether the virus was activated. As a result, the Python program returns True, which indicates that dormant HIV infection is reawakened.
Does this mean that androgen specifically reawakens sleeping HIV? No: androgen reawakens both the IGF1R gene and the HIV genome, not the retrovirus alone. Indeed, when the Python program instead tests whether the returned set is equal to the HIV set, it returns False.
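The two checks discussed in this passage (whether the HIV set is contained in the set returned by transcription, and whether it is equal to that set) can be sketched in Python as follows. This is a minimal illustration, not the author's original script: the names `HIV`, `IGF1R`, `ANDROGEN` and `transcribe` are hypothetical, and the set contents are only the example genes and exons named earlier in this section.

```python
# Hypothetical toy model; names and set contents are illustrative assumptions.
HIV = {"Gag", "Pol", "Env"}                      # three of the HIV genes named in the text
IGF1R = {"ENSE00003838363", "ENSE00001316091"}   # two of the IGF1R exons named in the text

ANDROGEN = "androgen"


def transcribe(signal):
    """Toy RNA polymerase II: after the enhancer copy, androgen activates both sets."""
    if signal == ANDROGEN:
        return IGF1R | HIV
    return set()


result = transcribe(ANDROGEN)

# Subset test: is every HIV gene among the transcribed elements?
hiv_included = HIV <= result
# Equality test: is HIV the only thing that was transcribed?
hiv_only = HIV == result

print(hiv_included)  # True
print(hiv_only)      # False
```

The subset operator `<=` answers "was HIV activated?", whereas `==` answers "was only HIV activated?", which is the distinction the passage draws.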
The collection of elements returned by the method includes the HIV set, which does not mean that the two sets are equal. Mathematically, their relationship can be expressed as follows:
The author does not actually need to copy the enhancer of the IGF1R gene into the virus, because related studies have shown that the promoter-proximal (enhancer) region of the HIV-1 long terminal repeat already contains two adjacent NF-κB binding sites that play a central role in mediating inducible HIV-1 gene expression [3, 12, 13].
Several studies claim that AZD5582 can reawaken sleeping HIV and SIV, but the effectiveness rate was found to be only 42% [14, 15, 16]. Most importantly, the novel small-molecule IAP inhibitor AZD5582 has been used for the treatment of cancer and reportedly causes cIAP1 degradation and thus induces apoptosis in the MDA-MB-231 breast cancer cell line at subnanomolar concentrations in vitro [17].
Latency-reversing drugs act on the transcription of cancer genes, and the virus genome reawakened by these drugs contains NF-κB binding sites; thus, the drugs do not directly reawaken dormant retrovirus infection. Because this reasoning may not be easy to follow, the author provides an analogy:
Consider a hospital (the nucleus) with three newborn babies: Adam (the HIV genome), Bob (the IGF1R gene), and Claire (a cancer gene). To ensure that no baby is carried away by the wrong parents, each baby and its parents are given wristbands bearing the same name (the enhancer region). Adam is a naughty boy who secretly made a copy of Claire’s wristband (the NF-κB binding sites) and put it on his own wrist. Because the nurses (RNA polymerase II) cannot tell the babies apart by appearance, they can identify them only by their wristbands. When Claire’s parents (NF-κB) wanted to take their child home, they handed their wristband to the nurses and asked them to find their child. Because both Adam and Claire wore wristbands bearing Claire’s name, both babies were handed to Claire’s parents.
In terms of this analogy, previous studies were in effect attempting to use Claire’s wristband (the NF-κB binding sites) to find Adam (the HIV genome), which is clearly inappropriate. More importantly, the mutation rate of HIV-1 is extremely high [18]; if Claire’s name on the wristband mutates to Clara or Clark (another cancer gene), the drugs that target Claire will have no effect on the mutated virus.
In addition to AZD5582, many studies claim that latency-reversing drugs, including ciapavir [19], bryostatin-1 [20], disulfiram [21], ingenol-B [22], and prostratin [23], can be used to reawaken sleeping HIV. These latency-reversing drugs have also been used for the treatment of cancer: disulfiram inhibits prostate cancer cell growth [24], bryostatin-1 exhibits potent antitumor activity in vitro and in vivo in human tumor xenografts [25], semisynthetic ingenol compounds show potent antitumor activity on all cancer cell lines evaluated [26], and prostratin exerts a potential anticancer effect through SIK3 inhibition [27].
A single latency-reversing drug will not work across patients infected with differently mutated viruses unless multiple drugs are used at the same time. However, because the mutation rate of HIV-1 is extremely high, scientists would have to constantly develop new drugs for new viruses. More importantly, the fact that the virus uses the NF-κB pathway to enhance its expression does not mean that the virus must have an NF-κB primer binding site. If the primer binding site mutates into the enhancer of another gene unrelated to NF-κB, latency-reversing drugs will have no effect on the patient. The author thus believes that instead of using the wristbands of Claire, Clara and Clark to find Adam indirectly, it would be better to use Adam’s own wristband to find Adam directly. In other words, viral proteins can be used to reawaken dormant retroviruses more reliably.
Virions specifically package viral RNA rather than IGF1R or cancer-gene RNA; thus, the virus can accurately identify its own RNA. Viral proteins must therefore carry information that identifies viral RNA, just as the androgen receptor activates the IGF1R gene. It is possible that a certain viral protein has a function similar to that of NF-κB or the androgen receptor and can be used to identify viral RNA directly.
It is well known that HIV recruits human uncharged tRNA to serve as the reverse transcription primer [28], and that tRNA serves as the physical link between the mRNA and the amino acid sequences of proteins [29]. The author therefore hypothesizes that uncharged tRNA serves as the physical link between the promoter and the protein receptors, which are recruited by RNA polymerase II. To determine which viral proteins match the primer binding site, a Python program was written to match all proteins of each virus against that virus's own gene sequences and to display the matches graphically.
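One way to read the matching step is sketched below, as an illustration only and not the author's program (which is available at the repository listed under data availability). The sketch assumes that "matching in the 3'-to-5' direction" means reading a 9-nucleotide window backwards and translating it with the standard genetic code; this assumption is consistent with the two windows and tripeptides reported in the Conclusions (GGGGGCTCG with ARG, GGCGCCCGA with SPR). The helper names `translate` and `read_3_to_5` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon; incomplete trailing codons are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def read_3_to_5(window):
    """Assumed matching rule: reverse the window (no complement), then translate."""
    return translate(window[::-1])


# The two 9-nt windows quoted in the Conclusions of this paper.
deltaretrovirus_window = "GGGGGCTCG"
lentivirus_window = "GGCGCCCGA"

print(read_3_to_5(deltaretrovirus_window))  # ARG
print(read_3_to_5(lentivirus_window))       # SPR
```

Under this reading, a protein "matches" a window when the translated tripeptide occurs in the protein's amino acid sequence; the position of that occurrence and the position of the window would then give the plotted coordinates.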
3. Results
Latent HIV can synthesize a 5'-3' RNA chain by transcribing the existing 3'-5' complementary DNA strand after cellular infection [30, 31]. In the plots, the x-axis represents the protein and the y-axis represents the primer binding site. Negative numbers indicate that the protein or tRNA may have rotated 180 degrees (which was not observed) or been bound in the 3'-to-5' direction (if both values are negative).
The author also expanded the analysis to include other retroviruses, covering Deltaretroviruses (HTLV and STLV) and Lentiviruses (HIV, SIV, and FIV). The sequence data were obtained from the GenBank database at the NCBI. The following sequences were used for the analysis: NC_001436 (HTLV-1) [32], NC_001488 (HTLV-2) [33], NC_000858 (STLV-1) [34], NC_001815 (STLV-2) [35], NC_001802 (HIV-1) [36], NC_001722 (HIV-2) [37], NC_001549 (SIV) [38], and NC_001482 (FIV) [39].
When the matching region is 2 amino acids long, there are too many candidate matches to confirm which protein matches the primer binding site. When it is 4 amino acids long, no match can be found. When it is 3 amino acids long, however, there is exactly one perfectly matching region. Different types of retroviruses are represented by different patterns and colors, and their sequences around the primer binding site are matched with their own proteins, as shown in Figure 1.
As shown in Figure 1, the coordinates of all 8 viruses appear together inside the red box and lie extremely close to one another, which indicates that they represent the same protein. Other locations contained either only Deltaretroviruses or only Lentiviruses, and the spacing between the differently colored coordinates was too large for them to represent the same protein; these locations were therefore excluded. Likewise, if the amino acid sequence of a protein mutated while its primer binding site remained the same, that protein was not considered the matching target.
In the GenBank database, the primer binding site of the HTLV-2 (NC_001488) genome is approximately nt 766 to 783, and that of the HIV-1 (NC_001802) genome is approximately nt 182 to 199. Their primer binding sites start with TGG and end with GGGA, and after aligning the sequences, their matching points can be found in the same position, as shown in Figure 2.
As shown in Figure 2, the sequences on the second line represent the 3'-5' complementary DNA strand that the Gag proteins match, and the arrow indicates the 3'-5' direction. The Gag proteins of the viruses match the same primer binding site, even though the viruses are highly divergent.
To determine whether this finding is a coincidence, the author analyzed the probability. Because viruses of the same type, Deltaretrovirus or Lentivirus, have the same primer binding site, one virus can be considered a mutation from another. The author used the HTLV-1 and HIV-1 genomes as templates and used Pairwise Sequence Alignment (EMBOSS Needle) to compare the genetic similarity of different viruses. The similarity of the NC_001488 (HTLV-2), NC_000858 (STLV-1) and NC_001815 (STLV-2) genomes to the NC_001436 (HTLV-1) genome was 59.2%, 89.3% and 61.0%, respectively. The similarity of the NC_001722 (HIV-2), NC_001549 (SIV) and NC_001482 (FIV) genomes to the NC_001802 (HIV-1) genome was 51.1%, 54.1% and 49.1%, respectively. The similarity of six viruses can be written as
The average probabilities that the amino acids A (GCT, GCC, GCA, GCG), R (CGT, CGC, CGA, CGG, AGA, AGG), G (GGT, GGC, GGA, GGG), S (TCT, TCC, TCA, TCG, AGT, AGC) and P (CCT, CCC, CCA, CCG) remain unchanged after a mutation are 3/63, 5/63, 3/63, 5/63, and 3/63, respectively. Thus, the average probabilities that the amino acid sequences ARG and SPR remain unchanged after a mutation are 11/189 and 13/189, respectively. Therefore, the average probability that the 3-amino-acid sequences of six viruses remain unchanged after a mutation can be represented by
as follows:
Assuming that each gene sequence has the same probability of mutation, the number of amino acid sequence mutations increases with increases in the diversity of the viruses. The probability that 3 amino acid sequences of different viruses match the same primer binding site is
The result shows that the probability is approximately 3.67636×10⁻⁹, which is extremely small; the match is therefore very unlikely to be a coincidence, which supports the conclusion that Gag proteins match the primer binding site.
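The per-amino-acid fractions quoted above (3/63, 5/63, and so on) and the tripeptide averages 11/189 and 13/189 can be checked with a short script. The sketch below assumes one reading of "remaining unchanged after a mutation": a codon is replaced by one of the 63 other codons chosen uniformly at random, so an amino acid encoded by n codons is preserved with probability (n - 1)/63. The function names are hypothetical, and the sketch deliberately stops short of the final probability, whose formula is not reproduced here.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def unchanged_fraction(amino_acid):
    """Assumed model: (number of synonymous codons - 1) out of the 63 other codons."""
    n_codons = sum(1 for aa in CODON_TABLE.values() if aa == amino_acid)
    return Fraction(n_codons - 1, 63)


def motif_average(motif):
    """Average of the per-residue fractions over a short amino acid sequence."""
    return sum(unchanged_fraction(aa) for aa in motif) / len(motif)


for aa in "ARGSP":
    print(aa, unchanged_fraction(aa))

print("ARG", motif_average("ARG"))  # 11/189
print("SPR", motif_average("SPR"))  # 13/189
```

Note that `Fraction` prints values in lowest terms, so 3/63 is displayed as 1/21; the values are equal to those quoted in the text.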
Related studies showed that the genomic Gag/Gag-Pol complex recruits the LysRS/tRNA complex [40], that the selective packaging of the tRNA primer requires HIV-1 Gag and Gag-Pol [41], and that an interaction between LysRS and Gag is observed in vitro [42]. Since HIV-1 initiates reverse transcription by using tRNA(Lys) to bind to the genomic RNA at the primer binding site [43], these findings support the conclusion that the Gag protein matches the primer binding site.
In HIV-1, the Gag/LysRS interaction depends on Gag sequences within the C-terminal domain (CTD) of CA around amino acids 283-363 [44] and motif 1 of LysRS around amino acids 208-259 [42]. It should be noted that the amino acid sequence SPR of the Gag protein is located at amino acids 148-150 within the N-terminal domain (NTD) of CA, specifically at the NTD-NTD interface 1.
5. Conclusions
Latency-reversing drugs are involved in the transcription of cancer genes, and the virus genomes that they reawaken contain the same NF-κB primer binding sites; thus, the drugs do not directly reawaken dormant HIV infection. The amino acid sequence ARG of the Gag proteins of HTLV-1, HTLV-2, STLV-1 and STLV-2 matches their primer binding site GGGGGCTCG in the 3'-to-5' direction, and the amino acid sequence SPR of the Gag proteins of HIV-1, HIV-2, SIV and FIV matches their primer binding site GGCGCCCGA in the 3'-to-5' direction. Related studies showed that the genomic Gag/Gag-Pol complex recruits the LysRS/tRNA complex, that the selective packaging of the tRNA primer requires HIV-1 Gag and Gag-Pol, and that an interaction between LysRS and Gag is observed in vitro. In contrast to latency-reversing drugs, Gag proteins can more reliably be used to directly reawaken dormant HIV, which recruits human uncharged tRNA to serve as the reverse transcription primer.
Author Contributions: S.C. wrote the manuscript.
Funding: None.
Ethics approval and consent to participate: Not applicable.
Consent to publish: The author gives consent for the publication of identifiable details, which can include photographs and details within the text to be published in the above Journal and Article.
Availability of data and materials: The datasets were produced with Python 3, and the tool is available at https://github.com/rheast/genome. Pairwise Sequence Alignment (EMBOSS Needle) was used to identify regions of similarity between two biological sequences; the tool is available at https://www.ebi.ac.uk/Tools/psa/. Nucleotide sequences were downloaded from the NCBI database at https://www.ncbi.nlm.nih.gov/nuccore/. The sample sequences correspond to the accession numbers NC_001436, NC_001488, NC_000858, NC_001815, NC_001802, NC_001722, NC_001549 and NC_001482.
Acknowledgments: None.
Competing interests: There are no conflicts of interest.