1. Introduction
According to estimates from the Joint United Nations Programme on HIV/AIDS (UNAIDS), 39 million people are living with HIV (PLHIV) worldwide, and 29.58 million of them have access to antiretroviral therapy (ART). Of these, 2.2 million are from Latin America [1].
The human leukocyte antigen (HLA) is situated in a region spanning approximately 4 million base pairs on the short arm of human chromosome 6 (6p21.3). This region encompasses over 200 protein-coding genes [
2], with more than 35,000 alleles present in the HLA locus [
3,
4,
5]. HLA class I and class II genes play a central role in host adaptive immune responses to infectious pathogens. HLA class I genes are also involved in the host innate immune response. Class I genes encode proteins that facilitate the presentation of antigenic peptides from pathogens to CD8+ T cells, leading to the destruction of infected cells and pathogens, while HLA class II antigens contribute to the initiation of adaptive immune responses by presenting antigenic epitopes to CD4+ T cells. The remarkable polymorphism of HLA class I and class II genes contributes to the diverse host immune responses to many infectious pathogens [
6].
In recent years, advancements in next-generation sequencing (NGS) have brought significant improvements, allowing for rapid, high-throughput sequencing of genomes, exomes, and targeted gene panels, with the ability to process multiple samples simultaneously. Within this framework, whole exome sequencing (WES) can be adapted for various configurations, tailored to clinical or research needs, particularly for HLA allele determination [
7]. The incorporation of NGS, a sophisticated, high-resolution assay, has been a significant advancement in HLA typing and has markedly improved the performance and data collection of many HLA laboratories [
3,
8,
9].
Considering the potential impact of various HLA alleles on both HIV acquisition and pathogenesis, characterizing these alleles can provide deeper insight into the host genetics underlying differential immunological responses to infection and disease progression. Such insight is crucial for optimizing treatment and vaccine effectiveness, as well as for contextualizing the human host determinants that influence the epidemiology of HIV in distinct populations. Our aim was to identify HLA alleles associated with susceptibility or resistance to HIV in a Peruvian population with predominantly autochthonous South American ancestry, part of the 80% of the world population that is underrepresented in population genetics studies. This will contribute to the knowledge of alleles that could serve as markers in this population and in those of neighboring countries, as well as of their response to HIV.
2. Materials and Methods
2.1. Study Population and Design
We conducted a cross-sectional study involving 59 PLHIV from the Santa Rosa Hospital (Lima, Perú) and 46 HIV-uninfected individuals with high-risk behaviors from the NGO “MCC Voluntades Lima Norte” (Lima, Perú). The study participants were enrolled from December 2019 to October 2021.
Eligibility criteria were Peruvian nationality, age of 18 years or older, male or female sex, and provision of written informed consent. HIV-seropositive individuals were referred to medical care and assigned to the PLHIV group. The inclusion criteria for the HIV-uninfected group were a negative rapid HIV test, high-risk sexual behaviors, and perception of HIV risk. Transgender women who met the eligibility criteria were included in this group.
Data from patients (socio-demographic, clinical, immunological and laboratory results) were obtained from the clinical history of each patient and the CD4 T cell count results were provided by the National Reference Laboratory for STD/HIV-AIDS at the Instituto Nacional de Salud (INS) in Lima.
For the HIV-uninfected group, all individuals completed a brief questionnaire about their high-risk sexual behaviors regarding HIV transmission, such as their history of sexual contact with sex workers, diagnosis or treatment of a sexually-transmitted infection (STI), condom use, and having multiple sexual partners.
2.2. Biological Samples and DNA Extraction
Blood samples were collected in 3 mL ethylenediaminetetraacetic acid (EDTA) tubes from all the enrolled participants in both groups, and DNA was extracted using the salting-out procedure with minor modifications [
10]. Samples were then quantified and checked for purity using a NanoDrop spectrophotometer (Thermo Scientific).
2.3. Whole-Exome Sequencing (WES)
Extracted DNA was sent to two different service providers, who carried out library construction and WES: Macrogen (Korea) and Novogene (Cambridge, UK). DNA libraries were prepared using SureSelect XT Human All Exon V6 - bait library: S07604514 (Agilent Technologies, Santa Clara, CA, USA), and WES was performed using 150 bp paired-end reads and sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA) according to the manufacturer’s instructions.
2.4. Bioinformatics Pipeline for HLA Typing
We received the exome data and established an in-house bioinformatics pipeline for HLA typing. First, the WES FastQ files were preprocessed by fastp (version 0.23.2) [
11] and then mapped to the human reference genome GRCh37/hg19 using BWA software (version 0.7.17-r1188, bwa mem) [
12]. The filtered and mapped reads were sorted by coordinates using SAMtools (version 1.7) [
12], and possible PCR duplicate reads were marked by Picard (version 2.27.4, Broad Institute). The reads that matched the targeted chromosome 6 sequences were extracted from the bam file and mapped on a comprehensive reference panel from the IMGT/HLA database (v3.52.0) [
4]. In addition, we evaluated the germline variants (single-nucleotide polymorphisms [SNPs] and small insertions/deletions [indels]) in the HLA region.
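The preprocessing steps described above can be sketched as a list of command lines, one per tool. This is a minimal illustration, not the authors' actual pipeline script: the function name, file names, thread count, the `picard` wrapper command and the chromosome 6 contig name (`chr6` here; GRCh37 builds without the `chr` prefix would use `6`) are all assumptions made for the example.

```python
from pathlib import Path


def build_preprocessing_commands(sample, r1, r2, reference, outdir,
                                 threads=4, contig="chr6"):
    """Return the per-sample commands as argument lists (nothing is executed).

    Every path, the thread count and the contig name are hypothetical
    placeholders chosen for illustration.
    """
    out = Path(outdir)
    clean_r1 = str(out / f"{sample}_R1.clean.fastq.gz")
    clean_r2 = str(out / f"{sample}_R2.clean.fastq.gz")
    sam = str(out / f"{sample}.sam")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    marked_bam = str(out / f"{sample}.markdup.bam")
    metrics = str(out / f"{sample}.markdup.metrics.txt")
    chr6_bam = str(out / f"{sample}.{contig}.bam")

    return [
        # 1. Read trimming and quality filtering of the paired FastQ files.
        ["fastp", "-i", r1, "-I", r2, "-o", clean_r1, "-O", clean_r2],
        # 2. Alignment to the reference genome with bwa mem.
        ["bwa", "mem", "-t", str(threads), "-o", sam,
         reference, clean_r1, clean_r2],
        # 3. Coordinate sorting of the alignments.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 4. Marking of possible PCR duplicates (assumes a `picard` wrapper).
        ["picard", "MarkDuplicates",
         f"I={sorted_bam}", f"O={marked_bam}", f"M={metrics}"],
        # 5. Indexing, required before extracting a region.
        ["samtools", "index", marked_bam],
        # 6. Extraction of the reads mapped to chromosome 6.
        ["samtools", "view", "-b", "-o", chr6_bam, marked_bam, contig],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the run; building the commands separately from executing them keeps the sketch easy to inspect and test.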
We performed HLA typing using the reads extracted from chromosome 6 as input. We imputed HLA class I alleles for three loci (HLA-A, HLA-B and HLA-C) using OptiType software (version 1.3.3) [
13] and HLA class II alleles for four loci (HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1) were imputed with arcasHLA [
14]. The HLA class I and class II alleles were assigned at 2-field resolution. We set a minimum depth threshold of 20X for HLA alleles, and samples with a depth below 20X were excluded. After applying this filter, we included 56 cases and 44 controls.
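The two rules in this paragraph, assignment at 2-field resolution and exclusion of samples below 20X, can be illustrated with a short sketch. The helper names, the input dictionary and the sample identifiers are hypothetical; how depth was actually measured per sample is not specified here, so the example simply takes one depth value per sample as given.

```python
MIN_DEPTH = 20  # depth threshold (X) stated in the text


def to_two_field(allele):
    """Truncate an HLA allele name to two fields.

    Example: "HLA-A*02:01:01:01" -> "A*02:01". Any expression suffix on a
    longer name (such as a trailing "N") is dropped by the truncation.
    """
    name = allele[4:] if allele.startswith("HLA-") else allele
    gene, sep, fields = name.partition("*")
    parts = fields.split(":")
    if not sep or len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"cannot reduce {allele!r} to two fields")
    return f"{gene}*{parts[0]}:{parts[1]}"


def samples_passing_depth(depth_by_sample, min_depth=MIN_DEPTH):
    """Return the sorted sample IDs whose depth meets the threshold."""
    return sorted(s for s, d in depth_by_sample.items() if d >= min_depth)
```

For instance, `samples_passing_depth({"P01": 35.2, "P02": 12.0})` keeps only `"P01"`, mirroring the exclusion of low-depth samples before the association analysis.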
2.5. Statistical Analysis
Socio-demographic and clinical data were expressed as mean ± standard deviation. Comparisons between PLHIV and HIV-uninfected groups were performed using the chi-squared test for categorical/qualitative variables and the Mann–Whitney U test (or Student’s t-test) for quantitative variables. HLA allele frequencies and case–control association analyses were computed with PyHLA software [
15] using additive logistic regression models. Age, sex, and CD4 values were included as covariates in the association test.
In all analyses, a p-value below 0.05 was considered statistically significant, and 95% confidence intervals were used. R software (version 4.3.1) was used for statistical analysis and plot creation.
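To make the additive model concrete, the sketch below codes each individual by the number of copies (0, 1 or 2) of a given allele and fits a logistic regression of case status on that dosage by Newton-Raphson. It is a simplified illustration written from scratch, not PyHLA's implementation: it omits the covariates (age, sex and CD4 values) used in the study, and the function names are assumptions.

```python
import math


def allele_dosage(genotype, allele):
    """Additive coding: number of copies (0, 1 or 2) of `allele`."""
    return sum(1 for a in genotype if a == allele)


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    x: allele dosages, y: 1 for cases and 0 for controls.
    Returns (b0, b1); exp(b1) is the odds ratio per allele copy.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p               # gradient, intercept
            g1 += (yi - p) * xi        # gradient, slope
            h00 += w                   # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("dosage does not vary; model not identifiable")
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With a list of genotypes and case labels, `fit_logistic([allele_dosage(g, "C*07:01") for g in genotypes], labels)` returns the coefficients, and `math.exp(b1)` gives the unadjusted odds ratio per copy of the allele. Adding covariates would require a multivariable fit, which a statistics library handles more robustly than this two-parameter sketch.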
2.6. Hardware and Software Environment
All software was run according to the corresponding instructions on a Linux server (Ubuntu 18.04) with the following hardware configuration: Intel(R) Xeon(R) CPU E5-2697 v2 @ 3.50 GHz each with 24 physical CPU cores, and 94 GiB RAM installed. All computational resources were provided by the Research Center of Genetic and Molecular Biology – Universidad de San Martín de Porres.
2.7. Ethics Statement
The study received ethical approval from the “Comité de Ética en Investigación de la Universidad de San Martín de Porres” (IRB IORB00003251 OHRP/FDA). Likewise, this study was also approved by the Ethics Committees of the following institutes that participated in the study: the National Institute of Health of Peru and Santa Rosa Hospital. Written informed consent from patients was obtained locally at Santa Rosa Hospital and the NGO “MCC Voluntades Lima Norte”. An explanation session for each participant was held to clarify doubts about the procedure and to explain the results obtained.
3. Results
The study sample included 59 PLHIV (cases) and 46 HIV-uninfected (control) subjects. The relevant socio-demographic, clinical and ART treatment characteristics are provided in
Table 1; the distribution of these characteristics varied according to gender, age, and CD4 cell count between the case and control groups. The majority of participants were male (73.0%) and born in Lima (71.43%). The median age was 41 years in the case group and 36 years in the control group. At the time of enrollment, the median CD4 count was 634 cells/mm3 and 952.67 cells/mm3 in the PLHIV and HIV-uninfected groups, respectively.
Based on the inclusion criteria concerning HIV risk behaviors presented in the HIV-uninfected group, all participants reported having sex with sex workers. Of these participants, 26 (56.52%) self-identified as heterosexual, 14 (30.43%) as homosexual, and 6 (13.04%) as bisexual. Regarding sexual behavior, 76% of participants reported having had multiple sexual partnerships (5–15 sexual partners in the last 12 months).
High-resolution HLA typing was performed in all 105 enrolled subjects to yield at least 2-field resolution HLA alleles for HLA-A, HLA-B and HLA-C in class I, and HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1 in class II. The number of HLA alleles present across the seven HLA loci is provided in
Table 2. A total of 1394 2-field HLA alleles were observed across the seven loci in the study population. For HLA class I, 200, 200, and 198 alleles were counted in the HLA-A, HLA-B and HLA-C genes, respectively (
Table 2). For HLA class II, 200, 196, 200, and 200 alleles were counted in the HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1 genes, respectively (
Table 2).
The three most frequent alleles at each HLA locus in the overall study population were A*02:01 (35.5%), A*24:02 (11.0%) and A*02:11 (10.0%); B*35:01 (11.5%), B*48:01 (6.50%) and B*15:04 (6.0%); C*04:01 (22.73%), C*07:02 (18.69%) and C*01:02 (12.12%); DPB1*04:02 (29.50%), DPB1*04:01 (16.00%) and DPB1*14:01 (13.50%); DQA1*03:01 (22.45%), DQA1*05:03 (11.73%) and DQA1*04:01 (11.22%); DQB1*03:419 (28.00%), DQB1*03:02 (21.00%) and DQB1*03:96 (10.00%); and DRB1*09:01 (14.50%), DRB1*04:07 (13.00%) and DRB1*08:02 (11.0%) (complete HLA allele frequencies are detailed in Supplementary Table S1).
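An allele frequency here is the number of copies of an allele divided by the total number of alleles counted at that locus. With the 200 HLA-A alleles counted in Table 2, for example, a frequency of 35.5% for A*02:01 corresponds to 71 copies (71/200 = 0.355). The sketch below shows this calculation on genotype pairs; the function name and the toy genotypes are illustrative assumptions, not study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Compute allele frequencies at one locus.

    genotypes: iterable of (allele_1, allele_2) tuples, one per individual.
    Returns a dict mapping allele -> frequency, most frequent first.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.most_common()}


# Toy example with three individuals (six alleles in total).
toy = [("A*02:01", "A*24:02"), ("A*02:01", "A*02:01"), ("A*02:11", "A*24:02")]
toy_freqs = allele_frequencies(toy)
```

In this toy input, A*02:01 appears three times among six alleles, so its frequency is 0.5.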
We identified three novel SNP-type variants in two patients and one control, respectively: NM_002116.8(HLA-A):c.619+8G>T, a splice region variant (Qual: 16.5); NM_001243961.2(HLA-DQB1):c.109+71G>C, an intron variant (Qual: 11.7); and NM_002117.6(HLA-C):c.620-177G>T, an intron variant (Qual: 10). Additionally, we detected one novel deletion in a patient: NM_002117.6(HLA-C):c.1097-21del, an intron variant. However, these novel variants had variant quality scores below 20. No novel alleles were identified in the evaluated HLA genes; all detected alleles had already been reported in the IPD-IMGT/HLA database.
The allele frequencies of class I (
Figure 1) and class II (
Figure 2) in the PLHIV sample (pink bar) were compared to those in the HIV-uninfected sample (sky-blue bar). Alleles with frequencies less than 1% were omitted.
We found four HLA alleles that were associated with HIV infection; these are shown in
Table 3 (results for other HLA alleles are detailed in Supplementary
Table 2). The HLA-C, HLA-DQA1 and HLA-DRB1 alleles associated with the risk of HIV infection were C*07:01 (p = 0.0101, OR = 10.222), DQA1*03:02 (p = 0.0051, OR = 5.297) and DRB1*09:01 (p = 0.0119, OR = 4.788), respectively. Only the HLA-DQB1*03:419 (p = 0.0412, OR = 0.3273) allele conferred protection (
Table 3).
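As a reading aid for these odds ratios, an OR above 1 indicates that the allele is more common among cases (risk), and an OR below 1 that it is more common among controls (protection). The sketch below computes an unadjusted odds ratio and a Woolf-type confidence interval from a 2×2 table of allele counts. It is only an illustration: the reported values come from covariate-adjusted logistic regression, so this calculation would not reproduce them, and the function name and the counts in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: allele copies in cases        b: allele copies in controls
    c: other alleles in cases        d: other alleles in controls
    Returns (odds_ratio, lower, upper).
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction so the ratio stays defined.
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = odds_ratio * math.exp(-z * se_log)
    upper = odds_ratio * math.exp(z * se_log)
    return odds_ratio, lower, upper
```

Calling `odds_ratio_ci(6, 2, 4, 8)` on these made-up counts gives an odds ratio of 6 with a wide interval, which shows how small cell counts limit precision.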
4. Discussion
In this study, we performed high-resolution HLA allele-calling using a WES approach in a sample of a Peruvian population. HLA typing through WES proves to be a powerful tool that offers a comprehensive view of an individual’s genetic composition. This approach allows for the identification of variants in HLA genes and other genes associated with susceptibility to HIV infection. Additionally, it is considered a cost-effective strategy for genomic sequencing projects [
16].
Our results show diverse HLA profiles and frequencies between PLHIV and HIV-uninfected individuals. We analyzed the association of HLA class I and II alleles to determine the relationship of HLA alleles that confer HIV protection or susceptibility in our population. We detected HLA alleles for three class I loci (HLA-A, HLA-B, HLA-C) and four loci in class II (HLA-DPB1, HLA-DQA1, HLA-DQB1, and HLA-DRB1) because these genes have crucial roles in antigen presentation and high degrees of polymorphism [
17]. HLA allele composition is key for association studies and helps us understand the genetic risk of autoimmune and infectious diseases, such as HIV infection [
6].
The relevant socio-demographic, clinical and ART treatment characteristics varied in distribution according to gender, age, and CD4 cell count between the case and control groups. The majority of participants were male (73.0%) and born in Lima (71.43%). The median age was 41 years in the case group and 36 years in the control group. At the time of enrollment, the median CD4 count was 634 cells/mm3 and 952.67 cells/mm3 in the PLHIV and HIV-uninfected groups, respectively. These characteristics are consistent with a study carried out in Peru that recruited 162 patients with HIV infection, in which the median age was 42 years, 61% of patients were male, 71% were heterosexual, and 58% were born in Lima [
18].
The number of imputed HLA alleles per locus for PLHIV and HIV-uninfected individuals in this Peruvian sample ranged from 196 to 200 (
Table 2,
Figure 1 and
Figure 2). The most frequent alleles at each HLA locus in the overall study population were A*02:01 (35.5%), B*35:01 (11.5%), C*04:01 (22.73%), DPB1*04:02 (29.50%), DQA1*03:01 (22.45%), DQB1*03:419 (28.00%) and DRB1*09:01 (14.50%). The Allele Frequency Net Database (
http://www.allelefrequencies.net/) [
19] reports the frequencies of HLA alleles detected in different populations. In a study carried out in the Uros population located in Puno City (Peru), the five most frequent alleles were B*35:05 (51.5%), A*02:01 (50%), DQB1*03:02 (38.6%), DQB1*04:02 (33.4%) and DRB1*08:02 (31.9%) [
20]. Similar allele frequencies of HLA class I alleles were reported by another study in Lima, with 468 individuals recruited. Of these, 222 were seronegative for HIV-1 (HIV-negative) and 246 were infected with HIV-1. The most common alleles for the different loci were HLA-A*02:01 (46.8%), HLA-B*35:01 (12.0%) and HLA-C*04:01 (37.6%) [
21].
We identified significant associations of common HLA variants in our cohort: HLA-C*07:01 (p = 0.0101, OR = 10.222), HLA-DQA1*03:02 (p = 0.0051, OR = 5.297) and HLA-DRB1*09:01 (p = 0.0119, OR = 4.788) were associated with susceptibility to HIV infection, while HLA-DQB1*03:419 (p = 0.0478, OR = 0.327) was associated with protection from HIV infection (
Table 3). A similar study from Peru concluded that HLA-B*35:43 showed the strongest association with HIV acquisition (p = 0.012), while HLA-A*02:01 and HLA-C*04:01 were both associated with high viral loads (p = 0.0313 and 0.0001, respectively) [
21].
HLA-C belongs to class I and encodes a protein composed of a membrane-bound mature heavy chain and a light chain, β2-microglobulin (β2M). HLA-C plays a role in presenting peptides to virus-specific T-cells, although much less is known about the CD8+ T-cell recognition of peptides restricted by HLA-C. This function is crucial for the initiation and maintenance of adaptive immunity [
22]. In addition, our analysis has shown that the HLA-C*07:01 allele might be associated with HIV susceptibility, which is consistent with the results of a genome-wide association study (GWAS) on HIV susceptibility in European individuals [
23], where this allelic variant was classified as a risk allele. Another study in a European population showed that the HLA-C*07:01 allele generates susceptibility to autoimmune hepatitis (AIH) [
24]. Our findings support the influence of HLA-C*07:01 on susceptibility to HIV-1 infection.
HLA-DQ molecules are transmembrane proteins composed of an α chain, encoded by HLA-DQA and consisting of α1 and α2 domains, and a β chain, encoded by HLA-DQB and consisting of β1 and β2 domains. Both chains are anchored in the membrane and form an antigen-presenting groove. To our knowledge, no association between the DQA1*03:02 allele and infectious diseases has been reported previously. However, this allele has been reported to be associated with systemic lupus erythematosus (SLE) in Chinese Han patients [
25].
The primary function of HLA class II molecules (HLA-DRB1, HLA-DQA1, and HLA-DQB1) is to process exogenous peptides for presentation to CD4+ T cells, which are crucial in antiviral cellular and humoral immunity [
26]. The HLA-DRB1 gene encodes the DRβ1 chain. The association of HLA-DRB1 alleles may affect the specific structure of the HLA-DR molecule and its binding affinity to epitopes [
27]. Previous studies have reported associations of the HLA-DRB1*09:01 allele with infectious diseases. For example, Anzurez et al. conducted a study that found that this allelic variant was associated with the risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in a Japanese population [
26]. In contrast, Nguyen et al. reported that HLA-DRB1*09:01 showed a protective effect against the development of Dengue Shock Syndrome (DSS), particularly in patients with DEN-2 infection in a Vietnamese population [
28]. Although no association between DRB1*09:01 and HIV infection had been reported previously, we identified new associations of the DQA1*03:02 and DRB1*09:01 allelic variants with HIV infection, which could be attributed to an increased affinity and specificity of the peptide-binding region (PBR), influencing the recognition of pathogen-derived antigens [
29].
In contrast to susceptibility alleles, in the HLA-DQB1 locus, the HLA-DQB1*03:419 allele was identified as a novel protective allele against HIV infection in this study. The HLA-DQB1*03032 and HLA-DQB1*0602 alleles have shown protective associations against HIV infection in Caucasian and African American ethnic groups, respectively [
30]. Rallón et al. showed that the HLA-DQB1*03:02 allele was implicated in protection from HIV infection in a Spanish population [
31], and Hardie et al. showed that the HLA-DQB1*0603 allele conferred protection from HIV-1 infection in a Pumwani cohort [
32]. The combination of HLA-DQA1 (encoding the α chain) and HLA-DQB1 (encoding the β chain) forms a binding pocket that determines the DQ molecule’s specificity and diversity for antigen presentation. These allelic variants can affect peptide binding, leading to differential antigen presentation by the DQ molecule, which may be associated with resistance and susceptibility to HIV-1 infection.
Our results add information about a population for which data are scarce, namely the Peruvian population, whose genetic composition is represented here by that of Lima, the capital city of Peru, which holds about one third of the country’s population, much of it having arrived from all over the country in the last 70 years. Lima has a mixed population, characterized by a high Amerindian component of around 70%; the remainder is admixed with European, African and Asian ancestry [
33]. This is the first report of the predisposing associations of C*07:01, DQA1*03:02, and DRB1*09:01 and the protective association of DQB1*03:419 with HIV infection in Peruvian patients.
The present study has limitations worth noting. First, our HIV-uninfected group was selected based on self-reported risk behavior, which may underestimate actual risk behavior and could therefore have introduced information bias. Second, the recruitment methodology did not include determination of the patients’ HIV viral load, so we were unable to compare viral load with other variables. Third, the sample size was small, although according to the statistical analysis these two parameters did not influence the test results. Finally, this study determined HLA types only at four-digit (2-field) resolution, whereas higher resolution, such as six- or eight-digit typing, is crucial to increasing variant accuracy. We employed high sequencing depth and coverage (≥20X) to generate high-resolution HLA alleles.
Author Contributions
Conceived and designed the experiments: DOA, OAC, CYV, and RFA. Performed the experiments: DOA, MLGG, and OAC. Analyzed and interpreted the data: DOA, MLGG, OAC, and RFA. Data collection: SEC, CYV, and SEA. Wrote the paper: DOA, OAC, MLGG, and RFA. Manuscript review and revisions: DOA, OAC, MLGG, SEC, CYV, SEA, and RFA.