1. Introduction
The horse (Equus caballus), a representative of the perissodactyls, is a popular farm animal. Due to its great socio-economic, biomedical and evolutionary value, this species has played a key role throughout human history. The horse is commonly used as an animal model for human genetic diseases and, since the 19th century, has contributed to our understanding of the immune response [1,2,3,4]. The current interest in equine immunology makes it necessary to investigate the genetics underlying the equine immune response, and the generation of a reference genome contributes significantly to this goal. Within adaptive immunity, T lymphocytes are key components of the cell-mediated response, thanks to the ability of the T cell receptor (TR) they express to recognize peptides of diverse nature.
The TRs are heterodimeric T lymphocyte membrane proteins consisting of one α and one β chain or one γ and one δ chain. The αβ TR is able to specifically recognize a multitude of processed antigenic peptides presented on heterologous cells by the class I or class II major histocompatibility complex (MHC) molecules [5]. The γδ TRs are not MHC-restricted: they can recognize diverse molecular entities directly and without processing, in a manner similar to immunoglobulins, or as presented by RPI-MH1Like proteins. This provides a more flexible, though not yet fully understood, recognition of a wide range of ligands (soluble, membrane-bound, and unprocessed antigens) [6].
Each TR chain comprises a variable (V) domain at its N-terminus and a constant (C) region at its C-terminus, encoded by multigene families arranged in a TR locus [7]. To generate a large repertoire of TRs capable of recognizing diverse antigens, during the development of T lymphocytes in the thymus, the variable (V) and joining (J) genes of the TR α (TRA) and γ (TRG) loci and the V, diversity (D) and J genes of the TR β (TRB) and δ (TRD) loci undergo somatic recombination, with the resulting rearranged V(D)J regions encoding the V domains of the α/β and γ/δ TR chains [7].
Random trimming and the addition of non-templated nucleotides (N additions) at the V-D and D-J junctional sites greatly increase diversity and form the most variable loop, the complementarity-determining region 3 (CDR3-IMGT), which is usually responsible for antigen recognition. Two other hypervariable loops, CDR1 (CDR1-IMGT) and CDR2 (CDR2-IMGT), are encoded by the germline V gene. The CDR loops are interspersed among four framework regions (FR-IMGT).
The process of somatic recombination involves the site-specific cleavage of DNA by the RAG-1 and RAG-2 proteins at conserved recombination signal (RS) sequences flanking the V, D and J genes. Each RS is composed of a conserved heptamer and a conserved nonamer separated by a non-conserved spacer of 12 or 23 bp. After transcription, the V(D)J sequence is spliced to the C gene.
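The heptamer-spacer-nonamer layout of the RS described above can be illustrated with a minimal sketch that scans a DNA string for candidate signals. It is not the annotation method used in this study. The consensus motifs used here (CACAGTG and ACAAAAACC), the function name `find_rss` and the mismatch threshold are assumptions introduced for illustration, and the scan covers one strand in the heptamer-to-nonamer orientation only.

```python
# Illustrative sketch only: motifs, names and thresholds are assumptions,
# not the annotation pipeline of the paper.
HEPTAMER = "CACAGTG"     # commonly cited consensus heptamer (assumed here)
NONAMER = "ACAAAAACC"    # commonly cited consensus nonamer (assumed here)
SPACERS = (12, 23)       # spacer lengths in bp, as stated in the text


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, max_mismatch=2):
    """Return (start, spacer_length, heptamer, nonamer) for each candidate RS.

    Only the heptamer -> spacer -> nonamer orientation is scanned; the
    reverse complement would have to be scanned separately.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        hept = seq[i:i + len(HEPTAMER)]
        if hamming(hept, HEPTAMER) > max_mismatch:
            continue
        for spacer in SPACERS:
            j = i + len(HEPTAMER) + spacer
            non = seq[j:j + len(NONAMER)]
            # Skip windows that run past the end of the sequence.
            if len(non) == len(NONAMER) and hamming(non, NONAMER) <= max_mismatch:
                hits.append((i, spacer, hept, non))
    return hits


# Toy sequence carrying one exact 23-bp-spacer signal.
toy = "GGGGG" + HEPTAMER + "GCTA" * 5 + "GCT" + NONAMER + "GGG"
toy_hits = find_rss(toy, max_mismatch=0)
```

In practice, the spacer length distinguishes 12-RS from 23-RS, which matters because recombination normally joins genes flanked by signals of different spacer length.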
Evidently, the recombination process implies that the number of V, D and J genes in the germline DNA is an important determinant of the extent of the primary TR repertoire, and it has therefore become a subject of study in many species. These comparative analyses may reveal new insights into the multiple distinct mechanisms that create TR diversity in vertebrates [8].
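The dependence of the primary repertoire on germline gene numbers can be made concrete with a short arithmetic sketch. It counts V-D-J gene combinations only, ignoring junctional diversity, and assumes the two-cluster TRB layout in which TRBD1 can join TRBJ genes of either cluster while TRBD2 joins only those of its own cluster. The function name and the gene counts in the example call are hypothetical placeholders, not values reported in this paper.

```python
def combinatorial_vdj(n_v, n_j1, n_j2, n_d1=1, n_d2=1):
    """Count V-D-J gene combinations for a two-cluster TRB-like locus.

    Assumption: only intra-cluster (D1-J1, D2-J2) and downstream
    inter-cluster (D1-J2) joins are counted; trans-rearrangements and
    junctional (N-addition) diversity are ignored.
    """
    intra = n_d1 * n_j1 + n_d2 * n_j2
    inter = n_d1 * n_j2
    return n_v * (intra + inter)


# Hypothetical placeholder counts, chosen only to show the arithmetic.
example = combinatorial_vdj(n_v=40, n_j1=6, n_j2=7)
```

With these placeholder counts, the estimate is 40 × (6 + 7 + 7) = 800 combinations, which shows how each additional functional V gene multiplies, rather than adds to, the combinatorial repertoire.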
Among the four TR loci, the genomic structure of the TRB locus is the most conserved across mammalian species, with a V gene cluster (TRBV), containing a number of genes that varies from species to species, positioned upstream of tandem-aligned TRBD-J-C clusters, each composed of a single D (TRBD) gene, several J (TRBJ) genes, and one C (TRBC) gene. A single TRBV gene with an inverted transcriptional orientation is located at the 3’ end of the locus. In most mammalian species [https://www.imgt.org/IMGTrepertoire/; 9], two TRBD-J-C clusters exist. Exceptionally, a third TRBD-J-C cluster was identified in cetartiodactyl species [10,11,12,13,14,15].
However, in all mammalian species, MOXD2 and EPHB6 genes border each TRB locus at the 5’ and 3’ end, respectively, whereas TRY genes are interspersed among TRBV genes, arranged in two distinct genomic positions [https://www.imgt.org/IMGTrepertoire/; 9].
In this paper, we provide a comprehensive genomic and expression analysis of the TRB locus in Equus caballus, the first for a mammalian species belonging to the Perissodactyla. Based on the recently released horse genome, our analysis showed that the horse TRB locus is the largest identified so far in mammalian species, with 136 TRBV, 2 TRBD, 13 TRBJ and 2 TRBC genes in approximately 900 Kb. Furthermore, the availability in public databases of a transcriptome derived from the splenic tissue of a healthy adult horse allowed us to evaluate the characteristics of the V(D)J somatic rearrangements and the variability of the β-chain repertoire through a clonotype analysis. Although a significant proportion of germline TRBV genes are non-functional (69%), our results demonstrate that the horse β-chain has a level of variability substantially similar to that described in other mammalian species. This is due not only to the presence of functional TRBD and TRBJ genes available for somatic recombination, but also to the occurrence of inter-cluster and trans-cluster rearrangements in addition to canonical intra-cluster recombination.
4. Discussion
In modern times, interest in the species Equus caballus continues to promote genetic studies on this species and, undoubtedly, the generation of the reference genome assembly holds the promise to further improve our knowledge of it.
Horse genome analysis is indeed a very active area of research, aimed at identifying the genes responsible for various hereditary traits associated with fertility, athletic performance, disease resistance, etc., and at delineating the genetic makeup of individual horse breeds and populations [1,2,3,4]. The investigation also extends to the equine immune response, and, in this context, it begins to identify specific situations in which the horse may provide a unique immunological model for human diseases [1,2,3,4].
The present study, by defining the organization of the horse TRB locus, contributes to a better understanding of the repertoire of horse TRB genes and their evolution.
As expected, the general genomic organization of the horse TRB locus is similar to that of most other mammalian species in which it has been defined, with a V-cluster positioned upstream of two tandem-aligned TRBD-J-C clusters, although a substantial difference is its extent, making it the largest TRB locus identified to date. Notably, the same feature was found in the analysis of the previously annotated horse TRG locus [34]. Much of the large horse TRB locus (900 Kb) is occupied by the V-cluster (135 TRBV genes), while the two TRBD-J-C clusters span approximately 6.5 Kb each. The size of the V-cluster can be related to the extent and complexity of the gene duplications that have occurred in this region, which led to the emergence of numerous multi-member TRBV subgroups. In fact, only five of the 29 TRBV subgroups are represented by a single member gene. In comparison, in humans the majority of the TRBV subgroups consist of a single member gene. It is noteworthy that within large subgroups, such as the TRBV5 subgroup, there is greater diversification among some member genes (nucleotide identity is less than 75%), which may reflect divergence toward new functions. These genes have nevertheless been retained within their subgroup on the basis of phylogenetic analysis. A similar situation was found in the sheep TRD locus among the TRDV1 genes, whose subgroup is represented by over 40 member genes [35].
Since our phylogenetic analysis revealed the clustering of each horse subgroup with a corresponding human one, except for the TRBV9 subgroup, which is missing in the horse, we can establish that gene duplication within each subgroup rather than the emergence of distinct subgroups is the major mode of evolution of the horse TRBV genes.
It is noteworthy that gene duplications generated a horse TRBV germline repertoire with a higher percentage of pseudogenes (66.4%) than the human one (26.5%). However, the total number of functional genes is identical (47), although differences in the profile of the functional genes can be observed, such as the absence of functional TRBV1 and TRBV21 genes in humans.
Dot-plot analyses of the TRBV region reinforce the conclusion that the V-cluster arose through a series of complex duplications and suggest that these duplication events followed different patterns, with genes duplicating either individually or together, the latter giving rise to extensive duplication units. The overall view of the matrix highlights how each large gene unit has duplicated along the entire V-cluster, becoming interspersed with the others.
In other mammalian species in which the architecture of the TRB locus has been analysed, duplications have generally involved one or a few portions of the V-cluster (usually at the 5' end), and the gene units were mostly tandemly duplicated [12,14,28,29].
It is possible that these duplication events were mediated by repetitive sequences. Within the horse TRB locus the percentage of repetitive elements is 30.35%, with LINEs being predominant over SINEs (19.77% versus 1.66%). The percentage of LINEs is the highest, and that of SINEs the lowest, found among the previously analysed TRB loci [12,14,28,29]. For instance, in the human TRB locus, LINEs account for 16.86% and SINEs for 6.62%; in dog, 11.98% of repeat sequences are LINEs and 7.19% are SINEs; in the pig and rabbit TRB loci, LINEs account for 14.62% and 10.81%, respectively, although the most abundant repeat elements are SINEs (15.65% and 14.18%, respectively).
Therefore, LINEs may have played a key role in the evolution of the horse TRB locus. Indeed, if we look closely at the pip of Figure S2, we can observe that LINEs are distributed along the entire locus and are particularly localized between duplicated blocks, which strongly suggests their contribution to the architecture of the locus.
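The repeat percentages quoted above can be compared directly with a short sketch that computes the LINE-to-SINE ratio for each species. The percentages are those given in the text; the variable and function names are introduced here for illustration only, and the ratio is just a convenient summary, not a statistic used in the study.

```python
# (LINE %, SINE %) of the TRB locus, as quoted in the text.
repeats = {
    "horse": (19.77, 1.66),
    "human": (16.86, 6.62),
    "dog": (11.98, 7.19),
    "pig": (14.62, 15.65),
    "rabbit": (10.81, 14.18),
}


def line_sine_ratio(line_pct, sine_pct):
    """Ratio of LINE to SINE content; values above 1 mean LINEs predominate."""
    return line_pct / sine_pct


ratios = {species: line_sine_ratio(*pcts) for species, pcts in repeats.items()}

# Species ordered from the most LINE-dominated locus to the least.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for species in ranked:
    print(f"{species}: LINE/SINE = {ratios[species]:.2f}")
```

On these figures the horse locus stands apart, with LINEs exceeding SINEs by more than tenfold, whereas in pig and rabbit the ratio falls below 1.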
To explore the functional implications of the genomic characteristics of the horse TRB locus, we analysed the V-D and D-J junctions, including CDR3, of transcripts retrieved from a public splenic RNA library. Although the number of unique clones analysed was not very high and referred to a single tissue of a single adult animal, and although TRBV or TRBD genes could not be identified in all clones, some considerations can be made. With the exception of TRBV1, TRBV10, TRBV13, and TRBV16, all TRBV subgroups containing functional genes were found in the clonotype collection, and the TRBV usage ranged from 0.006% for TRBV29 to 16.1% for TRBV5 and 21.1% for TRBV20, considering only the subset of clonotypes where TRBV subgroup assignment was unambiguous. TRBV21 (13.0%) and TRBV28 (10.5%) subgroup genes were also frequently found.
For comparison, TRBV20-1, TRBV5-1, TRBV29 and TRBV28 represent the top four most expressed TRBV genes in human peripheral blood leukocytes collected from 550 individuals [36].
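The usage percentages discussed above are computed over the clonotypes with an unambiguous TRBV subgroup assignment. A minimal sketch of that calculation follows; the function name and the toy input are hypothetical and do not reproduce the data or the software used in this study.

```python
from collections import Counter


def trbv_usage(assignments):
    """Return the percentage usage of each TRBV subgroup.

    `assignments` holds one entry per clonotype: a subgroup name, or None
    when the subgroup could not be assigned unambiguously. Ambiguous
    clonotypes are excluded from both the counts and the denominator.
    """
    resolved = [a for a in assignments if a is not None]
    if not resolved:
        return {}
    counts = Counter(resolved)
    total = len(resolved)
    return {subgroup: 100.0 * n / total for subgroup, n in counts.items()}


# Toy input (hypothetical): four clonotypes, one of them ambiguous.
toy_usage = trbv_usage(["TRBV20", "TRBV20", "TRBV5", None])
```

Excluding ambiguous clonotypes from the denominator keeps the percentages summing to 100, at the cost of reporting usage for only part of the repertoire.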
By CDR3 analysis, we also determined that 39.5% of recognizable TRBD clonotypes are TRBD1 and 60.5% are TRBD2. The preferential usage of TRBD2 results not only from intra-cluster rearrangements but also from inter-cluster rearrangements and trans-rearrangements. This latter type of rearrangement was also found in dogs and sheep [28,32]. For comparison, an almost identical use of the two TRBD genes was observed in human peripheral blood leukocytes, with a slight prevalence of TRBD2 (50.1%) over TRBD1 (49.9%) [ref]. Conversely, in dog peripheral blood leukocytes, the TRBD1 gene appears more represented (71.4%) than TRBD2 [our dog ref]. However, variations in TRBD usage may also be due to the difficulty of recognizing these genes in the transcripts.
Finally, CDR3 analysis revealed that the TRBJ2 cluster appears to be preferentially used (about 70%). It is possible that the preferential usage of the TRBJ2 set depends on the number of J genes concentrated in a small genomic region, if multiple 12-RSs help to increase the local recruitment of the RAG proteins [37]. In this regard, it is notable that the TRBJ1 gene set of all the species [IMGT, http://www.imgt.org; 28,29], including the horse, spans approximately 2.1 Kb, while the TRBJ2 genes are grouped within approximately 1 Kb. Therefore, the TRBJ2 genes seem to be crucial for the production of the beta chain transcripts. Alternatively, the frequency variation may depend on the tissue analysed. However, regardless of the genes involved in somatic recombination, the conservation of CDR3 length across species [12,28,33] indicates that it is a TR beta chain feature essential for T cell function.