UMOD protein analysis using different bioinformatics tools:
- I.
Physicochemical findings using ProtParam:
The physicochemical characteristics of the Uromodulin protein were estimated using the ProtParam tool. The protein comprises 640 aa with a molecular weight of 69760.86 Da and a theoretical isoelectric point (pI) of 5.05. It contains 69 negatively charged residues (Asp + Glu) but only 46 positively charged residues (Arg + Lys). The molecular formula of the protein is C3011H4654N832O952S63, with a total of 9512 atoms. The extinction coefficient is reported under two assumptions: 101780 when all pairs of Cys residues form cystines, and 98780 when all Cys residues are reduced. The estimated half-life is 30 hours in mammalian reticulocytes (in vitro), greater than 20 hours in yeast (in vivo) and greater than 10 hours in E. coli (in vivo). Because protein stability is critical for laboratory experiments, the instability index was also computed. Its value of 40.53 lies just above the cutoff of 40 below which ProtParam classifies a protein as stable, so the protein is borderline and only marginally predicted as unstable. Moreover, the aliphatic index was 70.69 and the grand average of hydropathicity (GRAVY) was -0.111 (Table 4).
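The derived figures in this paragraph can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the values reported above; the variable names are illustrative and are not part of ProtParam.

```python
# Values reported by ProtParam for Uromodulin (taken from the text above).
formula = {"C": 3011, "H": 4654, "N": 832, "O": 952, "S": 63}
length_aa = 640
molecular_weight = 69760.86
negative_residues = 69  # Asp + Glu
positive_residues = 46  # Arg + Lys

# The total atom count is the sum of the element counts in the formula.
total_atoms = sum(formula.values())

# Excess of acidic over basic residues, in line with an acidic pI.
acidic_excess = negative_residues - positive_residues

# Average mass per residue.
mean_residue_mass = molecular_weight / length_aa

print(total_atoms)                  # 9512
print(acidic_excess)                # 23
print(round(mean_residue_mass, 1))  # 109.0
```

The atom total reproduces the 9512 atoms reported, and the surplus of 23 acidic residues agrees with a pI below 7.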
Looking at the amino acid composition in detail, Serine and Leucine were the most abundant residues, with 55 each (8.6%), followed by Glycine and Threonine with 52 each (8.1%). Cys, Ala, Val, Asp, Arg, Gln and Pro accounted for 48 (7.5%), 44 (6.9%), 44 (6.9%), 39 (6.1%), 30 (4.7%), 30 (4.7%) and 29 (4.5%) residues, respectively. Similarly, carbon, hydrogen, nitrogen, oxygen and sulfur make up 31%, 49%, 9%, 10% and 1% of the atoms in the protein, respectively (Figure 3).
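The residue percentages follow directly from the counts and the sequence length of 640 aa. A short sketch of that calculation, using only the counts quoted above (the dictionary layout is an illustrative choice):

```python
# Residue counts quoted in the text; sequence length reported by ProtParam.
length_aa = 640
residue_counts = {
    "Ser": 55, "Leu": 55, "Gly": 52, "Thr": 52, "Cys": 48, "Ala": 44,
    "Val": 44, "Asp": 39, "Arg": 30, "Gln": 30, "Pro": 29,
}

# Percentage of the sequence contributed by each residue, to one decimal.
percentages = {
    residue: round(100 * count / length_aa, 1)
    for residue, count in residue_counts.items()
}

for residue, pct in percentages.items():
    print(f"{residue}: {pct}%")
```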
- II.
Secondary structure prediction of UMOD using PSIPRED:
This tool predicts the secondary structure of the studied protein. In Figure 4, the amino acid regions highlighted in pink (4) represent helices, those in yellow (34) represent strands, and those in grey represent the coils of the protein.
- III.
Transmembrane structure prediction using TMHMM - 2.0:
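According to this tool, the expected number of amino acids in transmembrane helices is only 3.49475 over the whole sequence, and 1.5012 within the first 60 aa. These low values indicate that the protein is anticipated to lie predominantly outside the membrane, with at most a weak contribution to a transmembrane helical structure from the N-terminal region (Figure 5).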
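One way to read these two numbers is as TMHMM's summary statistics, the expected number of residues in transmembrane helices overall and within the first 60 aa. The sketch below applies a cutoff of 18 expected residues as a rule of thumb for calling a protein transmembrane. Both the interpretation and the cutoff are assumptions drawn from general TMHMM guidance, not from this study, and the function name is hypothetical.

```python
def likely_transmembrane(expected_aa_in_tmh: float, cutoff: float = 18.0) -> bool:
    """Rule-of-thumb check; the cutoff of 18 residues is an assumed guideline."""
    return expected_aa_in_tmh > cutoff

# Values reported for Uromodulin in the text above.
exp_aa_in_tmh = 3.49475
exp_aa_first_60 = 1.5012

print(likely_transmembrane(exp_aa_in_tmh))  # False
```

Under this assumed rule, Uromodulin falls well below the cutoff.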
- IV.
Protein motifs prediction using Motif finder:
This tool predicted eight motifs in the protein: Zona pellucida, EGF-3, EGF_CA, cEGF, EGF, hEGF, FXa_inhibition, and EGF_MSP1_1. The Zona pellucida-like domain extends from position 335 to 583 aa. The EGF domain is present in four regions of the sequence (32 to 63, 73 to 99, 118 to 148 and 299 to 321). The calcium-binding EGF domain is present in two regions, from position 65 to 99 and from 108 to 148. The position and i-Evalue of each domain in the protein sequence are given in Table 5.
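Several of the reported domain ranges coincide, which is easier to see when the intervals are intersected explicitly. The following sketch uses only the ranges quoted above; the helper name `overlap` is illustrative.

```python
# Domain ranges (1-based, inclusive) as quoted in the text.
egf = [(32, 63), (73, 99), (118, 148), (299, 321)]
egf_ca = [(65, 99), (108, 148)]
zona_pellucida = (335, 583)

def overlap(a, b):
    """Return the shared inclusive range of two intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

# Every EGF region that falls inside a calcium-binding EGF region.
shared = [o for a in egf for b in egf_ca if (o := overlap(a, b)) is not None]

# Length of the Zona pellucida-like domain in residues.
zp_length = zona_pellucida[1] - zona_pellucida[0] + 1

print(shared)     # [(73, 99), (118, 148)]
print(zp_length)  # 249
```

Two of the EGF regions lie entirely within the calcium-binding EGF regions, which is expected when related motif models match the same stretch of sequence.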
- V.
Post-translational modifications using ScanProsite:
This tool gives detailed insight into the post-translational modifications of the protein, such as phosphorylation, glycosylation and cell attachment sequences. It also reports the number of disulfide bridges present in the different domains of the protein. Residues 32-174 form the cysteine-rich region, where most of the disulfide bridges are located. The complete sequence contains 48 cysteine residues and 24 disulfide bonds; the locations of the disulfide bonds within the domains of Uromodulin are depicted in Figure 6.
The results include phosphorylation predictions for the Serine (S), Threonine (T) and Tyrosine (Y) residues in the complete sequence. Threonine phosphorylation sites are at positions 40, 42, 51, 62, 107, 179, 296, 394, 457, 489, 573 and 605, and Serine phosphorylation sites are at positions 237, 244, 291, 301, 327, 434 and 591. Glycosylation may be N-linked, C-linked, S-linked or O-linked, depending on the atom of the amino acid to which the carbohydrate chain is attached. In Uromodulin there is only N-linked glycosylation at asparagine, at positions 38, 76, 80, 232, 275, 322, 396 and 513.
Table 6.
N-glycosylation and phosphorylation sites of Uromodulin.
| Amino acid position | Glycosylation | Phosphorylation |
| --- | --- | --- |
| 38 | N-linked glycosylation at asparagine | -- |
| 40 | -- | Phospho-threonine |
| 42 | -- | Phospho-threonine |
| 51 | -- | Phospho-threonine |
| 62 | -- | Phospho-threonine |
| 76 | N-linked glycosylation at asparagine | -- |
| 80 | N-linked glycosylation at asparagine | -- |
| 107 | -- | Phospho-threonine |
| 179 | -- | Phospho-threonine |
| 232 | N-linked glycosylation at asparagine | -- |
| 237 | -- | Phospho-serine |
| 275 | N-linked glycosylation at asparagine | -- |
| 291 | -- | Phospho-serine |
| 296 | -- | Phospho-threonine |
| 301 | -- | Phospho-serine |
| 322 | N-linked glycosylation at asparagine | -- |
| 327 | -- | Phospho-serine |
| 394 | -- | Phospho-serine |
| 396 | N-linked glycosylation at asparagine | -- |
| 434 | -- | Phospho-serine |
| 457 | -- | Phospho-threonine |
| 489 | -- | Phospho-threonine |
| 513 | N-linked glycosylation at asparagine | -- |
| 573 | -- | Phospho-threonine |
| 591 | -- | Phospho-serine |
| 605 | -- | Phospho-threonine |
Covalent addition of myristate, a C14 saturated fatty acid, acylates the N-terminal residue of a number of proteins via an amide linkage. The predicted N-myristoylation sites of the Uromodulin protein are 73-78, 88-93, 103-108, 116-121, 131-136, 207-212, 210-215, 233-238, 390-395, 474-479, 493-498 and 608-613. The Arg-Gly-Asp (RGD) cell attachment sequence spans residues 142-144.
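Motif scans of this kind come down to matching short sequence patterns. The sketch below shows the idea with regular expressions translated from PROSITE-style patterns for N-glycosylation, N-myristoylation and the RGD motif. The pattern strings are written from memory of the PROSITE entries and should be checked against PROSITE before use. The sequence is a made-up toy peptide, not Uromodulin, and `scan` is a hypothetical helper.

```python
import re

# Regex translations of PROSITE-style patterns (assumed; verify against PROSITE).
PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",                 # N-{P}-[ST]-{P}
    "N-myristoylation": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "RGD cell attachment": r"RGD",                       # R-G-D
}

def scan(sequence, pattern):
    """Return 1-based inclusive (start, end) spans, including overlapping hits."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in lookahead.finditer(sequence)
    ]

# Hypothetical toy peptide used only to exercise the patterns.
toy_sequence = "MKNVTARGDLGAQLSAPNPSW"

hits = {name: scan(toy_sequence, pattern) for name, pattern in PATTERNS.items()}
for name, spans in hits.items():
    print(name, spans)
```

The lookahead wrapper is used so that overlapping sites, such as the reported 207-212 and 210-215 N-myristoylation sites, would both be found.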
- VI.
Phosphorylation prediction using NetPhos 3.1:
It displays the predicted Serine, Threonine and Tyrosine phosphorylation sites in the sequence of Uromodulin. The pink line marks the threshold, which is usually 0.5; the further a residue's score rises above this threshold, the more likely the residue is to be phosphorylated. Red lines indicate phosphorylation at serine, green lines at threonine and blue lines at tyrosine residues (Figure 7).
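The thresholding step can be illustrated as follows. The scores are hypothetical placeholders, not NetPhos output for Uromodulin, and treating only scores strictly above 0.5 as positive is an assumption.

```python
THRESHOLD = 0.5

# Line colours used for each residue type, as described in the text.
LINE_COLOURS = {"S": "red", "T": "green", "Y": "blue"}

# (position, residue, score) triples; hypothetical values for illustration only.
scores = [(10, "S", 0.91), (11, "T", 0.48), (12, "Y", 0.50), (13, "S", 0.73)]

# Keep the residues whose score exceeds the threshold.
predicted = [(pos, res) for pos, res, score in scores if score > THRESHOLD]

for pos, res in predicted:
    print(pos, res, LINE_COLOURS[res])
```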
- VII.
Acetylation prediction using GPS PAIL 2.0:
This tool predicts acetylation of Lysine (K) residues in the subject protein sequence; the predicted sites are at positions 246, 265, 307, 318, 341, 346, 350, 356, 420, 432, 436, 519, 577, 579 and 624 (Figure 8).
- VIII.
Glycosylation prediction of Uromodulin using NetOGlyc-4.0:
This tool predicts O-linked glycosylation sites in Uromodulin. The purple line indicates the threshold value (0.5); the higher a residue's score above this threshold, the greater the likelihood that it is glycosylated (Figure 9).
- IX.
Methylation prediction of Uromodulin using PRmePRed:
This tool predicts methylation of arginine (R) residues in the subject protein. The predicted methylation sites of Uromodulin are at positions 99, 142, 200, 204, 212, 245, 365, 385, 449, 459, 498, 547, 586, 588, 597, 606 and 615.
Table 7.
Methylation sites of Uromodulin.
| Position | Peptide | Prediction score |
| --- | --- | --- |
| 99 | FSCVCPEGFRLSPGLGCTD | 0.692402 |
| 142 | YLCVCPAGYRGDGWHCECS | 0.785598 |
| 200 | EGYACDTDLRGWYRFVGQG | 0.780858 |
| 204 | CDTDLRGWYRFVGQGGARM | 0.823455 |
| 212 | YRFVGQGGARMAETCVPVL | 0.788485 |
| 245 | PSSDEGIVSRKCAHWSGH | 0.556259 |
| 365 | KVFMYLSDSRCSGFNDRDN | 0.543877 |
| 385 | DWVSVVTPARDGPCGTVLT | 0.708181 |
| 449 | QPMVSALNIRVGGTGMFTV | 0.763262 |
| 459 | VGGTGMFTVRMALFQTPSY | 0.515069 |
| 498 | TMLDGGDLSRFALLMTNCY | 0.602271 |
| 547 | VENGESSQGRFSVQMFRFA | 0.932742 |
| 586 | KCKPTCSGTRFRSGSVIDQ | 0.766291 |
| 588 | KPTCSGTRFRSGSVIDQSR | 0.817883 |
| 597 | RSGSVIDQSRVLNLGPITR | 0.676464 |
| 606 | RVLNLGPITRKGVQATVSR | 0.831847 |
| 615 | RKGVQATVSRAFSSLGLLK | 0.729294 |
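To see which sites are predicted most strongly, the scores in Table 7 can be ranked. The sketch below uses the positions and scores from the table; the cutoff of 0.8 for singling out the strongest predictions is an arbitrary illustrative choice, not a PRmePRed setting.

```python
# Position -> prediction score, copied from Table 7.
methylation_scores = {
    99: 0.692402, 142: 0.785598, 200: 0.780858, 204: 0.823455,
    212: 0.788485, 245: 0.556259, 365: 0.543877, 385: 0.708181,
    449: 0.763262, 459: 0.515069, 498: 0.602271, 547: 0.932742,
    586: 0.766291, 588: 0.817883, 597: 0.676464, 606: 0.831847,
    615: 0.729294,
}

# Rank sites from the highest to the lowest score.
ranked = sorted(methylation_scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative cutoff (an assumption) to highlight the strongest predictions.
strongest = [position for position, score in ranked if score >= 0.8]

print(len(methylation_scores))  # 17
print(strongest)                # [547, 606, 204, 588]
```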
- X.
Prediction of 3D structure using PDB RCSB:
This resource provided the 3D structure of Uromodulin, in which the side chains show the carbohydrate chains and the inner chains represent the polymer structure.
Figure 10.
3D structure of Uromodulin protein.
- XI.
Prediction of Uromodulin interaction networks using STRING database:
This tool was used to predict the interaction partners of the Uromodulin protein, which are B4GALNT2, TNF, SLC12A1, AQP2, ALB, GTPBP1, CD36, NPC1, SLC22A12 and SHROOM3; these help Uromodulin to perform its proper function (Figure 11).
The network contains 11 nodes, Uromodulin and its ten interaction partners, and each node represents a protein. Query proteins and the first shell of interactors are represented by coloured network nodes, whereas the second shell of interactors is represented by white network nodes. Edges represent protein-protein associations that are meant to be specific and meaningful, i.e., the proteins jointly contribute to a shared function; this does not imply that the proteins are physically bound to one another. Edges are coloured by the type of evidence: black for coexpression, light blue for protein homology, light green for textmining, red for gene fusions, blue for gene co-occurrence and green for gene neighbourhood.
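The node count can be reproduced by treating the network as a simple graph. The sketch below includes only the edges between Uromodulin (UMOD) and each reported partner, which is an assumption: edges among the partners themselves, as STRING would draw them, are not listed in the text and are therefore omitted.

```python
query = "UMOD"
partners = [
    "B4GALNT2", "TNF", "SLC12A1", "AQP2", "ALB",
    "GTPBP1", "CD36", "NPC1", "SLC22A12", "SHROOM3",
]

# Assumed edge set: one undirected edge from the query to each partner.
edges = {frozenset((query, partner)) for partner in partners}
nodes = {query, *partners}

# Adjacency list built from the undirected edges.
adjacency = {
    node: sorted(other for edge in edges if node in edge for other in edge if other != node)
    for node in nodes
}

print(len(nodes))             # 11
print(len(adjacency[query]))  # 10
```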