2. Fibonacci-like sequences
These sequences, see [
1], could be defined in terms of the usual Fibonacci sequence by the recurrence relation (
where
denotes collectively the five sequences
and
. Their “seeds” or “initial conditions” are chosen as follows
,
,
and
. We show below, in
Table 4, the first few terms
The “seeds” described above, initially chosen by trial and error, have proven appropriate and useful in their consequences, not only for the “ideal” classification scheme mentioned above but also for deriving a large number of interesting results. Specifically, the “seeds” for the Fibonacci-like sequences
and
are, in detail, as follows. For
, 13
is the number of
hydrogen atoms in serine (3)
and arginine (10) while 9
is the number of hydrogen atoms in leucine, with a total of 22 (
). For
, 30
is the number of
atoms in leucine (13)
and arginine (17) while 5
is the number of atoms in serine, with a total of 35 (
). Note, importantly, that when we say
atoms (not hydrogen atoms), we mean the whole set comprising hydrogen, carbon, oxygen, nitrogen and sulfur. We have devoted an entire section in [
1] (Section 4.2.5) to explain the usefulness of not only the choice of the “seeds” of the above sequences
and
but also the one of the other three
and
It is worth noting that the sequences
and
can give, as a secondary product, both the Fibonacci and the Lucas sequences. The difference
gives the (slightly modified) Fibonacci sequence noted
in an unusual but interesting form: its “seeds” here are inverted with respect to the usual Fibonacci sequence. Also, the sum of its first members up to any given index is exactly a Fibonacci number, whereas for the usual Fibonacci sequence with seeds 0, 1 the same sum is always one unit less than a Fibonacci number. For example, in our case, for
, we get
. The relation
gives the Lucas sequence:
It is important to note that the sequences in
Table 4 are highly intertwined by a (large) number of
identities connecting them (see Equ.(2) in [
1]). The reader could consult Appendix C, in [
1], to see how they can be checked for large or very large values of the index n using mathematical software with a built-in Fibonacci function. For low values of the index n in
Table 4, the verification can easily be done by hand or with a pocket calculator. We will also use some of these identities in the applications in this paper, as we did in our recent paper mentioned above. The identities we need will be presented as we go along, where they are first used.
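The remarks above on seeds and partial sums can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the seed order (13, 9) and (30, 5) is an assumption read off the description of the seeds, whose third terms should then be the stated totals 22 and 35.

```python
from itertools import accumulate, islice


def fib_like(a0, a1):
    """Yield a0, a1, a0 + a1, ... using x[n+2] = x[n+1] + x[n]."""
    x, y = a0, a1
    while True:
        yield x
        x, y = y, x + y


def first_terms(a0, a1, count):
    """Return the first `count` terms of the sequence seeded with (a0, a1)."""
    return list(islice(fib_like(a0, a1), count))


# Seeds quoted in the text; their order is an assumption of this sketch.
hydrogen_seeded = first_terms(13, 9, 6)  # third term is the stated total 22
atom_seeded = first_terms(30, 5, 6)      # third term is the stated total 35

# Usual Fibonacci seeds (0, 1) versus the inverted seeds (1, 0).
usual = first_terms(0, 1, 12)
inverted = first_terms(1, 0, 12)
fibonacci_numbers = set(first_terms(0, 1, 40))

# Running sums of the first members of each sequence.
usual_sums = list(accumulate(usual))
inverted_sums = list(accumulate(inverted))

print(hydrogen_seeded, atom_seeded)
print(all(s in fibonacci_numbers for s in inverted_sums))   # exact Fibonacci numbers
print(all(s + 1 in fibonacci_numbers for s in usual_sums))  # one unit short
```

With the inverted seeds every running sum is itself a Fibonacci number, while with the usual seeds each running sum falls one unit short, which is the behaviour described in the text.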
5. The 3rd base symmetry classification
In 1982, Findley et al., [
7], by viewing the genetic code as an
f-mapping, extracted a fundamental symmetry for the doubly degenerate codons (group-II). Below, to ease the reading, we reproduce a few elements from the above reference to help the reader understand what the f-mapping is. The authors consider the 64-codon set,
and define
where i, j, k designate the 1st, 2nd and 3rd base in the codon
(B is for base, U, C, A, G).
, k
partitions
into four disjoint subsets, each containing only codons having the
same third base. Each of these subsets may be mapped by f into members of the amino acids set A, with the image being denoted
this is shown in
Table 6, below.
One has therefore
and
. With this f-mapping, the authors also establish relations that define a one-to-one correspondence between one member of a doubly degenerate codon pair and the other member (see the reference above for details). These relations can be stated in words as follows: (i) if a codon for an amino acid has 3rd base U, then there is a codon for the same amino acid having 3rd base C, and vice versa, OR (ii) if a codon for an amino acid has 3rd base A, then there is a codon for the same amino acid having 3rd base G, and vice versa. For a doubly degenerate codon pair, (i) and (ii) are mutually exclusive. For order-4 codons, the quartets, (i) and (ii) hold simultaneously. For order-6 codons, the sextets, the quartet part obeys (i) AND (ii) while the doublet part obeys (i) OR (ii). For the odd-order degenerate codons (Ile, Met and Trp), however, there is a slight deviation from symmetry. In
Table 6, we show this classification. In the last two rows of this table, we have calculated, from
Table 3, the hydrogen atom content and the atom content in the side chains of the amino acids in the four columns, in the two views “
on” and “
off” (see
Section 1.2). Note the hydrogen atom balances (
) and atom number
balances (
) in the last two rows in
Table 6. These express the exact one-to-one correspondence mentioned above (here the two codons of isoleucine AUU and AUC constitute an order-2
doublet). These balances will be established from our Fibonacci-like sequences below in this section.
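The third-base pairing rules (i) and (ii) stated above can be checked directly against the standard genetic code. The following Python sketch is illustrative only: the variable and function names are hypothetical, and the 64-letter string is the standard code written in the usual U, C, A, G ordering, not data taken from Table 6.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Partition the 64 codons into four disjoint subsets according to the third base.
by_third_base = {k: [c for c in CODE if c[2] == k] for k in BASES}


def broken_pairs(k1, k2):
    """Two-base prefixes whose codons ending in k1 and k2 have different meanings."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return [p for p in prefixes if CODE[p + k1] != CODE[p + k2]]


uc_exceptions = broken_pairs("U", "C")  # rule (i): U <-> C
ag_exceptions = broken_pairs("A", "G")  # rule (ii): A <-> G

print({k: len(v) for k, v in by_third_base.items()})
print(uc_exceptions, ag_exceptions)
```

In this sketch the U/C pairing holds for all sixteen prefixes, while the A/G pairing fails only for the prefixes AU (AUA Ile versus AUG Met) and UG (UGA stop versus UGG Trp), in line with the slight deviation noted for Ile, Met and Trp.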
5.1. The hydrogen atom content
5.1.1. “Activation key” on
In the U/C third-base set, there are
hydrogen atoms. In the A/G third-base set there are, respectively,
and
hydrogen atoms (grand total of
, see
Table 6 above). To describe this pattern, using our Fibonacci-like sequences, let us start again from Equ.(24) of Section 4.1.1 and write it in the following form, writing out the sum explicitly
Note that we have included the sixth term of the sequence , in the sum , in the second parenthesis. In this way, we reach the correct hydrogen atom pattern.
5.1.2. “Activation key” off
In this case, let us recall Equ.(27) of Section 4.1.2 (or Equ.(12) of
Section 3.2 which is the same)
and use the following identity linking the sequences
and
which, for
, writes
. By inserting this last number, 31, in the above equation and arranging, in a first step, we have
The second parenthesis in the left hand side can be written as
. This is the correct pattern for U/C third-base set but it remains to handle the other part in the above equation. A quick way consists in writing the factor
above as
since 8 is a Fibonacci number. All this lets us put the above equation in the following form
which could be compared with the data in
Table 6 (case “
off”).
5.2. The atom content
5.2.1. “Activation key” on
Let us, here, start from Equ.(30) in
Section 4.2.1, written as
and use, first,
in cascade the recurrence relation of the sequence
Now, we arrange this relation as follows
To get the correct atom number pattern, we note that because of the following identity of the sequence
we can, for
, write
or
. By inserting this latter value in Equ.(43) above, we obtain
We recognize here the correct atom number pattern (see
Table 6)
5.2.2. “Activation key” off
This case is easily handled by starting from Equ.(34) of
Section 4.2.2. Using the recurrence relation of the sequence
(
), we write it as
Next, we use, again, the identity,
, already considered in
Section 5.2.1, but now for
:
. By inserting this relation in the equation above, we have
As the first term is already correct, we examine the second. Using the recurrence relations of both sequence
and
, we can write
and
. By inserting these values in the equation above, we end up with
which is the correct answer.
6. The “ideal” symmetry and the “supersymmetry” classification schemes
The main idea behind the “Ideal” symmetry classification scheme, [
9], is the use of the three sextets serine, arginine and leucine, each encoded by six codons, as “
generators”, with serine playing the central role. This scheme divides the 64-codon matrix into two groups of 32 codons each, the “leading” group and the “nonleading” group, each consisting of equal A+U rich and G+C rich parts. The “ideal” classification scheme is generated by combining the six codons of serine, arginine and leucine, as mentioned above, in the following manner. Serine, the
initial generator with its six codons, arginine also with its six codons and leucine with
only the quartet part of its six codons define the whole “leading” group (with 32 codons). The remaining doublet part of leucine, on the other hand, constitutes a “seed” for the construction of the “nonleading” group (with 32 codons). In this scheme, the genetic code table is created by codon sextets based on exact
purine/pyrimidine symmetries, A+U rich/C+G rich symmetries and
Direct/Complement symmetries (see [
9]).
Table 7 below shows these groups.
In this table, the “leading” group is shown in yellow (A+U rich) and orange (G+C rich) while the “nonleading” group is shown in light grey (A+U rich) and light blue (C+G rich).
Soon after the publication of their paper, [
9], the authors postulated, in [
10], the existence of what they call a “supersymmetric” genetic code table, derived from the “ideal” symmetry genetic code table, and having now five symmetries between bases, codons and amino acids. These are purine-pyrimidine between bases and codons, direct-complement symmetry of codons between boxes, A+U rich and C+G rich symmetry of codons between two columns,
mirror symmetry between all purines and pyrimidines of the whole code and between second and third base of codons (see [
10]). This “supersymmetry” genetic code table is shown in
Table 8. It has been reproduced from [
10] except for the colors. Importantly, the two “mirror” symmetry axes (vertical and horizontal) are shown in dotted lines. In columns 4 and 5, the authors took (purine: 0, pyrimidine: 1). The first column in
Table 8 indicates the boxes: direct box (DB) and complement box (CB).
6.1. Hydrogen atom content
6.1.1. “Activation key” on
The hydrogen atom count is as follows, from
Table 3 and
Table 8, leading group (in yellow and orange, as in
Table 7): 192; nonleading group (in light grey and light blue, as in
Table 7): 170. To derive this hydrogen atom pattern, let us start from Equ.(25) of Section 4.1.1 and use again the equality
(from the identity in Equ.(16) of
Section 3.2 for
) to get, after arranging
which is the correct result.
6.1.2. “Activation key” off
In this case, the hydrogen atom count is as follows: leading group: 192; nonleading group: 174. Here, we start from Equ.(27) of Section 4.1.2
In this case, we consider, first, the number 8 and use the recurrence relation of the sequence
, to write it as
and, next, use the recurrence relation of
. With these elements, we could write Equ.(50) as follows
This is the correct result.
6.2. Atom content
6.2.1. “Activation key” on
From
Table 3 and
Table 8, we have 316 atoms in the leading group and 282 atoms in the nonleading group. Here, we start from the relation
, which led to Equ.(31) of
Section 4.2.1 but, this time, we add and subtract the quantity
, see
Table 4, to get the correct result
6.2.2. “Activation key” off
In this case, the atom number in the leading group is the same as before (316) but the atom number in the nonleading group is now equal to 286. This case can be handled by appealing to the identity in Equ.(33) of
Section 4.2.2, which writes again for
We first write
as
, as in
Section 4.2.2, but we now (i) select
one copy of the number 61 in the above relation and write it as
, by virtue of the recurrence relation of the sequence
, and (ii) use the identity in Equ.(16) (
) for
, that is,
. This allows us to put Equ.(53) above in the form
which is the correct result.
6.3. The “supersymmetry” genetic code table
As the case of the “supersymmetry” genetic code table, [
10], has not been considered in [
1], where the 20 amino acids were all taken in their uncharged state and proline’s side chain was considered in shCherbak’s view (5 hydrogen atoms, 8 atoms and 41 nucleons), we give, here, the corresponding results and, next, consider the case where the four amino acids mentioned earlier are charged and proline with its two views,
on and
off.
6.3.1. Uncharged amino acids case and “activation key” on only
Consider, first, the identity
where we have added to both sides the same quantity
. For
, we have from
Table 4
The sum
, describing the leading group/nonleading group hydrogen atom pattern has already been obtained in [
1] but the (new) quantity
, will be useful in what follows. Using again the identity in Equ.(16) for
(
) and next the identity in Equ.(7) of
Section 3.1 for
, which gives
, we can put the left hand side of Equ.(55) in the form
If we take the number 91, the 7th term of the sequence
,
and write it as
, because
in the same sequence, we then have, from Equ.(56)
This is the Direct Boxes/Complement Boxes hydrogen atom pattern, respectively (see Table 8). (The calculations from this table follow the same lines as in the sections above. For the Direct Boxes, for example, take all the amino acids they contain and, taking into account the number of their codons, compute the number of hydrogen atoms; proceed likewise for the Complement Boxes.) To derive the hydrogen atom pattern for the mirror symmetry, a quicker and more elegant way is as follows. Consider the identity
For
, we have
(see
Table 4). By inserting this last relation in Equ.(56) above, we get
This is the hydrogen atom pattern for the “mirror” symmetry (see
Table 8 above. See also Figure 2 in [
10] and the detailed explanations therein about this beautiful symmetry).
6.3.2. Charged amino acids case, “activation key” on and off
Now, we consider the case where (four) amino acids are in their (physiological) charged state which is the main subject in this paper.
6.3.2.1. Hydrogen atom content
In the case “activation key”
on, there are
hydrogen atoms in the Direct Boxes and
hydrogen atoms in the Complement Boxes (from
Table 3 and
Table 8). Here, we recall Equ.(25) of Section 4.1.1
By using again the identity in Equ.(16) for
,
, once, and arranging, we get
which is the correct result. In the case “activation key”
off, there are
hydrogen atoms in the Direct Boxes and
hydrogen atoms in the Complement Boxes. Here, we start from Equ.(12) of
Section 3.2 and write it as
where
from the recurrence relation of the sequence
. Next, we use the same identity in Equ.(38) of
Section 5.1.2, again for
(
), to rewrite (one copy) of the number
above
These are the correct hydrogen atom numbers mentioned above. Now, we look at the “mirror” symmetry. In the case “activation key”
on, there are
hydrogen atoms in Column 1 and
hydrogen atoms in Column 2 of
Table 8, using the data of
Table 4. Here, we start from Equ.(60) above and put it in the following correct form
where we have used the recurrence relation
of the sequence
and, next, replaced the number 53 of the latter sequence by the same number 53 of the sequence
which is equal to
. (Recall that, from Equ.(16), one has
)
In the case “activation key”
off, there are
hydrogen atoms in Column 1 and
hydrogen atoms in Column 2 (see
Table 8, data from
Table 4). Consider again Equ.(60) above
By using, repeatedly, the recurrence relation of the sequence
and also the following relation
, from the identity
for
, we can put the equation above into the form
which is the correct answer.
6.3.2.2. Atom content
In the case “activation key”
on, there are
atoms in the Direct boxes and
atoms in the Complement boxes with a total of 598 (see
Table 8 and data from
Table 4). In this case, we start from the relation
(see Equ.(30) and below,
). It is now enough to write
, as a Lucas number, for example, and rewrite the above equation in the form
which describes correctly the above atom content numbers. In the case “activation key”
on, there are
atoms in Column 1 and
atoms in Column 2 (see
Table 8, data from
Table 4). Here, we start from Equ.(66) above and use the identity in Equ.(11),
with
(
). We have
By introducing the identity in Equ.(16) with
,
and arranging, we get finally the above correct atom numbers
In the case “activation key”
off, there are
atoms in the Direct boxes and
atoms in the Complement boxes, with a total of 602 atoms (see
Table 8, data from
Table 4). To describe this case, we start by writing Equ.(34) of
Section 4.2.2 as follows
Now we, first, take one copy of the number 61 and write it as
, using the identity
with
(
). Second, we write each of the other three copies of 61 using the recurrence relation
. Inserting these values in Equ.(71), we obtain
which is what we are looking for.
In the case “activation key”
off there are
atoms in Column 1 and
atoms in Column 2 (see
Table 8, data from
Table 4). It is possible to show that this case follows from the preceding one by noticing, as we did in the derivation of Equ.(64) above, that the number
is equal to
(these sequences are linked, see Equ.(16)). By using the recurrence relation
and arranging, we finally obtain the following correct answer
7. More on shCherbak’s Theory
In [
1], we derived the relation
which describes proline’s singularity (see [
3,
4]). In this section, we go much further by presenting completely new results. First, consider, once again, the sequence
, more exactly
. By squaring, we have
It is not difficult to see, from
Table 3, that this number corresponds to the number of nucleons (or integer molecular mass) in the side chains of the amino acids coded by 23 codons, where the sextets are counted twice, and proline has 42 nucleons in its side chain and only 73 nucleons in its backbone, contrary to the other 19 amino acids having 74 nucleons in their backbones (see Equ.(74) above). Second, from the identity
, already considered in the sections above, we can write Equ.(75) as follows, using
twice
We recognize here the unit corresponding to the “singular” nucleon and the 1443 nucleons where proline, now, has 41 nucleons in its side chain and 74 nucleons in its backbone as the 19 other amino acids. Third, we can indeed derive the very molecular mass of proline from the above numbers of nucleons
and
. To see this, we appeal to another tool from number theory,
i.e.,
modular arithmetic which has many applications in mathematics (group theory, knot theory, ring theory) and computer science (computer algebra, coding theory, cryptography, and so on), see for example [
11]. Also, several kinds of moduli are used in applications, as for example modulo 11 in the International Standard Book Number (ISBN) or mod 37 and mod 97 arithmetic in error detection in bank account numbers. We will, here, take as moduli, the integers
and
. (This is equivalent to summing the “digits” in base-100 and base-1000, respectively.) We have
The reader could use, if desired, quick online calculators for the modulo function, for example here [
12]. Using the trick of the digits summation, mentioned above (
and
, we can arrange the above relation as
. In what follows, we will use two functions from elementary number theory, Euler’s ϕ-function of an integer n which counts the number of positive integers less than or equal to n which are relatively prime to n, [
13], and also the φ-function which gives the sum of the divisors of an integer n, [
14]. In the case where the integer is a prime number p, these functions simplify greatly and one has simply
and
. Noting that 43 above is the only odd number out of three (14, 14 and 44) and, what’s more, a prime “digit” (remember we are in base-100), we get by calling its φ-function
, as
. We have also
if we use
. These are the same relations as in Equ.(74) above. The numbers
and
are useful, as explained above but there is also a third number which will play, not only a role together with the other two, but it has also a meaningful interpretation. It is given by the following relation
This number corresponds to the number of nucleons in the side chains of the amino acids encoded by 23 codons (the sextets counted twice) with proline’s side chain having 42 nucleons and four amino acids in their charged state (see
Section 1.2,
Table 3 and above it):
In the first parenthesis, 1 corresponds to the supplementary nucleon in proline’s side chain. In the second parenthesis, 1 corresponds to the charged arginine. In the third parenthesis, the units correspond respectively to lysine (charge +1), aspartic acid (charge -1) and glutamic acid (charge -1). We have therefore three meaningful numbers:
,
and
. From these, we consider the following expression
and take its
-function, the sum of its prime factors (
), see below about this function.
This number is equal to the number of nucleons (or molecular mass) of the
residue of proline (see [
5],
Table 1). When two amino acids (or more) combine to form a peptide, a water molecule (two hydrogen atoms and one oxygen atom) is released and what remains of each amino acid is called a
residue. Here, we have
, which is the molecular mass of the water molecule. Note that we have also, using two of the above numbers, 444 and 445
Both relations give the same result, 97. From Equs.(81-82), we have the two-fold result
Finally, it is also possible to derive the detailed atomic composition of the (whole)
molecule of proline:
. Start from Equ.(81) and then add the quantity
Now,
, as a Fibonacci number, it could be decomposed successively as
and, next, as
. By inserting this decomposition in the above equation and arranging, we have
This is the correct result. The number 60 has the prime factorization $2^2 \times 3 \times 5$ and gives 5 carbon atoms (carbon nucleus: 6 protons, 6 neutrons). The number 14 has the prime factorization $2 \times 7$ and corresponds to one nitrogen atom (nitrogen nucleus: 7 protons, 7 neutrons). The number 32 has the prime factorization $2^5$ and corresponds to two oxygen atoms (oxygen nucleus: 8 protons, 8 neutrons). The last number, 9, corresponds to 9 hydrogen atoms.
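The modular reductions and the final atomic tally discussed above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the moduli 99 and 999 are inferred from the remark that the reductions amount to summing the “digits” in base-100 and base-1000 (they are not legible in this copy of the text), and all names are hypothetical.

```python
def digit_sum(n, base):
    """Sum of the digits of n written in the given base."""
    total = 0
    while n:
        n, digit = divmod(n, base)
        total += digit
    return total


# Assumed moduli: base - 1 for base-100 and base-1000 digit sums.
MODULI = {100: 99, 1000: 999}

# Reduce the two nucleon numbers 1443 and 1444 modulo each assumed modulus.
reductions = {(n, m): n % m for n in (1443, 1444) for m in MODULI.values()}

# Whole proline molecule as tallied in the text: 5 C, 9 H, 1 N, 2 O.
NUCLEONS = {"C": 12, "H": 1, "N": 14, "O": 16}
PROLINE = {"C": 5, "H": 9, "N": 1, "O": 2}
proline_mass = sum(NUCLEONS[element] * count for element, count in PROLINE.items())

WATER_MASS = 2 * NUCLEONS["H"] + NUCLEONS["O"]  # water released in peptide bonding
residue_mass = proline_mass - WATER_MASS

print(reductions)
print(proline_mass, WATER_MASS, residue_mass)
```

Under these assumptions the reductions modulo 999 give the numbers 444 and 445 used in the text, each reduction agrees with the corresponding digit sum, and the residue mass comes out as 97.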
In order to fully understand the reasoning presented below, it is important for the reader to keep in mind that, when looking at Equations 77 and 80, 1443 represents the number of nucleons in the side chains of the amino acids coded by 23 codons with the sextets counted twice and proline having 41 nucleons in its side chain, while 1444 represents the number of nucleons in the side chains of the amino acids coded by 23 codons with the sextets counted twice and proline now having 42 nucleons in its side chain. In fact, it appears that there is compelling evidence that the calculations performed here are "locked" technically. Below, we will show why but, before doing that, let us recall, briefly, a few elements of our so helpful arithmetic function
(see Appendix B in [
1]). From the Fundamental Theorem of Arithmetic, an integer n can be represented, uniquely, as a product of prime numbers irrespective of their order:
. The function
is defined by the formula
where
is the sum of the prime factors (including the multiplicities)
,
is the sum of the Prime Indices of the prime factors (including the multiplicities)
and
, so-called Big Omega function, is the number of the prime factors
. The portion
of this function was already involved above in the derivation of Equ.(81).
Now, let us look at the moduli
and
which were,
together with the numbers
and
,
critical in the derivation of Equs. (77), (80) and (82). Their prime factorization is given by
and
. We have
and
and, therefore,
. This is nothing but,
again, the integer molecular mass of proline’s residue, see Equs.(81)-(82). Also, by isolating the two terms
and
, in
, and including them in
, we get
. This is a more accurate description of proline’s
residue (see [
5],
Table 1), which could also be seen from Equ.(81) above, remembering that 89 is a Fibonacci number,
By pushing the precision to the extreme, we can arrange the side chain part as follows
, where we have made explicit the portions of
. We have 3 carbon atoms (atomic mass 12) and 6 hydrogen atoms, see the side chain in
Figure 1 below. Observe the last term, interpreted as 6 hydrogen atoms in the side chain, (
), with one hydrogen atom being susceptible to be “transferred” from the side chain to the backbone (shCherbak’s “borrowing”, see above and
Table 3). Of course, one has to add
, from Equ.(83), the water molecule, to get the whole molecule of proline. Below, in
Figure 1, we show it with the side chain boxed.
The unique charm and covert attraction of proline's structure are concealed inside the integer molecular masses, just waiting to be gently revealed through the use of modular arithmetic.
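The arithmetic function recalled in this section can be sketched in Python as follows. Two points are assumptions rather than statements from the text: that the function is the plain sum of its three parts (sum of prime factors, sum of prime indices, and Big Omega), since the defining formula is not legible in this copy, and that the two moduli are 99 and 999, as suggested by the base-100 and base-1000 digit-sum remark. All function names are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def prime_index(p):
    """Position of the prime p in the list 2, 3, 5, 7, ... (2 has index 1)."""
    return sum(1 for q in range(2, p + 1) if len(prime_factors(q)) == 1)


def components(n):
    """Return (sum of prime factors, sum of prime indices, number of prime factors)."""
    factors = prime_factors(n)
    return sum(factors), sum(prime_index(p) for p in factors), len(factors)


def combined(n):
    """Assumed form of the function: the sum of the three components."""
    return sum(components(n))


# Assumed moduli 99 = 3^2 * 11 and 999 = 3^3 * 37.
values = {m: combined(m) for m in (99, 999)}
print(components(99), components(999), values, sum(values.values()))
```

Under both assumptions the two values add up to 97, the integer molecular mass of proline's residue quoted in the text, which lends some support to this reading.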
8. Multiplet structures
This section deals with another application of our Fibonacci-like sequences, more precisely, the sequences
and
. In [
15], we have derived the exact multiplet structure of the genetic code, starting from the total number of codons, 64, expressed from the beginning, as
and using Fibonacci/Lucas decompositions. We subsequently used either a property of “superperfect” numbers or the relation between Fibonacci and Lucas numbers to write one factor 8 as
and next 7 as 3+4 to derive the above-mentioned multiplet structure. Here, we show that all the ingredients of this derivation are, in fact, already
ostensibly embedded in our Fibonacci-like sequences. Take
(see
Table 4). First, there is the recurrence relation
. This is the decomposition of the number 8 mentioned above, obtained here without recourse to “superperfect” numbers, for example. Next, from the Lucas sequence in Equ.(4),
, which is derived from the Fibonacci sequence
in Equ.(3), itself derived from the sequences
and
in Equ.(2), we have
. This is all we need to write
which leads, after writing the Fibonacci number 8 as
, to the following multiplet structure of the (standard) genetic code which could be expressed in two equivalent forms, Equ.(87) and Equ.(88)
The form in Equ.(87) describes Rumer’s division (see
Section 4): 5 quartets (4 codons each) and 3 quartet-parts of the 3 sextets (4 codons each), in the first parenthesis (set
), and 9 doublets (2 codons each), 3 doublet-parts of the 3 sextets (2 codons each), 1 triplet (3 codons), 2 singlets (1 codon each) and 3 stops (3 codons), in the second parenthesis (set
). The form in Equ.(88) describes, for its part, the usual multiplet structure: 5 quartets, 3 sextets (6 codons each,
), 9 doublets, 1 triplet, 2 singlets and 3 stops. The vertebrate mitochondrial genetic code could also be easily derived from Equ.(88), see [
1]. In fact, in unpublished notes, we have also derived from Equ.(86), with a little work, the multiplet structures of several other (non-standard) genetic codes. Let us give, here, only one example: the
Alternative Yeast Nuclear Code (#12 in the database [
16]). In this code, shown in
Table 9 below, the only change concerns the reassignment of the codon CUG of leucine which now codes for serine. We have therefore 5 quartets (V, A, T, P, G), 1 sextet (R), 1
quintet (L, UUR, CUY, CUA), 1
septet (S, UCN, AGY, CUG), 9 doublets (F, Y, C, H, Q, D, E, N, K), 1 triplet (I), 2 singlets (M, W) and 3 stops. To describe this code, let us start from Equ.(88) and rewrite it in the form
by selecting a factor
and developing it as
. Now, we write the Fibonacci number 8 as
and insert it in Equ.(88). We have, writing again
This relation describes this code. Arginine, the term , is now the only sextet left. The term is suitable for the quintet leucine coded now by five codons CUA (1 codon), CUY (2 codons), UUR (2 codons). The term describes the septet serine coded now by seven codons UCN (4 codons), AGY (2 codons) and CUG (1 codon). The remaining terms are the usual ones (see above). The case of the other non-standard genetic codes could be handled along the same lines with, of course, some additional work.
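The multiplet counts quoted in this section, for both the standard code and the Alternative Yeast Nuclear Code, can be recovered by simple counting. The Python sketch below is illustrative only: the names are hypothetical, the 64-letter string is the standard code in U, C, A, G ordering, and the yeast variant is obtained by applying the single reassignment CUG to serine described above.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
STANDARD = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def code_table(amino_acids):
    """Map each codon to its one-letter amino acid (or '*' for stop)."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), amino_acids)}


def multiplets(table):
    """Return ({degeneracy: number of amino acids}, number of stop codons)."""
    codons_per_aa = Counter(aa for aa in table.values() if aa != "*")
    stops = sum(1 for aa in table.values() if aa == "*")
    return dict(Counter(codons_per_aa.values())), stops


standard = code_table(STANDARD)
yeast = dict(standard, CUG="S")  # only change: CUG now codes for serine

std_classes, std_stops = multiplets(standard)
yeast_classes, yeast_stops = multiplets(yeast)

# Rumer's division of the standard code into two sets of 32 codons.
rumer_first = 5 * 4 + 3 * 4
rumer_second = 9 * 2 + 3 * 2 + 1 * 3 + 2 * 1 + 3

print(std_classes, std_stops)
print(yeast_classes, yeast_stops)
print(rumer_first, rumer_second)
```

The standard table yields 5 quartets, 3 sextets, 9 doublets, 1 triplet, 2 singlets and 3 stops, and the reassignment leaves one sextet (arginine) while producing one quintet (leucine) and one septet (serine), as stated in the text.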