1. Introduction
It is quite difficult to answer the question “Which came first, the chicken or the egg?” The reason is that a hen is indispensable for an egg to be laid and, conversely, an egg is indispensable for a hen to be born. Therefore, the “chicken-egg relationship” is often cited as a representative example of a problem that is quite difficult, or perhaps even impossible, to solve because of its circular argument.
1.1. “Chicken and egg relationship” between gene and protein
Gene and protein have a “chicken and egg relationship” similar to that of the genuine chicken and egg (Figure 1) [1-4]. Needless to say, the genes and proteins referred to simply as such in this article are the visible ones, which are stored in various gene and protein databases. In addition, as is well known, neither a gene nor a protein can be synthesized by random joining of nucleotides or amino acids, respectively [5]. In other words, “gene” and “protein” here mean a mature gene and a mature protein, respectively. Every mature protein is synthesized through expression of the corresponding gene. Therefore, no protein can be produced in the absence of a gene. On the other hand, at least some proteins are always required to express genetic information. Therefore, the genetic information carried by a gene cannot be expressed in the absence of protein. How was the “chicken-egg relationship” between gene and protein formed? The purpose of this article is to answer this question.
1.2. Definitions of mature gene/protein and immature gene/protein
The word “gene” used above naturally means a gene encoding a mature protein. However, immature genes encoding an immature protein must be assumed in order to clarify the formation process of mature genes and to give a correct answer to the question “which came first, gene or protein?”. Furthermore, there are large differences among the types of immature proteins, as described below. Therefore, in addition to the abbreviations used in this article (Table 1), the definitions of genes and proteins are summarized in Table 2 to make the content of this article easier to understand and to highlight the key points about mature genes, immature genes and so on.
A mature gene is a gene that encodes a mature protein and is formed by maturation of an immature gene. A mature protein is a protein that is encoded by a gene and is frequently described as a precision polymer machine. Note that maturation of an immature gene was always led by enhancement of a weak activity on the surface of an immature protein [5]. Usually, only the base sequences of mature genes and the amino acid sequences of mature proteins are stored in the various databases.
There are two types of immature genes [5]. Immature gene 1 is the first double-stranded (ds)-RNA, which was formed by random joining of GNC anticodons carried by anticodon stem-loop (AntiC-SL) tRNAs. Therefore, an immature gene 1 encodes an immature [GADV]-protein with a random [GADV]-amino acid sequence (Table 2).
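The relation between a random (GNC)n sequence and a random [GADV]-amino acid sequence can be illustrated with a minimal sketch. It assumes only that the four GNC codons carry their standard genetic code assignments (GGC = Gly, GCC = Ala, GAC = Asp, GUC = Val); the function names, the strand length and the random seed are hypothetical and chosen for illustration, not taken from the original work.

```python
import random

# GNC primeval code: the four G-N-C codons and the amino acids they
# specify in the standard genetic code (RNA alphabet, one-letter symbols).
GNC_CODE = {"GGC": "G", "GCC": "A", "GAC": "D", "GUC": "V"}


def random_gnc_rna(n_codons, rng):
    """Join n_codons randomly chosen GNC codons into one RNA strand."""
    codons = sorted(GNC_CODE)
    return "".join(rng.choice(codons) for _ in range(n_codons))


def translate_gnc(rna):
    """Translate a (GNC)n strand codon by codon into [GADV]-amino acids."""
    if len(rna) % 3 != 0:
        raise ValueError("strand length must be a multiple of 3")
    return "".join(GNC_CODE[rna[i:i + 3]] for i in range(0, len(rna), 3))


rng = random.Random(0)  # hypothetical seed, used only for reproducibility
strand = random_gnc_rna(10, rng)
peptide = translate_gnc(strand)
print(strand)   # a random (GNC)10 strand
print(peptide)  # the random [GADV]-amino acid sequence it encodes
```

Whatever codons are drawn, the translated sequence contains only G, A, D and V, which is the sense in which a randomly joined (GNC)n strand encodes a random [GADV]-amino acid sequence.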
The second type, immature gene 2, comprises three types of GC-NSF(a). GC-NSF(a) 2-1: a GC-rich (GNC)n-NSF(a) encoding an amino acid sequence composed of the four [GADV]-amino acids in the GNC code era. GC-NSF(a) 2-2: a GC-rich (SNS)n-NSF(a) encoding an amino acid sequence composed of 10 types of amino acids in the SNS code era. GC-NSF(a) 2-3: a nonstop frame on the antisense strand of GC-rich (NNN)n genes (GC-NSF(a)), which was first recognized as a field for the formation of entirely new genes in the universal genetic code era [6] (Table 2).
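The notion of a nonstop frame on the antisense strand can be made concrete with a short sketch. It assumes DNA input in the standard A/C/G/T alphabet and the three standard stop codons; the toy gene and all function names are hypothetical illustrations, not data or methods from reference [6].

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(dna):
    """Return the antisense strand (reverse complement), read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def nonstop_frames(dna):
    """Return the reading frames (0, 1, 2) that contain no stop codon."""
    frames = []
    for frame in range(3):
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        if codons and not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def gc_content(dna):
    """Fraction of G and C bases in the strand."""
    return (dna.count("G") + dna.count("C")) / len(dna)


# Hypothetical GC-rich toy gene (sense strand) ending in a stop codon.
sense = "ATGGCCGACGTCGGCGCCTGA"
anti = antisense(sense)
print(anti)
print(round(gc_content(sense), 2))
print(nonstop_frames(anti))  # frames of the antisense strand without stops
```

A frame returned by `nonstop_frames(anti)` is a candidate GC-NSF(a) in the sense used here: an antisense reading frame that could be expressed from end to end without meeting a stop codon.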
There are three types of immature proteins [5]. Immature protein 0: an immature [GADV]-protein formed as peptide aggregates, which were produced by direct random joining of [GADV]-amino acids or by random joining with activated [GADV]-amino acids. Immature protein 1: an immature [GADV]-protein produced by expression of one strand of the first ds-(GNC)n RNA. Immature protein 2: an immature protein produced by expression of one of the three types of GC-NSF(a), namely GC-NSF(a) 2-1, GC-NSF(a) 2-2 or GC-NSF(a) 2-3 (Table 2).
Naturally, immature protein 0, which was produced in the absence of ds-RNA, could not mature into a mature [GADV]-protein. In contrast, immature proteins 1 and 2, which were encoded by one strand of a ds-RNA and by a GC-NSF(a) of a ds-RNA/DNA gene, respectively, could evolve into mature proteins. Note that all three types of immature proteins, 0, 1 and 2, were produced under the respective protein 0th-order structures [5,7].
1.3. The “chicken and egg relationship” between gene and protein in RNA world hypothesis
Needless to say, elucidating the formation process of the genetic system composed of gene and protein holds one important key to solving the mystery of the origin of life. However, the “chicken-egg relationship” has been a major obstacle to solving that mystery. Just at that time, ribozymes, that is, RNAs that have catalytic activity like an enzyme and a chemical structure similar to DNA, were discovered [8,9]. These discoveries triggered the proposal of the RNA world hypothesis on the origin of life [1]. It was therefore considered that the mystery of the origin of life might be solved if RNA itself could synthesize RNA, that is, if RNA could self-replicate. In fact, many researchers in the field of the origin of life have conducted their studies on the basis of the RNA world hypothesis. However, before long it came to be understood that the RNA world hypothesis has many fatal flaws [5,10-12]. For example,
In fact, the mystery of the origin of life has not been solved thus far for the above reasons.
This means that the investigation of the origin of life must be started over from scratch if the mystery cannot be solved by the RNA world hypothesis. In parallel, it was found that the questions “how was the genetic system composed of gene and protein formed under random processes on the primitive Earth?” and “how did life emerge on the primitive Earth?” must be reconsidered from the beginning.
1.4. The reasons why the “chicken and egg relationship” between gene and protein could not be solved thus far
Consider here what makes it difficult to solve the “chicken-egg relationship” between gene and protein, and what must be clarified in order to understand the formation process of the relationship.
Those would be the reasons why it has not been possible to answer the question “which came first, gene or protein?”.
1.5. The “chicken-egg relationship” between gene and protein was established upon formation of a mature gene
The formation process of the “chicken-egg relationship” should become clear if the formation process of mature genes, which encode mature proteins with elaborate structures, can be understood, because the former must be directly related to how the respective genes acquired the amino acid sequence information of a mature protein. Thus, we noticed that the formation process of the “chicken-egg relationship” between genes and proteins can be reasonably explained through the formation process of mature genes.
To answer the question about the “chicken and egg relationship”, it is necessary to understand the formation processes of new mature genes, which are classified into the following three.
(1) How was the first gene generated?
(2) How were entirely new genes other than the first gene, which have no meaningful homology with any previously existing gene, formed?
(3) How were new genes having meaningful homology with previously existing genes formed?
The formation processes of these three types of new genes are explained in order in the following sections. Note that entirely new genes, including the first gene, were formed by maturation of an immature gene through one of the formation processes 1 and 2 above. In contrast, new homologous genes encoding a mature protein were formed by transition from an original gene without using an immature gene. The “chicken-egg relationship” between gene and protein was established at the moment each mature gene was completed.
2. How did the first gene acquire the amino acid sequence information of a protein?
2.1. The key points for the formation of the first (GNC)n gene
The formation process of the first gene is explained using a schematic drawing of the whole process from chemical evolution (step 1) to the emergence of life (step 8) via formation of the first gene (step 7) (Figure 2) [5]. The first gene encoding a mature protein was generated through a process running from chemical evolution through [GADV]-microsphere formation, formation of a primeval metabolic system and formation of AntiC-SL tRNA (prototype tRNA) to establishment of the GNC code. The process proceeded owing to the various functions expressed by immature [GADV]-proteins, which were produced by random joining of [GADV]-amino acids.
What are the key points for the formation of the first (GNC)n gene shown in Figure 2? (1) A water-soluble globular immature [GADV]-protein 0 could be produced even by random joining of [GADV]-amino acids owing to the protein 0th-order structure, that is, a [GADV]-amino acid composition that satisfies the four conditions (hydrophobicity/hydrophilicity, α-helix, β-sheet and turn/coil formabilities) for water-soluble globular protein formation [16-18]. (2) Immature protein 1 was synthesized through transcription of ds-RNA and translation of the transcript. Note that immature [GADV]-proteins produced by expression of random (GNC)n codon sequences are substantially the same as immature [GADV]-proteins produced by direct random joining of [GADV]-amino acids. Therefore, both immature [GADV]-proteins 0 and 1 could be folded into their respective globular structures in water with high probability [16-18]. In other words, the first gene was generated through maturation of an immature protein synthesized under a primeval transcription/translation system that included many catalytic functions of immature [GADV]-proteins produced in the absence of any genetic function [5].
Although both immature protein 0 and immature protein 1 could be folded into immature and flexible but water-soluble globular structures [5], there is a crucial difference between them: in protein 0 the amino acids are merely arranged at random from the protein 0th-order structure, whereas an immature protein 1 was produced on the basis of ds-RNA (Table 2). Therefore, protein 0 had no possibility of evolving into a mature protein, whereas protein 1 could evolve into a mature protein by using the ability of ds-RNA to memorize base changes. Thus, life could emerge about 4 billion years ago owing to the indirect function of protein 0 and the direct function of protein 1 (Figure 2).
2.2. Evolutionary process of synthesis of immature protein 0
Next, some of the processes leading to formation of the first gene are explained in more detail. Only immature protein 0 could be synthesized by direct random joining of [GADV]-amino acids and by random joining of [GADV]-amino acids with activated amino acids, such as [GADV]-AMPs, [GADV]-3’ACC5’ and [GADV]-amino acids carried by nonspecific 3’ACC5’-AntiC-SL RNAs. Such use of activated [GADV]-amino acids resulted from [GADV]-microspheres searching for mechanisms that produce immature [GADV]-proteins with a higher [GADV]-amino acid content and a higher catalytic activity than before [5].
2.3. From formation of single-stranded (GNC)n RNA to generation of the first (GNC)n gene
Subsequently, essentially random polymerization of [GADV]-amino acids was also carried out to produce an immature [GADV]-protein 1. That is, the immature [GADV]-protein 1 was synthesized using ds-(GNC)n RNA, which was formed by complementary strand synthesis on ss-(GNC)n RNA, under a prototype transcription-translation system with immature [GADV]-proteins and nonspecific AntiC-SL tRNAs (Figure 3) [5].
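One property of ds-(GNC)n RNA that follows directly from base pairing can be checked with a small sketch: the strand complementary to a (GNC)n strand, read 5' to 3', is again a (GNC)n strand, so either strand of the duplex can be read with GNC codons. The sketch assumes standard Watson-Crick pairing and the standard assignments of the four GNC codons; the example strand and function names are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
GNC_CODE = {"GGC": "G", "GCC": "A", "GAC": "D", "GUC": "V"}


def complementary_strand(rna):
    """Return the complementary RNA strand, written 5' to 3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def is_gnc_repeat(rna):
    """True if every codon of the strand has the form G-N-C."""
    return len(rna) % 3 == 0 and all(
        rna[i] == "G" and rna[i + 2] == "C" for i in range(0, len(rna), 3)
    )


def translate_gnc(rna):
    """Translate a (GNC)n strand into one-letter [GADV]-amino acids."""
    return "".join(GNC_CODE[rna[i:i + 3]] for i in range(0, len(rna), 3))


ss_rna = "GGCGCCGACGUC"               # hypothetical ss-(GNC)4 strand
partner = complementary_strand(ss_rna)

print(partner)                         # GACGUCGGCGCC
print(is_gnc_repeat(partner))          # True
print(translate_gnc(ss_rna), translate_gnc(partner))  # GADV DVGA
```

The reason is that the complement of the terminal C of each codon is G and the complement of the initial G is C, so reversing the complement restores the G-N-C pattern in the same frame.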
Evolution of a weak catalytic activity, which appeared on the surface of the immature [GADV]-protein 1, into the higher catalytic activity of a mature [GADV]-protein generated the first mature gene (Figure 3). Thus, the first gene encoding a mature protein was generated owing to many immature [GADV]-proteins 0, which were produced by random joining of [GADV]-amino acids, independently of mature genes.
Of course, the “chicken and egg relationship” is observed not only between the first gene and the protein produced by its expression, but also between every mature gene and its corresponding mature protein, including modern genes and modern proteins. The following sections therefore explain how entirely new genes and homologous genes were formed, so that the reason why the “chicken and egg relationship” is observed between all pairs of mature genes and the mature proteins produced by their expression can be understood.
2.4. Formation process of entirely new genes after generation of the first gene
Three types of GC-NSF(a), 2-1, 2-2 and 2-3, were used to generate entirely new genes in the respective genetic code eras (Table 2): (1) ds-(GNC)n genes in the GNC primeval genetic code era; (2) ds-(SNS)n genes in the SNS primitive genetic code era; and (3) GC-rich genes having a codon sequence similar to the (SNS)n sequence in the universal (standard) genetic code era [5,16-18]. The immature proteins 2, which were produced by expression of the antisense sequences of the respective three types of GC-rich genes, could be folded into water-soluble globular structures with high probability, because they could satisfy the four or six conditions for water-soluble globular protein formation [5,16-18]. Therefore, the mature proteins formed through maturation of the respective immature proteins are entirely new [GADV]-proteins in the GNC code era, entirely new SNS-coding proteins in the SNS code era and entirely new proteins encoded by GC-rich genes in the universal genetic code era, respectively (Figure 4) [5].
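The relation between the GNC and SNS codes can be tabulated with a brief sketch. It assumes that S stands for G or C and N for any base, and that each codon carries its standard genetic code assignment; the variable names are hypothetical. Under these assumptions the 16 SNS codons specify 10 amino acids, and the four GNC codons form a subset specifying G, A, D and V.

```python
from itertools import product

# Standard genetic code assignments of the 16 S-N-S codons (S = G or C).
SNS_CODE = {
    "GGC": "G", "GGG": "G", "GCC": "A", "GCG": "A",
    "GAC": "D", "GAG": "E", "GUC": "V", "GUG": "V",
    "CUC": "L", "CUG": "L", "CCC": "P", "CCG": "P",
    "CAC": "H", "CAG": "Q", "CGC": "R", "CGG": "R",
}

# Enumerate every S-N-S codon and confirm the table covers exactly these.
sns_codons = ["".join(bases) for bases in product("GC", "ACGU", "GC")]
assert set(sns_codons) == set(SNS_CODE)

sns_amino_acids = sorted(set(SNS_CODE.values()))
gnc_amino_acids = sorted(
    aa for codon, aa in SNS_CODE.items() if codon[0] == "G" and codon[2] == "C"
)

print(len(sns_codons), "codons ->", len(sns_amino_acids), "amino acids")
print("SNS code:", "".join(sns_amino_acids))
print("GNC subset:", "".join(gnc_amino_acids))
```

No stop codon appears among the SNS codons, since all three standard stop codons begin with U; a sequence written entirely in SNS codons is therefore a nonstop frame by construction.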
Thus, many entirely new genes were generated through maturation of an immature protein produced by expression of immature genetic information, which was written in a nonstop frame on the antisense strand of the three types of GC-rich genes (pan-GC-NSF(a)) (Figure 4) [5]. Naturally, all of these maturation processes are essentially the same as in the formation of the first gene, except for the use of ds-RNA or of a GC-NSF(a) (Figure 3). This means that the “chicken and egg relationship” between a gene and the corresponding protein was always established at the moment when a gene encoding a mature protein was completed by maturation of a ds-RNA or a GC-NSF(a) encoding an immature protein (Figure 4). All entirely new genes were formed owing to immature genes encoding an entirely new immature protein, although, naturally, the first gene was generated in the absence of any gene [5]. Thus, immature proteins played the leading role in generating the two types of entirely new genes: the first gene, and all entirely new genes formed after generation of the first gene (Figure 3 and Figure 4). The propensity to evolve gradually from an immature protein with a weak catalytic activity to a mature protein with a high activity, by accumulating the necessary mutations, was the motive force that led to the formation of all types of entirely new genes [5].
2.5. Formation process of homologous genes
It should be noted that the process generating a homologous gene by accumulating the necessary mutations in the sense codon sequence after gene duplication [19] is a transition from an original gene encoding a mature protein to a new gene encoding another mature protein homologous to the original protein (Figure 5). That is, the process is a simple transition from one mature protein to another, although the intermediates appearing during the transition might pass through a kind of immature state with some flexibility in order to adjust the original catalytic site to a new substrate.
3. Discussion
This article has described how the formation process of the “chicken-egg relationship” between gene and protein can be reasonably explained from the standpoint of the GADV hypothesis. That is, according to the GADV hypothesis, the first gene encoding a mature [GADV]-protein could be generated through maturation from an immature [GADV]-protein 1, which was produced by expression of one strand of ds-(GNC)n RNA (Figure 3). This means that the question “which arose first, gene or protein?” has remained an unanswered question. The formation process of the “chicken-egg relationship” could never be understood as long as the question is taken at face value.
3.1. The “chicken-egg relationship” of gene/protein was formed by maturation process of an immature protein
The first “chicken-egg relationship” was established at the very moment when the first gene encoding a mature protein was formed owing to an immature protein 1 (Figure 3). However, the “chicken-egg relationship” is not observed between an immature gene and the corresponding immature protein (Figure 6).
All other new genes encoding a mature protein with a high catalytic activity were formed either through maturation of an immature gene, starting from one of the three types of GC-NSF(a) 2 ((GNC)n(a) 2-1, (SNS)n(a) 2-2 and ordinary GC-NSF(a) 2-3) encoding an immature protein 2 (Figure 3 and Figure 4), or by transition from a parental mature gene to a daughter mature gene (Figure 5). Therefore, the “chicken-egg relationship” always arose at the time when a gene encoding a mature protein was formed. In this way, various genes encoding mature proteins were generated, and life emerged once the genes necessary for life had been acquired. The wonderful present Earth, which is inhabited by diverse organisms, has been formed as those organisms accumulated many diverse genes through the mechanisms that generate mature genes. Consequently, many mature genes and mature proteins that have the “chicken-egg relationship” have been formed, and only the data of such genes and proteins are stored in the present gene/protein databases. Conversely, the intermediate genes/proteins of a maturation process generally cannot be seen (Figure 6), although it might become possible to see even those intermediates through future investigation of modern gene and protein databases. This invisible process, starting from an immature gene and proceeding to a mature gene, has made it difficult thus far to explain the formation process of the “chicken-egg relationship” (Figure 6). The reason why it has been considered quite difficult, or perhaps impossible, to explain the formation process of the “chicken-egg relationship” between gene and protein is that the maturation process from an immature gene to a mature gene has gone unnoticed. The GADV hypothesis has succeeded in visualizing this invisible process and in explaining the formation process of the “chicken-egg relationship”. This would also support the validity of the GADV hypothesis [5,16,17].
3.2. The “chicken-egg relationship” observed among the 6 members constituting the fundamental life system
In fact, the “chicken-egg relationship” can be observed in all combinations among the 6 members used in extant organisms (Table 3). The reason why it can be observed in all the combinations is explained below; to simplify the discussion, only the relationships between gene and the four members other than protein, and between protein and the four members other than gene, are shown here.
3.3. The “chicken-egg relationships” between gene and other four members except protein
Gene and genetic code: The genetic code is meaningless and useless in the absence of genes. Genetic information cannot be written in an RNA or DNA strand in the absence of the genetic code. This means that a gene can never be formed without the genetic code.
Gene and tRNA: The genetic information in a gene cannot be expressed in the absence of tRNA. On the other hand, tRNA for translation of genetic information is useless in the absence of genes.
Gene and metabolism: The carrier of genetic information, RNA or DNA, cannot be produced in the absence of metabolism, in which nucleotides are synthesized. A metabolic system cannot be formed in the absence of genes, because biocatalysts, that is, mature enzymes, cannot be synthesized in the absence of genes.
Gene and cell structure: Cell structure cannot be constructed in the absence of genes, because membrane proteins cannot be produced without genes. Even if genes could be formed, they would be dispersed in the absence of a cell membrane.
3.4. The “chicken-egg relationships” between protein and other four members except gene
Protein and genetic code: Protein cannot be synthesized without the genetic code, because genetic information cannot be translated without it. The genetic code, which mediates between genetic information and protein, is useless in a world without protein.
Protein and tRNA: tRNA cannot be produced without protein (enzyme). Protein cannot be synthesized without tRNA, because the genetic information in a gene cannot be translated into an amino acid sequence in the absence of tRNA.
Protein and metabolism: A metabolic system cannot be driven without protein (enzyme). Protein cannot be produced in the absence of metabolism, because the amino acids of which protein is composed are supplied through the metabolic system.
Protein and cell structure: Cell structure cannot be constructed in the absence of protein, because membranes that use proteins cannot be produced without proteins. Even if proteins could be synthesized, they would be dispersed without cell structure.
It must be emphasized here that the “chicken-egg relationships” observed among all combinations of the 6 members can be explained on the basis of the “chicken-egg relationship” observed between mature genes and mature proteins.
References
- Gilbert, W. The RNA world. Nature 1986, 319, 618.
- Robertson, M.P.; Joyce, G.F. The origins of the RNA world. Cold Spring Harb. Perspect. Biol. 2012, 4, a003608.
- Sankaran, N. How the discovery of ribozymes cast RNA in the roles of both chicken and egg in origin-of-life theories. Stud. Hist. Philos. Biol. Biomed. Sci. 2012, 43, 741–750.
- Sankaran, N. The RNA World at Thirty: A Look Back with its Author. J. Mol. Evol. 2016, 83, 169–175.
- Ikehara, K. Towards Revealing the Origin of Life: Presenting the GADV Hypothesis; Springer Nature: Cham, Switzerland, 2021.
- Ikehara, K.; Amada, F.; Yoshida, S.; Mikata, Y.; Tanaka, A. A possible origin of newly-born bacterial genes: Significance of GC-rich nonstop frame on antisense strand. Nucleic Acids Res. 1996, 24, 4249–4255.
- Ikehara, K. Protein ordered sequences are formed by random joining of amino acids in protein 0th-order structure, followed by evolutionary process. Orig. Life Evol. Biosph. 2014, 44, 279–281.
- Kruger, K.; Grabowski, P.J.; Zaug, A.J.; Sands, J.; Gottschling, D.E.; Cech, T.R. Self-splicing RNA: autoexcision and autocyclization of ribosomal RNA intervening sequence of Tetrahymena. Cell 1982, 31, 147–157.
- Guerrier-Takada, C.; Gardiner, K.; Marsh, T.; Pace, N.; Altman, S. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 1983, 35, 849–857.
- Shapiro, R. A replicator was not involved in the origin of life. IUBMB Life 2000, 49, 173–176.
- Luisi, P.L. An open question on the origin of life: the first forms of metabolism. Chem. Biodivers. 2012, 9, 2635–2647.
- Ikehara, K. Life Emerged from [GADV]-Protein World, but Not from RNA World!? Preprints.
- Joyce, G.F.; Szostak, J.W. Protocells and RNA Self-Replication. Cold Spring Harb. Perspect. Biol. 2018, 10, a034801.
- Salditt, A.; Karr, L.; Salibi, E.; Le Vay, K.; Braun, D.; Mutschler, H. Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nat. Commun. 2023, 14, 1495.
- Dill, K.A. Dominant forces in protein folding. Biochemistry 1990, 29, 7133–7155.
- Ikehara, K.; Omori, Y.; Arai, R.; Hirose, A. A novel theory on the origin of the genetic code: a GNC-SNS hypothesis. J. Mol. Evol. 2002, 54, 530–538.
- Ikehara, K. Origins of gene, genetic code, protein and life: Comprehensive view of life system from a GNC-SNS primitive genetic code hypothesis. J. Biosci. 2002, 27, 165–186.
- Ikehara, K. Possible steps to the emergence of life: The [GADV]-protein world hypothesis. Chem. Rec. 2005, 5, 107–118.
- Ohno, S. Evolution by Gene Duplication; Springer: Heidelberg, Germany, 1970.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).