Abstract
Information is the currency of life, but the origin of prebiotic information remains a mystery. We propose transitional pathways from the cosmic building blocks of life to the complex prebiotic organic chemistry that led to the origin of information systems. The prebiotic information system, specifically the genetic code, is segregated, linear, and digital, and probably appeared during biogenesis four billion years ago. In the peptide/RNA world, lipid membranes randomly encapsulated amino acids, RNA, and protein molecules drawn from the prebiotic soup, initiating a molecular symbiosis inside the protocells. This endosymbiosis led to the hierarchical emergence of several requisite components of the translation machine: tRNAs, aminoacyl-tRNA synthetases (aaRS), mRNAs, and ribosomes. When assembled in the right order, the translation machine created biosynthetic polypeptides, a process that transferred information from mRNAs to proteins. This was the beginning of the prebiotic information age. The molecular attraction between tRNA and amino acids drove successive stages of the translation machines and the genetic code. tRNA is an ancient molecule that designed and built mRNA for storing the information of its cognate amino acid. Each mRNA strand became the storage device for the genetic information that encoded amino acid sequences in triplet nucleotides. As information appeared in the digital language of codons within mRNA, and the genetic code for protein synthesis evolved, prebiotic chemistry became more organized and directional. The origin of the genetic code is enigmatic; herein we propose an evolutionary explanation: the demand for a wide range of specific enzymes in the peptide/RNA world was the main selective pressure for the origin of information-directed protein synthesis. We review three main concepts on the origin and evolution of the genetic code: the stereochemical theory, the coevolution theory, and the adaptive theory.
These three theories are compatible with our coevolution model of the translation machines and the genetic code. We suggest biosynthetic pathways as the origin of the specific translation machines, which provided the framework for the origin of the genetic code. During translation, the genetic code developed in three stages coincident with the refinement of the translation machines: the GNC code developed with the pre-tRNA/pre-aaRS/pre-mRNA machine, the SNS code with the tRNA/aaRS/mRNA machine, and finally the universal genetic code with the tRNA/aaRS/mRNA/ribosome machine. Our hypothesis provides logical and incremental steps for the origin of programmed protein synthesis. To understand the prebiotic information system better, we converted letter codons into numerical codons in the Universal Genetic Code Table. We developed software called CATI (Codon-Amino Acid-Translator-Imitator) to translate randomly chosen numerical codons into their corresponding amino acids and vice versa. This conversion has given us insight into how translation might have worked in the peptide/RNA world. Numerical codons have great potential for applications in bioinformatics, such as barcoding, DNA mining, and DNA fingerprinting. Using the Model-View-Controller (MVC) software framework, we constructed, step by step, the likely biochemical pathways for the origin of translation, the genetic code, and the translation machinery. Using AnyLogic software, we were able to simulate and visualize the entire evolution of the translation machines and the genetic code. The results indicate that the emergence of the information age from the peptide/RNA world was a watershed event in the origin of life about four billion years ago.
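
To make the idea of numerical codons concrete, the following is a minimal Python sketch of a two-way translation between numerical codons and amino acids, together with the amino acid sets reachable under the GNC and SNS codon patterns. It is not the CATI software. The digit assignment (U=0, C=1, A=2, G=3) and all function names are assumptions introduced here for illustration, since the abstract does not specify the numbering scheme CATI uses. The codon table is the standard (universal) genetic code, with S denoting G or C and N denoting any base.

```python
from itertools import product

# Assumed digit assignment (hypothetical; the scheme used by CATI is not given).
BASE_TO_DIGIT = {"U": "0", "C": "1", "A": "2", "G": "3"}
DIGIT_TO_BASE = {d: b for b, d in BASE_TO_DIGIT.items()}

# Standard genetic code with codons ordered U, C, A, G at each position.
# One-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Nucleotide ambiguity codes needed for the GNC and SNS patterns.
IUPAC = {"U": "U", "C": "C", "A": "A", "G": "G", "S": "GC", "N": "UCAG"}


def to_numeric(codon):
    """Convert a letter codon such as 'AUG' to a numerical codon such as '203'."""
    rna = codon.upper().replace("T", "U")  # accept DNA spelling as well
    if len(rna) != 3:
        raise ValueError("a codon has exactly three bases")
    return "".join(BASE_TO_DIGIT[base] for base in rna)


def to_letters(numeric):
    """Convert a numerical codon back to its letter form."""
    return "".join(DIGIT_TO_BASE[digit] for digit in numeric)


def translate_numeric(numeric):
    """Read the three digits as a base-4 number and look up the amino acid."""
    if len(numeric) != 3:
        raise ValueError("a numerical codon has exactly three digits")
    return AMINO_ACIDS[int(numeric, 4)]


def back_translate(amino_acid):
    """List every numerical codon that encodes the given amino acid."""
    return [
        f"{i // 16}{i // 4 % 4}{i % 4}"
        for i, aa in enumerate(AMINO_ACIDS)
        if aa == amino_acid
    ]


def amino_acids_for(pattern):
    """Return the set of amino acids encoded by a codon pattern such as 'GNC'."""
    codons = ("".join(c) for c in product(*(IUPAC[p] for p in pattern)))
    return {translate_numeric(to_numeric(c)) for c in codons}


if __name__ == "__main__":
    print(to_numeric("AUG"), translate_numeric("203"))  # start codon
    print(sorted(amino_acids_for("GNC")))
    print(sorted(amino_acids_for("SNS")))
```

Under this assumed numbering, each codon is simply a three-digit base-4 number, so translation reduces to one table lookup. The pattern helper shows how the coding capacity grows across the three proposed stages: four amino acids for GNC, ten for SNS, and twenty plus stop signals for the full table.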