6.1. The Virosphere — Yet What About It?
There is a fundamental flaw in the traditional ToL in that the large variety of viral agents is not usually considered relevant in phylogenetic reconstructions at the level of organismal Phylodomains, and changing this historical neglect has not been easy [213-215]. As the conceptual ToL represents a tree of genomic lineages rather than a tree of cellular organization as such, and viral genomes follow lineage-wise patterns of vertical descent and variation just the same, there are no logical reasons for barring viruses from the universal ToL of ever-diversifying genomic lineages. Reaching even deeper, mobile genetic elements (MGEs) follow lineage-wise descent on their own, and viruses are suggested to represent a hybrid of two sources: replication-related functions from ancient MGEs, and capsid shells from ancestral cellular structural proteins [111]. It has been pointed out before that single genes are never ‘selfish’ in the strictest sense, but becoming part of ‘selfish modules’ is a different story. As soon as reliable replication mechanisms had been established (to serve many genes in parallel as a ‘common good’, in support of system-level continuance), the emergence of temporally and/or locally over-replicating modules of several collectively self-perpetuating components became virtually inevitable, as exemplified by mobile transposable elements, plasmids and viral agents [216].
Full recognition of viral evolutionary relevance has been unduly limited by academic debates about whether virus particles actually are alive or not [217, 218]. Yet, viruses have system-sustaining qualities and self-directed modularity, thus dwarfing the potential of single genes in general [219]. But not having a metabolism of their own, viruses depend on susceptible hosting systems for particle propagation and recurring infectivity. Viruses in general and RNA viruses in particular are thought to have arisen well before the generation of modular cells in lineages of vertical descent [111, 220], and it is worth asking how the primordial hosting system may have been organized in the first place. Arguably, the ‘surface protoplasm to proto-coenocyte scenario’ put forth in the present article is a suitable framework to look at the origin of viruses from a more general perspective by also considering the so-called ‘Virocell Concept’: The focus then is primarily on the intracellular phase of viral reproduction instead of virus particles as simple spore-like propagules [221-223], and the ToL of organisms becomes a more comprehensive ToL of virocells. The modern biosphere abounds with a variety of different viral lineages and other mobile elements, now coevolving with their hosts in every phylodomain of organismal life.
Nota bene: bacterial viruses are also known as bacteriophages or phages for historical reasons.
Emerging viral genome lineages have undoubtedly become involved in to-and-fro gene transfer (LGT/HGT) with the host system, facilitated by intimate proximity at the Virocell stage. This concept represents an instructive case of ‘symbiotic parasitism’, which has to strike a balance between lytic production of virus particles and persistence of the hosting system in terms of the ‘biological fitness’ of either partner. It is clear that any overly effective viral lineage would be on a suicide mission if all the potential host cells were wiped out upon acute virus infection. Accordingly, many long-established viruses have found a way of entering a persistent ‘latent state’, such as turning into a temporary plasmid stage or transposing reversibly into a host cell chromosome. Such virus genomes persisting in a latent phase can also be selectively advantageous for the hosting system [224].
Of note, different evolutionary dynamics prevailed on either side of the symbiotic virus–host relationship, with alternative or complementary selective advantages for the disparate partners: (i) A virtually unlimited variety of short, unlinked genes for quasi-statistic peptide sequences presumably resulted from a stochastic trial-and-error sifting through mutational changes in the communal cytoplasm and gradual enrichment for functional improvements and collective optimization of many enzymatic reactions. Yet, viral lineages could also pick and choose from this plentiful source of potential innovation. (ii) Once transferred to a virus genome, a candidate gene — potentially useful for viral functionality — could then selectively be specialized to perfection, in direct competition with other viral lineages. (iii) Some of these virus-encoded functions in turn — after transfer back into the communal gene pool of the protoplasmic hosting system — could become of particular use to the emergence and diversification of organismal lineages as well. This is a superior category of constructive feedback by gene transfer from a youthful virosphere to a still embryonic, pre-organismal biosphere, which has been instrumental, I suppose, in shaping the conceptual connection from tentative OoL scenarios to unsolved problems about the rooting of the canonical ToL. (iv) Somewhat between viral and cellular genomes, plasmids were likewise forming self-directed, vertically stable lineages embedded in the progenote protoplasm. Of note, there is a floating conceptual border line between the categories since infectious viruses can turn latent as integrated MGEs or in a self-replicative plasmid stage not shedding virus particles. (v) Like viral lineages, plasmids could also have accelerated evolutionary perfection of the relatively few genes they carried along, and some of these genes may have become critically important for system persistence at large.
Instructive suggestions for virus-mediated organismal innovations are as follows:
Recombinational hotspots: Even previral agents, such as transposable MGEs, have found use in organismal genomes as entry points for horizontal gene transfer and shuffling of exons or protein domains [
225].
RNA to DNA transitions: The modes of genomic DNA replication are generally equivalent in the organismal phylodomains. Their basic enzymology, however, is partly non-homologous, especially between Bacteria on the one hand and Archaea and Eukarya on the other [226, 227]. This suggests to me that the full transition to DNA occurred stepwise and that the molecular diversity at different steps was potentially of critical relevance for connecting the core pattern in a residual “ToL of 1 %” to its historical roots at the organismal and/or pre-organismal levels. With tentative origins in different ancestral RNA polymerase genes of the preceding RNA-dominated era, the phylogenetically irregular distribution of replicative DNA polymerase subunits across the deepest branches of the organismal ToL [228] does not uniquely specify a single evolutionary paradigm regarding the biological history of cellular diversity. I will return to this aspect more specifically in the following Sections. In contrast, the phylogenetic history of replication helicases in the archaeal branch appears more regular in following the rDNA-based standard tree [229].
Heterochromatin-like clustering: The emergence of the eukaryotic nucleus is still a mystery, which has been interpreted by two basic models: endosymbiotic theories, which invoke an origin from outside a prospective akaryote host cell, versus autogenous (or endogenous) hypotheses, which suggest diversifying membrane trafficking from within larger pre- or proto-eukaryotic cells [113, 230, 231]. The recent advances on ‘viral replication factories’ have led to the notion “that uncoupling of transcription from translation is a feature of giant viruses [and] the ability to uncouple transcription from translation potentially has a very long evolutionary history” [232], in support of the hypothesis that the nucleus is derived from a characteristic ‘viral factory’ [233, 234]. This model entertains the additional hypothesis that eukaryotic histones were likewise derived from viral origins, allowing differentially compacted chromatinization for giant viruses first and for their host cells secondarily [235]. Molecular chaperones for histone assembly, too, may be related to viral proteins [236]. The intimate coevolution of viral and cellular membrane fusion proteins may likewise be relevant in this context [237], and bacterial viruses have perfected the translocation of DNA across membranes by molecular motors [238].
Host line evolution and persistence: The interaction of bacteriophages with chromosome-borne MGEs can be reciprocal and intense at characteristic ‘
Phage-inducible chromosomal islands’ scattered in the host cell genome. These gene-bearing clusters allow viral genomes to integrate in a ‘
lysogenic state’ and benefit the host cell by promoting genetic variability, protecting from the lytic stage, and shielding against super-infection by other viruses from outside [
239,
240].
To conclude this Section, structural conservation in virus evolution is also studied by the comparative
Phyloproteomics approach [
241], which has strengthened the belief that viral lineages began to emerge at the very onset of genomic evolution [
242]. So, “
viruses should be considered drivers of cellular evolution rather than minimalistic genetic parasites” [
237]. Furthermore, the organized release from membrane-bound but not yet fully cellularized compartments — as ‘
Viral Escape’ of the earliest RNA viruses to begin with — has been integrated into an “
Extrusion model of viral Panspermia: from vesicles to viruses” [
238].
Incidentally, the conceptual inclusion of viruses in phyloproteomic analyses has refueled the critical dispute about “realistic evolutionary models” referred to as a latent ‘Phyloproteomics paradox’ further above [244,
245]. Yet, the Kurland group has kept to rather discordant views about Woese’s concept of a collective
Progenote State and its unconventional implications (herein quoted verbatim; my emphasis in bold): “
Second is the discovery that the most recent universal common ancestor (MRUCA) of the modern crown is not a bacterium (or an archaeon or a eukaryote). Rather, MRUCA has extensive phylogenetic affinities with eukaryotes as well as both bacteria and archaea, which could mean that MRUCA has matured beyond the progenote stage. /// It is difficult to identify MRUCA with the progenote postulated by Woese [
4]
because there is nothing elementary or simple about its proteome, but then there is nothing simple in Woese’s sketches of the progenote. /// If there were a progenote in the early evolution of organisms, it would have appeared much before the debut of MRUCA. It is anyone’s guess how long the progenote mode of gene exchange persisted. /// However, it is inconceivable that a cell as complex as MRUCA could have been a progenote – or so it seems now” [
136].
To be sure, the conventional Phylogenetics research community is no longer fond of Woese’s ab initio notions about collective sharing from the earliest Progenote stages (in the strictest sense of perfecting the genetic code) up to the rooting problem of the organismal ToL. To illustrate this point, Eugene Koonin dealt seriously with Woese’s concepts by 2014 at the latest [246], but in his otherwise highly informative review on the replication machinery of a tentative LUCA [228] there is no mention of Woese’s considerations whatsoever, very much in contrast to Forterre’s conceptions [177].
I will argue in the following Sections that Woese’s collective Progenote State, in fact, has more potential bearing on the open issues with the formal ToL than the Kurland group and many others have been ready to admit. The potential role of plasmids in early evolution will be central to my reasoning in this regard. As for DNA lineages, viruses and plasmids engaged in multiple interactions early on [247]. Correspondingly, similar RNA-based interactions may have developed even earlier. Evidence from modern life confirms the actual existence of double-stranded RNA plasmids [248]. I will argue for the possibility that an RNA-based plasmid associated with ‘protothylakoids’ may have founded the deeply rooted genomic lineage that ultimately led to the emergence of free-living bacteria-like cells, and other plasmid lineages may have had comparable effects.
NB: For want of a better word, I deliberately ‘borrow’ the thylakoid term for application in a more general ancestral sense than its well-established meaning for highly advanced photosynthesis as represented in ‘purple bacteria’, cyanobacteria and plastids of eukaryotic plants.
6.2. Modular Cellularization — Progenote and Lineage Aspects Reconsidered
More generally speaking about the ‘Phyloproteomics paradox’ mentioned above, the issues raised may go even deeper than some arguably unrealistic assumptions set up by the opponents. At the root of the main problem, apparent anomalies may have arisen under the kinetic influence of several opposing trends, acting at various organizational levels and on various time scales. To my understanding, the most generally relevant trend reversal (not yet fully understood or systematically analyzed, though) concerns the transition from system-wide (or module-wide) accretionary evolution at the collective Progenote stage to partly reductive evolution later on, when organismal lineages began to selectively adapt to narrowly defined environmental niches, such as marginal survival in extreme environments, cf. the ‘Thermoreduction Hypothesis’ [249, 250] or in predatory and parasitic lineages in upper sectors of the ToL [251].
In other words, the now prevailing views expect that one and the same model be applicable ‘from top to toe’ (or ‘root to branches’) in the tree to be computed on a graph. It was Carl Woese’s momentous and far-reaching insight to realize (i) that evolutionary dynamics must have been very different before and after the onset of organismal Darwinian speciation (as separated by “Darwinian thresholds”) [188, 172, 208], and (ii) that this overall “Woesean–Darwinian transition process” was intrinsically composed of several, more locally defined principal components. This means that as soon as “only one of the major evolving cell designs were to cross its Darwinian threshold, tree representation would appear to be appropriate because that one lineage (only) would be distinguishable from all the rest [as a ‘primary line of descent’], despite the fact that the others did not yet exist as discrete stable lineages” [163]. It also means that all three Woesean “Urkingdoms” (aka ‘Phylodomains’) may well represent “primary lines of descent” resulting from the “Woesean–Darwinian transition process” by different and partly independent means.
Woese himself has already compiled a string of arguments (herein quoted verbatim; my emphasis in bold) that the universal ancestor to start the ToL has most likely been a Progenote:
“In principle the universal ancestor could have [1] resembled any one of the three major types of extant organisms. It also could have in essence been [2] a collage of all three, or have been [3] very unlike any of them. I will argue that the last alternative is the correct one and that the universal ancestor was a progenote.” /// The alterations “required to change one of the three phenotypes into either of the others are too drastic and disruptive to have actually occurred.” /// The “only solution to the problem is for the universal ancestor to have been a progenote.” /// In “the transition from the universal ancestor to its descendants we are witnessing the evolution of biological specificity itself.” /// Since “the progenote is far simpler and more rudimentary than extant organisms, the significant differences in basic molecular structures and processes that distinguish the three major types of organisms would be attributes that the universal ancestor never possessed. In other words, the more rudimentary versions of a function present in the progenote would become refined and augmented independently, and so uniquely, in each of its progeny lineages. This independent refinement (and augmentation) of a more rudimentary function, not the replacement of one complex function by a different complex version thereof (the beginning stages of which would be strongly selected against), is why remarkable differences in detail have evolved for the basic functions in each of the urkingdoms” [252] … “If modern large proteins could not be produced by progenotes, then a modern type of genome replication/repair mechanism did not exist. As with translation, a rudimentary mechanism implies a less accurate one, and the resulting high mutation rates necessitated small genomes. The structure of these genomes must reflect the primitive evolutionary dynamic in general. 
Therefore, I see the progenote genome as organized rather like the macronucleus of some ciliates today []: it comprised many small linear chromosomes (minichromosomes), each present in multiple copies. /// Small primitive genomes with low genetic capacity and imprecision in both translation and genome replication imply a primitive cell that was rudimentary in every respect /// It was [] a community of progenotes, not any specific organism, any single lineage, that was our universal ancestor — a genetically rich, distributed, communal ancestor. It was also this loose-knit biological unit that ultimately evolved to a stage in which it somehow pulled apart into two, then three communities, isolated by the fact that they could no longer communicate laterally with one another in an unrestricted way. Each had become sufficiently complex and idiosyncratic that only some genes, some subsystems, could be usefully transferred laterally. Each of these three self-defining communities then further congealed, giving rise to what we perceive as the three primary lines of descent” [4].
These passages support several major points to motivate the reasoning put forth in the present paper: (1) The high levels of genetic redundancy expected for macronucleus-like clustering in progenote entities were easier to attain from pre-genetic stages that already consisted of relatively large bodies of functionally interactive protoplasmic matter. (2) The initial inaccuracy of processive mechanisms affected replication as well as transcription and translation but not necessarily to the same extent at every particular substage. (3) A natural series of temporally ordered evolutionary perfection can be inferred from additional input as follows (not yet explicitly considered in Woese’s presentation): (i) initial perfection of tRNA charging at a genuinely pregenetic, analog stage of pre-progenote life-like molecular networking and functionality; (ii) intermediate perfection of RNA replication with RNA-to-protein coding and decoding mechanisms for individual genes at the early Progenote stage (sensu stricto); (iii) final perfection of genome-wide accuracy and processivity of DNA replication at the late Progenote stage (sensu lato), so as to warrant the conservation of chromosomal synteny in a species-wide population over considerable evolutionary time; and (iv) whilst the direct impact of chromosomal synteny was strongest for monomolecular plasmids and akaryotic genomes, it was less prevalent for the multi-molecular gene pool for cytoplasmic functionality in the progenote population at large, which may have been the major reason why it took longest to attain vertical stability for multi-chromosomal genomes from an ancestral nuclear-cytoplasmic lineage to eukaryotic organisms.
The high degree of polyphenotypic variability intrinsic to the communal Progenote state is generally underappreciated in current studies of the rooting problem for the universal ToL. This putative indistinctness led Kandler to suggest “allopatric speciation of a multiphenotypical pre-cellular population” [74], as pointed out by Wächtershäuser: “These precells are seen as ‘multi-phenotypical’, having distinctly different metabolic phenotypes. Some sub-populations may be autotrophic, others heterotrophic; some anaerobic, others micro-aerophilic; some H2-consumers, others H2-producers, etc.” [252].
It is the purpose of this Feature Paper to devise a plausible model to further support the prescient insights of Woese and Kandler, without falling prey to cladistic orthodoxy alone, which seems to imply that the composite eukaryote cell-type has been derived secondarily from a primary archaeal lineage [25, 118, 253]. The subcellular strategy of self-directed lineage stabilization, pursued by viruses and plasmids alike, should also pave the way to understanding how Woese’s organismal 3-D ToL could have originated from a somewhat indistinct, internally complexifying ‘progenote collage’ of protoplasmic masses, combining rudimentary properties of all three major cell types during a collective phase of trunk-line evolution at a formally common root.
It is important in the present context to be critical about what is meant by a ‘cell’ as a morphological and organizational unit of life. Is it the shell-like container or the modularity of its functional contents that is more important in comprehending the tentative origins and early evolution of these complex biological entities? Almost two decades ago, Juli Peretó considered the question of “Early or late cellularization” as one of several controversies still waiting to be resolved, and he expressed a personal bias that “life would have been cellular ‘ab initio’ [from the very start]” because he found it difficult to imagine how the necessary bioenergetics could be managed in ‘acellular’ systems [45]. This preconception appears related to received consensus views that ‘protocells’ had to be small and vesicle-like to begin with, subsequently evolving into Akaryote (aka ‘Prokaryote’) cells before additional features resembling eukaryote complexity emerged at considerably later stages.
As applied to eukaryotes, however, the basic concepts of unitary (mononuclear) cells and the corollary of cellularization as a process of generating preferentially mononuclear cells from larger acellular (coenocytic, polynuclear) systems were originally introduced when bacteria-like Akaryote cells were not even known to science. On second thought, the organismal modularity of eukaryotes (the ancestral nuclear-cytoplasmic lineage included) may not even be of a cellular nature primarily but rest on the modularity of equational nuclear division. The classical concept of cellularization from acellular syncytia in eukaryotes has led me to conjecture that Progenote entities systemically adopted a composite, “plasmodial-like organization” as ‘proto-coenocytes’ [106, 114], which is not equivalent to the conventional conception of simpler, vesicle-like configuration in various ‘protocell’ models.
Modern coenocyte examples include multinucleate amoebae, siphonal green algae, and syncytial slime mold plasmodia. Large amoebae are also known today to act as evolutionary ‘melting pots’, which facilitate the proliferation of chimeric microorganisms, such as giant viruses [254, 255, 256]. Foraminifera and plasmodial slime molds are of particular interest in this context because of their tendency to coalesce by cytoplasmic fusion, occurring within an extensive ‘reticulopodial’ network [257] or between larger ‘plasmodial’ masses [258], respectively.
In analogy to the concept of ‘Viral Escape’ (above), the compound model suggested here assumes that bacterial cells were the first Akaryote lineages to descend from the communal Progenote State by ‘Cellular Escape’ after a considerable period of accretionary evolution as plasmid genomes enclosed in endogenous proto-organelle compartments, inside the composite, polymorphic and amoeba-like proto-coenocytes of a collective, polyphenotypic ‘proto-plasmodial trunk-line population’. Similar ideas were also being developed in the 1970s, suggesting that a protein synthesis system was implanted into the respiratory organelle by incorporating a stable plasmid with genes for ribosomal components [259, 260], equivalent to combining a ‘ribosomal DNA episome’ with ‘plasmid-associated thylakoids’ [107].
These early ideas have been effectively shunned by proponents of the now prevalent doctrine that mitochondrial lineages originated solely from once free-living α-proteobacteria [261, 262]. On the other hand, mitochondrial phyloproteomics can also tell a different story of most mitochondrial proteins not showing any particular relationship to α-proteobacteria, which led to alternative views implying the pre-existence of mitochondria-like modules of eukaryotic ancestry, as potentially endogenous premitochondria, well before eventual genomic interactions with free-living α-proteobacteria [112, 263, 264]. Comparative reviews of mitochondrial origins with regard to “symbiogenic-chimeric vs autogenic-incremental” conceptions have since been taken up in favor of multiple symbiotic interactions very early on [265-267]. However, the ‘early on’ in this debate has not yet explicitly included the collective Progenote stage at the common root of Woese’s 3D-ToL.
As briefly mentioned earlier, I refer to the Progenote concept in both its narrow and its broader meaning, corresponding to an early and a later phase of collective system optimizing, which initially concerned ‘the making of genes and gene products’ for immediate usage on the spot, and ‘the making of genomes’ for faithful inheritance in vertically stable cells and organismal lineages later on. Evolutionary optimization came about via differential survival of the better-fit performance within two to three coupled modes of molecular catenation in a processive manner: mRNA-directed protein synthesis by composite ribosomes, and template-dependent nucleic acid synthesis by transcriptase action and/or composite replisomes. Yet, why should it have required longer periods to optimize processive replisomes than it took for the considerably larger ribosomes? The most reasonable answer may be a complex one of matching the chemical reactivity and instability of RNA against the stochastic limits of tolerable genome size, as imposed by the large intrinsic error rates to start with [268, 269]. In consequence, the relatively large genomes needed for cellular organisms to stably coexist with others could only have come into existence in the aftermath of one or more RNA-to-DNA transitions, presumably after DNA viruses and plasmid interactions had paved the way [247, 270].
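The trade-off between copying error and tolerable genome size invoked above can be made concrete with the classical error-threshold relation from quasispecies theory. The following Python sketch is illustrative only and is not taken from the references cited here. It assumes a single ‘master’ sequence with a selective superiority sigma over its mutant cloud and a uniform per-nucleotide error rate mu, in which case the master sequence persists only if (1 - mu)^L * sigma > 1, that is, for genome lengths L below ln(sigma) / -ln(1 - mu), or roughly ln(sigma) / mu. The function name and the example error rates are hypothetical choices made for illustration.

```python
import math


def max_genome_length(error_rate: float, superiority: float) -> float:
    """Upper bound on genome length (in nucleotides) under the error threshold.

    Assumes a uniform per-nucleotide copying error rate and a single master
    sequence that outcompetes its mutants by the factor `superiority`.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority must exceed 1")
    # Persistence requires (1 - error_rate)**L * superiority > 1.
    return math.log(superiority) / -math.log1p(-error_rate)


if __name__ == "__main__":
    # Hypothetical error rates, from a crude early replicase to a proofreading one.
    for mu in (1e-2, 1e-4, 1e-6, 1e-9):
        limit = max_genome_length(mu, superiority=math.e)
        print(f"error rate {mu:.0e}: genome limit of about {limit:,.0f} nt")
```

With sigma held fixed, the bound scales inversely with the error rate, which is the sense in which more accurate replication had to precede larger genomes.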
It is my present opinion that the notion of
plasmid-associated thylakoids [
107] offers the best potential for rooting the entire bacterial domain deeply in the predominantly collective
Progenote State of Woese’s early theorizing, and key to this notion is the conceptual separation of
modular cellularization from the molecular nucleation steps that established the earliest
vertically stable lineages of different minimalistic ‘
genomic agents’ as such. This is where certain plasmid lineages could have made a difference well before genuine cell lines had been established. There is a general understanding that much of the universally communal
Progenote State unfolded under the influence of RNA as the predominant informational molecule, but the lineage-wise establishment of organismal cell types very much depended on the adoption of DNA as genomic material [
176,
208,
226,
271]. It is thus reasonable to assume that the tentative
energy-harvesting genomic agent associated with ‘protothylakoids’ began as an RNA plasmid relatively early in the
Progenote Era.
The founding concept of plasmid-associated protothylakoids promoted here argues that a certain RNA plasmid made itself indispensable by giving prominence to a triad of system-sustaining innovations already at the RNA-dominated beginnings of the collective Progenote State: (1) self-reproductive capacities at a minuscule scale, including self-directed replication and self-serving ribosomes quite early on; (2) micellar to membrane-like association of amphiphilic peptides and lipid constituents; (3) directional channeling of photon-induced charge separation in coupling to inter-molecular transfer chains for electrons and/or protons, coordinated at lipid-raft-like nucleation centers. The early generation of endogenous vesicles in coupling to environmentally driven charge separation appears particularly attractive since this topology should allow for the simultaneous utilization of reactive electrons and protons for different redox reactions on either side of the emerging lipid rafts and early membranes. The plasmid-coordinated compartmented module of subcellular structure–function integration could thereby have provided the mechanistically organized “Engine of Free Energy Conversion” needed to get life-like metabolism under way [272], albeit augmented by different mechanisms and amplifying effects of repetitive environmental pumping, which arguably could not be expected from constant flow patterns under the long-favored OoL scenario at submarine hydrothermal vents.
On the sub-coenocytic basis of this model I find it natural to argue that the composite machinery of genomic DNA replication specific for bacteria began to consolidate first and did so by partly independent means as compared to functionally equivalent replication modules of other origins, which are partly shared by archaeal and eukaryotic organisms [
220]. Plasmids are often mentioned together with transposable elements and viruses for their early lineage-wise emergence as self-serving reproductive units at subcellular scales [
273,
274]. Their evolutionary potential, however — affecting the hosting system at large — could have been distinctly different, as exemplified above by the posited coupling of membrane-based energy harvesting directed by an early-emerging, system-supportive plasmid lineage, presumably as a
compartmented RNA plasmid associated with protothylakoids to start with.
This unconventional perspective can shed new light on the puzzling complexity of RNA-to-DNA transitions ancestral to the major organismal cell types. A major uncertainty may point at the heart of the puzzle: Is it still reasonable to assume that there actually existed a cellular ‘LUCA’ with a uniquely definable replicative polymerase [228]? Or should this questionable presupposition rather be dismissed in favor of a historically more plausible scenario? Woese’s conception of a communal Progenote Phase (in a broader sense) has, arguably, the highest potential to unravel this conundrum.
There is a particular null hypothesis worth keeping in mind when thinking in terms of the Progenote State, in that all three processive reactions at the heart of Progenote networking complexity are egalitarian with regard to substrate spectrum and communal as regards their systemwide effects: 1. Translation: A certain kind of ribosome is responsible for producing all the gene-encoded proteins in the communal system. 2. Transcription: A certain kind of RNA- (or later DNA-) dependent RNA polymerase is responsible for making all (or most of) the potential mRNAs in the communal system. 3. Replication: A certain kind of RNA- (or later DNA-) dependent principal replicase complex is responsible for duplication of all the genomic (‘chromosomal’) molecules in the communal system, and all the potential replication origin sites respond to the same communal mechanism of organizing nascent replication forks. The early deviation of viruses and plasmids from this principle of general communality — as well as cellular lineage consolidation later on — would then require additional steps for explanation. These early semi-autonomous modular agents changed the rules by limiting replication mechanisms toward reproduction of their own genes preferentially.
Comparing replicon organization in bacteria and eukaryotes, I find it remarkable that DNA replication in bacteria resembles plasmid replication in various respects: bacterial genomes are mostly contained on single molecules of circular DNA with a single, bidirectional replication origin. The same is true for most bacterial plasmids, which are generally non-essential for cell growth [
140,
275]. Furthermore, some 10% of bacteria have large, essential ‘
extra-numerous chromosomes’ which actually are mega-plasmids with other — plasmid-specific — replication origins and partition machineries [
175]. I am confident therefore that the founding principle of a
genomic plasmid compartmentalized with
protothylakoids is a viable model for gradual accretive evolution within a larger mass of potentially coenocytic protoplasm as early as the upcoming
peptide–RNA alliance in a primordial
RNP World scenario.
We can only guess what actually happened at that tentative RNA-dominated stage, since no extant organism fully represents that era. Only RNA viruses, RNA plasmids and retrotransposons can give us some insight into its nature and tentative history, though experts are still divided on how to explain the sparse evidence [
244,
277,
278]. There are good indications that
reverse transcriptase (RT) of retrotransposons and RNA viruses had remarkable roles in RNA-to-DNA transitions at the organismal level, at least for the catalytic domains of major DNA replicase complexes in archaeal and eukaryotic cells [
228]. The RT enzyme relates to the characteristic core domain of RNA polymerase, aka transcriptase, whereas the major bacterial DNA replicases are based on a different RNA-making protein family, which in eukaryotes includes poly(A) polymerase, aka terminal riboadenylate transferase. At the higher superfamily level, though, all the known
Nucleic Acid Polymerase proteins are structurally related by resembling a right hand with
fingers, thumb, and palm regions and thus may have evolved from a very ancient common ancestor [
279]. Moreover, the only class of DNA topoisomerase occurring in all three domains of life (Type IA) is often associated as well with RNA topoisomerase activity [
280,
281].
From early, relatively open discussions of the possibilities at the major RNA-to-DNA transitions [
226,
282], the expert community is now more categorically divided between opposing views: (1) Assuming beforehand that there was a distinctive —
prokaryote-like — common cellular ancestor and that Woese’s communal
Progenote State had ended long before [
228], vs (2) accepting that the tentative
Progenote Phase ended stepwise by giving rise to different primary cell types one by one — as an implicit prerequisite for organizing the composite machineries of initiating and performing processive DNA replication at the organismal level, more than just once and by partly independent means [
270,
271,
283]. It is my present aim to bridge this virtual gap of understanding by conceptually separating the early stage of genomic nucleation in viral and plasmid lineages from the later steps of gaining genome integrity and cellular autonomy at the organismal level.
This explorative conception leans on a nesting principle by allowing certain semi-autonomous minigenomes to emerge and be compartmentalized inside a larger systemic whole, which in turn kept nourishing some of these embryonic genome lineages for accretionary growth up to their later ‘escape’ as quasi-autonomous cellular entities. To substantiate this notion I herein suggest reinterpreting the rooting problem of linking both Bacteria and the partly related mitochondrial lineage to the base of the canonical ToL according to the following considerations:
The founding core of intracellular genomes was a ‘protothylakoid’-associated RNA plasmid.
It carried an operational core for independent protein synthesis, perhaps assisted by a productive combination of recombinational bypass of replication-blocking lesions and/or the superior principle of ‘
rolling circle replication’ of ribosomal RNA sequences, analogous to a commonly observed mode of differential gene amplification today [
284]. Somewhat indirectly,
rolling circle replication can be initiated by recombination between circularly permuted linear sequences and/or terminal redundancy [285-287]. The plasmid-based coding potential comprised one or more membrane-interacting amphiphilic proteins with directional charge transfer or other energy-converting capacity, which became vitally important for the surrounding protoplasmic system at large.
Micellar, vesicular or cisternal
protothylakoids accumulated around the associated plasmid molecules and eventually fused to form internal, organelle-like compartments [
107].
Topological closure of a surrounding envelope, however, could only be achieved in coevolution with appropriate transport systems into and out of the emerging compartments.
The RNA-to-DNA transition of the compartmentalized plasmid was partly independent from the larger systemic whole. The process began with plasmid-specific replication origins and ancestral primase–helicase complexes — presumably similar to the metazoan mitochondrial initiation system [
288]. Accordingly, the peculiar resemblance between DNA replicases in mitochondria and T7-like bacteriophages [
289] and the discovery of T7-like lysogenic prophage modules, which are inferred to better represent ancestral stages than the better known, strictly virulent T7-like phages themselves [
290], fall neatly in line with the case study of a lineage-defining gene exchange equilibrium between viruses and plasmids with regard to certain host-related replication specificity factors [
229]. The main point here is that minimal lineage-defining modules comprised particular sequence elements in DNA that function as preferential internal replication origins, together with corresponding proteins that recognize the starting sites for processive template replication.
The nascent lineage-tracking genome modules in turn had to deal with increasingly multidimensional concerns for subsequent accretionary growth, not least in trading off overall genome length against the cumulative hazards of accidental damage and momentary replication infidelity. Inasmuch as the resolution of many such replication-blocking events required ‘
trans-lesion synthesis’ of DNA for
recombinational repair, all organismal genomes — and larger viruses too — depended on more than just a single kind of DNA polymerase and also needed more effective processivity clamps for long-term lineage persistence and stability. Arguably the most significant modular innovation in this regard is the establishment of bidirectional replication by multi-enzyme replisomes [
291]. A pair of sister replisomes is set in motion at a common origin of replication — only to be dissolved after pairwise collision at certain replication-termination zones [
292].
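The trade-off between genome length and replication fidelity mentioned above is commonly formalized by Eigen's error threshold. The relation below is the standard approximation from quasispecies theory, given here as a hedged illustration only; the symbols L_max, q and σ are not used elsewhere in this text, and applying the relation to early plasmid or viral lineages is an assumption of this sketch.

```latex
\[
  L_{\max} \approx \frac{\ln \sigma}{1 - q}
\]
% L_max : maximal genome length (in nucleotides) that selection can maintain
% q     : per-nucleotide copying fidelity, so (1 - q) is the per-site error rate
% \sigma: selective superiority of the master sequence over its mutant cloud
```

On this reading, lowering the per-site error rate (1 − q) by an order of magnitude raises the sustainable genome length by roughly the same factor, which is why more accurate and more processive replication machinery is a precondition for accretionary genome growth.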
Each replisome is assembled at a nascent replication fork after ds-DNA has been opened at a replication origin by helicase/primase deposition. The overall gearing of these composite molecular machines appears comparable in all domains of life, but many individual components are structurally non-homologous in bacteria as compared to archaea and eukaryotes [
293]. As composite replisomes too (similar to ribosomes) represent an important
functional module amongst “
new cellular subsystems that are refractory to major evolutionary change” [
4], they should resist the replacement of single components by LGT/HGT. However, while proto-ribosomes were vitally important throughout the
RNA-directed early phase of collective
Progenote Evolution, typical replisomes became important only during the
later stage of RNA-to-DNA transitions — with a potential for multiple emergences at different proto-organismal branchpoints of the formal ToL.
6.3. RNA-to-DNA Transitions at the Crossroads
Generally speaking, it is not any replicative DNA polymerase as such that is central to replisome organization but a spider-like hub with tethers that coordinate several molecular shackles, so that the loop-assisted fork structure remains united and functional to duplicate the entire replicon with end-to-end reliability. In particular, the discontinuous, looped-out synthesis of Okazaki fragments on the lagging strand, away from the advancing helicase, is an intrinsic challenge to persistent replisome integrity. Accordingly, up to three DNA polymerase complexes are tethered to the pivotal helicase [
294], each replicase being associated with a circular sliding clamp to warrant long-distance processivity of template-directed DNA synthesis [
295,
296]. Multi-subunit clamp loaders are needed nowadays for locking the ring-shaped sliding clamps in a full circle around ds-DNA, which raises the question of how the intricate interdependence within composite replisome modules could possibly have derived from simpler structures with fewer capabilities.
Organismal replisomes come in just two different varieties, distinct in composition, structural topology and putative evolutionary origins: that of bacterial lineages on the one hand and the archaeal/eukaryotic version on the other. The mobilome and virosphere provide further variation and may give us some insight into the evolutionary emergence of organismal replisomes [
293,
297]. Various viral DNA replication machineries are diverse enough to warrant meaningful statistical analyses, and numerical correlations appear relevant in two different contexts [
298]. Above all, the complexity of virus-encoded DNA replication machineries is positively correlated with increasing genome size, and the non-random patterns of co-occurrence for several key components may reflect step-wise, coevolutionary emergences of structurally interactive sub-modules within the composite replisome. The preferential coupling of replicative helicases, primases and DNA polymerases — in this order — underlines the central role of helicase action in assembling the composite replisomes and in defining a self-directed hereditary lineage by choosing where to initiate replication amongst different molecules of DNA. Another functional relationship connects accessory clamp loaders to DNA polymerases and sliding clamps, which are predominantly found in the largest virus genomes of all three domains of organisms, e.g. T4 phage (~170 kbp) in bacteria and giant viruses of eukaryotes (300 to 1200 kbp). Whilst all toroidal polymerase-binding sliding clamps require clamp loaders to be installed, there are numerous viruses of intermediate size having DNA replicases not in need of DNA-encircling sliding clamps, e.g. T7 phage (~40 kbp) in bacteria and herpes-like viruses (120 to 250 kbp) in eukaryotes. This structural distinction may also be functionally relevant for the consolidation of organismal genome lineages.
In general, virus evolution is locked in an arms race with cellular hosts. There is no organismal species known to be free of viral attack altogether, but the different varieties of viral DNA replication proteins are not spread evenly across the three domains of life. Considering the structural correspondence of viral key components to either bacterial or archaeal/eukaryotic counterparts, there is a striking host range bias as follows. Viruses with primases and DNA polymerases of the latter category are numerous in all three domains of life, whereas the bacterial counterparts are confined to rather few bacteriophages. A similar trend is also observed for the accessory proteins of sliding clamps and clamp loaders [
298]. These fundamental differences indicate that virus hallmark replication genes descended from primordial replicators well before the consolidation of cellular organismal lineages [
299]. It also indicates to me that the origins of early viral lineages lead back to the polyphenotypic ‘
proto-plasmodial trunk-line’ population characteristic of the universally communal
Progenote State — well before the emergence of free-living bacterial cells.
To me at least, the asymmetry in host range distribution also signifies a remarkable inequivalence regarding the ways and means of how the first lineages of bacterial and archaeal cells eventually emerged from a common ancestral
Progenote State. An inherently asymmetric model appears necessary to explain the empirical observations in terms of natural evolution. The compartmented, plasmid-based conception proposed herein does actually provide for a potentially appropriate asymmetry condition in that a small internally compartmented plasmid genome lineage could draw on the large quasi-stochastic gene pool of the surrounding protoplasm for further accretionary growth and adaptive selection toward cellular independence. The two-phase model implies a distinct asymmetry with particular kinetic consequences significant for differential evolution on either side of the compartmented partnership. This intrinsic disparity appears relevant to contextualize the notion of “
Woesean asymmetry” mentioned further above, which has been postulated to affect the earliest branches of the organismal ToL, as a “
nonclassical perspective [that]
takes some [time for]
getting used to” [
208], and evidently more time is still needed.
The early duality may have set a path toward contrasting strategies in bacterial and eukaryotic cells to make a living. The fundamental complementarity of these strategies has long been recognized and discussed in terms of early evolutionary divergence [
116,
249,
300]. These general considerations are still valid and deserve renewed attention as revised, more basic OoL conceptions are adding weight to assuming sunlight-exposed
terrestrial settings, cohesive
coalescence and environmental
wet/dry cycling as major driving factors for early evolution.
The strategic differences between Akaryote and Eukaryote cell organization appear correlated with small and large cytoplasmic volume, respectively. Additional parameters derived therefrom are also relevant for differing modes of density-dependent population dynamics [300-302]. As the larger eukaryotic cells are generally limited to lower growth rates and final densities than what is possible for smaller and more specialized bacteria, eukaryotes tend to evolve in
equilibrium populations by so-called ‘
K-selection’, as opposed to ‘
r-selection’ dominating in the more
opportunistic populations of bacteria subjected to recurrent catastrophic mortalities. As ‘
r-selection’ generally results in high degrees of genomic streamlining, this evolutionary mode is probably of secondary origin [
173], whereas the ancestral pre-genomic state of low-fidelity copying systems with high error rates had to cope with high levels of genetic redundancy [
303]. Accordingly, Woese’s collective
Progenote State primarily responded to ‘
K-selection’ — being more concerned with system maintenance, perseverance and homeostasis at moderate growth rates than with faithful reproduction at high growth rates right away. Furthermore the smaller population sizes attainable for larger systemic entities may have been subject to “
constructive neutral evolution” resulting in directional, “
ratchet-like increase in complexity”, which now pervades much of eukaryotic cell organization [
304].
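For readers less familiar with the terminology, the labels ‘r-selection’ and ‘K-selection’ derive from the two parameters of the standard logistic growth model. The following is only the textbook form, added as an illustrative sketch and not as a quantitative model of the Progenote State; the symbols N, r and K are the conventional ones and do not appear elsewhere in this text.

```latex
\[
  \frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right)
\]
% N: population size (or density)
% r: intrinsic (maximal) per-capita growth rate
% K: carrying capacity of the environment
```

Populations kept far below K by recurrent mortality grow nearly exponentially at rate r, so selection favors rapid reproduction. Populations persisting near K experience selection mainly on competitive maintenance at moderate growth rates.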
The ‘
Karyogenic Proto-Coenocyte Hypothesis’ [
106] is thought to match the conditional requirements listed above by considering a collective and persistent population of large systemic
K-selected entities, which hosted compartmented, primarily
r-selected proto-organellar lineages of smaller size and with limited gene numbers per circular (or circularly permuted) genome, first based on RNA and later on DNA. This notion of complementary dynamics and synergistic evolutionary potential carries endogenous
symbiotic relationships to a new level of networking complexity. By reaching deeply back into the collective
Progenote Era, a superior and more integrative
Symbiosis principle should not just ascribe the emergence of mitochondria to a somewhat fortuitous singularity as “
a fateful symbiotic encounter” (
sensu Martin et al.) [
261,
305] between fully individualized ‘prokaryotic’ cells. Instead I find it more natural to conceptually link the arguably most significant symbiotic relationships in eukaryotic cell organization to the endogenous generation of subcellular genomic lineages based on plasmids, which in turn could adaptively respond to different environmental challenges. Various anaerobic modes of syntrophic cross-feeding are fundamental in many methanogenic microbial communities today, such as specific microbial biofilms [
306,
307]. The principle of
anaerobic syntrophy between hydrogen-producing and hydrogen-consuming microorganisms may have been operative in a precursory scenario to energize the ‘
plasmid-associated thylakoids’ as suggested herein. More generally speaking, the nascent plasmid-based lineages could grow constructively by ‘feeding’ on the relatively large genetic redundancy accumulating in the collective hosting system.
This scenario calls for syntrophic cross-feeding in densely packed anaerobic biofilm communities as a collective mode to nourish variable progenote entities that were genetically based on short mini-chromosomes and plasmids, none of which had yet assembled sufficiently many different genes for granting individual cells a chance to colonize pristine environments completely on their own. How then could ancestral proto-bacterial entities push up their monomolecular genomes to more appropriate sizes? The burden of achieving this capacity, I suppose, was mainly placed on organizing more dependable replisomes with sliding clamps and clamp loaders in the course of evolutionary RNA-to-DNA transitions as pioneered by virus and/or plasmid lineages. A critical look from Woese’s ‘nonclassical perspective’ may shed new light on the ancestral relationship of mitochondrial and bacterial lineages.
6.5. Two or Three Superkingdom Phylodomains? — That’s the Question
Where did Bacteria and Archaea come from; how did they gain their general status of clonable, potentially autonomous cells; and how did eukaryotic cells enter the stage in the first place — or rather a ‘third way’? Before trying to answer any of these questions, we must first make plausible assumptions about historical detail at the preceding state(s) of biophysical/biochemical consistency and interactions, and then sift through various stochastic possibilities for further evolutionary change. One way or another: The three extant cell types appear to have some connections to the pre-cellular, pre-genomic
Progenote State, which in turn is herein assumed to be influenced by yet earlier, pre-genetic stages derived from relatively bulky layers of phase-separated, surface-coating protoplasm — already supporting an emergent network of protometabolic interactions. Intriguingly, however, the overall validity of a three-fold exit pattern of modular cell types from communal collectivity in the ancestral Progenote State has been discounted by the discovery of Asgard archaea, which appeared to nest the entire eukaryotic branch within the archaeal domain [
25,
26]. Yet, the growing global data sets on this newly detected branch of microorganisms do not really substantiate that serious challenge to the canonical 3-D model of the organismal ToL [
27,
28,
321].
When the
Progenote concept was first proposed, the reasoning was based on the notion that “
evolution of the cell is the evolution of the genotype-phenotype relationship” and “
the link between genotype and [cellular]
phenotype” was not fully established at such primitive stages when catalytic functionality was just emerging at many different levels [
119]. For as long as stochastic error rates were high, the corresponding signal-to-noise ratios were rather low, and sustainable chain lengths of macromolecular products were quite limited to begin with. To characterize the nature of
Progenote entities in more positive terms, these are networking qualities foremost, resulting in the “
refinement and selection of innovation-sharing protocols, such as the genetic code”. The community then “
rapidly developed complexity through the frictionless exchange of novelty enabled by the genetic code” [
172], thus accelerating the collective optimization of other system-bearing traits as well — eventually resulting in genomic lineage stability by vertical descent of modular cells [
208].
In formal terms Di Giulio has taken care of Woese’s legacy by arguing in the top-down direction of the ToL, and a group around Francisco Prosdocimi & Sávio Torres de Farias is joining ranks with this approach. Regarding “
rapid and progressive evolution” as being
typical of the Progenote State, Di Giulio focuses on the fundamental differences between bacterial and archaeal cell organization, such as DNA polymerases [
271], methyltransferases [
322], cell division systems [
323], RNase P proteins [
324], and membrane lipids [
325]. The implication of many nonhomologous protein constituents in these and other basic system-bearing traits is taken as cumulative circumstantial evidence that the deep nodes of the canonical ToL — the founding ancestors of the
Superkingdom Phylodomains included — were still part of the conceptual
Progenote State [
326]. Moreover, he extends similar arguments with regard to
Eukaryogenesis from the collective
Progenote State no matter whether
Eukarya shared a common ancestor with
Archaea in general or
Asgardarchaeota in particular [
327]. The Prosdocimi/Farias group, too, is taking seriously the possibility that cellularization from an acellular
Progenote State occurred several times independently for different cell types [
129,
328].
Furthermore, the central tenet of Woese’s 3D-ToL is expressed more distinctly as “Domain Cell Theory”, which states that “the descendants of each of the three domains retained its [cell-type] identity throughout its own unique evolutionary pathway” [
329]. Seen in this light, the rather fragmentary Asgard–Eukaryote affinity may well be considered an anomaly potentially explained by HGT events from sharing a particular environment at early ancestral stages [
28,
177]. Alternatively, certain data quality problems with metagenomic Asgard genome reconstructions from environmental DNA sampling alone may even lead to artefactual conclusions [
330,
331].
Patrick Forterre is taking a somewhat ambivalent position on these issues from a semantic point of view. Even though he has long been supporting Woese’s 3D-ToL, he only considers using Woese’s
Progenote term in the strictest sense, which corresponds to ‘
the making of genes’ early on — not the ‘
making of genomes’ in a maturing phase of still collective
trunk-line evolution as suggested herein. What is clearly needed from my perspective is a separate identifier for the residual trunk-line stage (still progenote-like in a broader sense, as inferred by the generalized progenote hypothesis) after the bacteria had branched off as separate lineages. This conceptual stage was still ancestral to both Archaea and Eukarya but not equivalent to either one of two different descendants — and not to Archaea, in particular [
250]. I therefore suggest using ‘
Arkarya stem line’ for the conjoint ancestral ‘
Supradomain’ interval specifically.
The neologism ‘
Arkarya’ was originally proposed to include the crown groups of Archaea and Eukarya
domains as well [
332], which would actually have made the classical Woese tree, stricto sensu, a two-domain tree: “
Bacteria and Arkarya” [
281]. But Woese himself, being openly reluctant to address the common stem line by a distinctive name according to standard
cladistic principles, considered the formal branchpoints before the three ‘
primary lines’ not as conventional
cladistic branchpoints. Instead he postulated multiple ‘
Darwinian thresholds’ as the pivotal evolutionary turning points “
when the transmission of genetic information moves from a predominantly horizontal mode based on lateral gene transfer (LGT) to a predominantly vertical mode” [
177]. I suggest addressing the common group of Archaea and Eukarya
domains as an ‘
Arkarya Supradomain’ to mediate between the two controversial extremes. By this provision both Archaea and Eukarya would retain their status of legitimate domains, originating from a common ‘
Arkarya stem line’, which as such still could be considered part of the
LGT-dominated Progenote State (sensu lato).
To conclude this section, it remains tenable that Woese’s three-domain model of life best reflects the biological reality of three distinctly different basic cell types [
333]. Moreover, the favorite loophole of modern 2-D ToL proponents deriving Eukarya from a fortuitous singularity — an ancient ‘merger’ between an archaeon and a bacterium — shows a peculiar disregard for mechanism [
334]. The extraordinary affinity of Eukarya to Asgard archaea, however, can also be explained by preferential horizontal gene transfers before the crown-group radiation of extant eukaryotes [
28,
177] — for example, if the ancestral Asgard lineage had stayed in symbiotic contact with the ancestral “
nuclear-cytoplasmic lineage” (
sensu Doolittle [
34]) for the longest time. The LUCAN networking model put forth herein substantiates the salient notion that “
Life was born complex” [
335], well before it started to diversify at the organismal level. These considerations do not strictly contradict the now prevailing views that eukaryogenesis (also) was accompanied by rapid “
bursts of gene gain” before the corresponding crown-group radiation [
336].