DNA bases: adenine thymine guanine cytosine

Evolution of the Genetic Code: A Data-Driven Scientific Analysis

Introduction

Understanding how the DNA genetic code evolved remains a central challenge in molecular biology. Traditional hypotheses, such as the RNA world and stereochemical models, propose different mechanisms for how codons came to specify amino acids. In this post, we examine these established theories and introduce new data that challenge conventional assumptions. By analyzing patterns in nucleotide assignments, chemical affinities, and evolutionary constraints, we highlight inconsistencies in prevailing models and propose alternative interpretations.

Work on foundational questions such as the evolution of the genetic code is typically funded through academic grant mechanisms rather than commercial investment. As a result, researchers in evolutionary biology must rely on well-structured grant proposals to secure support for data-driven and theoretical research.

This data-driven approach not only questions aspects of the standard theory but also provides fresh insights into the constraints and possibilities that shaped the genetic code. Researchers, students, and enthusiasts in molecular biology and evolutionary genetics will find a clear explanation of the competing models, the supporting evidence, and the implications for future research. Through careful analysis and structured discussion, this post aims to bridge historical understanding with novel perspectives, helping readers grasp both the complexity and the emerging patterns in the evolution of the genetic code.


How Did The Genetic Code Evolve?

Physicists and mathematicians on one end, and chemists and biologists on the other, have different ideas about what the smallest building blocks of matter are. Physicists often theorize about the existence of some particles and then spend decades looking for them. Sometimes they find them; sometimes they explain them into existence using complicated math. Everyone else seems to be happy with breaking matter down to the level of atoms while keeping track of the electrons for practical purposes. Even the nuclear physicists who built the atomic bomb(s) for scientific and deterrence purposes (with occasional real-life use) limited their thinking to the level of electrons, protons and neutrons. However, there are curious minds for whom that is not enough. They want to know what particles are smaller than the electrons, what particles make up the protons and neutrons, and, of course, what particles make up the particles that make up the building blocks of atoms.

The author of this text believes that there might be some psychology and especially personality traits involved in the difference in approach to examining the nature of things. Those who are firmly grounded in reality think at the level of atoms and molecules, while others are on the “slippery slope”. This is a quote from a medical school dean who shall not be named here. Medicine pays; theoretical physics… not so much.

Looking from the standpoint of a biologist who has always been bad at math and therefore couldn’t delve much into physics, one has to accept the descriptive explanations of the topics that normally require a lot of math. The currently accepted theory of the origin of the universe is that in the beginning there was nothing, and since there was nothing, there was no time. This was explained in the famous book “A Brief History of Time” by Stephen Hawking (1). At some point in time (that did not exist), an infinitely small and infinitely dense point/dot/sphere of something spontaneously occurred and then exploded, creating all there is. The explosion also marked the beginning of time. Everything was too hot for life as we know it, so it took some time before it stopped raining rocks and the inorganic molecules were able to exist in liquid or solid form. Lightning and possibly radioactive bombardment of the young Earth gave rise to more complex organic molecules. This hypothesis is based on the famous Miller experiment that is usually taught in the first biology class in high school. There have been repetitions of the experiment with more amino acids created than reported in the original experiment (2). Well, we have now arrived at the level of molecules. There should be no “slippery slope” here. Things should be easier to understand from now on… right?

Even if we limit our scientific curiosity to this level of matter, we are faced with a difficult problem that evolutionary biology tackles with (thought) experiments and computer simulations. The building blocks of life are amino acids. Each one is specified by a code made of three letters, where the letters represent nucleotides. The age-old question is: how did the code come to be, and how did the molecule that carries that code come to be? It seems that both the language and the language carrier had to occur at the same time! We could think about this problem as if we needed an Android app and also a phone where that app would run. The comparison is more accurate than it may seem at first. Computer programs are complicated, but the path to the production of a functional protein is complicated as well. Folding is not random. The three-dimensional structure is “engraved” in the code, and there are many molecular players involved in the complicated processes of transcription and translation of the information into the final product.
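To make the “three letters per amino acid” idea concrete, here is a minimal Python sketch of reading a coding sequence with the standard genetic code. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are made up for this illustration, and the short input sequence is invented. The sketch only does the table lookup; it does not model the real translation machinery (ribosomes, tRNAs and so on).

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
# One letter per amino acid; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets (TTT, TTC, TTA, ...) to its amino acid.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Read a DNA or RNA coding sequence three letters at a time.

    Stops at the first stop codon and ignores an incomplete trailing codon.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCGACGTCTAA"))
```

With the invented input above, translate("ATGGCCGACGTCTAA") reads ATG, GCC, GAC, GTC and then a stop codon, giving "MADV". The table has 64 entries for only 20 amino acids plus stop, which is the redundancy that several of the theories discussed in this post try to explain.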

Comparing Hypotheses: RNA World vs. Protein World

There is an explanation for that. We are now looking at the biology of modern life. However, primitive life may have been all RNA-based. RNAs can store information (in RNA viruses such as influenza (3)), act as enzymes (in ribozymes (4)), act as gene regulators (in riboswitches (5)), and possibly perform other functions. The sugar in RNA differs from the one in DNA by only one OH group, so it was feasible to postulate RNA’s antiquity and possible subsequent transformation into DNA. The RNA World hypothesis was introduced by Crick in 1968, some fifteen years after he and Watson (and the rarely mentioned Rosalind Franklin) came up with the DNA structure (6). He didn’t call it the “RNA World Hypothesis”, though. The name came later. There are disagreements about which came first: proteins or nucleic acids. However, the RNA World Hypothesis does not preclude the existence of proteins and their possible functions in the world of biology that precedes ours. This is assuming that there has been some switch in how biology works, of course. The weaknesses of the RNA World hypothesis are (7):

  1. RNA is too complex to appear on its own,
  2. RNA is too unstable,
  3. Catalysis is not a common feature of long RNA molecules,
  4. The number of catalytic reactions that RNA can perform is limited.

Although these weaknesses are real, the proponents of the RNA World hypothesis have more or less applicable explanations for each of them. It would take too long to discuss every issue listed here in detail, but the interested reader is welcome to learn more from (7).

What does the opposing protein hypothesis say? The protein hypothesis argues that peptides are easier to synthesize (8), that they too can perform catalytic functions and that the early peptides/proteins were flexible, thus eliminating the need to explain the complexity of the 3D protein structures that we see today (9). In addition, efforts have been made to explain the gradual appearance of complexity in modern proteins (10). To develop the protein hypothesis further, Ikehara proposed the GADV hypothesis (11). It states that the first peptides were composed of only four amino acids (glycine, alanine, aspartic acid and valine) that happen to have the right combination of properties (hydropathy and tendencies to form alpha helices and beta sheets), so the resulting peptides are stable and potentially functional. The experimental evidence for these claims is limited.

The supporters of the protein hypothesis accept/speculate that the initial primitive life was peptide-based. The RNA World came later.

Major Hypotheses on the Evolution of the Genetic Code

The DNA/protein machinery (with RNA as its integral part) is the reality that we can observe, so the question remains: “How did the DNA code come into existence?” Crick called this a “difficult problem”. There are several theories that try to address this problem (12).

1. The stereochemical hypothesis

This theory is the oldest. As the name implies, codons are assigned to an amino acid based on stereochemistry. Different combinations of nucleotides bind amino acids differently. One can’t help but observe that the interaction is not so simple in the modern transcription/translation machinery.

2. The coding coenzyme handle hypothesis

This is a rather complicated hypothesis that links the proto-tRNA with individual amino acids that originally could have acted as catalysts. According to this hypothesis, ribozymes gained more catalytic power and/or diversity from amino acids, hence the benefit of having a diverse genetic code.

3. The four column theory

The theory proposes that the catalytic amino acids were the first to enter the code.

4. The co-evolution theory

Since it is unlikely that all 20 amino acids were formed by random chemical processes, some of them were made through biosynthesis, hence “co-evolution”. Recent experiments go against this theory (2).

5. The error minimization theory

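At the heart of this theory is the principle of optimization of codon assignments with respect to point mutations. The result is a more robust code, in which a single-nucleotide error is less likely to change the encoded amino acid.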

6. The frozen accident theory

The frozen accident theory was proposed by Crick. The theory states that what we see now is “frozen” (i.e., fixed, because any further change would be too harmful) and that it got to that point by chance.
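The robustness that the error minimization theory (number 5 above) refers to can be looked at directly in the standard code. The Python sketch below is only an illustration under stated assumptions: the names BASES, AMINO_ACIDS, CODON_TABLE and synonymous_changes_by_position are invented here, “robust” is reduced to the crude question of whether a single-nucleotide change keeps the same amino acid, and stop codons are skipped as starting points. Published analyses use finer measures, such as the chemical similarity of the amino acids involved.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_changes_by_position():
    """Count single-nucleotide changes that keep the amino acid unchanged.

    Returns two lists indexed by codon position (0, 1, 2): the number of
    synonymous changes and the total number of changes tried.
    Stop codons are not used as starting codons.
    """
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                total[position] += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous[position] += 1
    return synonymous, total


if __name__ == "__main__":
    synonymous, total = synonymous_changes_by_position()
    for position in range(3):
        print(f"codon position {position + 1}: "
              f"{synonymous[position]} of {total[position]} changes are silent")
```

For the standard code this gives 8, 0 and 126 silent changes out of 183 at the first, second and third codon positions, so most of the tolerance to point mutations sits in the third letter. The count by itself does not say whether this pattern was selected for, inherited from an earlier code, or arrived at by chance, which is exactly where the theories listed above disagree.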

Comparative Analysis and Observations

When compared side by side, existing hypotheses on the evolution of the genetic code share common assumptions but diverge in how they account for structural constraints and error minimization. While some models emphasize chemical affinities or co-evolution with metabolic pathways, others rely on historical contingency or selective optimization. Taken together, these frameworks explain certain features of the genetic code, but none fully account for its universality and robustness without invoking additional assumptions.

Implications for Molecular Evolution

A curious reader interested in finding out how the genetic code came to be would probably look for answers to the following questions:

  1. How did the first nucleobases and nucleotides arise? Current explanation: random chemistry, as in the Miller experiment.
  2. How did the first RNA get synthesized? There is no real explanation (7).
  3. How did the first RNA gain the functions necessary to perform all the processes necessary for life? RNA catalytic activity is limited (7).
  4. If RNA was enough for life, why invent the triplet genetic code required for yet another molecule that will need an adapter molecule to perform the same function of preserving life as RNA? If it is true that there were first proteins performing all the functions, what was the driving force for the fundamental switch of biology to RNA and then subsequently to DNA/RNA/protein machinery?
  5. Once the triplet codon is somehow formed, why are there so many theories about how it could have changed over time? They can’t all be right, but they could possibly all be wrong. Are we sure that the codons have changed over time? We share codons with bacteria, so if the code had changed, it must have been frozen at the stage of the hypothetical elusive LUCA, as Crick theorized.

Conclusion and Open Questions

Evolutionary biology doesn’t seem to be an exact science. With enough education in the life sciences, one should be able to analyze the problem and look for available data in order to (at least) see the direction from which the potential answers to these fundamental questions in biology may come. Biologists understand that the current DNA/RNA/protein interplay is too complicated for a spontaneous occurrence, so they hypothesize a world with simpler molecules and simpler biochemistry. However, they don’t agree on the nature of that world (protein or RNA). Furthermore, they hypothesize multiple switches of fundamental biological processes from protein over RNA to the DNA/RNA/protein machinery. There are at least six theories that attempt to explain the evolution of the DNA codon. There is not much research on how the first triplet code was formed. The current reductionist approach always starts from the assumption that in the beginning everything was dead and simple and that everything has a natural, spontaneous tendency to get complicated and evolve despite the opposing force of entropy. This is open for debate.

References:

  1. Hawking, S. A brief history of time: From the big bang to black holes. (Ishi Press International, 2020).
  2. Parker ET, Cleaves HJ, Dworkin JP, Glavin DP, Callahan M, Aubrey A, Lazcano A, Bada JL. Primordial synthesis of amines and amino acids in a 1958 Miller H2S-rich spark discharge experiment. Proc Natl Acad Sci U S A. 2011 Apr 5;108(14):5526-31. doi: 10.1073/pnas.1019191108. Epub 2011 Mar 21. PMID: 21422282; PMCID: PMC3078417.
  3. Jensen S, Thomsen AR. Sensing of RNA viruses: a review of innate immune receptors involved in recognizing RNA virus invasion. J Virol. 2012 Mar;86(6):2900-10. doi: 10.1128/JVI.05738-11. Epub 2012 Jan 18. PMID: 22258243; PMCID: PMC3302314.
  4. Janzen E, Blanco C, Peng H, Kenchel J, Chen IA. Promiscuous Ribozymes and Their Proposed Role in Prebiotic Evolution. Chem Rev. 2020 Jun 10;120(11):4879-4897. doi: 10.1021/acs.chemrev.9b00620. Epub 2020 Feb 3. PMID: 32011135; PMCID: PMC7291351.
  5. Tabuchi T, Yokobayashi Y. Cell-free riboswitches. RSC Chem Biol. 2021 Aug 4;2(5):1430-1440. doi: 10.1039/d1cb00138h. PMID: 34704047; PMCID: PMC8496063.
  6. Crick FH. The origin of the genetic code. J Mol Biol. 1968 Dec;38(3):367-79. doi: 10.1016/0022-2836(68)90392-6. PMID: 4887876.
  7. Bernhardt HS. The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others)(a). Biol Direct. 2012 Jul 13;7:23. doi: 10.1186/1745-6150-7-23. PMID: 22793875; PMCID: PMC3495036.
  8. Milner-White EJ. Protein three-dimensional structures at the origin of life. Interface Focus. 2019 Dec 6;9(6):20190057. doi: 10.1098/rsfs.2019.0057. Epub 2019 Oct 18. PMID: 31641431; PMCID: PMC6802138.
  9. Pohorille A, Wilson MA, Shannon G. Flexible Proteins at the Origin of Life. Life (Basel). 2017 Jun 5;7(2):23. doi: 10.3390/life7020023. PMID: 28587235; PMCID: PMC5492145.
  10. Caetano-Anollés D, Kim KM, Mittenthal JE, Caetano-Anollés G. Proteome evolution and the metabolic origins of translation and cellular life. J Mol Evol. 2011 Jan;72(1):14-33. doi: 10.1007/s00239-010-9400-9. Epub 2010 Nov 17. PMID: 21082171.
  11. Ikehara K. [GADV]-protein world hypothesis on the origin of life. Orig Life Evol Biosph. 2014 Dec;44(4):299-302. doi: 10.1007/s11084-014-9383-4. Epub 2015 Jan 16. PMID: 25592392; PMCID: PMC4428654.
  12. Kun Á, Radványi Á. The evolution of the genetic code: Impasses and challenges. Biosystems. 2018 Feb;164:217-225. doi: 10.1016/j.biosystems.2017.10.006. Epub 2017 Oct 12. PMID: 29031737.