The Language of DNA

One of the striking things about the genetic code is the remarkable way it twists back on itself, combining redundancy and utility in a simple, elegant language. Many of us are introduced to the basic concept in school, but that introduction often leaves out the wrinkles — some of them newly discovered — which give the system its resilience and precision. Despite their complexity, most of these tricks are pretty easy to explain with linguistic analogies, which is precisely what I’m going to try in this post.

Four letters make up the genetic alphabet: A, T, G, and C. In one sense, a gene is nothing more than a sequence of those letters, like TTGAAGCATA…, which has a certain biological meaning or function. But what makes a series of letters have meaning, what gives it a function? In the most straightforward case, it happens because a gene is translated into a protein, a tiny molecular machine. Proteins are made of amino acids, and each gene lists the amino acids that make up a specific protein. Since there are only four genetic letters but 20 different amino acids, the information in a gene is organized into three-letter words called codons; there are only 16 ways of combining four letters into two-letter words, but bumping up the length by a single letter creates 64 possible three-letter words — more than enough to have one for each amino acid. The molecular machinery of the cell assembles a protein by reading through the appropriate gene on a strand of DNA, and stringing together the amino acids that match the codons.
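The arithmetic above is easy to check for yourself. Here is a minimal Python sketch, assuming the standard genetic code (the 64-character string below is the usual compact way of writing it, with `*` marking a stop sign). The names `CODON_TABLE` and `translate` are my own, and the example uses only the first nine letters of the sample sequence shown above. Reading straight off the DNA is a simplification of what the cell actually does.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every possible two-letter and three-letter word over the four-letter alphabet.
two_letter_words = ["".join(p) for p in product(BASES, repeat=2)]
codons = ["".join(p) for p in product(BASES, repeat=3)]

# Map each codon to the one-letter code of its amino acid.
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))


def translate(dna):
    """Read a DNA string three letters at a time and return the amino acids."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore an incomplete final codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # a stop sign ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(two_letter_words), len(codons))
print(translate("TTGAAGCAT"))
```

Two-letter words give only 16 combinations, too few for 20 amino acids, while three-letter words give 64.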

The beauty of the system emerges from the fact that there are 64 possible words but they only need 21 different meanings: 20 amino acids plus a stop sign. That creates the first layer of redundancy, since codons can be synonyms. Just as ‘cup’ and ‘glass’ mean (essentially) the same thing, two different codons can refer to the same amino acid; for example, GAG and GAA both mean ‘glutamic acid’. Synonymous codons offer some protection against mutation. If the last letter of a GAA in a gene happened to mutate into a G, the resulting protein would still get a glutamic acid at that point, since GAA and GAG are synonyms.
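To see the synonyms directly, one can group the 64 codons by meaning. This is a rough sketch that again assumes the standard genetic code; the variable name `synonyms` is just for illustration.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group codons by the amino acid (or stop sign) they refer to.
synonyms = defaultdict(list)
for letters, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    synonyms[amino_acid].append("".join(letters))

print(len(synonyms))   # number of distinct meanings
print(synonyms["E"])   # codons for glutamic acid
print(synonyms["*"])   # codons that act as a stop sign
```

In this table glutamic acid (one-letter code `E`) is spelled by exactly two codons, GAA and GAG, and the 64 words collapse into 21 meanings.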

Of course, synonymous codons don’t completely mask the effect of mutations. Continuing with the example above, if GAA mutated into GAC or GAT, the meaning of the codon would change. Instead of referring to glutamic acid, it would now refer to a different amino acid called aspartic acid. The change in amino acids could affect how the resulting protein functions, which could have consequences for the organism; it might not be able to taste a particular chemical, for example. The size of the effect would depend on what the protein normally does and on how different the new amino acid makes it, and that’s where another layer of redundancy comes in. The amino acids can be divided into groups based on important chemical properties, and codons that are similar (but not synonymous) often refer to different amino acids in the same group, that is, to amino acids with similar properties. Even though GAA/GAG and GAC/GAT refer to different amino acids, both are acidic, negatively charged members of the broader ‘polar’ group of amino acids, so the impact is less than if the switch had been to a completely different group. Our languages don’t need the same level of robustness as the genetic code, so there isn’t a similar redundancy (at least, not that I know of). It would be as though English were set up so that any typo in the word ‘cat’ would still be the name of a mammal (like ‘bat’) instead of something completely different (like ‘car’ or ‘hat’).
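Here is a small sketch of what happens to GAA under every possible one-letter typo. It assumes the standard genetic code and one common textbook grouping of amino acids (nonpolar, polar uncharged, acidic, basic). Other groupings exist, and acidic amino acids are often counted as part of a broader polar group, so treat `GROUPS` as an illustrative assumption. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One common grouping by side-chain chemistry (an assumption, not the only scheme).
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def single_letter_mutants(codon):
    """Yield every codon that differs from `codon` by exactly one letter."""
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:pos] + base + codon[pos + 1:]


def classify_mutation(codon, mutant):
    """Describe how far the mutant's meaning has drifted from the original."""
    old, new = CODON_TABLE[codon], CODON_TABLE[mutant]
    if new == old:
        return "synonymous"
    if "*" in (old, new):
        return "stop gained or lost"
    if GROUP_OF[new] == GROUP_OF[old]:
        return "same group"
    return "different group"


outcomes = {m: classify_mutation("GAA", m) for m in single_letter_mutants("GAA")}
for mutant, outcome in outcomes.items():
    print(mutant, CODON_TABLE[mutant], outcome)
```

Under these assumptions, one of the nine possible typos in GAA is silent (GAG), two swap in the chemically similar aspartic acid (GAC and GAT), and the rest land on a different group or a stop sign.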

There’s one more wrinkle, a remarkable trick to get some extra precision out of all this redundancy. In English, synonyms often have slightly different meanings (think of ‘eat’ and ‘dine’, for example), and it turns out that synonyms in the genetic code are also subtly different. A paper published in Science in 2013 (Goodman, Church and Kosuri; see the reference below) showed that, in bacterial genes, the codons at the beginning of a gene affect how strongly that gene is expressed. It’s as though someone described something as ‘a dime for twelve’ instead of ‘a dime a dozen’; the meaning is exactly the same, but the phrase has less of an impact. Swapping GAA for GAG in a gene doesn’t affect its meaning, but if the change happens near the start of the gene it can change the activity level. (For the more technically inclined: this seems to result from changes in the secondary structure of the mRNA.) It’s an amazing trick, wringing an extra touch of utility out of a system driven by the need for resilience. These processes, the translation of DNA into proteins, are central to the story of evolution, so it’s not surprising to discover that they’ve been exquisitely honed over the aeons. It is beautiful, though.

Ref
Goodman DB, Church GM, & Kosuri S (2013). Causes and effects of N-terminal codon bias in bacterial genes. Science, 342(6157), 475-479. PMID: 24072823