Thursday, May 01, 2014

A Strange Codon Symmetry

Codons have a peculiar symmetry property that's not much discussed: if you look at codon usage in the protein-coding genes of an organism, every codon occurs at roughly the same frequency as its reverse-complement partner. For example, the codon GCT (alanine) has a reverse complement of AGC (serine). If GCT occurs at a high rate, AGC will occur at a similarly high rate. If one is low in frequency, the other will be low too. I've written about this before, but I want to return to it one more time, because it's bizarre and crazy and deserves an explanation, and I'm hard pressed to come up with one.
[Figure: Usage frequencies for codons in Ktedonobacter racemifer.]
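
For readers who want to check the pairing for themselves, here is a minimal Python sketch of the reverse-complement operation on a codon. The function name and the uppercase A/C/G/T alphabet are my own choices for illustration, not anything taken from a particular library.

```python
from itertools import product

# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

# The example from the text: GCT (alanine) pairs with AGC (serine).
print(reverse_complement("GCT"))

# Group all 64 codons into unordered codon/reverse-complement pairs.
codons = ["".join(p) for p in product("ACGT", repeat=3)]
pairs = {tuple(sorted((c, reverse_complement(c)))) for c in codons}
print(len(pairs))
```

No trinucleotide can be its own reverse complement (the middle base would have to pair with itself), so the 64 codons fall into 32 distinct pairs.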

This correlation tends to be highest in organisms with genomes that have a high G+C content and lowest for organisms that have low G+C content. The Pearson coefficient ranges from about 0.8 (high GC) to 0.3 (low GC).
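
A rough sketch of how such a coefficient could be computed is below. It assumes you already have a list of in-frame coding sequences as plain strings, and it correlates each of the 64 codon frequencies with the frequency of its reverse complement. Whether the figures quoted here were computed over all 64 codons or over the 32 distinct pairs is not stated, so treat the function names and that choice as assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def codon_frequencies(coding_sequences):
    """Relative in-frame codon frequencies over a list of coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS)  # skips codons with ambiguous bases
    return {c: counts[c] / total for c in CODONS}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def reverse_complement_correlation(freqs):
    """Correlate each codon's frequency with its reverse complement's."""
    xs = [freqs[c] for c in CODONS]
    ys = [freqs[reverse_complement(c)] for c in CODONS]
    return pearson(xs, ys)

# Toy input only; a real run would use every coding sequence in a genome.
toy = ["ATGGCTAGCTAA", "TTAGCTAGCCAT"]
print(reverse_complement_correlation(codon_frequencies(toy)))
```

The toy input is deliberately symmetric (the second sequence is the reverse complement of the first), so it says nothing about real genomes; it only exercises the calculation.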

I've never heard a good explanation for this phenomenon. I like to think aliens aren't to blame, though.

One possible explanation is that many protein genes originated, long ago, as antisense proteins, following a gene duplication event. If a protein is made from the plus strand of gene X and the corresponding antiprotein is made from the minus strand of the same gene (X'), the X' copy will have different amino acids from the normal copy (of course), but every codon in X will show up in X' as its reverse complement (assuming the reading frames line up), so the usage rates for a codon and its reverse complement will be the same.

Therefore, what we may be looking at is an echo (a reverse complement signal) from the distant past.
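
The bookkeeping behind that idea can be sketched in a few lines of Python. The sequence below is made up purely for illustration (it is not a real gene, and its minus strand is not a real open reading frame), and the sketch assumes the antisense reading frame lines up with the sense codon boundaries, which holds when the gene length is a multiple of three.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def in_frame_codons(seq):
    """Split a sequence into codons starting at position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

gene_x = "ATGGCTAGCGATTAA"                 # hypothetical plus-strand gene X
gene_x_prime = reverse_complement(gene_x)  # the same locus read as X'

plus_usage = Counter(in_frame_codons(gene_x))
minus_usage = Counter(in_frame_codons(gene_x_prime))

# Every codon used n times in X appears n times in X' as its reverse complement.
mirrored = all(
    minus_usage[reverse_complement(codon)] == n
    for codon, n in plus_usage.items()
)
print(mirrored)
```

If the antisense frame were shifted by one or two bases, this exact mirroring would no longer hold, so the argument leans on that frame assumption.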

Another possibility is that antisense regions become active in the process of gene duplications. Suppose a gene (Gene1) gets copied, but the copy (Gene2) gets clipped during the duplication. In its new location on the genome, Gene2 will exist without its original stop codon. The nearest naturally occurring stop codon may be 60 base pairs downstream. But underneath that 60-bp region might be another gene (Gene3) on the opposite strand. Ultimately, the copied gene overlaps the gene underneath it by 60 bp:

Gene1 Gene2
----> ---->

This is called a convergent overlap, and such overlaps are common in nature. They're seldom longer than 100 base pairs (and are often just a few base pairs long). Divergent overlaps also occur.
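
To make the geometry concrete, here is a small sketch that classifies the overlap between two annotated genes. The `Gene` record, the 1-based inclusive coordinates, and the example positions are all hypothetical; they are chosen only so that the Gene2/Gene3 case reproduces the 60-bp overlap described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"

def classify_overlap(a, b):
    """Return (kind, length_in_bp) for two genes, or None if they don't overlap."""
    left, right = sorted((a, b), key=lambda g: (g.start, g.end))
    length = min(left.end, right.end) - right.start + 1
    if length <= 0:
        return None
    if left.strand == right.strand:
        kind = "same-strand"
    elif left.end >= right.end or left.start == right.start:
        kind = "nested"
    elif left.strand == "+":
        kind = "convergent"   # ---> <--- : the 3' ends overlap
    else:
        kind = "divergent"    # <--- ---> : the 5' ends overlap
    return kind, length

# Hypothetical coordinates: Gene2 runs 60 bp into Gene3 on the opposite strand.
gene2 = Gene("Gene2", 1000, 1500, "+")
gene3 = Gene("Gene3", 1441, 2000, "-")
print(classify_overlap(gene2, gene3))
```

Running something like this over every adjacent gene pair in an annotation would give a rough count of how much sequence is read on both strands.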

Any kind of overlap will mean that an "antisense" signal will enter the codon pool.

One thing is for sure: The correlation between occurrence rates of codons and their reverse-complements is too strong to be due to chance.

A dramatic example of the kind of symmetry I'm talking about comes by way of a monster bacterium known as Ktedonobacter racemifer DSM 44963, which is a monster in the sense of sheer genome size: It has a 13.6-million-base-pair genome encoding a whopping 11,540 putative genes (plus 1,178 pseudogenes). The codon usage frequencies are shown in the graphic above. Each codon is shown with the corresponding reverse-complement codon. The length of the bars corresponds to overall usage frequency (across all proteins in the genome). Frequencies for codon/reverse-complement pairs correlate strongly (r=0.799), too strongly to be by chance (p < 0.001).

Out of curiosity, I checked the codon frequencies for yeast (Saccharomyces cerevisiae strain DBVPG1106), and the same sort of relationship (though not as strong) occurs:

Again, to a first approximation, the frequency of a codon is dictated by the frequency of its reverse-complement cousin. It's interesting that the relationship is still apparent even though Saccharomyces has a coding-region GC content of just 37.4%.


  1. Anonymous, 8:31 AM

    Where are these complementary codons in relation to one another? I would wonder whether the symmetry has to do with the maintenance of the 3D double helix. E.g., for every convexity, there's a concavity in the complement strand. Perhaps the shape these nucleotide arrangements take is important in this tendency towards primary sequence symmetry. Highly interesting, thanks!

    1. They're all over the genome, apparently. Otherwise the phenomenon would not be as apparent as it is. If it were just due to small overlaps (convergent stop regions, say) it would not show up as a genome-wide phenomenon. To influence the codon frequencies globally means it's happening globally. Whether it has to do with supercoiling or other 3D phenomena, I have no idea.

    2. Anonymous, 10:37 AM

      What I meant was whether complementary trinucleotides are occurring in closer relation to one another, e.g., within so many hundreds of base pairs of one another or at consistent intervals. We've got some preliminary evidence that there are complementary pyrimidine/purine waves that seem to occur throughout genes. And each dinucleotide pairing has a characteristic twist which, cumulatively, dictates the conformation of the local double helix, even though this evens out over thousands of base pairs. In other words, maybe there is a conserved local shape to the DNA, in terms of curvature, that can be seen in trends of complementary trinucleotides. For every To there is a Fro.

  2. In both examples you show the rule is clearly broken for stop codons. Can you account for these "exceptions"?

    1. You're right, stop codons appear to be exceptions, particularly in yeast (a low-GC organism). I can't account for why certain exceptions exist but not others. Your guess is as good as mine.

    2. Anonymous, 10:03 AM

      Stop codon frequencies are bounded by the number of proteins, so maybe they're an outlier because of that.

    3. Agreed, although I personally believe pseudogenes are vastly underreported, and this could bear on the results.
