Friday, June 21, 2013

RNA Folding and Purine Loading

The other day I learned that an acquaintance of mine had done graduate work in a famous molecular genetics lab. We started "talking shop," and I happened to mention some of my recent bioinformatics forays, in particular my unexpected finding that the purine content of mRNA can be predicted from the G+C (guanine plus cytosine) content of the genome.

The purine (A+G) content of protein-coding regions of DNA correlates with the overall A+T content of the genome. The higher the A+T content of the double-stranded DNA, the higher the purine content of the single-stranded mRNA. A total of 260 bacterial genomes were analyzed for this graph. Organisms with very high A+T content tend to have relatively small genomes, which is one reason there is more scatter toward the right side of the graph. Correlation: r=0.852.
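For readers who want to try this kind of comparison themselves, here is a minimal sketch in Python of the quantities involved: the purine (A+G) fraction of a coding strand, the A+T fraction of a sequence, and a Pearson correlation between two lists of such values. The function names are my own for illustration, and this is not necessarily how the 260-genome graph was produced.

```python
from math import sqrt

def _count_bases(seq):
    """Count unambiguous bases (A, C, G, T, U), ignoring N and other symbols."""
    seq = seq.upper()
    return {b: seq.count(b) for b in "ACGTU"}

def purine_fraction(seq):
    """Fraction of A+G among the unambiguous bases of a coding strand."""
    c = _count_bases(seq)
    return (c["A"] + c["G"]) / sum(c.values())

def at_fraction(seq):
    """Fraction of A+T (or A+U for RNA) among the unambiguous bases."""
    c = _count_bases(seq)
    return (c["A"] + c["T"] + c["U"]) / sum(c.values())

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Given one A+T value per genome and one average coding-strand purine value per genome, pearson_r(at_values, purine_values) yields the kind of r statistic quoted in the caption.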

My friend asked what the implications of this might be. I offered a couple of thoughts. First, I said that just as differences in G+C content between genes in a given organism can sometimes be used to detect foreign genes (e.g., embedded phage/virus genes, horizontal gene transfers, etc.), variations in the purine-to-pyrimidine ratio of gene coding strands might also be a way to detect foreign genes. For example, in an organism like Clostridium botulinum, where the genome's coding regions have an average purine content of 58.5%, finding a gene with purine content below 46% (two standard deviations away from the mean) might be a tipoff that the gene came from a different organism. This is a useful new technique, because genes with high-purine-content coding regions don't always have high A+T content (thus, detection of horizontal gene transfers via purine loading will expose genes that would otherwise be missed on the basis of G+C content). In other words, two genes might have exactly the same G+C (or A+T) characteristics but differ in purine content. The difference in purine content would be the tipoff to a possible horizontal-gene-transfer event.
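The two-standard-deviation screen described above can be sketched roughly as follows. This is only an illustration: the function names and the dictionary input format are hypothetical, and a real analysis would need many genes and some care with very short ones, whose purine content is naturally noisy.

```python
from statistics import mean, stdev

def purine_fraction(seq):
    """Fraction of A+G among the A/C/G/T/U bases of a coding strand."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGTU")
    return (seq.count("A") + seq.count("G")) / total

def flag_purine_outliers(genes, n_sd=2.0):
    """Return names of genes whose purine fraction lies more than n_sd
    sample standard deviations from the mean over all supplied genes.

    genes: dict mapping gene name -> coding-strand sequence (hypothetical format).
    """
    fractions = {name: purine_fraction(seq) for name, seq in genes.items()}
    values = list(fractions.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []  # every gene has identical purine content; nothing stands out
    return sorted(name for name, f in fractions.items()
                  if abs(f - mu) > n_sd * sd)
```

A flagged gene is only a candidate for horizontal transfer; the screen says nothing about where the gene came from.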

Another implication of the A+G versus A+T relationship involves foreign RNA detection. Bacteria need to be able to detect self versus non-self nucleic acids. (Incoming phage nucleic acids need to be detected and destroyed; and in fact, they are. This is how restriction enzymes were discovered.) Messenger RNA has secondary structure: it undergoes folding, based on intrastrand regions of complementarity. The amount of complementarity depends on the relative abundances of purines and pyrimidines that can pair with one another. If a strand of RNA is mostly purines (or mostly pyrimidines, for that matter), it will have less opportunity to self-anneal than if purines and pyrimidines are equally abundant. Thus, the folding of RNA will be different in an organism with high genome A+T content (low G+C content) than in an organism with low A+T.

An example of how purine loading can affect folding is shown below. The graphic shows the minimum-free-energy folding of the mRNA for catalase in Staphylococcus epidermidis strain RP62A (left) and Pseudomonas putida strain GB-1 (right). The Staph version of this messenger RNA has a 1.28 ratio of purines to pyrimidines, whereas the Pseudomonas version has a 0.98 purine-pyrimidine ratio. As a result, the potential for purine-pyrimidine hydrogen bonding is considerably less in the Staph version of the mRNA than in the Pseudomonas version, and you can easily see this by comparing the two foldings. The one on the left has far more loops (areas where bases are not complementary) and complex branching structures. In the mRNA on the right, long sections of the molecule are able to line up to form double-stranded structures; loops are few in number, and small.
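A back-of-the-envelope way to see why the ratio matters: every standard RNA base pair (A-U, G-C, and the G-U wobble) joins one purine to one pyrimidine, so the scarcer class caps how many bases can be paired at all. The sketch below computes that ceiling from the purine-to-pyrimidine ratio alone. It is a pure counting bound with a function name of my own choosing; it ignores base identity, sequence order, and folding energetics, so real foldings will pair far fewer bases than this.

```python
def max_paired_fraction(purine_pyrimidine_ratio):
    """Upper bound on the fraction of bases that can sit in purine-pyrimidine
    pairs, given only the purine:pyrimidine ratio of the strand."""
    purines = purine_pyrimidine_ratio / (1.0 + purine_pyrimidine_ratio)
    pyrimidines = 1.0 - purines
    # Each pair uses one purine and one pyrimidine, so the rarer class is limiting.
    return 2.0 * min(purines, pyrimidines)

# Ratios quoted in the text for the two catalase mRNAs.
staph_bound = max_paired_fraction(1.28)        # about 0.877
pseudomonas_bound = max_paired_fraction(0.98)  # about 0.990
```

By this count, at least about 12% of the bases in the Staph message must stay unpaired no matter how it folds, versus about 1% for the Pseudomonas message.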

The minimum-free-energy folding for two catalase mRNAs, one with high purine content (Staphylococcus, left) and one with lower purine content (Pseudomonas, right). Foldings were generated by [...].

This kind of difference can explain the ability of various strains of bacteria to reject infectious RNA from another strain's viruses (phage). Foreign RNA entering a cell will "look" foreign to the cell's endogenous complement of RNA nucleases, and based on this, host nucleases will quickly destroy the intruder RNA. This mechanism provides a primitive kind of immune system for bacteria.

There is one other important implication of the purine-loading curve. The curve resolves one long-standing open question in molecular biology, having to do with mutation rates. I'll talk about it in tomorrow's post. Please join me then—and bring a biologist-friend!