blogorrhea: mtDNA

Sunday, July 21, 2013

Why Mitochondrial DNA is Different

Most genomes that are high in A+T content (or low in G+C content) show a surprising DNA strand asymmetry: The message strand of genes tends to be rich in purines. This rule applies across all domains I've looked at except mitochondria, where message strands tend to be pyrimidine-rich rather than purine-rich. The following two graphs make this clearer.

This is a graph of message-strand (or RNA-synonymous-strand) purine content plotted vertically, against A+T plotted horizontally, for 1,373 bacterial species. Each dot represents a genome. High-GC/low-AT organisms like Streptomyces and Bordetella are on the left and low-GC/high-AT organisms like Clostridium botulinum are toward the right. The few dots on the far right are intracellular endosymbionts that have lost a good bit of DNA over the millennia. They tend to be extremely high in A+T.

Compare the above graph with the graph below, which is the same thing (message-strand A+G vs. A+T) for mitochondrial DNA (N=2543 genomes). There is still an upward slope to the data (and in fact it is steeper than it looks, because the range of y-values is different in the graph below than in the graph above). The slope of the regression line is very nearly the same (0.148 vs. 0.149) for both graphs. But you can see that in the graph below, nearly all the points are below y = 0.50. That means message-strands are high in pyrimidines rather than purines.
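For anyone who wants to try the slope comparison at home, here is a minimal Python sketch of a straight-line fit. It assumes ordinary least squares, which may or may not be how the regression lines in the graphs were produced, and the data points are invented purely for illustration (they are not values from either graph). The function name `fit_line` and the toy lists are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented toy data: x = genome A+T fraction, y = message-strand A+G fraction.
# Both clouds share a slope of 0.15 but sit at different heights.
TOY_X = [0.30, 0.50, 0.70]
TOY_HIGH_Y = [0.15 * x + 0.46 for x in TOY_X]  # mostly above y = 0.50
TOY_LOW_Y = [0.15 * x + 0.38 for x in TOY_X]   # entirely below y = 0.50

if __name__ == "__main__":
    for label, ys in (("high cloud", TOY_HIGH_Y), ("low cloud", TOY_LOW_Y)):
        slope, intercept = fit_line(TOY_X, ys)
        print(f"{label}: slope={slope:.3f} intercept={intercept:.3f}")
```

The toy data illustrates one thing only: two point clouds can have the same slope and still sit on opposite sides of y = 0.50, because the slope says nothing about the intercept.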

I speculated in a previous post that the reason mitochondrial DNA is pyrimidine-heavy on the message strand is that mtDNA encodes a very small number of proteins (13, in all), and they tend to be membrane-associated proteins, which use mostly non-polar amino acids. It turns out that codons for the non-polar amino acids are pyrimidine-rich.

To see if that's really what's going on, I obtained the DNA sequences for cytochrome-c oxidase and NADH dehydrogenase (the two most fundamental enzyme systems of mitochondria) from several hundred bacterial species. Actually, I was able to obtain DNA sequences for a total of 942 bacterial NADH dehydrogenase (subunit L) genes. I also succeeded in obtaining DNA sequences for 647 bacterial cytochrome-c oxidase subunit 1 genes. In mitochondria, these genes are known as ND5 and Cox1. In bacteria they're better known as nuoL and cyoB.

The graph below shows A+G for the two enzymes versus whole-chromosome A+T, for the relevant organisms.

Message-strand purine content was derived from the DNA sequences of cyoB (pink) genes from 942 bacteria, and from nuoL (blue) genes from 647 bacterial species. The A+G values were plotted against host-organism whole-genome A+T content. Both cyoB and nuoL sequences tended to be pyrimidine-rich, but pyrimidine content was lower in organisms with high A+T content. (Note the slightly positive slope of the regression line.)

The pink points are for cytochrome-c oxidase subunit 1 (cyoB) while the blue points are for NADH dehydrogenase subunit 5 (nuoL). Two things are worth noting. One is that the regression line is upward-sloping, meaning that as an organism's DNA gets richer in A+T content, the purine content on the message strand rises. This effect seems to be universal. The second thing to note is that almost all of the points in the graph lie below y = 0.5, as is the case for mitochondria. These two signature "mitochondrial" enzyme systems, critical to oxidative phosphorylation (in bacteria as well as higher organisms), do tend to use pyrimidine-rich codons, which makes the relevant genes pyrimidine-rich on the RNA-synonymous (message) strand of DNA. The hypothesis is upheld.

For you bio students, a bit of homework: You might want to think about why it is that membrane-associated proteins are rich in non-polar amino acids. (In human mitochondria, leucine and isoleucine are the most-used amino acids. Together they account for an amazing 30% of all amino acids used in mtDNA-encoded gene products.) Hint: Most membranes have a lipid bilayer, and lipids don't like water.

Saturday, July 20, 2013

More about Mitochondrial DNA

To recap my desktop-science experiments of the last month or so, I've found strandwise DNA asymmetry across domains, which is to say in bacteria, Archaea, eukaryotes, viruses, and mitochondrial DNA. In every case except mitochondria, the message (or RNA-synonymous) strand of DNA in coding regions tends to be purine-rich. The opposite strand tends to be pyrimidine-rich. Moreover, in all domains, including mitochondria, message-strand purine content increases in proportion to genome A+T content. (A+T content is a phylogenetic signature. Some genomes are inherently high in A+T content—or low in G+C content—while others are not. Related organisms tend to have similar A+T or G+C contents.)
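The two quantities in this recap are easy to compute for any sequence. The Python sketch below is an illustration under stated assumptions, not the script behind the graphs: the helper names (`base_fraction`, `reverse_complement`) and the toy sequence are invented, and ambiguous bases such as N are simply left out of the denominator.

```python
# Complement table for the four unambiguous bases; anything else is left as-is.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def base_fraction(seq, bases):
    """Fraction of the given bases among the A, C, G, T positions in seq."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(seq.count(b) for b in bases) / counted


# Invented toy message strand (not a real gene).
TOY_MESSAGE = "ATGGAAGGAAAAGCAGAATAA"

if __name__ == "__main__":
    opposite = reverse_complement(TOY_MESSAGE)
    print("message A+G :", round(base_fraction(TOY_MESSAGE, "AG"), 3))
    print("opposite A+G:", round(base_fraction(opposite, "AG"), 3))
    print("message A+T :", round(base_fraction(TOY_MESSAGE, "AT"), 3))
    print("opposite A+T:", round(base_fraction(opposite, "AT"), 3))
```

Because every purine pairs with a pyrimidine, the A+G fractions of the two strands always add up to 1. A purine-rich message strand therefore forces a pyrimidine-rich opposite strand. A+T, by contrast, is the same on both strands, which is why it works as a whole-genome property.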

Mitochondrial genes tend to be pyrimidine-rich on the message strand, seemingly in violation of the finding that in all other domains, message strands are purine-rich. The mitochondrial anomaly is actually very easy to understand (although it took me weeks to realize the explanation). In a nutshell: Mitochondrial DNA is pyrimidine-rich on message strands because mtDNA encodes only a few proteins (13, usually), all of them membrane-associated. Membrane-associated proteins are unusual because they tend to incorporate mostly non-polar amino acids such as leucine, isoleucine, valine, proline, alanine, and phenylalanine, all of which are specified by pyrimidine-rich codons.
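The codon claim can be checked with a few lines of Python. This is a rough sketch with two assumptions worth stating loudly: it uses the standard genetic code (mitochondria use variant codes that reassign a few codons), and it averages over synonymous codons without weighting by how often each codon is actually used in real genes. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, DNA alphabet, in the conventional TCAG ordering.
# Assumption: standard code only; mitochondrial variant codes differ slightly.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def pyrimidine_fraction_by_amino_acid():
    """Unweighted C+T fraction over all codon positions, per amino acid."""
    pyrimidines = {}
    positions = {}
    for codon, aa in CODON_TABLE.items():
        pyrimidines[aa] = pyrimidines.get(aa, 0) + sum(b in "CT" for b in codon)
        positions[aa] = positions.get(aa, 0) + 3
    return {aa: pyrimidines[aa] / positions[aa] for aa in pyrimidines}


if __name__ == "__main__":
    fractions = pyrimidine_fraction_by_amino_acid()
    for aa in "LIVPAF":  # the non-polar residues named in the text
        print(aa, round(fractions[aa], 3))
    for aa in "KED":     # charged residues, for contrast
        print(aa, round(fractions[aa], 3))
```

By this crude count, phenylalanine, proline, and leucine codons are well above half pyrimidine, isoleucine is slightly above, and valine and alanine come out at exactly half (their first base is G). All six have a pyrimidine in the second codon position. The charged residues lysine and glutamate, by comparison, have no pyrimidines at the first two positions at all.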

The mitochondrion.

It seems to me mitochondrial DNA shouldn't be thought of as a genome, because well over 90% of mitochondrial-associated gene products are encoded by genes in the host nucleus. (In humans, there may be as many as 1500 nuclear-encoded mitochondrial genes.) This point is worth repeating, so let me quote Patrick Chinnery, TRENDS in Genetics (2003) 19:2, 60:

The vast majority of mitochondrial proteins (estimated at >1000) are synthesized in the cytosol from nuclear gene transcripts.

The circular mitochondrial "chromosome" (if it can be called that) is the vestigial remnant of a much larger genome that long ago migrated to the host nucleus, no doubt to avoid oxidative attack. The mitochondrion simply is not a safe place to store DNA. (Would you set up a sperm bank in a rocket-fuel factory?) It's teeming with molecular oxygen, superoxides, peroxides, free protons, and other hazardous materials.

The human mitochondrial chromosome.

Human mitochondrial DNA (which is typical of a lot of mtDNA) encodes just a handful of multi-subunit transmembrane proteins, namely: cytochrome-c oxidase, NADH dehydrogenase, cytochrome-b, and an ATPase. That's it. There are no other protein genes in human mtDNA. All other "mitochondrial proteins" are encoded somewhere else. (That includes 37 out of 44 subunits of the NADH dehydrogenase complex; the DNA polymerase that replicates mitochondrial DNA; the mitochondrial RNA polymerase; about 50 ribosomal proteins; so-called "mitochondrial" catalase; and hundreds of other "mitochondrial" proteins. All are encoded in the nucleus.)

Bottom line: Mitochondrial DNA encodes a very small ensemble of highly specialized membrane-associated proteins. We shouldn't expect this small ensemble to be representative of other genes found in other genomes. (And it's not.) That, in a nutshell, is why mtDNA is not particularly purine-rich in message strands.

But we should test this hypothesis, if possible. (And it is, in fact, possible.) Most bacteria are aerobic, which means most bacterial species have genes for cytochrome-c oxidase, NADH dehydrogenase, etc. The DNA for those genes should be similar to mtDNA with respect to strand-asymmetric purine content. If we analyze bacterial DNA, we should find that genes for cytochrome-c oxidase, NADH dehydrogenase, etc. are pyrimidine-rich on the message strand, just as in mtDNA.
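In outline, the test could look something like the Python sketch below: read a set of gene sequences in FASTA format, compute the message-strand A+G fraction of each, and count how many fall below 0.5. Everything here is an assumption made for illustration. The function names are hypothetical, the two sequences are invented toys rather than real cyoB or nuoL genes, and the sketch assumes each record is already written in the message-strand orientation.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def purine_fraction(seq):
    """A+G fraction among the A, C, G, T positions of a message strand."""
    purines = seq.count("A") + seq.count("G")
    total = purines + seq.count("C") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return purines / total


def share_pyrimidine_rich(records):
    """Fraction of records whose message-strand A+G is below 0.5."""
    fractions = [purine_fraction(seq) for _, seq in records]
    if not fractions:
        raise ValueError("no records supplied")
    return sum(f < 0.5 for f in fractions) / len(fractions)


# Invented toy records, assumed to be in message-strand orientation.
TOY_FASTA = """>toy_gene_1 invented
ATGCTTCTCTTC
CTAATT
>toy_gene_2 invented
ATGAAAGAAGGAGCAAAG
"""

if __name__ == "__main__":
    records = list(read_fasta(io.StringIO(TOY_FASTA)))
    for header, seq in records:
        print(header, round(purine_fraction(seq), 3))
    print("share below 0.5:", share_pyrimidine_rich(records))
```

If the hypothesis holds, running this kind of tally over real bacterial cytochrome-c oxidase and NADH dehydrogenase genes should give a share close to 1.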

In tomorrow's post: the data.