Reading up on this subject got me thinking: If DNA undergoes damage and repair at transcription time (when genes are being expressed), shouldn't highly expressed genes differ in mutation rate from rarely expressed genes? (But in which direction?) Also: Does one strand of highly expressed DNA (the strand that gets transcribed) mutate or get repaired at a different rate than the other strand?
We know that in most organisms there is quite an elaborate repair apparatus dedicated to fixing DNA glitches at transcription time. (This is the so-called Transcription Coupled Repair System, or TCRS.) We also know that TCRS works preferentially on the template strand of DNA, the same strand RNA polymerase (RNAP) reads. In fact, it's when RNAP stalls at the site of a thymine dimer (or other major DNA defect) that TCRS kicks into action. Stalled RNAP is the trigger mechanism for TCRS.
But TCRS isn't the only repair option for DNA at transcription time. I've written before about the archaeal Ogg1 enzyme (which detects and snips out oxidized guanine residues from DNA). Ogg1 is part of a much simpler base excision repair (BER) system, fundamentally low-tech compared to the heavy-duty TCRS mechanism. The latter relies on nucleotide excision repair (NER), which means cutting sugars (deoxyribose) out of the DNA backbone and replacing a whole section of DNA (at great energy cost). BER starts by snipping out just the damaged base, leaving the underlying sugar in place for the later steps, and usually ends up replacing only that one nucleotide.
Being a fan of desktop science, I wanted to see if I couldn't devise an experiment of my own to shed light on the question: Does differential repair of DNA strands at transcription time lead to strand asymmetry in highly expressed genes?
[Image: Methanococcus maripaludis]
The DNA sequence data for Methanococcus maripaludis were easy to match up against the list of highly expressed genes (HEGs) obtained from http://genomes.urv.cat/HEG-DB. In fact, I was able to do all the data crunching with a few lines of JavaScript in the Chrome console. In no time, I had the adenine (A), guanine (G), and thymine (T) content for all of M. maripaludis's genes, which allowed me to make the following graph:
What we're looking at here is message-strand purine content (A+G) on the y-axis versus A+T content (a common phylogenetic metric; it is simply 1 minus G+C content) on the x-axis. As you know if you've been following this blog, I have used purine-vs.-AT plots quite successfully to uncover coding-region strand asymmetries. (See this post and/or this one for details.) The important thing to notice above is that while points tend to fall in a shotgun blast centered roughly at x=0.66 and y=0.55, the highly expressed genes (HEGs, in red) occupy the upper left quadrant of the shotgun blast.
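The post doesn't show the console code itself, so here is a minimal JavaScript sketch of the same kind of pipeline. It assumes each gene is available as a message-strand string and the HEG list as a set of gene IDs; the function names, input shapes, and toy sequences are hypothetical. Besides producing (x, y) points for the plot, it checks the "upper left" observation numerically by comparing the HEG centroid with the centroid of all genes and counting HEGs per quadrant. Using the all-gene mean as the centre (instead of eyeballing x=0.66, y=0.55) is an assumption of this sketch.

```javascript
// Minimal sketch (hypothetical names and input shapes, not the author's console code).
// genes: [{ id, seq }] with seq = message-strand (coding) sequence;
// hegIds: Set of gene IDs taken from a HEG list.

// Message-strand composition; non-ACGT characters (N, gaps) are skipped.
function baseComposition(seq) {
  const counts = { A: 0, C: 0, G: 0, T: 0 };
  let total = 0;
  for (const ch of seq.toUpperCase()) {
    if ("ACGT".includes(ch)) {
      counts[ch] += 1;
      total += 1;
    }
  }
  if (total === 0) throw new Error("sequence contains no A/C/G/T");
  return {
    x: (counts.A + counts.T) / total, // A+T (x-axis)
    y: (counts.A + counts.G) / total, // A+G, message-strand purines (y-axis)
  };
}

// Mean position of a set of points.
function centroid(points) {
  if (points.length === 0) throw new Error("no points");
  const sum = points.reduce((s, p) => ({ x: s.x + p.x, y: s.y + p.y }), { x: 0, y: 0 });
  return { x: sum.x / points.length, y: sum.y / points.length };
}

// Quadrant relative to a centre; ties go to "upper" and "right".
function quadrant(p, centre) {
  return `${p.y >= centre.y ? "upper" : "lower"} ${p.x < centre.x ? "left" : "right"}`;
}

// Compare HEGs with the whole distribution (all genes, HEGs included).
function summarize(genes, hegIds) {
  const points = genes.map(({ id, seq }) => ({ id, heg: hegIds.has(id), ...baseComposition(seq) }));
  const heg = points.filter((p) => p.heg);
  const centre = centroid(points);
  const counts = { "upper left": 0, "upper right": 0, "lower left": 0, "lower right": 0 };
  for (const p of heg) counts[quadrant(p, centre)] += 1;
  return { points, centre, hegCentre: heg.length ? centroid(heg) : null, counts };
}

// Toy input, made up for illustration (not M. maripaludis data).
const demo = summarize(
  [
    { id: "g1", seq: "ATGAAATTTAAATAA" },
    { id: "g2", seq: "ATGAATTTATTATGA" },
    { id: "g3", seq: "ATGAAGGAGAAATAG" },
  ],
  new Set(["g3"])
);
console.log(demo.centre, demo.hegCentre, demo.counts);
```

With real data, `genes` would come from the genome annotation and `hegIds` from the HEG-DB list; parsing those files is left out here. In the toy run, the single "HEG" (g3) lands in the upper left quadrant relative to the all-gene centroid.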
What does it mean? Consider the following. Of the four bases in DNA, guanine (G) is the most vulnerable to oxidative damage. Its main oxidation product, 8-oxoguanine, tends to pair with adenine during replication, so damage that is left uncorrected eventually results in a G-to-T transversion mutation. Each such mutation turns a G:C base pair into a T:A pair, so a large number of them will cause overall A+T to increase (shifting points on the above graph to the right). If G-to-T transversions accumulate preferentially on one strand, the strand in question will see a reduction in purine content (as G, a purine, is replaced by T, a pyrimidine) while the other strand will see a corresponding increase in purine content (via the addition of adenines to pair with the new T's). Bottom line: if G-to-T transversions happen on the message strand, points in the above graph will move to the right and down. If they happen on the template (or transcribed) strand, points will move to the right and up. What we see in this graph is that HEGs have gone up, which is the template-strand signature, and also left, which suggests they have accumulated fewer G-to-T transversions overall.
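The strand bookkeeping here is easy to get backwards, so a small simulation can check the directions. This is a sketch under simplifying assumptions (a made-up 24-base gene, a fixed number of transversions, hypothetical helper names), not an analysis of real data. A template-strand G-to-T change is written in message-strand letters as C-to-A, since the message base opposite a template G is C, and opposite the new T it becomes A after replication.

```javascript
// Toy simulation (hypothetical helper names): where do G-to-T transversions move a gene
// on a plot of message-strand A+G (y) versus A+T (x)?

// Message-strand composition, assuming the sequence contains only A, C, G, T.
function plotCoords(seq) {
  let a = 0, g = 0, t = 0;
  for (const ch of seq) {
    if (ch === "A") a += 1;
    else if (ch === "G") g += 1;
    else if (ch === "T") t += 1;
  }
  return { x: (a + t) / seq.length, y: (a + g) / seq.length };
}

// Apply k G->T transversions to one strand, written in message-strand letters.
// Message-strand hit: G becomes T. Template-strand hit: the template G sits opposite a
// message C; once the template G becomes T, replication puts A on the message strand (C -> A).
// The first k eligible sites are mutated; which sites are chosen does not affect composition.
function transversions(seq, strand, k) {
  const [from, to] = strand === "message" ? ["G", "T"] : ["C", "A"];
  const bases = seq.split("");
  let done = 0;
  for (let i = 0; i < bases.length && done < k; i++) {
    if (bases[i] === from) {
      bases[i] = to;
      done += 1;
    }
  }
  if (done < k) throw new Error(`only ${done} eligible sites for ${strand} strand`);
  return bases.join("");
}

// Change in plot position after k transversions on the given strand.
function shift(seq, strand, k) {
  const before = plotCoords(seq);
  const after = plotCoords(transversions(seq, strand, k));
  return { dx: after.x - before.x, dy: after.y - before.y };
}

const toyGene = "ATGGCCGCAGGCTGCAAGCGTTAA"; // 24 bases, made up for illustration
for (const strand of ["message", "template"]) {
  const { dx, dy } = shift(toyGene, strand, 3);
  console.log(`${strand}: dx=${dx.toFixed(3)} dy=${dy.toFixed(3)}`);
}
```

For the toy gene, three hits move x by +3/24 (0.125) whichever strand is hit, because every G-to-T transversion replaces a G:C pair with a T:A pair. The y shift is -0.125 for message-strand hits and +0.125 for template-strand hits, so it is the vertical direction that identifies which strand took the damage.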
The fact that highly expressed genes appear in the upper left quadrant of the distribution suggests that, yes, differential repair is happening at transcription time; that highly expressed genes are more intensively repaired; and that the beneficiary of said repair, at least in M. maripaludis, is the message strand (also called the RNA-synonymous or non-transcribed strand) of DNA, which is ultimately where our sequence data come from. A relative excess of unrepaired 8-oxoguanine on the template (transcribed) strand means guanines are being replaced by thymines on that strand, and new adenines are showing up opposite those thymines on the message strand, boosting A+G.
I can't think of many other explanations that are consistent with the above graph.
I hasten to add that one graph is just one graph. A single graph isn't enough to prove any kind of universal phenomenon. What we see here applies to Methanococcus maripaludis, an archaeal anaerobe that may or may not resemble other organisms in how it repairs DNA.