Friday, June 13, 2014

Thermometer Genes

Heat shock proteins are an interesting class of proteins that provide "damage control" for enzymes when temperatures rise to the point where proteins start to unfold and refold improperly. Protein 3-dimensional structure is critical to proper enzyme function, and it doesn't take much thermal jostling to mess up a protein's structure. Therefore it's not surprising cells have their own miniature repair factories for refolding heat-misfolded proteins.

Collectively, heat shock proteins are part of a group of proteins known as chaperones, some of which are bona fide refoldases and others of which aid proteins in other ways. (For example, the ClpB protein rescues proteins from an aggregated state.)

GroEL is a so-called type I chaperonin involved in protein folding, assembly, and transport. Like many heat shock proteins, it's over-expressed at high temperatures and plays a critical role in growth and survival at non-permissive temperatures. Because of its importance in many cellular processes, GroEL is ubiquitous in bacteria, with most species having a single GroEL gene, but with about 30% of genomes having two or more GroEL copies.

[Figure: a low-energy secondary structure into which the GroEL mRNA of Mycobacterium intracellulare can fold.]

The question of how cells up-regulate heat shock proteins during times of thermal stress is still largely open, although we know that in some cases specific transcription regulator proteins are involved (but then the question becomes: how do the regulators know to up-regulate in times of heat stress?). The answer might not be that difficult. The messenger RNAs encoding GroEL and other heat shock proteins contain a great deal of secondary structure (that is to say, the RNA folds back on itself to form thermally sensitive structures). Recently, Wan et al. surveyed RNA thermal sensitivity in yeast and found thousands of so-called "mRNA thermometers": RNA molecules that unfold in response to heat. About three quarters of yeast RNA is thermo-stable at 37 degrees C, while around 55% of RNAs are unfolded at 55 degrees C. In the folded state, RNA probably requires the help of helicases or other "helpers" to unfold, but at a high enough temperature, the molecules unfold by themselves and become eligible for translation by ribosomes.

To investigate the possible role of mRNA secondary structure in GroEL regulation, I wrote scripts that check a gene for all occurrences of length-9 (so-called "9-mer") nucleotide sequences that have a corresponding reverse-complement sequence in the same gene. When I checked the GroEL gene of Mycobacterium tuberculosis (Erdman strain), I found 14 pairs of complementary 9-mers, representing regions of the gene that could, in theory, cause secondary structure to form in mRNA. A check of the sister organism M. intracellulare (whose GroEL gene is 80% identical to the M. tuberculosis version) showed 22 such complementary pairs.
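
The scripts themselves aren't reproduced here, but the 9-mer search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (reverse_complement, find_complementary_kmers) are hypothetical, the input is assumed to be an uppercase DNA string containing only A, C, G, and T, and a pair is reported only when the two 9-mers do not overlap. How the original scripts treated overlapping or repeated matches is not stated, so counts from this sketch need not match the 14 and 22 pairs quoted above.

```python
from collections import defaultdict

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_complementary_kmers(gene, k=9):
    """List (start_i, start_j, kmer) where the k-mer at start_i has its
    reverse complement at start_j, further along the same gene.

    Assumption of this sketch: the two k-mers must not overlap
    (start_j >= start_i + k), so each pair is reported exactly once.
    """
    gene = gene.upper()
    # Index every k-mer by the positions at which it starts.
    positions = defaultdict(list)
    for i in range(len(gene) - k + 1):
        positions[gene[i:i + k]].append(i)

    pairs = []
    for kmer, starts in positions.items():
        partner = reverse_complement(kmer)
        for i in starts:
            for j in positions.get(partner, ()):
                if j >= i + k:
                    pairs.append((i, j, kmer))
    return sorted(pairs)


# Toy example: a 9-mer, a 5-base spacer, then the 9-mer's reverse complement.
toy_gene = "ACGTTGCAA" + "GGGGG" + "TTGCAACGT"
print(find_complementary_kmers(toy_gene))  # [(0, 14, 'ACGTTGCAA')]
```

Indexing the k-mers in a dictionary keeps the search roughly linear in gene length, which matters little for a single GroEL gene but helps when scanning whole genomes.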

Interestingly, the mutational differences between GroEL in M. tuberculosis and M. intracellulare do not appear to be randomly distributed along the gene. In M. tuberculosis, mutations occur at a rate of 0.12698 substitutions per site inside 9-mer regions (putative stems) versus a rate of 0.19722 for non-9-mer regions, indicating that (perhaps) selection pressure is different for self-complementing regions than for other regions. I found much the same thing in M. intracellulare, where the mutation rate was 0.16414 inside 9-mers and 0.19347 elsewhere.
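
For readers who want to try this kind of comparison, here is a rough Python sketch of how substitutions per site might be tallied inside and outside the 9-mer regions. It rests on assumptions not spelled out in the text: the two gene sequences are already aligned and of equal length with no gaps, and the 9-mer start positions come from one of the two genes (which would explain why each organism gets its own pair of rates). The function name region_rates is hypothetical.

```python
def region_rates(seq_a, seq_b, stem_starts, k=9):
    """Return (rate_inside, rate_outside) in substitutions per site.

    seq_a, seq_b : aligned sequences of equal length (no gaps assumed)
    stem_starts  : start positions of the k-mers treated as putative stems
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Mark every position covered by at least one putative stem k-mer.
    in_stem = [False] * len(seq_a)
    for start in stem_starts:
        for pos in range(start, min(start + k, len(seq_a))):
            in_stem[pos] = True

    # counts[flag] = [substitutions, sites] for stem (True) / non-stem (False).
    counts = {True: [0, 0], False: [0, 0]}
    for a, b, flag in zip(seq_a, seq_b, in_stem):
        counts[flag][1] += 1
        counts[flag][0] += (a != b)

    def rate(subs, sites):
        return subs / sites if sites else float("nan")

    return rate(*counts[True]), rate(*counts[False])
```

Passing the 9-mer positions found in the other organism's gene would give the second pair of rates for the same alignment.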

Tending to confirm that selection pressure is different for the "secondary structure" regions versus other regions is the (surprising) finding that in M. tuberculosis complementary regions, the ratio of non-synonymous to synonymous mutations (Kn/Ks) is 0.526, versus 0.950 for other regions. In M. intracellulare, likewise, Kn/Ks is lower in complementing regions (0.635) than in non-complementing regions (0.975).
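
The post does not say which estimator produced these Kn/Ks values, so the following Python sketch should be read as one plausible way to get numbers of this kind, not as the method actually used. It counts synonymous and non-synonymous sites loosely in the spirit of Nei-Gojobori counting, with two loud simplifications: codon pairs that differ at more than one position are skipped, and no correction for multiple hits is applied. The name kn_ks is hypothetical, and the sequences are assumed to be in-frame, aligned, uppercase DNA.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites in a codon: at each position, the fraction of the
    three possible single-base changes that leave the amino acid unchanged."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1 / 3
    return total


def kn_ks(seq_a, seq_b):
    """Return (Kn, Ks) as uncorrected proportions of differences per site.

    Simplification of this sketch: codons differing at two or three
    positions are skipped rather than resolved along mutational pathways.
    """
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue  # ambiguous bases or gaps
        ndiff = sum(x != y for x, y in zip(ca, cb))
        if ndiff > 1:
            continue
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if ndiff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    kn = nonsyn_diffs / nonsyn_sites if nonsyn_sites else float("nan")
    ks = syn_diffs / syn_sites if syn_sites else float("nan")
    return kn, ks
```

To compare stems with the rest of the gene, one would call kn_ks separately on the codons that fall inside the 9-mer regions and on those that fall outside, then take Kn/Ks for each (guarding against Ks being zero in short regions).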

To check whether these observations apply only to Mycobacterium or might be more widely applicable, I took a look at GroEL genes in Clostridium acetobutylicum strain ATCC 824 and Clostridium lentocellum strain DSM 5427. The Clostridia are phylogenetically quite distant from Mycobacteria (as confirmed by the fact that their GroEL genes share only 50% nucleotide sequence identity). A total of 462 mutations separated the two Clostridial genes. But again, the mutations segregated non-randomly according to whether they occurred in putative regions of complementarity (secondary structure) as opposed to non-complementing regions. In C. acetobutylicum the mutation rate in complementing regions was 0.23016 substitutions per site (29/126 bases) versus 0.28618 (433/1513 bases) for non-complementing regions, while in C. lentocellum the rates were 0.19048 (24/126) substitutions per site vs. 0.28949 (438/1513). For C. acetobutylicum the Kn/Ks ratios were 0.277 in 9-mers and 1.466 otherwise. For C. lentocellum, Kn/Ks was 0.538 in 9-mers and 1.415 outside 9-mers, tending to confirm that selection pressures are different in stems than in loops.
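
As a quick arithmetic check, the Clostridial per-site rates can be recomputed from the raw counts given above (substitutions over bases). The short Python snippet below does just that; the dictionary layout and the name per_site_rates are illustrative choices, and only the counts themselves come from the text. Note that in both species the stem and non-stem substitutions add up to the 462 total.

```python
# (substitutions, bases) for putative stem regions and for the rest of the gene.
CLOSTRIDIUM_COUNTS = {
    "C. acetobutylicum": {"stem": (29, 126), "other": (433, 1513)},
    "C. lentocellum": {"stem": (24, 126), "other": (438, 1513)},
}


def per_site_rates(counts):
    """Convert (substitutions, sites) pairs into substitutions per site."""
    return {
        species: {region: subs / sites for region, (subs, sites) in regions.items()}
        for species, regions in counts.items()
    }


for species, rates in per_site_rates(CLOSTRIDIUM_COUNTS).items():
    ratio = rates["stem"] / rates["other"]
    print(f"{species}: stem {rates['stem']:.5f}, "
          f"other {rates['other']:.5f}, stem/other {ratio:.2f}")
```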

Bottom line, the data are consistent with a scenario in which secondary structure of GroEL mRNA (and/or ssDNA) plays a role in heat-activation of the gene, such that when temperatures exceed the melting point of secondary structures, the gene is eligible for transcription and/or translation. The gene is, in effect, its own thermometer.