Sunday, April 20, 2014

Bidirectionally Overlapping Genes

The occurrence of bidirectionally overlapping genes in bacteria is rare, and most such examples are dismissed as chimeric or representative of simple genome mis-annotation. After all, how can a gene make sense in one direction, but also make sense on the reverse-reading complementary strand of DNA? Such a situation is more than a mere palindrome. It's akin to the phrase:
Warsaw won, eh?
He now was raw.
The phrase carries a sensible message in each direction, yet is not a mere bidi-symmetry of the "A man, a plan, a canal, Panama" kind. It defies credulity to believe a stretch of DNA spanning several hundred bases (several hundred "letters") could evolve to give a useful message in both directions. And yet, what is life itself, if not credulity-defying? Somehow, life began from primordial chemistry and evolved toward DNA genes coding for proteins. Is it so hard to believe that early replicator molecules (probably RNA) were transcribed and translated in both directions, and that some of the happy accidents survived? Is it so hard to believe that some proteins began life as translations of antisense transcripts ("nonsense" proteins) that then evolved toward specialized functionality?

A bona fide example of a bidirectionally transcribed and translated gene was verified experimentally in 2008 by Silby and Levy, who were investigating the soil bacterium Pseudomonas fluorescens Pf0-1. They found that the hitherto unknown cosA gene, which overlaps (on the opposite DNA strand) a gene for a fusaric acid resistance protein, is not only expressed as a protein but is required for soil colonization.

A section of the P. fluorescens Pf0-1 genome showing the existence of overlapping genes (note the yellow-colored segment, representing the cosA gene; the larger green gene above it, on the opposite strand, encodes a fusaric acid resistance protein). The overlapping genes have been shown experimentally to be expressed as protein.

Ironically, a month after Silby and Levy published their results, BMC Genomics published a study by Pallejà et al. looking at large gene overlaps in bacterial genomes. The Pallejà study concluded:
Among the 968 overlaps larger than 60 bps which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
Silby and Levy argue that, to the contrary, current genome annotations are obscuring potentially important discoveries:
[Our] findings suggest that current genome annotations provide an incomplete view of the genetic potential of a given organism . . . In eukaryotes, the concept that genomes include numerous sense/antisense gene pairs is becoming increasingly obvious with genome-wide transcriptional studies in yeast [8] and Arabidopsis [10]. Antisense transcripts have been implicated in eye development [20] and control of entry into meiosis in yeast [21]. However, discussion of antisense transcription is limited to possible regulatory roles for antisense RNA [e.g. 8], without consideration of the possibility that they may specify proteins. Genome annotations do not routinely predict the existence of two protein-coding genes on opposite DNA strands, and in fact normally deliberately eliminate predicted overlaps. Moreover, small protein-coding genes can be missed by predictive algorithms. For example, the blr gene in E. coli specifies a 41 residue protein, and was discovered in a sequence believed to be intergenic [22]. The fact that antisense genes have been implicated in important biological functions indicates that more attention should be given to this emerging class of genes.
I happen to agree with Silby and Levy. It would be a shame if bidirectional overlaps in genomes are not investigated. The notion (furthered by Pallejà) that annotation software should suppress such findings automatically is repulsive. It's the kind of intolerant, rigid, dogmatic thinking science, quite frankly, doesn't need more of.


  1. Theorizing out of ignorance here, but: Rule One of evolution is, "Whatever works." There is no designer and hence no sense of style, no rules, no long-term plan in this stuff. Any hack, however ugly, that meets an immediate need will be retained. (As a programmer, there are lots of coding patterns I avoid because I know they will cause maintenance headaches later. Evolution lacks this kind of foresight.) So if reading a gene backwards makes a useful product, why not?

    Second, isn't it the case that reading a section of DNA backwards produces the same sequence of amino acids as reading forward? Wouldn't the only difference be that, in the reverse sequence of construction, the nascent protein might fold differently than if produced in the forward sequence?

    But third, the situation might be unstable in an evolutionary sense for this reason: that any mutation in a bidirectional sequence affects *two* products, not just one. So a mutation has a double chance to have a harmful result.

  2. Thank you for the thoughtful comment. You asked: "isn't it the case that reading a section of DNA backwards produces the same sequence of amino acids as reading forward?" No, it doesn't. The opposite strand is read as the reverse complement, so the codon GCT (alanine) becomes AGC (serine) when transcribed from the other strand; in general a codon specifies a different amino acid when read from the opposite strand. That's kind of the whole point. You get a totally different protein when translating an antisense RNA. It will fold differently, have different catalytic centers, and so on. But the rest of your comments are in alignment with my own thinking. Nature probably does try all combinations.
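
For anyone who wants to check the GCT/AGC example by hand, here is a minimal Python sketch. It assumes the standard genetic code and ignores start codons, reading frames other than the first, and everything else a real gene finder cares about. The helper names (reverse_complement, translate) and the 12-base test sequence are made up for illustration; they are not from Silby and Levy or from any annotation tool.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, starting at the first base."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    # Hypothetical sequence, chosen only to illustrate the point.
    forward = "ATGGCTCGTTGG"
    reverse = reverse_complement(forward)
    print("forward strand :", forward, "->", translate(forward))
    print("opposite strand:", reverse, "->", translate(reverse))
```

The two printed peptides share no obvious relationship, which is the point: the opposite strand is not the same message spelled backwards but a different message altogether.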

