Sunday, March 30, 2014

The primordial nature of phage genes

Recent ecological studies have shown that bacteriophage (viruses that attack bacteria) are numerically the most abundant biological entities on the planet. The estimated 10^30 viruses (mostly phage) in the oceans, if stretched end to end, would span farther than the nearest 60 galaxies. These viruses are thought to cause the turnover, by virus-related death, of 20% of the ocean's biomass per day. In shotgun sequencing of marine samples, the majority of phage gene sequences are invariably found to be novel (not corresponding to any other known gene sequences). Hence, the bulk of genetic diversity on the planet may well be tied up in viral/phage "dark matter."
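For a rough sense of scale, the end-to-end figure can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the particle length of 100 nanometers is my own assumption (real virions vary quite a bit in size), and the variable names are made up for illustration.

```python
# Back-of-envelope check of the "stretched end to end" figure.
# ASSUMPTION: each virion is about 100 nm long. This is an illustrative
# value, not a number taken from the studies mentioned above.
VIRUS_COUNT = 1e30              # estimated number of viruses in the oceans
ASSUMED_LENGTH_M = 100e-9       # assumed length of one virion, in meters
METERS_PER_LIGHT_YEAR = 9.4607e15

total_meters = VIRUS_COUNT * ASSUMED_LENGTH_M
total_light_years = total_meters / METERS_PER_LIGHT_YEAR

print(f"Total length: {total_meters:.1e} m")
print(f"Total length: {total_light_years:.1e} light-years")
```

Under that assumption the chain comes out on the order of ten million light-years long. Doubling or halving the assumed particle size changes the answer by the same factor, so treat it as an order-of-magnitude estimate only.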

One of the most-studied bacteriophage classes is the so-called "T-even" (T2, T4, T6, etc.) class of phages, of which the poster child, arguably, is T4. These are phages that attack enteric bacteria (E. coli and its relatives), hence are commonly found in sewage.
T4 phage morphology.

T4 is interesting from a number of standpoints, not least of which is its distinctive head/tail morphology (see diagram). In phylogenetic studies, T4 typically shows up in basal positions on trees, meaning it is presumably ancient. Increasingly, viruses and phages are considered to be of primordial origin, possibly predating cellular life. Certainly, any theory on the origin of life has to come to grips with the fact that the major biomolecules (proteins, nucleic acids, lipids) had to exist, in some form, prior to the appearance of the first cell. Some experts suggest nucleic acids and proteins may have interacted with each other in a so-called Virus World scenario (see the excellent paper by Koonin) wherein microscopic hydrothermal pore systems (in mineral formations at the ocean floor) provided for sequestration of prebiotic processes in physical compartments that could be invaded by "selfish replicators."

Primordial interaction of proteins with nucleic acids (and their precursors) presumably gave rise to a number of artifacts that survive today, such as ribosomes (which contain over 50 small proteins in tight association with RNA), charged tRNA (RNA covalently bound to an amino acid), adenine-containing cofactors (e.g. SAMe, NADPH), and viruses (capsid and other proteins bound to RNA or DNA). Conceptually, one can think of protein/nucleic-acid complexes as having diverged, at the Darwinian Threshold, along two lines: toward ribosomal life, or toward the viral world.
Bacterial cell covered with T4 virions.

The genes for certain viral capsid proteins (with colorful names like Jelly Roll Capsid) are among a number of "viral hallmark genes" that show no homology to any genes from the cellular world. Presumably, some of these genes are of truly primordial ancestry. We have a valuable clue to the origin of at least some of these genes in the case of phage T4. A number of tantalizing reports from the 1970s (see here, here, and here) suggest that the enzymes dihydrofolate reductase and thymidylate synthase (both encoded by T4 DNA) are, in fact, components of the virion baseplate and/or tail structure of T4. Hence, at least in some cases, it's conceivable that virion structural proteins began as enzymes.

What's particularly intriguing about the T4 enzymes is that T4's thymidylate synthase (ThyA), which is phylogenetically ancient, is encoded in the phage DNA immediately downstream of the gene for dihydrofolate reductase, with no intervening "junk DNA." Why is this significant? In many organisms (as I explained in an earlier post), these two enzymes occur as a single large bifunctional enzyme that's proposed to be the result of a gene fusion event. In organisms that have the double enzyme, the reductase occurs at the beginning (the N-terminal end) of the protein, which is the same order in which the two genes appear in T4.

Just for fun, I took the protein sequence for T4 dihydrofolate reductase and fused it (in Notepad) with the sequence for T4 thymidylate synthase, then did a BLAST search of the fusion sequence against all the protein sequences in a public database. The naturally occurring bifunctional ThyA/dihydrofolate reductase enzymes from peach, balsam, rice, castor bean, and clementine (Citrus clementina) all showed up as hits, with E-values of 10^-67 or better.
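If you'd rather not do the fusion by hand, the same cut-and-paste step can be sketched in a few lines of Python. To be clear about what is assumed here: the two sequences below are short made-up placeholders, not the real T4 proteins, and the function names are my own. You would paste in the actual FASTA records yourself and then submit the output to whichever BLAST service you prefer.

```python
def read_fasta_sequence(text):
    """Return the residues of a single-record FASTA string (header dropped)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record wrapped at `width` residues."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


# PLACEHOLDER sequences for illustration only. These are NOT the real
# T4 dihydrofolate reductase or thymidylate synthase sequences.
REDUCTASE_FASTA = """>placeholder_reductase
MAAAGGGLLL
KKKDDDEEE
"""
SYNTHASE_FASTA = """>placeholder_synthase
MTTTSSSVVV
PPPQQQRRR
"""

reductase = read_fasta_sequence(REDUCTASE_FASTA)
synthase = read_fasta_sequence(SYNTHASE_FASTA)

# Reductase first, so it sits at the N-terminal end of the fusion,
# matching the order seen in the natural bifunctional enzymes.
fusion = reductase + synthase

print(to_fasta("reductase_synthase_fusion_placeholder", fusion))
```

Note that this naive concatenation keeps the second protein's initial methionine in the middle of the fusion, just as a Notepad paste would. For a similarity search, one extra residue shouldn't matter much.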

This doesn't prove that the bifunctional enzymes of the peach, etc. came from T4 phage, of course, but it is consistent with the general idea that the bifunctional ThyA/dihydrofolate reductases of algae and protists could (at least in theory) have started out as phage gene fusion products.

Let's put it this way: Weirder things have been known to happen.