Showing posts with label virology. Show all posts

Saturday, May 17, 2014

Evolution of Prophage Genes

Many viruses have two modes of reproductive existence. In the familiar lytic cycle, the virus infects a cell and replicates itself until the cell bursts, releasing hundreds (or thousands) of virions. But there is also a lysogenic mode of viral existence, in which the virus inserts a copy of its DNA into the host's own DNA. The viral DNA thus inserted is known as a prophage, and it can remain dormant for long periods of time. The prophage can often be induced to enter a lytic cycle by exposing the cells to hydrogen peroxide or mitomycin C. (Induction of this kind is thought to occur when RecA, activated as part of the SOS response to DNA damage, stimulates self-cleavage of the phage's repressor protein.)

Prophage genes are seen in a wide variety of bacteria (a 2008 paper estimated that over 60% of bacterial genomes contain prophage genes), and in fact human DNA is thought to contain at least 8% retroviral gene remnants. There's reason to suspect that certain large DNA animal viruses (such as herpes and vaccinia) have a lysogenic cycle. Certainly, viruses like varicella zoster (which can produce shingles many years after a person's initial infection) can remain dormant for decades before suddenly undergoing induction to a lytic phase.

Viruses that live an exclusively lytic lifecycle have relatively few opportunities to co-evolve with the host, because they spend little time in the host. Such a virus might spend years "hanging around" in the environment before encountering a host cell; then the lytic reproductive cycle may last only minutes or hours, and it's back to "hanging around" in the environment.

The situation is much different for a temperate virus (i.e., one that has a lysogenic cycle). A lysogenic virus essentially becomes an integral, first-class component of the host DNA and undergoes the same replication and repair processes that apply to host DNA. Accordingly, we should expect to see a much different pattern of evolution in the genes of lysogenic viruses (or prophages). And indeed we do.

The phylogenetic tree below was prepared using viral (phage) and bacterial genes for DNA adenine methylase (dam), an enzyme involved in DNA repair and replication. What's interesting about this gene is that many bacteria have their own (native) copies of this gene plus a prophage copy. And they differ, but not as much as, say, lytic-phage thymidine kinase versus native bacterial TK. (I showed phylo-trees for viral and bacterial TK enzymes in a prior post. If you'll recall, these enzymes differ so drastically that it's not at all clear that one derives from the other, ancestrally.)

DNA adenine methylase genes from three enteric bacteria and two phages (marked with asterisks). The top branch shows very close homology between prophage genes and their bacterial paralogues. The bottom branch shows that the native bacterial isoform of the enzyme is not as closely related to the prophage version(s).
With the dam genes, we see an interesting segregation pattern. There are two main branches to the phylo-tree. In the upper branch are the phage dam genes along with bacterial paralogues of these genes. The bottom branch shows how the non-paralogous (non-prophage) dam genes segregate.

To make these relationships clearer, here's a chart showing the overall G+C content as well as the GC3 (G+C content at the third base of each codon) for the various genes. Rows marked "prophage" in the Type column represent prophage genes. Notice that the G+C percentages are noticeably lower for the prophage genes (by roughly 2 to 3.5 percentage points overall, and by roughly 7 to 13 points at GC3), but are higher than in free-living lytic-cycle phages (where GC3, in particular, is often less than 20%).

DNA adenine methylase genes for enteric bacteria and their temperate phages. Base-composition stats for prophage isoforms are identified in the Type column.

Organism                   Gene (accession)   Type       G+C      GC3
Shigella sp. strain D9     ZP_05434596.1      native     49.60%   55.90%
E. coli                    EHW52521.1         native     49.60%   55.90%
S. enterica                AAL22346.1         native     49.10%   54.50%
E. coli                    EHW55384.1         prophage   47.20%   47.90%
Salmonella phage RE-2010   YP_007003503.1     prophage   46.50%   46.20%
Shigella sp. strain D9     EGJ07993.1         prophage   46.20%   46.30%
S. enterica                ETB92379.1         prophage   46.20%   43.00%
Fels-2 phage               YP_001718754.1     prophage   46.20%   43.00%
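If you want to reproduce numbers like these yourself, G+C and GC3 are simple to compute from a coding sequence. Below is a minimal Python sketch, assuming the input is an in-frame coding sequence (ATG...stop) with no ambiguous bases; the function names are hypothetical, and whether stop codons were counted in the chart above isn't stated, so small differences from the published figures are possible.

```python
def gc_content(seq: str) -> float:
    """Overall fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)


# Tiny illustrative CDS: codons ATG GCC AAA TAA, third bases G, C, A, A.
example = "ATGGCCAAATAA"
print(f"G+C = {gc_content(example):.1%}, GC3 = {gc3_content(example):.1%}")
```

Because GC3 looks only at the mostly synonymous third codon position, it tends to track genome-wide mutational and repair pressures more sensitively than overall G+C, which is why the prophage/native gap is wider in the GC3 column.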

If you compare the phylo tree shown further above with the phylo tree in my earlier post about thymidine kinase genes, you'll note that the prophage dam genes cluster very tightly with bacterial versions of these genes. That's because, as a fully integrated part of the genome, the prophage genes benefit from the host's DNA repairosome. They evolve gradually over long periods of time by the usual mechanisms. The genes are notably host-like because they're continuously repaired and groomed in the same manner as host DNA.

The takeaway here is: If you create a phylo-tree for a set of genes from hosts and viruses, and the genes cluster tightly with host versions, you're probably looking at the result of long-term lysogeny. On the other hand, if the virus genes do not cluster with host genes (as they usually don't!), that means you're looking at viruses that have a predominantly lytic mode of existence, viruses that probably got their genes from a far-distant ancestor of the modern-day host, if not from a primordial precellular precursor of some kind.
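A crude, computable version of this rule of thumb: compute pairwise p-distances on aligned sequences and ask whether a viral gene's nearest neighbor is a host gene. This Python sketch is a simplified nearest-neighbor proxy, not a substitute for building a real phylogenetic tree, and the four short sequences are made-up toy data (hypothetical), not actual dam sequences.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)


def nearest_is_host(query: str, seqs: dict, host_ids: set) -> bool:
    """True if the closest other sequence to `query` is one of the host genes."""
    others = [name for name in seqs if name != query]
    nearest = min(others, key=lambda name: p_distance(seqs[query], seqs[name]))
    return nearest in host_ids


# Hypothetical toy alignment (not real dam sequences).
TOY_ALIGNMENT = {
    "host": "MKKHRAFLKW",       # native host gene
    "temperate": "MKKHRAFLRW",  # gene from a temperate phage
    "lytic1": "MSEQIPYLRW",     # genes from two lytic phages
    "lytic2": "MSEQIPFLRW",
}

for name in ("temperate", "lytic1", "lytic2"):
    print(name, "nearest neighbor is host:",
          nearest_is_host(name, TOY_ALIGNMENT, {"host"}))
```

In the toy data the temperate-phage gene sits one substitution away from the host gene, while the two lytic-phage genes are each other's closest relatives, mirroring the two patterns described above.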

Saturday, March 29, 2014

Virus genes don't always come from the host

Usually, when a virus contains a certain kind of gene, and the host contains the same kind of gene, it's assumed the virus got its copy from the host. This is not a terribly safe assumption, however. In some cases it's demonstrably wrong.

For a striking example of how wrong this assumption can be, you need look no further than the tiny Chlorella alga that can be found living symbiotically inside the fresh-water ciliate Paramecium. When it's not living inside Paramecium, Chlorella is subject to infection by PBCV-1 (the Paramecium bursaria Chlorella virus).

Tiny green Chlorella cells can be seen here growing inside Paramecium. The full Paramecium cell is shown in the inset at lower left. (Photo by Charles Krebs.)
Both Chlorella and PBCV-1 have a gene for an enzyme called thymidylate synthase, which is the enzyme that produces thymidine monophosphate (dTMP, or just TMP), a precursor molecule for making DNA. Ordinarily, one would assume that the virus picked up the gene for this enzyme from its host at some point in the past. But there's a problem.

The only thymidylate synthase gene in Chlorella's genome codes for a protein with 508 amino acids. The PBCV-1 virus version of this gene codes for a much shorter protein with only 216 amino acids. It turns out there's a perfectly good explanation for the size difference. Like other small algae (such as Micromonas, Ostreococcus, and Bathycoccus) and certain protozoans as well, Chlorella has evolved a bifunctional enzyme. In Chlorella, the same enzyme acts as both a thymidylate synthase and as a dihydrofolate reductase. In most higher organisms, two different enzymes carry out these functions. Organisms that have the dual-function enzyme are presumed to have developed this capability through a gene fusion event sometime in the (most likely distant) past.

It turns out the PBCV-1 virus synthase not only isn't bifunctional; it carries out its thymidylate reaction by an entirely different mechanism from the one used by the host enzyme. The host enzyme employs folate (but no flavin) as its cofactor, whereas the PBCV-1 enzyme, although it also takes its methylene group from folate, is strictly dependent on flavin adenine dinucleotide (FAD), as verified experimentally by Graziani et al. in 2006. We now know that many bacteria use the FAD version of this enzyme (often called ThyX, as distinguished from ThyA, the folate-only enzyme). And the FAD users all have relatively small thymidylate synthases, of about 200 to 300 amino acids.
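Written as overall reactions (standard textbook biochemistry, not data from Graziani et al.), the difference is easy to see. ThyA uses methylene-tetrahydrofolate as both the carbon donor and the reductant, leaving dihydrofolate that dihydrofolate reductase must recycle, which fits the fused DHFR-TS enzyme in the host. ThyX gets its reducing power from NADPH via FAD and releases tetrahydrofolate directly:

```latex
\begin{aligned}
\text{ThyA:}\quad & \text{dUMP} + \text{CH}_2\text{-H}_4\text{folate}
  \longrightarrow \text{dTMP} + \text{H}_2\text{folate} \\
\text{ThyX:}\quad & \text{dUMP} + \text{CH}_2\text{-H}_4\text{folate} + \text{NADPH} + \text{H}^+
  \xrightarrow{\ \text{FAD}\ } \text{dTMP} + \text{H}_4\text{folate} + \text{NADP}^+
\end{aligned}
```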

The above scenario isn't exclusive to Chlorella and PBCV-1. Certain other small algae (Micromonas, Ostreococcus, and Bathycoccus, all of which happen to be salt-water algae) have a bifunctional thymidylate synthase, yet they are subject to infection by viruses that use the much smaller, monofunctional flavin-binding enzyme.

In all these cases, the virus uses an entirely different style of enzyme than the host to carry out TMP production. There is essentially zero chance that the virus derived its enzyme from the host (or vice versa), because the reaction mechanisms of ThyA and ThyX are radically different. (For more detail on this, see the excellent review article at http://www.ncbi.nlm.nih.gov/books/NBK6401/.) These aren't orthologues; these aren't paralogues; these are entirely different enzymes.

So where did the virus get its thymidylate synthase from, if not the host?

If you take the protein sequence for the PBCV-1 thymidylate synthase and run a BLAST search at UniProt.org, the best non-viral hits (in the range of 58% identities, 77% similarities, E-value 10^-69) are for the thymidylate synthases of Prochlorococcus marinus and other cyanobacteria, with cyanophages also scoring high. This makes a great deal of sense, because the photosynthetic Prochlorococcus and its relatives are thought to be some of the most ancient bacteria on Earth (possibly going back 3.8 billion years). They're thought to be the ancestors of chloroplasts. At one point, they were almost certainly the predominant life form in the oceans. Since phycodnaviruses (of which PBCV-1 is a member) are thought to be quite ancient, it's entirely possible they got their thymidylate synthase from cyanobacteria. That's certainly what the protein-sequence evidence suggests.
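For readers unfamiliar with BLAST's summary numbers: "Identities" is the count of identical aligned positions divided by the alignment length, with gap columns included in that length. The Python sketch below computes the figure from a pair of already-aligned sequences; the function name is hypothetical. BLAST's "Positives" (the "similarities" quoted above) additionally require a substitution matrix such as BLOSUM62, which this sketch does not implement.

```python
def blast_style_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an existing pairwise alignment.

    Identical columns divided by the total alignment length, with gap
    columns ('-') counted in the denominator, as in BLAST's 'Identities'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(f"{blast_style_identity('MKV-LA', 'MRVQLA'):.1f}%")  # prints 66.7% (4 of 6 columns)
```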

I'll go with the evidence.

Monday, March 24, 2014

Viral abduction of proteins

I have to admit, when I first saw the paper by Newcomb and Brown called "Internal Catalase Protects Herpes Simplex Virus from Inactivation by Hydrogen Peroxide" (J. Virology, 2012, 86(21)), I was tempted to dismiss it as a fluke. What Newcomb and Brown found is that herpes virus appears to contain a fully functioning version of the enzyme catalase. This enzyme, which is present in human cells and, in fact, most aerobic life forms (but also some anaerobes), breaks down hydrogen peroxide to molecular oxygen and water. It detoxifies hydrogen peroxide, if you will.
Herpes simplex virus components.

The odd thing about herpes virus containing catalase is that the herpes genome does not contain a gene for catalase (and this is acknowledged by Newcomb and Brown). Thus, any catalase present in the virion has to have been made by the host cell. The enzyme is piggybacking a ride inside the virion.

It appears that, far from being a fluke, the herpes virus tegument proteins (proteins that fill the space between the capsid and the outer lipid envelope of the virion) have evolved in such a way as to attract or stick to catalase, sucking it along for the ride.

Having catalase on board brings survival benefit to the virus. According to Newcomb and Brown:
HSV-1 [herpes virus] was found to be more sensitive to killing by hydrogen peroxide in the presence of a catalase inhibitor than in its absence. The results suggest a protective role for catalase during the time HSV-1 spends in the oxidizing environment outside a host cell. 
In what sense would catalase protect the virus? Peroxides are damaging to DNA, and herpes is a DNA virus. Where do peroxides come from? Short answer: phagocytes (think white blood cells). When a phagocyte ingests bacteria (or any material), its oxygen consumption increases. This increase in oxygen consumption, called a respiratory burst, produces reactive oxygen species (superoxide, hydrogen peroxide), which are toxic to most life forms unless (of course) detoxifying enzymes come into play. In this case, herpes comes well-prepared for the confrontation. It brings copies of the host's own catalase.
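The chemistry can be written out in three steps (standard textbook reactions, not data from Newcomb and Brown): the phagocyte's NADPH oxidase reduces oxygen to superoxide, superoxide dismutates (enzymatically or spontaneously) to hydrogen peroxide, and catalase, whether the cell's own or a virion-borne copy, converts the peroxide to water and oxygen:

```latex
\begin{aligned}
\text{NADPH} + 2\,\text{O}_2 &\xrightarrow{\ \text{NADPH oxidase}\ } \text{NADP}^+ + \text{H}^+ + 2\,\text{O}_2^{\bullet-} \\
2\,\text{O}_2^{\bullet-} + 2\,\text{H}^+ &\xrightarrow{\ \text{superoxide dismutase}\ } \text{H}_2\text{O}_2 + \text{O}_2 \\
2\,\text{H}_2\text{O}_2 &\xrightarrow{\ \text{catalase}\ } 2\,\text{H}_2\text{O} + \text{O}_2
\end{aligned}
```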

This is an extremely clever adaptation (if that's what it is). If you're a virus, why go to the trouble of adding a dedicated catalase gene to your DNA if you can simply recruit host catalase into the virion by suitable modification of a tegument protein?

Arenavirus can capture ribosomes in virions.
Selective entrainment of host proteins is not unknown in viruses (it's been well studied in HIV and in vesicular stomatitis virus, for example). Even in the case of catalase, it's been known since 1938 that vaccinia virus, a relative of smallpox, carries with it the host's own catalase.

Perhaps the most extreme (and startling) example of viral recruitment of host proteins into virions is provided by Arenavirus (an agent of aseptic meningitis in humans), which can package up host-cell ribosomes (see photo).

It could very well be that most large viruses, such as NCLDVs (and mid-size viruses as well; herpes is by no means large), routinely package host enzymes in their virions. As modern proteomic techniques are brought to bear on the study of virion-associated host proteins, we can probably expect many additional discoveries of this sort in the near future.

Sunday, March 23, 2014

Nucleus-like viruses and their enzymes

Recent findings in virology have forced biologists to consider many notions that just a few years ago would have seemed heretical and/or science-fiction-like. For example, there is now serious discussion of the possibility that cellular life descended from viruses (the Virus World theory; see also this paper). A growing (but still minority) viewpoint is that viruses should be considered symbionts rather than simply parasites (see the review by Villarreal). Some have dared to propose that the eukaryotic cell nucleus actually stemmed from a virus. Others have speculated the reverse: that the large DNA viruses are actually escaped, spore-like nuclei. Meanwhile, some say that during an earlier RNA World, viruses became the original inventors of DNA.

There's no question that large viruses of the NCLDV class have nucleus-like properties. Within a short time of infection, these viruses set up a complex structure inside the cell known as the virus factory, and the factory looks a lot like a cell nucleus. The authors of a recent paper on Mimivirus (the famously huge virus that infects freshwater amoebae) admitted that in previous work, they did, in fact, mistake the virus factory for the nucleus. (See photo.)

Which is the nucleus and which is the virus factory? In this photo, VP is a virus particle (of the enormous mimivirus) developing inside Acanthamoeba. A smaller virus factory (S) is just beginning to form on the left.

Macroscopic aspects aside, the large "nucleocytoplasmic" viruses (some of which infect animals and marine life, not just amoebae) bring with them many genes for enzymes that are normally found in a cell nucleus. I'm not talking about genes for DNA polymerases, topoisomerases, etc., but genes that act on small molecules. In a previous post, I mentioned the example of PBCV-1 (a virus that infects the alga Chlorella) having its own gene for aspartate transcarbamylase (ATCase), an enzyme that catalyzes the first committed step in pyrimidine synthesis. This enzyme (common to most living things) is predominantly found in the cell nucleus of higher organisms.

There are other examples. Many NCLDV-group viruses have a gene for deoxy-UTP pyrophosphatase (dUTPase), an enzyme that hydrolyzes dUTP to dUMP and pyrophosphate so that uracil isn't accidentally incorporated into DNA. One can imagine that after a virus invades a cell and unleashes its nucleases on the cell's own RNA, many ribonucleotides (breakdown products of RNA) will be liberated, and many of these will then be reduced to deoxy-nucleotides (by ribonucleoside-diphosphate reductase) in preparation for viral DNA synthesis. As it happens, DNA polymerases readily incorporate dUTP into DNA in place of dTTP (uracil pairs with adenine just as thymine does); the resulting uracil-laden DNA can trigger apoptosis in some cells. The virus takes no chances. It brings its own dUTPase to make sure uracil never gets into its DNA by mistake.

Some viruses bring their own gene for thymidylate synthase, which converts dUMP to dTMP (in other words, it methylates uracil, in its deoxyribonucleoside-monophosphate form, to give thymidine monophosphate). Some also have a gene for thymidylate kinase, which converts dTMP (often just called TMP) to dTDP (or TDP).

Yet another "small-molecule" enzyme encoded by large DNA viruses is ribonucleoside-diphosphate reductase (RDPR). This enzyme is fundamental to the whole DNA synthesis enterprise. Its job is to convert ordinary ribonucleotides to the deoxy form that DNA needs. Without this enzyme, you can make RNA but not DNA. So it's typically found in the cell nucleus (in higher organisms).
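For reference, the small-molecule reactions in the last three paragraphs can be summarized as standard textbook equations (thymidylate synthase cofactors omitted, since they differ between ThyA- and ThyX-type enzymes; these are general biochemistry, not results from the viral studies discussed here):

```latex
\begin{aligned}
\text{NDP} + \text{thioredoxin(SH)}_2 &\xrightarrow{\ \text{RDPR}\ } \text{dNDP} + \text{thioredoxin(S}_2\text{)} + \text{H}_2\text{O} \\
\text{dUTP} + \text{H}_2\text{O} &\xrightarrow{\ \text{dUTPase}\ } \text{dUMP} + \text{PP}_i \\
\text{dUMP} &\xrightarrow{\ \text{thymidylate synthase}\ } \text{dTMP} \\
\text{dTMP} + \text{ATP} &\xrightarrow{\ \text{thymidylate kinase}\ } \text{dTDP} + \text{ADP}
\end{aligned}
```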

It turns out, a gene for RDPR is contained in a great many viral genomes. When I did a BLAST search of the protein sequence for Chlorella virus ribonucleoside reductase against the UniProt database of virus sequences, the search came back with 863 hits, spanning viruses belonging not only to the NCLDV class (pox, mimivirus, phycodnaviruses, etc.) but also the Herpesviridae, plus many bacteriophage groups as well. In terms of the sheer variety of virus groups involved, it's hard to think of another "small-molecule-processing" enzyme that spans as many viral taxa. We're talking about everything from relatively small bacteriophages to mimivirus, and lots in between.

The reductase gene is so widespread, it made me wonder what its phylogenetic distribution might look like. In other words: Are viral RDPRs related to each other? Are they related to the host's own RDPR? Does the enzyme's evolution follow the viral path, or the host path?

Just for fun, I obtained a number of ribonucleoside reductase (small subunit) protein sequences for viruses, plants, animals, bacteria, fungi, and various eukaryotic parasites (using the tools at UniProt.org), then fed the results to the tree-maker at http://www.phylogeny.fr. What I got was the following "maximum likelihood" phylogenetic tree. (See this paper for details on the tree algorithm. Also, be sure to check out this nifty paper to learn more about how to read this sort of tree.)
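The phylogeny.fr pipeline builds a maximum-likelihood tree, which is too involved to sketch here. As a simpler illustration of how a tree can be built from sequence comparisons, here is a minimal neighbor-joining implementation in Python. Neighbor joining is a distance-based method, not the maximum-likelihood method used for the tree discussed here. The sketch assumes you already have a symmetric matrix of pairwise distances (for example, p-distances from a multiple alignment); the function name and the four-taxon test matrix are hypothetical.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree (Newick string) by neighbor joining.

    labels: taxon names; matrix: symmetric distance matrix (list of lists)
    in the same order. Requires at least three taxa.
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)                 # Newick text for each active cluster
    d = [list(row) for row in matrix]    # working copy of the distances
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]      # total distance from each node
        # Choose the pair that minimises the NJ criterion Q(i, j).
        i, j = min(
            combinations(range(n), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every node that remains.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters left: attach them to one central node.
    a, b, c = nodes
    dab, dac, dbc = d[0][1], d[0][2], d[1][2]
    la = (dab + dac - dbc) / 2
    lb = (dab + dbc - dac) / 2
    lc = (dac + dbc - dab) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Hypothetical additive distances for the tree ((A,B),(C,D)).
toy = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(neighbor_joining(["A", "B", "C", "D"], toy))  # prints (C:3,D:4,(A:1,B:2):5);
```

With real, non-additive distances, neighbor joining can return small negative branch lengths, which are often reset to zero in practice; this sketch leaves them as computed.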

For convenience, names of viruses are depicted in blue. Notice how, except for the Vaccinia-Variola group, which is deeply nested, most of the viral nodes are ancestral to most of the higher-organism nodes; you have to go through many levels of viral ancestors to get from the original, universal ancestor (presuming there was one) to the reductase gene of the pig, say. From this diagram, it would appear that the Pox-family reductase gene is derived, in some way, from a highly evolved host. But that's the exception, not the rule. All of the other viral genes are outgroups and/or, more usually, ancestors of one another.

Mimivirus is fairly high up the chain and shows relatedness to two very common freshwater and soil bacteria (Pseudomonas and Burkholderia).

It would be fun to go back and remake the tree, adding more organisms. (If you end up trying this, let me know the results.) For now, I'm comfortable concluding that, except for pox-family viruses, the ribonucleoside reductases produced by major DNA viruses and phages are not derived from current-day hosts. A parsimonious (but not necessarily correct!) explanation is that the phage reductases are ancestral to host orthologs; but it is also possible that the phage reductases derive from very ancient hosts (not depicted in the tree), with current-day hosts appearing to derive from phage genes when in fact the similarity is to a long-ago host ortholog. In any case, the tree shows that organismal RDPRs tend to be related to organismal RDPRs and viral versions are related to viral versions. What we don't see anywhere is a viral sub-tree growing out of a host sub-tree (as would be the case if the viral enzymes simply derived from modern host enzymes).

The UniProt identifiers of the protein sequences used in this study are given below in case you want to try to replicate these results (or perhaps extend them). To retrieve the protein sequences in question, go to http://www.uniprot.org/ and click the Retrieve tab, then copy and paste the following identifiers (one to a line) exactly as shown:



O57175
P33799
M1I7H3
E5ERR7
Q6GZQ8
Q77MS0
P28847
M1I8A4
W0TWG5
Q7T6Y9
Q9HMU4
T0MT29
201403222BWOVN08AD
B3ERT4
F2II86
F2L908
U7RFH3
Q4KLN6
I3LUY0
B9RBH6
Q9LSD0
S8GD97
W4I9N3
Q4DFS6
A4HFY2
G3XP91
S8B144
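UniProt's website has been redesigned since this post was written; the same records can also be fetched programmatically. A minimal Python sketch, assuming UniProt's current REST endpoint of the form https://rest.uniprot.org/uniprotkb/{accession}.fasta: it checks each identifier against UniProt's published accession-number pattern and downloads the FASTA records. One line in the list above (201403222BWOVN08AD) does not match that pattern, so the sketch skips it, and some 2014-era unreviewed entries may since have been merged or made obsolete, so individual requests can fail. The function names are hypothetical.

```python
import re
import urllib.error
import urllib.request

# Accession-number pattern published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if `text` is shaped like a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(text) is not None


def fasta_url(accession: str) -> str:
    """REST URL for one UniProtKB entry in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_fasta(accessions, timeout=30) -> str:
    """Download FASTA records, skipping identifiers that are not accessions."""
    records = []
    for acc in accessions:
        if not is_uniprot_accession(acc):
            print(f"skipping {acc!r}: not in UniProtKB accession format")
            continue
        try:
            with urllib.request.urlopen(fasta_url(acc), timeout=timeout) as resp:
                records.append(resp.read().decode("utf-8"))
        except urllib.error.URLError as err:  # also covers HTTPError
            print(f"could not fetch {acc}: {err}")
    return "".join(records)
```

Usage: fetch_fasta(["O57175", "P33799", "M1I7H3"]) returns the concatenated FASTA text, which can be saved to a file and uploaded to a tree-building service.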