That's the image that comes to mind when I try to explain where virus genes come from. They don't come from the mother ship (the host). Oh sure, in some cases they clearly do derive from host genes. But in most cases, they clearly don't. The overwhelming majority of viral genes have no counterparts in host cells, and even for those that do, the genes in question are rarely true host orthologues.
"No, Robbie, not like Europe."
Short answer: Somewhere else.
Using a program like the excellent (and free) Mega6 ("Molecular Evolutionary Genetics Analysis"), you can easily create phylogenetic trees from genetic data (FASTA sequences, protein or DNA). When you do this for genes that occur in both viruses and host cells, you can see how they separate in phylo-space.
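If you've never looked inside a FASTA file, it's just a header line starting with ">" followed by one or more lines of sequence. Here is a minimal Python sketch of reading that format. The function name `parse_fasta` and the two toy records are made up for illustration (they are not real TK sequences), and this is not how Mega6 itself reads files.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A ">" line starts a new record; the rest of the line is its name
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines, so append
            records[header] += line.upper()
    return records


# Made-up toy records, NOT real thymidine kinase sequences
example = """>phage_toy
ATGAAAT
TAGC
>host_toy
ATGGCGCTGGC
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

For real work you would probably reach for an established parser rather than rolling your own, but the format really is this simple.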
Example: I decided to look at the gene for thymidine kinase, which occurs in most living things plus a certain number of undead, maybe not-quite-living (in the usual sense) things known as viruses. Thymine (T) is, of course, an essential ingredient of DNA. Thymidine kinase converts thymidine (thymine bonded to deoxyribose, on the left, below) to the phosphorylated form (TMP, right) so it can participate in DNA synthesis.
When a virus attacks a cell, there's a lot of DNA turnover as the virus prepares to manufacture its own DNA, so thymidine kinase is a handy enzyme to have around if you're a virus. Many viruses, as a result, bring their own copy of the TK enzyme gene. But is the viral TK gene derived (in some kind of ancestral way) from the host cell's own TK gene? Not necessarily.
When you gather up the amino acid sequences for a bunch of TK enzymes from bacteria and the phages (viruses) associated with them, you find that, phylogenetically speaking, the phage/viral TK sequences are not very similar to the host sequences.
In the above tree, phage (viral) genes cluster at the top. Host-cell kinases cluster at the bottom. The two clusters may be related to a distant ancestor (not shown), but one thing is certain: the phage versions of this gene are not simply a slight modification of the host gene. We know that's true because, remarkably, the thyK genes of certain alphaproteobacteria (Agrobacterium and its relatives) cluster with the phage genes, even though the bacteriophages are adapted to E. coli and Salmonella (and closely related enterics). If the phages had picked up the gene from their current hosts, the phage genes should cluster with the enteric bacteria, not with Agrobacterium and Rhizobium.
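"Clustering" ultimately comes down to how different the sequences are from one another. One simple measure is the p-distance: the fraction of aligned positions at which two sequences differ. The sketch below computes a p-distance matrix for three invented, already-aligned peptide fragments. The names `p_distance` and `distance_matrix` and the toy sequences are my own assumptions for illustration; I'm not claiming this is the distance model behind the tree above.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def distance_matrix(aligned: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """All-against-all p-distances, keyed by sequence name."""
    names = list(aligned)
    return {m: {n: p_distance(aligned[m], aligned[n]) for n in names}
            for m in names}


# Invented, pre-aligned toy fragments (NOT real TK sequences)
toy = {
    "phage_A": "MKLAVIT-GK",
    "phage_B": "MKLAVITSGK",
    "host_A":  "MRLSVLTAGR",
}

matrix = distance_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

In this toy set the two "phage" fragments sit at distance zero from each other and well away from the "host" fragment, which is the kind of pattern a tree-building program turns into separate clusters.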
Where do the phage genes come from? Some have speculated that these phages originated with escaped bacterial secretion-system cassettes. That may well be, but the escape event had to have occurred many hundreds of millions of years ago. The T4 thymidine kinase gene has only 62% sequence identity with the E. coli gene. T4's version of the gene has a G+C content of 34%, with GC3 (G+C at the third codon base) of just 19%. The E. coli version of the gene has overall G+C of 42% and GC3 of 36%. It's generally conceded that viral genes evolve faster than host genes (particularly for RNA viruses and single-stranded DNA viruses). But DNA viruses like T4 can't evolve outside of the host cell, as far as we know, and although DNA viruses may evolve faster than host DNA, they don't evolve thousands of times faster. (RNA viruses do evolve thousands of times faster, but that's an entirely different matter.)
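For anyone who wants to check numbers like these on their own sequences, here is a small Python sketch of how G+C and GC3 can be computed from a coding sequence. The function names and the 15-base toy sequence are invented for illustration, and I'm assuming the sequence starts in frame at the first codon. Note that this version counts the stop codon too; conventions differ on that.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def gc3_content(cds: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence.

    Any trailing partial codon is ignored. The stop codon, if present,
    is counted like any other codon (an assumption; conventions vary).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    third_bases = cds[2:usable:3]     # every third base, starting at index 2
    return gc_content(third_bases)


# Invented toy coding sequence: ATG AAA TTA GCG TAA (NOT a real TK gene)
toy_cds = "ATGAAATTAGCGTAA"

overall = gc_content(toy_cds)
third = gc3_content(toy_cds)
print(f"G+C = {overall:.1%}, GC3 = {third:.1%}")
```

GC3 is worth looking at separately because third-position changes are often synonymous, so that position tends to reflect a genome's underlying base-composition bias more strongly than the gene as a whole.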
To me, the above tree says that enteric phages diverged from the common ancestor of alphaproteobacteria and gammaproteobacteria (the latter include the enteric bacteria). We're talking hundreds of millions of years ago. (Note that E. coli and its closest relatives are thought to have diverged 140 million years ago.) A recent transfer of thymidine kinase genes from enteric bacteria to their phages is not credible. It's far more likely that an ancient precursor of today's T4 phage (and similar enteric phages) had the gene, and passed it down through the ages as the phages adapted to new hosts (first the alphaproteobacteria, then the phylogenetically newer enterics).
In tomorrow's post, I want to explain in detail how the above phylo tree was made and show you how you can make your own phylogenetic trees using the popular Mega6 program. If you've never used Mega6, you're in for a treat.