Thursday, May 15, 2014

How to Build Your Own Phylo Trees with Mega6

Mega6 is a popular freeware program for creating phylogenetic trees and sequence alignments, analyzing gene sequence data, performing various calculations (such as Tajima's D), computing pairwise distances between sequences, and doing lots of other types of genetic analysis. This program is widely used and is based partly on the work of Masatoshi Nei, the famed Penn State molecular geneticist and author of the seminal book Mutation-Driven Evolution (which I highly recommend). You need this program if you're into molecular genetics.

In my last post, I showed a phylogenetic tree that I created in Mega6 for thymidine kinases of phages and hosts. You can make trees of this sort yourself in a matter of minutes using Mega6. Let me take you through a quick example.

Here's how to recreate the tree involving thymidine kinase. Further below, I've listed the FASTA sequences you'll need. Obviously, you can obtain your own sequences from or other sources, if you want to make your own phylo trees based on other sequences.

All you have to do is Copy and Paste FASTA sequences into the Alignment Explorer window of Mega6, then generate a Maximum Likelihood tree (or a Parsimony Tree, or whatever kind of tree your prefer).

Here's the exact procedure:
  1. Fire up Mega6.
  2. Click the Align button in the main taskbar. (See screenshot below.)
  3. Choose Edit/Build Alignment from the menu that pops up. A new window will open. (Note: A dialog may ask you if you are creating a new data set; answer Yes. A dialog may also appear asking whether the alignment you're creating will involve DNA, or Proteins. If you're using the amino-acid sequences shown further below, click Proteins.)
  4. In Mega6, click the Align button on the left side of the taskbar. Select Edit/Build Alignment.
  5. Paste FASTA sequences into the Alignment Explorer window. 
  6. Select All (Control-A).
  7. Choose Align by ClustalW from the Alignment menu. An alignment options dialog will appear. Accept all defaults by clicking OK.
  8. After the alignment operation finishes (it takes 5 seconds), go to the menu and choose Data > Export Alignment > Mega Format. Save the *.meg file to disk.
  9. Now go back to the main Mega6 window and click the Phylogeny button in the taskbar. A menu will drop down.
  10. Select the first item: Construct/Test Maximum Likelihood Tree. A dialog will appear asking if you want to use the currently active data set. Click No. This will bring up a file-navigation dialog. Use that dialog to find the *.meg file you created in Step 7. Select that file and click Open.
  11. A tree options dialog will appear. If you just want to see a tree quickly, accept the defaults and click OK. The tree will take less than 10 seconds to generate. If you want to do a test of phylogeny (this part isn't obvious!), click the item to the right of "Test of Phylogeny" (see the graphic below) and choose "Bootstrap method," then set the number of bootstraps using the dropdown menu (see screenshot below).
  12. Click the Compute button to build the tree.
If you want to do bootstrap validation of tree nodes, you have to click the yellow area to the right of Test of Phylogeny. (See red arrow above.) Choose Bootstrap Method. Then set Number of Bootstraps to 500.

Note that if you choose to do a bootstrap test of phylogeny, the tree may take 5 minutes or more (possibly much more) to build, depending on how many nodes it contains. Bootstrapping is a compute-intensive operation. The idea behind bootstrap testing is that node assignments in phylo trees are often uncertain, and one way to check the robustness of a given assignment is to systematically introduce noise into the data to see how readily a node can be made to jump branches. A node that can easily be tricked into jumping to a different spot in the tree is untrustworthy. The bootstrap test attempts to quantify the degree of reliability of the node assignments. Usually, you want to do at least several hundred tests (500 is considered adequate). If a given node jumps branches in half the tests, the tree will carry "50" (meaning 50% confidence) at the top of the branch, meaning there's a only 50% certainty that that particular node assignment is correct as to branch location. A tree where all the branches have numbers greater than 50 can usually be considered reliable. 

Mega6 will output Maximum Likelihood trees, parsimony trees, and other types of trees, and the program will do an amazing variety of calculations and statistical tests. Most times, you can save analyses in Excel format, which of course is a godsend in case you need to do additional analysis that can't be done in Mega6. The documentation contains many helpful tutorials; be sure to give it a look.

If I had a wish-list for Mega6, it would be quite short. Mainly I'd like to be able to see quick graphic summaries of things like the ratio of synonymous to non-synonymous mutations in two DNA sequences. (The data for this is available via the HyPhy command under the Selection taskbar button, but only as raw spreadsheet data; you have to sum the columns yourself in Excel to get certain kinds of summaries, and if you want a graph, you have to create it yourself. This is hardly a major drawback. Still, it would be nice to see more summary data, quickly, in graphical form.) When you align two or more genes in the Alignment Editor, it shows asterisks above the identical nucleotides, but doesn't show percent identity anywhere, nor percent "positives" for amino acid data. The Alignment Editor also doesn't respond properly (worse: it responds inappropriately) to mouse-wheel actions, jumping to the end of a file horizontally when you wanted to wheel-scroll vertically.

Also mildly annoying: the tree renderings are bitmaps; I would much prefer to see SVG (Scalable Vector Graphics) format, which in addition to being infinite-resolution (vector format) also allows easy editing of line widths, colors, fonts, labels, etc. in a simple text editor. As it is now, to edit line widths or colors in phylo-trees, you have to drag out Photoshop.

But overall, I have few significant complaints (and much praise) for Mega6. It's an immensely powerful program, it's fast, it's quite intuitive, and the best part is, it's free. (For a more detailed commentary on the program's design philosophy and capabilities, see this excellent 2011 paper by Tamura, Nei, et al. It was written at the time of Mega5, but applies equally to Mega6.)

Below are the FASTA sequences used in making the phylo tree for yesterday's post. You can Cut and Paste these sequences directly into Mega6's Alignment Explorer:

>sp|P13300|KITH_BPT4 Enterobacteria phage T4 
>tr|S5MKX8|S5MKX8_9CAUD Yersinia phage PST 
>tr|I7KRQ7|I7KRQ7_9CAUD Yersinia phage phiD1
>tr|F2VXC8|F2VXC8_9CAUD Shigella phage Shfl2 
>tr|I7J3X5|I7J3X5_9CAUD Yersinia phage phiR1-RT 
>sp|Q98HR4|KITH_RHILO Rhizobium loti (strain MAFF303099) 
>tr|F0LSI7|F0LSI7_VIBFN Vibrio furnissii (strain DSM 14383 / NCTC 11218)
>sp|Q5E4F2|KITH_VIBF1 Vibrio fischeri (strain ATCC 700601 / ES114) 
>tr|V2ABB3|V2ABB3_SALET Salmonella enterica subsp. enterica serovar Gaminara str. ATCC BAA-711
>tr|I6H5M2|I6H5M2_SHIFL Shigella flexneri 1235-66
>sp|P23331|KITH_ECOLI Escherichia coli (strain K12)
>sp|Q66AM8|KITH_YERPS Yersinia pseudotuberculosis serotype I (strain IP32953
>tr|B4EXS0|B4EXS0_PROMH Proteus mirabilis (strain HI4320)
>tr|G0GHM0|G0GHM0_KLEPN Klebsiella pneumoniae KCTC 2242
>tr|F7YDB7|F7YDB7_MESOW Mesorhizobium opportunistum (strain LMG 24607 / HAMBI 3007 / WSM2075)
>tr|H0H7T7|H0H7T7_RHIRD Agrobacterium tumefaciens 5A


  1. ben ıssr ve rapd sonuçlarımdan oluşan 1-0 verilerimi nasıl formata çevirebilirim? Lütfen yardımcı olun....

  2. Thank you so much. Your works are fantastic and very useful for all of us. We are waiting for your next post. Thank you.
    Seo Service Provider In India

  3. You should enter your mobile number for verification purposes. The messenger application market is flooded with many picture sharing apps.

  4. I think this is a real great article post.Really looking forward to read more. Visit at
    Crazy Video Hub

  5. Fantastic article to go through,I would appreciate the writer's mind and the skills he has presented this great article to get its look in better style.

  6. It is a great job, I like your posts and wish you all the best. and I hope you continue this job well.
    NutraT line

  7. Hello, I am thomus jons thank you for this informative post. That is a great job. Wish you more success.Thank you so much and for you all the best. Takes Down

  8. Java Assignment help
    Therefore, students take Java Assignment help from the online assignment writing services. They are cost-efficient, as well as fast in their service. A student who is new to these assignment writing can get a proper idea on the format which should be followed. However, there is one more thing to ponder

  9. Assignmentservicerating is best reviews site.We at Top Quality Assignment believe that there is no shortcut to success and to attain success, hard work, dedication, and commitment must be present. We are an online platform where students check & write reviews for assignments related websites. reviews

  10. ABC Assignment Help is an incomparable online Accounting assignment help company delivering excellent academic assignments, essays, coursework and reports. Through a team of over 3000 subject experts we ensure individual attention to every student making the assignment help experience completely personalized in nature. With our round the clock services, you can be assured of high grades every time.

  11. Our encryption software protects your details. In other words, until you select to tell someone that you decided to hire for assignment writer UK, no one will come to know. We would love to work with you and provide you with a top quality assignment writing help to fetch you the top grades.


Add a comment. Registration required because trolls.