
phylogenetic tree

Mary K.Kuhner mkkuhner at kingman.genetics.washington.edu
Tue Nov 7 14:21:27 EST 2000

In article <8u9ipd$gqu$1 at mercury.hgmp.mrc.ac.uk>,
Mich Ard  <mich_ard at hotmail.com> wrote:

>I'm currently making a phylogenetic tree based on the protein sequences,
>using the programs of clustalW and phylip, and found the output is
>"unstable", it's always changing depending on the inputting order of
>these sequences; and also I don't know the exact meaning of "bootstrap"
>and "resampling". Could you guys there give me some help?

Two likely explanations for a very unstable tree:

(1)  You simply don't have enough information in your data
to get a good answer.  For example, if there are only 10
variable sites it's quite impossible to figure out all the
relationships of 20 species.
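To get a feel for how much there is to figure out, here is a short sketch (an illustration only, not part of PHYLIP or any other package) that uses the standard count of unrooted, fully resolved trees for n species, (2n-5)!!, together with the fact that each such tree has n-3 internal branches to place.

```python
def num_unrooted_trees(n: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees for n labeled
    species: (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

for n in (4, 10, 20):
    # each such tree has n - 3 internal branches whose placement must be inferred
    print(f"{n} species: {num_unrooted_trees(n):,} trees, "
          f"{n - 3} internal branches each")
```

For 20 species there are about 2.2 x 10^20 possible trees, each with 17 internal branches, and 10 variable sites have to choose among all of them.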

(2)  The model you are using does not fit your data.  For
example, if some of your proteins were produced by sticking
together parts of other proteins (so that different regions
have different histories), they will never fit well on a single
tree.  There *is* no single tree in reality, so it can't be
estimated successfully.
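One way to see why a chimeric protein has "no tree" is split compatibility. Two groupings of the species can both sit on one tree only if at least one of the four intersections of their sides is empty, and a standard result says a set of groupings fits a single tree exactly when every pair is compatible. The sketch below assumes made-up taxa A to E and made-up groupings for each half of a hypothetical chimera; real recombination detection uses dedicated methods, and this only illustrates the conflict.

```python
from itertools import combinations

def compatible(group1, group2, taxa):
    """Two groupings (splits of the taxa into two sides) can both appear on
    one tree only if at least one of the four side-by-side intersections
    is empty."""
    a, b = set(group1), set(taxa) - set(group1)
    c, d = set(group2), set(taxa) - set(group2)
    return not (a & c and a & d and b & c and b & d)

def fits_one_tree(groups, taxa):
    """A set of splits fits on a single tree iff every pair is compatible."""
    return all(compatible(g, h, taxa) for g, h in combinations(groups, 2))

taxa = {"A", "B", "C", "D", "E"}
# Hypothetical chimeric protein: the first half of the alignment supports
# grouping A with B, the second half supports grouping A with C.
first_half = [{"A", "B"}, {"D", "E"}]
second_half = [{"A", "C"}, {"D", "E"}]

print(fits_one_tree(first_half, taxa))                # True: first half alone is fine
print(fits_one_tree(second_half, taxa))               # True: second half alone is fine
print(fits_one_tree(first_half + second_half, taxa))  # False: no single tree fits both
```

Each half has a perfectly good tree of its own; it is only the whole protein, forced onto one tree, that can never settle down.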

The "bootstrap" is a way to assess how solidly your tree
is supported by your data.  It involves making many random
resamplings of your data, in which the sites (alignment columns)
are drawn at random with replacement.  For example, if your original
data set had 10 sites, each resampled data set would also have
10 sites, but instead of sites 1,2,3,4,5,6,7,8,9,10 one new data set
might contain sites 2,2,3,4,5,6,6,6,8,9.  Some sites are repeated,
some are omitted.  You generate a large number (100 to 1000) of these
resampled data sets and make a tree from each one.  Then
the 100 or 1000 trees are combined into a consensus tree
which shows the areas of agreement: each group is labeled with
how many of the trees contain it.
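A minimal sketch of the resampling step, assuming the alignment is held as a dict mapping species names to equal-length aligned sequences (the names and sequences below are made up). In PHYLIP this step is done by the SEQBOOT program; the code only illustrates the idea and is not SEQBOOT's implementation.

```python
import random

def bootstrap_replicate(alignment, rng):
    """One bootstrap data set: the same species and the same number of sites,
    but with sites (columns) drawn at random with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Sorting only makes the output easy to compare with the original;
    # the tree methods treat sites as independent, so order does not matter.
    picks = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    replicate = {name: "".join(seq[i] for i in picks)
                 for name, seq in alignment.items()}
    return picks, replicate

# Hypothetical 10-site protein alignment for four species.
alignment = {
    "spA": "MKVLAGTRLE",
    "spB": "MKVLSGTKLE",
    "spC": "MRILAGSKLD",
    "spD": "MRIVSGSKLD",
}

rng = random.Random(1)  # fixed seed so the run is reproducible
for _ in range(3):
    picks, rep = bootstrap_replicate(alignment, rng)
    # 1-based site numbers, as in the text: some repeated, some missing
    print([i + 1 for i in picks], rep["spA"])
```

Each replicate keeps every species but sees a slightly different sample of sites, which is exactly what lets the replicate trees disagree where the data are weak.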

You can do this quite easily with either Phylip or PAUP*, and it's a
good way to get an idea how well supported your tree is.  Low
support values will tell you right away if you are in situation #1
(not enough data).  It's not as good for situation #2 (wrong model),
because all of the bootstrap trees are made from the wrong model too,
and they may all agree because they are all confused the same way.
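The support figures on a consensus tree are just counts. Below is a minimal sketch, assuming each bootstrap tree has already been reduced to its list of internal groupings; the four "trees" and the taxon names are made up. PHYLIP's CONSENSE program does this job for real; the code only shows the arithmetic behind a majority-rule consensus. Each split is stored by its side that excludes taxon A, so the group {A,B} is printed as its complement C, D, E.

```python
from collections import Counter

def canonical(group, taxa):
    """Write a split by its side that excludes a fixed reference taxon, so that
    {A,B} | {C,D,E} is counted the same whichever side was listed."""
    taxa = frozenset(taxa)
    side = frozenset(group)
    return taxa - side if min(taxa) in side else side

def support(trees, taxa):
    """Fraction of bootstrap trees that contain each split."""
    counts = Counter()
    for tree in trees:
        counts.update({canonical(g, taxa) for g in tree})
    return {split: n / len(trees) for split, n in counts.items()}

def majority_rule(trees, taxa):
    """Splits found in more than half of the trees (majority-rule consensus)."""
    return {s: f for s, f in support(trees, taxa).items() if f > 0.5}

taxa = {"A", "B", "C", "D", "E"}
# Four hypothetical bootstrap trees, each listed by its internal groupings.
trees = [
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A", "C"}, {"D", "E"}],
]
for split, frac in sorted(support(trees, taxa).items(), key=lambda kv: -kv[1]):
    print(sorted(split), f"{frac:.0%}")
```

Here {A,B} and {D,E} each appear in 3 of the 4 trees (75%) and make it into the majority-rule consensus, while {C,D} and {A,C} appear only once (25%). With real data, many groups stuck near low values like that is the signature of situation #1.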

It's called the bootstrap because its inventor, Bradley Efron, thought
that making new data from your old data was like the proverb about
lifting yourself by pulling on your own bootstraps... but it does
seem to work pretty well.

Hope this helps,

Mary Kuhner mkkuhner at genetics.washington.edu


More information about the Mol-evol mailing list
