Last December, someone asked our automatic server for the
multiple alignment, phylogenetic tree, etc.
of a large set of sequences (hemoglobins, myoglobins,
leghemoglobins, 630 sequences in total).
The job could not be run at the time for lack of resources,
and I promised the requester that I would send the answer back
once we could allocate it to a machine with enough memory. Well,
the job has been done, but I cannot find the original message
(we must have deleted some files by mistake) and hence cannot
send the answer back as promised. Please contact me if you
are still interested in the results. (Sorry to use the net for
this purpose.)
At that time, the question of how large a phylogenetic tree
can be constructed was also raised. Our algorithm has the
following characteristics:
For the construction of a phylogenetic tree of n sequences:
Storage: It uses two input matrices, both n x n, one with the
distances between sequences and one with the variances
of these distances. Since the entries are 8-byte double words,
and this is the dominant use of storage, you will need
2 * 8 * n^2 = 16*n^2 bytes.
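
As a rough illustration of the storage estimate above, the following Python sketch evaluates 16*n^2 for the values of n discussed in this message. The function name and its parameters are hypothetical, chosen only for this example; the sketch counts the two input matrices and nothing else.

```python
def matrix_storage_bytes(n, bytes_per_entry=8, matrices=2):
    """Bytes needed for the n x n input matrices (hypothetical helper).

    Assumes two matrices (distances and variances) of 8-byte doubles,
    which gives the 16*n^2 figure quoted in the text.
    """
    return matrices * bytes_per_entry * n * n


for n in (50, 100, 200, 630):
    # Report the size in megabytes (10^6 bytes).
    print(f"n={n:4d}  {matrix_storage_bytes(n) / 1e6:8.2f} MB")
```

For n=630 this comes to 6,350,400 bytes, a little over 6 MB, for the two input matrices alone.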
Time: (On a DEC5000 workstation, just the phylogenetic tree
construction)
n= 50 5.3 secs (random distances)
n=100 22.0 secs (random distances)
n=200 96.9 secs (random distances)
n=630 31.3 mins (tree mentioned above)
(In the long run, i.e. for much larger n, an O(n^3) term
should dominate the time.)
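
To see how the timings above scale, one can estimate the growth exponent between consecutive measurements, i.e. the k in time ~ n^k. The sketch below does this in Python using only the four figures quoted above; the function name is hypothetical. Note that the n=630 run used real data while the others used random distances, so the last step is only loosely comparable.

```python
import math

# (n, seconds) pairs taken from the timings quoted above.
timings = [(50, 5.3), (100, 22.0), (200, 96.9), (630, 31.3 * 60)]


def local_exponents(points):
    """Estimate k in time ~ n^k between each pair of consecutive points."""
    result = []
    for (n1, t1), (n2, t2) in zip(points, points[1:]):
        result.append(math.log(t2 / t1) / math.log(n2 / n1))
    return result


exponents = local_exponents(timings)
for ((n1, _), (n2, _)), k in zip(zip(timings, timings[1:]), exponents):
    print(f"n={n1} -> n={n2}: exponent ~ {k:.2f}")
```

The exponents all fall between 2 and 3 and increase with n, which is what one would expect if a quadratic term dominates at these sizes and a cubic term takes over later.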
The algorithm approximates the unrooted tree with variable-length
branches that minimizes the weighted sum of squares
of distance differences (between the given distance and the
"tree" distance).
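
The quantity being minimized can be written down in a few lines. The Python sketch below is only an illustration of the objective, not of our tree-construction algorithm. It assumes that each squared difference is weighted by the inverse of the corresponding variance, which is an assumption made for this example, and it uses a three-leaf star tree as a toy case. All function names are hypothetical.

```python
from itertools import combinations


def weighted_sum_of_squares(dist, var, tree_dist):
    """Sum over pairs i<j of (dist - tree_dist)^2 / var.

    Weighting by 1/variance is an assumption made for this sketch.
    """
    total = 0.0
    for i, j in combinations(range(len(dist)), 2):
        diff = dist[i][j] - tree_dist[i][j]
        total += diff * diff / var[i][j]
    return total


def star_tree_distances(branch):
    """Tree distances for a star tree: leaf i hangs on a branch of length branch[i]."""
    n = len(branch)
    return [[0.0 if i == j else branch[i] + branch[j] for j in range(n)]
            for i in range(n)]


# Toy data: three sequences with unit variances (diagonal entries are unused).
dist = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
var = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

exact = weighted_sum_of_squares(dist, var, star_tree_distances([1, 2, 3]))
worse = weighted_sum_of_squares(dist, var, star_tree_distances([1, 2, 4]))
print(exact, worse)
```

With branch lengths 1, 2 and 3 the tree reproduces the given distances exactly, so the objective is zero; lengthening the third branch to 4 makes two of the three tree distances too long by 1 and raises the objective to 2.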
For more information on these trees and multiple alignments,
e-mail the line "help AllAll" to cbrg at inf.ethz.ch
Gaston H. Gonnet, Informatik, ETH, Zurich