
Newly computed PAM250 matrix

Mark Cohen cohen at cumuli.vmsmail.ethz.ch
Thu Jun 25 03:25:46 EST 1992


In article <19JUN199220521347 at opal.mgh.harvard.edu> Mike Cherry
(cherry at opal.mgh.harvard.edu) posted the PAM250 matrix generated by us and
recently published in Science.  (Thanks, I would have done it myself but was
away.)

For those of you who are interested, this matrix was compiled using
all pairs of aligned amino acids separated by a PAM distance of between 6.4
and 100, and extrapolated by exponential fitting to 116.5 PAM.  This
minimizes errors caused at low PAM by inclusion of the same sequence with
typos or sequence errors, and at high PAM by alignment errors.  The matrix
presented was normalised for two proteins at a PAM distance of 250, so
direct comparison with Dayhoff's matrix is valid.
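
For anyone unfamiliar with how a matrix "at a PAM distance of 250" relates
to a 1-PAM mutation matrix, here is a minimal sketch of the usual
Dayhoff-style construction: raise the 1-PAM matrix to the 250th power and
take 10*log10 of the odds against the background frequencies.  This is not
the DARWIN code and not necessarily the exact fitting procedure described
above.  The three-letter alphabet, the frequencies FREQS, the exchange
rates EXCH and all function names are invented for illustration.

```python
import math

# Hypothetical toy alphabet of 3 residues (real matrices use 20 amino acids).
FREQS = [0.5, 0.3, 0.2]          # assumed background frequencies
EXCH = [[0, 1, 2],               # assumed symmetric exchange rates
        [1, 0, 1],
        [2, 1, 0]]

def toy_pam1(freqs, exch):
    """Build a reversible 1-PAM matrix: on average 1% of residues change."""
    n = len(freqs)
    raw = sum(freqs[i] * exch[i][j] * freqs[j]
              for i in range(n) for j in range(n) if i != j)
    scale = 0.01 / raw
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                m[i][j] = scale * exch[i][j] * freqs[j]
        m[i][i] = 1.0 - sum(m[i])   # diagonal still 0.0 here, so this is 1 - off-diagonal sum
    return m

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Matrix power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result

def log_odds(m1, freqs, pam):
    """Score S[i][j] = 10*log10(P(i -> j after `pam` PAM) / freq[j])."""
    mn = mat_pow(m1, pam)
    n = len(freqs)
    return [[10.0 * math.log10(mn[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    for row in log_odds(toy_pam1(FREQS, EXCH), FREQS, 250):
        print(" ".join(f"{s:6.2f}" for s in row))
```

Because the toy 1-PAM matrix is built to be reversible, the resulting score
matrix is symmetric, which is what allows entry-by-entry comparison between
two matrices normalised to the same PAM distance.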

There is a paper that covers much the same type of thing (recalculation
of mutation matrices) by D.T.Jones, W.R.Taylor and J.M.Thornton in
CABIOS 8 (1992), 275-282.  I only got a copy today and so can pass no comment
except to say that a cursory glance shows that, like us, they found most
deviations from Dayhoff's 1978 matrix were in the entries for W and C, and
to a lesser extent Y and M.  This is expected, as these are the amino acids
for which Dayhoff had least information.  The difference between our work and
the Jones, Taylor, Thornton paper is that we calculated ancestral
sequences and complete multiple alignments for all the groups, while they
used a rapid k-tuple based method for defining phylogeny.  Jones et al.
used only sequences of >85% identity to generate their exchange tables.
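
To make the idea of an identity cutoff on exchange tables concrete, here is
a rough sketch that counts amino acid exchanges only from aligned pairs
above a threshold.  It is an illustration only, not the method of Jones et
al. or ours.  The function names, the convention of ignoring gapped columns
when computing identity, and the example sequences are all assumptions.

```python
from collections import Counter

def percent_identity(a, b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def exchange_counts(aligned_pairs, min_identity=85.0):
    """Count unordered residue exchanges in pairs above the identity cutoff."""
    counts = Counter()
    for a, b in aligned_pairs:
        if percent_identity(a, b) <= min_identity:
            continue  # skip pairs at or below the cutoff
        for x, y in zip(a, b):
            if x != "-" and y != "-" and x != y:
                counts[tuple(sorted((x, y)))] += 1
    return counts

if __name__ == "__main__":
    pairs = [
        ("ACDEFGHIKL", "ACDEFGHIKM"),  # 90% identical, counted
        ("ACDEFGHIKL", "ACWWFGHYKM"),  # 60% identical, skipped
    ]
    print(exchange_counts(pairs))
```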

Our work was done using software developed by Gaston Gonnet in the
Department of Informatik at the ETH Zurich.  For more information
regarding the software, feel free to contact him at
gonnet at inf.ethz.ch.
The software package (DARWIN) is available provided you promise not to
use it for commercial purposes.  DARWIN can be used as a tool for
searching databases, constructing multiple alignments and ancestral
sequences and, if you have computing resources to spare, recalculating
Dayhoff matrices using the ever-increasing number of sequences released.

If anyone would like more info, contact me and I'll try to help.
Mark Cohen
cohen at cumuli.vmsmail.ethz.ch


