Pairwise protein align. program wanted

Geoff Barton gjb at bioch.ox.ac.uk
Thu Jun 6 08:46:39 EST 1996

Morten Stig Andersen wrote:
> I am looking for a freeware pairwise alignment program (for unix) which
> can report percent similarity as well as percent identity, like the
> 'bestfit' and 'gap' programs in GCG.
> I have Bill Pearson's 'align' program, but it reports identity only.
> Any clues?

I'm posting this because I am often asked this question, and I often
see papers that quote "percent similarities".

Stick with percent identity.  Don't use arbitrary "percent similarities";
they are usually used to try to imply significance where there
isn't any!  Doing randomisations and quoting the "SD score from mean" (the
number of standard deviations by which the real alignment score exceeds
the mean score for shuffled sequences) can be a useful additional
indicator that compensates to a certain extent for composition bias and
length.  The AMPS package will give you these figures
(see ftp://geoff.biop.ox.ac.uk/README and our WWW site).
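
To make the randomisation idea concrete, here is a minimal Python sketch.
It is not the AMPS implementation.  The scoring scheme (match/mismatch
values and a linear gap penalty), the choice to shuffle both sequences,
and the function names nw_score and sd_score are all assumptions made
for illustration; a real analysis would use a proper substitution matrix
and gap penalties.

```python
import random
import statistics


def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score, linear gap penalty.

    The scoring values are illustrative assumptions, not AMPS defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sd_score(a, b, n_shuffles=100, seed=0):
    """Standard deviations between the real score and the shuffled mean."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        # Shuffling keeps length and composition but destroys residue order.
        a_perm = list(a)
        b_perm = list(b)
        rng.shuffle(a_perm)
        rng.shuffle(b_perm)
        shuffled_scores.append(nw_score(a_perm, b_perm))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - mean) / sd
```

Because the shuffled sequences have the same length and composition as
the real ones, the resulting figure is corrected, to a certain extent,
for both effects.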

Remember that the significance of a percentage identity is dependent
on the length of the alignment.  Short alignments have a much
greater chance of giving high percent identities than long alignments
(say, over 200 residues).  The average percentage identity for
"optimal" sequence alignments between proteins of unrelated 3D structure
is about 20% NOT 5% as many believe.
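
Since a percentage identity means little without the alignment length,
it helps to report the two together.  The following Python sketch does
that for a pair of already aligned sequences.  The function name
percent_identity, the "-" gap character, and the choice of denominator
(columns where neither sequence has a gap) are assumptions for
illustration; other programs divide by other lengths.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned residue pairs).

    Both inputs must be rows of the same alignment, so they have equal
    length.  Columns with a gap in either row are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        aligned += 1
        if x == y:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned, aligned
```

For example, percent_identity("ACD-EF", "ACDGEY") gives 80% identity, but
over only 5 aligned positions, which is far too short to mean anything.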



also in PostScript in:


particularly Figures 2 and 4 and associated text.

I hope this helps,


Geoffrey J. Barton, Laboratory of Molecular Biophysics, University of
Rex Richards Building, South Parks Road, Oxford OX1 3QU, U.K.
mailto:gjb at bioch.ox.ac.uk, Tel: +44 1865 275368, Fax: +44 1865 510454, 
ftp://geoff.biop.ox.ac.uk, http://geoff.biop.ox.ac.uk

