Pairwise protein align. program wanted

Ewan Birney birney at molbiol.ox.ac.uk
Thu Jun 6 03:12:39 EST 1996

Morten Stig Andersen wrote:
> I am looking for a freeware pairwise alignment program (for unix) which
> can report percent similarity as well as percent identity, like the
> 'bestfit' and 'gap' programs in GCG.
> I have Bill Pearson's 'align' program, but it reports identity only.
> Any clues?
> Thanks,
> Morten Andersen
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>  MORTEN STIG ANDERSEN           Department of Molecular Biology
>  Fax: +46 18 557723             Biomedical Center, University of Uppsala
>  Tel: +46 18 174379             BOX 590, S-751 24 Uppsala, SWEDEN


	Percent similarity is in fact a really silly way of trying to estimate
what is going on. Which residue pairs count as "similar" is largely arbitrary,
and even then, the arbitrary scheme is in many cases not well implemented.
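
To make the arbitrariness concrete, here is a minimal Python sketch that computes both figures for a pair of already-aligned sequences. The residue groupings in `SIMILAR_GROUPS` are a hypothetical choice for illustration, not the scheme used by GCG's bestfit/gap or any other program. The denominator (aligned columns with no gap in either sequence) is just as much a convention; another program might divide by the length of the shorter sequence instead.

```python
# Hypothetical residue groups for "similar" pairs. This grouping is an
# illustrative assumption; other programs use other groups or a
# substitution-matrix threshold, which is why % similarity varies.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _same_group(a, b, groups):
    """True if residues a and b fall in the same group."""
    return any(a in g and b in g for g in groups)


def percent_identity_similarity(aln1, aln2, groups=SIMILAR_GROUPS):
    """Return (% identity, % similarity) for two aligned sequences.

    Only columns with no gap ('-') in either sequence are counted; that
    denominator is itself one convention among several.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(a == b or _same_group(a, b, groups) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)
```

With the default groups, `percent_identity_similarity("MKVLIT-E", "MRILVSQD")` gives about 28.6% identity and 100% similarity; passing `groups=["ILV"]` drops the similarity for the very same alignment to about 57.1%.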

	If I were you, depending on the problem, I would use HMMs (if you
have more than one sequence), where the small-sample problem (too few
sequences to estimate residue frequencies reliably) can be addressed by
using Dirichlet mixture priors. The two main HMM packages out there are




The latter has a reasonably good Web interface.

HMMs have a strong statistical foundation, so you can get back a probability
of a sequence matching the HMM (given certain assumptions). Dirichlet mixtures
take the place of the "similarity" matrix used in other systems (that is a
slight simplification, but not much of one).
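
As a rough illustration of how a Dirichlet mixture stands in for a similarity matrix, the sketch below estimates residue probabilities for a single alignment column. Each mixture component is weighted by how well it explains the observed counts, and the estimate is the resulting posterior mean. The four-letter alphabet and the two components' weights and parameters are made-up toy values, not a published mixture, and this is not code from either HMM package.

```python
import math


def _log_evidence(counts, alpha):
    """log P(counts | Dirichlet(alpha)), up to the multinomial coefficient.

    That coefficient is the same for every component, so it cancels when
    the component posteriors are normalised.
    """
    n_tot, a_tot = sum(counts), sum(alpha)
    val = math.lgamma(a_tot) - math.lgamma(n_tot + a_tot)
    for n_i, a_i in zip(counts, alpha):
        val += math.lgamma(n_i + a_i) - math.lgamma(a_i)
    return val


def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean residue probabilities for one alignment column."""
    # Posterior probability of each mixture component given the counts.
    logs = [math.log(q) + _log_evidence(counts, a)
            for q, a in zip(weights, alphas)]
    top = max(logs)
    post = [math.exp(x - top) for x in logs]  # subtract max for stability
    z = sum(post)
    post = [p / z for p in post]

    # Mix each component's posterior mean (n_i + alpha_i) / (N + |alpha|).
    n_tot = sum(counts)
    probs = [0.0] * len(counts)
    for p_j, alpha in zip(post, alphas):
        a_tot = sum(alpha)
        for i, (n_i, a_i) in enumerate(zip(counts, alpha)):
            probs[i] += p_j * (n_i + a_i) / (n_tot + a_tot)
    return probs


# Toy, made-up mixture over the alphabet I, L, V, D:
# component 0 favours the hydrophobics, component 1 favours D.
ALPHABET = "ILVD"
WEIGHTS = [0.5, 0.5]
ALPHAS = [[1.0, 1.0, 1.0, 0.05], [0.05, 0.05, 0.05, 1.0]]

probs = dirichlet_mixture_estimate([2, 0, 0, 0], WEIGHTS, ALPHAS)
for res, p in zip(ALPHABET, probs):
    print(res, round(p, 3))
```

Having seen only two I's in the column, the estimate still gives L and V roughly 0.18 each while D stays around 0.05: the "I is similar to L and V" knowledge comes from the prior rather than from a hand-built similarity table.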

If you want a more heuristic system you can fall back on Profiles, which is
how I routinely analyse things. Profiles are almost identical to HMMs but
lack the probabilistic framework. Our package for using Profiles is Wisetools.
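
For readers who have not met profiles, here is a deliberately simplified Python sketch of the core idea: a position-specific log-odds score table built from a block of aligned sequences, then slid along a query. It is ungapped and uses simple background-proportional pseudocounts, so it is not how Wisetools (or profiles with gap penalties) actually works. The block, the uniform background in `BACKGROUND` and `pseudocount=1.0` are all hypothetical, and residues outside the background alphabet will raise a `KeyError`.

```python
import math
from collections import Counter

# Hypothetical uniform background over the 20 standard amino acids.
BACKGROUND = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}


def build_profile(block, background=BACKGROUND, pseudocount=1.0):
    """Log-odds (bits) table per column of an ungapped, equal-length block."""
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("block sequences must all have the same length")
    profile = []
    for col in range(width):
        counts = Counter(seq[col] for seq in block)
        total = len(block) + pseudocount
        # Pseudocounts are shared out in proportion to the background, so
        # unseen residues get a small (negative-scoring) probability.
        profile.append({
            res: math.log2((counts.get(res, 0) + pseudocount * bg) / total / bg)
            for res, bg in background.items()
        })
    return profile


def best_hit(profile, seq):
    """Best ungapped placement of the profile on seq, as (score, start)."""
    width = len(profile)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(profile[i][seq[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


profile = build_profile(["HGK", "HGR", "HAK"])
print(best_hit(profile, "MMHGKMM"))
```

Scanning "MMHGKMM" finds the best placement at offset 2 (the HGK window). Note that A and R, seen once each in the block, still score well in their columns; that per-position weighting is what a profile adds over a single pairwise comparison.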


I would really advise moving away from pairwise comparisons, even for, say,
an all-against-all matrix of related proteins in which you want to show that
certain proteins are more closely related than others (this is the other thing
people often try to use pairwise comparisons for). If you want something like
that, you should be using trees. There are a variety of tree-building methods,
and for most of them you can get some kind of statistical support by
bootstrapping. PHYLIP is my package of choice here.
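
The bootstrap idea is simple enough to sketch: resample the alignment columns with replacement, recompute whatever the tree is built from, and count how often each grouping reappears. The Python below only shows the resampling plus a plain p-distance matrix per replicate (a choice made here for illustration; real analyses would normally use a corrected protein distance). Building the trees and the consensus is left to a real package; in PHYLIP the resampling step is done by its SEQBOOT program.

```python
import random


def bootstrap_columns(alignment, rng):
    """One replicate: sample alignment columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return float("nan")
    return sum(x != y for x, y in sites) / len(sites)


def bootstrap_distance_matrices(alignment, replicates=100, seed=0):
    """Distance matrix for each bootstrap replicate of the alignment."""
    rng = random.Random(seed)
    matrices = []
    for _ in range(replicates):
        rep = bootstrap_columns(alignment, rng)
        n = len(rep)
        matrices.append([[p_distance(rep[i], rep[j]) for j in range(n)]
                         for i in range(n)])
    return matrices
```

Each matrix would then go to a distance method (neighbour-joining, say), and the fraction of replicate trees containing a given group of proteins is that group's bootstrap support. A single all-against-all matrix, by contrast, gives no indication of how stable those groupings are.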

(An aside... does anyone know of a PHYLIP web site?) 


More information about the Bio-soft mailing list
