GCG V. 9.0-GAP

Guy Bottu gbottu at ben.vub.ac.be
Thu May 29 12:03:07 EST 1997

	Dear colleagues,

Allow me to add a few comments:

- the two sequences have no observable similarity and so, if you run gap
in its default mode (no penalty for terminal gaps), it will give
an alignment with a tiny overlap, while if you add the parameter
-endweight it will give an alignment with a crazy number of gaps. In
both cases the alignment is not biologically relevant anyway.

- the gap penalty and the scoring matrix both affect the result. As a
matter of fact, if you run gap on the two sequences with -gap=12 -len=4
(the default proposed by "Bill" Pearson to be used with the PAM250
and BLOSUM62 tables) you will get the same (bad) result no matter what
the order of the sequences is.

- Often, there are several alternative alignments with the same highest
possible score. Which one to choose is a difficult question and, as far
as I know, nobody has found a satisfactory answer. Fortunately, the
ambiguity is most of the time confined to a few spots of the alignment
that have many gap positions, and it reflects the difficulty of finding
out in which order deletion/insertion events occurred in the course of
evolution.
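
To make the point about equally scoring alignments concrete, here is a
minimal Python sketch (not GCG code) that counts how many global
alignments share the best score. It assumes a simple linear gap penalty
and toy match/mismatch scores instead of the gap creation/extension
penalties and scoring matrices that gap uses. The function name and its
parameters are hypothetical.

```python
def count_optimal_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, number_of_co_optimal_alignments).

    Global alignment, terminal gaps penalized, linear gap penalty.
    All scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    # First column and first row: only one way, a run of gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        ways[i][0] = 1
    for j in range(1, m + 1):
        score[0][j] = j * gap
        ways[0][j] = 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + s, ways[i - 1][j - 1]),  # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, ways[i - 1][j]),        # gap in b
                (score[i][j - 1] + gap, ways[i][j - 1]),        # gap in a
            ]
            best = max(sc for sc, _ in candidates)
            score[i][j] = best
            # Every move that reaches the best score contributes its paths.
            ways[i][j] = sum(w for sc, w in candidates if sc == best)
    return score[n][m], ways[n][m]
```

For instance, count_optimal_alignments("AA", "A") returns (0, 2): the
single gap can sit opposite either A, and the score cannot tell the two
placements apart.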

So, I would not say that the program gap is bugged or misfeatured (which
we cannot say of many other parts of GCG (both :) and :( !!)).
Also, I find no reason to alias gap to gap -endweight or to return to
the renormalized PAM250 matrix as a default, but it is very useful to
give the users an elementary explanation of alignment algorithms and
scoring schemes, so that they know what they are using.
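
As one possible elementary illustration for users, the following Python
sketch (again not GCG code) scores a global alignment with and without a
penalty for terminal gaps. The linear gap penalty, the toy match/mismatch
scores and the flag penalize_end_gaps are assumptions made for the
example; the flag only mimics the idea behind -endweight and does not
reproduce the behaviour of gap.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, penalize_end_gaps=True):
    """Best alignment score of a and b under a linear gap penalty.

    penalize_end_gaps=False makes leading and trailing gaps free,
    so the two sequences may simply overlap at their ends.
    """
    n, m = len(a), len(b)
    edge = gap if penalize_end_gaps else 0   # cost per leading terminal gap
    prev = [j * edge for j in range(m + 1)]
    best_last_col = prev[m]                  # best value seen in the last column
    for i in range(1, n + 1):
        cur = [i * edge] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,    # align the two residues
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
        best_last_col = max(best_last_col, cur[m])
        prev = cur
    if penalize_end_gaps:
        return prev[m]
    # Free trailing gaps: the alignment may end anywhere on the
    # last row or the last column of the matrix.
    return max(max(prev), best_last_col)
```

With the two unrelated toy sequences "AAAACGT" and "CGTTTTT", the score
is 3 when terminal gaps are free (only the short CGT overlap is aligned)
and -5 when they are penalized (the sequences are forced to align over
their full length). Neither alignment would be biologically meaningful.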

	Guy Bottu

