concerning end gaps and anchoring

Doug Eernisse Doug_Ee at um.cc.umich.edu
Fri Sep 3 12:35:32 EST 1993

In article <930903084111.20202612 at BOBCAT.CSC.WSU.EDU> 
Steve Thompson: VADMS genetics, THOMPSON at WSUVMS1.CSC.WSU.EDU writes:
>software.  If you know, by eye or otherwise, where the motifs are that you
>want to force the alignment around, you can add foreign symbols to the
>sequence at corresponding sites in all members of the group.  This works
>best if you flank your known motif with the foreign symbol but also works
>if you just insert it into a common feature (e.g. this works great for
>absolutely in disulphide bridges with protein alignments).  Then you need
>to modify the substitution matrix which the program accesses to likewise
>add the symbol.  Give it a substitution value at least 10X that of
>identity for table.  Then when you run the program be sure and specify
>the alternate matrix.  This works very well for many situations, both
>nucleotide and peptide, and has been successfully used after my
>suggestion by many of my users to align previously "unalignable" sequence
>sets.  Naturally, use an editor to remove the foreign symbols after the
>alignment has been completed.  Give it a try.
>                                                   Steve Thompson
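
A minimal sketch of the foreign-symbol trick Steve describes, assuming the substitution matrix is held as a Python dict of dicts (real alignment programs read matrices from files in their own formats). The anchor character "#", the helper names, and the choice to score the anchor against every other symbol with the matrix's lowest value are all illustrative assumptions. Only the "at least 10X identity" rule comes from the post.

```python
# Hypothetical sketch of anchoring an alignment with a "foreign" symbol.
# ANCHOR and all function names are illustrative, not from any real program.

ANCHOR = "#"  # must not occur anywhere in the real sequences


def add_anchor_to_matrix(matrix, anchor=ANCHOR, factor=10):
    """Return a copy of `matrix` (dict of dicts) extended with `anchor`.

    anchor-vs-anchor scores `factor` times the largest identity (diagonal)
    score; anchor-vs-anything-else gets the matrix's lowest score (an
    assumption, not part of the original recipe).
    """
    identity = max(matrix[a][a] for a in matrix)
    worst = min(min(row.values()) for row in matrix.values())
    extended = {a: dict(row) for a, row in matrix.items()}
    for row in extended.values():
        row[anchor] = worst
    extended[anchor] = {a: worst for a in matrix}
    extended[anchor][anchor] = factor * identity
    return extended


def flank_motif(seq, motif, anchor=ANCHOR):
    """Put `anchor` on both sides of the first occurrence of `motif`."""
    i = seq.find(motif)
    if i < 0:
        raise ValueError(f"motif {motif!r} not found in sequence")
    j = i + len(motif)
    return seq[:i] + anchor + motif + anchor + seq[j:]


def strip_anchor_columns(rows, anchor=ANCHOR):
    """After alignment, drop columns made up entirely of `anchor`.

    Raises ValueError if an anchor sits in a mixed column, i.e. the anchors
    did not line up and the alignment should be checked by eye.
    """
    kept = []
    for column in zip(*rows):
        if all(c == anchor for c in column):
            continue
        if anchor in column:
            raise ValueError("anchor symbols did not align in every sequence")
        kept.append(column)
    return ["".join(chars) for chars in zip(*kept)]


toy = {"C": {"C": 9, "G": -3}, "G": {"C": -3, "G": 6}}  # toy 2-letter matrix
scored = add_anchor_to_matrix(toy)
print(scored["#"]["#"], scored["G"]["#"])  # 90 -3
print(flank_motif("GGCCGG", "CC"))  # GG#CC#GG
print(strip_anchor_columns(["GG#CC#GG", "G-#CC#GG"]))  # ['GGCCGG', 'G-CCGG']
```

Stripping whole columns, rather than deleting every "#" character row by row, makes a failed anchoring show up as an error instead of silently shifting residues.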

Right, this is similar to what I have done, although your manipulation of
the substitution matrix for amino acids is a very nice touch. If you just
want to try adding columns of a special symbol to your alignment and, like
me, you are using a Mac to edit your alignments, you might find the
following tip useful. I have found the freeware version (2.22) of the text
editor BBEdit, which is one of the "Child Apps" that come with Don Gilbert's
SeqApp program, to be useful in this particular case. Actually, you need to
download one of the many available public-domain BBEdit extensions written
by other authors. I got this one at "mac.archive.umich.edu" (anonymous ftp
_after_ business hours), but it should also be at Sumex and the various
mirrors of these sites. On the Michigan archives, look in /util/text/ for
something like "BBE_InsertColumns.hqx", which simply allows you to insert
columns of tabs into your data (you can also get the full version of BBEdit
2.22 while you
are there). I have made trivial changes to the included Think C code,
changed the name, and recompiled to make other versions for specific
characters (my gap symbol, space, "$", or whatever), but it is also easy
enough to change all the tabs you inserted in your alignment to "$$$$$$".
These may be stripped in a similar global manner using the editor's
built-in search-and-replace.
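
The editor workflow above (insert a column of tabs, globally replace the tabs with a marker, strip the marker afterwards) can be mimicked in a few lines. This is a hypothetical sketch of the same text manipulation, not the BBEdit extension itself; the helper names are made up, and the marker is assumed not to occur anywhere else in the data.

```python
# Hypothetical helpers mimicking the editor workflow: insert a column of tabs,
# globally replace the tabs with a marker, and later strip the marker again.


def insert_column(lines, col, fill="\t"):
    """Insert `fill` at 0-based position `col` in every line.

    Lines shorter than `col` are padded with spaces first, so the new
    column lands at the same position in every line.
    """
    out = []
    for line in lines:
        padded = line.ljust(col)
        out.append(padded[:col] + fill + padded[col:])
    return out


def replace_all(lines, old, new):
    """Global search and replace over every line, as an editor would do it."""
    return [line.replace(old, new) for line in lines]


aln = ["ACGTACGT", "ACG-ACGT", "ACGTAC-T"]
marked = replace_all(insert_column(aln, 4), "\t", "$")
print(marked)  # ['ACGT$ACGT', 'ACG-$ACGT', 'ACGT$AC-T']
restored = replace_all(marked, "$", "")
print(restored == aln)  # True
```

The final global replace removes every "$" in the file, which is why the marker has to be a character that never appears in the sequences themselves.
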

As a general comment, one should be wary of such a method because it is
exceedingly difficult to limit alignment ambiguities to particular regions.
The same applies to those who put bars over "ambiguous" alignment sites
which are then excluded from a phylogenetic analysis. The problem is that
alternative gap placements may extend into or out of the sites one would
like to treat in a special manner. At least keep those ambiguous sites in
your alignment, perhaps excluding them from some analyses by defining
a "charset" of those sites (PAUP). If you don't, you risk losing track of
your alignment decisions. Steve's case of lacking disulphide bridges might
be an appropriate a priori justification for deciding where the gaps go.
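
Keeping ambiguous sites in the alignment while excluding them by name could look like the following. This is a hypothetical sketch: the per-column flags are made-up input, and it assumes a NEXUS reader that accepts a SETS block with a CHARSET command, as PAUP* does. Older PAUP versions may expect the charset in an ASSUMPTIONS block instead.

```python
# Hypothetical helper: turn per-column "ambiguous" flags (1 = ambiguous) into a
# NEXUS charset, so the sites stay in the alignment but can be excluded by name.


def mask_to_ranges(mask):
    """Return 1-based inclusive (start, end) runs of truthy entries in `mask`."""
    ranges = []
    start = None
    for i, flagged in enumerate(mask, start=1):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(mask)))
    return ranges


def charset_block(name, mask):
    """Write a NEXUS SETS block defining charset `name` over the flagged columns."""
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in mask_to_ranges(mask)]
    if not parts:
        raise ValueError("no columns are flagged")
    return f"begin sets;\n    charset {name} = {' '.join(parts)};\nend;\n"


flags = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # made-up example, one flag per column
print(charset_block("ambiguous", flags))
# begin sets;
#     charset ambiguous = 2-3 5 8-10;
# end;
```

In PAUP the named set should then be excludable from a particular analysis with `exclude ambiguous;` and restored with `include all;`, so the sites and the decision about them both remain in the data file.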


More information about the Bio-soft mailing list

Send comments to us at biosci-help [At] net.bio.net