Eric Mansell wanted to find an alignment program for Macs that showed:
>
> > For example... seq1 MAELAVIR
> > seq2 --A---V-
> >
> > instead of... seq1 MAELAVIR
> > seq2 MAaLAVvR
> >
>
> I don't know the answer about programs, but cannot contain myself.
>
> We have no standard for how gaps are to be represented in sequences (that
> I know of, anyway). I like to use "-" for gap. GCG uses periods, whereas
> other people have on occasion used them to mean "the same as in the first
> sequence", and so do I. Some folks use blanks, others just skip over them as
> cosmetic spacing characters.
>
> Is it wise to encourage "-" to mean something other than "gap"? Is there a
> standard out there that I don't know about, and if so what does it say?
>
> Admittedly if "." were used instead of "-" in Mansell's example, it would
> satisfy me and probably him too, so my question is a bit off his point.
>
> -----
> Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
> Internet: joe at genetics.washington.edu (IP No. 128.95.12.41)
> Bitnet/EARN: felsenst at uwavm
I agree with Joe that it is frustrating that there does not appear to
be a standard for gap and match characters. I have written an alignment
editor for the Mac called "Aligner" which is a HyperCard 2.x stack.
I think it does just what Eric is looking for: it toggles between
full sequences and sequences displayed with match characters ('-')
wherever they match the top sequence. The toggle is very fast, thanks
to an efficient external command, and the sequences can
be output as a text file in interleaved format, with or without
match characters. I use periods ('.') to denote gaps. Aligner and
a related stack, DNA Translator, are described in CABIOS 8(2):177-184
(1992). One of the more useful features of Aligner (added since
the CABIOS paper) is its ability to color triplets of DNA according
to their amino acid coding, with all alternate genetic codes supported.
I use this to design primers for PCR of animal mtDNA, for example.
It is quite useful to be able to see the amino acid and DNA matching
simultaneously (when match characters and color coding are simultaneously
displayed). You can even do a screen capture to PICT (e.g., with
the shareware program FlashIt) and print out the alignment. I get
excellent results pasting the PICT into NISUS and printing it on
a friend's HP Deskwriter C (about $600 I think). Before you get
your hopes up too much, I should warn you that the color display
is rather slow, so don't expect to see alignment changes instantly.
Editing the alignment forces Aligner to turn off color and restore
the full (unmatched) sequences, so you then have to wait for the
color and match characters to be redisplayed.
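For readers who want to see the match-character display spelled out,
here is a minimal sketch in Python. It is not Aligner's code (Aligner
is a HyperCard stack); the function names and the gap_char and
match_char parameters are made up for illustration, and the defaults
follow the convention described above ('-' for match, '.' for gap).

```python
def to_match_display(sequences, match_char="-", gap_char="."):
    """Show each sequence relative to the top one.

    Residues identical to the top (reference) sequence become match_char.
    Gap characters are left alone so a shared gap is not shown as a match.
    """
    reference = sequences[0]
    displayed = [reference]
    for seq in sequences[1:]:
        row = "".join(
            match_char if (res == ref and res != gap_char) else res
            for ref, res in zip(reference, seq)
        )
        # Keep any residues that extend past the end of the reference.
        displayed.append(row + seq[len(reference):])
    return displayed


def from_match_display(displayed, match_char="-"):
    """Undo to_match_display: fill match characters in from the top row."""
    reference = displayed[0]
    restored = [reference]
    for row in displayed[1:]:
        full = "".join(
            ref if res == match_char else res
            for ref, res in zip(reference, row)
        )
        restored.append(full + row[len(reference):])
    return restored


if __name__ == "__main__":
    for line in to_match_display(["MAELAVIR", "MAALAVVR"]):
        print(line)
```

With the pair of sequences from Eric's example, the second row comes
out as --A---V-, and from_match_display turns it back into MAALAVVR.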
Regarding Joe's question, the above-mentioned stack DNA Translator
does convert between uses of '-' and '.' (as matches and gaps or
vice versa) and in fact it automatically converts Phylip.result
output to the alternate use of these symbols. Sorry Joe, I started
with '.' as gap because I was using EUGENE by MBIR, but I now
realize that I am probably in the minority, thanks especially to
the prevalence of GCG. If anyone doesn't want to bother with a
program to do this switching, just use a text editor to globally
replace every '.' with '$' (or some other symbol that does not occur
in the file), then every '-' with '.', and finally every '$' with
'-'. I doubt that a standard is
imminent, so perhaps it would be wise if all of us programmers
at least supported user definitions of gap and match characters
(as Swofford does for PAUP). I started to do this but it got
rather complicated, so I retreated to merely supporting a conversion
between the two.
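The text-editor swap above can also be sketched in a few lines of
Python. This is only an illustration, not what DNA Translator does
internally; the function names are hypothetical. Note that either
version changes every '.' and '-' in the text it is given, so it
should be applied to the sequence rows only, not to names or headers.

```python
# One-pass swap: each character is looked up once, so no placeholder
# symbol is needed.
SWAP_TABLE = str.maketrans({".": "-", "-": "."})


def swap_gap_and_match(text):
    """Exchange '.' and '-' throughout text."""
    return text.translate(SWAP_TABLE)


def swap_with_placeholder(text, placeholder="$"):
    """The same swap done in three steps, as one would in a text editor."""
    if placeholder in text:
        # The trick only works if the placeholder is otherwise unused.
        raise ValueError("placeholder already occurs in the text")
    return (
        text.replace(".", placeholder)
        .replace("-", ".")
        .replace(placeholder, "-")
    )


if __name__ == "__main__":
    print(swap_gap_and_match("--A.--V-"))
```

Applying either function twice gives back the original text, which is
a quick way to check that nothing was lost in the conversion.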
Regarding the availability of my stacks, the last released version
is available by anonymous ftp from ftp.bio.indiana.edu in molbio/mac
as dnastack.hqx (the whole package, version 1.0i) or aligner.hqx
(just Aligner, same version). If this site has an older version,
please let me know (you can usually get the most current version
from my account, ftp to 'um.cc.umich.edu' and 'cd gdef', same file
names). Actually, a somewhat more current version of Aligner (1.0j)
is available as 'Aligner.hqx' with 'cd legd' instead of 'cd gdef',
but that version is still somewhat experimental and has at present
a shareware external resource which users should be aware of --
I am still trying different approaches to try to speed up coloring
of text in HyperCard.
----------------------------------------------------------------
| Doug Eernisse Doug_Ee at um.cc.umich.edu |
| userlegd at umichub.bitnet |
| Museum of Zoology, Univ. of Michigan, Ann Arbor, MI 48109 USA |
| Phone: (313) 747-2193 Fax: (313) 763-4080 |
----------------------------------------------------------------