IUBio

need aligner help (A)

Doug.Eernisse at UM.CC.UMICH.EDU
Thu May 28 11:22:52 EST 1992


  Eric Mansell wanted to find an alignment program for Macs that showed:
 >
 > >         For example... seq1 MAELAVIR
 > >                        seq2 --A---V-
 > >
 > >         instead of...  seq1 MAELAVIR
 > >                        seq2 MAaLAVvR
 > >
 >
 > I don't know the answer about programs, but cannot contain myself.
 >
 > We have no standard for how gaps are to be represented in sequences (that
 > I know of, anyway).  I like to use "-" for gap.  GCG uses periods, whereas
 > other people have on occasion used them to mean "the same as in the first
 > sequence", and so do I.  Some folks use blanks, others just skip over them as
 > cosmetic spacing characters.
 >
 > Is it wise to encourage "-" to mean something other than "gap"?  Is there a
 > standard out there that I don't know about, and if so what does it say?
 >
 > Admittedly if "." were used instead of "-" in Mansell's example, it would
 > satisfy me and probably him too, so my question is a bit off his point.
 >
 > -----
 > Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 >  Internet:         joe at genetics.washington.edu     (IP No. 128.95.12.41)
 >  Bitnet/EARN:      felsenst at uwavm
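Mansell's example is a column-by-column comparison against the top sequence: a residue identical to the one above it is replaced by a match symbol, and differences are kept. The Python sketch below assumes '-' as the match symbol (as in his example) and '.' as the gap symbol (GCG's convention). The function names are hypothetical and are not taken from any of the programs discussed here. The second function reverses the first, which is what a display toggle between full and matched sequences needs.

```python
def to_match_display(reference, seq, match="-", gap="."):
    """Replace each residue of `seq` that equals the reference residue in
    the same column with the match symbol; gaps are always left as gaps."""
    if len(reference) != len(seq):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        match if (s == r and s != gap) else s
        for r, s in zip(reference, seq)
    )


def from_match_display(reference, shown, match="-"):
    """Undo to_match_display: each match symbol becomes the reference
    residue from the same column."""
    if len(reference) != len(shown):
        raise ValueError("aligned sequences must have equal length")
    return "".join(r if s == match else s for r, s in zip(reference, shown))


seq1 = "MAELAVIR"
seq2 = "MAALAVVR"
shown = to_match_display(seq1, seq2)
print(seq1)
print(shown)  # --A---V-
```

The round trip only works if no real residue uses the match symbol, which is one more reason a shared convention for gap and match characters would help.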
 
 I agree with Joe that it is frustrating that there does not appear to
 be a standard for gap and match characters. I have written an alignment
 editor for the Mac called "Aligner" which is a HyperCard 2.x stack.
 It does just what Eric is looking for, I think, which is toggle between
 full sequences and sequences displayed with match characters ('-')
 where they match the top sequence. The toggle is very fast, thanks
 to an efficient external command, and the sequences can
 be output as a text file in interleaved format, with or without
 match characters. I use periods ('.') to denote gaps. Aligner and
 a related stack, DNA Translator, are described in CABIOS 8(2):177-184
 (1992). One of the more useful features of Aligner (added since
 the CABIOS paper) is its ability to color DNA triplets according
 to the amino acids they encode, with all alternate genetic codes supported.
 I use this to design primers for PCR of animal mtDNA, for example.
 It is quite useful to see amino acid and DNA matches at the same
 time (when match characters and color coding are displayed together).
 You can even do a screen capture to PICT (e.g., with
 the shareware program FlashIt) and print out the alignment. I get
 excellent results pasting the PICT into NISUS and printing it on
 a friend's HP Deskwriter C (about $600 I think). Before you get
 your hopes up too much, I should warn you that the color display
 is rather slow, so don't expect alignment changes to appear instantly.
 Editing an alignment forces Aligner to turn off color and restore the
 full (unmatched) sequences, so you must wait for the display to be
 redrawn afterward.
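Coloring triplets by amino acid comes down to translating each three-column group of the aligned DNA under a chosen genetic code. The Python sketch below shows only that lookup step and does not implement any display or coloring. The codon tables follow NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial). The function name and the '?' label for triplets containing gaps or ambiguous bases are hypothetical choices, not Aligner's behavior.

```python
from itertools import product

# Codons enumerated in the conventional TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# NCBI translation table 1 (standard code); '*' marks stop codons.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(CODONS, STANDARD_AA))

# NCBI table 2 (vertebrate mitochondrial) differs from the standard
# code at four codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def triplet_labels(aligned_dna, code):
    """Return one amino-acid label per complete column triplet, with the
    reading frame starting at column 0. Triplets containing a gap or an
    ambiguous base are labelled '?' (a hypothetical convention)."""
    labels = []
    for i in range(0, len(aligned_dna) - 2, 3):
        codon = aligned_dna[i:i + 3].upper().replace("U", "T")
        labels.append(code.get(codon, "?"))
    return labels


dna = "ATGTGAAGA"
print(triplet_labels(dna, STANDARD))   # ['M', '*', 'R']
print(triplet_labels(dna, VERT_MITO))  # ['M', 'W', '*']
```

A display would then assign one color per label. For animal mtDNA work the choice of table matters: TGA reads as Trp rather than stop, and AGA/AGG read as stop rather than Arg.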
 
 Regarding Joe's question, the above-mentioned stack DNA Translator
 does convert between uses of '-' and '.' (as matches and gaps or
 vice versa) and in fact it automatically converts Phylip.result
 output to the alternate use of these symbols. Sorry Joe, I started
 with '.' as gap because I was using EUGENE by MBIR, but I now
 realize that I am probably in the minority, thanks especially to
 the prevalence of GCG. If you don't want to bother with a program
 for this switching, just use a text editor to globally replace every
 '.' with '$' (or some other symbol that does not otherwise occur),
 then every '-' with '.', and finally every '$' with '-'. I doubt that a standard is
 imminent, so perhaps it would be wise if all of us programmers
 at least supported user definitions of gap and match characters
 (as Swofford does for PAUP). I started to do this but it got
 rather complicated, so I retreated to merely supporting a conversion
 between the two.
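The three-step text-editor substitution can be written directly, and a one-pass character translation handles arbitrary user-defined gap and match symbols without any placeholder. This is a Python sketch; the function names and the '$' placeholder default are illustrative assumptions, not features of DNA Translator or PAUP.

```python
def swap_symbols(text, a=".", b="-", temp="$"):
    """Exchange every `a` with `b` and vice versa using three global
    substitutions: a -> temp, then b -> a, then temp -> b."""
    if temp in text:
        # The placeholder must not already occur, or those characters
        # would wrongly be turned into `b` in the last step.
        raise ValueError(f"placeholder {temp!r} already occurs in the text")
    return text.replace(a, temp).replace(b, a).replace(temp, b)


def convert_symbols(seq, src_gap, src_match, dst_gap, dst_match):
    """Single-pass alternative for user-defined gap and match characters:
    str.translate maps both symbols simultaneously, so no placeholder
    is needed."""
    return seq.translate(str.maketrans({src_gap: dst_gap, src_match: dst_match}))


row = "MA.L--R"
print(swap_symbols(row))                         # MA-L..R
print(convert_symbols(row, ".", "-", "-", "."))  # MA-L..R
print(convert_symbols(row, ".", "-", "~", "="))  # MA~L==R
```

Apply either function only to the sequence portion of a file. Sequence names or numbers containing '.' or '-' would be converted too, just as with a global text-editor substitution.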
 
 Regarding the availability of my stacks, the last released version
 is available for anonymous ftp to ftp.bio.indiana.edu in molbio/mac
 as dnastack.hqx (the whole package, version 1.0i) or aligner.hqx
 (just Aligner, same version). If this site has an older version,
 please let me know (you can usually get the most current version
 from my account, ftp to 'um.cc.umich.edu' and 'cd gdef', same file
 names). Actually, a somewhat more current version of Aligner (1.0j)
 is available as 'Aligner.hqx' with 'cd legd' instead of 'cd gdef',
 but that version is still somewhat experimental and currently includes
 a shareware external resource that users should be aware of; I am
 still experimenting with different approaches to speed up coloring
 of text in HyperCard.
 
  ----------------------------------------------------------------
 | Doug Eernisse         Doug_Ee at um.cc.umich.edu                  |
 |                       userlegd at umichub.bitnet                  |
 | Museum of Zoology, Univ. of Michigan, Ann Arbor, MI 48109 USA  |
 | Phone: (313) 747-2193  Fax: (313) 763-4080                     |
  ---------------------------------------------------------------- 




More information about the Bio-soft mailing list
