Way to gap around a 100 bp insertion?

Peter Rice rice at embl-heidelberg.de
Sat Sep 11 08:57:19 EST 1993

In article <CD3st6.B28 at watserv2.uwaterloo.ca>, kowalski at sciborg.uwaterloo.ca (Paul Kowalski) writes:
> Is there some clever method to gap "around" a 100 bp insertion in genomic
> DNA? Lowering weights doesn't seem to help. 

Sure. There is a program that does just that. Remember WORDSEARCH? It is the
program everyone used before FASTA (and, according to the latest newsletter,
BLAST) came along for GCG.

WORDSEARCH will find any decent-sized exon (one of 18 bases or so will get lost
in the noise, of course). SEGMENTS is simply a BESTFIT run tied to each of
the hits. SEGMENTS/WHOLE is a GAP run tied to each of the hits.

Use WORDSEARCH with a list size of 2 if you only have 2 exons. Use the
cDNA as the search sequence, and the genomic sequence as the "database".
I use a higher list size for safety, then edit out the lower scores from the
.WORD output file.
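
For anyone who wants to see why a word search separates the exons so cleanly,
here is a rough Python sketch of the general idea: count exact word matches per
diagonal and keep the best few. This is not GCG's code, and it makes no claim
about how WORDSEARCH scores its hits. The names word_diagonals, top_diagonals,
word_size and list_size are made up for illustration, and the sequences are
invented toy data.

```python
from collections import defaultdict


def word_diagonals(query, target, word_size=6):
    """Count exact word matches on each diagonal (target pos - query pos)."""
    # Index every word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Every word shared with the target votes for the diagonal it lies on.
    counts = defaultdict(int)
    for j in range(len(target) - word_size + 1):
        for i in index.get(target[j:j + word_size], ()):
            counts[j - i] += 1
    return counts


def top_diagonals(query, target, word_size=6, list_size=2):
    """Return the list_size best (diagonal, word count) pairs."""
    counts = word_diagonals(query, target, word_size)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:list_size]


# Toy data (invented): two 20-base exons split by a 100-base insertion.
exon1 = "ATGGCGTACGTTAGCCTGAA"
exon2 = "CCGTTGACTAGGCTAACGTA"
intron = "GT" + "T" * 96 + "AG"

cdna = exon1 + exon2                # the search sequence
genomic = exon1 + intron + exon2    # the "database"

hits = top_diagonals(cdna, genomic, word_size=6, list_size=2)
for diagonal, words in hits:
    print(diagonal, words)
```

Each exon piles up its word matches on its own diagonal, and the two diagonals
differ by the length of the insertion. Asking for a longer list and discarding
the weak diagonals afterwards corresponds to the "higher list size, then edit"
approach described above.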

Use SEGMENTS to do the alignments.

WORDSEARCH is also wonderful for understanding fragments or contigs that
fail to overlap in Fragment Assembly.

I still wouldn't recommend WORDSEARCH for database searches, but for just a few
sequences (fragment assembly) or just one with several hits (as above) it
beats any other program.

You can lower the gap weights in SEGMENTS if you like. I often do for
Fragment Assembly matching, where you can expect a large number of 1-base
gaps (insertions or deletions) so you need a very low gap weight.
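
To illustrate the effect of the gap weight on 1-base gaps, here is a minimal
Python sketch of a local alignment score with a simple per-base gap penalty.
It is an assumption-laden toy, not the BESTFIT or SEGMENTS scoring scheme: the
function local_score, its match/mismatch/gap values and the two reads are all
invented for the example.

```python
def local_score(a, b, match=2, mismatch=-3, gap=4):
    """Best local alignment score with a linear (per-base) gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] - gap, curr[j - 1] - gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Toy reads (invented): read_b is read_a with one base deleted.
read_a = "GATTACAGGCTTAACG"
read_b = read_a[:8] + read_a[9:]

low_gap = local_score(read_a, read_b, gap=1)    # cheap gaps
high_gap = local_score(read_a, read_b, gap=20)  # expensive gaps
print(low_gap, high_gap)
```

With the cheap gap the best alignment runs straight across the missing base and
covers the whole read. With the expensive gap it stops at the deletion and
reports only one flank, which is the behaviour you want to avoid when matching
fragments full of 1-base insertions and deletions.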

 Peter Rice, EMBL                             | Post: Computer Group
                                              |       European Molecular
 Internet:    Peter.Rice at EMBL-Heidelberg.DE   |            Biology Laboratory
                                              |       Postfach 10-2209
 Phone:   +49-6221-387247                     |       69012 Heidelberg
 Fax:     +49-6221-387306                     |       Germany
