Looking for Identities

Michael Coyne mcoyne at argo.net
Sat Nov 11 13:01:18 EST 1995

ez017400 at chip.ucdavis.edu (Hemang Patel) wrote:

>I am looking for a way to take a few proteins and look for common, short 
>sequences that occur in different locations in the primary sequence. This 
>is not a question of alignment. Bestfit and pileup only find homologous 
>regions if they are contiguous in the overall sequence. Dotplot *seems* 
>to do this, but the output is graphical and is sometimes hard to 
>interpret and control. 
>I do not have a particular sequence I am looking for, I just want the
>computer to break the sequences down on its own and search all
>possibilities. For example, if a 5 amino acid sequence is in the N-terminal
>region of one protein, the C-terminal region of another, and in the middle
>of a third protein, how would I find this? Will BLAST do it? Or am I out 
>of luck? Thanks

There probably is a way to do this in GCG, but it would be convoluted,
especially since you don't know what small sequence stretches you're
looking for.  The best program I can think of for this purpose would
be MACAW, put out by NCBI.  This program will do
exactly what you're looking for, and give you an idea of significance.
You can get it via FTP, and it's available for most platforms.
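
If you just want a quick look before getting MACAW, the "break the sequences
down and search all possibilities" idea can be sketched in a few lines of
Python.  This is only a minimal sketch under stated assumptions: it finds exact
matches of a fixed length, it gives no measure of significance, and it is not
how MACAW works.  The function name shared_kmers and the toy sequences are
made up for illustration.

```python
from collections import defaultdict

def shared_kmers(seqs, k=5, min_seqs=2):
    """Return {kmer: {name: [1-based start positions]}} for every
    length-k substring that occurs in at least min_seqs sequences."""
    index = defaultdict(lambda: defaultdict(list))
    for name, seq in seqs.items():
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]][name].append(i + 1)
    # Keep only k-mers seen in enough different sequences.
    return {kmer: dict(hits) for kmer, hits in index.items()
            if len(hits) >= min_seqs}

# Hypothetical toy proteins: GKTLV sits at the N-terminus of protA,
# the C-terminus of protB, and the middle of protC.
proteins = {
    "protA": "GKTLVMNPQRSWY",
    "protB": "ACDEFHIWGKTLV",
    "protC": "YYHHCCGKTLVDDEEFF",
}

for kmer, hits in shared_kmers(proteins, k=5, min_seqs=3).items():
    print(kmer, hits)
```

Exact matching is the main limitation here: conservative substitutions are
missed, which is where a program that scores similarity is the better tool.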

If you have a particular short stretch of amino acids or nucleotides
you're interested in, you can search the database for other proteins
that contain such a sequence using FINDPATTERNS.
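
For a handful of sequences you already have on hand, the same kind of known-
pattern search can be roughed out with Python's re module.  Again this is only
a sketch: the pattern below uses Python regular-expression syntax, not
FINDPATTERNS syntax, and find_pattern and the toy entries are hypothetical
names used for illustration.

```python
import re

def find_pattern(seqs, pattern):
    """Return (name, 1-based start, matched text) for each
    non-overlapping match of the regex pattern in each sequence."""
    regex = re.compile(pattern)
    hits = []
    for name, seq in seqs.items():
        for m in regex.finditer(seq.upper()):
            hits.append((name, m.start() + 1, m.group()))
    return hits

# Hypothetical toy entries standing in for database sequences.
entries = {
    "protA": "GKTLVMNPQRSWY",
    "protB": "ACDEFHIWGKSLV",
    "protC": "YYHHCCDDEEFF",
}

# G, K, then either S or T, then L, V.
for hit in find_pattern(entries, "GK[ST]LV"):
    print(hit)
```

Note that re.finditer reports non-overlapping matches only, so overlapping
occurrences of a pattern within one sequence would be undercounted.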

hope this helps...

mjcoyne at warren.med.harvard.edu

More information about the Info-gcg mailing list
