Looking for Identities

mathog at seqaxp.bio.caltech.edu
Mon Nov 13 13:31:22 EST 1995

In article <47u416$i6v at mark.ucdavis.edu>, ez017400 at chip.ucdavis.edu (Hemang Patel) writes:
>I am looking for a way to take a few proteins and look for common, short 
>sequences that occur in different locations in the primary sequence. This 
>is not a question of alignment. Bestfit and pileup only find homologous 
>regions if they are contiguous in the overall sequence. Dotplot *seems* 
>to do this, but the output is graphical and is sometimes hard to 
>interpret and control. 
>I do not have a particular sequence I am looking for, I just want the
>computer to break the sequences down on its own and search all
>possibilities. For example, if a 5 amino acid sequence is in the N-terminal
>region of one protein, the C-terminal region of another, and in the middle
>of a third protein, how would I find this? Will BLAST do it? Or am I out 
>of luck? Thanks
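The literal case in the question, an identical 5-residue word shared by every protein at arbitrary positions, needs no alignment: index every word of length k in each sequence and keep the words seen in all of them. A minimal Python sketch follows. The function name and toy sequences are hypothetical, and it finds exact matches only, not the similar-but-not-identical segments that the Gibbs sampling approach cited in the reply is designed to detect.

```python
from collections import defaultdict


def shared_kmers(seqs, k):
    """Return {word: [(sequence_index, 0-based_position), ...]} for every
    length-k word that occurs at least once in every sequence."""
    hits = defaultdict(list)
    for i, s in enumerate(seqs):
        for pos in range(len(s) - k + 1):
            hits[s[pos:pos + k]].append((i, pos))
    n = len(seqs)
    # Keep only words whose occurrences cover all n sequences.
    return {w: locs for w, locs in hits.items()
            if len({i for i, _ in locs}) == n}


if __name__ == "__main__":
    # Hypothetical toy data: "WQRST" is at the N-terminus of the first
    # protein, the C-terminus of the second, and the middle of the third.
    proteins = ["WQRSTAAGHLMPKE", "MKLLVDEEGAWQRST", "MPPAWQRSTYYVC"]
    for word, locs in shared_kmers(proteins, 5).items():
        print(word, locs)
```

On the toy data the only word reported is `WQRST`, at positions 0, 10 and 4. The cost grows only linearly with total sequence length, but any substitution inside the word hides it, which is why a statistical method is the better fit for real protein families.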

Sounds like you want to try the software described in:

     C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu,
     A. F. Neuwald, J. C. Wootton (1993) "Detecting Subtle Sequence
     Signals: A Gibbs Sampling Strategy for Multiple Alignment",
     Science 262:208-214.

You have to make some assumptions, but then it works great.  For instance,
you say (effectively), "the subsequence appears once in each protein and
is 6 AAs long", or "it appears twice in each protein",  and so forth.
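The assumptions above (one site per protein, fixed width) are exactly the inputs the sampler needs. Below is a minimal Python sketch of the basic site-sampling loop described by Lawrence et al. (1993): hold out one sequence, build a residue-frequency profile from the current motif sites in the other sequences, score every window of the held-out sequence by the ratio of profile to background probability, and draw its new site in proportion to that score. This is a simplified illustration, not the NCBI program. The function name, pseudocount, iteration count and toy data are hypothetical, and it assumes the sequences use only the 20 standard one-letter codes.

```python
import math
import random

# Assumption: sequences use only the 20 standard one-letter amino acid codes.
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def gibbs_site_sampler(seqs, width, iters=2000, pseudo=0.5, seed=0):
    """Simplified Gibbs site sampler: one motif site of fixed `width` per
    sequence. Returns a list of 0-based start positions, one per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        z = rng.randrange(len(seqs))  # sequence whose site is resampled
        # Column counts for the motif profile and background counts, taken
        # from every sequence except z, each seeded with a pseudocount.
        counts = [{a: pseudo for a in AMINO} for _ in range(width)]
        bg = {a: pseudo for a in AMINO}
        for i, s in enumerate(seqs):
            if i == z:
                continue
            for pos, a in enumerate(s):
                if starts[i] <= pos < starts[i] + width:
                    counts[pos - starts[i]][a] += 1
                else:
                    bg[a] += 1
        col_total = (len(seqs) - 1) + pseudo * len(AMINO)
        bg_total = sum(bg.values())
        # Log of prod_j q_j(residue) / p(residue) for every window in z.
        s = seqs[z]
        logw = []
        for st in range(len(s) - width + 1):
            logw.append(sum(
                math.log(counts[j][s[st + j]] / col_total)
                - math.log(bg[s[st + j]] / bg_total)
                for j in range(width)))
        # Draw the new site with probability proportional to the weight.
        top = max(logw)
        weights = [math.exp(w - top) for w in logw]
        starts[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Hypothetical toy data: "WQRST" planted near the N-terminus, the
    # C-terminus, and the middle of three otherwise unrelated proteins.
    proteins = ["WQRSTAAGHLMPKEDC", "MKLLVDEEGAWQRST", "MPPAWQRSTYYVCNHE"]
    for seq, st in zip(proteins, gibbs_site_sampler(proteins, width=5)):
        print(st, seq[st:st + 5])
```

Because the method is stochastic, a run can settle on a copy of the motif shifted by a residue or two. The published method adds a step to escape such "phase shifts", which this sketch omits, so comparing runs with several seeds is a cheap sanity check. The "appears twice in each protein" case would need a multi-site variant that is not shown here.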

You can pick up the original from ncbi.nlm.nih.gov; it runs on assorted
Unix machines.  I ported it earlier this year to run on OpenVMS with DEC C.
If you want that version, pick it up via anonymous FTP from
seqaxp.bio.caltech.edu; the file you want is [.software]gibbs.zip (transfer
it in binary mode).


David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 

More information about the Info-gcg mailing list
