In article <47u416$i6v at mark.ucdavis.edu>, ez017400 at chip.ucdavis.edu (Hemang Patel) writes:
>Hello,
>
>I am looking for a way to take a few proteins and look for common, short
>sequences that occur in different locations in the primary sequence. This
>is not a question of alignment. Bestfit and pileup only find homologous
>regions if they are contiguous in the overall sequence. Dotplot *seems*
>to do this, but the output is graphical and is sometimes hard to
>interpret and control.
>
>I do not have a particular sequence I am looking for, I just want the
>computer to break the sequences down on its own and search all
>possibilities. For example, if a 5 amino acid sequence is in the N-terminal
>region of one protein, the C-terminal region of another, and in the middle
>of a third protein, how would I find this? Will BLAST do it? Or am I out
>of luck? Thanks
>
Sounds like you want to try the software described in:
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu,
A. F. Neuwald, J. C. Wootton (1993) "Detecting Subtle Sequence
Signals: A Gibbs Sampling Strategy for Multiple Alignment",
Science 262:208-214.
You have to make some assumptions, but then it works great. For instance,
you say (effectively), "the subsequence appears once in each protein and
is 6 AAs long", or "it appears twice in each protein", and so forth.
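
To make the first kind of assumption concrete, here is a rough Python sketch
of a site sampler for the "appears once in each protein, fixed width" case.
It is not the NCBI program and does not follow its details: the function
name, the pseudocount, the iteration count and the fixed random seed are all
illustrative assumptions, and it expects sequences written with only the 20
standard amino acid letters.

```python
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed: only the 20 standard residues occur


def gibbs_site_sampler(seqs, width, iters=500, pseudo=0.5, seed=0):
    """Return one sampled motif start per sequence (illustrative sketch only)."""
    rng = random.Random(seed)
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Background residue frequencies over all sequences, with pseudocounts.
    bg_counts = Counter("".join(seqs))
    total = sum(bg_counts[a] for a in ALPHABET) + pseudo * len(ALPHABET)
    bg = {a: (bg_counts[a] + pseudo) / total for a in ALPHABET}

    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Build a per-column profile from the current sites in all
            # sequences except sequence i.
            cols = [Counter() for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        cols[k][other[starts[j] + k]] += 1
            denom = (len(seqs) - 1) + pseudo * len(ALPHABET)

            # Weight every candidate start in sequence i by the ratio of
            # its probability under the profile to that under background.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    a = seq[p + k]
                    w *= ((cols[k][a] + pseudo) / denom) / bg[a]
                weights.append(w)

            # Sample (not simply maximize) the new start for sequence i.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling gibbs_site_sampler(list_of_protein_strings, 6) returns one start
offset (0-based) per protein. Because the procedure is stochastic, a run can
settle on a shifted or spurious site, so in practice one would repeat it with
several seeds and compare the results.
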
You can pick up the original from ncbi.nlm.nih.gov - it runs on assorted
Unix machines. I ported it earlier this year to run on OpenVMS with DEC C.
If you want that version, pick it up via anonymous FTP from
seqaxp.bio.caltech.edu; the file you want is [.software]gibbs.zip
(transfer it in binary mode).
Regards,
David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech