findpatterns

mathog at seqaxp.bio.caltech.edu
Tue Sep 26 12:34:00 EST 1995


In article <445odq$rrb at wn1.sci.kun.nl>, Jack Leunissen <jackl> writes:
>Tylzanowski Przemko <przemko> wrote:
>>I have to search genpept for a pattern. Well, everything worked, I got my
>>findpatterns.find file and now I am stuck. I have about 140 hits in about as
>>many genes. I would like to pull all the sequences out and have a quick look
>>but I don't know how to do that. Of course, I could do it by hand, but it
>>would be a bit tedious, especially if with some other patterns I find more
>>hits.
>>So, my question is this: how could I, easily and painlessly, retrieve the
>>sequences and comments in question? Subsequently I would like to put them into
>>pileup or something like that.
>

I have a modified version of findpatterns called "rfindpatterns".  It
starts out like findpatterns, looking for a pattern in a set of target
sequences, but it then extracts each hit with a user-specified amount of
border around it, and optionally replaces the matched pattern with a
different string.  The result for each hit goes to a separate file.  After
that the files can be fed through pileup to align them or, if they are
already aligned, through reformat to put them straight into .msf format.
It works on protein and DNA.

For instance, let's say that you know that the protein motif DDX{3,4}DD
(completely fictitious motif!!!) is interesting and want to see if anything
lines up outside of it, say within 10 amino acids on either side.
Use a command like this:

 $ rfindpatterns/infile=sw:*/pattern="DDX{3,4}DD" -
   /lwidth=10/rwidth=10/replace=DDXDD

Here is what one of the output files looks like:

$ type R1.RFIND;
RFINDPATTERNS on:       Sw:2aaa_Pea
Original file info: P36875 pisum sativum (garden pea). protein phosphatase pp2a
regulatory subunit a (pr65) (fragment). 2/95
Matching pattern:  DDX{3,4}DD
Pattern location:  15 to 21
Lwidth: 10
Rwidth: 10
Replaced matching pattern with: DDXDD

   R1  Length: 25  September 26, 1995 10:30  Type: P  Check: 4262  ..

       1  HLKTDIMSVF DDXDDQDSFR FLAVE
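
For anyone who wants to see the idea outside of GCG, here is a rough
Python sketch of the same kind of extraction: find the pattern, keep a
fixed number of residues on either side, and optionally swap the matched
region for a replacement string.  This is not the rfindpatterns source
(that is Fortran), and several things in it are assumptions: the function
name extract_hits is made up, the GCG pattern DDX{3,4}DD is assumed to
correspond to the regular expression DD.{3,4}DD, and the test sequence is
a made-up toy string, not a database entry.

```python
import re


def extract_hits(seq, pattern, lwidth, rwidth, replace=None):
    """Yield (start, end, fragment) for each non-overlapping match.

    start and end are 1-based, inclusive positions of the match in seq.
    fragment is up to lwidth residues of left border, then the match
    (or the replacement string, if one is given), then up to rwidth
    residues of right border.
    """
    for m in re.finditer(pattern, seq):
        # Clamp at 0 so a hit near the start just gets a shorter border.
        left = seq[max(0, m.start() - lwidth):m.start()]
        right = seq[m.end():m.end() + rwidth]
        core = replace if replace is not None else m.group(0)
        yield m.start() + 1, m.end(), left + core + right


# Hypothetical toy sequence with one DD...DD hit in the middle.
toy = "AAAAACCCCCGGGGGDDKLMDDTTTTTSSSSSRRRRR"
hits = list(extract_hits(toy, r"DD.{3,4}DD", 10, 10, replace="DDXDD"))

for start, end, fragment in hits:
    print(f"Pattern location: {start} to {end}")
    print(fragment, len(fragment))
```

With 10 residues of border on each side and a 5-character replacement,
each fragment comes out 25 long, which matches the Length: 25 in the
R1 file above.  Note that re.finditer only reports non-overlapping
matches, which may or may not be how findpatterns treats overlapping hits.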


You can pick up rfindpatterns.for, .txt, and .cmd via FTP from
seqaxp.bio.caltech.edu, in the [.software] directory.  Use ASCII transfer
mode.

ONLY SITES THAT HAVE A VALID GCG LICENSE SHOULD PICK UP THIS SOFTWARE!

Regards,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 



More information about the Info-gcg mailing list